Hi Dennis
This is a great contribution and I personally thank you for making it
available to the community.
I am having a little difficulty getting it to work; perhaps you
can help me see what I'm doing wrong?
A little background first:
I'm running the python script in the following location:
/hdd2/jobstream/JobStream.py
My master directory is: /hdd2/nutch/master
My backup directory is: /hdd2/nutch/backup
My config in JobStream.py is as follows.
Lines 55 to 60 are configured as:
class JobStream:
    nutchdir = "/home/nutch/nutch"
    masterdir = "/hdd2/nutch/master"
    backupdir = "/hdd2/nutch/backup"
    log = logging.getLogger("jobstream")
Line 377 onwards is configured as:
def main(argv):
    # set the default values
    resume = 0
    execute = 0
    checkfile = "jobstream.stop"
    logconf = "logging.conf"
    jobdir = "/hdd2/jobstream"
    nutchdir = "/home/nutch/nutch"
    masterdir = "/hdd2/nutch/master"
    backupdir = "/hdd2/nutch/backup"
    dfsdumpdir = "/hdd2/nutch/dump"
    tempdir = "/hdd2/nutch/temp"
    splitsize = 500000
    fetchmerge = 3
All of the above paths are correct and exist. The master and backup
directories were newly created for use by the Python script and
contained no data.
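For anyone wanting to double-check the same thing, the directories
listed above can be verified with a few lines of Python. This is only a
minimal sketch: the helper name missing_dirs is hypothetical and not
part of JobStream.py, and the paths are simply copied from the main()
defaults shown above.

```python
import os

# Paths copied from the main() defaults quoted above.
CONFIGURED_DIRS = {
    "jobdir": "/hdd2/jobstream",
    "nutchdir": "/home/nutch/nutch",
    "masterdir": "/hdd2/nutch/master",
    "backupdir": "/hdd2/nutch/backup",
    "dfsdumpdir": "/hdd2/nutch/dump",
    "tempdir": "/hdd2/nutch/temp",
}


def missing_dirs(dirs):
    """Return the sorted names whose path is not an existing directory.

    Hypothetical helper for illustration; JobStream.py may do its own checks.
    """
    return sorted([name for name, path in dirs.items()
                   if not os.path.isdir(path)])
```

Calling missing_dirs(CONFIGURED_DIRS) returns an empty list when every
configured directory exists.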
When I ran JobStream.py -e for the first time, I got an error saying
it could not find various directories within the master directory, so
I injected the URLs into the /hdd2/nutch/master directory.
This solved the initial error, but now I get the error below and am
not sure what to do about it:
/usr/bin/python2.4 /hdd2/jobstream/JobStream.py -e
Traceback (most recent call last):
  File "/hdd2/jobstream/JobStream.py", line 444, in main
    logging.config.fileConfig(logconf)
  File "logging/config.py", line 76, in fileConfig
  File "/usr/lib/python2.4/ConfigParser.py", line 511, in get
    raise NoSectionError(section)
ConfigParser.NoSectionError: No section: 'formatters'
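One detail that may be relevant: logconf is the relative path
"logging.conf", so fileConfig resolves it against the current working
directory rather than against /hdd2/jobstream. In Python 2.4,
ConfigParser silently skips a file it cannot read, so a missing or
misplaced logging.conf would surface as exactly this NoSectionError
instead of a "file not found" error. The sketch below checks both
possibilities. It is an assumption-laden illustration: the helper name
check_logging_conf is hypothetical and not part of JobStream.py, and the
three section names are the ones the standard fileConfig format requires.

```python
import os

try:
    import configparser                   # Python 3
except ImportError:
    import ConfigParser as configparser   # Python 2

# Sections that logging.config.fileConfig expects in its config file.
REQUIRED_SECTIONS = ("loggers", "handlers", "formatters")


def check_logging_conf(path):
    """Return a list of problems found with a logging config file.

    Hypothetical helper for illustration. An empty list means the file
    exists and contains the three sections fileConfig looks up.
    """
    if not os.path.isfile(path):
        return ["file not found: %s" % os.path.abspath(path)]
    parser = configparser.RawConfigParser()
    parser.read(path)
    problems = []
    for section in REQUIRED_SECTIONS:
        if not parser.has_section(section):
            problems.append("missing section: [%s]" % section)
    return problems
```

Running check_logging_conf("logging.conf") from the same directory the
script is started in shows which file fileConfig would actually try to
load. If the file turns out to live elsewhere, passing an absolute path
for logconf would be one way to rule out a working-directory problem.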
Do you have any ideas?
Regards
Justin
On 1/29/07, Dennis Kubes <[EMAIL PROTECTED]> wrote:
> It is up on the wiki at the following location.
>
> http://wiki.apache.org/nutch/Automating_Fetches_with_Python
>
> It has also been added to the front page.
>
> Dennis Kubes
>
> Andrzej Bialecki wrote:
> > Dennis Kubes wrote:
> >> We have a python script with logging which fully automates the
> >> fetching and updating process, not the invert links or the indexing
> >> process. If anybody wants a copy, send me an email and I will send
> >> you a copy.
> >>
> >> We are currently working on a more in-depth framework for automating
> >> these types of job streams in python but that is not complete yet.
> >>
> >> Andrzej, do you think this is something we should post to the wiki?
> >
> > Sure, if it's ok for you to release it I'm sure many people would find
> > it useful.
> >
>
--
Regards
Justin Hartman
PGP Key ID: 102CC123
_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general