Justin,

Thanks for the update.  I will update the script and the wiki so that it 
can run from a clean start, with no previous fetches.  Currently it 
assumes that there are at least some previous fetches, with a crawldb and 
segments to go with them.
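Until the script is updated, here is a rough sketch of how a clean-start check could work. It assumes the standard Nutch layout of `crawldb` and `segments` directories under the master directory, and a seed URL directory passed in as `urldir`. `is_clean_start` and `start_commands` are hypothetical helpers for illustration, not code from JobStream.py.

```python
import os


def is_clean_start(masterdir):
    """Return True if masterdir has no crawldb or no fetched segments yet.

    Hypothetical helper. Assumes the standard Nutch layout:
    <masterdir>/crawldb and <masterdir>/segments/<timestamp>.
    """
    crawldb = os.path.join(masterdir, "crawldb")
    segments = os.path.join(masterdir, "segments")
    if not os.path.isdir(crawldb) or not os.path.isdir(segments):
        return True
    for name in os.listdir(segments):
        if os.path.isdir(os.path.join(segments, name)):
            return False  # at least one earlier segment exists
    return True


def start_commands(nutchdir, masterdir, urldir):
    """Return the commands to run before the first fetch cycle.

    Hypothetical helper. Only injects when no crawldb exists yet;
    urldir is an assumed directory holding the seed URL list.
    """
    nutch = os.path.join(nutchdir, "bin", "nutch")
    crawldb = os.path.join(masterdir, "crawldb")
    if os.path.isdir(crawldb):
        return []
    # Clean start: create the crawldb from the seed URLs.
    return [[nutch, "inject", crawldb, urldir]]
```

Each command is an argument list that could be handed to `subprocess.call` before the normal fetch cycle starts.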

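As to your error, I think it is not finding the logging.conf file.  Is 
that file in the same directory as the JobStream.py script?  Because 
logconf defaults to the relative path "logging.conf", Python looks for it 
in the directory you run the command from, so either run the script from 
/hdd2/jobstream or give logconf an absolute path.  When the file cannot be 
read at all, ConfigParser silently skips it and fileConfig then fails on 
the first section it looks up, which is the error you are seeing.  At the 
top of the logging file there is a section called formatters, like this: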

[formatters]
keys=simple
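A small guard along these lines would turn that confusing error into a clear "file not found" message. This is a sketch, not the current JobStream.py code: `load_logging_config` and its `basedir` parameter are hypothetical, and it assumes you want a relative logconf resolved against the script's own directory.

```python
import logging
import logging.config
import os


def load_logging_config(logconf, basedir):
    """Load a logging config file, failing clearly if it is missing.

    Hypothetical helper. A relative logconf is resolved against basedir
    (for example the directory holding JobStream.py) instead of the
    current working directory.
    """
    if not os.path.isabs(logconf):
        logconf = os.path.join(basedir, logconf)
    if not os.path.isfile(logconf):
        # ConfigParser.read() silently ignores files it cannot open, so
        # without this check fileConfig() can fail with the confusing
        # "No section: 'formatters'" error (as on Python 2.4).
        raise IOError("logging config not found: %s" % logconf)
    logging.config.fileConfig(logconf)
    return logconf
```

In main() this could replace the bare `logging.config.fileConfig(logconf)` call, passing `os.path.dirname(os.path.abspath(sys.argv[0]))` as `basedir`.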


Dennis Kubes

Justin Hartman wrote:
> Hi Dennis
> 
> This is a great contribution and I personally thank you for making it
> available to the community.
> 
> I am having a little difficulty getting it to work. Could you help
> me work out what I'm doing wrong?
> 
> A little background first:-
> I'm running the python script in the following location:
> /hdd2/jobstream/JobStream.py
> My master directory is: /hdd2/nutch/master
> My backup directory is: /hdd2/nutch/backup
> 
> My config in JobStream.py is as follows:-
> 
> Lines 55 to 60 are configured as:
> class JobStream:
>  nutchdir = "/home/nutch/nutch"
>  masterdir = "/hdd2/nutch/master"
>  backupdir = "/hdd2/nutch/backup"
>  log = logging.getLogger("jobstream")
> 
> From line 377 onwards it is configured as:
> def main(argv):
>  # set the default values
>  resume = 0
>  execute = 0
>  checkfile = "jobstream.stop"
>  logconf = "logging.conf"
>  jobdir = "/hdd2/jobstream"
>  nutchdir = "/home/nutch/nutch"
>  masterdir = "/hdd2/nutch/master"
>  backupdir = "/hdd2/nutch/backup"
>  dfsdumpdir = "/hdd2/nutch/dump"
>  tempdir = "/hdd2/nutch/temp"
>  splitsize = 500000
>  fetchmerge = 3
> 
> All the above paths are correct and exist. The master and backup
> directories are empty and were created specifically for the Python
> script.
> 
> When executing JobStream.py -e for the first time, I got an error
> telling me it could not find various directories within the master
> directory, so I injected the URLs into the /hdd2/nutch/master
> directory.
> 
> This solved my initial error; however, I now get the error below and
> I'm not sure what to do about it:
> 
> /usr/bin/python2.4 /hdd2/jobstream/JobStream.py -e
> Traceback (most recent call last):
>  File "/hdd2/jobstream/JobStream.py", line 465, in ?
>    main(sys.argv[1:])
>  File "/hdd2/jobstream/JobStream.py", line 444, in main
>    logging.config.fileConfig(logconf)
>  File "logging/config.py", line 76, in fileConfig
>  File "/usr/lib/python2.4/ConfigParser.py", line 511, in get
>    raise NoSectionError(section)
> ConfigParser.NoSectionError: No section: 'formatters'
> 
> Do you have any ideas?
> 
> Regards
> Justin
> 
> On 1/29/07, Dennis Kubes <[EMAIL PROTECTED]> wrote:
>> It is up on the wiki at the following location.
>>
>> http://wiki.apache.org/nutch/Automating_Fetches_with_Python
>>
>> It has also been added to the front page.
>>
>> Dennis Kubes
>>
>> Andrzej Bialecki wrote:
>> > Dennis Kubes wrote:
>> >> We have a Python script with logging which fully automates the
>> >> fetching and updating process, but not the invert links or indexing
>> >> process.  If anybody wants a copy, send me an email and I will send
>> >> you one.
>> >>
>> >> We are currently working on a more in-depth framework for automating
>> >> these types of job streams in Python, but that is not complete yet.
>> >>
>> >> Andrzej, do you think this is something we should post to the wiki?
>> >
>> > Sure, if it's ok for you to release it I'm sure many people would find
>> > it useful.
>> >
>>
> 
> 
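For anyone wanting to see the basic shape of the fetch-and-update cycle discussed in this thread, here is a rough sketch following the Nutch 0.8 tutorial sequence (`bin/nutch generate`, then `fetch` and `updatedb` on the newest segment). It is not taken from JobStream.py; `latest_segment`, `cycle_commands` and `run_all` are hypothetical helpers, and the `crawldb` location under the master directory is an assumption.

```python
import os
import subprocess


def latest_segment(segments_dir):
    """Return the newest segment directory under segments_dir, or None.

    Nutch names segments with a yyyyMMddHHmmss timestamp, so the
    lexically greatest name is the most recent one.
    """
    names = [n for n in os.listdir(segments_dir)
             if os.path.isdir(os.path.join(segments_dir, n))]
    if not names:
        return None
    return os.path.join(segments_dir, max(names))


def cycle_commands(nutchdir, masterdir, segment):
    """Build the fetch and updatedb commands for one generated segment."""
    nutch = os.path.join(nutchdir, "bin", "nutch")
    crawldb = os.path.join(masterdir, "crawldb")  # assumed location
    return [
        [nutch, "fetch", segment],
        [nutch, "updatedb", crawldb, segment],
    ]


def run_all(cmds):
    """Run commands in order; stop at the first non-zero exit code and return it."""
    for cmd in cmds:
        rc = subprocess.call(cmd)
        if rc != 0:
            return rc
    return 0
```

A cycle would first run `[nutch, "generate", crawldb, segments_dir]` through `run_all`, then pass `latest_segment(segments_dir)` to `cycle_commands` and run the result. Stopping at the first failure keeps `updatedb` from running against a segment whose fetch did not finish.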

_______________________________________________
Nutch-general mailing list
https://lists.sourceforge.net/lists/listinfo/nutch-general
