Justin,
Thanks for the update. I will update the script and the wiki so that it can
run from a clean start, with no previous fetches. Currently it assumes
that there are at least some previous fetches, along with a crawldb and
segments to go with them.
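Below is a minimal sketch of the check a clean-start run could make before its first cycle. It assumes the master directory uses the usual Nutch crawl layout: a `crawldb` directory, plus a `segments` directory holding one subdirectory per fetch. Those directory names and the helpers `missing_subdirs` and `needs_bootstrap` are illustrative assumptions, not code from JobStream.py.

```python
import os

# Assumption: the usual Nutch crawl layout (a crawldb directory plus a
# segments directory holding one subdirectory per fetch). The names that
# JobStream.py actually expects may differ.
REQUIRED_SUBDIRS = ("crawldb", "segments")


def missing_subdirs(masterdir, required=REQUIRED_SUBDIRS):
    """List the required subdirectories that are absent from masterdir."""
    return [name for name in required
            if not os.path.isdir(os.path.join(masterdir, name))]


def needs_bootstrap(masterdir):
    """Hypothetical helper: True when masterdir holds no earlier crawl.

    That is the case when any required subdirectory is missing, or when
    segments/ exists but does not contain a single segment yet.
    """
    if missing_subdirs(masterdir):
        return True
    segments = os.path.join(masterdir, "segments")
    has_segment = any(os.path.isdir(os.path.join(segments, entry))
                      for entry in os.listdir(segments))
    return not has_segment
```

For an empty master directory, `needs_bootstrap` returns True. That signals that seed URLs have to be injected into a new crawldb before the first fetch/update cycle can run.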
As to your error, I think it is not finding the logging.conf file. Is
that file in the same directory as the JobStream.py script? At the top
of the logging file there is a section called formatters, like this:
[formatters]
keys=simple
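Here is why a missing file produces that particular message. `logging.config.fileConfig` reads the file through `ConfigParser.read()`, which silently skips files it cannot open. On Python 2.4 the parser therefore ends up empty, and the first lookup fails with `No section: 'formatters'` (newer Python versions report this case differently). A relative name such as `logging.conf` is also resolved against the current working directory, not the directory holding JobStream.py.

The sketch below resolves the name relative to the script and fails with a clearer message instead. The helper `configure_logging` and the contents of `EXAMPLE_LOGGING_CONF` (one console handler plus a `jobstream` logger) are assumptions for illustration. They are not the logging.conf that ships with the script.

```python
import logging
import logging.config
import os
import sys

# Illustrative logging.conf contents (an assumption, not the file shipped
# with JobStream.py). fileConfig needs [formatters], [handlers] and
# [loggers] sections, and the loggers list must include root.
EXAMPLE_LOGGING_CONF = """\
[formatters]
keys=simple

[formatter_simple]
format=%(asctime)s %(name)s %(levelname)s %(message)s

[handlers]
keys=console

[handler_console]
class=StreamHandler
level=DEBUG
formatter=simple
args=(sys.stdout,)

[loggers]
keys=root,jobstream

[logger_root]
level=WARNING
handlers=console

[logger_jobstream]
level=INFO
handlers=console
qualname=jobstream
propagate=0
"""


def configure_logging(logconf="logging.conf", script_path=None):
    """Hypothetical helper: load a logging config that sits next to the script.

    A relative logconf is resolved against the script's directory instead
    of the current working directory. A missing file is reported directly
    rather than as a confusing missing-section error.
    """
    if script_path is None:
        script_path = sys.argv[0]  # path of the script being run
    if not os.path.isabs(logconf):
        script_dir = os.path.dirname(os.path.abspath(script_path))
        logconf = os.path.join(script_dir, logconf)
    if not os.path.isfile(logconf):
        raise IOError("logging config not found: %s" % logconf)
    logging.config.fileConfig(logconf)
    return logconf
```

Inside `main()`, the `logging.config.fileConfig(logconf)` call would then become `configure_logging(logconf)`. The script then works no matter which directory it is launched from.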
Dennis Kubes
Justin Hartman wrote:
> Hi Dennis
>
> This is a great contribution and I personally thank you for making it
> available to the community.
>
> I am having a little difficulty getting it to work. Perhaps you
> can help me figure out what I'm doing wrong?
>
> A little background first:
> I'm running the python script in the following location:
> /hdd2/jobstream/JobStream.py
> My master directory is: /hdd2/nutch/master
> My backup directory is: /hdd2/nutch/backup
>
> My config in JobStream.py is as follows:
>
> Lines 55 to 60 are configured as:
> class JobStream:
>     nutchdir = "/home/nutch/nutch"
>     masterdir = "/hdd2/nutch/master"
>     backupdir = "/hdd2/nutch/backup"
>     log = logging.getLogger("jobstream")
>
> Line 377 onwards is configured as:
> def main(argv):
>     # set the default values
>     resume = 0
>     execute = 0
>     checkfile = "jobstream.stop"
>     logconf = "logging.conf"
>     jobdir = "/hdd2/jobstream"
>     nutchdir = "/home/nutch/nutch"
>     masterdir = "/hdd2/nutch/master"
>     backupdir = "/hdd2/nutch/backup"
>     dfsdumpdir = "/hdd2/nutch/dump"
>     tempdir = "/hdd2/nutch/temp"
>     splitsize = 500000
>     fetchmerge = 3
>
> All the above paths are correct and exist. The master and backup
> directories are empty and were created specifically for use by the
> Python script.
>
> When I ran JobStream.py -e for the first time, I got an error
> telling me it could not find various directories within the master
> directory, so I injected the URLs into the /hdd2/nutch/master
> directory.
>
> This solved my initial error; however, I now get the error below and
> am not sure what to do about it:
>
> /usr/bin/python2.4 /hdd2/jobstream/JobStream.py -e
> Traceback (most recent call last):
>   File "/hdd2/jobstream/JobStream.py", line 465, in ?
>     main(sys.argv[1:])
>   File "/hdd2/jobstream/JobStream.py", line 444, in main
>     logging.config.fileConfig(logconf)
>   File "logging/config.py", line 76, in fileConfig
>   File "/usr/lib/python2.4/ConfigParser.py", line 511, in get
>     raise NoSectionError(section)
> ConfigParser.NoSectionError: No section: 'formatters'
>
> Do you have any ideas?
>
> Regards
> Justin
>
> On 1/29/07, Dennis Kubes <[EMAIL PROTECTED]> wrote:
>> It is up on the wiki at the following location.
>>
>> http://wiki.apache.org/nutch/Automating_Fetches_with_Python
>>
>> It has also been added to the front page.
>>
>> Dennis Kubes
>>
>> Andrzej Bialecki wrote:
>> > Dennis Kubes wrote:
>> >> We have a Python script with logging which fully automates the
>> >> fetching and updating process, but not the link inversion or indexing
>> >> process. If anybody wants a copy, send me an email and I will send
>> >> you one.
>> >>
>> >> We are currently working on a more in-depth framework for automating
>> >> these types of job streams in Python, but that is not complete yet.
>> >>
>> >> Andrzej, do you think this is something we should post to the wiki?
>> >
>> > Sure, if it's OK for you to release it, I'm sure many people would
>> > find it useful.
>> >
>>
>
>
_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general