Hi Ken,

>>> 4. Any idea whether 4 hours is a reasonable amount of time for this
>>> test? It seemed long to me, given that I was starting with a single
>>> URL as the seed.
>>
>> How many crawl passes did you do?
>
> Three deep, as in: bin/nutch crawl seeds -depth 3
>
> This was the same as Doug described in his post here:
>
> http://mail-archives.apache.org/mod_mbox/lucene-nutch-user/200509.mbox/[EMAIL PROTECTED]
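For reference, the one-shot crawl command above is roughly equivalent to running the individual crawl steps yourself, once per depth level. This is only a sketch: the subcommands (inject, generate, fetch, updatedb) are the standard Nutch tools, but the crawl/crawldb and crawl/segments directory names here are assumptions, not taken from the thread.

```shell
# Rough hand-expansion of "bin/nutch crawl seeds -depth 3".
# Directory names (crawl/crawldb, crawl/segments) are illustrative.
bin/nutch inject crawl/crawldb seeds              # seed the crawl db from the URL list
for i in 1 2 3; do                                # one pass per -depth level
  bin/nutch generate crawl/crawldb crawl/segments # select URLs due for fetching
  segment=$(ls -d crawl/segments/* | tail -1)     # newest segment just generated
  bin/nutch fetch "$segment"                      # fetch the selected pages
  bin/nutch updatedb crawl/crawldb "$segment"     # fold newly discovered links back in
done
```

Running the steps separately makes it easier to see which phase (generate, fetch, or updatedb) is eating the four hours.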
I assume the time it takes depends on your hardware, bandwidth, how many URLs are being fetched, and your MapReduce settings. Four hours seems a bit long when starting from one URL, though.

Are you using two or three slave machines? What values are you using for "fetcher.threads.fetch", "mapred.map.tasks", and "mapred.reduce.tasks"?

When doing a "nutch readdb crawldb -stats", how many DB_unfetched and DB_fetched do you have?

--Flo

_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
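P.S. For anyone following along: the properties Flo asks about are set in the usual Nutch property format (conf/nutch-site.xml, or the Hadoop site file for the mapred.* ones). The values below are purely illustrative placeholders, not recommendations from this thread; tune them to your cluster size and bandwidth.

```xml
<!-- Illustrative values only; adjust for your own setup. -->
<property>
  <name>fetcher.threads.fetch</name>
  <value>10</value>
</property>
<property>
  <name>mapred.map.tasks</name>
  <value>4</value>
</property>
<property>
  <name>mapred.reduce.tasks</name>
  <value>2</value>
</property>
```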
