Hi Ken,

>>> 4. Any idea whether 4 hours is a reasonable amount of time for this
>>> test? It seemed long to me, given that I was starting with a single
>>> URL as the seed.
>>
>> How many crawl passes did you do?
>
> Three deep, as in: bin/nutch crawl seeds -depth 3
>
> This was the same as Doug described in his post here:
>
> http://mail-archives.apache.org/mod_mbox/lucene-nutch-user/200509.mbox/[EMAIL PROTECTED]
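A quick note on that command: besides -depth, the crawl tool also accepts -dir, -topN and -threads, which directly affect run time. A minimal sketch (flag names as I recall them; the values are only illustrative):

    bin/nutch crawl seeds -dir crawl -depth 3 -topN 50 -threads 10

Without a -topN cap, each pass fetches every URL generated by the previous one, so even a single-seed crawl can grow quickly by the third round.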
I assume the time it takes depends on your hardware, bandwidth, how many URLs are being fetched, and your MapReduce settings. Four hours does seem a bit long when starting from a single URL, though.

Are you using 2 or 3 slave machines? What values are you using for "fetcher.threads.fetch", "mapred.map.tasks" and "mapred.reduce.tasks"? When you do a "nutch readdb crawldb -stats", how many DB_unfetched and DB_fetched do you have?
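If you want to experiment with those settings, here is a rough sketch of what the overrides could look like. The property names are the ones mentioned above; the values, and putting them in conf/nutch-site.xml, are just assumptions to adapt to your own setup:

    <!-- conf/nutch-site.xml: illustrative values only -->
    <property>
      <name>fetcher.threads.fetch</name>
      <value>10</value>
      <description>Number of fetcher threads (placeholder value; tune for your bandwidth).</description>
    </property>
    <property>
      <name>mapred.map.tasks</name>
      <value>4</value>
      <description>Placeholder value; tune for the number of slave machines.</description>
    </property>
    <property>
      <name>mapred.reduce.tasks</name>
      <value>2</value>
      <description>Placeholder value; tune for the number of slave machines.</description>
    </property>

Then "bin/nutch readdb crawldb -stats" shows DB_fetched vs. DB_unfetched, which tells you whether the 4 hours went into fetching a lot of pages or into overhead.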
--Flo