Hi,

Yeah, with 2.x HEAD, generating most certainly takes a good deal longer
on a 2-core machine (with Hadoop 1.0.1) in pseudo-distributed mode than
on 1 core in local mode. I don't have concrete stats, however; these are
just my manual observations. It also happens regardless of the size of
the list being generated, e.g. I still notice a significant increase in
CPU usage whether I'm generating fetchlists from a small list of
injected URLs (10, for example) or larger lists from iterative crawl
cycles (several hundred or thousand).

Markus, do you have any ideas or suggestions for mitigating this, in an
attempt to improve efficiency during the generate phase?

Thanks

Lewis

On Tue, Oct 2, 2012 at 8:30 AM, Markus Jelsma
<[email protected]> wrote:
> Hi - I don't know 2.0, but Hadoop's MapReduce is likely just taking advantage of
> multiple CPU cores.
>
> -----Original message-----
>> From:[email protected] <[email protected]>
>> Sent: Tue 02-Oct-2012 04:15
>> To: [email protected]
>> Subject: nutch-2.0  generate in  deploy mode
>>
>> Hello,
>>
>> I use nutch-2.0 with hadoop-0.20.2. The bin/nutch generate command takes 87% of
>> CPU in deploy mode versus 18% in local mode.
>> Any ideas on how to fix this issue?
>>
>> Thanks.
>> Alex.
>>



-- 
Lewis
