Hi,

Yeah, with 2.x head, generating most certainly takes a good deal longer on a 2-core machine (with Hadoop 1.0.1) in pseudo-distributed mode than on 1 core in local mode. I don't have concrete stats, however; these are just my manual observations. I see this regardless of the size of the list being generated, e.g. I still notice a significant increase in CPU whether I'm generating fetchlists from a small list of injected URLs (10, for example) or generating larger lists from iterative crawl cycles (several hundred/thousand).
Do you have any ideas or suggestions for mitigating this, Markus, to make the generate phase more efficient?

Thanks
Lewis

On Tue, Oct 2, 2012 at 8:30 AM, Markus Jelsma <[email protected]> wrote:
> Hi - i don't know 2.0 but Hadoop's Mapred is likely just taking advantage of
> multiple CPU cores.
>
> -----Original message-----
>> From:[email protected] <[email protected]>
>> Sent: Tue 02-Oct-2012 04:15
>> To: [email protected]
>> Subject: nutch-2.0 generate in deploy mode
>>
>> Hello,
>>
>> I use nutch-2.0 with hadoop-0.20.2. bin/nutch generate command takes 87% of
>> cpu in deploy mode versus 18% in local mode.
>> Any ideas how to fix this issue?
>>
>> Thanks.
>> Alex.
>>

--
Lewis

