Hi,

This is because the fetch mapper goes over all records and selects those that have the given batchId. Currently the mappers of the Nutch commands do not have filters. It would be interesting to know whether you can select records with a given batchId in Cassandra without iterating over all records.
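To illustrate the difference, here is a rough, self-contained sketch. All names in it (BatchScanSketch, Row, selectByScan, indexByBatchId, selectByIndex) are made up for illustration and are not Nutch, Gora, or Cassandra classes. The first method mimics what the mapper does today: it touches every row and keeps the matches. The second shows the kind of lookup an index on batchId would allow; whether the Cassandra store can offer something like that is exactly the open question.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in for the web table; not a Nutch or Gora class.
class BatchScanSketch {

    // One stored row: a URL plus the batchId assigned to it (may be null).
    static final class Row {
        final String url;
        final String batchId;

        Row(String url, String batchId) {
            this.url = url;
            this.batchId = batchId;
        }
    }

    // Full scan: every row is read, and only rows with the given batchId are kept.
    static List<String> selectByScan(List<Row> rows, String batchId) {
        List<String> urls = new ArrayList<>();
        for (Row row : rows) {
            if (batchId.equals(row.batchId)) {
                urls.add(row.url);
            }
        }
        return urls;
    }

    // Assumed alternative: an index from batchId to URLs, built once up front.
    static Map<String, List<String>> indexByBatchId(List<Row> rows) {
        Map<String, List<String>> index = new HashMap<>();
        for (Row row : rows) {
            index.computeIfAbsent(row.batchId, k -> new ArrayList<>()).add(row.url);
        }
        return index;
    }

    // Indexed lookup: only the rows of the requested batch are touched.
    static List<String> selectByIndex(Map<String, List<String>> index, String batchId) {
        return index.getOrDefault(batchId, Collections.<String>emptyList());
    }

    public static void main(String[] args) {
        List<Row> rows = Arrays.asList(
                new Row("http://example.org/a", "batch-1"),
                new Row("http://example.org/b", "batch-2"),
                new Row("http://example.org/c", "batch-1"));

        System.out.println(selectByScan(rows, "batch-1"));
        System.out.println(selectByIndex(indexByBatchId(rows), "batch-1"));
    }
}
```

Both calls return the same URLs; the difference is only in how many rows have to be read to get them.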
Alex.

-----Original Message-----
From: Roland <[email protected]>
To: user <[email protected]>
Sent: Wed, Feb 20, 2013 10:56 am
Subject: Re: nutch with cassandra internal network usage

Hi Lewis,

the GeneratorJob takes only ~5 minutes.
I'm running it in standalone mode, like this:

./bin/nutch fetch 1361367698-1708119958 -threads 40

It's configured to fetch & parse, but it makes no difference if it only fetches:

FetcherJob: starting
FetcherJob: batchId: 1361367698-1708119958
FetcherJob: threads: 40
FetcherJob: parsing: true
FetcherJob: resuming: false
FetcherJob : timelimit set for : -1

--Roland

On 20.02.2013 19:44, Lewis John Mcgibbney wrote:
> Hi Roland,
>
> You say you start a fetch run, does this mean the FetcherJob or
> GeneratorJob? What kind of settings do you run your Nutch server with?

