Hi,

This is because fetch's mapper goes over all records and selects those that 
have the given batchId. Currently the mappers of all nutch commands do not 
have filters.
It would be interesting to know whether you can select records with a given 
batchId in cassandra without iterating over all records.
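
To make the difference concrete, here is a minimal sketch in plain Java. It is 
not Nutch, Gora, or Cassandra code: PageRow, BatchIdScanSketch and their 
methods are hypothetical names used only for illustration, and the in-memory 
map merely stands in for whatever index the store might offer. The first 
method does what the fetch mapper does today (read every record, keep the 
matching ones); the other two assume a batchId-to-URL index exists.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for a stored page record; not a Nutch or Gora class.
final class PageRow {
    final String url;
    final String batchId; // null if the page was never put into a batch

    PageRow(String url, String batchId) {
        this.url = url;
        this.batchId = batchId;
    }
}

final class BatchIdScanSketch {
    // Full scan: every record is read, and the filter runs on the reader's side.
    static List<String> selectByFullScan(List<PageRow> allRows, String batchId) {
        List<String> selected = new ArrayList<>();
        for (PageRow row : allRows) {
            if (batchId.equals(row.batchId)) {
                selected.add(row.url);
            }
        }
        return selected;
    }

    // Assumed index (batchId -> urls), e.g. maintained when a batch is generated.
    static Map<String, List<String>> buildIndex(List<PageRow> allRows) {
        Map<String, List<String>> index = new HashMap<>();
        for (PageRow row : allRows) {
            if (row.batchId != null) {
                index.computeIfAbsent(row.batchId, k -> new ArrayList<>()).add(row.url);
            }
        }
        return index;
    }

    // Indexed lookup: only the records of the requested batch are touched.
    static List<String> selectByIndex(Map<String, List<String>> index, String batchId) {
        return index.getOrDefault(batchId, new ArrayList<>());
    }

    public static void main(String[] args) {
        List<PageRow> rows = new ArrayList<>();
        rows.add(new PageRow("http://a.example/", "1361367698-1708119958"));
        rows.add(new PageRow("http://b.example/", "some-other-batch"));
        rows.add(new PageRow("http://c.example/", null));

        String batchId = "1361367698-1708119958";
        System.out.println(selectByFullScan(rows, batchId));
        System.out.println(selectByIndex(buildIndex(rows), batchId));
    }
}
```

Both calls return the same URLs. The only difference is how many records have 
to be read, which is what matters for the network usage discussed here.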


Alex.

-----Original Message-----
From: Roland <[email protected]>
To: user <[email protected]>
Sent: Wed, Feb 20, 2013 10:56 am
Subject: Re: nutch with cassandra internal network usage


Hi Lewis,

the GeneratorJob takes only ~5 minutes.
I'm running it in standalone mode, like this:
./bin/nutch fetch 1361367698-1708119958 -threads 40

It's configured to fetch & parse, but it makes no difference if it only 
fetches:
FetcherJob: starting
FetcherJob: batchId: 1361367698-1708119958
FetcherJob: threads: 40
FetcherJob: parsing: true
FetcherJob: resuming: false
FetcherJob : timelimit set for : -1

--Roland


Am 20.02.2013 19:44, schrieb Lewis John Mcgibbney:
> Hi Roland,
>
> You say you start a fetch run; does this mean the FetcherJob or
> GeneratorJob? What kind of settings do you run your Nutch server with?

 
