Hi Roland,

You say you start a fetch run. Does this mean the FetcherJob or the
GeneratorJob? What settings do you run your Nutch server with?

On Wednesday, February 20, 2013, Roland <[email protected]> wrote:
> Hi list,
>
> we're experimenting with nutch 2.1 and cassandra 1.2.1 (on ? hosts).
> Our cassandra 'webpage' store has about 31GB right now on disk, we add URLs by 'injecting' them, about 100k-300k per cycle.
> When starting a 'fetch' run, it now needs about an hour before the queues are set up / the first page is fetched.
> During this time we can see about 180MBit/s network traffic from the cassandra host to the nutch host (outgoing of cassandra).
> If I calculate the transferred data during this time (taking only 150Mbit/s into account):
> 150MBit/s*1000*1000/8/1024/1024/1024*3600sec ~= 62GB
>
> So, why does nutch load all data from the db, and not only the relevant data of this fetch? And why does it happen twice?
>
> Thanks,
> Roland
>
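
For reference, the estimate quoted above can be reproduced with a short Python sketch. The function name and the constants are illustrative only, and the comparison against the 31GB store assumes the on-disk size roughly matches what goes over the wire, which may not hold if Cassandra compresses its SSTables.

```python
def transferred_gib(rate_mbit_per_s: float, seconds: float) -> float:
    """Convert a sustained rate in Mbit/s (decimal megabits) into GiB transferred."""
    bytes_per_second = rate_mbit_per_s * 1000 * 1000 / 8
    return bytes_per_second * seconds / 1024**3


# Figures taken from the quoted message: 150 Mbit/s sustained for one hour.
total_gib = transferred_gib(150, 3600)

# Assumption: the 31GB on-disk store size is directly comparable to the
# volume transferred (ignores compression and serialization overhead).
store_gib = 31
ratio = total_gib / store_gib

print(f"{total_gib:.1f} GiB transferred, about {ratio:.2f}x the store size")
```

A ratio close to 2 is what would prompt the "twice" in the question, but it is only as reliable as the assumption about the store size.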

-- 
*Lewis*
