Hi guys,
I've separated both functionalities into separate patches on JIRA (NUTCH-769
/ NUTCH-770).
Julien
--
DigitalPebble Ltd
http://www.digitalpebble.com
2009/11/21 Julien Nioche lists.digitalpeb...@gmail.com
Hi Eran,
There is currently no time limit implemented in the Fetcher. We
Does bin/nutch merge only create a whole new index out of several smaller
indexes, or can it be used to incrementally update a single large index with
newly fetched and indexed smaller segments?
Jesse
int GetRandomNumber()
{
return 4; // Chosen by fair roll of dice
//
Hi, guys,
my goal is to crawl at 100 fetches per second, observing, of course,
polite crawling. But when the URLs are all on different domains, what
theoretically would stop some software from downloading from 100 domains at
once, achieving the desired speed?
But, whatever I do, I can't make
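
To make the question above concrete, here is a minimal sketch of per-host politeness scheduling. It is not how the Nutch Fetcher is implemented; the function name schedule_fetches, the per_host_delay parameter and the example.com style URLs are all hypothetical. It only illustrates the point that politeness limits the rate per host, so URLs on distinct hosts can in principle all start at once.

```python
from collections import defaultdict
from urllib.parse import urlparse


def schedule_fetches(urls, per_host_delay=5.0):
    """Plan fetch start times so each host is hit at most once per delay.

    Hypothetical helper: returns (start_time_seconds, url) pairs sorted by
    start time. Nothing is downloaded; this only computes the schedule.
    """
    next_free = defaultdict(float)  # host -> earliest allowed start time
    plan = []
    for url in urls:
        host = urlparse(url).netloc
        start = next_free[host]
        plan.append((start, url))
        # The next request to the same host must wait one full delay.
        next_free[host] = start + per_host_delay
    # sorted() is stable, so URLs with equal start times keep input order.
    return sorted(plan, key=lambda pair: pair[0])


if __name__ == "__main__":
    # 100 URLs on 100 distinct (made-up) hosts: all may start at t = 0.
    distinct = ["http://host%d.example.com/" % i for i in range(100)]
    print(max(t for t, _ in schedule_fetches(distinct)))
```

With 100 distinct hosts every start time is 0.0, so the politeness delay alone does not cap the aggregate rate; in that case the practical limits would be thread count, bandwidth, DNS and similar resources.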
Thanks Julien,
I can confirm this patch works perfectly and does a good job of keeping a
good crawl rate.
We have doubled the rate of information retrieval by using a time limit on
the fetch queue.
Thanks,
Eran
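
For readers unfamiliar with the idea, a time limit on a fetch queue can be sketched roughly as follows. This is an illustration only, not the code from the patches; the names drain_with_time_limit, fetch and clock are hypothetical. The assumption is that once the deadline passes, whatever is still queued is skipped rather than fetched, so slow hosts cannot hold up the rest of the crawl.

```python
import time
from collections import deque


def drain_with_time_limit(queue, fetch, time_limit_s, clock=time.monotonic):
    """Fetch items from the queue until the time limit expires.

    Hypothetical sketch: `queue` is a deque of URLs, `fetch` is any callable
    taking a URL, and `clock` returns the current time in seconds (it is
    injectable so the behaviour can be checked without real waiting).
    Returns (fetched_results, skipped_urls).
    """
    deadline = clock() + time_limit_s
    fetched = []
    # The deadline is checked before each fetch; a fetch already in
    # progress is not interrupted in this simplified version.
    while queue and clock() < deadline:
        fetched.append(fetch(queue.popleft()))
    # Anything left once the deadline has passed is dropped from this round.
    skipped = list(queue)
    queue.clear()
    return fetched, skipped


if __name__ == "__main__":
    urls = deque(["http://a.example/", "http://b.example/"])
    done, skipped = drain_with_time_limit(urls, lambda u: u, time_limit_s=1.0)
    print(done, skipped)
```

The skipped URLs are returned rather than discarded so that a caller could re-queue them in a later round.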
On Mon, Nov 23, 2009 at 1:28 PM, Julien Nioche
lists.digitalpeb...@gmail.com