Re: Nutch - Focused crawling

2009-11-23 Thread Julien Nioche
Hi guys, I've separated both functionalities into separate patches on JIRA (NUTCH-769 / NUTCH-770). Julien -- DigitalPebble Ltd http://www.digitalpebble.com 2009/11/21 Julien Nioche lists.digitalpeb...@gmail.com Hi Eran, There is currently no time limit implemented in the Fetcher. We

can you incrementally build an index?

2009-11-23 Thread Jesse Hires
Does bin/nutch merge only create a whole new index out of several smaller indexes, or can it be used to incrementally update a single large index with newly fetched and indexed smaller segments? Jesse

100 fetches per second?

2009-11-23 Thread Mark Kerzner
Hi, guys, my goal is to do my crawls at 100 fetches per second, observing, of course, polite crawling. But, when URLs are all different domains, what theoretically would stop some software from downloading from 100 domains at once, achieving the desired speed? But, whatever I do, I can't make

Re: Nutch - Focused crawling

2009-11-23 Thread Eran Zinman
Thanks Julien, I can confirm this patch works perfectly and keeps the crawl rate high. We have doubled our rate of information retrieval by using a time limit on the fetch queue. Thanks, Eran On Mon, Nov 23, 2009 at 1:28 PM, Julien Nioche lists.digitalpeb...@gmail.com