One other thing that I think might be the case (I'm not sure, though): if you are fetching a segment with, let's say, 1000 links, and 50% of them have errored by the time you finish the segment, those pages won't be placed in the next segments for fetching, but will instead wait for the next refetch date (default 30 days). This way, you can end up with 1000 pages in the segment, of which only 500 will actually be there to use.
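To put rough numbers on that, here is a minimal sketch of the arithmetic only. It assumes the behavior guessed at above (errored pages are skipped until the refetch date), which is not confirmed, and `usable_pages` is a made-up helper, not anything in Nutch.

```python
# Hypothetical helper, not part of Nutch: estimates how many pages in a
# segment are actually usable if errored pages are deferred to the next
# refetch date (an unconfirmed assumption from the message above).
REFETCH_INTERVAL_DAYS = 30  # default refetch interval mentioned above


def usable_pages(segment_size: int, error_rate: float) -> int:
    """Return the number of pages fetched without error in one segment."""
    if not 0.0 <= error_rate <= 1.0:
        raise ValueError("error_rate must be between 0 and 1")
    failed = round(segment_size * error_rate)
    return segment_size - failed


if __name__ == "__main__":
    size = 1000
    fetched = usable_pages(size, 0.5)
    deferred = size - fetched
    print(f"{fetched} usable now, {deferred} deferred for about "
          f"{REFETCH_INTERVAL_DAYS} days")
```

So anything that lowers the error rate (such as fewer threads per host) directly raises the number of pages you can use from each segment.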
What website are you working on?

-----Original Message-----
From: Raymond Creel [mailto:[EMAIL PROTECTED]
Sent: Tuesday, July 26, 2005 5:58 PM
To: [email protected]
Subject: RE: fetch bandwidth settings

Yes, thanks, you seem to be right. If I use more threads on the same host, the process seems to go faster, but I get a lot more HTTP errors, so it ends up being slower (and probably more disruptive to the site).

--- EM <[EMAIL PROTECTED]> wrote:

> Go with 1 thread per host.
>
> For my small area of the internet where I fetch my
> pages, almost all hosts start refusing requests at
> 3+ threads, some of them even at 1+.
>
> Bandwidth-wise, if you go with a higher value for
> fetcher.threads.per.host, your fetcher will have a
> hard time connecting and the target server will in
> fact save on bandwidth ;)
>
> -----Original Message-----
> From: Raymond Creel [mailto:[EMAIL PROTECTED]
> Sent: Monday, July 25, 2005 4:00 PM
> To: [email protected]
> Subject: fetch bandwidth settings
>
> I have read that you don't want to make more than 1
> or 2 requests per second to the same host, or else
> you will start adversely affecting their bandwidth.
> Is this a good rule of thumb?
>
> Along those lines, what would be the best values to
> put in the nutch config file to maximize speed of
> fetching without hammering the site? I'm thinking
> something like this:
>
> fetcher.server.delay: 1.0
> fetcher.threads.per.host: 2
>
> thanks,
> raymond

_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
