One other thing which I think might be the case (I'm not sure, though):

Let's say you are fetching a segment with 1000 links, and 50% of them end
in errors. When you finish the segment, these pages won't be placed in the
next segments for fetching, but will instead wait for the next refetch date
(default 30 days). This way, you can end up with 1000 pages in the segment,
but only 500 will actually be there to use.
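
To make the arithmetic concrete, here is a small Java sketch. It only
illustrates the scenario as described. The class and method names are
hypothetical and not part of Nutch. It also assumes, as the message suggests
but does not confirm, that failed pages are not retried in the next segment
and instead wait for the regular 30-day refetch interval.

```java
import java.time.LocalDate;

/**
 * Hypothetical illustration of the segment arithmetic described above.
 * Assumption: failed fetches are rescheduled for the regular refetch
 * interval instead of being retried in the next segment.
 */
public class SegmentYield {
    // Refetch interval in days, as quoted in the message (assumed default).
    static final int REFETCH_INTERVAL_DAYS = 30;

    /** Pages actually usable after fetching a segment with the given error rate. */
    static int usablePages(int segmentSize, double errorRate) {
        return (int) Math.round(segmentSize * (1.0 - errorRate));
    }

    /** Earliest date a failed page would be eligible for another attempt. */
    static LocalDate nextAttempt(LocalDate fetchDate) {
        return fetchDate.plusDays(REFETCH_INTERVAL_DAYS);
    }

    public static void main(String[] args) {
        int size = 1000;
        double errorRate = 0.5;
        int usable = usablePages(size, errorRate);
        System.out.println("Segment size: " + size);
        System.out.println("Usable pages: " + usable);           // 500
        System.out.println("Failed pages waiting: " + (size - usable));
        // A segment fetched on 2005-07-26 would see its failures retried on 2005-08-25.
        System.out.println("Failed pages retried on: "
                + nextAttempt(LocalDate.of(2005, 7, 26)));
    }
}
```

If the 30-day assumption holds, a high error rate does more than shrink one
segment: half of it stays unusable for a month.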

What website are you working on?

-----Original Message-----
From: Raymond Creel [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, July 26, 2005 5:58 PM
To: [email protected]
Subject: RE: fetch bandwidth settings

Yes, thanks, you seem to be right. If I use more
threads on the same host, the process seems to go
faster, but I get a lot more HTTP errors, so it ends
up being slower (and probably more disruptive to the
site).
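
As a rough way to compare thread settings, the Java sketch below estimates
the most requests per second that could reach one host. It uses a simplified
model that is an assumption, not Nutch's actual fetcher logic: each of the
fetcher.threads.per.host threads sends a request, waits for the response, then
sleeps fetcher.server.delay seconds before the next one. The class name and the
0.5 s response time are hypothetical.

```java
import java.util.Locale;

/**
 * Back-of-the-envelope model (an assumption, not Nutch's actual scheduler):
 * each of threadsPerHost threads issues a request, waits responseSec for the
 * reply, then sleeps delaySec before issuing its next request.
 */
public class HostRateEstimate {

    /** Upper bound on requests per second sent to a single host. */
    static double maxRequestsPerSecond(int threadsPerHost, double delaySec, double responseSec) {
        if (threadsPerHost < 1 || delaySec < 0 || responseSec < 0) {
            throw new IllegalArgumentException("invalid parameters");
        }
        double cycle = delaySec + responseSec;   // seconds per request, per thread
        if (cycle == 0) {
            throw new IllegalArgumentException("delay + response time must be positive");
        }
        return threadsPerHost / cycle;
    }

    public static void main(String[] args) {
        double response = 0.5; // hypothetical average response time in seconds
        // Settings proposed in the thread: delay 1.0 s, 2 threads per host.
        System.out.printf(Locale.ROOT, "2 threads: %.2f req/s%n",
                maxRequestsPerSecond(2, 1.0, response));
        // 1 thread per host with the same delay.
        System.out.printf(Locale.ROOT, "1 thread:  %.2f req/s%n",
                maxRequestsPerSecond(1, 1.0, response));
    }
}
```

Under this model, 2 threads with a 1.0 s delay give about 1.33 requests per
second and 1 thread about 0.67, both within the "1 or 2 requests per second"
rule of thumb from the original question. The model ignores the HTTP errors
and refused connections described above, which in practice are what make
extra threads per host counterproductive.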

--- EM <[EMAIL PROTECTED]> wrote:

> Go with 1 thread per host. 
> 
> For my small area of the internet where I fetch my
> pages, almost all hosts start refusing requests at
> 3+ threads, and some of them even at 1+.
> 
> Bandwidth-wise, if you go with a higher value for
> fetcher.threads.per.host, your fetcher will have a
> hard time connecting, and the target server will in
> fact save on bandwidth ;)
> 
> 
> -----Original Message-----
> From: Raymond Creel [mailto:[EMAIL PROTECTED] 
> Sent: Monday, July 25, 2005 4:00 PM
> To: [email protected]
> Subject: fetch bandwidth settings
> 
> I have read that you don't want to make more than 1
> or 2 requests per second to the same host, or else
> you will start adversely affecting their bandwidth.
> Is this a good rule of thumb?
> 
> Along those lines, what would be the best values to
> put in the Nutch config file to maximize the speed of
> fetching without hammering the site? I'm thinking
> something like this:
> 
> fetcher.server.delay: 1.0
> fetcher.threads.per.host: 2
> 
> thanks,
> raymond
> 
> 
_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
