I'm not aware of any way to do this within Nutch (yet). I could be wrong, though.

If you have the time and inclination to set up a Linux-based router, you could point your crawlers through it and use iproute2 to shape outbound traffic from that box.

http://lartc.org/howto/ is a pretty definitive writeup on this sort of stuff. Look at the sample config in section 9.2.2.2.
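
As a rough sketch of what that could look like: one simple option is a Token Bucket Filter (tbf) qdisc on the router's outbound interface, which I believe is what that section covers. The Python helper below only builds the `tc` command line so you can review it before running it as root. The interface name `eth0`, the 10mbit cap, and the `burst`/`latency` values are my assumptions, not tested settings, and the helper name `tbf_command` is made up for illustration.

```python
def tbf_command(dev, rate, burst="15kb", latency="50ms"):
    """Return the argv for attaching a tbf qdisc to `dev`.

    dev, burst and latency are illustrative guesses; tune them for
    the real link before using the command.
    """
    return [
        "tc", "qdisc", "add", "dev", dev, "root", "tbf",
        "rate", rate,        # long-term average rate cap
        "burst", burst,      # bucket size: how much may go out at once
        "latency", latency,  # max time a packet may wait in the queue
    ]


if __name__ == "__main__":
    # Print the command instead of executing it; running it needs root.
    print(" ".join(tbf_command("eth0", "10mbit")))
```

Printing rather than executing keeps the sketch harmless on a machine without iproute2. To apply it for real, pass the list to `subprocess.run` as root on the router box.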

--Matt

On Jan 16, 2006, at 6:02 PM, Insurance Squared Inc. wrote:

My ISP called and said my Nutch crawler is chewing up 20 Mbit/s on a line where it's only supposed to be using 10. Is there an easy way to tinker with how much bandwidth we're using at once? I know we can change the number of open threads the crawler has, but it seems to me this won't make a huge difference. If I chop the number of open threads in half, won't it just download half the pages, twice as fast? I stand to be corrected on this.

Any other thoughts? It doesn't have to be correct or elegant as long as it works. Failing a reasonable solution in Nutch, is there some sort of Linux-level tool that will easily allow me to throttle how much bandwidth the crawl is using at once?

Thanks.

--
Matt Kangas / [EMAIL PROTECTED]




_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
