I'm not aware of any way to do this within Nutch (yet). I could be wrong, though.

If you have the time and inclination to set up a Linux-based router, you could point your crawlers through it and use iproute2 to shape outbound traffic from that box.

http://lartc.org/howto/ is a pretty definitive writeup on this sort of stuff. Look at the sample config in section 9.2.2.2.
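As a rough sketch of what that looks like in practice, here is a small Python helper that builds a tc command attaching a Token Bucket Filter to the router's outbound interface. I'm assuming the sample config referred to above is the TBF one; the interface name (eth0), the burst and latency values, and the helper name tbf_command are placeholders of mine, not tuned settings, so check them against the howto before running anything as root.

```python
import shlex


def tbf_command(dev, rate_mbit, burst_kb=15, latency_ms=50):
    """Build a tc command that installs a Token Bucket Filter as the root qdisc.

    dev        -- outbound interface on the router (assumed name, e.g. "eth0")
    rate_mbit  -- sustained rate cap in megabits per second
    burst_kb   -- bucket size in kilobytes (placeholder value, not tuned)
    latency_ms -- longest time a packet may wait in the queue (placeholder)
    """
    if rate_mbit <= 0:
        raise ValueError("rate_mbit must be positive")
    return [
        "tc", "qdisc", "add", "dev", dev, "root", "tbf",
        "rate", f"{rate_mbit}mbit",
        "burst", f"{burst_kb}kb",
        "latency", f"{latency_ms}ms",
    ]


if __name__ == "__main__":
    # Only prints the command; run it yourself as root on the router box.
    print(shlex.join(tbf_command("eth0", 10)))
```

This only prints the command rather than executing it, so you can eyeball it first. The qdisc can be removed again with "tc qdisc del dev eth0 root".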

--Matt

On Jan 16, 2006, at 6:02 PM, Insurance Squared Inc. wrote:

My ISP called and said my Nutch crawler is chewing up 20 Mbit/s on a line where we're only supposed to be using 10. Is there an easy way to tinker with how much bandwidth we're using at once? I know we can change the number of open threads the crawler has, but it seems to me this won't make a huge difference. If I chop the number of open threads in half, won't it just download half as many pages at once, each twice as fast? I stand to be corrected on this.

Any other thoughts? It doesn't have to be correct or elegant as long as it works. Failing a reasonable solution in Nutch, is there some sort of Linux-level tool that will easily let me throttle how much bandwidth the crawl is using at once?

Thanks.

--
Matt Kangas / [EMAIL PROTECTED]

