I'm not aware of any way to do this within Nutch (yet). I could be
wrong, though.
If you have the time and inclination to set up a Linux-based router,
you could point your crawlers through it and use iproute2 to shape
outbound traffic from that box.
http://lartc.org/howto/ is a pretty definitive writeup on this sort
of stuff. Look at the sample config in section 9.2.2.2.
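As a rough sketch of what that sample config boils down to, here is a
small Python helper that builds a token bucket filter (tbf) command for
tc. The interface name (eth0), the 10mbit rate, and the burst and
latency values are my assumptions for illustration, not tested
settings; tune them for your own link. Running the resulting command
needs root on the router box.

```python
import subprocess

def tbf_command(dev, rate="10mbit", burst="15kb", latency="50ms"):
    """Build a tc command that attaches a token bucket filter to dev.

    All default values are illustrative assumptions:
      rate    - the cap the ISP expects (10 Mbit/s here)
      burst   - bucket size; too small a bucket can keep the link
                from reaching the configured rate
      latency - how long a packet may wait in the queue before drop
    """
    return ["tc", "qdisc", "add", "dev", dev, "root", "tbf",
            "rate", rate, "burst", burst, "latency", latency]

def apply_shaping(dev):
    """Run the command; only works as root on a Linux box with tc."""
    subprocess.run(tbf_command(dev), check=True)

if __name__ == "__main__":
    # Print the command instead of running it, so it can be reviewed.
    print(" ".join(tbf_command("eth0")))
```

To undo the shaping, "tc qdisc del dev eth0 root" removes the qdisc
again.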
--Matt
On Jan 16, 2006, at 6:02 PM, Insurance Squared Inc. wrote:
My ISP called and said my Nutch crawler is chewing up 20 Mbit/s on a
line where we're only supposed to be using 10. Is there an easy way to
tinker with how much bandwidth we're using at once? I know we can
change the number of open threads the crawler has, but it seems to
me this won't make a huge difference. If I chop the number of open
threads in half, won't it just download half as many pages at a time,
twice as fast? I stand to be corrected on this.
Any other thoughts? It doesn't have to be correct or elegant as long
as it works.
Failing a reasonable solution in Nutch, is there some sort of
Linux-level tool that will easily allow me to throttle how much
bandwidth the crawl is using at once?
Thanks.
--
Matt Kangas / [EMAIL PROTECTED]