My ISP called and said my Nutch crawler is chewing up 20 Mbit/s on a
line where it's only supposed to be using 10. Is there an easy way to
limit how much bandwidth we use at once? I know we can change the number
of threads the crawler has open, but it seems to me this won't make a
huge difference: if I cut the number of threads in half, won't it just
download half as many pages at once, each twice as fast? I stand to be
corrected on this.
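
To make that worry concrete, here is a rough back-of-the-envelope model in
Python. It is only a sketch under assumptions I have not measured: that the
link saturates at the 20 Mbit/s the ISP reported, and that a single fetcher
thread could pull at most about 1 Mbit/s on its own. The function name and
the per-thread figure are made up for illustration.

```python
def aggregate_mbps(threads, per_thread_cap_mbps, link_cap_mbps):
    """Total bandwidth under a simple model: every thread pulls up to its
    own cap, and the sum can never exceed what the link will carry."""
    return min(threads * per_thread_cap_mbps, link_cap_mbps)


# Assumed numbers, not measurements.
LINK_CAP_MBPS = 20.0    # the rate the ISP says we are hitting
PER_THREAD_MBPS = 1.0   # hypothetical ceiling for one fetcher thread

for threads in (100, 50, 20, 10):
    total = aggregate_mbps(threads, PER_THREAD_MBPS, LINK_CAP_MBPS)
    print(f"{threads:3d} threads -> {total:.0f} Mbit/s")
```

If the model is anywhere near right, going from 100 threads to 50 changes
nothing, and usage only drops once the thread count falls below the point
where the threads together can no longer fill the link.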
Any other thoughts? It doesn't have to be correct or elegant as long as
it works.
Failing a reasonable solution in Nutch, is there some sort of Linux-level
tool that will easily let me throttle how much bandwidth the crawl uses
at once?
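
For reference, the behaviour I'm after is roughly a token-bucket cap on the
total download rate. The Python below is only a sketch of that idea, not
Nutch code and not an existing tool: the names `TokenBucket`, `delay_for`
and `throttled_copy` are hypothetical, and the 10 Mbit/s figure is just the
limit my ISP mentioned.

```python
import time


class TokenBucket:
    """Caps average throughput at rate_bytes_per_s, allowing short bursts
    of up to capacity_bytes."""

    def __init__(self, rate_bytes_per_s, capacity_bytes, clock=time.monotonic):
        self.rate = float(rate_bytes_per_s)
        self.capacity = float(capacity_bytes)
        self.tokens = float(capacity_bytes)  # start with a full bucket
        self.clock = clock
        self.last = clock()

    def delay_for(self, nbytes):
        """Charge nbytes to the bucket and return how many seconds the
        caller should sleep to stay under the configured rate."""
        now = self.clock()
        # Refill for the time elapsed since the last call, up to capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        self.tokens -= nbytes
        if self.tokens >= 0:
            return 0.0
        # The bucket is in debt: wait long enough for the refill to cover it.
        return -self.tokens / self.rate


def throttled_copy(src, dst, bucket, chunk_size=64 * 1024):
    """Copy between file-like objects, sleeping as the bucket dictates."""
    while True:
        data = src.read(chunk_size)
        if not data:
            break
        time.sleep(bucket.delay_for(len(data)))
        dst.write(data)
```

A 10 Mbit/s cap would be `TokenBucket(10_000_000 / 8, 256 * 1024)`, with one
bucket shared by all downloads so the limit applies to the total rather than
to each thread. The burst size here is an arbitrary choice.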
Thanks.
- throttling bandwidth Insurance Squared Inc.