Hi,
I had a similar problem and installed a Squid proxy server. Squid can
limit bandwidth, and integrating it with Nutch was pretty simple (just
configure Nutch to use the proxy). Furthermore, the proxy gives you
another place to block the crawling of specific websites.
If needed, I can assist you with the squid configuration.
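For reference, a minimal sketch of the Squid side using a class-1 delay pool (the ~1 MB/s rate and the crawler-on-localhost ACL are assumptions; adjust to your line), followed by the Nutch side using the standard http.proxy.* properties in conf/nutch-site.xml:

```
# squid.conf -- cap aggregate bandwidth with a class-1 delay pool
acl crawler src 127.0.0.1           # assumption: crawler runs on the same host
http_access allow crawler
delay_pools 1
delay_class 1 1                     # pool 1, class 1: one aggregate bucket
delay_parameters 1 1000000/1000000  # ~1 MB/s sustained (restore/max bytes)
delay_access 1 allow crawler
```

```xml
<!-- conf/nutch-site.xml: point the fetcher at the Squid proxy -->
<property>
  <name>http.proxy.host</name>
  <value>localhost</value>
</property>
<property>
  <name>http.proxy.port</name>
  <value>3128</value> <!-- Squid's default port -->
</property>
```

With this in place every fetch goes through Squid, so the delay pool caps the crawl's total throughput regardless of how many fetcher threads are running.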
Regards
Michael
Insurance Squared Inc. wrote:
My ISP called and said my Nutch crawler is chewing up 20 Mbit/s on a line
that's only supposed to carry 10. Is there an easy way to tinker with
how much bandwidth we're using at once? I know we can change the number
of open threads the crawler has, but it seems to me this won't make a
huge difference: if I chop the number of open threads in half, won't it
just download half as many pages at once, each twice as fast? I stand to
be corrected on this.
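For what it's worth, both the thread count and the per-server politeness delay live in conf/nutch-site.xml. Fewer threads plus a longer delay does reduce aggregate throughput, though it caps concurrency rather than bytes per second (the values below are illustrative, not recommendations):

```xml
<!-- conf/nutch-site.xml: concurrency and politeness knobs -->
<property>
  <name>fetcher.threads.fetch</name>
  <value>5</value>  <!-- number of concurrent fetcher threads -->
</property>
<property>
  <name>fetcher.server.delay</name>
  <value>5.0</value> <!-- seconds between successive requests to the same server -->
</property>
```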
Any other thoughts? It doesn't have to be correct or elegant as long as
it works.
Failing a reasonable solution in Nutch, is there some sort of
Linux-level tool that will easily let me throttle how much bandwidth the
crawl is using at once?
Thanks.
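On the Linux-level question above: the kernel's traffic shaper (tc) can cap bandwidth without touching Nutch at all. A sketch assuming the crawl runs over eth0 and a 10 Mbit/s cap (both are assumptions; run as root):

```shell
# Cap egress on eth0 to ~10 Mbit/s with a token bucket filter
tc qdisc add dev eth0 root tbf rate 10mbit burst 32kbit latency 400ms

# Crawl traffic is mostly inbound, so also police ingress:
tc qdisc add dev eth0 handle ffff: ingress
tc filter add dev eth0 parent ffff: protocol ip u32 match u32 0 0 \
   police rate 10mbit burst 32k drop flowid :1

# Undo:
tc qdisc del dev eth0 root
tc qdisc del dev eth0 ingress
```

Note that policing ingress simply drops packets over the rate and relies on TCP backing off, so the cap is approximate; the Squid delay-pool approach shapes the crawl's traffic more precisely.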
--
Michael Nebel
http://www.nebel.de/
http://www.netluchs.de/
_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general