Hi folks,

We have a Nutch 1.6 crawler running on a Hadoop cluster. It has been
working for quite a long time without any problems. However, yesterday I
found that the total number of open connections was nearly 65K and none of
them were being closed, so the crawler and our other applications stalled.
All of the connections were established to 8 unique domains, all owned by
WordPress. I can't blacklist WordPress, because we crawl many documents
from there. What do you think I should do?
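In case it helps, this is roughly how I checked which remote hosts were holding
the sockets on a fetcher node. The script below is just an illustrative sketch,
not part of our setup, and it assumes `netstat -tn` is available on the node:

    #!/usr/bin/env python
    # Rough diagnostic sketch (hypothetical): count ESTABLISHED TCP connections
    # per remote host, to see whether a few domains hold most of the sockets.
    import subprocess
    from collections import Counter

    def established_by_remote():
        out = subprocess.check_output(["netstat", "-tn"]).decode("utf-8", "replace")
        counts = Counter()
        for line in out.splitlines():
            fields = line.split()
            # netstat -tn rows: Proto Recv-Q Send-Q Local-Address Foreign-Address State
            if len(fields) >= 6 and fields[0].startswith("tcp") and fields[5] == "ESTABLISHED":
                remote_ip = fields[4].rsplit(":", 1)[0]
                counts[remote_ip] += 1
        return counts

    if __name__ == "__main__":
        for ip, n in established_by_remote().most_common(20):
            print("%6d  %s" % (n, ip))

Running something like this on our nodes is what showed the connections piling
up against the same handful of remote hosts.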

-- 
TO
