What is java.net.SocketTimeoutException?

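It means a connection timed out: the fetcher could not connect to the server, or got no response from it, within the configured timeout.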

In general, you are probably hammering your web server, and it may block the IP of your crawl server. You can configure how many threads fetch from a single host at the same time. For an intranet crawl it is a good idea to use fewer threads (perhaps just as many as you plan to run against that host at once), e.g.

fetcherThreads = 2
maxThreadsPerHost = 2

If you have more threads, you should increase the retry / delay configuration, because when a host is already busy with the maximum number of threads per host, any further thread is delayed. If a thread is delayed too often, you get an "Exceeded http.max.delays: retry later" error.
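
As a rough sketch, these settings might look like this in nutch-site.xml. It assumes the property names used in nutch-default.xml (fetcher.threads.fetch, fetcher.threads.per.host, http.max.delays). The values are only illustrative, so please check both names and defaults against your Nutch version.

```xml
<!-- Illustrative overrides only; place them inside the root element of
     your nutch-site.xml. Names are assumed from nutch-default.xml. -->
<property>
  <name>fetcher.threads.fetch</name>
  <!-- total number of fetcher threads -->
  <value>2</value>
</property>
<property>
  <name>fetcher.threads.per.host</name>
  <!-- threads allowed to hit one host at the same time -->
  <value>2</value>
</property>
<property>
  <name>http.max.delays</name>
  <!-- how often a thread may be delayed for a busy host before giving up;
       10 is an example value, not a recommendation -->
  <value>10</value>
</property>
```

With the total number of threads equal to the per-host limit, no thread should have to wait for a host slot, so the delay limit matters mainly once you raise the total thread count.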

Sometimes I ask myself whether queue-based fetching would be better than the current implementation; however, this is difficult to change.
HTH
Stefan 
