Hi folks,
When I crawl, I get many Read Timeout errors. Unlike a busy host, which is
handled according to http.max.delays, this error does not seem to be handled
properly. I would like to handle it the same way http.max.delays does, that
is, retry the page that failed with this error. Can anyone suggest a way?
And can anyone explain why this error happens?
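To make the request concrete, here is a minimal sketch of the kind of retry I mean: run a fetch and, if it fails with SocketTimeoutException, wait a bit and try again a limited number of times. TimeoutRetry and callWithRetry are hypothetical names of my own, not part of Nutch or Commons HttpClient, and I have not checked where (or whether) something like this could be hooked into Fetcher.

```java
import java.net.SocketTimeoutException;
import java.util.concurrent.Callable;

/**
 * Hypothetical helper (not Nutch API): retries an action that fails with a
 * read timeout, similar in spirit to how http.max.delays lets a fetcher
 * thread wait and try again.
 */
public class TimeoutRetry {

    /**
     * Runs the action, retrying up to maxRetries extra times if it throws
     * SocketTimeoutException. Any other exception propagates immediately.
     */
    public static <T> T callWithRetry(Callable<T> action, int maxRetries, long delayMillis)
            throws Exception {
        if (maxRetries < 0) {
            throw new IllegalArgumentException("maxRetries must be >= 0");
        }
        SocketTimeoutException last = null;
        for (int attempt = 0; attempt <= maxRetries; attempt++) {
            try {
                return action.call();
            } catch (SocketTimeoutException e) {
                last = e; // remember the timeout and try again
                if (attempt < maxRetries) {
                    Thread.sleep(delayMillis); // back off before the next attempt
                }
            }
        }
        throw last; // every attempt timed out
    }

    public static void main(String[] args) throws Exception {
        final int[] calls = {0};
        // Simulated fetch: times out twice, then succeeds on the third call.
        String page = callWithRetry(() -> {
            calls[0]++;
            if (calls[0] < 3) {
                throw new SocketTimeoutException("Read timed out");
            }
            return "page content";
        }, 3, 10);
        System.out.println(page + " after " + calls[0] + " attempts");
    }
}
```

Running main prints "page content after 3 attempts". Only timeouts are retried here; other I/O errors are passed straight through, since retrying something like a 404 would just waste fetcher time.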
PS: I'm running Nutch on Windows XP SP2.
java.net.SocketTimeoutException: Read timed out
    at java.net.SocketInputStream.socketRead0(Native Method)
    at java.net.SocketInputStream.read(SocketInputStream.java:129)
    at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:235)
    at org.apache.commons.httpclient.HttpParser.readRawLine(HttpParser.java:77)
    at org.apache.commons.httpclient.HttpParser.readLine(HttpParser.java:105)
    at org.apache.commons.httpclient.HttpConnection.readLine(HttpConnection.java:1110)
    at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$HttpConnectionAdapter.readLine(MultiThreadedHttpConnectionManager.java:1391)
    at org.apache.commons.httpclient.HttpMethodBase.readStatusLine(HttpMethodBase.java:1824)
    at org.apache.commons.httpclient.HttpMethodBase.readResponse(HttpMethodBase.java:1584)
    at org.apache.commons.httpclient.HttpMethodBase.execute(HttpMethodBase.java:995)
    at org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(HttpMethodDirector.java:393)
    at org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:168)
    at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:393)
    at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:324)
    at org.apache.nutch.protocol.httpclient.HttpResponse.<init>(HttpResponse.java:102)
    at org.apache.nutch.protocol.httpclient.Http.getProtocolOutput(Http.java:204)
    at org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:151)
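As far as I can tell from the trace, the connection itself succeeded: the fetcher thread is blocked in SocketInputStream.socketRead0 while HttpMethodBase.readStatusLine waits for the server's first response line, and the socket's read timeout (SO_TIMEOUT) expired before any bytes arrived. That would point to a slow or overloaded server or a congested link rather than a bug in the parser. I am assuming the timeout is the one Nutch configures through its http.timeout property, but I have not verified that for this plugin. The sketch below reproduces the same exception locally with plain java.net (ReadTimeoutDemo and readTimesOut are my own illustrative names): a server accepts the connection but never writes, so the client's read() times out.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

/**
 * Reproduces "java.net.SocketTimeoutException: Read timed out" locally:
 * the server accepts the connection but never sends a byte, so the
 * client's read() blocks until its SO_TIMEOUT expires.
 */
public class ReadTimeoutDemo {

    /** Returns true if reading from a silent server timed out. */
    public static boolean readTimesOut(int timeoutMillis) throws IOException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 1, loopback);
             Socket client = new Socket(loopback, server.getLocalPort());
             Socket accepted = server.accept()) { // connected, but the server stays silent
            // Roughly what a client-side socket timeout setting controls.
            client.setSoTimeout(timeoutMillis);
            InputStream in = client.getInputStream();
            try {
                in.read(); // blocks like readStatusLine waiting for "HTTP/1.1 200 OK"
                return false;
            } catch (SocketTimeoutException e) {
                System.out.println("Caught: " + e);
                return true;
            }
        }
    }

    public static void main(String[] args) throws IOException {
        System.out.println("timed out: " + readTimesOut(500));
    }
}
```

Raising the timeout only helps if the servers are merely slow; if they never answer, a longer timeout just ties up fetcher threads for longer.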