Sounds like a network problem (too slow?). Is any proxy or firewall in use?
Can you manually check whether you can reach these URLs from this box?
Also try increasing http.timeout (set it in nutch-site.xml to override the value in nutch-default.xml).
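Here is a small Java sketch of what "Read timed out" means, with neither Nutch nor HttpClient involved. A local server socket accepts the TCP connection but never answers, so the client's read() gives up after the socket timeout (setSoTimeout) and throws java.net.SocketTimeoutException. That matches the readStatusLine frame in the trace below: the request went out, but no status line came back in time. I am assuming that http.timeout ends up as this kind of socket read timeout in protocol-httpclient; check the plugin source to be sure. The class and method names are made up for the example.

```java
import java.io.InputStream;
import java.io.OutputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;
import java.nio.charset.StandardCharsets;

public class ReadTimeoutDemo {

    /**
     * Connects to a local server that never answers and reports whether the
     * blocking read() ended in a SocketTimeoutException.
     */
    static boolean readTimesOut(int timeoutMs) throws Exception {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        // Port 0 = any free port. accept() is never called: the OS still
        // completes the TCP handshake via the listen backlog, so the client
        // connects, but nobody ever writes a response.
        try (ServerSocket silentServer = new ServerSocket(0, 1, loopback);
             Socket client = new Socket(loopback, silentServer.getLocalPort())) {
            client.setSoTimeout(timeoutMs); // max time a single read() may block
            OutputStream out = client.getOutputStream();
            out.write("GET / HTTP/1.0\r\n\r\n".getBytes(StandardCharsets.US_ASCII));
            out.flush();
            InputStream in = client.getInputStream();
            try {
                in.read(); // waits for a status line that never arrives
                return false;
            } catch (SocketTimeoutException e) {
                System.out.println("java.net.SocketTimeoutException: " + e.getMessage());
                return true;
            }
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("timed out: " + readTimesOut(500));
    }
}
```

A slow server or a congested link produces the same exception whenever the answer takes longer than the timeout, which is why raising the timeout can help.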
I'm not sure, but I think URLs that fail this way are also scheduled to be
refetched another time (see db.fetch.retry.max).
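Here is a rough sketch, in plain Java, of the retry idea asked about below. fetchWithRetry is a hypothetical helper, not a Nutch or HttpClient API. It repeats a call only when that call fails with SocketTimeoutException, up to a fixed number of attempts. Inside Nutch the retry policy belongs to the fetcher (db.fetch.retry.max, if I am right above), so treat this only as an illustration of the pattern.

```java
import java.net.SocketTimeoutException;
import java.util.concurrent.Callable;

public class RetryOnTimeout {

    /**
     * Hypothetical helper (not a Nutch API): calls fetch, and if it throws
     * SocketTimeoutException, calls it again, up to maxAttempts calls in total.
     * Any other exception propagates immediately without a retry.
     */
    static <T> T fetchWithRetry(Callable<T> fetch, int maxAttempts) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be at least 1");
        }
        SocketTimeoutException lastTimeout = null;
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            try {
                return fetch.call();
            } catch (SocketTimeoutException e) {
                lastTimeout = e; // read timed out: remember it and try again
                System.err.println("attempt " + attempt + " timed out: " + e.getMessage());
            }
        }
        throw lastTimeout; // every attempt timed out
    }

    public static void main(String[] args) throws Exception {
        int[] calls = {0};
        // Simulated fetch: times out twice, succeeds on the third call.
        String page = fetchWithRetry(() -> {
            calls[0]++;
            if (calls[0] < 3) {
                throw new SocketTimeoutException("Read timed out");
            }
            return "<html>ok</html>";
        }, 3);
        System.out.println(page + " after " + calls[0] + " attempts");
    }
}
```

Retrying immediately against a host that is already slow can make things worse. A real fetcher would usually wait before trying the same host again.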
HTH
Stefan
On 21.12.2005 at 09:00, Nguyen Ngoc Giang wrote:
Hi folks,
When I try crawling, I get many Read Timeout errors. It seems that this
error is not handled the way http.max.delays is. I would like to handle
this error in the same manner as http.max.delays, that is, to retry the
page that caused it. Can anyone suggest a way? And can anyone explain
why this error happens?
PS: I'm running Nutch on Windows XP SP2.
java.net.SocketTimeoutException: Read timed out
    at java.net.SocketInputStream.socketRead0(Native Method)
    at java.net.SocketInputStream.read(SocketInputStream.java:129)
    at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
    at java.io.BufferedInputStream.read(BufferedInputStream.java:235)
    at org.apache.commons.httpclient.HttpParser.readRawLine(HttpParser.java:77)
    at org.apache.commons.httpclient.HttpParser.readLine(HttpParser.java:105)
    at org.apache.commons.httpclient.HttpConnection.readLine(HttpConnection.java:1110)
    at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$HttpConnectionAdapter.readLine(MultiThreadedHttpConnectionManager.java:1391)
    at org.apache.commons.httpclient.HttpMethodBase.readStatusLine(HttpMethodBase.java:1824)
    at org.apache.commons.httpclient.HttpMethodBase.readResponse(HttpMethodBase.java:1584)
    at org.apache.commons.httpclient.HttpMethodBase.execute(HttpMethodBase.java:995)
    at org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(HttpMethodDirector.java:393)
    at org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:168)
    at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:393)
    at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:324)
    at org.apache.nutch.protocol.httpclient.HttpResponse.<init>(HttpResponse.java:102)
    at org.apache.nutch.protocol.httpclient.Http.getProtocolOutput(Http.java:204)
    at org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:151)
_______________________________________________
Nutch-general mailing list
https://lists.sourceforge.net/lists/listinfo/nutch-general