This sounds like a network problem (too slow?). Is there a proxy or firewall in use?
Can you manually check whether you can reach these URLs from this box?
Also try increasing http.timeout in nutch-default.xml / nutch-site.xml.

I'm not sure, but I think failed URLs of this kind are also refetched at a later time (see db.fetch.retry.max).
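
To make the idea of "retry on read timeout" concrete, here is a minimal standalone sketch in plain Java. It is not Nutch's fetcher code: the class name ReadTimeoutRetry and the helpers readOnce and countTimeouts are hypothetical, and the retry limit only plays a role similar to db.fetch.retry.max. It connects to a local server that never answers, so every read attempt ends in java.net.SocketTimeoutException, which is caught and retried a bounded number of times.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

public class ReadTimeoutRetry {

    /** Hypothetical helper: one read attempt with the given read timeout. */
    static int readOnce(String host, int port, int timeoutMs) throws IOException {
        try (Socket socket = new Socket(host, port)) {
            // After timeoutMs without data, read() throws SocketTimeoutException.
            socket.setSoTimeout(timeoutMs);
            InputStream in = socket.getInputStream();
            return in.read();
        }
    }

    /**
     * Hypothetical helper: tries up to maxRetries times and returns how many
     * attempts ended in a read timeout. Stops early on the first success.
     */
    static int countTimeouts(String host, int port, int timeoutMs, int maxRetries)
            throws IOException {
        int timeouts = 0;
        for (int attempt = 0; attempt < maxRetries; attempt++) {
            try {
                readOnce(host, port, timeoutMs);
                break; // got a byte (or end of stream), no need to retry
            } catch (SocketTimeoutException e) {
                timeouts++; // slow or silent server: count it and try again
            }
        }
        return timeouts;
    }

    public static void main(String[] args) throws IOException {
        // A local server socket that queues connections but never sends data,
        // standing in for a slow remote host (assumption for the demo only).
        try (ServerSocket silent = new ServerSocket(0)) {
            int timeouts = countTimeouts("127.0.0.1", silent.getLocalPort(), 200, 3);
            System.out.println("timed-out attempts: " + timeouts);
        }
    }
}
```

The point of the sketch is only that a read timeout is an ordinary exception that can be caught and retried; a larger timeout value (compare http.timeout) simply gives a slow server more time before the exception is thrown.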

HTH
Stefan

On 21.12.2005 at 09:00, Nguyen Ngoc Giang wrote:

  Hi folks,

When I try crawling, I get many Read Timeout errors. It seems that this error is not handled the same way as http.max.delays. I would like to handle it in the same manner as http.max.delays, that is, to retry the page that hit this error. Can anyone suggest a way? And can anyone explain to me why this error happens?

  PS: I'm running Nutch on Windows XP SP2.

java.net.SocketTimeoutException: Read timed out
        at java.net.SocketInputStream.socketRead0(Native Method)
        at java.net.SocketInputStream.read(SocketInputStream.java:129)
        at java.io.BufferedInputStream.fill(BufferedInputStream.java:218)
        at java.io.BufferedInputStream.read(BufferedInputStream.java:235)
        at org.apache.commons.httpclient.HttpParser.readRawLine(HttpParser.java:77)
        at org.apache.commons.httpclient.HttpParser.readLine(HttpParser.java:105)
        at org.apache.commons.httpclient.HttpConnection.readLine(HttpConnection.java:1110)
        at org.apache.commons.httpclient.MultiThreadedHttpConnectionManager$HttpConnectionAdapter.readLine(MultiThreadedHttpConnectionManager.java:1391)
        at org.apache.commons.httpclient.HttpMethodBase.readStatusLine(HttpMethodBase.java:1824)
        at org.apache.commons.httpclient.HttpMethodBase.readResponse(HttpMethodBase.java:1584)
        at org.apache.commons.httpclient.HttpMethodBase.execute(HttpMethodBase.java:995)
        at org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(HttpMethodDirector.java:393)
        at org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:168)
        at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:393)
        at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:324)
        at org.apache.nutch.protocol.httpclient.HttpResponse.<init>(HttpResponse.java:102)
        at org.apache.nutch.protocol.httpclient.Http.getProtocolOutput(Http.java:204)
        at org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:151)



_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general