Hello.

I am running ManifoldCF 0.6 as a web crawler and indexing into Solr 4.

When I crawl a website running on my local machine, everything works
well.

However, when I crawl a different, remote site, I get the
warnings below and nothing gets indexed.

- Any idea what may be causing this?

- I suspect this may be caused by my slow network connection.
Is there a way to change the default connection timeout/read timeout for
HTTP connections in ManifoldCF?
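
For context, here is a minimal standalone sketch (plain JDK, not ManifoldCF code) of what "Read timed out" means at the socket level: a read timeout (SO_TIMEOUT) limits how long a single read() may block waiting for data. The class and method names are hypothetical, and I am assuming ManifoldCF's fetcher ultimately relies on this same socket-level setting; I don't know which ManifoldCF 0.6 parameter, if any, exposes it.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

// Hypothetical demo class: not part of ManifoldCF or Commons HttpClient.
public class ReadTimeoutDemo {

    /**
     * Connects to a local server that accepts the connection but never sends
     * any bytes, then reads with the given read timeout (SO_TIMEOUT).
     * Returns true if the read failed with SocketTimeoutException.
     */
    public static boolean readTimesOut(int readTimeoutMs) throws IOException {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        // Port 0 lets the OS pick a free ephemeral port.
        try (ServerSocket server = new ServerSocket(0, 1, loopback);
             Socket client = new Socket(loopback, server.getLocalPort());
             Socket accepted = server.accept()) {
            // The "read timeout": max time one read() may block for data.
            client.setSoTimeout(readTimeoutMs);
            InputStream in = client.getInputStream();
            try {
                in.read(); // the server never writes, so this blocks
                return false;
            } catch (SocketTimeoutException e) {
                // Same exception type as in the log below.
                return true;
            }
        }
    }

    public static void main(String[] args) throws IOException {
        long start = System.nanoTime();
        boolean timedOut = readTimesOut(200);
        long elapsedMs = (System.nanoTime() - start) / 1_000_000;
        System.out.println("timed out: " + timedOut + " after ~" + elapsedMs + " ms");
    }
}
```

A slow remote server that pauses longer than the read timeout between chunks of a response would trigger the same exception, which is why I suspect a larger timeout might help.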


Thanks.

Arcadius.


----
 WARN 2012-07-21 19:04:55,602 (Worker thread '20') - Socket timeout exception reading socket stream: Read timed out
java.net.SocketTimeoutException: Read timed out
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(Unknown Source)
at java.net.SocketInputStream.read(Unknown Source)
at java.io.BufferedInputStream.read1(Unknown Source)
at java.io.BufferedInputStream.read(Unknown Source)
at org.apache.commons.httpclient.ContentLengthInputStream.read(Unknown Source)
at java.io.FilterInputStream.read(Unknown Source)
at org.apache.commons.httpclient.AutoCloseInputStream.read(Unknown Source)
at org.apache.manifoldcf.crawler.connectors.webcrawler.ThrottledFetcher$ThrottledInputstream.basicRead(ThrottledFetcher.java:2010)
at org.apache.manifoldcf.crawler.connectors.webcrawler.ThrottledFetcher$ThrottledInputstream.read(ThrottledFetcher.java:1974)
at org.apache.manifoldcf.crawler.connectors.webcrawler.DataCache.addData(DataCache.java:95)
at org.apache.manifoldcf.crawler.connectors.webcrawler.WebcrawlerConnector.getDocumentVersions(WebcrawlerConnector.java:745)
at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:318)
 WARN 2012-07-21 19:05:10,867 (Worker thread '24') - Socket timeout exception reading socket stream: Read timed out
java.net.SocketTimeoutException: Read timed out
at java.net.SocketInputStream.socketRead0(Native Method)
at java.net.SocketInputStream.read(Unknown Source)
at java.net.SocketInputStream.read(Unknown Source)
at java.io.BufferedInputStream.read1(Unknown Source)
at java.io.BufferedInputStream.read(Unknown Source)
at org.apache.commons.httpclient.ChunkedInputStream.read(Unknown Source)
at java.io.FilterInputStream.read(Unknown Source)
at org.apache.commons.httpclient.AutoCloseInputStream.read(Unknown Source)
at org.apache.manifoldcf.crawler.connectors.webcrawler.ThrottledFetcher$ThrottledInputstream.basicRead(ThrottledFetcher.java:2010)
at org.apache.manifoldcf.crawler.connectors.webcrawler.ThrottledFetcher$ThrottledInputstream.read(ThrottledFetcher.java:1974)
at org.apache.manifoldcf.crawler.connectors.webcrawler.DataCache.addData(DataCache.java:95)
at org.apache.manifoldcf.crawler.connectors.webcrawler.WebcrawlerConnector.getDocumentVersions(WebcrawlerConnector.java:745)
at org.apache.manifoldcf.crawler.system.WorkerThread.run(WorkerThread.java:318)
 WARN 2012-07-21 19:09:55,612 (Worker thread '20') - Pre-ingest service interruption reported for job 1342882564711 connection 'MyRemoteWebConnector': Socket timeout: Read timed out
 WARN 2012-07-21 19:10:10,876 (Worker thread '24') - Pre-ingest service interruption reported for job 1342882564711 connection 'MyRemoteWebConnector': Socket timeout: Read timed out

----
