I am running Nutch 2.2.1 in local (non-distributed) mode with HBase and Solr 3.5.
When I run 10-15 crawls in a row on one machine, one of them seems to fail
randomly at index time with the following log messages, while the subsequent
crawls run and index fine:

2013-08-13 12:58:57,430 INFO  collection.CollectionManager - file has23 elements
2013-08-13 12:58:58,914 INFO  solr.SolrWriter - Adding 158 documents
2013-08-13 12:58:59,034 INFO  httpclient.HttpMethodDirector - I/O exception (java.net.SocketException) caught when processing request: Connection reset
2013-08-13 12:58:59,035 INFO  httpclient.HttpMethodDirector - Retrying request
2013-08-13 12:58:59,037 INFO  solr.SolrWriter - Adding 158 documents
2013-08-13 12:58:59,076 INFO  httpclient.HttpMethodDirector - I/O exception (java.net.SocketException) caught when processing request: Connection reset
2013-08-13 12:58:59,076 INFO  httpclient.HttpMethodDirector - Retrying request
2013-08-13 12:58:59,077 WARN  mapred.FileOutputCommitter - Output path is null in cleanup
2013-08-13 12:58:59,078 WARN  mapred.LocalJobRunner - job_local899249969_0001
java.lang.Exception: java.io.IOException: org.apache.solr.client.solrj.SolrServerException: org.apache.commons.httpclient.ProtocolException: Unbuffered entity enclosing request can not be repeated.
        at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:354)
Caused by: java.io.IOException: org.apache.solr.client.solrj.SolrServerException: org.apache.commons.httpclient.ProtocolException: Unbuffered entity enclosing request can not be repeated.
        at org.apache.nutch.indexer.solr.SolrWriter.close(SolrWriter.java:95)
        at org.apache.nutch.indexer.IndexerOutputFormat$1.close(IndexerOutputFormat.java:53)
        at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.close(MapTask.java:650)
        at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:767)
        at org.apache.hadoop.mapred.MapTask.run(MapTask.java:364)
        at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:223)
        at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
        at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
        at java.util.concurrent.FutureTask.run(FutureTask.java:166)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
        at java.lang.Thread.run(Thread.java:722)
Caused by: org.apache.solr.client.solrj.SolrServerException: org.apache.commons.httpclient.ProtocolException: Unbuffered entity enclosing request can not be repeated.
        at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:475)
        at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244)
        at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
        at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:49)
        at org.apache.nutch.indexer.solr.SolrWriter.close(SolrWriter.java:91)
        ... 11 more
Caused by: org.apache.commons.httpclient.ProtocolException: Unbuffered entity enclosing request can not be repeated.
        at org.apache.commons.httpclient.methods.EntityEnclosingMethod.writeRequestBody(EntityEnclosingMethod.java:487)
        at org.apache.commons.httpclient.HttpMethodBase.writeRequest(HttpMethodBase.java:2114)
        at org.apache.commons.httpclient.HttpMethodBase.execute(HttpMethodBase.java:1096)
        at org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(HttpMethodDirector.java:398)
        at org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:171)
        at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:397)
        at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:323)
        at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:422)
        ... 15 more
2013-08-13 12:58:59,640 ERROR solr.SolrIndexerJob - SolrIndexerJob: java.lang.RuntimeException: job failed: name=[events_crawl]solr-index, jobid=job_local899249969_0001
        at org.apache.nutch.util.NutchJob.waitForCompletion(NutchJob.java:54)
        at org.apache.nutch.indexer.solr.SolrIndexerJob.run(SolrIndexerJob.java:46)
        at org.apache.nutch.indexer.solr.SolrIndexerJob.indexSolr(SolrIndexerJob.java:54)
        at org.apache.nutch.indexer.solr.SolrIndexerJob.run(SolrIndexerJob.java:76)
        at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
        at org.apache.nutch.indexer.solr.SolrIndexerJob.main(SolrIndexerJob.java:85)

Any ideas why this happens? Even if the server resets the connection randomly
for some reason, shouldn't the indexer be able to reconnect and continue adding
documents? Or do I need to modify the Solr indexer code to reconnect?
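One possible reading of the innermost exception: commons-httpclient's own retry (the "Retrying request" lines in the log) tries to resend the same request, but it refuses to replay a request body that was streamed and cannot be read a second time, hence "Unbuffered entity enclosing request can not be repeated." If that is what is happening, a reconnect would have to happen one level up, by building and sending a fresh request for the same batch. Below is a minimal sketch of that pattern in plain Java. `RetryingBatchSender` and `sendWithRetry` are hypothetical names, not part of Nutch or SolrJ, and wrapping the `SolrServer.add` call made from `SolrWriter.close` with something like this is an untested assumption.

```java
import java.net.SocketException;
import java.util.concurrent.Callable;

/**
 * Hypothetical helper, not part of Nutch or SolrJ. Retries a batch send by
 * calling a fresh Callable on every attempt, so each attempt builds and sends
 * a brand-new request instead of asking the HTTP client to replay a body
 * that has already been streamed.
 */
public final class RetryingBatchSender {

    private RetryingBatchSender() {}

    /**
     * Invokes {@code attempt} up to {@code maxAttempts} times, sleeping
     * {@code backoffMillis * attemptNumber} ms between failed attempts.
     * Rethrows the last exception if every attempt fails.
     */
    public static <T> T sendWithRetry(Callable<T> attempt, int maxAttempts, long backoffMillis)
            throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        Exception last = null;
        for (int i = 1; i <= maxAttempts; i++) {
            try {
                // Assumption: the callable rebuilds the request from the
                // buffered documents each time it is called.
                return attempt.call();
            } catch (Exception e) {
                last = e;
                if (i < maxAttempts) {
                    Thread.sleep(backoffMillis * i); // simple linear backoff
                }
            }
        }
        throw last;
    }

    public static void main(String[] args) throws Exception {
        // Simulated sender: the first two attempts fail with a connection
        // reset, the third succeeds.
        final int[] calls = {0};
        String result = sendWithRetry(() -> {
            calls[0]++;
            if (calls[0] < 3) {
                throw new SocketException("Connection reset");
            }
            return "added 158 documents";
        }, 5, 10L);
        System.out.println(result + " after " + calls[0] + " attempts");
        // prints: added 158 documents after 3 attempts
    }
}
```

The design choice that matters is that the loop receives a `Callable` that rebuilds the request on each attempt rather than holding on to one request object, so nothing relies on the HTTP layer being able to resend a consumed body. Retrying on every exception is a simplification; real code would probably retry only on I/O-type failures and cap the total delay.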

I found a similar issue reported against Nutch 1.x, but it was dropped as not
reproducible. However, I can reproduce this fairly consistently.

https://issues.apache.org/jira/browse/NUTCH-1348



--
View this message in context: 
http://lucene.472066.n3.nabble.com/SolrIndexerJob-connection-reset-job-failed-tp4084373.html
Sent from the Nutch - User mailing list archive at Nabble.com.
