I am running Nutch 2.2.1 (not distributed) with HBase and Solr 3.5. When I
run 10-15 crawls in a row, all on one machine, one crawl seems to fail
randomly at index time with the following log messages, while the
subsequent crawls index fine:

2013-08-13 12:58:57,430 INFO collection.CollectionManager - file has23 elements
2013-08-13 12:58:58,914 INFO solr.SolrWriter - Adding 158 documents
2013-08-13 12:58:59,034 INFO httpclient.HttpMethodDirector - I/O exception (java.net.SocketException) caught when processing request: Connection reset
2013-08-13 12:58:59,035 INFO httpclient.HttpMethodDirector - Retrying request
2013-08-13 12:58:59,037 INFO solr.SolrWriter - Adding 158 documents
2013-08-13 12:58:59,076 INFO httpclient.HttpMethodDirector - I/O exception (java.net.SocketException) caught when processing request: Connection reset
2013-08-13 12:58:59,076 INFO httpclient.HttpMethodDirector - Retrying request
2013-08-13 12:58:59,077 WARN mapred.FileOutputCommitter - Output path is null in cleanup
2013-08-13 12:58:59,078 WARN mapred.LocalJobRunner - job_local899249969_0001
java.lang.Exception: java.io.IOException: org.apache.solr.client.solrj.SolrServerException: org.apache.commons.httpclient.ProtocolException: Unbuffered entity enclosing request can not be repeated.
    at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:354)
Caused by: java.io.IOException: org.apache.solr.client.solrj.SolrServerException: org.apache.commons.httpclient.ProtocolException: Unbuffered entity enclosing request can not be repeated.
    at org.apache.nutch.indexer.solr.SolrWriter.close(SolrWriter.java:95)
    at org.apache.nutch.indexer.IndexerOutputFormat$1.close(IndexerOutputFormat.java:53)
    at org.apache.hadoop.mapred.MapTask$NewDirectOutputCollector.close(MapTask.java:650)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:767)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:364)
    at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:223)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
    at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
    at java.util.concurrent.FutureTask.run(FutureTask.java:166)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
    at java.lang.Thread.run(Thread.java:722)
Caused by: org.apache.solr.client.solrj.SolrServerException: org.apache.commons.httpclient.ProtocolException: Unbuffered entity enclosing request can not be repeated.
    at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:475)
    at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:244)
    at org.apache.solr.client.solrj.request.AbstractUpdateRequest.process(AbstractUpdateRequest.java:105)
    at org.apache.solr.client.solrj.SolrServer.add(SolrServer.java:49)
    at org.apache.nutch.indexer.solr.SolrWriter.close(SolrWriter.java:91)
    ... 11 more
Caused by: org.apache.commons.httpclient.ProtocolException: Unbuffered entity enclosing request can not be repeated.
    at org.apache.commons.httpclient.methods.EntityEnclosingMethod.writeRequestBody(EntityEnclosingMethod.java:487)
    at org.apache.commons.httpclient.HttpMethodBase.writeRequest(HttpMethodBase.java:2114)
    at org.apache.commons.httpclient.HttpMethodBase.execute(HttpMethodBase.java:1096)
    at org.apache.commons.httpclient.HttpMethodDirector.executeWithRetry(HttpMethodDirector.java:398)
    at org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:171)
    at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:397)
    at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:323)
    at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:422)
    ... 15 more
2013-08-13 12:58:59,640 ERROR solr.SolrIndexerJob - SolrIndexerJob: java.lang.RuntimeException: job failed: name=[events_crawl]solr-index, jobid=job_local899249969_0001
    at org.apache.nutch.util.NutchJob.waitForCompletion(NutchJob.java:54)
    at org.apache.nutch.indexer.solr.SolrIndexerJob.run(SolrIndexerJob.java:46)
    at org.apache.nutch.indexer.solr.SolrIndexerJob.indexSolr(SolrIndexerJob.java:54)
    at org.apache.nutch.indexer.solr.SolrIndexerJob.run(SolrIndexerJob.java:76)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
    at org.apache.nutch.indexer.solr.SolrIndexerJob.main(SolrIndexerJob.java:85)

Any ideas why? Even if the server resets the connection at random for some
reason, shouldn't the indexer be able to reconnect and continue adding
documents? Or do I need to modify the Solr indexer code to reconnect?
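
The innermost "Caused by" in the trace hints at why the built-in retry does
not help: commons-httpclient 3.x retries the same POST after the connection
reset, and EntityEnclosingMethod refuses to resend a request body that was
streamed rather than buffered. If patching the indexer is acceptable, one
possible workaround is to retry at the batch level, so that each attempt
sends a brand-new request. A minimal sketch under that assumption follows;
BatchRetry and withRetry are hypothetical names, not part of Nutch or SolrJ,
and the failure is only simulated here:

```java
import java.io.IOException;
import java.net.SocketException;
import java.util.concurrent.Callable;

/**
 * Hypothetical helper (not part of Nutch or SolrJ). Each attempt invokes the
 * Callable again, so the caller builds and sends a brand-new request every
 * time instead of asking the HTTP client to replay a streamed body.
 */
public class BatchRetry {

    /**
     * Runs attempt.call() up to maxAttempts times. Only exceptions that are
     * instances of retryOn trigger another attempt; anything else propagates.
     */
    public static <T> T withRetry(int maxAttempts, long backoffMillis,
            Class<? extends Exception> retryOn, Callable<T> attempt) throws Exception {
        if (maxAttempts < 1) {
            throw new IllegalArgumentException("maxAttempts must be >= 1");
        }
        Exception last = null;
        for (int i = 1; i <= maxAttempts; i++) {
            try {
                return attempt.call();
            } catch (Exception e) {
                if (!retryOn.isInstance(e)) {
                    throw e; // not a retryable failure: give up immediately
                }
                last = e;
                if (i < maxAttempts) {
                    Thread.sleep(backoffMillis * i); // linear backoff between attempts
                }
            }
        }
        throw last; // every attempt failed with a retryable exception
    }

    public static void main(String[] args) throws Exception {
        final int[] calls = {0};
        String result = withRetry(3, 10, IOException.class, new Callable<String>() {
            @Override
            public String call() throws IOException {
                calls[0]++;
                // Simulated failure: only the first attempt sees a connection reset.
                if (calls[0] == 1) {
                    throw new SocketException("Connection reset");
                }
                return "added 158 documents";
            }
        });
        System.out.println(result + " after " + calls[0] + " attempts");
        // prints "added 158 documents after 2 attempts"
    }
}
```

In a patched SolrWriter, the Callable would presumably wrap the existing
solr.add(...) call and retryOn would be SolrServerException.class, since SolrJ
wraps the I/O failure in that type (as in the trace). Re-sending a whole batch
is only harmless if a partially applied batch does no damage; with a
uniqueKey in the schema, Solr overwrites documents that have the same key, so
duplicates should not accumulate.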
I found a similar issue reported against 1.x, but it was dropped as not
reproducible. However, I can reproduce this fairly consistently:
https://issues.apache.org/jira/browse/NUTCH-1348
--
View this message in context:
http://lucene.472066.n3.nabble.com/SolrIndexerJob-connection-reset-job-failed-tp4084373.html
Sent from the Nutch - User mailing list archive at Nabble.com.