I know this might be more of a Solr question, but I bet some of you know the 
answer.

I've been using Nutch 1.12 + Solr 5.4.1 successfully for several weeks, but 
suddenly I am having frequent problems. My recent changes were (1) indexing 
two segments at a time instead of one, and (2) indexing larger segments than 
before.

The segments are still not terribly large, just 24,000 documents each, for a 
total of 48,000 in the two-segment job.

Here is the exception I get:
17/05/01 07:29:34 INFO mapreduce.Job:  map 100% reduce 67%
17/05/01 07:29:42 INFO mapreduce.Job: Task Id : attempt_1491521848897_3507_r_000000_2, Status : FAILED
Error: org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at http://coderox.xxx.com:8984/solr/popular: Exception writing document id http://0-0.ooo/ to the index; possible analysis error.
        at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:575)
        at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:241)
        at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:230)
        at org.apache.solr.client.solrj.SolrClient.request(SolrClient.java:1220)
        at org.apache.nutch.indexwriter.solr.SolrIndexWriter.push(SolrIndexWriter.java:209)
        at org.apache.nutch.indexwriter.solr.SolrIndexWriter.write(SolrIndexWriter.java:173)
        at org.apache.nutch.indexer.IndexWriters.write(IndexWriters.java:85)
        at org.apache.nutch.indexer.IndexerOutputFormat$1.write(IndexerOutputFormat.java:50)
        at org.apache.nutch.indexer.IndexerOutputFormat$1.write(IndexerOutputFormat.java:41)
        at org.apache.hadoop.mapred.ReduceTask$OldTrackingRecordWriter.write(ReduceTask.java:493)
        at org.apache.hadoop.mapred.ReduceTask$3.collect(ReduceTask.java:422)
        at org.apache.nutch.indexer.IndexerMapReduce.reduce(IndexerMapReduce.java:367)
        at org.apache.nutch.indexer.IndexerMapReduce.reduce(IndexerMapReduce.java:56)
        at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:444)
        at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
        at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:415)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
        at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)


Of course, the document URL is different each time.

It looks to me like it's complaining about an individual document. This is 
surprising because it didn't happen at all for the first two million documents 
I indexed.
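
To get more detail than the truncated client-side message, my plan is to 
re-post just one failing document with SolrJ outside of Nutch, so the Solr 
server log shows the full analysis exception. Rough untested sketch against 
SolrJ 5.4 (the host is mine; the non-id field names are placeholders I'd 
replace with whatever the failing document actually contains):

import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

import java.io.IOException;

public class DebugOneDoc {
    public static void main(String[] args) throws SolrServerException, IOException {
        // Same core the indexing job writes to.
        HttpSolrClient client = new HttpSolrClient("http://coderox.xxx.com:8984/solr/popular");
        try {
            // Rebuild the failing document field by field; adding fields one
            // at a time should narrow down which field's analysis chain blows up.
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", "http://0-0.ooo/");
            doc.addField("title", "placeholder title");     // guessed field
            doc.addField("content", "placeholder content"); // guessed field
            client.add(doc);
            client.commit();
        } finally {
            client.close();
        }
    }
}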

Have you any suggestions on how to debug this? Or how to make it ignore 
occasional single-document errors instead of failing the whole reduce task?
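
For the second part, the only idea I've had so far is patching 
SolrIndexWriter.push (the frame in the trace above) to fall back to 
per-document adds when a batch is rejected, and skip the offender. Untested 
sketch against SolrJ 5.4; the class and method below are my own stand-ins, 
not actual Nutch code:

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.common.SolrException;
import org.apache.solr.common.SolrInputDocument;

import java.io.IOException;
import java.util.Collection;

public class TolerantPush {

    // RemoteSolrException from the trace is a SolrException subclass, so a
    // single catch covers the "possible analysis error" case. Checked
    // exceptions (connection problems etc.) still fail the task, which
    // seems right.
    static void push(SolrClient client, Collection<SolrInputDocument> batch)
            throws SolrServerException, IOException {
        try {
            client.add(batch);
        } catch (SolrException batchError) {
            // Batch rejected: retry one document at a time and drop the
            // ones Solr refuses instead of killing the whole reducer.
            for (SolrInputDocument doc : batch) {
                try {
                    client.add(doc);
                } catch (SolrException docError) {
                    System.err.println("Skipping " + doc.getFieldValue("id")
                            + ": " + docError.getMessage());
                }
            }
        }
    }
}

But I'd rather not maintain a local patch, so if there's a config option 
I've missed, please point me at it.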
