Hello - this means you have a broken analyzer: one of the token filters or 
char filters in your chain is misbehaving. It is usually a startOffset ending 
up ahead of an endOffset, which is indeed impossible. Lucene detects this 
proactively and won't allow you to add the erroneous input.
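
To see what Lucene is complaining about, you can run the failing document's 
text through the analyzer outside of Solr and dump every token with its 
offsets. Here is a minimal Java sketch; the field name, the StandardAnalyzer 
stand-in, and the input text are placeholders, substitute the analyzer and 
document from your own setup:

import java.io.IOException;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.tokenattributes.OffsetAttribute;

public class OffsetDump {

  public static void main(String[] args) throws IOException {
    // Placeholder: use the analyzer from your field type, not StandardAnalyzer.
    Analyzer analyzer = new StandardAnalyzer();
    // Placeholder: paste the text of the document that fails to index.
    String text = "text of the failing document";

    try (TokenStream ts = analyzer.tokenStream("content", text)) {
      CharTermAttribute term = ts.addAttribute(CharTermAttribute.class);
      OffsetAttribute offset = ts.addAttribute(OffsetAttribute.class);
      ts.reset();
      int lastStart = -1;
      while (ts.incrementToken()) {
        int start = offset.startOffset();
        int end = offset.endOffset();
        System.out.printf("%-20s start=%d end=%d%n", term, start, end);
        // The index writer rejects tokens whose startOffset is greater than
        // the endOffset or goes backwards relative to the previous token.
        if (start > end || start < lastStart) {
          System.out.println("  ^^ broken offsets here");
        }
        lastStart = start;
      }
      ts.end();
    }
  }
}

Note that a really broken filter often throws an IllegalArgumentException 
from setOffset() before you even get to print anything, and in that case the 
stack trace itself names the filter.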

Fix your analyzer and prepare for a tedious job, even if you're familiar with 
token filter implementations. The steps are to narrow it down first to the 
offending document in the segment, then to the offending filter in the chain; 
one way to do the latter is sketched below. Good luck!
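
Assuming you can use Lucene's CustomAnalyzer (it ships with the Lucene 5.4 
that Solr 5.4.1 is built on), you can rebuild the field's chain outside Solr 
one filter at a time and rerun the dump above after each addition. The 
tokenizer and filter names here are only examples; mirror the ones from your 
fieldType:

import java.io.IOException;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.custom.CustomAnalyzer;

public class ChainBisect {

  public static void main(String[] args) throws IOException {
    // Example chain only: mirror the tokenizer and filters from your schema,
    // re-enabling them one at a time until the offsets break.
    Analyzer analyzer = CustomAnalyzer.builder()
        .withTokenizer("standard")
        .addTokenFilter("lowercase")
        // .addTokenFilter("wordDelimiter")  // suspects go back in one by one
        // .addTokenFilter("stop")
        .build();

    // Feed the failing document through 'analyzer' exactly as in the
    // offset dump above to see which added filter corrupts the offsets.
  }
}

The Analysis screen in the Solr admin UI does much the same thing 
interactively, if you'd rather not write code.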

Regards,
Markus
 
-----Original message-----
> From: Michael Coffey <[email protected]>
> Sent: Tuesday 2nd May 2017 1:45
> To: User <[email protected]>
> Subject: indexer "possible analysis error"
> 
> I know this might be more of a SOLR question, but I bet some of you know the 
> answer.
> 
> I've been using Nutch 1.12 + Solr 5.4.1 successfully for several weeks, but 
> suddenly I am having frequent problems. My recent changes have been (1) 
> indexing two segments at a time, instead of one, and (2) indexing larger 
> segments than before.
> 
> The segments are still not terribly large, just 24,000 documents each, for a 
> total of 48,000 in the two-segment job.
> 
> Here is the exception I get:
> 17/05/01 07:29:34 INFO mapreduce.Job:  map 100% reduce 67%
> 17/05/01 07:29:42 INFO mapreduce.Job: Task Id : attempt_1491521848897_3507_r_000000_2, Status : FAILED
> Error: org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at http://coderox.xxx.com:8984/solr/popular: Exception writing document id http://0-0.ooo/ to the index; possible analysis error.
> at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:575)
> at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:241)
> at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:230)
> at org.apache.solr.client.solrj.SolrClient.request(SolrClient.java:1220)
> at org.apache.nutch.indexwriter.solr.SolrIndexWriter.push(SolrIndexWriter.java:209)
> at org.apache.nutch.indexwriter.solr.SolrIndexWriter.write(SolrIndexWriter.java:173)
> at org.apache.nutch.indexer.IndexWriters.write(IndexWriters.java:85)
> at org.apache.nutch.indexer.IndexerOutputFormat$1.write(IndexerOutputFormat.java:50)
> at org.apache.nutch.indexer.IndexerOutputFormat$1.write(IndexerOutputFormat.java:41)
> at org.apache.hadoop.mapred.ReduceTask$OldTrackingRecordWriter.write(ReduceTask.java:493)
> at org.apache.hadoop.mapred.ReduceTask$3.collect(ReduceTask.java:422)
> at org.apache.nutch.indexer.IndexerMapReduce.reduce(IndexerMapReduce.java:367)
> at org.apache.nutch.indexer.IndexerMapReduce.reduce(IndexerMapReduce.java:56)
> at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:444)
> at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
> at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
> at java.security.AccessController.doPrivileged(Native Method)
> at javax.security.auth.Subject.doAs(Subject.java:415)
> at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
> at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
> 
> 
> Of course, the document URL is different each time.
> 
> It looks to me like it's complaining about an individual document. This is 
> surprising because it didn't happen at all for the first two million 
> documents I indexed.
> 
> Have you any suggestions on how to debug this? Or how to make it ignore 
> occasional single-document errors without freaking out?
> 
