Thank you, Furkan, for the excellent suggestion. Once I found the Solr logs, it was not hard to discover OutOfMemoryError entries sitting right next to the "possible analysis error" messages. I was able to fix it by boosting the Java heap size (not easy to do under Docker). I do blame Solr for the misleading message, though!
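For anyone else who hits this, the heap bump under Docker amounts to something like the following. Treat it as a sketch rather than my exact setup: it assumes the official docker-solr image, which as far as I can tell passes SOLR_HEAP (or SOLR_JAVA_MEM) through to the stock bin/solr start script, and the container name, port mapping, heap size, and image tag are placeholders for whatever you actually run.

  # recreate the Solr container with a larger JVM heap (all values are examples)
  docker run -d --name solr_popular -p 8984:8983 -e SOLR_HEAP=4g solr:5.4

  # or set the min/max heap explicitly instead
  docker run -d --name solr_popular -p 8984:8983 -e SOLR_JAVA_MEM="-Xms4g -Xmx4g" solr:5.4

The right size obviously depends on how much you index per commit; the real lesson is that the "possible analysis error" reported to the client was actually an out-of-memory condition on the Solr side.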
Hi Michael,

What do you have in your Solr logs?

Kind Regards,
Furkan KAMACI

On Tue, 2 May 2017 at 02:45, Michael Coffey <[email protected]> wrote:

> I know this might be more of a Solr question, but I bet some of you know
> the answer.
>
> I've been using Nutch 1.12 + Solr 5.4.1 successfully for several weeks, but
> suddenly I am having frequent problems. My recent changes have been (1)
> indexing two segments at a time, instead of one, and (2) indexing larger
> segments than before.
>
> The segments are still not terribly large, just 24000 documents each, for a
> total of 48000 in the two-segment job.
>
> Here is the exception I get:
>
> 17/05/01 07:29:34 INFO mapreduce.Job:  map 100% reduce 67%
> 17/05/01 07:29:42 INFO mapreduce.Job: Task Id : attempt_1491521848897_3507_r_000000_2, Status : FAILED
> Error: org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at http://coderox.xxx.com:8984/solr/popular: Exception writing document id http://0-0.ooo/ to the index; possible analysis error.
>   at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:575)
>   at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:241)
>   at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:230)
>   at org.apache.solr.client.solrj.SolrClient.request(SolrClient.java:1220)
>   at org.apache.nutch.indexwriter.solr.SolrIndexWriter.push(SolrIndexWriter.java:209)
>   at org.apache.nutch.indexwriter.solr.SolrIndexWriter.write(SolrIndexWriter.java:173)
>   at org.apache.nutch.indexer.IndexWriters.write(IndexWriters.java:85)
>   at org.apache.nutch.indexer.IndexerOutputFormat$1.write(IndexerOutputFormat.java:50)
>   at org.apache.nutch.indexer.IndexerOutputFormat$1.write(IndexerOutputFormat.java:41)
>   at org.apache.hadoop.mapred.ReduceTask$OldTrackingRecordWriter.write(ReduceTask.java:493)
>   at org.apache.hadoop.mapred.ReduceTask$3.collect(ReduceTask.java:422)
>   at org.apache.nutch.indexer.IndexerMapReduce.reduce(IndexerMapReduce.java:367)
>   at org.apache.nutch.indexer.IndexerMapReduce.reduce(IndexerMapReduce.java:56)
>   at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:444)
>   at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:392)
>   at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:164)
>   at java.security.AccessController.doPrivileged(Native Method)
>   at javax.security.auth.Subject.doAs(Subject.java:415)
>   at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1657)
>   at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:158)
>
> Of course, the document URL is different each time.
>
> It looks to me like it's complaining about an individual document. This is
> surprising because it didn't happen at all for the first two million
> documents I indexed.
>
> Have you any suggestions on how to debug this? Or how to make it ignore
> occasional single-document errors instead of failing the whole job?

