RE: invalid utf8 chars when indexing or cleaning

2017-09-01 Thread Markus Jelsma
Set logging to debug; HttpClient then logs what's being sent over the wire, so 
you can catch the data. It is less tedious than Wireshark. 
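
For Nutch 1.x this can go in conf/log4j.properties; with the HttpClient 4.x that SolrJ uses, the wire-level logger is the `org.apache.http.wire` category (category names assumed for the versions bundled with Nutch 1.12 / Solr 5.4.1):

```properties
# Log the raw bytes HttpClient sends to and receives from Solr.
# Extremely verbose: enable only while hunting the offending document.
log4j.logger.org.apache.http.wire=DEBUG
log4j.logger.org.apache.http.headers=DEBUG
```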

 
 

Re: invalid utf8 chars when indexing or cleaning

2017-08-31 Thread Michael Coffey
It sounds like a good suggestion, but I don't know what you mean by "verify the 
output Nutch generates and inspect it manually." How do I get a look at that 
XML?



RE: invalid utf8 chars when indexing or cleaning

2017-08-31 Thread Markus Jelsma
The bug is identical, but I fixed it! You should verify the output Nutch 
generates and inspect it manually; there should be a 0x at that byte. If it 
really is there, we need to check the fix once more, although I am sure the 
patch works as intended.

Get the XML, pass it through the method and see what it does to the output.
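
The fix described here (like the earlier NUTCH-1016 patch) boils down to dropping code points that XML 1.0 forbids before the document reaches Solr's Woodstox parser. A minimal sketch, assuming the filter follows the XML 1.0 Char production; the class and method names below are illustrative, not the exact ones in the Nutch patch:

```java
// Sketch of the sanitization NUTCH-1016-style patches apply: remove code
// points that are not legal XML 1.0 characters, so Solr's XML parser
// (Woodstox) does not reject the update request.
public class StripInvalidXmlChars {

    // XML 1.0 Char production: #x9 | #xA | #xD | [#x20-#xD7FF]
    //                          | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    static boolean isLegalXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Rebuild the string, keeping only legal XML code points.
    static String stripNonCharCodepoints(String input) {
        StringBuilder out = new StringBuilder(input.length());
        input.codePoints()
             .filter(StripInvalidXmlChars::isLegalXmlChar)
             .forEach(out::appendCodePoint);
        return out.toString();
    }

    public static void main(String[] args) {
        String dirty = "abc\0def"; // embedded NUL, illegal in XML 1.0
        System.out.println(stripNonCharCodepoints(dirty)); // prints: abcdef
    }
}
```

Passing the offending XML through a filter like this, as suggested above, should remove the byte Woodstox complains about.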

 
 


Re: invalid utf8 chars when indexing or cleaning

2017-08-29 Thread Jorge Betancourt
From the logs it looks like the error is coming from the Solr side. Do you 
mind checking/sharing the logs on your Solr server? Can you pinpoint which 
URL is causing the issue?

Best Regards, Jorge


Re: invalid utf8 chars when indexing or cleaning

2017-08-29 Thread Michael Coffey
Does anybody have any thoughts on this? It seems similar to the NUTCH-1016 bug 
that was fixed in version 1.4.
Some more bits of information: the indexer job rarely fails (only 1 of the last 
99 segments) but the cleaning job fails every time now. Once again, this is 
Nutch 1.12 and Solr 5.4.1. I recently upgraded to Hadoop 2.7.4 and Java 1.8 
from Hadoop 2.7.2 and Java 1.7. Could this be some kind of version mismatch?


To: User
Sent: Thursday, August 24, 2017 7:42 PM
Subject: invalid utf8 chars when indexing or cleaning

Lately, I have seen many tasks and jobs fail in Solr when doing nutch index and 
nutch clean.
Messages during indexing look like this.
17/08/24 19:18:59 INFO mapreduce.Job: map 100% reduce 99%
17/08/24 19:19:36 INFO mapreduce.Job: Task Id : attempt_1502929850483_1329_r_07_2, Status : FAILED
Error: org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at http://codero4.neocortix.com:8984/solr/popular: [com.ctc.wstx.exc.WstxLazyException] Invalid UTF-8 character 0x at char #104705, byte #219135)
    at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:575)
    at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:241)
    at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:230)
    at org.apache.solr.client.solrj.SolrClient.request(SolrClient.java:1220)
    at org.apache.nutch.indexwriter.solr.SolrIndexWriter.push(SolrIndexWriter.java:209)
    at org.apache.nutch.indexwriter.solr.SolrIndexWriter.write(SolrIndexWriter.java:173)
    at org.apache.nutch.indexer.IndexWriters.write(IndexWriters.java:85)
    at org.apache.nutch.indexer.IndexerOutputFormat$1.write(IndexerOutputFormat.java:50)
    at org.apache.nutch.indexer.IndexerOutputFormat$1.write(IndexerOutputFormat.java:41)
    at org.apache.hadoop.mapred.ReduceTask$OldTrackingRecordWriter.write(ReduceTask.java:493)

Messages during cleaning look like this.
17/08/22 09:24:01 INFO mapreduce.Job: map 100% reduce 92%
17/08/22 09:25:57 INFO mapreduce.Job: Task Id : attempt_1502929850483_1016_r_03_1, Status : FAILED
Error: org.apache.solr.client.solrj.impl.HttpSolrClient$RemoteSolrException: Error from server at http://codero4.neocortix.com:8984/solr/popular: [com.ctc.wstx.exc.WstxLazyException] Invalid UTF-8 character 0x at char #16099, byte #16383)
    at org.apache.solr.client.solrj.impl.HttpSolrClient.executeMethod(HttpSolrClient.java:575)
    at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:241)
    at org.apache.solr.client.solrj.impl.HttpSolrClient.request(HttpSolrClient.java:230)
    at org.apache.solr.client.solrj.SolrRequest.process(SolrRequest.java:150)
    at org.apache.solr.client.solrj.SolrClient.deleteById(SolrClient.java:825)
    at org.apache.solr.client.solrj.SolrClient.deleteById(SolrClient.java:788)
    at org.apache.solr.client.solrj.SolrClient.deleteById(SolrClient.java:803)
    at org.apache.nutch.indexwriter.solr.SolrIndexWriter.push(SolrIndexWriter.java:222)
    at org.apache.nutch.indexwriter.solr.SolrIndexWriter.commit(SolrIndexWriter.java:187)
    at org.apache.nutch.indexwriter.solr.SolrIndexWriter.close(SolrIndexWriter.java:178)
    at org.apache.nutch.indexer.IndexWriters.close(IndexWriters.java:115)
    at org.apache.nutch.indexer.CleaningJob$DeleterReducer.close(CleaningJob.java:120)
    at org.apache.hadoop.io.IOUtils.cleanup(IOUtils.java:245)
Can anyone suggest a way to fix this? I am using Nutch 1.12 and Solr 5.4.1. I 
recently upgraded to Hadoop 2.7.4 and Java 1.8. I don't remember noticing this 
happening with Hadoop 2.7.2 and Java 1.7. It happens very often now.