RE: invalid utf8 chars when indexing or cleaning

2017-09-01 Thread Markus Jelsma
> Subject: Re: invalid utf8 chars when indexing or cleaning
>
> It sounds like a good suggestion, but I don't know what you mean by "verify the output Nutch generates and inspect it manually." How do I get a look at that XML?

Re: invalid utf8 chars when indexing or cleaning

2017-08-31 Thread Michael Coffey
… the patch works as intended. Get the XML, pass it through the method and see what it does to the output.
-Original message-
> From: Jorge Betancourt <betancourt.jo...@gmail.com>
> Sent: Tuesday 29th August 2017 21:54
> To: user@nutch.apache.org
> Subject: Re: invalid utf…

RE: invalid utf8 chars when indexing or cleaning

2017-08-31 Thread Markus Jelsma
… the method and see what it does to the output.
-Original message-
> From: Jorge Betancourt <betancourt.jo...@gmail.com>
> Sent: Tuesday 29th August 2017 21:54
> To: user@nutch.apache.org
> Subject: Re: invalid utf8 chars when indexing or cleaning
>
> From the l…

Re: invalid utf8 chars when indexing or cleaning

2017-08-29 Thread Jorge Betancourt
… upgraded to Hadoop 2.7.4 and Java 1.8 from Hadoop 2.7.2 and Java 1.7. Could this be some kind of mismatch of versions?
To: User <user@nutch.apache.org>
Sent: Thursday, August 24, 2017 7:42 PM
Subject: invalid utf8 chars when indexing or cleaning
Lately, I have seen many tasks and jobs fail …

Re: invalid utf8 chars when indexing or cleaning

2017-08-29 Thread Michael Coffey
… recently upgraded to Hadoop 2.7.4 and Java 1.8 from Hadoop 2.7.2 and Java 1.7. Could this be some kind of mismatch of versions?
To: User <user@nutch.apache.org>
Sent: Thursday, August 24, 2017 7:42 PM
Subject: invalid utf8 chars when indexing or cleaning
Lately, I have seen many …

invalid utf8 chars when indexing or cleaning

2017-08-24 Thread Michael Coffey
Lately, I have seen many tasks and jobs fail in Solr when doing nutch index and nutch clean. Messages during indexing look like this:
17/08/24 19:18:59 INFO mapreduce.Job:  map 100% reduce 99%
17/08/24 19:19:36 INFO mapreduce.Job: Task Id : attempt_1502929850483_1329_r_07_2, Status : FAILED