https://issues.apache.org/jira/browse/NUTCH-1016

The patch there applies to a Nutch 1.3 checkout.
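
The question quoted below asks whether the offending character could be filtered out before the documents reach Solr. As a rough illustration of that idea only (this is not the code from the JIRA patch, and the class and method names are made up for the example), a field value could be cleaned of Unicode noncharacters such as U+FFFF like this:

```java
// Hypothetical helper, not part of Nutch or Solr: removes Unicode
// noncharacters (U+FDD0..U+FDEF and every code point ending in FFFE or FFFF)
// from a string before it is handed to an indexer.
final class NonCharacterFilter {

    // True for the 66 code points Unicode reserves as noncharacters.
    static boolean isNonCharacter(int codePoint) {
        return (codePoint & 0xFFFE) == 0xFFFE
                || (codePoint >= 0xFDD0 && codePoint <= 0xFDEF);
    }

    // Returns a copy of the input with all noncharacters dropped.
    // Unpaired surrogates are left untouched in this sketch.
    static String stripNonCharacters(String input) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); ) {
            int codePoint = input.codePointAt(i);
            if (!isNonCharacter(codePoint)) {
                out.appendCodePoint(codePoint);
            }
            i += Character.charCount(codePoint);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String dirty = "before" + (char) 0xFFFF + "after";
        System.out.println(stripNonCharacters(dirty));
    }
}
```

Where such a call would belong in the indexing path is an assumption on my part; the issue above is the place to check what the actual patch does.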
> Hello,
>
> I have been trying to index some segments using solrindex with nutch 1.3
> and solr 3.1.
>
> Most of the time the indexing goes well, but sometimes I get an error:
>
> SEVERE: java.lang.RuntimeException: [was class
> java.io.CharConversionException] Invalid UTF-8 character 0xffff at char
> #172317, byte #175887)
>
> There is an interesting thread on the solr lists, but it doesn't really
> address the root issue:
> http://lucene.472066.n3.nabble.com/Solr-3-1-indexing-error-Invalid-UTF-8-character-0xffff-td3113191.html
>
> How can I fix this? Is this something that could be filtered out by the
> solrindex class, or alternately be filtered out with perl or sed? It seems
> like this is a problem with solr, has anyone else experienced this? Is
> this fixed on solr 3.2?
>
> Thanks in advance,
>
> Jason

