Hello,

I have been trying to index some segments using solrindex with Nutch 1.3 and
Solr 3.1.

Most of the time the indexing goes well, but sometimes I get an error:

SEVERE: java.lang.RuntimeException: [was class
java.io.CharConversionException] Invalid UTF-8 character 0xffff at char
#172317, byte #175887)

There is an interesting thread on the Solr list, but it doesn't really
address the root issue:
http://lucene.472066.n3.nabble.com/Solr-3-1-indexing-error-Invalid-UTF-8-character-0xffff-td3113191.html

How can I fix this? Is this something that could be filtered out by the
solrindex class, or alternatively be filtered out with perl or sed? This
seems to be a problem with Solr. Has anyone else experienced it, and is it
fixed in Solr 3.2?
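
To illustrate the kind of filter I have in mind, here is a minimal sketch in
Java. The class and method names (NonCharFilter, strip) are hypothetical and
not part of Nutch or Solr. It assumes the field text is available as a String
before it is sent to Solr, and it simply drops the Unicode noncharacters
U+FFFE, U+FFFF and U+FDD0 through U+FDEF:

```java
public class NonCharFilter {

    // Hypothetical helper (not a Nutch or Solr API): returns a copy of the
    // input with BMP noncharacters removed. All other chars, including
    // surrogate pairs, are copied through unchanged.
    public static String strip(String input) {
        StringBuilder out = new StringBuilder(input.length());
        for (int i = 0; i < input.length(); i++) {
            char ch = input.charAt(i);
            boolean nonChar = ch == 0xFFFE || ch == 0xFFFF
                    || (ch >= 0xFDD0 && ch <= 0xFDEF);
            if (!nonChar) {
                out.append(ch);
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Build a sample string containing the offending character 0xffff.
        String dirty = "abc" + (char) 0xFFFF + "def";
        System.out.println(strip(dirty));
    }
}
```

I have not wired this into solrindex, so where such a call would belong is
still an open question.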

Thanks in advance,

Jason
