I have been crawling a German website with Nutch 1.8 into a Solr 4.8.0 index for a couple of weeks, and until recently the umlauts were indexed correctly.
Now many (but not all) documents in the Solr index contain garbled umlauts. I'm not aware of any changes to the website (which uses UTF-8) or to the Nutch crawler settings. What puzzles me is that the umlauts are correct in some documents and broken in others.

Do you have any hints on where I could start debugging this strange issue?

Cheers,
Peter

