I have been crawling a German website with Nutch 1.8 into a Solr 4.8.0 index
for a couple of weeks, and until now everything was fine with umlauts.

Now many (but not all) documents in the Solr index have garbled umlauts. I'm
not aware of any changes to the website (which uses UTF-8) or to the Nutch
crawler settings. What puzzles me is that the umlauts are correct in some
documents and broken in others.
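
As a first check, something like the following Python sketch might help to
separate the two groups of documents. It assumes (this is unverified) that the
garbling is the common case of UTF-8 bytes being decoded as ISO-8859-1
somewhere along the way; the function names are made up for illustration and
are not part of Nutch or Solr.

```python
# Assumption: "garbled" means UTF-8 bytes were decoded as ISO-8859-1 (Latin-1).
# Both helper names are hypothetical, not part of Nutch or Solr.

def garble(text: str) -> str:
    """Reproduce the suspected damage: encode as UTF-8, decode as Latin-1."""
    return text.encode("utf-8").decode("latin-1")


def looks_garbled(text: str) -> bool:
    """Return True if reversing the suspected damage yields different text."""
    try:
        repaired = text.encode("latin-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        # The text cannot be the result of this particular mis-decoding.
        return False
    return repaired != text
```

Running looks_garbled() over a field value fetched from the index would flag
text such as the output of garble("Müller"), while correctly stored umlauts
and plain ASCII text are not flagged. If the broken documents do not match
this pattern, a different encoding mix-up is probably involved.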

Do you have any hints on where I could start debugging this strange issue?

Cheers
Peter
