On 11 November 2015 at 19:06, Peter Kraume <[email protected]> wrote:
> I have been crawling a German website with Nutch 1.8 into a Solr 4.8.0 index
> for a couple of weeks, and everything was fine with umlauts.
>
> Now I have a lot of (but not all) documents in the Solr index with garbled
> umlauts. I’m not aware of any changes that have been made to the website
> (which uses UTF-8) or to the Nutch crawler settings. What puzzles me is that
> there are documents where the umlauts are correct and others where they
> are broken.
>
> Do you have any hints on where I can start debugging this strange issue?

Are there examples of publicly available web pages where the umlaut is correctly processed, and where it is not?

Regards,
Gora

