Re: umlaut problem

Gora Mohanty Wed, 11 Nov 2015 05:58:28 -0800

On 11 November 2015 at 19:06, Peter Kraume <[email protected]> wrote:
> I have been crawling a German website with Nutch 1.8 into a Solr 4.8.0 index 
> for a couple of weeks, and until recently everything was fine with umlauts.
>
> Now many (but not all) documents in the Solr index have garbled umlauts. 
> I'm not aware of any changes to the website (which uses UTF-8) or to the 
> Nutch crawler settings. What puzzles me is that the umlauts are correct in 
> some documents and broken in others.
>
> Do you have any hints on where I can start debugging this strange issue?


Are there examples of publicly available web pages where the umlaut is
correctly processed, and where it is not?

Regards,
Gora
