J B wrote:
Hello,

Is there anyone who can help me configure Nutch so that I can use it for Swedish or German websites containing non-ASCII characters such as "ö"? Crawling and indexing seem to work fine; it's just the searching that goes wrong. When I enter a search string like "Köln", knowing that it appears in the text, the result page says that there are no matching results, and the "ö" is replaced by random characters...

I have searched the docs and the web, but I can't find the answer to my problem.


The characters are not random - they correspond to a URL-encoding of the UTF-8 encoding of Latin-1 characters, whereas they should be a URL-encoding of the UTF-8 encoding of the original characters.

;-)

For the US-ASCII range both of the above give the same result, but for all other characters the first one gives wrong results.
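
To make the difference concrete, here is a minimal Java sketch of one plausible way the wrong sequence comes about: the UTF-8 bytes of the query are misread as Latin-1 characters and then UTF-8-encoded a second time before URL-encoding. The class and method names (EncodingDemo, urlEncodeUtf8, urlEncodeMisread) are hypothetical and are not part of Nutch; I am only assuming this is the mechanism behind the symptoms described.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of Nutch.
class EncodingDemo {

    // Correct path: URL-encode the UTF-8 bytes of the original characters.
    static String urlEncodeUtf8(String s) {
        try {
            return URLEncoder.encode(s, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            // UTF-8 is required on every Java platform, so this should not happen.
            throw new IllegalStateException(e);
        }
    }

    // Assumed faulty path: the UTF-8 bytes are misread as Latin-1 characters,
    // and the resulting (wrong) string is then URL-encoded as UTF-8.
    static String urlEncodeMisread(String s) {
        byte[] utf8Bytes = s.getBytes(StandardCharsets.UTF_8);
        String misread = new String(utf8Bytes, StandardCharsets.ISO_8859_1);
        return urlEncodeUtf8(misread);
    }

    public static void main(String[] args) {
        String query = "K\u00f6ln"; // "Köln", escaped so the source file encoding does not matter
        System.out.println(urlEncodeUtf8(query));      // K%C3%B6ln
        System.out.println(urlEncodeMisread(query));   // K%C3%83%C2%B6ln
        System.out.println(urlEncodeMisread("Berlin")); // Berlin (pure ASCII is unaffected)
    }
}
```

A query sent as K%C3%83%C2%B6ln cannot match an index that stores "Köln", which would explain the empty result page.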

Please make sure that you set the page encoding to UTF-8 in your JSPs and HTML pages, and preferably set the same encoding as the default character encoding in the configuration of your servlet engine. As the old hands say: "choose UTF-8 and stick to it religiously".
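
As a rough sketch of what this can look like at the top of a JSP (I am assuming a standard JSP/servlet container here, not quoting the pages shipped with Nutch):

```jsp
<%-- Declare UTF-8 for both the JSP source file and the response sent to the browser. --%>
<%@ page contentType="text/html; charset=UTF-8" pageEncoding="UTF-8" %>
<%
    // Decode the request body as UTF-8; this must run before the first getParameter() call.
    request.setCharacterEncoding("UTF-8");
%>
```

Note that setCharacterEncoding only applies to the request body. How the query string of a GET request is decoded depends on the servlet engine and usually has to be set in its own configuration.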

--
Best regards,
Andrzej Bialecki     <><
 ___. ___ ___ ___ _ _   __________________________________
[__ || __|__/|__||\/|  Information Retrieval, Semantic Web
___|||__||  \|  ||  |  Embedded Unix, System Integration
http://www.sigram.com  Contact: info at sigram dot com
