J B wrote:
Hello,
Is there anyone who can help me configure Nutch so that I can use it for
Swedish or German websites containing non-ASCII characters like "ö"?
Crawling and indexing seem to work fine; it's just the searching that
goes wrong. When I enter a search string like "Köln", knowing that it
appears in the text, the results page says that there are no matching
results, and the "ö" is replaced by random characters...
I have searched the docs and the web, but I can't find the answer to my
problem.
The characters are not random: they correspond to a URL-encoding of the
UTF-8 encoding of Latin-1 characters, whereas they should be a
URL-encoding of the UTF-8 encoding of the actual (Unicode) characters.
;-)
For the US-ASCII range both give the same result, but for all other
characters the first one gives wrong results.
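
To make the difference concrete, here is a minimal sketch in plain Java
(not Nutch code). It assumes one plausible reading of the problem: the
UTF-8 bytes of the query are misread as Latin-1 somewhere along the way
and then encoded again. The class and method names are made up for
illustration, and URLEncoder.encode(String, Charset) needs Java 10 or
later.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

class EncodingDemo {
    // Intended behaviour: URL-encode the UTF-8 bytes of the string.
    static String encodeCorrectly(String s) {
        return URLEncoder.encode(s, StandardCharsets.UTF_8);
    }

    // Assumed failure mode: the UTF-8 bytes are misread as Latin-1
    // characters, and that garbled string is then UTF-8 + URL-encoded.
    static String encodeWrongly(String s) {
        byte[] utf8 = s.getBytes(StandardCharsets.UTF_8);
        String misread = new String(utf8, StandardCharsets.ISO_8859_1);
        return URLEncoder.encode(misread, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        String query = "K\u00f6ln"; // "Köln", escaped to avoid source-encoding issues
        System.out.println(encodeCorrectly(query)); // K%C3%B6ln
        System.out.println(encodeWrongly(query));   // K%C3%83%C2%B6ln
        // Pure US-ASCII input comes out identical either way.
        System.out.println(encodeCorrectly("Nutch").equals(encodeWrongly("Nutch"))); // true
    }
}
```

The second output is the kind of "random" sequence that shows up in the
URL: every non-ASCII character is expanded twice.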
Please make sure that you set the page encoding to UTF-8 in your JSPs
and HTML pages, and preferably also set UTF-8 as the default character
encoding somewhere in the configuration of your servlet engine. As the
old hands say: "choose UTF-8 and stick to it religiously".
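
As a rough illustration of the page-level part, a JSP could declare
UTF-8 as sketched below. This is a hypothetical page, not one of the
JSPs shipped with Nutch, and the parameter name "query" is an
assumption.

```jsp
<%-- Hypothetical search page: UTF-8 for the source file and the response --%>
<%@ page contentType="text/html; charset=UTF-8" pageEncoding="UTF-8" %>
<%
    // Must be called before the first request.getParameter() call.
    request.setCharacterEncoding("UTF-8");
    String query = request.getParameter("query"); // assumed parameter name
%>
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
</head>
<body>
</body>
</html>
```

Note that request.setCharacterEncoding() only affects the request body
(e.g. POSTed forms). How the query string of a GET request is decoded is
decided by the servlet engine itself, which is why the engine's own
configuration also needs to be set to UTF-8; the exact setting depends
on the container you use.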
--
Best regards,
Andrzej Bialecki <><
___. ___ ___ ___ _ _ __________________________________
[__ || __|__/|__||\/| Information Retrieval, Semantic Web
___|||__|| \| || | Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com