Re: UTF-8 indexing and searching

2005-07-01 Thread Paul Libbrecht
Careful: in the HTTP world there's an ambiguity. x-www-form-urlencoded does not specify the character encoding of the bytes represented by the %-escaped sequences. That's fixed by the very recent URI spec, where absence means UTF-8... My experience was that Tomcat simply con
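
To illustrate the ambiguity described above, here is a minimal sketch in Python (not the Tomcat code path itself). The escaped value "caf%C3%A9" is a made-up example, chosen because its two escaped bytes form one character in UTF-8 but two characters in ISO-8859-1.

```python
from urllib.parse import unquote

# Hypothetical form value: "café" percent-encoded from its UTF-8 bytes.
escaped = "caf%C3%A9"

# The escapes only carry bytes (0xC3 0xA9); the receiver has to pick a charset.
as_utf8 = unquote(escaped, encoding="utf-8")
as_latin1 = unquote(escaped, encoding="iso-8859-1")

print(as_utf8)    # café
print(as_latin1)  # cafÃ©
```

Both results are valid decodings of the same escaped bytes, which is why sender and receiver have to agree on the charset by some other means.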

Re: UTF-8 indexing and searching

2005-07-01 Thread pierre.conti
Did you check that the request string you get at the analyzer level is correctly encoded as UTF-8? We had the same problem with French accented characters encoded as UTF-8 and transmitted by Tomcat as ISO-8859-1. It was just for a test, so we didn't investigate much, but re-encode in URL/ISO-8
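
One way to check for, and undo, this kind of mis-decoding is sketched below in Python. This is a common workaround and not necessarily what the truncated message goes on to describe; the helper name `repair_latin1_mojibake` and the sample string are hypothetical.

```python
def repair_latin1_mojibake(text: str) -> str:
    """Undo UTF-8 bytes that were wrongly decoded as ISO-8859-1.

    If the round trip fails, the text was probably not mis-decoded,
    so it is returned unchanged.
    """
    try:
        # Recover the original bytes, then decode them as UTF-8.
        return text.encode("iso-8859-1").decode("utf-8")
    except (UnicodeEncodeError, UnicodeDecodeError):
        return text


# "français" sent as UTF-8 but decoded by the server as ISO-8859-1.
garbled = "fran\u00c3\u00a7ais"
repaired = repair_latin1_mojibake(garbled)
print(repaired)  # français
```

The round trip is only a heuristic: a string that happens to be valid in both encodings would be altered, so configuring the container to decode requests as UTF-8 in the first place is the more robust fix.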