On 10/2/07, Maximilian Hütter <[EMAIL PROTECTED]> wrote:
> Are you sure they are wrong in the index?

It's not an issue with Jetty's output encoding, since the Python
response writer converts the string to ASCII before it ever reaches
Jetty.  And since Solr does no charset conversion of its own on
output, the data must be in the index incorrectly.
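To illustrate why Jetty can't be the culprit: a sketch of the kind of
escaping the Python writer does (assumption for illustration: non-ASCII
characters come out as \uXXXX escapes, so the response body is pure
ASCII regardless of the servlet container's charset handling):

```python
# Sketch only: emulate ASCII-escaping of a Japanese string the way a
# Python-format response writer would, using Python's unicode_escape codec.
s = 'テスト'  # "test" in katakana
escaped = s.encode('unicode_escape').decode('ascii')
print(escaped)           # \u30c6\u30b9\u30c8
assert all(ord(c) < 128 for c in escaped)  # nothing left for Jetty to mangle
```

If the escapes you see in the actual response don't match the expected
codepoints, the damage happened before output, i.e. at index time.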

> When I use the Lucene Index
> Monitor (http://limo.sourceforge.net/) to look at the document in the
> index the Japanese is displayed correctly.

I've never really used limo, but it's possible it's misinterpreting
what's in the index (and, by luck, reversing the same bad
transformation that got the data in there incorrectly in the first
place).

Try indexing a document with the Unicode character specified via a
numeric character reference, to rule out input charset issues
entirely.  For example, if a Japanese character has the Unicode value
\u1234, then in the XML doc use &#x1234;
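As a quick local sanity check (using a hypothetical `title` field and
the katakana character U+30C6 as the example), you can verify that an
XML parser expands the character reference to the intended codepoint
no matter what encoding the file itself uses:

```python
import xml.etree.ElementTree as ET

# A Solr-style add document using a numeric character reference for
# U+30C6 instead of a raw multibyte character, so the file's charset
# can't corrupt the value on the way in.  Field name is hypothetical.
doc = '<add><doc><field name="title">&#x30C6;</field></doc></add>'

field = ET.fromstring(doc).find('doc/field')
assert field.text == '\u30C6'  # parser expanded the reference correctly
print(repr(field.text))
```

If the character still comes back wrong from Solr after indexing it
this way, the problem is on the indexing/analysis side rather than in
how your input file was encoded.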

-Yonik
