On 10/24/07, Ian Holsman <[EMAIL PROTECTED]> wrote:
> Hi.
>
> I'm in the middle of bringing up a new solr server and am using the
> trunk. (where I was using an earlier nightly release of about 2-3 weeks
> ago on my old server)
>
> now, when I do a search for "日本" (japan) it used to show the kanji in
> the <q> area, but now it shows gibberish instead "日本"
>
>
> any hints on where I should start investigating on why this is happening?

My standard answer is to use the python writer (wt=python) to see what
the actual unicode values are when debugging an issue like this.

When I try your URL with the example server from the solr trunk, I get
        'q':u'\u65e5\u672c',
And when I try your server, I get
        'q':u'\u00e6\u0097\u00a5\u00e6\u009c\u00ac',

So the answer is that your app-server isn't correctly handling UTF-8
encoded URLs.
I see you are using Tomcat... see
http://wiki.apache.org/solr/SolrTomcat
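
The two wt=python values above are related in a simple way: the second
is the UTF-8 byte sequence of the first, with each byte read as a
separate single-byte character. A minimal Python 3 sketch of that check
follows. The variable names are illustrative only, and Latin-1 is
assumed as the wrong decoding because it maps every byte to the code
point with the same value.

```python
# Sketch: reproduce the mis-decoding seen in the two wt=python outputs.
expected = u'\u65e5\u672c'  # value returned by the trunk example server
observed = u'\u00e6\u0097\u00a5\u00e6\u009c\u00ac'  # value returned by the other server

# Encode the correct string to UTF-8, then decode it byte-by-byte as Latin-1
# (assumed here as the wrong charset; it maps each byte to one code point).
utf8_bytes = expected.encode('utf-8')
misdecoded = utf8_bytes.decode('latin-1')
print(misdecoded == observed)

# Undoing the mistake recovers the original query string.
recovered = observed.encode('latin-1').decode('utf-8')
print(recovered == expected)
```

If both comparisons print True, the query reached the server as valid
UTF-8 and was decoded with the wrong charset on the way in.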

------------------------------------------------
URI Charset Config

If you are going to query Solr using international characters (>127)
using HTTP-GET, you must configure Tomcat to conform to the URI
standard by accepting percent-encoded UTF-8.

Edit Tomcat's conf/server.xml and add the following attribute to the
correct Connector element: URIEncoding="UTF-8".

<Server ...>
 <Service ...>
   <Connector ... URIEncoding="UTF-8">
     ...
   </Connector>
 </Service>
</Server>

This is only an issue when sending non-ascii characters in a query
request... no configuration is needed for Solr/Tomcat to return
non-ascii chars in a response, or accept non-ascii chars in an
HTTP-POST body.
------------------------------------------------
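
For reference, here is a rough Python 3 sketch of what "percent-encoded
UTF-8" looks like on the client side of an HTTP-GET, and what happens
when the receiving end decodes it with a different charset. The base URL
is an assumption (the example server's usual address), and Latin-1 only
stands in for a server that has not been told to use UTF-8.

```python
from urllib.parse import urlencode, unquote

# Hypothetical base URL; adjust to wherever your Solr instance lives.
base_url = 'http://localhost:8983/solr/select'

# urlencode() percent-encodes str values as UTF-8 by default.
query = urlencode({'q': '\u65e5\u672c', 'wt': 'python'})
print(base_url + '?' + query)

# Take the encoded q value and decode it two ways.
encoded_q = query.split('&')[0].split('=', 1)[1]
as_utf8 = unquote(encoded_q, encoding='utf-8')      # what a UTF-8 aware server sees
as_latin1 = unquote(encoded_q, encoding='latin-1')  # one character per byte
print(len(as_utf8), len(as_latin1))
```

The UTF-8 decoding yields the two original characters, while the Latin-1
decoding yields six, one for each byte of the encoded query.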

-Yonik
