On 10/24/07, Ian Holsman <[EMAIL PROTECTED]> wrote:
> Hi.
>
> I'm in the middle of bringing up a new solr server and am using the
> trunk. (On my old server I was using an earlier nightly release from
> about 2-3 weeks ago.)
>
> Now, when I do a search for "日本" (Japan), it used to show the kanji in
> the <q> area, but now it shows gibberish instead of "日本".
>
> Any hints on where I should start investigating why this is happening?
My standard answer is to use the python writer (wt=python) to see what
the actual Unicode values are when debugging an issue like this.

When I try your URL with the example server from the solr trunk, I get
  'q':u'\u65e5\u672c',

And when I try your server, I get
  'q':u'\u00e6\u0097\u00a5\u00e6\u009c\u00ac',

That second value is the six UTF-8 bytes of 日本 (E6 97 A5 E6 9C AC),
each read as a separate ISO-8859-1 character. So the answer is that your
app-server isn't correctly handling UTF-8 encoded URLs. I see you are
using Tomcat... see http://wiki.apache.org/solr/SolrTomcat

------------------------------------------------
URI Charset Config

If you are going to query Solr using international characters (>127)
using HTTP-GET, you must configure Tomcat to conform to the URI standard
by accepting percent-encoded UTF-8.

Edit Tomcat's conf/server.xml and add the following attribute to the
correct Connector element: URIEncoding="UTF-8".

  <Server ...>
    <Service ...>
      <Connector ... URIEncoding="UTF-8">
        ...
      </Connector>
    </Service>
  </Server>

This is only an issue when sending non-ASCII characters in a query
request... no configuration is needed for Solr/Tomcat to return non-ASCII
chars in a response, or to accept non-ASCII chars in an HTTP-POST body.
------------------------------------------------

-Yonik
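A small sketch of the failure mode diagnosed above, assuming (as the wt=python output suggests) that the misconfigured connector decodes percent-encoded query bytes as ISO-8859-1 instead of UTF-8. It uses only Python's standard urllib.parse; the helper name as_python_escapes is hypothetical, written just to print code points in the same \uXXXX form the python writer shows.

```python
from urllib.parse import quote, unquote

query = "日本"  # "Japan", the query from the original report

# A client percent-encodes the query string as UTF-8 bytes.
encoded = quote(query)  # "%E6%97%A5%E6%9C%AC"

# Correctly configured server (URIEncoding="UTF-8"): bytes decoded as UTF-8.
good = unquote(encoded, encoding="utf-8")

# Misconfigured server (assumed ISO-8859-1 fallback): one character per byte.
bad = unquote(encoded, encoding="latin-1")


def as_python_escapes(s: str) -> str:
    """Hypothetical helper: render each code point as \\uXXXX, like wt=python."""
    return "".join(f"\\u{ord(c):04x}" for c in s)


print(encoded)                  # %E6%97%A5%E6%9C%AC
print(as_python_escapes(good))  # \u65e5\u672c
print(as_python_escapes(bad))   # \u00e6\u0097\u00a5\u00e6\u009c\u00ac

# The damage is reversible: re-encode as Latin-1 and decode as UTF-8.
print(bad.encode("latin-1").decode("utf-8") == query)  # True
```

The two escape strings match the two server responses quoted above, which is why the fix belongs in the Tomcat connector configuration rather than in Solr itself.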