The query looks fine, at least as far as the URL being UTF-8 encoded. It seems that the documents are not being sent to Solr as UTF-8. The document is not part of the URL; it is sent as HTTP POST data.

Try an explicit curl command to add a document and see if it is indexed with the accents.
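
If curl is not handy, the same check can be sketched in Python. This is only a sketch under assumptions: the /update handler path, the "id" field, and the document ID "accent-test-1" are illustrative and not confirmed anywhere in this thread, so adjust them to the actual schema. The point is that the POST body is encoded as UTF-8 and the Content-Type header says so.

```python
import urllib.request
from xml.sax.saxutils import escape


def build_add_request(base_url, doc_id, content):
    """Build (but do not send) a Solr XML add request with a UTF-8 body."""
    # Field names "id" and "content" are assumptions about the schema.
    xml = (
        '<add><doc>'
        '<field name="id">%s</field>'
        '<field name="content">%s</field>'
        '</doc></add>' % (escape(doc_id), escape(content))
    )
    # Encode explicitly so the accented characters go out as UTF-8 bytes.
    body = xml.encode("utf-8")
    return urllib.request.Request(
        base_url + "/update?commit=true",  # assumed update handler path
        data=body,
        headers={"Content-Type": "text/xml; charset=utf-8"},
    )


req = build_add_request(
    "http://localhost:8983/solr/coreFR", "accent-test-1", "pr\u00e9sente"
)
# To actually send it to a running Solr instance:
# urllib.request.urlopen(req)
```

After sending, query for the test document and check whether the stored value still has the accent.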

-- Jack Krupansky

-----Original Message-----
From: couto.vicente
Sent: Monday, May 28, 2012 9:58 AM
To: solr-user@lucene.apache.org
Subject: Re: Accent Characters

Hi, Jack.
First of all thank you for your help.
I tried again and realized that my problem is not really with Solr.
I ran this query against Solr after starting it with the command "java
-jar start.jar":
http://localhost:8983/solr/coreFR/spell?q=content:pr%C3%A9senta&spellcheck=true&spellcheck.collate=true&rows=0&spellcheck.count=10

It gives me the result:
<?xml version="1.0" encoding="UTF-8" ?>
<response>
<lst name="responseHeader">
 <int name="status">0</int>
 <int name="QTime">31</int>
 </lst>
 <result name="response" numFound="0" start="0" />
<lst name="spellcheck">
<lst name="suggestions">
<lst name="présenta">
 <int name="numFound">10</int>
 <int name="startOffset">8</int>
 <int name="endOffset">16</int>
<arr name="suggestion">
 <str>présente</str>
 <str>présent</str>
 <str>présenté</str>
 <str>présents</str>
 <str>présentant</str>
 <str>présentera</str>
 <str>présentait</str>
 <str>présentes</str>
 <str>présenter</str>
 <str>présentée</str>
 </arr>
 </lst>
 <str name="collation">content:présente</str>
 </lst>
 </lst>
</response>

Then I ran exactly the same query after deploying solr.war in Tomcat 7. Here
is the result:
<?xml version="1.0" encoding="UTF-8" ?>
<response>
<lst name="responseHeader">
 <int name="status">0</int>
 <int name="QTime">16</int>
 </lst>
 <result name="response" numFound="0" start="0" />
<lst name="spellcheck">
<lst name="suggestions">
<lst name="présenta">
 <int name="numFound">10</int>
 <int name="startOffset">8</int>
 <int name="endOffset">16</int>
<arr name="suggestion">
 <str>present</str>
 <str>prbsent</str>
 <str>presentant</str>
 <str>presentait</str>
 <str>puisent</str>
 <str>pasent</str>
 <str>pensent</str>
 <str>posent</str>
 <str>dresent</str>
 <str>resenti</str>
 </arr>
 </lst>
 <str name="collation">content:present</str>
 </lst>
 </lst>
</response>

Since my application runs under Tomcat, this means I have some issue
with Tomcat. The odd thing is that I already googled for a fix and found
that we have to set a parameter in Tomcat's server.xml config file:

<Connector port="5443" protocol="HTTP/1.1"
              connectionTimeout="20000"
              redirectPort="8443"
              URIEncoding="UTF-8" />

But it's not working, as you can see.
I feel a little stupid, because it doesn't look like a big problem.
Surely people around the world are running Solr under Tomcat with
accented queries without trouble!
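
To illustrate what the URIEncoding setting controls, here is a small Python sketch (not Tomcat code) that decodes the %C3%A9 from the query above first as UTF-8 and then as ISO-8859-1, which, as far as I know, is what a Tomcat 7 connector uses when URIEncoding is not set. The variable names are purely illustrative.

```python
from urllib.parse import quote, unquote_to_bytes

term = "pr\u00e9senta"  # the query term, with e-acute

# Percent-encode the term as UTF-8, as in the query URL above.
encoded = quote(term, encoding="utf-8")  # 'pr%C3%A9senta'

# The server sees raw bytes and must pick a charset to decode them.
raw = unquote_to_bytes(encoded)

as_utf8 = raw.decode("utf-8")         # correct: the original term
as_latin1 = raw.decode("iso-8859-1")  # wrong: 'Ã©' replaces the accent

print(encoded)
print(as_utf8)
print(as_latin1)
```

If the URL were being mis-decoded, the term echoed back in the response would be expected to look like the last form rather than the original.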

Thank you

--
View this message in context: http://lucene.472066.n3.nabble.com/Accent-Characters-tp3985931p3986423.html
Sent from the Solr - User mailing list archive at Nabble.com.
