The query seems fine, at least as far as the URL being UTF-8. It looks like
the documents are not being passed to Solr with UTF-8 encoding. The document
is not part of the URL; it is HTTP POST data.
Try an explicit curl command to add a document and see if it is indexed with
the accents.
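If curl is not handy, the same check can be scripted. Below is a minimal Python sketch, not a definitive recipe: it builds (but does not send) an XML add request whose body is explicitly encoded as UTF-8. The update URL, the document id "accent-test-1", and the "id" field are assumptions for illustration; "coreFR" and the "content" field are taken from the query in this thread.

```python
import urllib.request
from xml.sax.saxutils import escape


def build_add_request(update_url, doc_id, content):
    """Build (but do not send) a Solr XML <add> request with a UTF-8 body."""
    xml = (
        "<add><doc>"
        f'<field name="id">{escape(doc_id)}</field>'          # "id" field is assumed
        f'<field name="content">{escape(content)}</field>'    # field name from the query
        "</doc></add>"
    )
    body = xml.encode("utf-8")  # explicit encoding: é becomes the two bytes C3 A9
    return urllib.request.Request(
        update_url,
        data=body,
        # Declare the charset so the server does not have to guess it.
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )


# Hypothetical update URL and document id, for illustration only.
req = build_add_request(
    "http://localhost:8983/solr/coreFR/update", "accent-test-1", "présente"
)
print(req.get_method(), req.full_url)
print(req.data)
```

To actually index the document, pass `req` to `urllib.request.urlopen` and follow it with a commit; that step is left out so the sketch has no side effects. If the accents survive this path but not the application's, the application's POST encoding is the likely culprit.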
-- Jack Krupansky
-----Original Message-----
From: couto.vicente
Sent: Monday, May 28, 2012 9:58 AM
To: solr-user@lucene.apache.org
Subject: Re: Accent Characters
Hi, Jack.
First of all thank you for your help.
Well, I tried again and realized that my problem is not really with Solr.
I ran this query against Solr after starting it up with the command "java
-jar start.jar":
http://localhost:8983/solr/coreFR/spell?q=content:pr%C3%A9senta&spellcheck=true&spellcheck.collate=true&rows=0&spellcheck.count=10
It gives me the result:
<?xml version="1.0" encoding="UTF-8" ?>
<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">31</int>
</lst>
<result name="response" numFound="0" start="0" />
<lst name="spellcheck">
<lst name="suggestions">
<lst name="présenta">
<int name="numFound">10</int>
<int name="startOffset">8</int>
<int name="endOffset">16</int>
<arr name="suggestion">
<str>présente</str>
<str>présent</str>
<str>présenté</str>
<str>présents</str>
<str>présentant</str>
<str>présentera</str>
<str>présentait</str>
<str>présentes</str>
<str>présenter</str>
<str>présentée</str>
</arr>
</lst>
<str name="collation">content:présente</str>
</lst>
</lst>
</response>
I then ran exactly the same query after deploying solr.war in Tomcat 7. Here
is my result:
<?xml version="1.0" encoding="UTF-8" ?>
<response>
<lst name="responseHeader">
<int name="status">0</int>
<int name="QTime">16</int>
</lst>
<result name="response" numFound="0" start="0" />
<lst name="spellcheck">
<lst name="suggestions">
<lst name="présenta">
<int name="numFound">10</int>
<int name="startOffset">8</int>
<int name="endOffset">16</int>
<arr name="suggestion">
<str>present</str>
<str>prbsent</str>
<str>presentant</str>
<str>presentait</str>
<str>puisent</str>
<str>pasent</str>
<str>pensent</str>
<str>posent</str>
<str>dresent</str>
<str>resenti</str>
</arr>
</lst>
<str name="collation">content:present</str>
</lst>
</lst>
</response>
Since my application runs under Tomcat, this suggests the issue is with
Tomcat. The weird thing is that I already googled for a fix and found that
we have to set a parameter in Tomcat's server.xml config file:
<Connector port="5443" protocol="HTTP/1.1"
connectionTimeout="20000"
redirectPort="8443"
URIEncoding="UTF-8" />
But it's not working, as you "can see".
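For reference, here is a small Python sketch of what URIEncoding governs: how the percent-encoded bytes in the query string are turned back into characters. It decodes the q parameter from the URL above both as UTF-8 and as ISO-8859-1 (the default in Tomcat 7 when URIEncoding is not set) and computes where the term would start and end. The helper term_offsets is hypothetical, and the comparison assumes that Solr's startOffset and endOffset are character positions within q.

```python
from urllib.parse import unquote

# The q parameter exactly as it appears in the request URL.
raw_q = "content:pr%C3%A9senta"

# The same bytes decoded with the two charsets a container might use.
as_utf8 = unquote(raw_q, encoding="utf-8")
as_latin1 = unquote(raw_q, encoding="iso-8859-1")


def term_offsets(q):
    """Return (start, end) character offsets of the term after 'field:'."""
    start = q.index(":") + 1
    return start, len(q)


print(as_utf8, term_offsets(as_utf8))      # content:présenta (8, 16)
print(as_latin1, term_offsets(as_latin1))  # content:prÃ©senta (8, 17)
```

The Tomcat response above reports the term as présenta with offsets 8 and 16, which matches the UTF-8 decoding. Under that assumption the query itself seems to arrive intact, which would point toward the indexed content rather than the URL decoding.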
I'm feeling a little stupid because it doesn't look like a big problem.
Surely people around the world are running Solr under Tomcat with accented
queries properly!
Thank you