[ https://issues.apache.org/jira/browse/SOLR-443?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12606615#action_12606615 ]
Lars Kotthoff commented on SOLR-443:
------------------------------------
{quote}
Did you try setting URIEncoding="UTF-8" on the connector?
Without that, you can't even correctly do a query that contains international
chars.
{quote}
Yes. A lot of the queries I issue are in Japanese ;)
I should add that I'm using the Debian flavour of Tomcat; the exact version is
5.5.26-3. I don't know whether this version is patched in a way that affects
this, but the Tomcat documentation
([http://tomcat.apache.org/tomcat-5.5-doc/config/http.html]) specifically says
that this setting is used to decode the *URL*. That may or may not be
intentional, but I'm pretty sure that the behaviour you're seeing is
"accidental".
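To make the URIEncoding point concrete, here is a minimal Java sketch (assuming
Java 10 or later for the `Charset` overloads of `URLEncoder`/`URLDecoder`; the
class name `QueryDecodingDemo` is hypothetical). It percent-encodes a Japanese
query term as UTF-8, the way a client would put it in the URL, then decodes it
once with UTF-8 and once with ISO-8859-1. The ISO-8859-1 result is roughly what
a connector without URIEncoding="UTF-8" would hand to Solr: six Latin-1
characters instead of two Japanese ones.

```java
import java.net.URLDecoder;
import java.net.URLEncoder;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class QueryDecodingDemo {

    // Percent-encode the query the way a UTF-8 client would send it.
    static String encodeUtf8(String query) {
        return URLEncoder.encode(query, StandardCharsets.UTF_8);
    }

    // Decode with whatever charset the servlet container assumes.
    static String decodeWith(String encoded, Charset charset) {
        return URLDecoder.decode(encoded, charset);
    }

    public static void main(String[] args) {
        String query = "\u6771\u4eac"; // "Tokyo" in Japanese, written as escapes
        String onTheWire = encodeUtf8(query);
        System.out.println(onTheWire); // %E6%9D%B1%E4%BA%AC

        // Correct: the bytes are interpreted as UTF-8, giving back the two characters.
        System.out.println(decodeWith(onTheWire, StandardCharsets.UTF_8));

        // Wrong charset: each of the six UTF-8 bytes becomes one Latin-1 character.
        System.out.println(decodeWith(onTheWire, StandardCharsets.ISO_8859_1));
    }
}
```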
As for the NPE, it occurs when the response to a facet-count request contains a
facet value that wasn't in the request. I think it only needs to be handled more
gracefully insofar as it should give a more meaningful error message. But
there's no need for that if the underlying issue is fixed :)
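A sketch of the "more meaningful error message" idea, in Java. Everything here
is hypothetical (`FacetCountMerge`, `mergeCounts`, and the map layout) and is
not Solr's actual facet code: a count returned for a value that was never
requested is reported with an explicit exception, instead of surfacing as an NPE
when the lookup for that value returns `null`.

```java
import java.util.HashMap;
import java.util.Map;

public class FacetCountMerge {

    // Hypothetical merge step: start every requested facet value at zero,
    // then add the counts that came back in the response.
    static Map<String, Long> mergeCounts(Iterable<String> requested,
                                         Map<String, Long> returned) {
        Map<String, Long> counts = new HashMap<>();
        for (String value : requested) {
            counts.put(value, 0L);
        }
        for (Map.Entry<String, Long> e : returned.entrySet()) {
            Long current = counts.get(e.getKey());
            if (current == null) {
                // Unboxing 'current' here would throw a bare NPE; fail with context instead.
                throw new IllegalStateException("Facet count returned for value '"
                        + e.getKey() + "' that was not in the request"
                        + " (it may have been decoded with a different charset)");
            }
            counts.put(e.getKey(), current + e.getValue());
        }
        return counts;
    }
}
```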
> POST queries don't declare its charset
> --------------------------------------
>
> Key: SOLR-443
> URL: https://issues.apache.org/jira/browse/SOLR-443
> Project: Solr
> Issue Type: Bug
> Components: clients - java
> Affects Versions: 1.2
> Environment: Tomcat 6.0.14
> Reporter: Andrew Schurman
> Priority: Minor
> Attachments: SOLR-443-multipart.patch, solr-443.patch,
> solr-443.patch, SolrDispatchFilter.patch
>
>
> When sending a query via POST, the content type is not set. The content
> charset for the POST parameters is set, but this only appears to be used for
> creating the Content-Length header in the commons library. Since a query is
> encoded in UTF-8, the HTTP headers should also declare that charset in the
> Content-Type. Without it, Tomcat has problems when the query string contains
> non-ASCII characters (characters with accents and such), as it tries to parse
> the POST body in its default ISO-8859-1. There appears to be no way to
> set/change the default encoding for a message body on Tomcat.
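A minimal sketch of the fix the issue asks for, assuming Java 10 or later and
using `java.net.HttpURLConnection` rather than the Commons HttpClient library
the Solr Java client actually uses. The parameters are percent-encoded as UTF-8
and the `Content-Type` header declares `charset=UTF-8`, so the servlet container
should no longer fall back to ISO-8859-1 when parsing the body. `FormPost`,
`encodeBody` and `post` are hypothetical names.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.StringJoiner;

public class FormPost {

    // The charset parameter is the part the issue says is missing.
    static final String CONTENT_TYPE = "application/x-www-form-urlencoded; charset=UTF-8";

    // Build "k1=v1&k2=v2" with keys and values percent-encoded as UTF-8.
    static String encodeBody(Map<String, String> params) {
        StringJoiner body = new StringJoiner("&");
        for (Map.Entry<String, String> e : params.entrySet()) {
            body.add(URLEncoder.encode(e.getKey(), StandardCharsets.UTF_8) + "="
                    + URLEncoder.encode(e.getValue(), StandardCharsets.UTF_8));
        }
        return body.toString();
    }

    // Send the form as a POST with an explicit charset; returns the HTTP status code.
    static int post(URL url, Map<String, String> params) throws IOException {
        // After percent-encoding the body is pure ASCII.
        byte[] body = encodeBody(params).getBytes(StandardCharsets.US_ASCII);
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", CONTENT_TYPE);
        try (OutputStream out = conn.getOutputStream()) {
            out.write(body);
        }
        return conn.getResponseCode();
    }
}
```

Because the encoded body is plain ASCII, the bytes on the wire are the same
either way; the declared charset only tells the server how to interpret the
`%xx` escapes when it decodes the parameters.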
--
This message is automatically generated by JIRA.