[ http://issues.apache.org/jira/browse/SOLR-38?page=all ]
Yonik Seeley resolved SOLR-38.
------------------------------
Resolution: Fixed
Committed. Thanks Bertrand!
Yes, the encoding right now is controlled by the content-type header, not any
possible XML charset declaration in the XML itself. The servlet asks for a
Reader, not for an InputStream, so we get chars that have already been decoded.
Maybe a future enhancement would use a Reader if content-type were specified,
otherwise use an InputStream and let the XML parser try to figure out the
encoding?
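
A rough sketch of that possible enhancement, in Python rather than servlet code, just to make the two paths concrete. The function names `charset_from_content_type` and `parse_update` are hypothetical and not part of Solr; the sketch assumes the request body is available as raw bytes and that a charset given in the header should win over the XML declaration.

```python
import xml.etree.ElementTree as ET
from email.message import Message


def charset_from_content_type(content_type):
    """Return the charset parameter of a Content-Type value, or None."""
    if not content_type:
        return None
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


def parse_update(body, content_type=None):
    """Parse an XML update message given as raw bytes (hypothetical helper)."""
    charset = charset_from_content_type(content_type)
    if charset:
        # "Reader" path: the header names a charset, so decode before parsing.
        return ET.fromstring(body.decode(charset))
    # "InputStream" path: hand the parser the bytes so it can use the
    # XML declaration, or fall back to its default of UTF-8.
    return ET.fromstring(body)


# Usage: one body with a charset in the header, one relying on the declaration.
utf8_body = '<add><doc><field name="name">êâîôû</field></doc></add>'.encode("utf-8")
latin1_body = (
    '<?xml version="1.0" encoding="ISO-8859-1"?>'
    '<add><doc><field name="name">êâîôû</field></doc></add>'
).encode("iso-8859-1")

from_header = parse_update(utf8_body, "text/xml; charset=utf-8")
from_declaration = parse_update(latin1_body)
```

Both calls should yield the same field text; only the source of the encoding information differs.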
> PATCH: demonstrate correct handling of UTF-8 encoded input documents
> --------------------------------------------------------------------
>
> Key: SOLR-38
> URL: http://issues.apache.org/jira/browse/SOLR-38
> Project: Solr
> Issue Type: Improvement
> Components: update
> Reporter: Bertrand Delacretaz
> Priority: Minor
> Attachments: utf8-example.xml
>
>
> Here's a UTF-8 example with accented chars that can go in
> example/exampledocs, to demonstrate correct handling of accented chars.
> After posting this to SOLR, searching for "êâîôû" from
> http://localhost:8983/solr/admin/ correctly finds this document.
> This needs a small patch to example/exampledocs/post.sh (enclosed below) to
> specify the encoding for the POST.
> The XML pull parser seems to be able to handle the encoding declaration
> correctly, but if the encoding is not specified in the POST, the servlet
> container might get in the way (Jetty does with the current configuration).
> Index: example/exampledocs/post.sh
> ===================================================================
> --- example/exampledocs/post.sh (revision 424529)
> +++ example/exampledocs/post.sh (working copy)
> @@ -4,7 +4,7 @@
>
> for f in $FILES; do
> echo Posting file $f to $URL
> - curl $URL --data-binary @$f
> + curl $URL --data-binary @$f -H 'Content-type:text/xml; charset=utf-8'
> echo
> done
>
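
To illustrate why the added header matters, here is a minimal Python sketch of what happens to the example query string on the server side. It assumes the container falls back to ISO-8859-1 when the POST carries no charset, which is the default the Servlet specification prescribes for request bodies; the report itself only says that Jetty gets in the way. The helper name `container_decode` is hypothetical.

```python
# Assumption: fallback charset used when the request declares none.
DEFAULT_CHARSET = "iso-8859-1"


def container_decode(body, charset=None):
    """Mimic a container handing the servlet a Reader: the bytes are
    decoded with the declared charset, or with the fallback if none
    was declared (hypothetical helper)."""
    return body.decode(charset or DEFAULT_CHARSET)


query = "êâîôû"
body = query.encode("utf-8")  # bytes sent by curl --data-binary for a UTF-8 file

garbled = container_decode(body)           # no charset in Content-type
correct = container_decode(body, "utf-8")  # charset=utf-8, as in the patch
```

Without the charset, each accented character arrives as two wrong characters, so the indexed text no longer matches the search term. With the header, the decoded text equals the original.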