[ http://issues.apache.org/jira/browse/SOLR-38?page=all ]

Yonik Seeley resolved SOLR-38.
------------------------------

    Resolution: Fixed

Committed.  Thanks Bertrand!

Yes, the encoding right now is controlled by the content-type header, not any 
possible XML charset declaration in the XML itself.  The servlet asks for a 
Reader, not for an InputStream, so we get chars that have already been decoded.
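
To illustrate the point, here is a minimal standalone sketch (not Solr code) of 
what asking for a Reader implies: the bytes are decoded with whatever charset 
the request supplies, and an encoding declaration inside the XML has no effect. 
The class and method names are made up for the example, and the ISO-8859-1 case 
assumes the container falls back to the servlet default when the POST carries 
no charset.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.UncheckedIOException;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class; it only mimics what a servlet Reader does.
class ReaderDecodingSketch {

    // Decode the raw request body with the given charset, as a Reader would.
    static String readAll(byte[] body, Charset charset) {
        StringBuilder sb = new StringBuilder();
        try (Reader r = new InputStreamReader(new ByteArrayInputStream(body), charset)) {
            int c;
            while ((c = r.read()) != -1) {
                sb.append((char) c);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The XML declares UTF-8, but the Reader never looks at the declaration.
        String xml = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                   + "<doc>\u00ea\u00e2\u00ee\u00f4\u00fb</doc>";
        byte[] body = xml.getBytes(StandardCharsets.UTF_8);

        // Charset taken from the header: the chars round-trip intact.
        String withHeader = readAll(body, StandardCharsets.UTF_8);
        // Assumed container default when no charset is sent: chars are garbled.
        String withoutHeader = readAll(body, StandardCharsets.ISO_8859_1);

        System.out.println("decoded with header charset matches: " + withHeader.equals(xml));
        System.out.println("decoded with default charset matches: " + withoutHeader.equals(xml));
    }
}
```

Each accented char takes two bytes in UTF-8, so the wrongly decoded string ends 
up longer than the original, which is the garbling the post.sh patch avoids.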

Maybe a future enhancement would use a Reader if a charset were specified in 
the content-type, otherwise use an InputStream and let the XML parser try to 
figure out the encoding?
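
A rough sketch of how that fallback might look, purely as an illustration: it 
uses the JDK's StAX parser (javax.xml.stream) rather than the pull parser Solr 
uses, and the class name and both helper methods are hypothetical. If the 
content-type carries a charset, the body is wrapped in a Reader; otherwise the 
raw InputStream goes to the parser, which can detect the encoding from the XML 
declaration.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical sketch of the proposed enhancement; not part of Solr.
class EncodingFallbackSketch {

    // Return the charset named in a Content-Type header, or null if none is given.
    static Charset charsetFrom(String contentType) {
        if (contentType == null) {
            return null;
        }
        for (String part : contentType.split(";")) {
            String p = part.trim();
            if (p.regionMatches(true, 0, "charset=", 0, 8)) {
                return Charset.forName(p.substring(8).trim().replace("\"", ""));
            }
        }
        return null;
    }

    // Parse the body and return its concatenated character data.
    static String textOf(InputStream body, String contentType) {
        try {
            XMLInputFactory factory = XMLInputFactory.newInstance();
            Charset cs = charsetFrom(contentType);
            XMLStreamReader xml = (cs != null)
                // Charset given in the header: decode up front with a Reader.
                ? factory.createXMLStreamReader(new InputStreamReader(body, cs))
                // No charset: hand over raw bytes and let the parser detect it.
                : factory.createXMLStreamReader(body);
            StringBuilder sb = new StringBuilder();
            while (xml.hasNext()) {
                if (xml.next() == XMLStreamConstants.CHARACTERS) {
                    sb.append(xml.getText());
                }
            }
            return sb.toString();
        } catch (XMLStreamException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        String doc = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>"
                   + "<doc>\u00ea\u00e2\u00ee\u00f4\u00fb</doc>";
        byte[] bytes = doc.getBytes(StandardCharsets.UTF_8);
        // No charset in the header, so the parser relies on the XML declaration.
        String text = textOf(new ByteArrayInputStream(bytes), "text/xml");
        System.out.println("text length: " + text.length());
    }
}
```

With this arrangement an explicit charset in the header still wins, so the 
post.sh change stays valid, while posts without one would no longer depend on 
the container's default.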



> PATCH: demonstrate correct handling of UTF-8 encoded input documents
> --------------------------------------------------------------------
>
>                 Key: SOLR-38
>                 URL: http://issues.apache.org/jira/browse/SOLR-38
>             Project: Solr
>          Issue Type: Improvement
>          Components: update
>            Reporter: Bertrand Delacretaz
>            Priority: Minor
>         Attachments: utf8-example.xml
>
>
> Here's a UTF-8 example with accented chars that can go in 
> example/exampledocs, to demonstrate correct handling of accented chars.
> After posting this to Solr, searching for "êâîôû" from 
> http://localhost:8983/solr/admin/ correctly finds this document.
> Needs a small patch to example/exampledocs/post.sh (enclosed below), to 
> specify the encoding for the POST.
> The XML pull parser seems to be able to handle the encoding declaration 
> correctly, but if the encoding is not specified in the POST, the servlet 
> container might get in the way (Jetty does with the current configuration).
> Index: example/exampledocs/post.sh
> ===================================================================
> --- example/exampledocs/post.sh (revision 424529)
> +++ example/exampledocs/post.sh (working copy)
> @@ -4,7 +4,7 @@
>  
>  for f in $FILES; do
>    echo Posting file $f to $URL
> -  curl $URL --data-binary @$f
> +  curl $URL --data-binary @$f -H 'Content-type:text/xml; charset=utf-8'
>    echo
>  done
>   

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

