PATCH: demonstrate correct handling of UTF-8 encoded input documents
--------------------------------------------------------------------

                 Key: SOLR-38
                 URL: http://issues.apache.org/jira/browse/SOLR-38
             Project: Solr
          Issue Type: Improvement
          Components: update
            Reporter: Bertrand Delacretaz
            Priority: Minor


Here's a UTF-8 example document with accented characters that can go in 
example/exampledocs, to demonstrate that accented characters are handled correctly.

After posting this to Solr, searching for "êâîôû" from 
http://localhost:8983/solr/admin/ correctly finds this document.

This needs a small patch to example/exampledocs/post.sh (enclosed below) to 
specify the encoding for the POST.

The XML pull parser seems to be able to handle the encoding declaration 
correctly, but if the encoding is not specified in the POST, the servlet 
container might get in the way (Jetty does with the current configuration).
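
To illustrate what "getting in the way" can look like, here is a minimal 
sketch of the failure mode. It assumes the receiver falls back to ISO-8859-1 
when the POST carries no charset (my understanding of the servlet default, not 
something verified against this Jetty setup) and that iconv is installed. The 
helper name utf8_e_circ is made up for the illustration.

```shell
#!/bin/sh
# "ê" encoded as UTF-8 is the two bytes 0xC3 0xAA (octal 303 252).
# Octal escapes keep the script independent of the terminal's locale.
utf8_e_circ() { printf '\303\252'; }

# Correct interpretation: the bytes are already UTF-8, keep them as they are.
correct=$(utf8_e_circ)

# Wrong interpretation (assumed fallback): read each byte as one ISO-8859-1
# character, then re-encode the result as UTF-8.
garbled=$(utf8_e_circ | iconv -f ISO-8859-1 -t UTF-8)

echo "read as UTF-8:      $correct"
echo "read as ISO-8859-1: $garbled"
```

The second line shows two characters instead of one, so a search for the 
original accented term would no longer match the indexed text. Sending the 
charset in the Content-type header removes the need for the receiver to guess.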

Index: example/exampledocs/post.sh
===================================================================
--- example/exampledocs/post.sh (revision 424529)
+++ example/exampledocs/post.sh (working copy)
@@ -4,7 +4,7 @@
 
 for f in $FILES; do
   echo Posting file $f to $URL
-  curl $URL --data-binary @$f
+  curl $URL --data-binary @$f -H 'Content-type:text/xml; charset=utf-8'
   echo
 done
  

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

