[ http://issues.apache.org/jira/browse/SOLR-32?page=all ]

Yonik Seeley resolved SOLR-32.
------------------------------

    Resolution: Fixed
      Assignee: Yonik Seeley

Yes, we had been having problems all along with Jetty and its UTF-8 writer.
I just committed this (correctness before performance...)
Thanks for tracking down the problem!

> Result of select request is not well-formed XML when text field contains 
> non-ASCII chars and ampersand
> ------------------------------------------------------------------------------------------------------
>
>                 Key: SOLR-32
>                 URL: http://issues.apache.org/jira/browse/SOLR-32
>             Project: Solr
>          Issue Type: Bug
>          Components: search
>         Environment: Seen when running with the supplied Jetty container, 
> macosx, JDK  1.5.0_06
>            Reporter: Bertrand Delacretaz
>         Assigned To: Yonik Seeley
>
> Starting with the supplied start.jar, the ampersand from this field is not 
> correctly escaped in the XML search results provided by the select page:
> <?xml version="1.0" encoding="UTF-8"?>
> <add>
>   <doc>
>     <field name="id">amp-test-one</field>
>     <field name="content">Les événements chez Bonnie &amp; Clyde.</field>
>   </doc>
> </add>
> </stuff>
> The "content" field is defined as a "text" field in the schema.
> Adding this document to the index and querying on "id:amp-test-one" returns
> ...
>  <doc>
>   <str name="content">Les événements chez Bonnie & Clyde.&amp; Clyde.</str>
>   <str name="id">amp-test-one</str>
>  </doc>
> with the first "Bonnie & Clyde" unescaped, followed by the correctly escaped &amp;
> Browsing the index with Luke shows that the field is correctly stored.
> I think this might be a Jetty bug: patching the util/XML class of SOLR to 
> avoid the use of Writer.write(String,start,len) fixes the problem. Maybe the 
> Jetty ServletWriter gets confused by the presence of non-ascii chars?
> It looks like the class used String.substring(...) before; Writer.write 
> might be faster, but it seems to be broken in that environment.
> Here are my patches to util/XML.java:
> Index: src/java/org/apache/solr/util/XML.java
> ===================================================================
> --- src/java/org/apache/solr/util/XML.java      (revision 422655)
> +++ src/java/org/apache/solr/util/XML.java      (working copy)
> @@ -159,8 +159,8 @@
>        }
>        if (subst != null) {
>          if (start<i) {
> -          // out.write(str.substring(start,i));
> -          out.write(str, start, i-start);
> +          out.write(str.substring(start,i));
> +          // out.write(str, start, i-start);
>            // n+=i-start;
>          }
>          out.write(subst);
> @@ -172,8 +172,8 @@
>        out.write(str);
>        // n += str.length();
>      } else if (start<str.length()) {
> -      // out.write(str.substring(start));
> -      out.write(str, start, str.length()-start);
> +      out.write(str.substring(start));
> +      // out.write(str, start, str.length()-start);
>        // n += str.length()-start;
>      }
>      // return n;
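
For readers following the quoted patch, here is a minimal, self-contained sketch of the escaping loop it touches. It is not the actual util/XML.java: the class name XmlEscapeSketch, the helper substitute(), and the set of escaped characters are assumptions made for illustration. Only the structure (start, i, subst, and the two out.write calls) follows the diff context. Unescaped runs are copied with String.substring(...) rather than Writer.write(String, int, int), as in the patch.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;

public class XmlEscapeSketch {

    // Hypothetical helper: returns the entity for a character that must be
    // escaped in XML character data, or null if it can be written as-is.
    // The real class may escape a different set of characters.
    static String substitute(char ch) {
        switch (ch) {
            case '&': return "&amp;";
            case '<': return "&lt;";
            case '>': return "&gt;";
            default:  return null;
        }
    }

    // Writes str to out, replacing special characters with entities.
    // Runs of ordinary characters are copied via substring(), which avoids
    // Writer.write(String, int, int) at the cost of a temporary String.
    static void escapeCharData(String str, Writer out) throws IOException {
        int start = 0; // index of the first character not yet written
        for (int i = 0; i < str.length(); i++) {
            String subst = substitute(str.charAt(i));
            if (subst != null) {
                if (start < i) {
                    out.write(str.substring(start, i));
                }
                out.write(subst);
                start = i + 1;
            }
        }
        if (start < str.length()) {
            // Flush the tail after the last escaped character (or the whole
            // string when nothing needed escaping).
            out.write(str.substring(start));
        }
    }

    public static void main(String[] args) throws IOException {
        StringWriter out = new StringWriter();
        escapeCharData("Les événements chez Bonnie & Clyde.", out);
        System.out.println(out);
    }
}
```

With a well-behaved Writer, both write variants should produce the same output. The substring form only sidesteps the offset/length overload that appeared to misbehave with the bundled Jetty writer in the reported environment.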

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

