On 8 June 2011 11:47, Daniel Veillard <veill...@redhat.com> wrote:
> On Tue, Jun 07, 2011 at 11:35:10PM +0100, Laurence Rowe wrote:
>> I've found that libxml2/libxslt will quote carriage return characters
>> in output as &#13;. When outputting (X)HTML this causes rendering
>> errors on at least Firefox, as the html:
>> "<pre>Line1&#13;\nLine2</pre>" is rendered differently to the html
>> "<pre>Line1\r\nLine2</pre>".
>>
>> This is a problem because some of our page content originates from
>> browser text areas, and as such the text is submitted to the server
>> with CRLF line endings. We use libxslt in front of several systems,
>> not all of which we control. It's not really practical to change the
>> application behaviour here.
>>
>> Is it possible to switch off this quoting behaviour on serialization?
>
>  For XML, no, the reason is here:
>
> http://www.w3.org/TR/REC-xml/#sec-line-ends
>
> If an XML parser finds \r\n in the input it automatically removes the
> first character. XHTML being an XML language it should behave the same.
>
> If libxml2 sees a \r\n sequence in an XML text node, then it assumes the
> user wants their data back intact after XML parsing of the output. Which
> is why it outputs &#13;\n, to keep the \r from being stripped when the
> consuming XML parser(s) find the sequence.
>
> Maybe your data didn't come from XML parsing, but really we can't avoid
> this a priori in libxml2 serializer (or we would have to extend the
> xmlsave APIs to allow this specifically, but anyway XSLT output should
> not use the libxml2 serialization directly but the libxslt ones - unless
> you really know what you are doing)

Thanks, that helped me understand what was going on. I'm seeing an
interaction between the HTMLParser and XSLT method="xml": the HTMLParser
does not perform the same '\r\n' -> '\n' substitution as the XMLParser,
so the '\r' reaches the serializer and gets escaped. I can reproduce it
using xsltproc (see below).
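
To illustrate the XML side of this, here is a minimal Python sketch. It
uses the standard library's expat-based ElementTree parser rather than
libxml2 (an assumption made for convenience; the line-end rule comes from
the XML spec, so a conforming parser should behave the same way). The
helper name pre_text is hypothetical.

```python
import xml.etree.ElementTree as ET


def pre_text(xml_source: bytes) -> str:
    """Parse a one-element document and return its text content."""
    return ET.fromstring(xml_source).text


# A literal CRLF in the input is normalized to a single LF on parsing.
literal = pre_text(b"<pre>Line1\r\nLine2</pre>")

# A CR written as a character reference is not touched by that
# normalization, so it survives parsing intact.
escaped = pre_text(b"<pre>Line1&#13;\nLine2</pre>")

print(repr(literal))
print(repr(escaped))
```

This is why a serializer that wants the '\r' to round-trip has to write
it as &#13;.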

I can perform the string replacement myself before feeding data to the
HTMLParser, though I guess it would be more efficient to make this an
HTMLParser option.
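
A rough sketch of that replacement step, in Python for illustration only.
The function name normalize_line_endings is hypothetical, and it assumes
the input is in an ASCII-compatible encoding such as UTF-8 or Latin-1 (it
would corrupt UTF-16). It also maps a lone '\r' to '\n', as the XML
line-end rule does.

```python
def normalize_line_endings(data: bytes) -> bytes:
    """Translate CRLF and lone CR to LF before handing data to a parser.

    Assumes an ASCII-compatible encoding; do not use on UTF-16 input.
    """
    # Handle CRLF first so a pair does not turn into two newlines.
    return data.replace(b"\r\n", b"\n").replace(b"\r", b"\n")


if __name__ == "__main__":
    raw = b"<pre>Line1\r\nLine2</pre>"
    print(normalize_line_endings(raw))
```

The result would then be passed to the HTMLParser in place of the raw
form submission.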

$ cat identity.xsl
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
    <xsl:output method="xml" omit-xml-declaration="yes"
        doctype-system="http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"
        doctype-public="-//W3C//DTD XHTML 1.0 Transitional//EN"/>
    <xsl:template match="@*|node()">
        <xsl:copy>
            <xsl:apply-templates select="@*|node()"/>
        </xsl:copy>
    </xsl:template>
</xsl:stylesheet>

$ cat in.html  # <pre> text is "Line1\r\nLine2"
<html>
<body>
<pre>Line1
Line2</pre>
</body>
</html>

$ xsltproc --html identity.xsl in.html
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"><body>
<pre>Line1&#13;
Line2</pre>
</body></html>


Laurence
_______________________________________________
xml mailing list, project page  http://xmlsoft.org/
xml@gnome.org
http://mail.gnome.org/mailman/listinfo/xml
