On 8 June 2011 11:47, Daniel Veillard <veill...@redhat.com> wrote:
> On Tue, Jun 07, 2011 at 11:35:10PM +0100, Laurence Rowe wrote:
>> I've found that libxml2/libxslt will quote carriage return characters
>> in output as &#13;. When outputting (X)HTML this causes rendering
>> errors on at least Firefox, as the html:
>> "<pre>Line1&#13;\nLine2</pre>" is rendered differently to the html
>> "<pre>Line1\r\nLine2</pre>".
>>
>> This is a problem because some of our page content originates from
>> browser text areas, and as such the text is submitted to the server
>> with CRLF line endings. We use libxslt in front of several systems,
>> not all of which we control. It's not really practical to change the
>> application behaviour here.
>>
>> Is it possible to switch off this quoting behaviour on serialization?
>
> For XML, no, the reason is here:
>
> http://www.w3.org/TR/REC-xml/#sec-line-ends
>
> If an XML parser finds \r\n in the input it automatically removes the
> first character. XHTML being an XML language, it should behave the same.
>
> If libxml2 sees a \r\n sequence in an XML text node, then it assumes the
> user wants their data back intact after XML parsing of the output. Which
> is why it outputs &#13;\n, to keep the \r from being stripped when the
> consuming XML parser(s) find the sequence.
>
> Maybe your data didn't come from XML parsing, but really we can't avoid
> this a priori in the libxml2 serializer (or we would have to extend the
> xmlsave APIs to allow this specifically, but anyway XSLT output should
> not use the libxml2 serialization directly but the libxslt ones - unless
> you really know what you are doing)

Thanks, that helped me to understand what was going on. I'm seeing the
interaction of the HTMLParser and XSLT method="xml": the HTMLParser does
not perform the same substitution of '\r\n' -> '\n' as the XMLParser, so
the \r survives into the tree and is then quoted by the XML serializer.
I can reproduce it using xsltproc (see below).

I can perform the string replacement myself before feeding data to the
HTMLParser, though I guess it would be more efficient to make this an
HTMLParser option.

$ cat identity.xsl
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" omit-xml-declaration="yes"
      doctype-system="http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd"
      doctype-public="-//W3C//DTD XHTML 1.0 Transitional//EN"/>
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>

$ cat in.html  # <pre> text is "Line1\r\nLine2"
<html>
<body>
<pre>Line1
Line2</pre>
</body>
</html>

$ xsltproc --html identity.xsl in.html
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"><body>
<pre>Line1&#13;
Line2</pre>
</body></html>

Laurence
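
A minimal sketch of the string replacement mentioned above, written in
Python purely for illustration. The function name normalize_line_endings
and the sample strings are hypothetical; nothing here is part of libxml2
or libxslt. It applies the end-of-line rule from the XML spec section
linked above (CRLF and lone CR both become LF) to text before it is
handed to an HTML parser. The ElementTree lines use Python's bundled
expat-based XML parser only to show the same rule at work, and say
nothing about libxml2 internals.

```python
import xml.etree.ElementTree as ET


def normalize_line_endings(text: str) -> str:
    """Apply XML 1.0 end-of-line handling (section 2.11) to a string.

    CRLF pairs and lone CRs both become a single LF. An XML parser does
    this on input; the HTML parser discussed in this thread did not.
    """
    # Replace CRLF first so its CR is not converted a second time.
    return text.replace("\r\n", "\n").replace("\r", "\n")


# Hypothetical textarea submission with CRLF line endings.
submitted = "<pre>Line1\r\nLine2</pre>"
cleaned = normalize_line_endings(submitted)

# For comparison: an XML parser normalizes a literal CRLF on its own,
# but keeps a carriage return written as a character reference.
from_xml = ET.fromstring(submitted).text
from_ref = ET.fromstring("<pre>Line1&#13;\nLine2</pre>").text
```

The second comparison is the reason the serializer writes the character
reference: it is the only form in which a \r in a text node survives a
round trip through an XML parser.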