Hi,
I have run into a problem using Xalan (and Xerces with the old
org.apache.xml.serialize package as well) to serialize to the HTML format. It
does *NOT* escape ampersands, either as "&amp;" or as "&#38;", when they occur in
attributes designated to hold URLs, like the "href" attribute of the "a"
element. Looking at the source code, it is clear that this is intentional. This
puzzles me a lot. Following a customer complaint I reviewed this issue and
discovered that the HTML specifications clearly say that the
ampersand, which is typically used to separate form values in a query string, *MUST* be
escaped in attributes containing URLs. I even found a corresponding note in
the HTML 2.0 specification from 1995. Can anyone explain to me why this
incorrect handling exists and tell me whether it will be fixed in future releases?
HTML 4.01: http://www.w3.org/TR/html401/appendix/notes.html section B.2.2
HTML 2.0: http://www.ietf.org/rfc/rfc1866.txt section 8.2.1 (page 46)
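
For illustration, here is a minimal sketch of the escaping those notes describe, applied to an attribute value before it is written out. The class HtmlAttributeEscaper and its method are hypothetical names made up for this example; they are not part of Xalan or Xerces, and the sketch assumes the input is a raw, not yet escaped, value.

```java
// Hypothetical helper, NOT part of Xalan or Xerces: it only illustrates
// the escaping the HTML notes ask for in attribute values such as href.
class HtmlAttributeEscaper {

    // Assumes 'value' is raw (unescaped) text; already-escaped input
    // would be escaped a second time.
    static String escapeAttributeValue(String value) {
        StringBuilder sb = new StringBuilder(value.length() + 16);
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            switch (c) {
                case '&': sb.append("&amp;"); break;   // the case discussed here
                case '"': sb.append("&quot;"); break;  // keeps the quoted value intact
                case '<': sb.append("&lt;"); break;
                default:  sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String url = "search?x=1&y=2";
        // prints: <a href="search?x=1&amp;y=2">link</a>
        System.out.println("<a href=\"" + escapeAttributeValue(url) + "\">link</a>");
    }
}
```

A browser decodes "&amp;" back to "&" before following the link, so the URL that is actually requested stays the same.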
Sample:
test.xml:
- - - 8< - - -
<html>
<body>
<a href="a&amp;b" title="a&amp;b"/>
</body>
</html>
- - - 8< - - -
test.xsl:
- - - 8< - - -
<xsl:stylesheet
version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/">
<xsl:copy-of select="/"/>
</xsl:template>
</xsl:stylesheet>
- - - 8< - - -
command arguments: -in test.xml -xsl test.xsl -HTML
output:
- - - 8< - - -
<html>
<body>
<a href="a&b" title="a&amp;b"></a>
</body>
</html>
- - - 8< - - -
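
The same test can also be run in memory through the standard javax.xml.transform (JAXP) API. The following is only a sketch under some assumptions: the class name HtmlSerializationRepro is made up, the output method is set to "html" to mirror the -HTML argument, and TransformerFactory.newInstance() returns whichever XSLT implementation is on the classpath, which is not necessarily the same Xalan build as on the command line. How the "href" value is written may therefore differ between implementations and versions, so the sketch just prints the result.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Hypothetical class name; the inputs are the test.xml / test.xsl samples.
class HtmlSerializationRepro {

    static final String XML =
        "<html><body><a href=\"a&amp;b\" title=\"a&amp;b\"/></body></html>";

    static final String XSL =
        "<xsl:stylesheet version=\"1.0\""
        + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
        + "<xsl:template match=\"/\"><xsl:copy-of select=\"/\"/></xsl:template>"
        + "</xsl:stylesheet>";

    // Runs the stylesheet over the sample and returns the serialized HTML.
    static String transform() {
        try {
            Transformer t = TransformerFactory.newInstance()
                .newTransformer(new StreamSource(new StringReader(XSL)));
            // Assumption: this corresponds to the -HTML command-line argument.
            t.setOutputProperty(OutputKeys.METHOD, "html");
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(XML)),
                        new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("transformation failed", e);
        }
    }

    public static void main(String[] args) {
        // Compare how the "href" and "title" values are written.
        System.out.println(transform());
    }
}
```

Comparing the "href" and "title" attributes in the printed output shows whether the serializer in use treats URL attributes differently from ordinary ones.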
Regards,
Klaus