Hi,

I have run into a problem using Xalan (and also Xerces with the old org.apache.xml.serialize package) to serialize to HTML. The serializer does *NOT* escape an ampersand (as "&amp;" or as a numeric character reference) when it occurs in an attribute designated to hold a URL, such as the "href" attribute of the "a" element. Looking at the source code, it is clear that this is intentional, which puzzles me a lot. Following a customer complaint I reviewed this issue and found that the HTML specifications clearly say that the ampersand, which is typically used to separate form values in a query string, *MUST* be escaped in attributes containing URLs. I even found a corresponding note in the HTML 2.0 specification from 1995. Can anyone explain why this incorrect handling exists and whether it will be removed in a future release?

HTML 4.01: http://www.w3.org/TR/html401/appendix/notes.html section B.2.2
HTML 2.0: http://www.ietf.org/rfc/rfc1866.txt section 8.2.1 (page 46)
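
To make the expected behaviour concrete, here is a minimal sketch of the escaping I would expect for any double-quoted attribute value, URL or not. The class and method names (AttrEscape, escapeAttribute) are made up for illustration and are not part of Xalan or Xerces; this is not how either serializer is implemented.

```java
public class AttrEscape {

    // Hypothetical helper, not a Xalan/Xerces API: escapes the characters
    // that are significant inside a double-quoted HTML attribute value.
    public static String escapeAttribute(String value) {
        StringBuilder sb = new StringBuilder(value.length() + 16);
        for (int i = 0; i < value.length(); i++) {
            char c = value.charAt(i);
            switch (c) {
                case '&': sb.append("&amp;");  break; // the case at issue here
                case '"': sb.append("&quot;"); break;
                case '<': sb.append("&lt;");   break;
                default:  sb.append(c);
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String url = "search?x=1&y=2";
        // prints: <a href="search?x=1&amp;y=2">link</a>
        System.out.println("<a href=\"" + escapeAttribute(url) + "\">link</a>");
    }
}
```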

Sample:

test.xml:
- - - 8< - - -
<html>
<body>
<a href="a&amp;b" title="a&amp;b"/>
</body>
</html>
- - - 8< - - -

test.xsl:
- - - 8< - - -
<xsl:stylesheet
  version="1.0"
  xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="/">
    <xsl:copy-of select="/"/>
  </xsl:template>

</xsl:stylesheet>
- - - 8< - - -

command arguments: -in test.xml -xsl test.xsl -HTML

output:
- - - 8< - - -
<html>

<body>

<a href="a&b" title="a&amp;b"></a>

</body>

</html>
- - - 8< - - -
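
For anyone who prefers to reproduce this from code rather than from the command line, the following is a rough sketch using the standard JAXP API with the same input and stylesheet inlined as strings. The class name HtmlSerializeRepro is made up, and I am assuming that setting the output method to "html" is the closest JAXP counterpart of the -HTML switch. The serializer that actually runs depends on which JAXP implementation is picked up, so the result may differ from the output above unless Xalan is the one in use.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.OutputKeys;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

public class HtmlSerializeRepro {

    // Same content as test.xml above.
    static final String XML =
        "<html><body><a href=\"a&amp;b\" title=\"a&amp;b\"/></body></html>";

    // Same content as test.xsl above.
    static final String XSL =
        "<xsl:stylesheet version=\"1.0\""
      + " xmlns:xsl=\"http://www.w3.org/1999/XSL/Transform\">"
      + "<xsl:template match=\"/\"><xsl:copy-of select=\"/\"/></xsl:template>"
      + "</xsl:stylesheet>";

    // Runs the identity-copy stylesheet and returns the serialized result.
    public static String transform() {
        try {
            // Uses whichever JAXP implementation is found at run time.
            TransformerFactory factory = TransformerFactory.newInstance();
            Transformer t = factory.newTransformer(
                new StreamSource(new StringReader(XSL)));
            // Assumed equivalent of the -HTML command-line switch.
            t.setOutputProperty(OutputKeys.METHOD, "html");
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(XML)),
                        new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("transformation failed", e);
        }
    }

    public static void main(String[] args) {
        // Compare the href and title attributes in what gets printed.
        System.out.println(transform());
    }
}
```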


Regards,

Klaus




