mauro castelli wrote:
David,
thank you very much for your precious hints and sorry if my question was not clear.

I don't know what "xalan-SAX" is.  Can you be more specific?  If you mean
using the class "XalanDocumentBuilder," then you can send events that
contain Unicode characters that would be illegal in XML 1.0, but legal in
XML 1.1.
Yes, I meant "using XalanDocumentBuilder", and that is exactly the heart of the question. I tried to introduce a single Unicode control character into the DocumentBuilder sample
in this way:
     [...]
       theAttributes.clear();
       theAttributeName = XALAN_STATIC_UCODE_STRING("attribute2");
       // I replaced the string "value2" with "&#x004;"
       theAttributeValue = XALAN_STATIC_UCODE_STRING("&#x004;");
       theAttributes.addAttribute(c_wstr(theAttributeName), c_wstr(theAttributeType), c_wstr(theAttributeValue));
       [...]


This adds an attribute with a value that has 7 characters: "&" "#" "x" "0" "0" "4" ";". This is not what you wanted, I expect.

You probably want to do something like this:

const XalanDOMChar  val[] = { 4, 0 };

theAttributes.addAttribute(
    theAttributeName.c_wstr(),
    theAttributeType.c_wstr(),
    val);

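You are assuming that creating an attribute value that looks like a numeric character reference will give you the character specified, but it won't. Character references are resolved by an XML parser when it reads markup; the strings you pass to XalanDocumentBuilder are taken as literal character data.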
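To make the difference concrete, here is a small sketch that does not use Xalan at all. It assumes std::u16string as a stand-in for a string of XalanDOMChar, and the two function names are made up for illustration:

```cpp
#include <string>

// Stand-in for a XalanDOMChar string (assumption: UTF-16 code units).

// What the literal "&#x004;" really holds: seven ordinary characters,
// '&', '#', 'x', '0', '0', '4', ';'. Nothing resolves the reference here.
inline std::u16string literalReferenceText()
{
    return u"&#x004;";
}

// What an XML parser would hand over after resolving the reference:
// a single code unit with the value U+0004.
inline std::u16string resolvedCharacter()
{
    return std::u16string(1, char16_t(0x0004));
}
```

The first string is what the modified sample passes to addAttribute(); the second corresponds to the { 4, 0 } array above, minus the terminating zero.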

Applying an XSLT stylesheet such as
<?xml version="1.1" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <xsl:output version="1.1"/>
  <xsl:template match="node() | @*">
    <xsl:copy>
      <xsl:apply-templates select="node() | @*"/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>

the output shows attribute2 with the string value "&amp;#x004;"
instead of the corresponding control character.
The same happens if I use "&#000;", even though I would expect an error, since that
is the only unsupported Unicode control character in XML 1.1.

That is the expected result, since the value of the attribute is not the character U+0004.

You should be very careful doing this sort of thing. Xalan-C expects that the source tree you give it has been properly constructed. That means it cannot contain any invalid characters for the version of XML you mean to support.
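If you want to guard against this yourself, one option is to check each string before you send it to the builder. The following is only a sketch: the names XmlVersion, isXmlChar and isValidXmlText are hypothetical and not part of Xalan-C, and it works on UTF-32 code points so that surrogate pairs don't need special handling. The ranges are the Char productions from the XML 1.0 and XML 1.1 recommendations.

```cpp
#include <string>

// Hypothetical helper types and functions; none of this is Xalan-C API.
enum class XmlVersion { v1_0, v1_1 };

// Returns true if the code point matches the Char production of the
// given XML version.
inline bool isXmlChar(char32_t c, XmlVersion version)
{
    // Ranges allowed by both versions.
    if ((c >= 0x20 && c <= 0xD7FF) ||
        (c >= 0xE000 && c <= 0xFFFD) ||
        (c >= 0x10000 && c <= 0x10FFFF))
    {
        return true;
    }

    if (version == XmlVersion::v1_0)
    {
        // XML 1.0 allows only tab, line feed and carriage return
        // below U+0020.
        return c == 0x9 || c == 0xA || c == 0xD;
    }

    // XML 1.1 allows U+0001 through U+001F as well; U+0000 is never legal.
    return c >= 0x1 && c <= 0x1F;
}

// Returns true if every code point in the text is legal for the version.
inline bool isValidXmlText(const std::u32string& text, XmlVersion version)
{
    for (char32_t c : text)
    {
        if (!isXmlChar(c, version))
        {
            return false;
        }
    }
    return true;
}
```

With this, U+0004 is rejected for XmlVersion::v1_0 and accepted for XmlVersion::v1_1, while U+0000 is rejected for both.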

Dave
