[jira] Created: (XALANJ-2419) Astral characters written as a pair of NCRs with the surrogate scalar values when using UTF-8

Henri Sivonen (JIRA) Wed, 02 Jan 2008 03:09:21 -0800

Astral characters written as a pair of NCRs with the surrogate scalar values 
when using UTF-8
---------------------------------------------------------------------------------------------


                 Key: XALANJ-2419
                 URL: https://issues.apache.org/jira/browse/XALANJ-2419
             Project: XalanJ2
          Issue Type: Bug
          Components: Serialization
    Affects Versions: 2.7.1
            Reporter: Henri Sivonen


org.apache.xml.serializer.ToStream contains the following code:
    else if (m_encodingInfo.isInEncoding(ch)) {
        // If the character is in the encoding, and
        // not in the normal ASCII range, we also
        // just leave it get added on to the clean characters

    }
    else {
        // This is a fallback plan, we should never get here
        // but if the character wasn't previously handled
        // (i.e. isn't in the encoding, etc.) then what
        // should we do?  We choose to write out an entity
        writeOutCleanChars(chars, i, lastDirtyCharProcessed);
        writer.write("&#");
        writer.write(Integer.toString(ch));
        writer.write(';');
        lastDirtyCharProcessed = i;
    }

This causes the wrong branch (the final else) to run for surrogates, because 
isInEncoding() returns false for surrogate code units when the encoding is 
UTF-8. Each half of a surrogate pair is therefore written as its own NCR. It is 
always wrong (regardless of encoding) to escape a surrogate as an NCR.
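A minimal sketch of a pair-aware fallback, in plain Java rather than Xalan's 
code. The class PairAwareFallback and its serialize() method are hypothetical, 
and java.nio's CharsetEncoder stands in for m_encodingInfo.isInEncoding(). Like 
the check described above, CharsetEncoder.canEncode(char) returns false for a 
lone surrogate. The sketch instead walks the text by code point, asks 
canEncode(CharSequence) about the whole pair, and, when an NCR really is 
needed, writes one reference carrying the full code point.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.StandardCharsets;

public final class PairAwareFallback {

    // Hypothetical serializer loop, NOT Xalan's ToStream. A CharsetEncoder is
    // used as a stand-in for m_encodingInfo.isInEncoding().
    static String serialize(String text, Charset charset) {
        CharsetEncoder enc = charset.newEncoder();
        StringBuilder out = new StringBuilder(text.length());
        int i = 0;
        while (i < text.length()) {
            // codePointAt() joins a valid high/low surrogate pair into one code point.
            int cp = text.codePointAt(i);
            int len = Character.charCount(cp);
            String unit = text.substring(i, i + len);
            if (len == 1 && Character.isSurrogate(unit.charAt(0))) {
                // An unpaired surrogate is not a character, so no NCR can express it.
                throw new IllegalArgumentException("Unpaired surrogate at index " + i);
            }
            if (cp < 0x80 || enc.canEncode(unit)) {
                // Ask about the whole pair, not each half; UTF-8 answers yes.
                out.append(unit);
            } else {
                // Fallback: one NCR with the full code point, never one per surrogate.
                out.append("&#").append(cp).append(';');
            }
            i += len;
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String text = "a\uD834\uDD1Eb"; // U+1D11E MUSICAL SYMBOL G CLEF between ASCII letters
        CharsetEncoder utf8 = StandardCharsets.UTF_8.newEncoder();
        System.out.println(utf8.canEncode('\uD834'));                             // false
        System.out.println(serialize(text, StandardCharsets.UTF_8).equals(text)); // true
        System.out.println(serialize(text, StandardCharsets.US_ASCII));           // a&#119070;b
    }
}
```

With UTF-8 the astral character passes through unchanged; with US-ASCII it 
becomes the single NCR &#119070;. Rejecting an unpaired surrogate instead of 
escaping it is a design choice of this sketch; a real serializer might report 
the error through its own channel.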

In practice, any document containing astral characters is serialized as 
ill-formed XML that an XML parser rejects when reading it back. For example, 
U+1D11E is written as &#55348;&#56606; instead of the raw UTF-8 bytes or the 
single reference &#119070;.
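The ill-formedness follows from the XML 1.0 Char production, which excludes 
the surrogate block #xD800-#xDFFF, so a character reference such as &#55348; 
violates the "Legal Character" well-formedness constraint. The sketch below 
(class and method names are hypothetical) mimics the fallback's per-code-unit 
escaping and round-trips the result through the JDK's default DocumentBuilder, 
assuming that parser enforces the constraint as a conforming parser must. The 
parser may also log the fatal error to stderr.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.xml.sax.InputSource;

public final class NcrRoundTrip {

    // XML 1.0 production [2], Char. The surrogate block #xD800-#xDFFF is absent,
    // so a character reference to a surrogate is never legal.
    static boolean isXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
                || (cp >= 0x20 && cp <= 0xD7FF)
                || (cp >= 0xE000 && cp <= 0xFFFD)
                || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Mimics the quoted fallback: one NCR per UTF-16 code unit.
    static String perCodeUnitNcrs(String s) {
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < s.length(); i++) {
            out.append("&#").append((int) s.charAt(i)).append(';');
        }
        return out.toString();
    }

    // True if the JDK's built-in parser accepts the document.
    static boolean parses(String xml) {
        try {
            DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(xml)));
            return true;
        } catch (Exception e) { // SAXException, IOException or ParserConfigurationException
            return false;
        }
    }

    public static void main(String[] args) {
        String gClef = "\uD834\uDD1E"; // U+1D11E as a UTF-16 surrogate pair
        String bad = "<a>" + perCodeUnitNcrs(gClef) + "</a>";
        String good = "<a>&#" + gClef.codePointAt(0) + ";</a>";
        System.out.println(bad);          // <a>&#55348;&#56606;</a>
        System.out.println(parses(bad));  // false
        System.out.println(good);         // <a>&#119070;</a>
        System.out.println(parses(good)); // true
    }
}
```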
