Andreas Krantz created XERCESC-2130:
---------------------------------------

             Summary: UTF-16 surrogate values 0xD800-0xDFFF can no longer be 
written with Xerces 3.2.0
                 Key: XERCESC-2130
                 URL: https://issues.apache.org/jira/browse/XERCESC-2130
             Project: Xerces-C++
          Issue Type: Bug
          Components: DOM
    Affects Versions: 3.2.0
            Reporter: Andreas Krantz
            Priority: Critical
         Attachments: reproduce.cpp

The fix for XERCESC-1854 introduced the method
{{DOMLSSerializerImpl::ensureValidString}},
whose validation is incorrect.
The method validates XMLCh values, which are UTF-16 code units.

[Valid Characters|https://www.w3.org/TR/REC-xml/#NT-Char] #x9 | #xA | #xD | 
[#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
are the valid characters, expressed as UTF-32 code points.

The UTF-16 surrogate range xD800-xDFFF is used, in pairs, to represent 
[#x10000-#x10FFFF] and should not be treated as invalid.
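To illustrate the mapping, here is a minimal standalone sketch of how a code point in [#x10000-#x10FFFF] corresponds to a UTF-16 surrogate pair. The helper names {{toSurrogatePair}} and {{fromSurrogatePair}} are hypothetical and are not part of Xerces; {{char16_t}} stands in for XMLCh.

```cpp
#include <utility>

// Hypothetical helper (not a Xerces API): split a supplementary code point
// (U+10000..U+10FFFF) into its high and low UTF-16 surrogate code units.
std::pair<char16_t, char16_t> toSurrogatePair(char32_t cp) {
    const char32_t v = cp - 0x10000;  // 20-bit offset into the supplementary range
    return { static_cast<char16_t>(0xD800 + (v >> 10)),      // upper 10 bits
             static_cast<char16_t>(0xDC00 + (v & 0x3FF)) };  // lower 10 bits
}

// Hypothetical helper: recombine a high/low surrogate pair into a code point.
// The caller is assumed to have checked that high is in 0xD800..0xDBFF
// and low is in 0xDC00..0xDFFF.
char32_t fromSurrogatePair(char16_t high, char16_t low) {
    return 0x10000
         + ((static_cast<char32_t>(high) - 0xD800) << 10)
         + (static_cast<char32_t>(low) - 0xDC00);
}
```

For example, U+1F600 maps to the pair 0xD83D 0xDE00, so both code units fall inside xD800-xDFFF even though the character itself is valid.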

*The reader treats this correctly and does not complain, which leads to 
asymmetric behavior*

Reading DOM => OK
Save back DOM => Exception

I tried to attach an example to show the behavior.

The methods used,
{{bool XMLChar1_1::isXMLChar(const XMLCh toCheck, const XMLCh toCheck2)}},
already take an optional second parameter for checking surrogate values.
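A pair-aware check could look roughly like the following standalone sketch. It is not the Xerces implementation: {{isValidXmlUtf16}} and its helpers are hypothetical names, {{char16_t}} stands in for XMLCh, and the BMP ranges follow the XML 1.0 Char production quoted above.

```cpp
#include <cstddef>

// XML 1.0 Char production for one BMP code unit, surrogates excluded.
static bool isXmlBmpChar(char16_t c) {
    return c == 0x9 || c == 0xA || c == 0xD ||
           (c >= 0x20 && c <= 0xD7FF) ||
           (c >= 0xE000 && c <= 0xFFFD);
}

static bool isHighSurrogate(char16_t c) { return c >= 0xD800 && c <= 0xDBFF; }
static bool isLowSurrogate(char16_t c)  { return c >= 0xDC00 && c <= 0xDFFF; }

// Hypothetical stand-in for the serializer's validation (not Xerces code):
// accepts a high surrogate only when it is followed by a low surrogate,
// and rejects unpaired surrogates and other non-Char code units.
bool isValidXmlUtf16(const char16_t* s, std::size_t len) {
    for (std::size_t i = 0; i < len; ++i) {
        if (isHighSurrogate(s[i])) {
            if (i + 1 >= len || !isLowSurrogate(s[i + 1]))
                return false;  // unpaired high surrogate
            ++i;               // consume the low surrogate of the pair
        } else if (!isXmlBmpChar(s[i])) {
            return false;      // lone low surrogate or other invalid unit
        }
    }
    return true;
}
```

With this approach a string containing the pair 0xD83D 0xDE00 is accepted, while a lone 0xD83D or a lone 0xDE00 is still rejected, which would keep writing consistent with reading.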



