DO NOT REPLY [Bug 5443] New: - Characters outside the basic multilingual plane are not handled by XMLSerializer

bugzilla Sat, 15 Dec 2001 13:11:10 -0800

DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG 
RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT
<http://nagoya.apache.org/bugzilla/show_bug.cgi?id=5443>.
ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND 
INSERTED IN THE BUG DATABASE.


http://nagoya.apache.org/bugzilla/show_bug.cgi?id=5443

Characters outside the basic multilingual plane are not handled by XMLSerializer

           Summary: Characters outside the basic multilingual plane are not
                    handled by XMLSerializer
           Product: Xerces-J
           Version: 1.4.3
          Platform: All
        OS/Version: All
            Status: NEW
          Severity: Major
          Priority: Other
         Component: Serialization
        AssignedTo: [EMAIL PROTECTED]
        ReportedBy: [EMAIL PROTECTED]


An XML document may contain characters from outside Unicode's Basic
Multilingual Plane, i.e. characters with code points above U+FFFF (65535).
Unicode 3.1 defines such characters; I just used my first one, and
XMLSerializer promptly broke.

Specifically, when a text node containing such a character is output in the
ISO-8859-1 encoding using XMLSerializer, the character is emitted as two
character references, one for each half of the corresponding UTF-16 surrogate
pair, rather than as a single character reference. That is, what should be
&#x1D122; instead becomes &#xd834;&#xdd22;. This output is malformed:
surrogate code points (U+D800 to U+DFFF) are not allowed in XML documents,
not even as character references.
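The fix amounts to treating each surrogate pair as one code point before
deciding whether a character reference is needed. The sketch below is a
hypothetical helper (Latin1Escaper is not part of Xerces-J) showing that
idea for ISO-8859-1 output. It assumes Java 5 or later, since
String.codePointAt and Character.charCount postdate Xerces-J 1.4.3, and it
assumes the input contains only otherwise-legal XML characters.

```java
// Hypothetical helper, NOT the Xerces-J XMLSerializer code: escapes text
// for ISO-8859-1 output, combining each surrogate pair into one code point.
public final class Latin1Escaper {

    private Latin1Escaper() {}

    /** Returns text where markup characters are escaped and every code
     *  point above U+00FF becomes a single hexadecimal character reference. */
    public static String escape(String text) {
        StringBuilder out = new StringBuilder(text.length());
        int i = 0;
        while (i < text.length()) {
            // codePointAt joins a valid high/low surrogate pair into one code point
            int cp = text.codePointAt(i);
            if (cp == '&') {
                out.append("&amp;");
            } else if (cp == '<') {
                out.append("&lt;");
            } else if (cp == '>') {
                out.append("&gt;");
            } else if (cp <= 0xFF) {
                // Directly representable in ISO-8859-1
                out.append((char) cp);
            } else if (cp >= 0xD800 && cp <= 0xDFFF) {
                // A lone surrogate has no legal XML representation at all
                throw new IllegalArgumentException("Unpaired surrogate at index " + i);
            } else {
                // One reference for the whole code point, e.g. &#x1D122;
                out.append("&#x")
                   .append(Integer.toHexString(cp).toUpperCase())
                   .append(';');
            }
            // Skip both UTF-16 units of a supplementary character
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // U+1D122 is stored in a Java String as the pair \uD834\uDD22
        String fClef = new String(Character.toChars(0x1D122));
        System.out.println(fClef.length()); // prints 2
        System.out.println(escape(fClef));  // prints &#x1D122;
    }
}
```

Iterating by code point rather than by char is what prevents the two
halves of the pair from being referenced separately.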

For the moment, a workaround is to use UTF-8 instead of ISO-8859-1. Because
UTF-8 can encode every Unicode character directly, no character references at
all are generated and the output is well-formed.
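The reason the workaround helps can be checked with the JDK's charset
encoders alone. This sketch does not call XMLSerializer; EncodingCheck and
encodesDirectly are hypothetical names, and StandardCharsets assumes Java 7
or later. It shows that ISO-8859-1 cannot represent U+1D122, so a serializer
must fall back to a character reference, while UTF-8 writes it as one
four-byte sequence.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demonstration class, not part of Xerces-J.
public final class EncodingCheck {

    private EncodingCheck() {}

    /** True if the charset can encode the text without character references. */
    public static boolean encodesDirectly(Charset charset, String text) {
        // A fresh encoder is used because canEncode must not run mid-encoding
        return charset.newEncoder().canEncode(text);
    }

    public static void main(String[] args) {
        String fClef = new String(Character.toChars(0x1D122));
        System.out.println(encodesDirectly(StandardCharsets.ISO_8859_1, fClef)); // false
        System.out.println(encodesDirectly(StandardCharsets.UTF_8, fClef));      // true
        // Prints the UTF-8 bytes of U+1D122: F0 9D 84 A2
        for (byte b : fClef.getBytes(StandardCharsets.UTF_8)) {
            System.out.printf("%02X ", b & 0xFF);
        }
        System.out.println();
    }
}
```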
