http://nagoya.apache.org/bugzilla/show_bug.cgi?id=5443

Summary: Characters outside the basic multilingual plane are not handled by XMLSerializer
Product: Xerces-J
Version: 1.4.3
Platform: All
OS/Version: All
Status: NEW
Severity: Major
Priority: Other
Component: Serialization
AssignedTo: [EMAIL PROTECTED]
ReportedBy: [EMAIL PROTECTED]

An XML document may contain characters from outside Unicode's basic multilingual plane, i.e. with code points > 65535. Such characters have been defined in Unicode 3.1, and I just used my first one, and XMLSerializer promptly broke.

When a text node containing such a character is output in the ISO-8859-1 encoding using XMLSerializer, the character is emitted as two entity references, one for each half of the corresponding surrogate pair, rather than as a single entity reference. That is, what should be

    &#x1D122;

(the character 𝄢, U+1D122) instead becomes

    &#xD834;&#xDD22;

This is malformed: surrogate code points such as U+D834 and U+DD22 are not allowed as characters or character references in XML documents.

For the moment, a work-around is to use UTF-8 instead of ISO-8859-1. Then no entity references at all are generated and the output is well-formed.
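To make the expected behaviour concrete, here is a minimal sketch in plain Java of escaping text for an ISO-8859-1 target so that a surrogate pair yields a single character reference. It is not the Xerces-J code and not a proposed patch; the class name `Latin1Escaper` and the method `escape` are hypothetical, and it assumes a Java version that provides `String.codePointAt` and `Character.charCount`. It also ignores control characters that XML 1.0 forbids.

```java
// Hypothetical helper for illustration only; not part of Xerces-J.
final class Latin1Escaper {

    /**
     * Returns the text with markup characters escaped and every character
     * outside ISO-8859-1 replaced by ONE numeric character reference.
     */
    static String escape(String text) {
        StringBuilder out = new StringBuilder(text.length());
        int i = 0;
        while (i < text.length()) {
            // codePointAt joins a high/low surrogate pair into one code point.
            int cp = text.codePointAt(i);
            i += Character.charCount(cp);

            if (cp >= 0xD800 && cp <= 0xDFFF) {
                // Only an unpaired surrogate can reach this branch,
                // and it has no legal representation in XML.
                throw new IllegalArgumentException(
                        "Unpaired surrogate: 0x" + Integer.toHexString(cp));
            }
            if (cp == '&') {
                out.append("&amp;");
            } else if (cp == '<') {
                out.append("&lt;");
            } else if (cp == '>') {
                out.append("&gt;");
            } else if (cp <= 0xFF) {
                // Representable directly in ISO-8859-1.
                out.append((char) cp);
            } else {
                // One reference per code point, never one per surrogate half.
                out.append(String.format("&#x%X;", cp));
            }
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // U+1D122 MUSICAL SYMBOL F CLEF, stored in Java as the pair D834 DD22.
        String fClef = new String(Character.toChars(0x1D122));
        System.out.println(escape("clef: " + fClef));
    }
}
```

The key point is that the loop advances by code point rather than by `char`, so the two halves of a surrogate pair are never seen separately.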
