DO NOT REPLY [Bug 27083] - Four byte UTF-8 encodings can encode UCS-4 characters which are beyond the range of legal XML characters (and can't be expressed in Unicode surrogate pairs).

bugzilla Fri, 05 Mar 2004 14:31:04 -0800

DO NOT REPLY TO THIS EMAIL, BUT PLEASE POST YOUR BUG 
RELATED COMMENTS THROUGH THE WEB INTERFACE AVAILABLE AT
<http://nagoya.apache.org/bugzilla/show_bug.cgi?id=27083>.
ANY REPLY MADE TO THIS MESSAGE WILL NOT BE COLLECTED AND 
INSERTED IN THE BUG DATABASE.


http://nagoya.apache.org/bugzilla/show_bug.cgi?id=27083

Four byte UTF-8 encodings can encode UCS-4 characters which are beyond the range of 
legal XML characters (and can't be expressed in Unicode surrogate pairs).
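
For illustration only, here is the arithmetic behind the summary as a minimal Python sketch. It is not the parser's code; the names decode_four_byte, is_legal_xml_supplementary and MAX_XML_CODE_POINT are made up for this example. A four-byte sequence carries 21 payload bits, so it can represent values up to 0x1FFFFF, while XML 1.0 characters and UTF-16 surrogate pairs both stop at 0x10FFFF.

```python
# Highest code point XML 1.0 permits; also the highest value a
# UTF-16 surrogate pair can express.
MAX_XML_CODE_POINT = 0x10FFFF


def decode_four_byte(seq: bytes) -> int:
    """Decode the bit pattern 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx.

    Deliberately performs no range check, to show which values the
    byte layout itself can carry.
    """
    if len(seq) != 4 or (seq[0] & 0xF8) != 0xF0:
        raise ValueError("not a four-byte sequence")
    if any((b & 0xC0) != 0x80 for b in seq[1:]):
        raise ValueError("bad continuation byte")
    return (
        ((seq[0] & 0x07) << 18)
        | ((seq[1] & 0x3F) << 12)
        | ((seq[2] & 0x3F) << 6)
        | (seq[3] & 0x3F)
    )


def is_legal_xml_supplementary(cp: int) -> bool:
    """True if cp lies in the supplementary range allowed by XML 1.0."""
    return 0x10000 <= cp <= MAX_XML_CODE_POINT


highest_legal = decode_four_byte(b"\xf4\x8f\xbf\xbf")   # 0x10FFFF
first_illegal = decode_four_byte(b"\xf4\x90\x80\x80")   # 0x110000
largest_value = decode_four_byte(b"\xf7\xbf\xbf\xbf")   # 0x1FFFFF
```

Only the first of the three sample sequences decodes to a legal XML character; the other two are well formed at the bit level but out of range.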

[EMAIL PROTECTED] changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|NEW                         |RESOLVED
         Resolution|                            |FIXED



------- Additional Comments From [EMAIL PROTECTED]  2004-03-05 22:31 -------
The 2.6.1 parser does detect such out-of-range characters. It was allowing the 
IOException thrown by the reader to propagate up the call stack. Fixing Bug 
#27422 also fixed this behaviour. Now, when the parser detects a malformed UTF-8 
byte sequence, it reports it to the error handler.
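
The change in behaviour can be pictured with a small Python sketch. This is an analogy under stated assumptions, not the parser's actual code: CollectingErrorHandler, fatal_error and decode_document are hypothetical names, and Python's built-in UTF-8 decoder stands in for the reader. The point is only that the decoding failure is caught and handed to a handler instead of escaping up the call stack.

```python
from typing import List, Optional


class CollectingErrorHandler:
    """Hypothetical handler that records fatal errors it is given."""

    def __init__(self) -> None:
        self.fatal_errors: List[str] = []

    def fatal_error(self, message: str) -> None:
        self.fatal_errors.append(message)


def decode_document(data: bytes, handler: CollectingErrorHandler) -> Optional[str]:
    """Decode data as UTF-8, reporting malformed input to the handler.

    Returns the decoded text, or None if the bytes were rejected.
    """
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError as exc:
        # Report the problem rather than letting the exception propagate.
        handler.fatal_error(
            f"malformed UTF-8 byte sequence at offset {exc.start}: {exc.reason}"
        )
        return None


handler = CollectingErrorHandler()
# F4 90 80 80 would encode 0x110000, which is beyond 0x10FFFF.
bad_result = decode_document(b"<a>\xf4\x90\x80\x80</a>", handler)
good_result = decode_document(b"<a>\xf4\x8f\xbf\xbf</a>", handler)
```

After running this, the handler holds one recorded error for the first input, and the second input decodes normally.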

---------------------------------------------------------------------
To unsubscribe, e-mail: [EMAIL PROTECTED]
For additional commands, e-mail: [EMAIL PROTECTED]
