http://nagoya.apache.org/bugzilla/show_bug.cgi?id=6082

Many encodings are broken in Xerces

           Summary: Many encodings are broken in Xerces
           Product: Xerces-J
           Version: 1.4.4
          Platform: All
        OS/Version: Other
            Status: NEW
          Severity: Critical
          Priority: Other
         Component: Serialization
        AssignedTo: [EMAIL PROTECTED]
        ReportedBy: [EMAIL PROTECTED]

The ISO-8859-n (n>1) encodings are broken because the "lastPrintable"
character is set to 0xFF, when it should be set to 0x7F (see
Encodings.java, lines 110-118).

The Windows-31J encoding is broken because the Java encoder is broken.
It cannot correctly round-trip several characters. The characters that it
cannot round-trip are:

  0xa2 0xa3 0xa5 0xab 0xac 0xaf 0xb5 0xb7 0xb8 0xbb 0x203e 0x3094

The reason is that the same encoded byte patterns are used by different
code points:

  The byte pattern (92,) is used by: 5c, a5
  The byte pattern (126,) is used by: 7e, 203e
  The byte pattern (-127, -111) is used by: a2, ffe0
  The byte pattern (-127, -110) is used by: a3, ffe1
  The byte pattern (-127, -31) is used by: ab, 226a
  The byte pattern (-127, -54) is used by: ac, ffe2
  The byte pattern (-127, 80) is used by: af, ffe3
  The byte pattern (-125, -54) is used by: b5, 3bc
  The byte pattern (-127, 69) is used by: b7, 30fb
  The byte pattern (-127, 67) is used by: b8, ff0c
  The byte pattern (-127, -30) is used by: bb, 226b
  The byte pattern (-125, -108) is used by: 3094, 30f4

You can fix this by adding these characters to the "JIS_DANGER_CHARS"
list (Encodings.java, line 99) or by creating a new list of danger
characters just for Windows-31J.
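
For anyone who wants to reproduce the round-trip failures on their own JVM, here is a minimal sketch. It is not taken from Xerces: the class name RoundTripCheck and the helpers roundTrips and collisions are hypothetical names made up for this illustration, and it assumes the JVM ships a "windows-31j" charset (the program prints a notice and exits if it does not). It encodes each suspect code point, decodes the bytes again, and reports whether the original character came back. The collisions helper groups code points by their encoded byte pattern, which is how a list like the one in this report could be produced.

```java
import java.nio.charset.Charset;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical helper class for illustration only; not part of Xerces.
class RoundTripCheck {

    /** Returns true if the code point survives encode-then-decode unchanged. */
    static boolean roundTrips(int codePoint, Charset cs) {
        String original = new String(Character.toChars(codePoint));
        // String.getBytes(Charset) substitutes a replacement byte for
        // unmappable characters, so those also show up as failures here.
        String decoded = new String(original.getBytes(cs), cs);
        return original.equals(decoded);
    }

    /** Groups code points by encoded bytes; keeps patterns shared by two or more. */
    static Map<String, List<Integer>> collisions(int[] codePoints, Charset cs) {
        Map<String, List<Integer>> byPattern = new LinkedHashMap<>();
        for (int cp : codePoints) {
            byte[] encoded = new String(Character.toChars(cp)).getBytes(cs);
            byPattern.computeIfAbsent(Arrays.toString(encoded),
                                      k -> new ArrayList<>()).add(cp);
        }
        // Drop byte patterns that belong to a single code point.
        byPattern.values().removeIf(list -> list.size() < 2);
        return byPattern;
    }

    public static void main(String[] args) {
        String name = "windows-31j"; // assumption: available on this JVM
        if (!Charset.isSupported(name)) {
            System.out.println(name + " is not supported on this JVM");
            return;
        }
        Charset cs = Charset.forName(name);
        // The code points listed in this report.
        int[] suspects = {0xa2, 0xa3, 0xa5, 0xab, 0xac, 0xaf,
                          0xb5, 0xb7, 0xb8, 0xbb, 0x203e, 0x3094};
        for (int cp : suspects) {
            System.out.printf("U+%04X round-trips: %b%n", cp, roundTrips(cp, cs));
        }
    }
}
```

Results depend on the JVM's converter tables, so a newer runtime may report different failures from the ones listed above. Characters that fail the check are candidates for being written as numeric character references instead of raw bytes.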
