http://nagoya.apache.org/bugzilla/show_bug.cgi?id=7806

Non-BMP Unicode block names in regexes

           Summary: Non-BMP Unicode block names in regexes
           Product: Xerces2-J
           Version: 2.0.0
          Platform: Other
        OS/Version: Other
            Status: NEW
          Severity: Normal
          Priority: Other
         Component: XML Schema datatypes
        AssignedTo: [EMAIL PROTECTED]
        ReportedBy: [EMAIL PROTECTED]

Unicode block names for blocks outside the BMP (i.e. with code points
> 0xFFFF) are handled incorrectly. Something like \p{IsGothic} doesn't
work as it should.

The bug is in org.apache.xerces.impl.xpath.regex.Token. In the
declaration of blockNames, there's a comment:

    //missing Specials add manually

but the code never does this. The blockRanges string includes things like

    \u10300\u1032F

which is completely bogus, since \u only takes 4 hex digits: Java reads
\u10300 as the character U+1030 followed by a literal '0'.

The fix is to add a table of non-BMP block ranges

    static final int[] nonBmpBlockRanges = { 0x10300, 0x1032F, ... };

Then in Token.getRange(), do addRange for each of the ranges in
nonBmpBlockRanges.
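A minimal standalone sketch of the two points above, not the actual Xerces code. It shows how Java parses a five-digit \u escape, and how a table of int code point pairs can represent non-BMP blocks. The class name NonBmpBlockDemo, the helpers rangeOf and inBlock, and the two-entry table are assumptions made for illustration; the real fix would feed such pairs to addRange inside Token.getRange().

```java
/** Illustrative sketch only; this is not org.apache.xerces.impl.xpath.regex.Token. */
final class NonBmpBlockDemo {

    // Hypothetical block names, parallel to the range table below.
    static final String[] NON_BMP_BLOCK_NAMES = { "OldItalic", "Gothic" };

    // Consecutive (start, end) code point pairs, inclusive, stored as ints
    // because values above 0xFFFF do not fit in a single Java char.
    static final int[] NON_BMP_BLOCK_RANGES = {
        0x10300, 0x1032F, // Old Italic
        0x10330, 0x1034F, // Gothic
    };

    /** Returns {start, end} for a block name, or null if the name is not in the table. */
    static int[] rangeOf(String blockName) {
        for (int i = 0; i < NON_BMP_BLOCK_NAMES.length; i++) {
            if (NON_BMP_BLOCK_NAMES[i].equals(blockName)) {
                return new int[] { NON_BMP_BLOCK_RANGES[2 * i], NON_BMP_BLOCK_RANGES[2 * i + 1] };
            }
        }
        return null;
    }

    /** True if the code point lies inside the named block. */
    static boolean inBlock(String blockName, int codePoint) {
        int[] range = rangeOf(blockName);
        return range != null && codePoint >= range[0] && codePoint <= range[1];
    }

    public static void main(String[] args) {
        // The literal below is parsed as U+1030 followed by the digit '0'.
        String bogus = "\u10300";
        System.out.println(bogus.length());                       // prints 2
        System.out.println(Integer.toHexString(bogus.charAt(0))); // prints 1030

        // A supplementary code point is stored as a surrogate pair.
        String gothic = new String(Character.toChars(0x10330));
        System.out.println(gothic.length());                      // prints 2
        System.out.println(inBlock("Gothic", gothic.codePointAt(0))); // prints true
    }
}
```

Keeping the ranges as ints sidesteps the escape problem entirely, since no string literal has to encode a value above 0xFFFF.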
