PLEASE DO NOT REPLY TO THIS MESSAGE. TO FURTHER COMMENT
ON THE STATUS OF THIS BUG PLEASE FOLLOW THE LINK BELOW
AND USE THE ON-LINE APPLICATION. REPLYING TO THIS MESSAGE
DOES NOT UPDATE THE DATABASE, AND SO YOUR COMMENT WILL
BE LOST SOMEWHERE.

http://nagoya.apache.org/bugzilla/show_bug.cgi?id=3303

*** shadow/3303 Tue Aug 28 06:17:49 2001
--- shadow/3303.tmp.7127 Tue Aug 28 06:17:49 2001
***************
*** 0 ****
--- 1,44 ----
+ +============================================================================+
+ | Unicode 3.0 character \\uFFFD                                              |
+ +----------------------------------------------------------------------------+
+ |        Bug #: 3303                         Product: Regexp                 |
+ |       Status: NEW                          Version: unspecified            |
+ |   Resolution:                             Platform: PC                     |
+ |     Severity: Minor                     OS/Version: Windows NT/2K          |
+ |     Priority: Other                      Component: Other                  |
+ +----------------------------------------------------------------------------+
+ |  Assigned To: [EMAIL PROTECTED]                                            |
+ |  Reported By: [EMAIL PROTECTED]                                            |
+ |      CC list: Cc:                                                          |
+ +----------------------------------------------------------------------------+
+ |          URL:                                                              |
+ +============================================================================+
+ |                                DESCRIPTION                                 |
+ http://www.unicode.org/Public/3.0-Update/UnicodeData-3.0.0.txt:
+ >FFFD;REPLACEMENT CHARACTER;So;0;ON;;;;;N;;;;;
+
+ For some reason, when the above character appears in any regex character
+ class, it causes a RESyntaxException with the description 'Bad Character
+ Class'. I attempted to use it in the following context:
+
+     private static String XMLescape(String s)
+         throws RESyntaxException
+     {
+         if (s == null) return s;
+         if (s.length() == 0) return s;
+
+         // XML 1.0 standard actually says:
+         // Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] |
+         //          [#x10000-#x10FFFF]
+         // For some reason this library doesn't like the Unicode character
+         // \\uFFFD, so the class below stops at \\uFFFC as a workaround.
+         RE r = new RE("[^\\u0009\\u000A\\u000D\\u0020-\\uD7FF\\uE000-\\uFFFC]");
+
+         return r.subst(s, "");
+     }
+
+ I'm using the JRE Standard Edition 3.0.
+
+ Regards,
+
+ Tasuki.
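
The filtering described in the report can also be sketched without any regex character class, which sidesteps the RESyntaxException entirely. This is only an illustration under stated assumptions, not the Regexp library's API and not a fix for the bug itself: the class name XmlCharFilter and both method names are hypothetical, and the code relies on code-point methods (String.codePointAt, StringBuilder.appendCodePoint) that exist only in Java 5 and later. It keeps exactly the characters allowed by the XML 1.0 Char production quoted above, including \uFFFD and the supplementary range.

```java
// Hypothetical helper, not part of the Regexp library.
// Assumes Java 5 or later for the code-point APIs.
final class XmlCharFilter {

    // True if the code point matches the XML 1.0 Char production:
    // #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
    static boolean isXmlChar(int cp) {
        return cp == 0x9 || cp == 0xA || cp == 0xD
            || (cp >= 0x20 && cp <= 0xD7FF)
            || (cp >= 0xE000 && cp <= 0xFFFD)
            || (cp >= 0x10000 && cp <= 0x10FFFF);
    }

    // Returns s with every code point outside the Char production removed.
    // Null and empty input are returned unchanged, as in the report's method.
    static String stripInvalidXmlChars(String s) {
        if (s == null || s.length() == 0) return s;
        StringBuilder out = new StringBuilder(s.length());
        for (int i = 0; i < s.length(); ) {
            // An unpaired surrogate is returned as its own value
            // (0xD800-0xDFFF), which isXmlChar rejects.
            int cp = s.codePointAt(i);
            if (isXmlChar(cp)) {
                out.appendCodePoint(cp);
            }
            i += Character.charCount(cp);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // U+0001 and U+FFFE are dropped; U+FFFD is kept.
        String cleaned = stripInvalidXmlChars("a\u0001b\uFFFDc\uFFFEd");
        System.out.println(cleaned.length()); // prints 5
    }
}
```

Testing code points directly means no pattern has to be compiled, so the upper bound of the second range can stay at #xFFFD instead of being lowered to #xFFFC.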