On Tue, 03 Jun 2014 15:06:30 -0700 Xueming Shen <[email protected]> wrote:
> On 06/02/2014 01:01 PM, Richard Wordingham wrote:
> > On Mon, 2 Jun 2014 11:29:09 +0200
> > Mark Davis ☕️<[email protected]> wrote:
> >
> >>> \uD808\uDF45 specifies a sequence of two codepoints.
> >> That is simply incorrect.
> > The above is in the sample notation of UTS #18 Version 17 Section
> > 1.1.
> >
> > From what I can make out, the corresponding Java notation would be
> > \x{D808}\x{DF45}. I don't *know* what \x{D808} and \x{DF45} match
> > in Java, or whether they are even acceptable. The only thing UTS
> > #18 RL1.7 permits them to match in Java is lone surrogates, but I
> > don't know if Java complies.
>
> The notation for "\uD808\uDF45" is interpreted as a supplementary
> codepoint and is represent internally as a pair of surrogates in
> String.
>
> Pattern.compile("\\x{D808}\\x{DF45}").matcher("\ud808\udf45").find());
> -> false
> Pattern.compile("\uD808\uDF45").matcher("\ud808\udf45").find());
> -> true
> Pattern.compile("\\x{D808}").matcher("\ud808\udf45").find());
> -> false
> Pattern.compile("\\x{D808}").matcher("\ud808_\udf45").find());
> -> true

Thank you for providing examples confirming that what in the UTS #18
*sample* notation would be written \uD808\uDF45, i.e. \x{D808}\x{DF45}
in Java notation, matches nothing in any 16-bit Unicode string.

Richard.

_______________________________________________
Unicode mailing list
[email protected]
http://unicode.org/mailman/listinfo/unicode
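For anyone wanting to rerun the four quoted calls, here is a minimal, self-contained sketch. It is not from the thread: the class name SurrogateRegexDemo and the helper find() are hypothetical wrappers around Pattern.compile(regex).matcher(input).find(). The last call, which writes U+12345 (the code point that D808 DF45 encode in UTF-16) directly as \x{12345}, is added only for comparison.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; it only wraps the Pattern calls quoted above.
class SurrogateRegexDemo {

    // True if the regex matches anywhere in the input.
    static boolean find(String regex, String input) {
        return Pattern.compile(regex).matcher(input).find();
    }

    public static void main(String[] args) {
        String pair = "\ud808\udf45";      // U+12345 stored as a surrogate pair
        String separated = "\ud808_\udf45"; // same surrogates, now unpaired

        // Two escaped surrogate code points: the matcher reads the input by
        // code point, sees U+12345, and so never meets a lone D808 or DF45.
        System.out.println(find("\\x{D808}\\x{DF45}", pair)); // false
        // A literal surrogate pair in the pattern string compiles to U+12345.
        System.out.println(find("\uD808\uDF45", pair));       // true
        // A lone-surrogate escape does not match half of a valid pair...
        System.out.println(find("\\x{D808}", pair));          // false
        // ...but does match a genuinely unpaired high surrogate.
        System.out.println(find("\\x{D808}", separated));     // true
        // Not in the thread: the supplementary code point written directly.
        System.out.println(find("\\x{12345}", pair));         // true
    }
}
```

The printed results for the first four calls should agree with the ones Xueming reports.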

