Is the number of code points in a UTF-16 string well defined? For example, which of the following two statements is true?

(a) The ill-formed three-code-unit Unicode 16-bit string <0xDC00, 0xD800, 0xDC20> contains two code points, U+DC00 and U+10020.

(b) The ill-formed three-code-unit Unicode 16-bit string <0xDC00, 0xD800, 0xDC20> contains three code points, U+DC00, U+D800 and U+DC20.

Statement (a) is probably more useful, but I couldn't find anything that rules out statement (b).

Richard.
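
To make the two readings concrete, here is a minimal Python sketch of both counting rules. The function names `decode_pairing` and `decode_per_unit` are hypothetical, chosen only for illustration. Neither is claimed to be the interpretation the standard mandates; they simply show what each statement would mean for the example string.

```python
def decode_pairing(units):
    """Reading (a): a high surrogate immediately followed by a low surrogate
    forms one supplementary code point; every other unit stands alone."""
    out = []
    i = 0
    while i < len(units):
        u = units[i]
        is_high = 0xD800 <= u <= 0xDBFF
        has_low_next = i + 1 < len(units) and 0xDC00 <= units[i + 1] <= 0xDFFF
        if is_high and has_low_next:
            # Combine the surrogate pair into a supplementary code point.
            out.append(0x10000 + ((u - 0xD800) << 10) + (units[i + 1] - 0xDC00))
            i += 2
        else:
            # BMP unit or unpaired surrogate: treated as its own code point.
            out.append(u)
            i += 1
    return out


def decode_per_unit(units):
    """Reading (b): every code unit is treated as a separate code point."""
    return list(units)


units = [0xDC00, 0xD800, 0xDC20]
print([f"U+{cp:04X}" for cp in decode_pairing(units)])   # ['U+DC00', 'U+10020']
print([f"U+{cp:04X}" for cp in decode_per_unit(units)])  # ['U+DC00', 'U+D800', 'U+DC20']
```

The two rules agree on every well-formed string that contains no surrogates; they differ only once surrogate code units appear, which is exactly the case the question is about.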

