On Sat, 9 May 2015 15:11:51 +0200 Philippe Verdy <[email protected]> wrote:
> Except that you are explaining something else. You are speaking about
> "Unicode strings" which are bound to a given UTF, I was speaking ONLY
> about "16-bit strings" which were NOT bound to Unicode (and did not
> have to). So TUS is compeltely not relevant here I have NOT written
> "Unicode 16-bit strings", only "16-bit strings" and I clearly opposed
> the two DISTINCT concepts in the SAME sentence so that no confusion
> was possible.

The long sentence of yours I am responding to is:

"And here you're wrong: a 16-bit string is just a sequence of arbitrary
16-bit code units, but an Unicode string (whatever the size of its code
units) adds restrictions for validity (the only restriction being in
fact that surrogates (when present in 16-bit strings, i.e. UTF-16) must
be paired, and in 32-bit (UTF-32) and 8-bit (UTF-8) strings, surrogates
are forbidden."

The point I made is that every string of 16-bit values is (valid as) a
Unicode string. Do you accept that? If not, please exhibit a
counter-example.

In particular, I claim that all 6 permutations of <D800, 0054, DCC1>
are Unicode strings, but that only two, namely <D800, DCC1, 0054> and
<0054, D800, DCC1>, are UTF-16 strings.

Richard.
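[Editor's note: the distinction being argued over can be checked mechanically. The following is a minimal Python sketch, not part of the original mail; the function name `is_well_formed_utf16` is ours. It applies the UTF-16 well-formedness rule stated above (every high surrogate D800..DBFF must be immediately followed by a low surrogate DC00..DFFF, and no low surrogate may appear unpaired) to all 6 permutations of <D800, 0054, DCC1>.]

```python
from itertools import permutations

def is_well_formed_utf16(units):
    """Return True if the sequence of 16-bit code units is well-formed
    UTF-16: each high surrogate (D800..DBFF) is immediately followed by
    a low surrogate (DC00..DFFF), and no low surrogate is unpaired."""
    i, n = 0, len(units)
    while i < n:
        u = units[i]
        if 0xD800 <= u <= 0xDBFF:
            # High surrogate: the next unit must be a low surrogate.
            if i + 1 >= n or not (0xDC00 <= units[i + 1] <= 0xDFFF):
                return False
            i += 2
        elif 0xDC00 <= u <= 0xDFFF:
            # Low surrogate not preceded by a high surrogate.
            return False
        else:
            i += 1
    return True

# All 6 permutations are sequences of 16-bit values; only two pass
# the UTF-16 well-formedness test, matching the claim in the mail.
valid = [p for p in permutations((0xD800, 0x0054, 0xDCC1))
         if is_well_formed_utf16(p)]
print([tuple(hex(u) for u in p) for p in valid])
```

Running this prints exactly the two permutations named above, <D800, DCC1, 0054> and <0054, D800, DCC1>.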

