Note: in that sentence I wrote "16-bit string", NOT "Unicode 16-bit string", which I used only in the later part of the sentence (and which also covers 8-bit and 32-bit code units, with the same validity restrictions on "Unicode strings")... So there is no contradiction.
2015-05-09 7:55 GMT+02:00 Philippe Verdy <[email protected]>:
>
>
> 2015-05-09 6:37 GMT+02:00 Markus Scherer <[email protected]>:
>
>> On Fri, May 8, 2015 at 9:13 PM, Philippe Verdy <[email protected]>
>> wrote:
>>
>>> 2015-05-09 5:13 GMT+02:00 Richard Wordingham <
>>> [email protected]>:
>>>
>>>> I can't think of a practical use for the specific concepts of Unicode
>>>> 8-bit, 16-bit and 32-bit strings. Unicode 16-bit strings are
>>>> essentially the same as 16-bit strings, and Unicode 32-bit strings are
>>>> UTF-32 strings. 'Unicode 8-bit string' strikes me as an exercise in
>>>> pedantry; there are more useful categories of 8-bit strings that are
>>>> not UTF-8 strings.
>>>>
>>>
>>> And here you're wrong: a 16-bit string is just a sequence of arbitrary
>>> 16-bit code units, but a Unicode string (whatever the size of its code
>>> units) adds restrictions for validity (the only restriction being in fact
>>> that surrogates (when present in 16-bit strings, i.e. UTF-16) must be
>>> paired, and in 32-bit (UTF-32) and 8-bit (UTF-8) strings, surrogates are
>>> forbidden).
>>>
>>
>> No, Richard had it right. See for example definition D82 "Unicode 16-bit
>> string" in the standard. (Section 3.9 Unicode Encoding Forms,
>> http://www.unicode.org/versions/Unicode7.0.0/ch03.pdf)
>>
>
> I was right: D82 refers to "UTF-16", which implies the restriction of
> validity, i.e. NO isolated/unpaired surrogates (but no exclusion of
> non-characters).
>
> I was right; you and Richard were wrong.
>
>
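To make the restriction concrete, here is a minimal Java sketch (mine, not taken from the standard; the class and method names are only illustrative). It checks whether a sequence of 16-bit code units, here a Java String, satisfies the UTF-16 validity rule that every surrogate code unit is part of a high+low pair. An arbitrary 16-bit string may contain lone surrogates; a valid UTF-16 string may not.

public class SurrogateCheck {
    /** True if every surrogate code unit in s is part of a high+low pair. */
    static boolean isWellFormedUtf16(String s) {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (Character.isHighSurrogate(c)) {
                // A high surrogate must be immediately followed by a low surrogate.
                if (i + 1 == s.length() || !Character.isLowSurrogate(s.charAt(i + 1))) {
                    return false;
                }
                i++; // skip the low surrogate of the pair
            } else if (Character.isLowSurrogate(c)) {
                // A low surrogate not preceded by a high surrogate is unpaired.
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String paired   = "\uD83D\uDE00"; // U+1F600 encoded as a surrogate pair
        String unpaired = "\uD800A";      // lone high surrogate: a legal 16-bit string, not valid UTF-16
        System.out.println(isWellFormedUtf16(paired));   // true
        System.out.println(isWellFormedUtf16(unpaired)); // false
    }
}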

