On Thu, 12 Jun 2014 01:37:49 -0700 Markus Scherer <[email protected]> wrote:
> On Wed, Jun 11, 2014 at 9:29 PM, Karl Williamson
> <[email protected]> wrote:
> > The FAQ mentions using 0x7FFFFFFF as a possible sentinel.  I did not
> > realize that that was considered representable in any UTF.
> > Likewise -1.
>
> No, and that's the point of using those. Integer values that are not
> code points make for great sentinels in API functions, such as a
> next() iterator returning -1 when there is no next character.

They work fine as alternatives to scalar values.  They don't work so
well in 8-bit and 16-bit Unicode strings.  A general purpose routine
extracting scalar values from Unicode strings is likely to treat them
as errors rather than just returning the scalar value as it would for a
non-character.  The only way to use them directly in 8- and 16-bit
Unicode strings is to deliberately create ill-formed Unicode strings.
Thus, these 'sentinels' are not full blown sentinels like U+0000 in the
C conventions for 'strings', as opposed to arrays of char.

There is a get-out clause - just never accept that a Unicode string is
purported to be in a Unicode character encoding form.

Richard.
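
To make the distinction concrete, here is a minimal sketch in Python of
the two uses discussed above.  The names END, next_scalar and scalars
are hypothetical and belong to no real library.  It assumes strict
UTF-8 decoding, which is only one way a general purpose routine might
behave: -1 serves as an out-of-band return value, a noncharacter such
as U+FFFF is returned like any other scalar value, and the obsolete
6-byte form that once encoded 0x7FFFFFFF is rejected as ill-formed.

```python
from typing import List, Tuple

END = -1  # hypothetical sentinel: an integer that is not a code point


def next_scalar(data: bytes, pos: int) -> Tuple[int, int]:
    """Return (scalar value, new position), or (END, pos) at end of input.

    Ill-formed input raises UnicodeDecodeError, which is how a strict
    general purpose decoder would typically react.
    """
    if pos >= len(data):
        return END, pos  # out-of-band signal; never stored in the string
    lead = data[pos]
    # Sequence length implied by the lead byte (1 to 4 bytes in UTF-8).
    length = 1 if lead < 0x80 else 2 if lead < 0xE0 else 3 if lead < 0xF0 else 4
    # Strict decoding rejects stray continuation bytes, truncated
    # sequences, and lead bytes such as 0xFD that UTF-8 no longer allows.
    char = data[pos:pos + length].decode("utf-8")
    return ord(char), pos + length


def scalars(data: bytes) -> List[int]:
    """Collect every scalar value, stopping when the sentinel is returned."""
    out = []
    pos = 0
    while True:
        value, pos = next_scalar(data, pos)
        if value == END:
            return out
        out.append(value)
```

With these definitions, scalars(b"\xef\xbf\xbf") returns [0xFFFF], so
the noncharacter passes through as ordinary data, whereas
next_scalar(b"\xfd\xbf\xbf\xbf\xbf\xbf", 0) raises UnicodeDecodeError.
The sentinel is usable only as a return value, not as string content.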

