Mark Leisher <[EMAIL PROTECTED]> writes:
> Nick> Following the first page will be all the other pages, each in the
> Nick> same format as the first: one number identifying the page followed
> Nick> by 256 double-byte Unicode characters. If a character in the
> Nick> encoding maps to the Unicode character 0000, it means that the
> Nick> character doesn't actually exist. If all characters on a page would
> Nick> map to 0000, that page can be omitted.
>
>There may some day be a use for the Unicode codepoint 0x0000. It might be
>better to make this 0xFFFF, which is a guaranteed non-character in Unicode and
>probably in ISO10646.
Documentation notwithstanding, the original Tcl C code does permit the
Unicode code point 0x0000 to exist iff it is in the 0 slot of the other
encoding, e.g. ASCII NUL is mapped to it.
I made at least an attempt at this in the OO perl stuff as well.
0x0000 has a nice C/Perl "falseness" which 0xFFFF lacks, but as we don't
use the .enc tables directly anyway, using 0xFFFF and converting to C<undef>
at load time (Tcl does not have an undef equivalent) would be a reasonable
approach.
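
A rough sketch of that load-time conversion, in Python rather than Perl
purely for illustration (None stands in for undef). It assumes a page is
written as a hex page number followed by 256 four-digit hex code points with
whitespace ignored; load_page and NO_CHAR are hypothetical names, not part of
Tcl or of the Perl module. Passing sentinel=0x0000 approximates the existing
behaviour as I read it, where 0000 means "no character" except in slot 0 of
page 0.

```python
NO_CHAR = 0xFFFF  # proposed "no such character" marker (a Unicode noncharacter)


def load_page(text, sentinel=NO_CHAR):
    """Parse one page into (page_number, table).

    table[slot] is an int code point, or None where the slot holds the
    sentinel. The on-disk layout assumed here is a guess for illustration.
    """
    fields = text.split()
    page = int(fields[0], 16)
    digits = "".join(fields[1:])
    if len(digits) != 256 * 4:
        raise ValueError("expected 256 four-digit entries")

    table = []
    for slot in range(256):
        code = int(digits[4 * slot:4 * slot + 4], 16)
        # With a 0x0000 sentinel, slot 0 of page 0 still means a real NUL.
        nul_slot = page == 0 and slot == 0
        if code == sentinel and not (sentinel == 0x0000 and nul_slot):
            table.append(None)
        else:
            table.append(code)
    return page, table
```

With the 0xFFFF sentinel no special case is needed for NUL, at the cost of
an explicit "is None" test in place of plain truthiness.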
--
Nick Ing-Simmons