Mike Newhall wrote:
> 1. What are "plane 14 language tags"?

they are a copy of the ascii graphic characters, mirrored into what iso 10646 calls 
plane 14. for unicoders, plane 14 means the code points from U+E0000 to U+EFFFF - 
note that hex 'e' is decimal 14. these characters are used for tagging when you don't 
have markup or out-of-band information, and are intended for language tags like 
"de-AT". the ietf wanted such in-band info, while most unicoders don't like them...
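
as a rough illustration of the mirroring (a sketch only, not a reference implementation): each tag character sits at U+E0000 plus the ascii code, and a language tag is introduced by U+E0001 LANGUAGE TAG. the function names below are made up for this example.

```python
# sketch: map an ascii language tag such as "de-AT" onto plane 14 tag characters
# (to_plane14_tag / from_plane14_tag are hypothetical helper names)

LANGUAGE_TAG = 0xE0001  # U+E0001 LANGUAGE TAG, introduces a language tag
TAG_BASE = 0xE0000      # tag character = TAG_BASE + ascii code


def to_plane14_tag(lang):
    """Return the in-band tag sequence for an ascii language tag."""
    if not all(0x20 <= ord(c) <= 0x7E for c in lang):
        raise ValueError("only ascii graphic characters have tag mirrors")
    return chr(LANGUAGE_TAG) + "".join(chr(TAG_BASE + ord(c)) for c in lang)


def from_plane14_tag(tagged):
    """Recover the ascii language tag from a tag sequence."""
    if not tagged or ord(tagged[0]) != LANGUAGE_TAG:
        raise ValueError("sequence does not start with U+E0001")
    return "".join(chr(ord(c) - TAG_BASE) for c in tagged[1:])


tagged = to_plane14_tag("de-AT")
print(["U+%04X" % ord(c) for c in tagged])
print(from_plane14_tag(tagged))
```

every code point in the result has 14 (hex 'e') as its plane number, i.e. code point >> 16.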

> 2. Just out of historical curiosity, what is a codepage / how did the name
> / numbers originate?  I have over the years used the inferred definition
> that it is an 8-bit character set selected by a number, but...

it is not limited to 8 bits: a codepage can be double-byte, mixed-byte, etc., up to 
four bytes/char.

every company and standards organization basically mapped their favorite sets of 
characters to their favorite byte combinations, and each such mapping or association 
is called a codepage. i am not sure how to compare it with modern terminology, but i 
believe that most people see it as equivalent to "charset" or "character encoding 
form/scheme", sometimes only "coded character set" (see the rfcs and the unicode 
technical report on the character encoding model, utr #17).
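
a small sketch of "characters mapped to byte combinations" with different widths, using the codec names that python happens to ship (cp1252, cp932, gb18030); these are python's labels, assumed here to stand in for the corresponding vendor codepages, and bytes_per_char is a made-up helper.

```python
# sketch: how many bytes one character takes in a few codepages
# (codec names are python's; bytes_per_char is a hypothetical helper)


def bytes_per_char(codec, ch):
    """Encode a single character and return the number of bytes used."""
    return len(ch.encode(codec))


samples = [
    ("cp1252", "a"),            # single-byte codepage
    ("cp932", "a"),             # mixed-byte codepage: ascii stays one byte
    ("cp932", "\u65e5"),        # ... while a kanji takes two bytes
    ("gb18030", "\U00010000"),  # a supplementary code point takes four bytes
]

for codec, ch in samples:
    print(codec, "U+%04X" % ord(ch), bytes_per_char(codec, ch), "byte(s)")
```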

>         - Is this really the complete and accurate definition of what a 'codepage'
> is?
>         - Are these #'s always OS-specific, or sometimes standardized?

ibm has a list, microsoft has one, apple has one, the iana list has both names and 
"MIB enums", MIME has a list, ...
sometimes the same number means something more or less related, but without real 
coordination, and other times the same number is something entirely different. this 
basically makes it hard to exchange text in many of them.
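
to make the exchange problem concrete, here is a minimal sketch that decodes one and the same byte under three codepages. the codec names are python's own labels (cp1252, cp437, mac_roman), and decode_with is a made-up helper; it shows only the effect of guessing the wrong codepage, not the numbering clashes between vendors.

```python
# sketch: one byte, several codepages, several different characters
# (decode_with is a hypothetical helper; codec names are python's)


def decode_with(codepages, data):
    """Return {codec name: decoded text} for the same byte string."""
    return {name: data.decode(name) for name in codepages}


raw = bytes([0xE9])
results = decode_with(["cp1252", "cp437", "mac_roman"], raw)

for name, text in results.items():
    print(name, "U+%04X" % ord(text), text)
```

without a label that sender and receiver interpret the same way, the receiver cannot tell which of these readings was meant.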

unicode was created to be so well-defined and all-encompassing that we don't need 
"legacy" codepages any more, but the above institutions (except microsoft?) continue 
to create new ones...

>         - Is there a rhyme or reason to the number assignments?  It seems that
> they were not assigned in sequence, unless each OS has hundreds or
> thousands of code pages.

each organization has at least dozens, if not hundreds or thousands :-)
i don't know a particular reason for the number values. the ibm values need to fit 
into 16 bits with a few reserved values.

>         - Where did the term originate?  It seems to have a hardware flavor, as if
> an old piece of display hardware had selectable ROMed fonts.

i am guessing it is from printed manuals?

> Mike Newhall
> AltaVista

markus scherer
ibm (icu)
