Philippe Verdy wrote:

> Actually not all U+E0020 through U+E007E are "un-deprecated" for this
> use.

Characters in Unicode are not "deprecated" for some purposes and not for 
others. "Deprecated" is a clearly defined property in Unicode. The only 
reference that matters here is in PropList.txt:

E0000         ; Other_Default_Ignorable_Code_Point # Cn       <reserved-E0000>
E0001         ; Deprecated # Cf       LANGUAGE TAG
E0002..E001F  ; Other_Default_Ignorable_Code_Point # Cn  [30] <reserved-E0002>..<reserved-E001F>
E0020..E007F  ; Other_Grapheme_Extend # Cf  [96] TAG SPACE..CANCEL TAG
E0080..E00FF  ; Other_Default_Ignorable_Code_Point # Cn [128] <reserved-E0080>..<reserved-E00FF>

Note carefully that the code point marked "Deprecated" is deprecated, and the 
others listed here are not. (My earlier post saying that U+E007F was still 
deprecated was incorrect, as Andrew noted.)
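
To make the point concrete, here is a minimal Python sketch that parses only the five PropList.txt lines quoted above (not the full file) and checks which code points carry the Deprecated property. The names `parse_ranges` and `is_deprecated` are my own, chosen for illustration; they are not part of any Unicode tool.

```python
import re

# Excerpt of the PropList.txt lines quoted above, with trailing comments trimmed.
# This is NOT the full file; results only hold for the range U+E0000..U+E00FF.
PROPLIST_EXCERPT = """\
E0000         ; Other_Default_Ignorable_Code_Point
E0001         ; Deprecated
E0002..E001F  ; Other_Default_Ignorable_Code_Point
E0020..E007F  ; Other_Grapheme_Extend
E0080..E00FF  ; Other_Default_Ignorable_Code_Point
"""

# Matches "XXXX ; Property" or "XXXX..YYYY ; Property".
LINE_RE = re.compile(r"^([0-9A-F]+)(?:\.\.([0-9A-F]+))?\s*;\s*(\w+)")


def parse_ranges(text):
    """Return a list of (first, last, property) tuples from PropList-style text."""
    ranges = []
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop any trailing comment
        if not line:
            continue
        m = LINE_RE.match(line)
        if m:
            first = int(m.group(1), 16)
            last = int(m.group(2) or m.group(1), 16)
            ranges.append((first, last, m.group(3)))
    return ranges


def is_deprecated(cp, ranges):
    """True if the code point falls in a range whose property is Deprecated."""
    return any(first <= cp <= last and prop == "Deprecated"
               for first, last, prop in ranges)


RANGES = parse_ranges(PROPLIST_EXCERPT)
```

With this excerpt, only U+E0001 LANGUAGE TAG tests as deprecated; U+E0020 through U+E007F, including CANCEL TAG, do not.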

> For now emoji flags only use:
> - U+E0041 through U+E005A (mapping to ASCII letters A through Z used
> in 2-letter ISO3166-1 codes). These are usable in pairs, without
> requiring any modifier (and only for ISO3166-1 registered codes).

Section C.1 of UTS #51 says otherwise:

tag_base    U+1F3F4 BLACK FLAG
tag_spec    (U+E0030 TAG DIGIT ZERO .. U+E0039 TAG DIGIT NINE,
            U+E0061 TAG LATIN SMALL LETTER A .. U+E007A TAG LATIN SMALL LETTER Z)+

Flag tag sequences use lowercase tag letters, not uppercase, and may also use digits. 
The digits are for CLDR subdivision IDs containing ISO 3166-2 code elements 
that happen to be numeric, and there are plenty of these. For example, "fr75" 
is the subdivision ID for Paris. Almost all ISO 3166-2 code elements in France 
are numeric.
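
As an illustration, a flag tag sequence can be assembled from a subdivision ID roughly as follows. This is a sketch under my reading of the tag_base and tag_spec definitions above plus the CANCEL TAG terminator; the function name `subdivision_flag` is hypothetical, and the code does not check that the ID is actually a valid CLDR subdivision ID.

```python
BLACK_FLAG = 0x1F3F4   # tag_base
CANCEL_TAG = 0xE007F   # terminator
TAG_OFFSET = 0xE0000   # tag character = TAG_OFFSET + ASCII code point


def subdivision_flag(subdivision_id):
    """Build a flag tag sequence for an ID such as "gbeng" or "fr75".

    Only ASCII digits and lowercase letters are accepted, mirroring tag_spec.
    Validity of the ID itself is not checked.
    """
    if not subdivision_id:
        raise ValueError("subdivision ID must not be empty")
    chars = [chr(BLACK_FLAG)]
    for ch in subdivision_id:
        if not (("0" <= ch <= "9") or ("a" <= ch <= "z")):
            raise ValueError(f"not allowed in tag_spec: {ch!r}")
        chars.append(chr(TAG_OFFSET + ord(ch)))
    chars.append(chr(CANCEL_TAG))
    return "".join(chars)


def code_points(s):
    """Format a string as a list of U+XXXX labels for inspection."""
    return [f"U+{ord(c):04X}" for c in s]
```

For example, `code_points(subdivision_flag("fr75"))` gives U+1F3F4, U+E0066, U+E0072, U+E0037, U+E0035, U+E007F, while an uppercase ID such as "FR75" is rejected.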

> - I think that U+E0030 through U+E0039 (mapping to ASCII digits 0
> through 9) are reserved for ISO3166 extensions, started with only the
> 3 "countries" added in the United Kingdom ("ENENG", "ENSCO" and
> "ENWLS"), with possible pending additions for other ISO3166-2, but not
> mapping any dash separator).

There is no top-level country "EN", and if there were, I doubt Scotland and 
Wales would be enthusiastic to be considered part of it.

In any case, "gbeng" and "gbsco" and "gbwls" are merely the only subdivision 
IDs that are designated "RGI," or "recommended for general interchange," in 
CLDR. Any other subdivision ID can be used in a flag tag sequence, although the 
lack of RGI designation may cause vendors to think the sequence is "recommended 
against" and not support it in fonts.

As shown above, tag digits are not reserved for "ISO 3166 extensions" (possibly 
implying ISO 3166-1), but are already usable for ISO 3166-2 code elements.
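
The distinction between "well-formed" and "RGI" can be sketched in a few lines of Python. The RGI set below contains only the three IDs named above and is an assumption that reflects CLDR at the time of writing; the function names are hypothetical.

```python
# The three subdivision IDs named above as RGI. This set is an assumption
# that may not match later CLDR releases.
RGI_SUBDIVISIONS = {"gbeng", "gbsco", "gbwls"}


def decode_flag_tag_sequence(seq):
    """Return the subdivision ID encoded in a flag tag sequence.

    Returns None if the sequence does not start with U+1F3F4, does not end
    with U+E007F, or contains anything other than tag digits and lowercase
    tag letters in between.
    """
    if len(seq) < 3 or ord(seq[0]) != 0x1F3F4 or ord(seq[-1]) != 0xE007F:
        return None
    out = []
    for ch in seq[1:-1]:
        ascii_cp = ord(ch) - 0xE0000
        if 0x30 <= ascii_cp <= 0x39 or 0x61 <= ascii_cp <= 0x7A:
            out.append(chr(ascii_cp))
        else:
            return None
    return "".join(out)


def is_rgi(seq):
    """True only if the sequence decodes to one of the RGI subdivision IDs."""
    return decode_flag_tag_sequence(seq) in RGI_SUBDIVISIONS
```

A sequence for "fr75" decodes successfully but is not RGI, while one built from uppercase tag letters does not decode at all.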

> These tags are used as modifiers in sequences starting by a leading
> U+1F3F4
> <http://unicode.org/emoji/charts/full-emoji-list.html#1f3f4_e0067_e0062_e0065_e006e_e0067_e007f>
> (WAVING BLACK FLAG) emoji.

This is true. (Note the lowercase tag letters.)

> - U+E007F (CANCEL TAG) is already used too for the regional extensions
> as a mandatory terminator, as seen in the three British countries.

This is true.

> It is not used for country flags made of 2-letter emoji codes without
> any leading flag emoji.

This is true, but not particularly relevant, as these use Regional Indicator 
Symbols and have nothing to do with the "emoji codes" discussed elsewhere.
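
For contrast, here is a small sketch of how a country flag is formed from Regional Indicator Symbols. No tag characters and no base flag emoji are involved. The function name `country_flag` is hypothetical, and the code does not check that the two letters form a registered ISO 3166-1 code.

```python
REGIONAL_INDICATOR_A = 0x1F1E6  # REGIONAL INDICATOR SYMBOL LETTER A


def country_flag(code):
    """Map a two-letter uppercase code such as "FR" to a pair of
    Regional Indicator Symbols. Registration of the code is not checked."""
    if len(code) != 2 or not all("A" <= c <= "Z" for c in code):
        raise ValueError("expected two uppercase ASCII letters")
    return "".join(chr(REGIONAL_INDICATOR_A + ord(c) - ord("A")) for c in code)
```

For example, `country_flag("FR")` yields U+1F1EB U+1F1F7, both of which lie outside the tag block entirely.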

> And the proposal discussed here to use U+E003C, mapped to the ASCII
> "<" LOWER THAN

LESS-THAN SIGN

> as a leading tag sequence for reencoding HTML tags in sequences
> terminated by U+E003E ">" (and containing HTML element names using
> lowercase letter tags,

Only "b", "i", "u", and "s" by definition.

> possibly digit tags in these names,

No.

> and "/" for HTML tags terminator, possibly also U+E0020 SPACE TAG for
> separating HTML attributes, U+003D "=" for attribute values, U+E0022
> (') or U+E0027 (") around attribute values, but a problem if the
> mapped element names or attributes contain non-ASCII characters...)

None of these are part of Andrew's mechanism. It's just b, i, u, and s.

> is not standard

Neither Andrew nor anyone else claimed it was.

> (it's just an experiment in one font),

It applies to any TrueType font, because the rendering engine can apply these 
four styles (in any combination) to any TrueType font.

> and would in fact not be compatible with the existing specification
> for tags.

Good thing nobody claimed it was.

> So only U+E0020 through U+E0040, and U+E005B through U+E007E remain
> deprecated.

Da capo.

--
Doug Ewell | Thornton, CO, US | ewellic.org


