Re: Flag tags with U+1F3F3 and subtypes

2015-05-18 Thread Markus Scherer
On Mon, May 18, 2015 at 11:19 AM, Doug Ewell d...@ewellic.org wrote:

 Is the new mechanism intended to allow flag tags that include either
 subtype values or contains values?


As far as I can tell from your quotes, CLDR will say what's valid (plus
containment info), and Unicode permits you to show a flag for any valid tag.
North Lanarkshire seems perfectly fine.

I am curious to see if the redundant hyphen will be part of the syntax.

markus


Re: Flag tags with U+1F3F3 and subtypes

2015-05-18 Thread Philippe Verdy
The hyphen is not redundant in ISO 3166, which defines primary codes of
variable length (even if ISO 3166-1 for now only uses two-letter codes).
At some point in the future, two letters will not be enough even in ISO
3166-1, if countries continue to split/merge (this does not happen
frequently but it occurs every few years; and it will not be possible to
reuse old codes that are maintained for a long period). Maybe then we'll
have ISO 3166-1 codes using digits (such as A1 or 1A), but this will cause
some problems in mapping them to IETF ccTLD codes (within the DNS root
registry). The UN M.49 numeric codes will also fill up if it continues with
its current allocation scheme (using ranges of numbers by continental
regions). Or the other solution will be to extend the set of allowed letters.

2015-05-18 20:28 GMT+02:00 Markus Scherer markus@gmail.com:

 On Mon, May 18, 2015 at 11:19 AM, Doug Ewell d...@ewellic.org wrote:

 Is the new mechanism intended to allow flag tags that include either
 subtype values or contains values?


 As far as I can tell from your quotes, CLDR will say what's valid (plus
 containment info), and Unicode permits you to show a flag for any valid tag.
 North Lanarkshire seems perfectly fine.

 I am curious to see if the redundant hyphen will be part of the syntax.

 markus



Flag tags with U+1F3F3 and subtypes

2015-05-18 Thread Doug Ewell
L2/15-145R says:

 In CLDR 28, LDML will define a unicode_subdivision_subtag which also
 provides validity criteria for the codes used for regional
 subdivisions (see CLDR ticket #8423). When representing regional
 subdivisions using ISO 3166-2 codes, only those codes that are valid
 for the LDML unicode_subdivision_subtag should be used.

The preliminary subdivisions.xml file includes entries like this:

<subgroup type="GB" contains="UKM GBN SCT EAW ENG WLS NIR"/>
<subgroup type="GB" subtype="SCT" contains="NLK RFW PKN ANS FAL [...]"/>
<subgroup type="GB" subtype="ENG" contains="GRE HAL HRY KHL NEL [...]"/>
<subgroup type="GB" subtype="WLS" contains="NTL RCT BGE NWP BGW [...]"/>
<subgroup type="GB" subtype="NIR" contains="NDN NYM ANT DOW DRY [...]"/>

In the United Kingdom case above, four of the subtypes are identified
with the four countries that make up the UK, and have counties
(districts, boroughs, etc.) contained below them. The other three
subtypes (UKM, GBN, EAW) don't really apply to flags and aren't
discussed further here.

Several of the nations in ISO 3166 have this kind of hierarchy. I
haven't checked whether any of them extend to more than two levels of
subdivisions.

Is the new mechanism intended to allow flag tags that include either
subtype values or contains values? For example:

1F3F3 E0047 E0042 E002D E0053 E0043 E0054 (GB-SCT)
for the Scottish flag

and

1F3F3 E0047 E0042 E002D E004E E004C E004B (GB-NLK)
for the North Lanarkshire council area flag
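
A minimal sketch of how such sequences could be generated, assuming (as the
two examples suggest) that each ASCII character of the code maps to the tag
character at U+E0000 plus its ASCII value, preceded by U+1F3F3. The function
names are hypothetical and the final mechanism may differ in detail.

```python
BASE_FLAG = 0x1F3F3   # base character used in the examples above
TAG_OFFSET = 0xE0000  # assumption: tag characters mirror ASCII at this offset


def flag_tag_sequence(code: str) -> list[int]:
    """Return the code points for a flag tag such as 'GB-SCT'."""
    if not code.isascii():
        raise ValueError("expected an ASCII ISO 3166 style code")
    return [BASE_FLAG] + [TAG_OFFSET + ord(ch) for ch in code]


def show(code: str) -> str:
    """Format the sequence in the same hex notation as the examples."""
    return " ".join(f"{cp:04X}" for cp in flag_tag_sequence(code))


print(show("GB-SCT"))  # 1F3F3 E0047 E0042 E002D E0053 E0043 E0054
print(show("GB-NLK"))  # 1F3F3 E0047 E0042 E002D E004E E004C E004B
```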

--
Doug Ewell | http://ewellic.org | Thornton, CO 




Re: Flag tags with U+1F3F3 and subtypes

2015-05-18 Thread Andrew West
On 18 May 2015 at 19:19, Doug Ewell d...@ewellic.org wrote:

 Is the new mechanism intended to allow flag tags that include either
 subtype values or contains values? For example:

That is my understanding.

 1F3F3 E0047 E0042 E002D E0053 E0043 E0054 (GB-SCT)
 for the Scottish flag

 and

 1F3F3 E0047 E0042 E002D E004E E004C E004B (GB-NLK)
 for the North Lanarkshire council area flag

I don't believe that North Lanarkshire has an associated flag, which I
think is the case for most UK counties and councils (Cornwall, Devon
and Dorset all have flags, but they may be the exceptions).  In fact
not all of the four nations comprising the UK have a flag -- for
political reasons there is no official flag for Northern Ireland, so I
do not know what an implementation would display for 1F3F3 E0047
E0042 E002D E004E E0049 E0052 (GB-NIR), perhaps just a plain flag
emblazoned with GB-NIR.

Andrew


Re: Flag tags with U+1F3F3 and subtypes

2015-05-18 Thread Richard Wordingham
On Mon, 18 May 2015 19:37:06 +0100
Andrew West andrewcw...@gmail.com wrote:

  1F3F3 E0047 E0042 E002D E004E E004C E004B (GB-NLK)
  for the North Lanarkshire council area flag
 
 I don't believe that North Lanarkshire has an associated flag, which I
 think is the case for most UK counties and councils (Cornwall, Devon
 and Dorset all have flags, but they may be the exceptions).  In fact
 not all of the four nations comprising the UK have a flag -- for
 political reasons there is no official flag for Northern Ireland, so I
 do not know what an implementation would display for 1F3F3 E0047
 E0042 E002D E004E E0049 E0052 (GB-NIR), perhaps just a plain flag
 emblazoned with GB-NIR.

As the Ulster Banner is still in use, and still does unofficially
represent Northern Ireland, perhaps it should have its own codepoint.

I'm not sure of the strength of the argument for St Patrick's Cross.
Perhaps it too should have its own codepoint, especially if it is
evolving from being a flag of Ireland (apparently not used by the Irish
rugby union team) to a flag of Northern Ireland.

Richard.


RE: Flag tags with U+1F3F3 and subtypes

2015-05-18 Thread Doug Ewell
Markus Scherer markus dot icu at gmail dot com wrote:

 As far as I can tell from your quotes, CLDR will say what's valid
 (plus containment info), and Unicode permits you to show a flag for
 any valid tag. North Lanarkshire seems perfectly fine.

I'm under the impression that this will be a standard Unicode mechanism,
defined in principle by TUS and in detail by the upcoming revision of
UTR #51, with data (but no additional rules) supplied by CLDR.

 I am curious to see if the redundant hyphen will be part of the
 syntax.

Like Philippe, I don't believe the hyphen is redundant. ISO 3166-2
requires it (Section 5.2), and the syntax diagram at the end of
L2/15-145R shows it:

B ((TL{2} (TH (TL|TD){3})?) | (TD{3}))

where TH is TAG HYPHEN-MINUS.
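
As a rough illustration, the diagram can be read as a regular expression over
code points. Only TH is spelled out above; this sketch assumes B is U+1F3F3,
TL is a tag Latin capital letter (U+E0041..U+E005A) and TD is a tag digit
(U+E0030..U+E0039). The helper names are hypothetical.

```python
import re

B = "\U0001F3F3"                 # assumed base character
TL = "[\U000E0041-\U000E005A]"   # assumed: TAG LATIN CAPITAL LETTER A..Z
TD = "[\U000E0030-\U000E0039]"   # assumed: TAG DIGIT ZERO..NINE
TH = "\U000E002D"                # TAG HYPHEN-MINUS

# B ((TL{2} (TH (TL|TD){3})?) | (TD{3}))
FLAG_TAG = re.compile(
    f"{B}(?:{TL}{{2}}(?:{TH}(?:{TL}|{TD}){{3}})?|{TD}{{3}})"
)


def to_tag_string(code: str) -> str:
    """Map an ASCII code such as 'GB-SCT' to base + tag characters."""
    return B + "".join(chr(0xE0000 + ord(ch)) for ch in code)


def is_well_formed_flag_tag(s: str) -> bool:
    """True if the whole string matches the syntax diagram."""
    return FLAG_TAG.fullmatch(s) is not None


for code in ("GB", "GB-SCT", "840", "GBSCT", "GB-SC"):
    print(code, is_well_formed_flag_tag(to_tag_string(code)))
```

Read this way, the hyphen is what separates the two-letter region from the
subdivision part, and the diagram as quoted accepts exactly three characters
after it.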

--
Doug Ewell | http://ewellic.org | Thornton, CO 




[OT] RE: Flag tags with U+1F3F3 and subtypes

2015-05-18 Thread Doug Ewell
Philippe Verdy verdy underscore p at wanadoo dot fr wrote:

 If ever the country codes used in BCP47 becomes full (all pairs of
 letters used), just some time before this happens, we could see new
 prefixes added before a new range of code. It is possible to use a
 1-letter prefix for new country/territory code extensions, but with
 some maintenance of BCP47 parsing rules (notably the letter used
 should not be reordered with other singleton prefixes)

This would be a major revision to BCP 47, it would have nothing to do
with reordering, and it would not in any case involve 1-letter prefixes,
which already have a different meaning. And the time frame we are
talking about is reminiscent of Ken's estimate of when 17 planes will no
longer be enough for Unicode.

 But I feel it will first be simpler to assign a special 2-letter code
 like C1- followed by a new series of 2-letter country codes

We actually thought about this stuff over in LTRU. Really.

I'm not the least bit concerned about the DNS. Five years from now they
could be assigning TLDs consisting entirely of emoji.

This is no longer relevant to flag tags or anything else Unicode.
 
--
Doug Ewell | http://ewellic.org | Thornton, CO 




Re: Flag tags with U+1F3F3 and subtypes

2015-05-18 Thread Philippe Verdy
2015-05-18 23:38 GMT+02:00 Doug Ewell d...@ewellic.org:

 Philippe Verdy verdy underscore p at wanadoo dot fr wrote:

  So country codes cannot be reassigned (and we can expect many more
  merges/splits or changes of regimes in the many troubled areas of the
  world.

 Changes of regimes don't usually result in new 3166 code elements. The
 same is true for merges (look at DE/DD or YE/YD). New and changed
 country names usually do.


I included merges only for completeness, because they frequently occur
shortly after a split (and not with the former part).

But of course merges are much less frequent than splits. And in today's
globalized world, splits are even easier than they were in the past (when
merges were the results of invasions/wars/conquests).

The rate of splits is in fact accelerating, even in countries living in
peace. This does not mean that they terminate all their partnerships, just
that they take the right to create their own alliances. There are several
reasons for them: cultural (language), national taxes, economic
difficulties in some regions, unemployment, management of resources (water,
constructible or cultivable soils), but the most important reason is
political (distrust between political parties, or brutality against
minorities and mutual misunderstanding)...

In the last 50 years the most important changes came from decolonisation
and the resulting independences (a process completed at the end of the
1970s). But now we are seeing splits for much smaller entities, and this
can occur in many more places.

With ISO 3166-2 the situation within countries is much more complex and
changes are more frequent. In Europe most countries are undergoing large
changes in their administrative divisions. The changes that will occur next
year in French regions are still not taken into account in ISO 3166-2, nor
is the change that is already effective within one department, split into
two parts with only one remaining as a department, the other being a
group of communes erected into a new territorial collectivity taking all
powers of its former department, for local administration only, but with
the national power still not divided in what is now a circonscription
départementale with the same departmental prefecture as before the split.

The hierarchical model of subdivisions has in fact lots of exceptions (look
at Spain, UK, Germany; it was already true for France and the US, but now it
is also occurring even in the Metropolitan area). In fact we can see several
parallel layers of subdivisions, but for different legal roles/missions.

ISO 3166-1 also assumes that everything is a country, but this is already
wrong for some dependent territories (not all) of France, UK, US, the
Netherlands, Spain and possibly some islands of China. And these codes also
don't map correctly to effective national divisions (the encoding for
claims in Antarctica remains ambiguous, depending on who uses the data).
There are also reserves for things that are not countries but groups of
countries (EU, WIPO areas...), and there could exist new codes for other
international alliances (these look like merges except that they are not
full merges and the entities continue to coexist separately).


Re: [OT] RE: Flag tags with U+1F3F3 and subtypes

2015-05-18 Thread Philippe Verdy
2015-05-18 23:55 GMT+02:00 Doug Ewell d...@ewellic.org:

 Philippe Verdy verdy underscore p at wanadoo dot fr wrote:

  If ever the country codes used in BCP47 becomes full (all pairs of
  letters used), just some time before this happens, we could see new
  prefixes added before a new range of code. It is possible to use a
  1-letter prefix for new country/territory code extensions, but with
  some maintenance of BCP47 parsing rules (notably the letter used
  should not be reordered with other singleton prefixes)

 This would be a major revision to BCP 47, it would have nothing to do
 with reordering,


It would have to, because all subtags after the primary language subtag
in BCP47 are optional, and you can distinguish them only by their length
*or* by the role assigned to specific singletons: there's already the x
singleton exception (that is ordered at end), but other singletons are
currently described to use a canonical order but it is used only for
encoding variants unrelated to region subtags or even to the languages.

Very few singletons are used in fact (the singleton subtags occurring at
the start of the tag are also treated separately from others: they could
also be used to support new syntaxes for BCP47 tags, but for now we just
have i-, deprecated but still valid, and x- for private use; for all other
letters there's no parsing defined for now, their syntax is unknown and they
are not interchangeable without a standard, so they are used only for
private use; another constraint comes from the length limit of subtags: the
first subtag is either a special singleton, or a primary language code using
2 or 3 letters for now; some BCP47 tags use an empty first subtag, i.e. the
tag starts with a hyphen; double hyphens could be used as extensions to
change the parsing rules locally and possibly return to the next logical
subtag, and could be used to encode international organizations without
needing a formal exceptional reservation in ISO 3166-1; for example *-EU
could have been encoded as --O-EU and we could have the same system for
NATO, EEA, EFTA... There's still ample space for extensions of parsing rules
in BCP47, but not in ISO 3166.)

ISO 3166 also encodes some 4-letter codes but they are not used in BCP47
(so there's no confusion with 4-letter script codes).


Re: Flag tags with U+1F3F3 and subtypes

2015-05-18 Thread Philippe Verdy
2015-05-18 22:14 GMT+02:00 Doug Ewell d...@ewellic.org:

 I know I'll regret this...

You should not


 Philippe Verdy verdy underscore p at wanadoo dot fr wrote:

  Sometime in a future, two letters will not be enough even in ISO
  3166-1, if countries continue to split/merge (this does not happen
  frequently but it occurs every few years; and it will not be possible
  to reuse old codes that are maintained for a long period).

 ISO 3166-1 already defines alpha-3 and numeric code elements, as well as
 alpha-2.


But how do we work with the 2-letter limitation when the world wants more
stability in codes? (This was an important reason why ISO 639 was not fully
integrated in IETF tags, and why the IETF tags have chosen stability by
also keeping the codes that have been deleted in ISO 639, which are only
deprecated in IETF language tags (BCP47).)

We've already seen the famous reuse before 50 years (do you remember when
CS was reassigned just a few months after it was discarded after an initial
introduction for some months in Serbia-Montenegro?)

ISO coding standards are known to be unstable. This would also be true of
the UCS if Unicode did not push its stability pact with ISO!

But now let's remember that parts of ISO 3166 are also included (not
fully) in BCP47 tags that require stability. It will prohibit
reassignments by ISO (or if this happens, this will break BCP47 and the IETF
will reject the change and will use another subtag if needed).

So country codes cannot be reassigned (and we can expect many more
merges/splits or changes of regimes in the many troubled areas of the world).


RE: Flag tags with U+1F3F3 and subtypes

2015-05-18 Thread Doug Ewell
I know I'll regret this...

Philippe Verdy verdy underscore p at wanadoo dot fr wrote:

 Sometime in a future, two letters will not be enough even in ISO
 3166-1, if countries continue to split/merge (this does not happen
 frequently but it occurs every few years; and it will not be possible
 to reuse old codes that are maintained for a long period).

ISO 3166-1 already defines alpha-3 and numeric code elements, as well as
alpha-2.

ISO 3166/MA has added approximately one code element per year on average
since the breakup of the Soviet Union. There are approximately 336
unassigned alpha-2 code elements, and if any of the assigned ones is
withdrawn, it can be recycled in 50 years.

 May be then we'll have ISO 3166-1 codes using digits (such as A1 or
 1A), but this will cause some problems to map them to IETF ccTLD
 codes (within the DNS root registry).

Adapting to this challenge, if and when it arises, should be child's
play for the DNS, which has recently introduced TLDs like
.சிங்கப்பூர் (or .xn--clchc0ea0b2g2a9gcd if
one prefers).

 As well the UN M.49 numeric codes will get full if it continues with
 its current allocation scheme (using ranges of numbers by continental
 regions). Or the other solution will be to extend the set of allowed
 letters.

UN M.49 numeric code elements (equivalent to ISO 3166-1) are assigned
alphabetically by English country name, or as close as possible, with
some exceptions related to historical names. There are no allocations by
geographical region.

--
Doug Ewell | http://ewellic.org | Thornton, CO 




Re: Flag tags with U+1F3F3 and subtypes

2015-05-18 Thread Philippe Verdy
If ever the country codes used in BCP47 become full (all pairs of letters
used), just some time before this happens, we could see new prefixes added
before a new range of codes. It is possible to use a 1-letter prefix for new
country/territory code extensions, but with some maintenance of BCP47
parsing rules (notably the letter used should not be reordered with other
singleton prefixes).

But I feel it will first be simpler to assign a special 2-letter code like
C1- followed by a new series of 2-letter country codes (ccTLDs will
survive; in fact with the development of new gTLDs not limited to 2
characters, new countries will prefer asking for a more descriptive
gTLD, even if they don't have a 2-letter ccTLD).

Or 2-letter codes will be deprecated in favor of 3-letter codes (but the
IETF will keep all the existing 2-letter ccTLDs as long as their sponsors
support them and don't require changing to another TLD, even if this
breaks existing URLs encoded throughout the web).

There's no requirement for ISO 3166 codes to match exactly with a TLD in
the global DNS (this has long been the case for the .uk ccTLD,
because .gb is almost unused). But the stability of country codes is
desirable as well in URLs (stored within encoded documents, for which
it will be hard to make global substitutions: the solution could be to use
tracking dates to resolve domain names, but the worldwide DNS currently
does not support this type of query by date and registrars would not like
to have to keep history files for long, and software/OS developers don't
want to include and maintain such data for their domain name resolving
clients).

It is however possible that in some future the existing URLs requiring
domain names will be deprecated in favor of unique IDs (e.g. based on
IPv6): users won't see domain names, but labels retrieved from some
whois-like database, or shown by search engines and possibly translated. It
would also be an improvement even if this breaks the business of existing
registrars (however registrars will still have business selling
PKI-related services). These IDs can also be used in URIs. In fact the DNS
system is already antique in its design (and its very strange and complex
encoding for IDNA that no one can read).


2015-05-18 22:10 GMT+02:00 Doug Ewell d...@ewellic.org:

 Markus Scherer markus dot icu at gmail dot com wrote:

  As far as I can tell from your quotes, CLDR will say what's valid
  (plus containment info), and Unicode permits you to show a flag for
  any valid tag. North Lanarkshire seems perfectly fine.

 I'm under the impression that this will be a standard Unicode mechanism,
 defined in principle by TUS and in detail by the upcoming revision of
 UTR #51, with data (but no additional rules) supplied by CLDR.

  I am curious to see if the redundant hyphen will be part of the
  syntax.

 Like Philippe, I don't believe the hyphen is redundant. ISO 3166-2
 requires it (Section 5.2), and the syntax diagram at the end of
 L2/15-145R shows it:

 B ((TL{2} (TH (TL|TD){3})?) | (TD{3}))

 where TH is TAG HYPHEN-MINUS.

 --
 Doug Ewell | http://ewellic.org | Thornton, CO 





RE: Flag tags with U+1F3F3 and subtypes

2015-05-18 Thread Doug Ewell
Philippe Verdy verdy underscore p at wanadoo dot fr wrote:

 ISO 3166-1 already defines alpha-3 and numeric code elements, as well
 as alpha-2.

 But how to work with the 2 letters limitation when the world wants
 more stability in codes (this was an important reason why ISO 639 was
 not fully integrated in IETF tags, and why the IETF tags have chosen
 the stability by keeping also the codes that have been deleted in ISO
 639, but only deprecated in IETF language tags (BCP47).

I assume you're aware of the extent of my involvement in BCP 47, so this
is a semi-rhetorical question.

If and when ISO 3166/MA manages to use up all of the remaining 336
unassigned code elements -- nearly half of the TOTAL possible code space
of 676 two-letter combinations -- the corresponding numeric code
elements will be assigned as BCP 47 region subtags instead.
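
To put those numbers in perspective, here is a quick back-of-the-envelope
check. The figures 336 and 676 are the ones above; the rate of roughly one
new code element per year is the average cited elsewhere in this thread and
is treated here as an assumption about the future, not a prediction.

```python
import string

# Total alpha-2 code space: every ordered pair of letters A..Z.
total = len(string.ascii_uppercase) ** 2

unassigned = 336     # figure quoted above
rate_per_year = 1    # assumption: about one new code element per year

share_free = unassigned / total
years_left = unassigned // rate_per_year

print(total)                 # 676
print(f"{share_free:.1%}")   # 49.7%
print(years_left)            # 336
```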

 We've already seen the famous reuse before 50 years (do you remember
 when CS was reassigned just a few months after it was discarded after
 an initial introduction for some months in Serbia-Montenegro?)

What actually happened was, 'CS' was withdrawn for Czechoslovakia and
then assigned to Serbia and Montenegro. At that time, the waiting period
was five years; the 'CS' incident is what resulted in the change to 50
years.

 But now let's remember that parts of ISO 3166 are also included (not
 fully) in BCP47 tags that require the stability. It will prohibit
 reassignments by ISO (or if this happens, this will break BCP47 and the
 IETF will reject the change and will use another subtag if needed).

Again, I'm guessing you already know that I know how BCP 47 works.

ISO 3166/MA can recycle alpha-2 code elements 50 years after withdrawal
if they feel like it. BCP 47 can't prevent that. That's why BCP 47 has a
mechanism to work around that possibility.

 So country codes cannot be reassigned (and we can expect many more
 merges/splits or changes of regimes in the many troubled areas of the
 world.

Changes of regimes don't usually result in new 3166 code elements. The
same is true for merges (look at DE/DD or YE/YD). New and changed
country names usually do.

--
Doug Ewell | http://ewellic.org | Thornton, CO 



[OT] RE: Flag tags with U+1F3F3 and subtypes

2015-05-18 Thread Doug Ewell
This is why I knew I would regret it.

Clearing up some errors here. No more posts from me on this non-Unicode
topic after this one.

Philippe Verdy verdy underscore p at wanadoo dot fr wrote:

 This would be a major revision to BCP 47, it would have nothing to do
 with reordering,

 It would have to do because all subtags after the primary language
 subtag in BCP47 are optional, and you can distinguish them only by
 their length *or* by the role assigned to specific singletons: there's
 already the x singleton exception (that is ordered at end), but
 other singletons are currently described to use a canonical order but
 it is used only for encoding variants unrelated to region subtags or
 even to the languages.

All non-initial singletons introduce an extension, except for 'x' which
introduces a private-use sequence, and which must be last.

Even if an extension were defined to hold top-level region information,
WHICH WILL NEVER HAPPEN, it would not matter whether that extension
appeared before or after other extensions, because it would be an
extension and not a region subtag.

 but for now we just have i-, deprecated but still valid,

i- is not deprecated.

 for all other letters there's no parsing defined for now, their syntax
 is unknown and they are not interchangeable without a standard, so
 they are used only for private use

Extension 't' was defined in 2011 and 'u' in 2010. They have
well-defined syntax, specified in RFC 6497 and 6067 respectively.

Undefined singletons may not be used for private use.
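
The rules above can be sketched roughly as follows. This is a simplified
illustration, not a conforming BCP 47 parser: it does not validate the
language, script, region or variant subtags, it does not handle tags that
begin with a singleton, and the function name `split_singletons` is
hypothetical.

```python
# Singletons with a defined extension, per the RFCs cited above.
REGISTERED = {"t": "RFC 6497", "u": "RFC 6067"}


def split_singletons(tag: str):
    """Split a tag into (leading subtags, extensions, private-use subtags)."""
    subtags = tag.lower().split("-")
    if any(not s for s in subtags):
        raise ValueError("empty subtags are not allowed")
    if len(subtags[0]) == 1:
        raise ValueError("initial singletons are not handled in this sketch")

    # Everything before the first singleton is the language/script/region part.
    i = 0
    while i < len(subtags) and len(subtags[i]) > 1:
        i += 1
    head = subtags[:i]

    extensions = {}
    private_use = []
    while i < len(subtags):
        singleton = subtags[i]
        i += 1
        if singleton == "x":
            # Everything after 'x' is private use, so 'x' is necessarily last.
            private_use = subtags[i:]
            break
        if singleton not in REGISTERED:
            raise ValueError(f"undefined singleton {singleton!r}")
        start = i
        while i < len(subtags) and len(subtags[i]) > 1:
            i += 1
        extensions[singleton] = subtags[start:i]
    return head, extensions, private_use


print(split_singletons("en-GB-u-ca-gregory-t-hi-x-foo"))
```

Note that the order of the 't' and 'u' extensions does not change what
either one contains, and that a leading hyphen is rejected outright.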

 some BCP47 use an empty first subtag, i.e. the tag starts by an
 hyphen;

Absolutely, utterly false.

--
Doug Ewell | http://ewellic.org | Thornton, CO 