Re: Flag tags with U+1F3F3 and subtypes
On Mon, May 18, 2015 at 11:19 AM, Doug Ewell d...@ewellic.org wrote: Is the new mechanism intended to allow flag tags that include either subtype values or contains values? As far as I can tell from your quotes, CLDR will say what's valid (plus containment info), and Unicode permits you to show a flag for any valid tag. North Lanarkshire seems perfectly fine. I am curious to see if the redundant hyphen will be part of the syntax. markus
Re: Flag tags with U+1F3F3 and subtypes
The hyphen is not redundant in ISO 3166, which defines primary codes of variable length (even if ISO 3166 part 1 for now uses only two-letter codes). At some point in the future, two letters will not be enough even in ISO 3166-1 if countries continue to split and merge (this does not happen frequently, but it occurs every few years, and it will not be possible to reuse old codes that are maintained for a long period). Maybe then we'll have ISO 3166-1 codes using digits (such as A1 or 1A), but this will cause some problems in mapping them to IETF ccTLD codes (within the DNS root registry). The UN M.49 numeric codes will also fill up if the current allocation scheme continues (using ranges of numbers by continental regions). The other solution would be to extend the set of allowed letters. 2015-05-18 20:28 GMT+02:00 Markus Scherer markus@gmail.com: On Mon, May 18, 2015 at 11:19 AM, Doug Ewell d...@ewellic.org wrote: Is the new mechanism intended to allow flag tags that include either subtype values or contains values? As far as I can tell from your quotes, CLDR will say what's valid (plus containment info), and Unicode permits you to show a flag for any valid tag. North Lanarkshire seems perfectly fine. I am curious to see if the redundant hyphen will be part of the syntax. markus
Flag tags with U+1F3F3 and subtypes
L2/15-145R says: In CLDR 28, LDML will define a unicode_subdivision_subtag which also provides validity criteria for the codes used for regional subdivisions (see CLDR ticket #8423). When representing regional subdivisions using ISO 3166-2 codes, only those codes that are valid for the LDML unicode_subdivision_subtag should be used.

The preliminary subdivisions.xml file includes entries like this:

<subgroup type="GB" contains="UKM GBN SCT EAW ENG WLS NIR"/>
<subgroup type="GB" subtype="SCT" contains="NLK RFW PKN ANS FAL [...]"/>
<subgroup type="GB" subtype="ENG" contains="GRE HAL HRY KHL NEL [...]"/>
<subgroup type="GB" subtype="WLS" contains="NTL RCT BGE NWP BGW [...]"/>
<subgroup type="GB" subtype="NIR" contains="NDN NYM ANT DOW DRY [...]"/>

In the United Kingdom case above, four of the subtypes are identified with the four countries that make up the UK, and have counties (districts, boroughs, etc.) contained below them. The other three subtypes (UKM, GBN, EAW) don't really apply to flags and aren't discussed further here. Several of the nations in ISO 3166 have this kind of hierarchy. I haven't checked whether any of them extend to more than two levels of subdivisions.

Is the new mechanism intended to allow flag tags that include either subtype values or contains values? For example:

1F3F3 E0047 E0042 E002D E0053 E0043 E0054 (GB-SCT) for the Scottish flag

and

1F3F3 E0047 E0042 E002D E004E E004C E004B (GB-NLK) for the North Lanarkshire council area flag

-- Doug Ewell | http://ewellic.org | Thornton, CO
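The two example sequences in the message above can be reproduced mechanically: each ASCII character of the code maps to the tag character at U+E0000 plus its ASCII value, and the sequence starts with U+1F3F3. The following is only a sketch of the draft syntax as quoted in this thread; the function names `flag_tag_sequence` and `as_hex` are made up for illustration and are not part of any Unicode or CLDR API.

```python
from typing import List

TAG_BASE = 0xE0000    # tag characters mirror ASCII: U+E0000 + ASCII code
FLAG_BASE = 0x1F3F3   # base character used in the examples in this thread


def flag_tag_sequence(code: str) -> List[int]:
    """Return the code points for a flag tag such as 'GB-SCT' (hypothetical helper)."""
    if not all(0x20 <= ord(c) <= 0x7E for c in code):
        raise ValueError("only printable ASCII has a tag-character counterpart")
    return [FLAG_BASE] + [TAG_BASE + ord(c) for c in code]


def as_hex(seq: List[int]) -> str:
    """Format code points the way they are written in the thread."""
    return " ".join(f"{cp:04X}" for cp in seq)


print(as_hex(flag_tag_sequence("GB-SCT")))
print(as_hex(flag_tag_sequence("GB-NLK")))
```

This only builds the code point sequence; whether a given code is valid is a separate question that, per the quoted text, is answered by the CLDR subdivision data.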
Re: Flag tags with U+1F3F3 and subtypes
On 18 May 2015 at 19:19, Doug Ewell d...@ewellic.org wrote: Is the new mechanism intended to allow flag tags that include either subtype values or contains values? For example: That is my understanding. 1F3F3 E0047 E0042 E002D E0053 E0043 E0054 (GB-SCT) for the Scottish flag and 1F3F3 E0047 E0042 E002D E004E E004C E004B (GB-NLK) for the North Lanarkshire council area flag I don't believe that North Lanarkshire has an associated flag, which I think is the case for most UK counties and councils (Cornwall, Devon and Dorset all have flags, but they may be the exceptions). In fact not all of the four nations comprising the UK have a flag -- for political reasons there is no official flag for Northern Ireland, so I do not know what an implementation would display for 1F3F3 E0047 E0042 E002D E004E E0049 E0052 (GB-NIR), perhaps just a plain flag emblazoned with GB-NIR. Andrew
Re: Flag tags with U+1F3F3 and subtypes
On Mon, 18 May 2015 19:37:06 +0100 Andrew West andrewcw...@gmail.com wrote: 1F3F3 E0047 E0042 E002D E004E E004C E004B (GB-NLK) for the North Lanarkshire council area flag I don't believe that North Lanarkshire has an associated flag, which I think is the case for most UK counties and councils (Cornwall, Devon and Dorset all have flags, but they may be the exceptions). In fact not all of the four nations comprising the UK have a flag -- for political reasons there is no official flag for Northern Ireland, so I do not know what an implementation would display for 1F3F3 E0047 E0042 E002D E004E E0049 E0052 (GB-NIR), perhaps just a plain flag emblazoned with GB-NIR. As the Ulster Banner is still in use, and still does unofficially represent Northern Ireland, perhaps it should have its own codepoint. I'm not sure of the strength of the argument for St Patrick's Cross. Perhaps it too should have its own codepoint, especially if it is evolving from being a flag of Ireland (apparently not used by the Irish rugby union team) to a flag of Northern Ireland. Richard.
RE: Flag tags with U+1F3F3 and subtypes
Markus Scherer markus dot icu at gmail dot com wrote: As far as I can tell from your quotes, CLDR will say what's valid (plus containment info), and Unicode permits you to show a flag for any valid tag. North Lanarkshire seems perfectly fine. I'm under the impression that this will be a standard Unicode mechanism, defined in principle by TUS and in detail by the upcoming revision of UTR #51, with data (but no additional rules) supplied by CLDR. I am curious to see if the redundant hyphen will be part of the syntax. Like Philippe, I don't believe the hyphen is redundant. ISO 3166-2 requires it (Section 5.2), and the syntax diagram at the end of L2/15-145R shows it: B ((TL{2} (TH (TL|TD){3})?) | (TD{3})) where TH is TAG HYPHEN-MINUS. -- Doug Ewell | http://ewellic.org | Thornton, CO
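The syntax diagram quoted above can be read as a regular expression over the decoded (ASCII) form of the tag. The sketch below assumes that TL means an uppercase tag letter and TD a tag digit, which matches the examples in this thread but is not confirmed by the quoted fragment; the name `is_well_formed_flag_code` is hypothetical. It checks shape only, not validity against CLDR data.

```python
import re

# (TL{2} (TH (TL|TD){3})?) | (TD{3}), written over plain ASCII:
# two letters, optionally a hyphen plus three letters/digits, or three digits.
# Assumption: TL is restricted to uppercase A-Z, as in the thread's examples.
FLAG_CODE = re.compile(r"(?:[A-Z]{2}(?:-[A-Z0-9]{3})?|[0-9]{3})")


def is_well_formed_flag_code(code: str) -> bool:
    """True if the code matches the shape given by the quoted diagram."""
    return FLAG_CODE.fullmatch(code) is not None


for candidate in ("GB", "GB-SCT", "826", "GBSCT", "GB-SC"):
    print(candidate, is_well_formed_flag_code(candidate))
```

Under this reading, dropping the hyphen ("GBSCT") no longer matches, which is the sense in which the hyphen is part of the syntax rather than redundant.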
[OT] RE: Flag tags with U+1F3F3 and subtypes
Philippe Verdy verdy underscore p at wanadoo dot fr wrote: If ever the country codes used in BCP47 become full (all pairs of letters used), just some time before this happens, we could see new prefixes added before a new range of codes. It is possible to use a 1-letter prefix for new country/territory code extensions, but with some maintenance of BCP47 parsing rules (notably the letter used should not be reordered with other singleton prefixes) This would be a major revision to BCP 47, it would have nothing to do with reordering, and it would not in any case involve 1-letter prefixes, which already have a different meaning. And the time frame we are talking about is reminiscent of Ken's estimate of when 17 planes will no longer be enough for Unicode. But I feel it will first be simpler to assign a special 2-letter code like C1- followed by a new series of 2-letter country codes We actually thought about this stuff over in LTRU. Really. I'm not the least bit concerned about the DNS. Five years from now they could be assigning TLDs consisting entirely of emoji. This is no longer relevant to flag tags or anything else Unicode. -- Doug Ewell | http://ewellic.org | Thornton, CO
Re: Flag tags with U+1F3F3 and subtypes
2015-05-18 23:38 GMT+02:00 Doug Ewell d...@ewellic.org: Philippe Verdy verdy underscore p at wanadoo dot fr wrote: So country codes cannot be reassigned (and we can expect many more merges/splits or changes of regimes in the many troubled areas of the world). Changes of regimes don't usually result in new 3166 code elements. The same is true for merges (look at DE/DD or YE/YD). New and changed country names usually do. I included merges only to be complete, because they frequently occur shortly after a split (and not with the former part). But of course merges are much less frequent than splits. And in today's globalized world, splits are even easier than they were in the past (when merges were the results of invasions, wars, and conquests). The rate of splits is in fact accelerating, even in countries living in peace; this does not mean that they terminate all their partnerships, just that they claim the right to create their own alliances. There are reasons for splits: cultural (language), national taxes, economic difficulties in some regions, unemployment, management of resources (water, buildable or cultivable land), but the most important reasons are political (distrust between political parties, or brutality against minorities and mutual misunderstanding)... In the last 50 years the most important changes came from decolonization and the resulting independences (a process completed at the end of the 1970s). But now we are seeing splits of much smaller entities, and this can occur in many more places.
With ISO 3166-2 the situation within countries is much more complex and changes are more frequent. In Europe most countries are undergoing large changes in their administrative divisions. The changes that will occur next year in French regions are still not taken into account in ISO 3166-2, nor is the change that is already effective within one department, which was split in two parts: only one remains a department, while the other is a group of communes erected into a new territorial collectivity that takes over all powers of its former department, for local administration only; the national power is still not divided, in what is now a circonscription départementale with the same departmental prefecture as before the split. The hierarchical model of subdivisions in fact has lots of exceptions (look at Spain, the UK, Germany; it was already true for France and the US, but now it is also occurring even in the Metropolitan area). In fact we can see several parallel layers of subdivisions, but for different legal roles/missions. ISO 3166-1 also assumes that everything is a country, but this is already wrong for some dependent territories (not all) of France, the UK, the US, the Netherlands, Spain and possibly some islands of China. And these codes also don't map correctly to effective national divisions (the encoding for claims in Antarctica remains ambiguous, depending on who uses the data). There are also reservations for things that are not countries but groups of countries (EU, WIPO areas...), and new codes could be created for other international alliances (these look like merges except that they are not full merges and the entities continue to exist separately).
Re: [OT] RE: Flag tags with U+1F3F3 and subtypes
2015-05-18 23:55 GMT+02:00 Doug Ewell d...@ewellic.org: Philippe Verdy verdy underscore p at wanadoo dot fr wrote: If ever the country codes used in BCP47 become full (all pairs of letters used), just some time before this happens, we could see new prefixes added before a new range of codes. It is possible to use a 1-letter prefix for new country/territory code extensions, but with some maintenance of BCP47 parsing rules (notably the letter used should not be reordered with other singleton prefixes) This would be a major revision to BCP 47, it would have nothing to do with reordering, It would have to do with it, because all subtags after the primary language subtag in BCP47 are optional, and you can distinguish them only by their length *or* by the role assigned to specific singletons: there's already the x singleton exception (which is ordered at the end), but other singletons are currently described as using a canonical order, and they are used only for encoding variants unrelated to region subtags or even to the languages. Very few singletons are in fact used. (The singleton subtags occurring at the start of the tag are also treated separately from the others: they could also be used to support new syntaxes for BCP47 tags, but for now we just have i-, deprecated but still valid, and x- for private use; for all other letters there's no parsing defined for now, their syntax is unknown and they are not interchangeable without a standard, so they are used only for private use. Another constraint comes from the length limit of subtags: the first subtag is either a special singleton, or a primary language code using 2 or 3 letters for now. Some BCP47 tags use an empty first subtag, i.e. the tag starts with a hyphen; double hyphens could be used as extensions to change the parsing rules locally and possibly return to the next logical subtag, and could be used to encode international organizations without needing a formal exceptional reservation in ISO 3166-1; for example *-EU could have been encoded as --O-EU, and we could have the same system for NATO, EEA, EFTA... There's still ample space for extensions of parsing rules in BCP47, but not in ISO 3166.) ISO 3166 also encodes some 4-letter codes, but they are not used in BCP47 (so there's no confusion with 4-letter script codes).
Re: Flag tags with U+1F3F3 and subtypes
2015-05-18 22:14 GMT+02:00 Doug Ewell d...@ewellic.org: I know I'll regret this... You should not. Philippe Verdy verdy underscore p at wanadoo dot fr wrote: Sometime in a future, two letters will not be enough even in ISO 3166-1, if countries continue to split/merge (this does not happen frequently but it occurs every few years; and it will not be possible to reuse old codes that are maintained for a long period). ISO 3166-1 already defines alpha-3 and numeric code elements, as well as alpha-2. But how do we work with the 2-letter limitation when the world wants more stability in codes? (This was an important reason why ISO 639 was not fully integrated in IETF tags, and why the IETF tags chose stability by also keeping the codes that have been deleted in ISO 639, which are only deprecated in IETF language tags (BCP47).) We've already seen the famous reuse before 50 years (do you remember when CS was reassigned just a few months after it was discarded, after an initial introduction for some months in Serbia-Montenegro?) ISO coding standards are known to be unstable. This would also be true of the UCS if Unicode had not pushed its stability pact with ISO! But now let's remember that parts of ISO 3166 are also included (not fully) in BCP47 tags, which require stability. It will prohibit reassignments by ISO (or if this happens, this will break BCP47, and the IETF will reject the change and will use another subtag if needed). So country codes cannot be reassigned (and we can expect many more merges/splits or changes of regimes in the many troubled areas of the world).
RE: Flag tags with U+1F3F3 and subtypes
I know I'll regret this... Philippe Verdy verdy underscore p at wanadoo dot fr wrote: Sometime in a future, two letters will not be enough even in ISO 3166-1, if countries continue to split/merge (this does not happen frequently but it occurs every few years; and it will not be possible to reuse old codes that are maintained for a long period). ISO 3166-1 already defines alpha-3 and numeric code elements, as well as alpha-2. ISO 3166/MA has added approximately one code element per year on average since the breakup of the Soviet Union. There are approximately 336 unassigned alpha-2 code elements, and if any of the assigned ones is withdrawn, it can be recycled in 50 years. Maybe then we'll have ISO 3166-1 codes using digits (such as A1 or 1A), but this will cause some problems to map them to IETF ccTLD codes (within the DNS root registry). Adapting to this challenge, if and when it arises, should be child's play for the DNS, which has recently introduced TLDs like .சிங்கப்பூர் (or .xn--clchc0ea0b2g2a9gcd if one prefers). As well the UN M.49 numeric codes will get full if it continues with its current allocation scheme (using ranges of numbers by continental regions). Or the other solution will be to extend the set of allowed letters. UN M.49 numeric code elements (equivalent to ISO 3166-1) are assigned alphabetically by English country name, or as close as possible, with some exceptions related to historical names. There are no allocations by geographical region. -- Doug Ewell | http://ewellic.org | Thornton, CO
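The figures in the message above support a quick back-of-the-envelope estimate. The sketch below takes the two numbers stated in the thread (about 336 unassigned alpha-2 code elements, about one new code element per year) as given; the total of 676 is simply 26 × 26. The constant rate is an assumption for illustration, not a forecast.

```python
TOTAL_ALPHA2 = 26 * 26        # every two-letter combination AA..ZZ
UNASSIGNED = 336              # approximate figure quoted in the thread
NEW_CODES_PER_YEAR = 1.0      # approximate average rate quoted in the thread

not_available = TOTAL_ALPHA2 - UNASSIGNED      # everything that is not unassigned
unassigned_share = UNASSIGNED / TOTAL_ALPHA2   # fraction of the space still free
years_left = UNASSIGNED / NEW_CODES_PER_YEAR   # ignores any recycling of withdrawn codes

print(f"total alpha-2 combinations: {TOTAL_ALPHA2}")
print(f"not unassigned: {not_available}")
print(f"unassigned share: {unassigned_share:.1%}")
print(f"years until exhaustion at the quoted rate: {years_left:.0f}")
```

At the quoted rate the two-letter space would last for centuries, even before counting codes that may be recycled after withdrawal.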
Re: Flag tags with U+1F3F3 and subtypes
If ever the country codes used in BCP47 become full (all pairs of letters used), just some time before this happens, we could see new prefixes added before a new range of codes. It is possible to use a 1-letter prefix for new country/territory code extensions, but with some maintenance of BCP47 parsing rules (notably the letter used should not be reordered with other singleton prefixes). But I feel it will first be simpler to assign a special 2-letter code like C1- followed by a new series of 2-letter country codes. (ccTLDs will survive; in fact, with the development of new gTLDs not limited to 2 characters, new countries will prefer asking for a more descriptive gTLD, even if they don't have a 2-letter ccTLD.) Or 2-letter codes will be deprecated in favor of 3-letter codes (but the IETF will keep all the existing 2-letter ccTLDs as long as their sponsors support them and don't require changing to another TLD, even if this breaks existing URLs encoded throughout the web). There's no requirement for ISO 3166 codes to match a TLD in the global DNS exactly (this has long been the case for the .uk ccTLD, because .gb is almost unused). But the stability of country codes is desirable in URLs as well (stored within encoded documents, for which it will be hard to make global substitutions: the solution could be to use tracking dates to resolve domain names, but the worldwide DNS currently does not support this type of query by date, registrars would not like to have to keep history files for long, and software/OS developers don't want to include and maintain such data for their domain name resolving clients). It is however possible that in some future the existing URLs requiring domain names will be deprecated in favor of unique IDs (e.g. based on IPv6): users won't see domain names, but labels retrieved from some whois-like database, or shown by search engines and possibly translated.
It would be also an improvement even if this breaks the business of existing registrars (however registrars will still have business for selling PKI-related services). These IDs can also be used in URIs. In fact the DNS system is already antique in its design (and its very strange and complex encoding for IDNA that no one can read). 2015-05-18 22:10 GMT+02:00 Doug Ewell d...@ewellic.org: Markus Scherer markus dot icu at gmail dot com wrote: As far as I can tell from your quotes, CLDR will say what's valid (plus containment info), and Unicode permits you to show a flag for any valid tag. North Lanarkshire seems perfectly fine. I'm under the impression that this will be a standard Unicode mechanism, defined in principle by TUS and in detail by the upcoming revision of UTR #51, with data (but no additional rules) supplied by CLDR. I am curious to see if the redundant hyphen will be part of the syntax. Like Philippe, I don't believe the hyphen is redundant. ISO 3166-2 requires it (Section 5.2), and the syntax diagram at the end of L2/15-145R shows it: B ((TL{2} (TH (TL|TD){3})?) | (TD{3})) where TH is TAG HYPHEN-MINUS. -- Doug Ewell | http://ewellic.org | Thornton, CO
RE: Flag tags with U+1F3F3 and subtypes
Philippe Verdy verdy underscore p at wanadoo dot fr wrote: ISO 3166-1 already defines alpha-3 and numeric code elements, as well as alpha-2. But how to work with the 2 letters limitation when the world wants more stability in codes (this was an important reason why ISO 639 was not fully integrated in IETF tags, and why the IETF tags have chosen the stability by keeping also the codes that have been deleted in ISO 639, but only deprecated in IETF language tags (BCP47). I assume you're aware of the extent of my involvement in BCP 47, so this is a semi-rhetorical question. If and when ISO 3166/MA manages to use up all of the remaining 336 unassigned code elements -- nearly half of the TOTAL possible code space of 676 two-letter combinations -- the corresponding numeric code elements will be assigned as BCP 47 region subtags instead. We've already seen the famous reuse before 50 years (do you remember when CS was reassigned just a few months after it was discarded after an initial introduction for some months in Serbia-Montenegro?) What actually happened was, 'CS' was withdrawn for Czechoslovakia and then assigned to Serbia and Montenegro. At that time, the waiting period was five years; the 'CS' incident is what resulted in the change to 50 years. But now let's remember that parts of ISO 3166 are also included (not fully) in BCP47 tags that require the stability. It will prohibit reassignments by ISO (or if this happens, this will break BCP47 and the IETF will reject the change and will use another subtag if needed. Again, I'm guessing you already know that I know how BCP 47 works. ISO 3166/MA can recycle alpha-2 code elements 50 years after withdrawal if they feel like it. BCP 47 can't prevent that. That's why BCP 47 has a mechanism to work around that possibility. So country codes cannot be reassigned (and we can expect many more merges/splits or changes of regimes in the many troubled areas of the world.
Changes of regimes don't usually result in new 3166 code elements. The same is true for merges (look at DE/DD or YE/YD). New and changed country names usually do. -- Doug Ewell | http://ewellic.org | Thornton, CO
[OT] RE: Flag tags with U+1F3F3 and subtypes
This is why I knew I would regret it. Clearing up some errors here. No more posts from me on this non-Unicode topic after this one. Philippe Verdy verdy underscore p at wanadoo dot fr wrote: This would be a major revision to BCP 47, it would have nothing to do with reordering, It would have to do because all subtags after the primary language subtag in BCP47 are optional, and you can distinguish them only by their length *or* by the role assigned to specific singletons: there's already the x singleton exception (that is ordered at end), but other singletons are currently described to use a canonical order but it is used only for encoding variants unrelated to region subtags or even to the languages. All non-initial singletons introduce an extension, except for 'x' which introduces a private-use sequence, and which must be last. Even if an extension were defined to hold top-level region information, WHICH WILL NEVER HAPPEN, it would not matter whether that extension appeared before or after other extensions, because it would be an extension and not a region subtag. but for now we just have i-, deprecated but still valid, i- is not deprecated. for all other letters there's no parsing defined for now, their syntax is unknown and they are not interchangeable without a standard, so they are used only for private use Extension 't' was defined in 2011 and 'u' in 2010. They have well-defined syntax, specified in RFC 6497 and 6067 respectively. Undefined singletons may not be used for private use. some BCP47 use an empty first subtag, i.e. the tag starts by an hyphen; Absolutely, utterly false. -- Doug Ewell | http://ewellic.org | Thornton, CO
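The singleton rules stated in the message above can be illustrated with a small splitter. This is a deliberately simplified sketch, not a BCP 47 validator: it only separates the leading subtags from extensions and the private-use sequence, treats 't' and 'u' as the defined extension singletons (as stated above), and rejects empty subtags. The function name `split_extensions` and the `DEFINED_EXTENSIONS` set are assumptions made for this example.

```python
DEFINED_EXTENSIONS = {"t", "u"}  # RFC 6497 and RFC 6067, per the message above


def split_extensions(tag: str):
    """Split a tag into (leading subtags, extensions, private-use subtags)."""
    subtags = tag.lower().split("-")
    if any(s == "" for s in subtags):
        # A leading hyphen or a double hyphen produces an empty subtag.
        raise ValueError("empty subtag: a tag cannot start with or double a hyphen")

    base = [subtags[0]]
    extensions = {}
    private = []
    current = None  # singleton of the extension currently being read

    for i, s in enumerate(subtags[1:], start=1):
        if len(s) == 1:
            if s == "x":
                # 'x' introduces private use and must be last: take the rest.
                private = subtags[i + 1:]
                if not private:
                    raise ValueError("'x' must be followed by at least one subtag")
                break
            if s not in DEFINED_EXTENSIONS:
                raise ValueError(f"singleton {s!r} has no defined extension")
            if s in extensions:
                raise ValueError(f"singleton {s!r} appears twice")
            extensions[s] = []
            current = s
        elif current is None:
            base.append(s)
        else:
            extensions[current].append(s)

    if any(not parts for parts in extensions.values()):
        raise ValueError("an extension singleton needs at least one subtag")
    return base, extensions, private


print(split_extensions("en-US-u-ca-gregory-x-foo"))
```

Because each extension is keyed by its singleton, the relative order of 't' and 'u' in the input does not change the result, which is the point made above about reordering.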