Mark Davis 🍻 <mark at macchiato dot com> wrote:

>> Is there any precedent for CLDR to define the validity of Unicode
>> character sequences?
>
> We already have, in tr51, the unicode_region_codes being used for
> validity testing of flags:
>
> http://unicode.org/reports/tr51/#Encoding
> http://unicode.org/reports/tr51/#Flags

the second of which (Annex B) says:

"The valid region sequences are specified by Unicode region subtags as
defined in [CLDR], excluding those that are designated private-use or
deprecated in [CLDR]."

In that case, the wording in TUS needs to be corrected, because TUS 7.0
§22.10 says:

"The regional indicator symbols in the range U+1F1E6..U+1F1FF can be
used in pairs to represent an ISO 3166 region code."

It doesn't say anything about valid pairs being defined by CLDR instead
of ISO. I wonder how many users actually know this.

> Those are typically the same as the ISO codes, but do add XK
>
> http://unicode.org/reports/tr35/#unicode_region_subtag

So QO, QU, and ZZ would be excluded, since those are private-use in
BCP 47 and hence also in CLDR. But XK is included, even though it is
also private-use. Is this correct? Can an application tell that XK is
in and the others are out, just by looking at CLDR data?

Also, I assume all of the same include/exclude rules apply both to RIS
combinations and to PRI #299-style flag tags. Please let me know if
that's not true.

> CLDR treats UK as deprecated.
> [...]
> But you're right; we need to be able to distinguish this case (and
> ones like it.) I filed
>
> http://unicode.org/cldr/trac/ticket/8736

OK, so UK is not valid in RIS combinations or flag tags either. Glad to
see that clarified.

>> Is there any significance to the "subtype" hierarchy as far as flag
>> tags are concerned, or are "[flag]FRJ" and "[flag]FR75" equally
>> valid?
>
> No, there isn't. But see also E.5 in
>
> http://www.unicode.org/review/pri299/pri299-additional-flags-background.html

Right, clearly flags don't exist for many of the subdivisions. But I'm
not sure this is the same question as whether the three-level hierarchy
is relevant. In my example, Île-de-France and Paris both have flags,
and they aren't the same. (Wikipedia says the Île-de-France flag is
"non-official and unused," but they do have a page for it, and in any
case there are probably better examples.)
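
As a rough illustration of the RIS include/exclude rules discussed above, here is a minimal Python sketch. Everything in it is an assumption for illustration: the three sets are small hand-written stand-ins, not real CLDR data, and the function names are hypothetical. It only encodes the reading given in this thread (QO, QU, ZZ excluded as private-use; UK excluded as deprecated; XK included).

```python
# Regional indicator symbols run from U+1F1E6 (A) to U+1F1FF (Z).
RIS_BASE = 0x1F1E6

# Hypothetical stand-ins for CLDR data; NOT the real lists.
REGION_SUBTAGS = {"US", "FR", "CA", "BS", "XK", "QO", "QU", "ZZ", "UK"}
PRIVATE_USE = {"QO", "QU", "ZZ"}  # assumed excluded, per this thread
DEPRECATED = {"UK"}               # assumed deprecated, per this thread


def ris_pair_to_code(text):
    """Map exactly two regional indicators to two ASCII letters, else None."""
    if len(text) != 2:
        return None
    letters = []
    for ch in text:
        offset = ord(ch) - RIS_BASE
        if not 0 <= offset < 26:
            return None
        letters.append(chr(ord("A") + offset))
    return "".join(letters)


def code_to_ris(code):
    """Inverse helper: two ASCII capitals to a regional indicator pair."""
    return "".join(chr(RIS_BASE + ord(c) - ord("A")) for c in code)


def is_valid_flag(text):
    """Apply the Annex B wording: a region subtag that is neither
    private-use nor deprecated (under the assumed sets above)."""
    code = ris_pair_to_code(text)
    if code is None or code not in REGION_SUBTAGS:
        return False
    return code not in PRIVATE_USE and code not in DEPRECATED
```

Note that XK passes here only because it was left out of the hand-written PRIVATE_USE set. Whether an application can derive that distinction from CLDR data alone is exactly the open question above.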

> The only purpose for the 4-character subdivision codes is stability.
> So let's suppose that Colorado decides to join Canada (thereby
> deprecating CO in ISO 3166-2), and British Columbia decides to join
> the US (getting the code CO in ISO 3166-2). In that case, CLDR would
> keep the old code CO (but deprecated) and create a new 4-letter code
> for BC, such as XXCO. This is just for illustration, of course, I've
> heard no rumors about either political shift...

Thanks for the 'XXCO' example; this is different from tending toward
'COXX' and was what I was looking for.

The exact scenario would not apply, of course, due to the agreement to
keep subdivision codes unique across the US/Canada border. I'd suppose
this would be preserved, and 3166-2 would assign US-BC to "British
Columbia as US state," and there would be no coding conflict to
resolve. But again, additional examples could easily be dreamed up:
replace BC with the Central Abaco region of the Bahamas (currently
BS-CO), which isn't that far away.

>> (private-use flag tags)
>
> We'll have to address that. My view is that they should not be valid:
> if someone wants a PU flag, of any source, they have over 130,000
> Unicode PU characters to play with.

I concur, and this is consistent with Annex B.

Thanks,

--
Doug Ewell | http://ewellic.org | Thornton, CO 🇺🇸

