On 7/2/2015 12:33 PM, Leo Broukhis wrote:
> If REGIONAL INDICATOR DASH and REGIONAL INDICATOR digits are added, along with regional supplementary symbols, then sequences <RIS><RIS><RID><RSS>*<RIS> can be parsed unambiguously as ISO 3166-2, whereas <RIS><RSS>+<RIS> can be parsed as a named sequence signifying a flag of a non-governmental entity (or <RIS><RSS><RIS> - as ISO 3166-1 alpha 3, and longer sequences as non-governmental).
The point of switching to the TAG characters for an extension mechanism beyond what the RIS pairs can handle is that TAG characters for letters *and* digits *and* dash already exist and do not have to be encoded yet again before they could be used.

Any proposal that depends on getting agreement to encode and publish some *further* set of meta-characters for representing letters, digits, and ASCII punctuation marks would at this point push out any possible solution to the time frame of Unicode 10.0 (June, 2017). And even that would depend on first coming to agreement that *more* sets of meta-characters for dealing with the same kind of function that TAG characters could already serve would be a good idea. The potential for significant disagreement could push such a solution out even further. Remember that any solution involving encoding more characters with "funny behavior" would need not only to gain consensus in the UTC, but would also have to pass muster in SC2 and pass two formal ballots by the national bodies.

You could create an equivalent proposal to what you are suggesting above by simply substituting <TAG-DASH> and <TAG-[0..9]> for your RID and RSS above -- and you could do it *now*, instead of in 2017.

But once we look to TAG characters for an extension mechanism, why mess with the existing RIS pair syntax and break the existing implementations using them? Hence, the direction taken in PRI #399, which suggests an extension syntax based entirely on the TAG characters.

--Ken
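
To make the substitution concrete, here is a minimal Python sketch of mixing RIS letters with the already-encoded TAG dash and TAG digits. It relies only on two facts about the existing repertoire: the regional indicator symbols for A..Z start at U+1F1E6, and each TAG character sits at its ASCII code point plus 0xE0000. The function names `encode_hybrid` and `decode_hybrid` and the exact sequence layout are assumptions made for illustration; this is neither the syntax in the PRI nor a worked-out version of the quoted proposal.

```python
RIS_A = 0x1F1E6     # REGIONAL INDICATOR SYMBOL LETTER A
TAG_BASE = 0xE0000  # TAG characters sit at ASCII code point + 0xE0000


def encode_hybrid(code: str) -> str:
    """Hypothetical layout: letters become RIS, dash and digits become TAGs."""
    out = []
    for ch in code.upper():
        if "A" <= ch <= "Z":
            out.append(chr(RIS_A + ord(ch) - ord("A")))
        elif ch == "-" or "0" <= ch <= "9":
            out.append(chr(TAG_BASE + ord(ch)))
        else:
            raise ValueError(f"unsupported character: {ch!r}")
    return "".join(out)


def decode_hybrid(seq: str) -> str:
    """Map RIS and TAG characters back to their ASCII counterparts."""
    out = []
    for ch in seq:
        cp = ord(ch)
        if RIS_A <= cp <= RIS_A + 25:
            out.append(chr(cp - RIS_A + ord("A")))
        elif 0xE0020 <= cp <= 0xE007E:
            out.append(chr(cp - TAG_BASE))
        else:
            raise ValueError(f"not an RIS or TAG character: U+{cp:04X}")
    return "".join(out)


if __name__ == "__main__":
    seq = encode_hybrid("FR-75")
    print(" ".join(f"U+{ord(c):04X}" for c in seq))
    print(decode_hybrid(seq))
```

Nothing in the sketch needs a code point that is not already encoded, which is the point about timing: the dash and the digits come from the existing TAG block.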

