Re: Flag tags with U+1F3F3 and subtypes

2015-05-18 Thread Markus Scherer
On Mon, May 18, 2015 at 11:19 AM, Doug Ewell d...@ewellic.org wrote: Is the new mechanism intended to allow flag tags that include either subtype values or contains values? As far as I can tell from your quotes, CLDR will say what's valid (plus containment info), and Unicode permits you to

Regexes, Canonical Equivalence and Backtracking of Input

2015-05-18 Thread Richard Wordingham
Philippe and I have got bogged down in a long discussion of how to parse traces of Unicode strings under canonical equivalence against non-regular Kleene star of regular expressions. Fortunately, such expressions can be expected to have very little use. A seemingly simple example is the regex

Re: Flag tags with U+1F3F3 and subtypes

2015-05-18 Thread Philippe Verdy
The hyphen is not redundant in ISO 3166, which defines primary codes of variable length (even if ISO 3166 part 1 for now only uses two-letter codes). Sometime in the future, two letters will not be enough even in ISO 3166-1, if countries continue to split/merge (this does not happen frequently but is

Flag tags with U+1F3F3 and subtypes

2015-05-18 Thread Doug Ewell
L2/15-145R says: In CLDR 28, LDML will define a unicode_subdivision_subtag which also provides validity criteria for the codes used for regional subdivisions (see CLDR ticket #8423). When representing regional subdivisions using ISO 3166-2 codes, only those codes that are valid for the LDML

Re: Flag tags with U+1F3F3 and subtypes

2015-05-18 Thread Andrew West
On 18 May 2015 at 19:19, Doug Ewell d...@ewellic.org wrote: Is the new mechanism intended to allow flag tags that include either subtype values or contains values? For example: That is my understanding. 1F3F3 E0047 E0042 E002D E0053 E0043 E0054 (GB-SCT) for the Scottish flag and 1F3F3
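
The code point sequences quoted in this thread follow a simple pattern: U+1F3F3 followed by one tag character per ASCII character of the subdivision code, each at U+E0000 plus the ASCII value. A minimal Python sketch of that mapping, assuming nothing beyond what the quoted sequences show (the function names are hypothetical, and any terminator or validity check the eventual proposal may require is deliberately left out):

```python
TAG_BASE = 0xE0000           # tag characters mirror ASCII at this offset
WAVING_WHITE_FLAG = 0x1F3F3  # base character used in the quoted sequences


def flag_tag_sequence(code: str) -> list[int]:
    """Return the code points for U+1F3F3 followed by tag characters spelling `code`."""
    for ch in code:
        # Only printable ASCII has a corresponding tag character (U+E0020..U+E007E).
        if not 0x20 <= ord(ch) <= 0x7E:
            raise ValueError(f"no tag character for {ch!r}")
    return [WAVING_WHITE_FLAG] + [TAG_BASE + ord(ch) for ch in code]


def format_code_points(code_points: list[int]) -> str:
    """Format code points as space-separated uppercase hex, as in the thread."""
    return " ".join(f"{cp:04X}" for cp in code_points)


print(format_code_points(flag_tag_sequence("GB-SCT")))
# 1F3F3 E0047 E0042 E002D E0053 E0043 E0054
```

Whether a given code such as GB-NLK is valid is a separate question that, per the thread, CLDR is expected to answer; this sketch only does the character mapping.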

Re: Regexes, Canonical Equivalence and Backtracking of Input

2015-05-18 Thread Philippe Verdy
2015-05-18 20:35 GMT+02:00 Richard Wordingham richard.wording...@ntlworld.com: The algorithm itself should be tractable - Mark Davis has published an algorithm to generate all strings canonically equivalent to a Unicode string, and what we need might not be so complex. Even this algorithm
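
For readers unfamiliar with the idea of enumerating canonically equivalent strings, here is a brute-force Python sketch. It is not the published algorithm mentioned above: it only enumerates the fully decomposed equivalents (it ignores precomposed forms), and it tries every permutation, so it is usable only on very short strings. The function name is hypothetical.

```python
import unicodedata
from itertools import permutations


def decomposed_equivalents(text: str) -> set[str]:
    """Return all fully decomposed strings canonically equivalent to `text`.

    Brute force: permute the characters of the NFD form and keep the
    permutations that normalize back to the same NFD form.
    """
    nfd = unicodedata.normalize("NFD", text)
    return {
        "".join(p)
        for p in permutations(nfd)
        if unicodedata.normalize("NFD", "".join(p)) == nfd
    }


# "a" + COMBINING DOT BELOW (ccc 220) + COMBINING ACUTE ACCENT (ccc 230):
# the two marks may appear in either order after the base letter.
print(sorted(decomposed_equivalents("a\u0323\u0301")))
```

The factorial cost is the reason a smarter algorithm is wanted; the sketch only makes the equivalence classes concrete.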

Re: Flag tags with U+1F3F3 and subtypes

2015-05-18 Thread Richard Wordingham
On Mon, 18 May 2015 19:37:06 +0100 Andrew West andrewcw...@gmail.com wrote: 1F3F3 E0047 E0042 E002D E004E E004C E004B (GB-NLK) for the North Lanarkshire council area flag I don't believe that North Lanarkshire has an associated flag, which I think is the case for most UK counties and

Re: Regexes, Canonical Equivalence and Backtracking of Input

2015-05-18 Thread Eli Zaretskii
Date: Mon, 18 May 2015 19:35:45 +0100 From: Richard Wordingham richard.wording...@ntlworld.com Mark Davis has published an algorithm to generate all strings canonically equivalent to a Unicode string Where can I find the description of that algorithm?

RE: Flag tags with U+1F3F3 and subtypes

2015-05-18 Thread Doug Ewell
Markus Scherer markus dot icu at gmail dot com wrote: As far as I can tell from your quotes, CLDR will say what's valid (plus containment info), and Unicode permits you to show a flag for any valid tag. North Lanarkshire seems perfectly fine. I'm under the impression that this will be a

Re: Regexes, Canonical Equivalence and Backtracking of Input

2015-05-18 Thread Richard Wordingham
On Mon, 18 May 2015 21:05:49 +0200 Philippe Verdy verd...@wanadoo.fr wrote: 2015-05-18 20:35 GMT+02:00 Richard Wordingham richard.wording...@ntlworld.com: The algorithm itself should be tractable - Mark Davis has published an algorithm to generate all strings canonically equivalent to a

[OT] RE: Flag tags with U+1F3F3 and subtypes

2015-05-18 Thread Doug Ewell
Philippe Verdy verdy underscore p at wanadoo dot fr wrote: If ever the country codes used in BCP47 become full (all pairs of letters used), just some time before this happens, we could see new prefixes added before a new range of codes. It is possible to use a 1-letter prefix for new

Re: Flag tags with U+1F3F3 and subtypes

2015-05-18 Thread Philippe Verdy
2015-05-18 23:38 GMT+02:00 Doug Ewell d...@ewellic.org: Philippe Verdy verdy underscore p at wanadoo dot fr wrote: So country codes cannot be reassigned (and we can expect many more merges/splits or changes of regimes in the many troubled areas of the world. Changes of regimes don't

Re: [OT] RE: Flag tags with U+1F3F3 and subtypes

2015-05-18 Thread Philippe Verdy
2015-05-18 23:55 GMT+02:00 Doug Ewell d...@ewellic.org: Philippe Verdy verdy underscore p at wanadoo dot fr wrote: If ever the country codes used in BCP47 become full (all pairs of letters used), just some time before this happens, we could see new prefixes added before a new range of

Re: Flag tags with U+1F3F3 and subtypes

2015-05-18 Thread Philippe Verdy
2015-05-18 22:14 GMT+02:00 Doug Ewell d...@ewellic.org: I know I'll regret this... You should not Philippe Verdy verdy underscore p at wanadoo dot fr wrote: Sometime in the future, two letters will not be enough even in ISO 3166-1, if countries continue to split/merge (this does not

Re: Regexes, Canonical Equivalence and Backtracking of Input

2015-05-18 Thread Richard Wordingham
On Mon, 18 May 2015 22:40:21 +0300 Eli Zaretskii e...@gnu.org wrote: Date: Mon, 18 May 2015 19:35:45 +0100 From: Richard Wordingham richard.wording...@ntlworld.com Mark Davis has published an algorithm to generate all strings canonically equivalent to a Unicode string Where can I

Re: Regexes, Canonical Equivalence and Backtracking of Input

2015-05-18 Thread Philippe Verdy
Isn't it possible for your basic substitution to transform \u0F73 into a character class [\u0F71\u0F72\u0F73] that the regexp considers as a single entity to check? In that case, backtracking for matching \u0F73*\u0F72 is simpler: [\u0F71\u0F72\u0F73]*\u0F72, as it just requires backtracking
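
To make the difficulty with \u0F73*\u0F72 concrete: U+0F73 decomposes canonically into U+0F71 U+0F72, and canonical reordering then interleaves the pieces of repeated U+0F73 with the trailing U+0F72. The Python sketch below only checks canonical equivalence by comparing NFD forms; it does not implement either poster's matching approach, and the helper name is hypothetical.

```python
import unicodedata

AA = "\u0F71"  # TIBETAN VOWEL SIGN AA, combining class 129
I = "\u0F72"   # TIBETAN VOWEL SIGN I, combining class 130
II = "\u0F73"  # TIBETAN VOWEL SIGN II, decomposes to U+0F71 U+0F72


def canonically_equivalent(a: str, b: str) -> bool:
    """Two strings are canonically equivalent iff their NFD forms are identical."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)


def hex_form(text: str) -> str:
    """Show a string as space-separated code points."""
    return " ".join(f"{ord(ch):04X}" for ch in text)


# A string matched by \u0F73*\u0F72 with two repetitions of U+0F73 ...
matched = II + II + I
# ... has an NFD form in which the two U+0F71 come first, so the
# boundaries between the repetitions are no longer visible.
print(hex_form(unicodedata.normalize("NFD", matched)))
# 0F71 0F71 0F72 0F72 0F72
print(canonically_equivalent(matched, AA + AA + I + I + I))
# True
```

A matcher working on the normalized input therefore cannot consume one U+0F73 at a time from left to right, which is the situation the backtracking discussion is about.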

RE: Flag tags with U+1F3F3 and subtypes

2015-05-18 Thread Doug Ewell
I know I'll regret this... Philippe Verdy verdy underscore p at wanadoo dot fr wrote: Sometime in the future, two letters will not be enough even in ISO 3166-1, if countries continue to split/merge (this does not happen frequently but it occurs every few years; and it will not be possible to

Re: Flag tags with U+1F3F3 and subtypes

2015-05-18 Thread Philippe Verdy
If ever the country codes used in BCP47 become full (all pairs of letters used), just some time before this happens, we could see new prefixes added before a new range of codes. It is possible to use a 1-letter prefix for new country/territory code extensions, but with some maintenance of BCP47

Re: Regexes, Canonical Equivalence and Backtracking of Input

2015-05-18 Thread Richard Wordingham
On Tue, 19 May 2015 01:25:54 +0200 Philippe Verdy verd...@wanadoo.fr wrote: I don't work with strings, but with what you seem to call traces, For the concept of traces, Wikipedia suffices: https://fr.wikipedia.org/wiki/Mono%C3%AFde_des_traces . As far as text manipulation is concerned, the

Re: Tag characters

2015-05-18 Thread Mark Davis ☕️
A few notes. A more concrete proposal will be in a PRI to be issued soon, and people will have a chance to comment more then. (I'm not trying to discourage discussion, just pointing out that there will be something more concrete relatively soon to comment on—people are pretty busy getting 8.0

Re: Regexes, Canonical Equivalence and Backtracking of Input

2015-05-18 Thread Richard Wordingham
On Mon, 18 May 2015 22:56:47 +0200 Philippe Verdy verd...@wanadoo.fr wrote: Isn't it possible for your basic substitution to transform \u0F73 into a character class [\u0F71\u0F72\u0F73] that the regexp considers as a single entity to check? In that case, backtracking for matching

RE: Flag tags with U+1F3F3 and subtypes

2015-05-18 Thread Doug Ewell
Philippe Verdy verdy underscore p at wanadoo dot fr wrote: ISO 3166-1 already defines alpha-3 and numeric code elements, as well as alpha-2. But how to work with the 2 letters limitation when the world wants more stability in codes (this was an important reason why ISO 639 was not fully

[OT] RE: Flag tags with U+1F3F3 and subtypes

2015-05-18 Thread Doug Ewell
This is why I knew I would regret it. Clearing up some errors here. No more posts from me on this non-Unicode topic after this one. Philippe Verdy verdy underscore p at wanadoo dot fr wrote: This would be a major revision to BCP 47, it would have nothing to do with reordering, It would

Re: Regexes, Canonical Equivalence and Backtracking of Input

2015-05-18 Thread Philippe Verdy
I don't work with strings, but with what you seem to call traces, but that I call sets of states (they are in fact bitsets, which may be compacted or just stored as arrays of bytes containing just 1 useful bit, but which may be a bit faster; byte arrays are just simpler to program), in a stack

Re: Arabic diacritics

2015-05-18 Thread عبد الرحمان أيمن
Many thanks, this is exactly the needed information :) Respectfully 2015-05-15 19:09 GMT+03:00 Denis Jacquerye moy...@gmail.com: You should use ARABIC SHADDA U+0651 in all positions. The presentation forms (isolated, medial, final forms) are for compatibility with legacy systems. See what is