Thanks Ashley.

> On 06 June 2016 at 08:58 Mathias Bynens <[email protected]> wrote:
>
> http://unicode.org/reports/tr44/#UAX44-LM3 mentions the `is` prefix:
>
> > For loose matching of symbolic values, an initial prefix string "is" is
> > ignored. […] Ignoring any initial "is" on a symbolic value during loose
> > matching is likely to produce the best results in application areas such
> > as regex. Removal of an initial "is" string for a loose matching
> > comparison only needs to be done once for a symbolic value, and need not
> > be tested recursively. There are no property aliases or property value
> > aliases of the form "isisisisistooconvoluted" defined just to test
> > implementation edge cases.
>
> UAX44 provides the reason for the existence of this “feature”:
>
> > The reason for this is that APIs returning property values are often
> > named using the convention of prefixing "is" (or "Is" or "Is_", and so
> > forth) to a property value.
>
> That seems like a rather weak argument. Specifically applying this to
> UTS18 (Unicode regular expressions):
>
> > "Script=Greek" is equivalent to "Script=isGreek" or "Script=Is_Greek"
>
> If there is already a way to match all symbols in the Greek script (not
> counting the use of aliases and other loose matching requirements), i.e.
> `Script=Greek` — what good does it do to add support for yet another one?
>
> Looking at implementations in the wild, Steven Levithan found
> (https://github.com/mathiasbynens/es-unicode-regexp-proposal/issues/2#issuecomment-143288062)
> that some regex flavors use `Is` for scripts, some for blocks, some for
> scripts and blocks, some for neither. Since some script and block names
> collide, this causes problems, especially when porting regexes across flavors.
>
> The `is` prefix doesn’t provide any functionality that would otherwise be
> unavailable. It doesn’t add any value, yet causes incompatibility, author
> confusion, and it increases implementation complexity. UAX 44 includes two
> entire paragraphs pointing out that last part:
>
> > Removal of an initial "is" string for a loose matching comparison only
> > needs to be done once for a symbolic value, and need not be tested
> > recursively. There are no property aliases or property value aliases of
> > the form "isisisisistooconvoluted" defined just to test implementation
> > edge cases.
> >
> > Existing and future property aliases and property value aliases are
> > guaranteed to be unique within their relevant namespaces, even if an
> > initial prefix string "is" is ignored. The existing cases of note for
> > aliases that do start with "is" are: dt=Iso
> > (Decomposition_Type=Isolated) and lb=IS. The Decomposition_Type value
> > alias does not cause any problem, because there is no contrasting value
> > alias dt=o (Decomposition_Type=olated). For lb=IS, note that the "IS" is
> > the entire property value alias, and is not a prefix. There is no null
> > value for the Line_Break property for it to contrast with, but
> > implementations of loose matching should be careful of this edge case,
> > so that "lb=IS" is not misinterpreted as matching a null value.
>
> Backwards compatibility seems to be the only good reason to continue
> supporting the `is` prefix *for existing implementations*, such as the one in
> Perl. But why is it still a requirement for new engines to support it as part
> of UAX44-LM3?
>
> I’d like to propose changing UAX44-LM3 to make supporting the `is` prefix
> optional for new implementations.
>
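For anyone following along, here is a rough sketch of what the quoted `is` handling in UAX44-LM3 asks of an implementation. It is only an illustration under stated assumptions, not how any particular engine does it: the `VALUES` table is a tiny hand-picked subset of property value aliases, and the names `normalize` and `loose_lookup` are made up for this example. Trying the unstripped key first is one possible way to keep `lb=IS` and `dt=Iso` from being misread as a bare prefix.

```python
import re

# Tiny illustrative subset of property value aliases (NOT the full Unicode data).
VALUES = {
    "Script": ["Greek", "Grek", "Latin", "Latn"],
    "Line_Break": ["IS", "Infix_Numeric"],
    "Decomposition_Type": ["Iso", "Isolated"],
}


def normalize(name):
    """Ignore case, whitespace, underscores, and hyphens."""
    return re.sub(r"[\s_-]", "", name).lower()


# Index each property's values by their normalized form.
INDEX = {
    prop: {normalize(value): value for value in values}
    for prop, values in VALUES.items()
}


def loose_lookup(prop, value):
    """Return the matching value alias from VALUES, or None.

    Assumption of this sketch: look the key up as written first, and only
    then retry with a single leading "is" removed. The prefix is stripped
    at most once, never recursively.
    """
    table = INDEX[prop]
    key = normalize(value)
    if key in table:
        return table[key]
    if key.startswith("is"):
        return table.get(key[2:])
    return None


if __name__ == "__main__":
    for prop, value in [
        ("Script", "Greek"),
        ("Script", "isGreek"),
        ("Script", "Is_Greek"),
        ("Script", "isisGreek"),
        ("Line_Break", "IS"),
        ("Decomposition_Type", "Iso"),
    ]:
        print(f"{prop}={value} -> {loose_lookup(prop, value)}")
```

Even in this toy version, the prefix costs a second lookup and a special case for `lb=IS`, which is the implementation complexity the proposal is pointing at.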

