On 6/6/2016 12:58 AM, Mathias Bynens wrote:
Backwards compatibility seems to be the only good reason to continue supporting 
the `is` prefix *for existing implementations*, such as the one in Perl. But why 
is it still a requirement for new engines to support it as part of UAX44-LM3?

I’d like to propose changing UAX44-LM3 to make supporting the `is` prefix 
optional for new implementations.


I think the target of concern here is wrong. UAX #44 doesn't *require* any regex engine to include this "is prefix" handling. What UAX #44 does is recommend that all property and property value aliases be correctly recognized, and then specifies a clear statement (in UAX44-LM3) of the loose matching rule for recognizing the various forms of those aliases that could be considered equivalent. I don't think messing with that rule statement (which has been in place since 2010) would be helpful.
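
For illustration, the loose matching that UAX44-LM3 describes (ignore case, whitespace, underscores, hyphens, and an initial "is" prefix) can be sketched roughly as follows. This is only a minimal Python sketch, not text from the specification; the names `loose_key` and `loose_match` are hypothetical, and the guard against stripping a bare "is" down to nothing is an assumption of this sketch.

```python
import re


def loose_key(name: str) -> str:
    """Reduce a property name or value alias to a key for loose comparison."""
    # Drop whitespace, underscores, and hyphens, then fold case.
    key = re.sub(r"[\s_\-]", "", name).lower()
    # Drop an initial "is" prefix. Assumption of this sketch: leave the
    # string alone if stripping would produce an empty key.
    if key.startswith("is") and len(key) > 2:
        key = key[2:]
    return key


def loose_match(a: str, b: str) -> bool:
    """Return True if two aliases are equivalent under the sketch above."""
    return loose_key(a) == loose_key(b)


if __name__ == "__main__":
    print(loose_match("IsGreek", "Greek"))
    print(loose_match("Script_Extensions", "script-extensions"))
    print(loose_match("Latin", "Greek"))
```

Note that stripping the prefix this naively also affects any alias that itself legitimately begins with "is", so a real implementation would need to check its actual alias tables for such collisions; the sketch does not attempt that.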

The target instead should be UTS #18, which, happily, has a proposed update available for comment right now:

http://www.unicode.org/review/pri325/

The relevant point is:

http://www.unicode.org/reports/tr18/tr18-18.html#RL1.2

That is the conformance part that requires that conformant Unicode regex implementations "must follow the Matching rules from [UAX44]".

If you are seeking indulgences for new engine implementations, that seems like the correct point to be adding clarifications and exceptions. Note that the following text in that section already includes wording about exceptions and compatibility issues. There is also a following section specifically about regex for the Script and Script Extensions properties that seems like it would be the appropriate place to talk about the Greek/IsGreek issue as pertains to regex support.

I would suggest you make specific suggestions about the text of UTS #18 as part of the ongoing public review for the proposed update of that specification.

--Ken
