On Thu, May 17, 2018 at 12:51 PM, Timo Paulssen <t...@wakelift.de> wrote:
> character classes are fundamentally the wrong thing for "phrases", since
> they describe only a character.
> You were right the first time.
> Your current regex (before changing [gm] to ["gm"]) was expressing "from
> the start of the string, there's any amount of characters d through z (but
> neither g nor m) and then the end of the string", which can be more easily
> expressed as "the whole string contains only letters d through z (but
> neither g nor m)".
>
> What you apparently want is "the whole string contains only letters d
> through z, but never the phrase 'gm'", which - in order to get to a working
> regex - we can rephrase as "the whole string contains only letters d
> through z and no occurrence of g is followed by an m". Let's turn that into
> a regex:
>
>     /^                 # Require the match to start at the beginning of the
>                        # string so nothing can sneak in before that.
>     [                  # Everything in this group will be matched a bunch
>                        # of times.
>     | <[d..z]-[g]>     # either anything between d and z, with no
>                        # further restrictions, except for g.
>     | g <!before m>    # If there's a g, it must not be followed
>                        # by an m.
>     ]*                 # end of the group, allow the things in the group to
>                        # occur any amount of times.
>     $/                 # Require the match to end at the end of the string,
>                        # so nothing at the end can sneak in.
>

Somewhat interesting exercise, but this kind of rephrasing doesn't scale
well for longer phrases, is hard to automate (given a phrase, which
character classes need we use?), and it's already pretty hard to read.

If this requirement needs to be expressed in a single regex, here's what
I'd use (quickly translated from Perl5, then tested to get rid of
translation errors):

    /<!before .* gm> ^ <[d..z]>* $/

... or, with comments:

    /<!before .* gm>    # not containing the phrase "gm" anywhere from here,
     ^ <[d..z]>* $      # match the whole string, containing only letters
                        # d through z
    /


Eirik
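
For anyone who wants to compare the two regexes side by side, here is a minimal sketch. The sample strings and the names @samples, $alternation, $lookahead, $first and $second are assumptions made up for illustration; they do not come from the thread. The two regexes themselves are the ones quoted above, with the comments stripped.

```raku
# Hypothetical sample strings, chosen to exercise both rules:
# letters outside d..z, and the phrase "gm".
my @samples = 'defz', 'dgz', 'mg', 'gm', 'dgmz', 'abc', '';

# Alternation version: a g is only allowed when no m follows it.
my $alternation = / ^ [ <[d..z]-[g]> | g <!before m> ]* $ /;

# Lookahead version: refuse up front if "gm" occurs anywhere,
# then require the whole string to consist of d..z.
my $lookahead = / <!before .* gm> ^ <[d..z]>* $ /;

for @samples -> $s {
    my $first  = ($s ~~ $alternation).so;
    my $second = ($s ~~ $lookahead).so;
    say "[$s] alternation: $first, lookahead: $second";
}
```

If the two regexes really are equivalent, both columns should agree on every line, with only the strings made of d through z and free of "gm" being accepted.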