On Thu, May 17, 2018 at 12:51 PM, Timo Paulssen <t...@wakelift.de> wrote:

> character classes are fundamentally the wrong thing for "phrases", since
> they describe only a character.
>

  You were right the first time.


> Your current regex (before changing [gm] to ["gm"]) was expressing "from
> the start of the string, there's any amount of characters d through z (but
> neither g nor m) and then the end of the string", which can be more easily
> expressed as "the whole string contains only letters d through z (but
> neither g nor m)".
>
> What you apparently want is "the whole string contains only letters d
> through z, but never the phrase 'gm'", which - in order to get to a working
> regex - we can rephrase as "the whole string contains only letters d
> through z and no occurrence of g is followed by an m". Let's turn that into
> a regex:
>
>     /^     # Require the match to start at the beginning of the
>            # string so nothing can sneak in before that.
>     [      # Everything in this group will be matched a bunch
>            # of times.
>     |  <[d..z]-[g]>  # either anything between d and z, with no
>                      # further restrictions, except for g.
>     |  g <!before m> # If there's a g, it must not be followed
>                      # by an m.
>     ]*     # end of the group, allow the things in the group to
>            # occur any amount of times.
>     $/     # Require the match to end at the end of the string,
>            # so nothing at the end can sneak in.
>
  It's a somewhat interesting exercise, but this kind of rephrasing doesn't
scale well to longer phrases, is hard to automate (given a phrase, which
character classes do we need?), and it's already pretty hard to read.
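  For what it's worth, the quoted regex does behave as described. Here is a
quick sketch that runs it over a few sample words. The word list and the
variable name $no-gm-classes are made up for illustration; they aren't from
the thread.

```raku
# The character-class rewrite quoted above, stored so it can be reused.
my $no-gm-classes = /
    ^
    [
    |  <[d..z]-[g]>   # any letter d..z except g
    |  g <!before m>  # a g is fine as long as no m follows it
    ]*
    $
/;

# Arbitrary sample words: "dgm" contains the phrase, "abc" has letters
# outside d..z, the rest should be accepted.
for <dog hedge gum mug dgm hello abc> -> $word {
    say $word, ': ', ($word ~~ $no-gm-classes ?? 'accepted' !! 'rejected');
}
```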

  If this requirement needs to be expressed in a single regex, here's what
I'd use (quickly translated from Perl5, then tested to get rid of
translation errors):

  /<!before .* gm> ^ <[d..z]>* $/
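  A minimal way to try the one-liner is to run it against a handful of
words. The word list and the variable name $no-gm are only illustrative
assumptions. Note that the lookahead is checked at every starting
position the engine tries, but ^ only succeeds at position 0, so the net
effect is "the whole string contains no gm".

```raku
# The one-liner from above: fail if "gm" occurs anywhere from the start,
# otherwise require the whole string to consist of the letters d through z.
my $no-gm = /<!before .* gm> ^ <[d..z]>* $/;

for <dog hedge gum mug dgm hello abc> -> $word {
    say $word, ': ', ($word ~~ $no-gm ?? 'accepted' !! 'rejected');
}
```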

  ... or, with comments:

  /<!before .* gm>  # not containing the phrase "gm" anywhere from here,
   ^ <[d..z]>* $    # match the whole string, containing only letters d through z
  /
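  Since the lookahead takes the forbidden phrase as-is, it also addresses
the automation problem: the phrase can come from a variable. In a Raku
regex a plain scalar like $phrase is matched as a literal string, so no
escaping or character-class juggling is needed. The helper name
letters-without is hypothetical, just for this sketch.

```raku
# Hypothetical helper (not from the thread): build the same kind of regex
# for any forbidden phrase. $phrase is interpolated as a literal string.
sub letters-without(Str $phrase) {
    return rx/<!before .* $phrase> ^ <[d..z]>* $/;
}

my $no-gm  = letters-without('gm');
my $no-ego = letters-without('ego');

say so 'hedge' ~~ $no-gm;    # True
say so 'dgm'   ~~ $no-gm;    # False
say so 'hedge' ~~ $no-ego;   # True
say so 'legos' ~~ $no-ego;   # False
```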


Eirik
