On 05/18/2018 04:30 AM, The Sidhekin wrote:


On Thu, May 17, 2018 at 12:51 PM, Timo Paulssen <t...@wakelift.de> wrote:

    character classes are fundamentally the wrong thing for "phrases",
    since they describe only a character.


   You were right the first time.

    Your current regex (before changing [gm] to ["gm"]) was expressing
    "from the start of the string, there's any number of characters d
    through z (but neither g nor m) and then the end of the string",
    which can be more easily expressed as "the whole string contains
    only letters d through z (but neither g nor m)".

    What you apparently want is "the whole string contains only letters
    d through z, but never the phrase 'gm'", which - in order to get to
    a working regex - we can rephrase as "the whole string contains only
    letters d through z and no occurrence of g is followed by an m".
    Let's turn that into a regex:

         /^     # Require the match to start at the beginning of the
                # string so nothing can sneak in before that.
         [      # Everything in this group will be matched a bunch
                # of times.
         |  <[d..z]-[g]>  # either anything between d and z, with no
                          # further restrictions, except for g.
         |  g <!before m> # If there's a g, it must not be followed
                          # by an m.
         ]*     # end of the group, allow the things in the group to
                # occur any number of times.
         $/     # Require the match to end at the end of the string,
                # so nothing at the end can sneak in.

  A somewhat interesting exercise, but this kind of rephrasing doesn't scale well to longer phrases, is hard to automate (given a phrase, which character classes do we need?), and is already pretty hard to read.
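
  To make concrete what the quoted alternation accepts, here is a minimal sketch that runs it against a handful of strings. The sample words are made up for illustration, and the leading | inside the group is dropped, which should not change the meaning:

```raku
# The commented regex quoted above, condensed onto one line.
my $no-gm = rx/ ^ [ <[d..z]-[g]> | g <!before m> ]* $ /;

# Hypothetical sample strings, chosen only for illustration.
for <def dig egg dgm gmx abc> -> $word {
    # so() collapses the Match (or Nil) into a plain Bool for printing.
    say $word, ' => ', so($word ~~ $no-gm);
}
```

  As far as I can tell, "def", "dig" and "egg" should pass (a g is fine as long as no m follows it), while "dgm" and "gmx" fail on the phrase and "abc" fails on the letter range.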

  If this requirement needs to be expressed in a single regex, here's what I'd use (quickly translated from Perl 5, then tested to get rid of translation errors):

   /<!before .* gm> ^ <[d..z]>* $/

   ... or, with comments:

   /<!before .* gm> # not containing the phrase "gm" anywhere from here,
   ^ <[d..z]>* $ # match the whole string, containing only letters d through z
   /
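
   One possible payoff of the lookahead form is that the forbidden phrase can become a parameter. A minimal sketch, assuming a hypothetical helper named allowed (not from this thread) and relying on Raku matching an interpolated scalar as a literal string:

```raku
# Hypothetical helper: True if $s consists only of letters d..z
# and does not contain $phrase anywhere.
sub allowed(Str $s, Str $phrase --> Bool) {
    # A bare scalar inside a regex is matched as a literal string,
    # so $phrase needs no escaping here.
    so $s ~~ / <!before .* $phrase> ^ <[d..z]>* $ /;
}

say allowed('dig',  'gm');   # only d..z, no "gm"
say allowed('dgmx', 'gm');   # contains "gm"
say allowed('mist', 'is');   # contains "is"
```

   The character class stays fixed while only the phrase varies, which is the part the alternation approach makes hard to automate.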


Eirik


Thank you!
