Nice RE tutorial :-)
/^ [ | <[d..z]-[g]> | g<!before m> ]* $/
One question I have is what is the first | for?
On Thu, 17 May 2018 at 21:10, Timo Paulssen <t...@wakelift.de> wrote:
> The description perhaps doesn't point out clearly enough: the reason why
> the stuff inside the [ ] will match any amount of times is only the * at
> the end, the [ ] is only there because otherwise the regex would instead
> match something you didn't mean at all. If you're interested, read on for
> an explanation, but it might actually be more confusing than helpful:
> Without the [ ], the resulting regex means "either the beginning of the
> string is followed by any letter from d to z except g, or there's a g that's
> either not before an m, or it is, and followed by the end of the string".
> That's because the | would then not only put the ^ and $ anchors into
> separate alternatives, but the * would also cling to the <!before m>, which
> is then allowed to not match at all (because it's a * and not a +).
> Also, putting a quantifier (which is what * and + are called) on a before
> or after assertion makes no sense and probably leads to an infinite loop.
> The regex engine tries to make you proud by matching it as often as it
> possibly can, which, if the assertion is true, is infinitely often. It is
> very diligent, but it does not really think much about what it does.
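
To convince myself of the grouping point, here is a small sketch in Perl 6 (Raku) that compares the grouped regex with an ungrouped variant. The variable names and the sample strings are my own, and I have deliberately left the * off the ungrouped version, since a quantified <!before m> is exactly the construct described above as a likely infinite loop.

```raku
# Sketch only: $grouped and $ungrouped are illustrative names.
# The ungrouped variant omits the trailing * on purpose, because
# <!before m>* is the construct the message warns may never terminate.
my $grouped   = / ^ [ <[d..z]-[g]> | g <!before m> ]* $ /;
my $ungrouped = / ^ <[d..z]-[g]> | g <!before m> $ /;

for <hgm gmh> -> $s {
    # `so` turns the Match-or-Nil result into True/False.
    say $s, ' grouped:   ', so $s ~~ $grouped;
    say $s, ' ungrouped: ', so $s ~~ $ungrouped;
}
```

If I read the alternation correctly, hgm should fail the grouped regex (the g is followed by an m) but succeed ungrouped, because the first alternative is already satisfied by the h at the start of the string.
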
> Hope that helps
> - Timo
> On 17/05/18 12:51, Timo Paulssen wrote:
> Character classes are fundamentally the wrong thing for "phrases", since
> they describe only a single character.
> Your current regex (before changing [gm] to ["gm"]) was expressing "from
> the start of the string, there's any amount of characters d through z (but
> neither g nor m) and then the end of the string", which can be more easily
> expressed as "the whole string contains only letters d through z (but
> neither g nor m)".
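
The earlier regex is not quoted in this message, so the following is only a guess at its shape, assuming it subtracted both g and m from the class. It shows why that version rejects every m, not just an m that follows a g.

```raku
# Assumed reconstruction of the character-class-only regex; the original
# is not shown in the thread, so treat this as hypothetical.
my $only-chars = / ^ <[d..z]-[gm]>* $ /;

for <hi dim dog> -> $s {
    # dim has no "gm" in it, yet the class still refuses the m.
    say $s, ': ', so $s ~~ $only-chars;
}
```
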
> What you apparently want is "the whole string contains only letters d
> through z, but never the phrase 'gm'", which - in order to get to a working
> regex - we can rephrase as "the whole string contains only letters d
> through z and no occurrence of g is followed by an m". Let's turn that into
> a regex:
> /^                  # Require the match to start at the beginning of the
>                     # string so nothing can sneak in before that.
>     [               # Everything in this group will be matched a bunch
>                     # of times.
>     | <[d..z]-[g]>  # either anything between d and z, with no
>                     # further restrictions, except for g.
>     | g <!before m> # If there's a g, it must not be followed
>                     # by an m.
>     ]*              # end of the group, allow the things in the group to
>                     # occur any amount of times.
> $/                  # Require the match to end at the end of the string,
>                     # so nothing at the end can sneak in.
> Important things to note here:
> - <!before m> (spoken as "do not match before an m") is also satisfied
>   when the g is the last character of the string.
> - We no longer subtract m from the character class, only g, because m
>   can appear in the string without restrictions; if there is an m after
>   a g, our regex will already have failed before it even reaches the m,
>   and all other cases are fine (like dm or fm or hm).
> - You are allowed to put a | not only between things, but also at the
>   very front. This is allowed in the syntax so that you can line things
>   up vertically like I did. Think of it as similar to allowing a , after
>   the last element in a list, like with [1, 2, 3, 4, ]
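
A quick way to check the last point is to write the group with and without the leading | and compare the results. This is a minimal sketch; the variable names and test strings are mine, not from the message.

```raku
# Two spellings of the same group: with and without the leading |.
my $with-leading    = / ^ [ | <[d..z]-[g]> | g <!before m> ]* $ /;
my $without-leading = / ^ [   <[d..z]-[g]> | g <!before m> ]* $ /;

for <hi fog gm> -> $s {
    # Both columns are expected to agree for every input.
    say $s, ': ', so($s ~~ $with-leading), ' ', so($s ~~ $without-leading);
}

# The trailing-comma analogy: the extra comma adds no fifth element.
my @list = [1, 2, 3, 4, ];
say @list.elems;
```
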
> Match: ｢hi｣
> Match: Nil
> Match: ｢fog｣
> Match: ｢dm｣
> Match: ｢fm｣
> Match: ｢hm｣
> Match: Nil
> Match: ｢rofl｣
> Match: ｢dddddddddddg｣
> Match: ｢gggggggggggg｣
> Match: ｢mmmmmmmm｣
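
The Match lines above look like the output of a test loop whose code did not survive in the quote. Output of that shape can be produced with something like the sketch below. The input list is made up for illustration: in particular, gm and ogm are only my stand-ins for whatever strings produced the two Nil lines.

```raku
# The regex from the message, stored in a variable ($no-gm is my name for it).
my $no-gm = /
    ^                    # start of string
    [
        | <[d..z]-[g]>   # any letter d..z except g
        | g <!before m>  # a g, as long as no m follows it
    ]*
    $                    # end of string
/;

# Hypothetical inputs; the original list is not shown in the thread.
for <hi gm fog dm ogm rofl> -> $s {
    # A successful smartmatch prints the Match, a failed one prints Nil.
    say "Match: ", $s ~~ $no-gm;
}
```
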
> Hope that helps!
> - Timo
Norman Gaywood, Computer Systems Officer
School of Science and Technology
University of New England
Armidale NSW 2351, Australia