On 05/18/2018 04:30 AM, The Sidhekin wrote:
On Thu, May 17, 2018 at 12:51 PM, Timo Paulssen <t...@wakelift.de> wrote:
character classes are fundamentally the wrong thing for "phrases",
since they describe only a character.
You were right the first time.
Your current regex (before changing [gm] to ["gm"]) was expressing
"from the start of the string, there's any number of characters d
through z (but neither g nor m) and then the end of the string",
which can be more easily expressed as "the whole string contains
only letters d through z (but neither g nor m)".
What you apparently want is "the whole string contains only letters
d through z, but never the phrase 'gm'", which - in order to get to
a working regex - we can rephrase as "the whole string contains only
letters d through z and no occurrence of g is followed by an m".
Let's turn that into a regex:
    /^                    # Require the match to start at the beginning of the
                          # string so nothing can sneak in before that.
      [                   # Everything in this group will be matched a bunch
                          # of times.
        | <[d..z]-[g]>    # Either anything between d and z, with no
                          # further restrictions, except for g.
        | g <!before m>   # If there's a g, it must not be followed
                          # by an m.
      ]*                  # End of the group; allow the things in the group to
                          # occur any number of times.
    $/                    # Require the match to end at the end of the string,
                          # so nothing at the end can sneak in.
Somewhat interesting exercise, but this kind of rephrasing doesn't
scale well for longer phrases, is hard to automate (given a phrase,
which character classes need we use?), and it's already pretty hard to read.
If this requirement needs to be expressed in a single regex, here's
what I'd use (quickly translated from Perl5, then tested to get rid of
translation errors):
/<!before .* gm> ^ <[d..z]>* $/
... or, with comments:
    / <!before .* gm>   # not containing the phrase "gm" anywhere from here,
      ^ <[d..z]>* $     # match the whole string, containing only letters d
                        # through z
    /
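
As a quick sanity check, the lookahead regex can be compared with the earlier character-class version on the same inputs. This is only a sketch: the variable names and the sample words are invented for illustration, and a handful of strings is not a proof that the two regexes are equivalent.

```raku
# Sketch only: names and sample words are illustrative, not from the thread.
my $by-lookahead = / <!before .* gm> ^ <[d..z]>* $ /;
my $by-classes   = / ^ [ <[d..z]-[g]> | g <!before m> ]* $ /;

for <def egg mug gm dgmz abc> -> $word {
    my $look  = ($word ~~ $by-lookahead).so;   # True if the lookahead version matches
    my $class = ($word ~~ $by-classes).so;     # True if the alternation version matches
    say $word, ' => ', $look,
        ($look === $class ?? ' (both agree)' !! ' (they differ)');
}
```

The lookahead form keeps the "no gm" rule separate from the "only d through z" rule, so a longer forbidden phrase only changes the text inside <!before ...> and leaves the character class alone.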
Eirik
Thank you!