Hi,
> I think these questions are better suited to https://www.reddit.com/r/regex
I want the answer to be from the perspective of the implementation. Looking at
the link, it looks similar to StackOverflow with all the "write me a regex"
question, instead of "how is this token implemented".
> anyway, I think the /g causes the regex to match all characters. Without that
> it probably just matches only one character, as you can see in regex101.
I am well aware of the effect of `g`. I deliberately use g flag to show all the
matches. I want to test whether my regex it has the same behavior as
/..(*SKIP)(*FAIL)|./g
> The first half of your second assumption is correct, the (*MARK:a) has no
> effect since it is inside an assertion. However (*SKIP:a) has no effect if
> the 'a' is not found, so it is not a (*SKIP) in that case!
Does it mean that (*SKIP:label) looks for the (*MARK:label) in the regex
execution stack to figure out where to bump along to?
Anyway, could anyone please explain the execution trace on regex101: Why is the
first branch tried twice, then it proceeds to the second branch?
Best regards,
Thanh Hong.
-Original Message-
From: Zoltán Herczeg [mailto:hzmes...@freemail.hu]
Sent: Friday, 18 March, 2016 8:11 PM
To: Thanh Hong Dai
Cc: pcre-dev@exim.org
Subject: Re: [pcre-dev] What is the expected behavior of
/(?=..(*MARK:a))(*SKIP:a)(*FAIL)|./g
Hi Thanh,
I think these questions are better suited to https://www.reddit.com/r/regex
anyway, I think the /g causes the regex to match all characters. Without that
it probably just matches only one character, as you can see in regex101. The
first half of your second assumption is correct, the (*MARK:a) has no effect
since it is inside an assertion. However (*SKIP:a) has no effect if the 'a' is
not found, so it is not a (*SKIP) in that case!
Hence your pattern is the same as /(?=..(*MARK:a))(*FAIL)|./ since 'a' is not
found during backtracking.
Because the first alternative is failed, we matches the second, and that
matches to the single character.
Regards,
Zoltan
Thanh Hong Dai írta:
>Hi,
>
>
>
>When testing the behavior of (*SKIP) to understand its underlying
>implementation, I constructed the following regex to verify my
>understanding:
>
>
>
>/(?=..(*MARK:a))(*SKIP:a)(*FAIL)|./g
>
>
>
>Test input:
>
>
>
>aaabbbaa
>
>
>
>With the assumption that (*SKIP) fails the attempt at the current
>starting position and bump along to the position where (*SKIP) is at, I
>anticipated two cases:
>
>
>
>1) (*MARK:a) somehow stores the position, so when (*SKIP) fails the
>current attempt, it bumps along 2 characters ahead.
>
>2) (*MARK:a) is not backtracked into, so when (*SKIP) fails the current
>attempt and bumps along by 1 character as per normal.
>
>
>
>In either case, I expect there can only be at most one match at the
>end, since it's the only place the look-ahead fails.
>
>
>
>However, as it turns out, all characters are matched. Running the
>debugger on regex101 (https://regex101.com/r/dA9tI1/1) reveals that it
>tries the first branch twice, and manages to try the second branch and
>succeeds.
>
>
>
>What is the expected behavior here?
>
>--
>## List details at https://lists.exim.org/mailman/listinfo/pcre-dev
--
## List details at https://lists.exim.org/mailman/listinfo/pcre-dev