Re: [pcre-dev] What is the expected behavior of /(?=..(*MARK:a))(*SKIP:a)(*FAIL)|./g

2016-03-21 Thread Thanh Hong Dai
Dear Zoltan,

Thanks for your reply. That is exactly what I want to know.

Best regards,
Thanh Hong.


-Original Message-
From: Zoltán Herczeg [mailto:hzmes...@freemail.hu] 
Sent: Monday, 21 March, 2016 2:54 PM
To: Thanh Hong Dai <hdth...@tma.com.vn>
Cc: pcre-dev@exim.org
Subject: RE: [pcre-dev] What is the expected behavior of 
/(?=..(*MARK:a))(*SKIP:a)(*FAIL)|./g

>Does it mean that (*SKIP:label) looks for the (*MARK:label) in the regex 
>execution stack to figure out where to bump along to?

Exactly. It searches the last MARK in the regex stack which name matches and 
restart the match from there.

E.g.: when /x(*:a)x(*:a)(*SKIP:a)(*FAIL)|./  matches to xxy, the result is y 
not x. If you delete the second (*:a) the result is x.

One more rule is that no SKIP can go back (to avoid infinite match loops).

>
>Anyway, could anyone please explain the execution trace on regex101: Why is 
>the first branch tried twice, then it proceeds to the second branch?

I know the interpreter matches it twice because it cannot check the MARK label 
stack for the name, and the "not found" is caught when the whole stack is 
reverted. Then, the engine restarts the match, and knows that (*SKIP:label) 
must be ignored (this is just the brief overview of the concept, the 
implementation is a bit more complex). PCRE-JIT does not have this disadvantage.

Regards,
Zoltan


-- 
## List details at https://lists.exim.org/mailman/listinfo/pcre-dev 

Re: [pcre-dev] What is the expected behavior of /(?=..(*MARK:a))(*SKIP:a)(*FAIL)|./g

2016-03-21 Thread Zoltán Herczeg
>Does it mean that (*SKIP:label) looks for the (*MARK:label) in the regex 
>execution stack to figure out where to bump along to?

Exactly. It searches the last MARK in the regex stack which name matches and 
restart the match from there.

E.g.: when /x(*:a)x(*:a)(*SKIP:a)(*FAIL)|./  matches to xxy, the result is y 
not x. If you delete the second (*:a) the result is x.

One more rule is that no SKIP can go back (to avoid infinite match loops).

>
>Anyway, could anyone please explain the execution trace on regex101: Why is 
>the first branch tried twice, then it proceeds to the second branch?

I know the interpreter matches it twice because it cannot check the MARK label 
stack for the name, and the "not found" is caught when the whole stack is 
reverted. Then, the engine restarts the match, and knows that (*SKIP:label) 
must be ignored (this is just the brief overview of the concept, the 
implementation is a bit more complex). PCRE-JIT does not have this disadvantage.

Regards,
Zoltan


-- 
## List details at https://lists.exim.org/mailman/listinfo/pcre-dev 

Re: [pcre-dev] What is the expected behavior of /(?=..(*MARK:a))(*SKIP:a)(*FAIL)|./g

2016-03-21 Thread Thanh Hong Dai
Hi,

> I think these questions are better suited to https://www.reddit.com/r/regex
I want the answer to be from the perspective of the implementation. Looking at 
the link, it looks similar to StackOverflow with all the "write me a regex" 
question, instead of "how is this token implemented".

> anyway, I think the /g causes the regex to match all characters. Without that 
> it probably just matches only one character, as you can see in regex101.
I am well aware of the effect of `g`. I deliberately use g flag to show all the 
matches. I want to test whether my regex it has the same behavior as 
/..(*SKIP)(*FAIL)|./g

> The first half of your second assumption is correct, the (*MARK:a) has no 
> effect since it is inside an assertion. However (*SKIP:a) has no effect if 
> the 'a' is not found, so it is not a (*SKIP) in that case!

Does it mean that (*SKIP:label) looks for the (*MARK:label) in the regex 
execution stack to figure out where to bump along to?

Anyway, could anyone please explain the execution trace on regex101: Why is the 
first branch tried twice, then it proceeds to the second branch?

Best regards,
Thanh Hong.

-Original Message-
From: Zoltán Herczeg [mailto:hzmes...@freemail.hu] 
Sent: Friday, 18 March, 2016 8:11 PM
To: Thanh Hong Dai <hdth...@tma.com.vn>
Cc: pcre-dev@exim.org
Subject: Re: [pcre-dev] What is the expected behavior of 
/(?=..(*MARK:a))(*SKIP:a)(*FAIL)|./g

Hi Thanh,

I think these questions are better suited to https://www.reddit.com/r/regex

anyway, I think the /g causes the regex to match all characters. Without that 
it probably just matches only one character, as you can see in regex101. The 
first half of your second assumption is correct, the (*MARK:a) has no effect 
since it is inside an assertion. However (*SKIP:a) has no effect if the 'a' is 
not found, so it is not a (*SKIP) in that case!

Hence your pattern is the same as /(?=..(*MARK:a))(*FAIL)|./ since 'a' is not 
found during backtracking.

Because the first alternative is failed, we matches the second, and that 
matches to the single character.

Regards,
Zoltan

Thanh Hong Dai <hdth...@tma.com.vn> írta:
>Hi,
>
> 
>
>When testing the behavior of (*SKIP) to understand its underlying 
>implementation, I constructed the following regex to verify my
>understanding:
>
> 
>
>/(?=..(*MARK:a))(*SKIP:a)(*FAIL)|./g
>
> 
>
>Test input:
>
> 
>
>aaabbbaa
>
> 
>
>With the assumption that (*SKIP) fails the attempt at the current 
>starting position and bump along to the position where (*SKIP) is at, I 
>anticipated two cases:
>
> 
>
>1)  (*MARK:a) somehow stores the position, so when (*SKIP) fails the
>current attempt, it bumps along 2 characters ahead.
>
>2)  (*MARK:a) is not backtracked into, so when (*SKIP) fails the current
>attempt and bumps along by 1 character as per normal.
>
> 
>
>In either case, I expect there can only be at most one match at the 
>end, since it's the only place the look-ahead fails.
>
> 
>
>However, as it turns out, all characters are matched. Running the 
>debugger on regex101 (https://regex101.com/r/dA9tI1/1) reveals that it 
>tries the first branch twice, and manages to try the second branch and 
>succeeds.
>
> 
>
>What is the expected behavior here?
>
>--
>## List details at https://lists.exim.org/mailman/listinfo/pcre-dev



-- 
## List details at https://lists.exim.org/mailman/listinfo/pcre-dev 

Re: [pcre-dev] What is the expected behavior of /(?=..(*MARK:a))(*SKIP:a)(*FAIL)|./g

2016-03-19 Thread Zoltán Herczeg
Hi Thanh,

I think these questions are better suited to https://www.reddit.com/r/regex

anyway, I think the /g causes the regex to match all characters. Without that 
it probably just matches only one character, as you can see in regex101. The 
first half of your second assumption is correct, the (*MARK:a) has no effect 
since it is inside an assertion. However (*SKIP:a) has no effect if the 'a' is 
not found, so it is not a (*SKIP) in that case!

Hence your pattern is the same as /(?=..(*MARK:a))(*FAIL)|./ since 'a' is not 
found during backtracking.

Because the first alternative is failed, we matches the second, and that 
matches to the single character.

Regards,
Zoltan

Thanh Hong Dai  írta:
>Hi,
>
> 
>
>When testing the behavior of (*SKIP) to understand its underlying
>implementation, I constructed the following regex to verify my
>understanding:
>
> 
>
>/(?=..(*MARK:a))(*SKIP:a)(*FAIL)|./g
>
> 
>
>Test input:
>
> 
>
>aaabbbaa
>
> 
>
>With the assumption that (*SKIP) fails the attempt at the current starting
>position and bump along to the position where (*SKIP) is at, I anticipated
>two cases:
>
> 
>
>1)  (*MARK:a) somehow stores the position, so when (*SKIP) fails the
>current attempt, it bumps along 2 characters ahead.
>
>2)  (*MARK:a) is not backtracked into, so when (*SKIP) fails the current
>attempt and bumps along by 1 character as per normal.
>
> 
>
>In either case, I expect there can only be at most one match at the end,
>since it's the only place the look-ahead fails.
>
> 
>
>However, as it turns out, all characters are matched. Running the debugger
>on regex101 (https://regex101.com/r/dA9tI1/1) reveals that it tries the
>first branch twice, and manages to try the second branch and succeeds.
>
> 
>
>What is the expected behavior here?
>
>-- 
>## List details at https://lists.exim.org/mailman/listinfo/pcre-dev 


-- 
## List details at https://lists.exim.org/mailman/listinfo/pcre-dev