https://bugs.exim.org/show_bug.cgi?id=2182
Bug ID: 2182
Summary: Lookahead behaving as though a match succeeded in a null-matching repeated group
Product: PCRE
Version: 8.41
Hardware: x86
OS: Windows
Status: NEW
Severity: bug
Priority: medium
Component: Code
Assignee: p...@hermes.cam.ac.uk
Reporter: tattara...@gmail.com
CC: pcre-dev@exim.org

This is a bug report and feature request rolled into one :D

Consider the following simple expression matched against "aab":

    ^(?:(?=(\1?+a))(?=aab).){1,2}

\1 then = "aa", whereas only "a" is expected. The first round succeeds, then in the second only the first lookahead matches. You would therefore expect any environmental changes brought on by matching the first lookahead (such as backreference setting) to be reset.

If you allow the group to consume a character, you get the expected result:

    ^(?:(?=(\1?+a))(?=aab).){1,2}

Now \1 = "a".

This brings me to my feature request: the reason we use such constructs is that, currently, a group quantified with + or * (i.e. potentially endlessly) stops matching as soon as an empty string is matched. This makes sense; the engine is looking after us and trying to ensure we don't end up continuously matching empty strings until the end of time.

However, is it possible to tweak this safeguard slightly so that if the state of the environment has changed since the last round, such as if one or more backreference values have changed, we can trust that the user knows what they're doing and continue matching? Or perhaps introduce a more explicit way to invoke this highly desirable behaviour?

If you let us quantify null-matching groups endlessly, you will open up possibilities that can otherwise only be accomplished generally using variable-length lookbehinds (which, of course, are not supported). Think of all the people who so desire VLLBs who would be delighted with this feature :P And I'm guessing it wouldn't require nearly as much work as actually implementing them.

Thank you for your time.
- John "jaytea" Tattarakis
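
To make the feature request concrete, here is a minimal Python sketch of the proposed rule. It is a toy model and not PCRE's implementation: repeat_group, grow_capture, continue_on_state_change and max_rounds are hypothetical names invented for illustration, and grow_capture only imitates by hand what a lookahead such as (?=(\1?+a)) would do. The sketch assumes that a failed round leaves the captures from the last successful round untouched, which is the behaviour the bug report expects.

```python
def grow_capture(subject, pos, captures):
    """Hand-written stand-in for the lookahead (?=(\\1?+a)).

    Returns (new_pos, new_captures) on success, or None on failure.
    Being a lookahead, it never advances the position.
    """
    prev = captures.get(1)
    cursor = pos
    # Possessive optional backreference: take \1 if it matches, never give it back.
    if prev is not None and subject.startswith(prev, cursor):
        cursor += len(prev)
    if not subject.startswith("a", cursor):
        return None  # the round fails; the caller keeps the old captures
    return pos, {**captures, 1: subject[pos:cursor + 1]}


def repeat_group(body, subject, pos, captures,
                 continue_on_state_change=False, max_rounds=1000):
    """Repeat `body` like an unbounded quantified group.

    Default: stop after the first round that matches the empty string
    (the existing safeguard described above).
    With continue_on_state_change=True (the hypothetical proposed rule):
    keep going after an empty round if the captures changed.
    max_rounds is an assumed hard cap so the toy always terminates.
    """
    rounds = 0
    while rounds < max_rounds:
        result = body(subject, pos, captures)
        if result is None:
            break  # failed round: state stays as it was before the round
        new_pos, new_captures = result
        rounds += 1
        empty = new_pos == pos
        changed = new_captures != captures
        pos, captures = new_pos, new_captures
        if empty and not (continue_on_state_change and changed):
            break
    return pos, captures, rounds


if __name__ == "__main__":
    print(repeat_group(grow_capture, "aab", 0, {}))
    print(repeat_group(grow_capture, "aab", 0, {}, continue_on_state_change=True))
```

With the default safeguard the loop stops after one empty round, so group 1 holds "a". With the hypothetical flag it runs a second round and group 1 grows to "aa"; the third round fails and leaves that value in place. A real engine would also need a bound or cycle check for groups whose captures keep changing, which the max_rounds cap only crudely approximates.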