ID: 14893
User updated by: [EMAIL PROTECTED]
Reported By: [EMAIL PROTECTED]
Status: Bogus
Bug Type: PCRE related
Operating System: SunOS
PHP Version: 4.1.1
New Comment:

The bug is in PCRE, as the category states -- I am merely bringing to the attention of the PHP developers that this bug exists in the regex engine it employs. I have contacted the author of PCRE, and he'll fix it when the next version of PCRE is released.

As for why it is properly a bug: "ab1b" =~ /(.*)\d+\1/ should match as follows (assuming absolutely no optimizations are done):

    [] [ab1b]    OPEN 1
    [] [ab1b]    STAR ANY
    [ab1b] []    CLOSE 1
    [ab1b] []    PLUS DIGIT    fail
    [ab1] [b]    CLOSE 1
    [ab1] [b]    PLUS DIGIT    fail
    [ab] [1b]    CLOSE 1
    [ab] [1b]    PLUS DIGIT
    [ab1] [b]    REF 1         fail
    [a] [b1b]    CLOSE 1
    [a] [b1b]    PLUS DIGIT    fail
    start over
    [a] [b1b]    OPEN 1
    [a] [b1b]    STAR ANY
    [ab1b] []    CLOSE 1
    [ab1b] []    PLUS DIGIT    fail
    [ab1] [b]    CLOSE 1
    [ab1] [b]    PLUS DIGIT    fail
    [ab] [1b]    CLOSE 1
    [ab] [1b]    PLUS DIGIT
    [ab1] [b]    REF 1
    [ab1b] []    DONE

You can see that this regex should succeed (at least, I hope I've made that clear). The problem is that the PCRE engine optimizes a .* at the beginning of a regex to be implicitly anchored with ^, since it seems obvious that if .* is going to match anywhere, it will end up matching at the beginning of the string. This is perfectly sensible except in the case where that .* is captured and used later in the regex, as my case shows.

Previous Comments:
------------------------------------------------------------------------

[2002-01-27 01:03:39] [EMAIL PROTECTED]

a) Not a PHP bug (but it's good to be aware of this issue; if you wouldn't mind, please send a mail to [EMAIL PROTECTED] with any updates, etc.)

b) Not sure if this is really a bug. The way I read the 1st regex is: read in ab, put that as \1; after a digit, match \1, which is ab; but after the digit there is only b. Whereas in the second example you make the regex non-greedy, so it matches from the beginning of the string and matches the ab from the lookahead assertion.

I could be wrong, but either way it's not a PHP bug ;) If you disagree, please follow up at [EMAIL PROTECTED]

regards,
sterling

------------------------------------------------------------------------

[2002-01-06 17:57:14] [EMAIL PROTECTED]

Here's the problem:

    <? echo preg_match('/(.*)\d+\1/', 'ab1b'); ?>

It fails, but it really shouldn't. You can fool the engine into not having the bug:

    <? echo preg_match('/(?=)(.*)\d+\1/', 'ab1b'); ?>

The bug is thus: a regex that starts with .* can logically be made to start with an implicit anchor to the beginning of the string. However, this optimization can break the success of a regex if the .* is captured (as above) and used later (the back-reference \1). I've contacted the author of the PCRE package.

------------------------------------------------------------------------

--
PHP Development Mailing List <http://www.php.net/>
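
The anchoring optimization described in this report can be modeled outside PCRE. The sketch below uses Python's re module purely as a stand-in backtracking engine. That is an assumption made for illustration: it is not PCRE and says nothing about PCRE's internals. The helper names match_anchored_only and match_any_start are hypothetical. The first tries the pattern only at offset 0, as an implicit ^ would. The second retries at each later offset, which is the "start over" step in the trace.

```python
import re

# Pattern and subject string taken from the bug report.
PATTERN = re.compile(r"(.*)\d+\1")
SUBJECT = "ab1b"


def match_anchored_only(pattern, subject):
    """Model the implicit-^ optimization: attempt the match at offset 0 only."""
    return pattern.match(subject, 0)


def match_any_start(pattern, subject):
    """Model an unoptimized engine: retry the match at every start offset."""
    for start in range(len(subject) + 1):
        # Pattern.match(string, pos) only succeeds if the match begins at pos.
        m = pattern.match(subject, start)
        if m is not None:
            return m
    return None


anchored = match_anchored_only(PATTERN, SUBJECT)
unanchored = match_any_start(PATTERN, SUBJECT)

print(anchored)  # None: at offset 0, \1 would have to be "ab", but only "b" remains
print(unanchored.start(), unanchored.group(0), unanchored.group(1))  # 1 b1b b
```

In this model the match only appears once the engine is allowed to start at offset 1, where the group captures "b". That is why anchoring a leading .* is unsafe when the group is back-referenced later.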