ID:               33334
User updated by:  kloske at tpg dot com dot au
Reported By:      kloske at tpg dot com dot au
Status:           Bogus
Bug Type:         PCRE related
Operating System: Linux
PHP Version:      4.3.10
New Comment:

Okay, found a page on the website which wasn't in my local docs:

http://php.planetmirror.com/manual/en/reference.pcre.pattern.syntax.php

It does mention the double quote thing. I stand corrected.

The other docs probably need a cleanup or something to fix the stuff I mentioned before.


Previous Comments:
------------------------------------------------------------------------

[2005-06-16 13:02:33] kloske at tpg dot com dot au

Look, I don't really care anymore one way or another, because I've now figured out how it all works in enough detail to write useful, stable and correct code. But just for interest's sake, my regex used double quotes because:

1. I needed other escapes and variables in there, which single quotes will not allow, and the alternative was using lots of dot notation, which looked uglier than using double quotes.

2. The documentation of which you speak, where this is apparently documented, http://au.php.net/preg_match, examples 1-3 (the only numbered examples on this documentation page) all use double quotes. As an aside, all three examples are wrong, or at best highly misleading, since they use \b inside double quotes, where PHP processes escapes before the string ever gets to the PCRE code. I ran some tests today, and inside a double quote it's much more correct to use \\b instead of \b. Whilst \b will work since PCRE is smarter than us, when it comes to \\ it won't, because PCRE is also more careful than us and assumes when it sees the resulting \ that we're trying to escape something.

3. I really really wanted to. Single or double quotes, regex is regex. I am sorry if I violated your preference. I should point out that the regex is now 860+ characters long, so it ain't going to be easy to read in single or double quotes. I merely compressed it down and stuck with the format I was using.

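On point 2 above, one way to check what PCRE is actually handed is to compare the strings PHP builds before any matching happens. The following is a minimal sketch, assuming a word-boundary pattern in the style of the manual's examples; the pattern and subject are illustrative and not quoted from this report. In a double-quoted string PHP leaves the backslash in place when the next character does not form an escape sequence it recognises, so "\b" and "\\b" arrive at PCRE as the same two characters, while "\\" always collapses to a single backslash.

```php
<?php
// Illustrative pattern written three ways (names are hypothetical).
$double  = "/\bweb\b/i";    // \b is not a PHP escape, so the backslash survives
$escaped = "/\\bweb\\b/i";  // \\ collapses to \, giving the same bytes
$single  = '/\bweb\b/i';    // single quotes leave \b untouched

var_dump($double === $single);   // bool(true)
var_dump($escaped === $single);  // bool(true)

// The lengths show which backslashes PHP consumed.
var_dump(strlen("\b"));   // int(2): backslash and b
var_dump(strlen("\\b"));  // int(2): backslash and b
var_dump(strlen("\\"));   // int(1): a single backslash

// All three spellings therefore behave identically in preg_match().
$matched = preg_match($double, "PHP is the web scripting language of choice.");
var_dump($matched);       // int(1)
```

Writing \\b is still the safer habit in double quotes, since it does not rely on PHP leaving unrecognised escapes alone.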
In spite of all this, I couldn't find anywhere in the PHP docs where they specifically mention the stuff about backslash escaping, and as I mentioned in point 2 above, far from it: their examples in fact mislead.

------------------------------------------------------------------------

[2005-06-15 18:58:13] [EMAIL PROTECTED]

It would be a hell of a lot easier to read your regexes if you would use single quotes, e.g.

    $r = '/^"([^\\"]|\\\\|\\")*"$/';
    $s = '"some text","test \\"';
    preg_match($r, $s, $m);
    var_dump($m);

for your above example. And this stuff is documented.

------------------------------------------------------------------------

[2005-06-15 12:03:40] kloske at tpg dot com dot au

Okay, the PCRE people have gotten back to me, and PCRE has proven to produce the correct expected behavior and my test case has not failed. So now we're left with a test case which fails in PHP yet works on PCRE.

For a more stark example, consider the following PHP code:

    $r = "/^\"([^\\\"]|\\\\|\\\")*\"\$/";
    $s = "\"some text\",\"test \\\"";
    preg_match($r, $s, $m);
    var_dump($m);

$m should be empty, since $s does not match $r, yet the following is returned:

    array(2) {
      [0]=>
      string(20) ""some text","test \""
      [1]=>
      string(1) "\"
    }

Note that the last element of the array contains a single backslash, indicating that the last choice that matched was a backslash, which is NOT ONE OF THE THREE CHOICES.

So, the PCRE people explained that they were not familiar with PHP but wondered if it is an escaping issue. Does PHP require you to DOUBLE escape regex? I.e., to match a sequence of two backslashes in a row, do you need to write "\\\\\\\\"? I've tried doing this and it seems to give the expected behavior, yet the manual does not mention this fact, and worse, the user comments seem to indicate that you should not double escape (since no one is trying to do two backslashes in a row anywhere).

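The two layers of escaping asked about here can be made visible by printing the pattern after PHP has parsed the string literal, which is the text PCRE receives. The sketch below reuses $r and $s from the test case; $strict, $two and the other added names are hypothetical, introduced only for illustration. It assumes the intent was "no bare backslash or quote inside the field, except as \\ or \"". Once PHP has halved the backslashes, the class reads [^\"], which to PCRE means "anything but a quote", so a lone backslash is accepted by the first alternative.

```php
<?php
// Layer 1: PHP parses the double-quoted literals from the test case.
$r = "/^\"([^\\\"]|\\\\|\\\")*\"\$/";
$s = "\"some text\",\"test \\\"";

echo $r, "\n";  // /^"([^\"]|\\|\")*"$/
echo $s, "\n";  // "some text","test \"

// Layer 2: PCRE reads \" inside the class as an escaped quote, so the
// class excludes only the quote and a backslash can match it.
$matched = preg_match($r, $s, $m);
var_dump($matched);  // int(1)
var_dump($m[1]);     // string(1) "\"

// Hypothetical stricter pattern: every backslash PCRE should treat as
// literal is written \\ for PCRE, and each of those is doubled again
// for the PHP string literal.
$strict = '/^"([^\\\\"]|\\\\\\\\|\\\\")*"$/';
echo $strict, "\n";  // /^"([^\\"]|\\\\|\\")*"$/

$strictMatched = preg_match($strict, $s);
var_dump($strictMatched);  // int(0)

// Eight backslashes in source become four in the string, which PCRE
// reads as two literal backslashes.
$two = "/\\\\\\\\/";
var_dump(strlen("\\\\\\\\"));          // int(4)
var_dump(preg_match($two, 'a\\\\b'));  // int(1): subject holds two backslashes
var_dump(preg_match($two, 'a\\b'));    // int(0): subject holds one backslash
```

Echoing the pattern before passing it to preg_match() is a cheap way to confirm which layer consumed which backslash.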
I'd say this is a documentation deficiency more than anything, since it should be made clear that you need to escape the string first, which then will need to be escaped again for correct interpretation by PCRE if you are trying to include a literal backslash, or in other situations where PCRE needs to escape things.

To recap, this is what you apparently need to write in PHP to match a literal of two backslashes next to each other:

    "\\\\\\\\"

Gotta love it! Because: the number of backslashes is halved when PHP encodes it as a string, then it passes it literally to PCRE, which halves the number of backslashes again, to the final figure of two backslashes! Simple when you understand, not even hinted at in the PHP documentation.

------------------------------------------------------------------------

[2005-06-15 11:22:32] kloske at tpg dot com dot au

As a simpler test case, this literal text string:

    "test","string\"

matches the following REGEX pattern:

    ^"([^\"]|\\|\")*"$

Reversing the sense of REGEX to being a pattern GENERATOR, there is no way for that REGEX pattern to generate the string above.

I've reported this to the PCRE people and will keep you all posted as to the reply.

------------------------------------------------------------------------

[2005-06-15 01:18:47] kloske at tpg dot com dot au

Thank you for that information - it is much appreciated. I will take this up with the PCRE people, as I still believe this to be incorrect behavior.

FYI, the documentation I was reading was the regex man pages on both Solaris and Linux. My peers were people who've studied regular expressions (as have I), and agreed that, based on the definitions we've all seen in our respective studies (though none of us have studied PCRE specifically as an implementation), the behavior we saw was a violation of matching conditions, as specified in the test case's regular expression.

ie: based on your greedy quote from the PCRE pages, I do not want it to match a minimum number of times, I want it to match as much as possible. Note the word possible; this regex did not allow it to match as much as it did - IE, it became very greedy indeed, to the point of matching text it wasn't allowed to!

------------------------------------------------------------------------

The remainder of the comments for this report are too long. To view the rest of the comments, please view the bug report online at
    http://bugs.php.net/33334