From:             kloske at tpg dot com dot au
Operating system: Linux
PHP version:      4.3.10
PHP Bug Type:     Regexps related
Bug description:  Matching explicitly excluded characters

Description:
------------
Whilst trying to get a regular expression of more than 600 characters to correctly match input lines from a file, I discovered some strange mismatching. At first I assumed it was a bug in my regex string, until I reduced it to the simple test case included below.

The test case shows a regex which should match lines that contain two fields, separated by a comma. Each field is identical and can either be a string that does not start with a quote or a comma and contains no commas, OR a string that starts with a quote, ends with a quote, and may contain quotes or backslashes only when they are escaped with a preceding backslash. I.e.: two fields, separated by a comma, each of which may only be a simple string or a C-style escaped string.

Note that in my expected output I am making an educated guess as to what the actual output would be; some of the other fields printed might be a little different. The basics of the problem, however, are clearly demonstrated.

The final thing to note is that if I exclude quotes from the middle or end of the unquoted string case, the problem vanishes. This leads me to suspect the problem is somehow related to the regex engine's handling of quotes.

Even if there are problems with my regex (I am well aware it is not optimal or particularly "good" in any sense; be aware this is a cut-down test case only), this example clearly demonstrates PHP's regex engine matching a string which contains characters that are clearly excluded by the pattern it matches.

I've tested this with one field and it doesn't appear to be a problem there; it seems to only affect two fields one after another.

Reproduce code:
---------------
<?php
// Input line: two quoted fields; the second contains an escaped quote
// followed by a comma.
$s = '"some text","test \",thing"';

// One field: either a quoted C-style string or an unquoted string
// that contains no commas.
$r_text = "(\"(([^\\\"]|\\\\|\\\")*)\"|[^\",][^,]*)";

// Two such fields separated by a comma, anchored to the whole line.
$r_twofields = "${r_text},${r_text}";
preg_match("/^${r_twofields}\$/", $s, $line);

echo "<pre>";
echo $s . "\n";
echo $r_twofields . "\n";
var_dump($line);
echo "</pre>";
?>

Expected result:
----------------
"some text","test \",thing"
("(([^\"]|\\|\")*)"|[^",][^,]*),("(([^\"]|\\|\")*)"|[^",][^,]*)
array(5) {
  [0]=>
  string(27) ""some text","test \",thing""
  [1]=>
  string(20) ""some text","test \""
  [2]=>
  string(18) "some text","test \"
  [3]=>
  string(1) "\"
  [4]=>
  string(6) "thing""
}

Actual result:
--------------
"some text","test \", thing"
("(([^\"]|\\|\")*)"|[^",][^,]*),("(([^\"]|\\|\")*)"|[^",][^,]*)
array(5) {
  [0]=>
  string(28) ""some text","test \", thing""
  [1]=>
  string(20) ""some text","test \""
  [2]=>
  string(18) "some text","test \"
  [3]=>
  string(1) "\"
  [4]=>
  string(7) " thing""
}

-- 
Edit bug report at http://bugs.php.net/?id=33334&edit=1
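
The field pattern in this report passes through two escaping layers before the regex engine sees it: PHP's double-quoted string syntax first, then PCRE's own backslash rules. The following is a minimal sketch for comparing the pattern as written against a variant with one more level of escaping. It is not a confirmed diagnosis. It assumes the intent was for the quoted body to exclude backslashes as well as quotes, and the names $loose, $strict and field_pair() are hypothetical, introduced only for illustration.

```php
<?php
// Illustrative sketch only; $loose, $strict and field_pair() are
// hypothetical names that do not appear in the original report.

// Same input as the reproduce code (27 characters).
$s = '"some text","test \",thing"';

// Field pattern exactly as written in the report. After PHP unescapes the
// double-quoted literal, PCRE receives ("(([^\"]|\\|\")*)"|[^",][^,]*).
// In PCRE, [^\"] excludes only the quote, \\ is one backslash, and \" is
// a bare quote.
$loose = "(\"(([^\\\"]|\\\\|\\\")*)\"|[^\",][^,]*)";

// Assumed intent: the class excludes both backslash and quote, and the
// two-character sequences \\ and \" are accepted as escapes. After PHP
// unescapes this single-quoted literal, PCRE receives
// ("(([^\\"]|\\\\|\\")*)"|[^",][^,]*).
$strict = '("(([^\\\\"]|\\\\\\\\|\\\\")*)"|[^",][^,]*)';

// Match two comma-separated fields against the whole subject.
// '~' is the delimiter, so nothing inside the pattern needs extra escaping.
function field_pair($field, $subject)
{
    preg_match('~^' . $field . ',' . $field . '$~', $subject, $m);
    return $m;
}

$loose_m  = field_pair($loose, $s);
$strict_m = field_pair($strict, $s);

// Show what PCRE actually receives, then the captured fields.
echo $loose, "\n", $strict, "\n";
echo "loose  field 1: ", $loose_m[1], "\n";
echo "strict field 1: ", $strict_m[1], "\n";
echo "strict field 2: ", $strict_m[4], "\n";
```

Printing both patterns before matching makes it easier to see which backslashes survive PHP's string parsing. If the first field captured by the pattern as written extends past the first closing quote while the variant stops there, the difference lies in the escaping of the pattern rather than in how two fields are combined.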