From:             kloske at tpg dot com dot au
Operating system: Linux
PHP version:      4.3.10
PHP Bug Type:     Regexps related
Bug description:  Matching explicitly excluded characters

Description:
------------
Whilst trying to get a > 600 character regular expression to correctly
match input lines from a file I discovered some strange mismatching which
at first I imagined was a bug in my regex string until I reduced it to the
simple test case included below.

The test case shows a regex which should match lines that contain two
fields, separated by a comma. Each field is identical and can either be a
string that does not start with a quote or a comma and contains no commas
OR a string that starts and ends with a quote, in which any quote or
backslash inside must be escaped with a preceding backslash. I.e. two
fields, each either a simple string or a C-style escaped string, separated
by a comma.
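
For reference, here is a minimal sketch of one possible reading of the
field grammar described above. It is not the pattern from the test case.
It assumes nowdoc syntax (PHP 5.3 or later, so it would not run on 4.3.10)
so that the text reaches PCRE verbatim, and split_two_fields() is a
hypothetical helper named only for this illustration.

```php
<?php
// Sketch only. Nowdoc passes every backslash below to PCRE unchanged.
// Quoted field:   " then (non-quote, non-backslash | \\ | \")* then "
// Unquoted field: first char is not a quote or comma, then no commas
$r_field = <<<'RE'
("(?:[^"\\]|\\[\\"])*"|[^",][^,]*)
RE;

// Hypothetical helper: returns the two fields, or null when the line
// does not consist of exactly two fields.
function split_two_fields($r_field, $s)
{
    if (preg_match('/^' . $r_field . ',' . $r_field . '$/', $s, $m) === 1) {
        return array($m[1], $m[2]);
    }
    return null;
}

var_dump(split_two_fields($r_field, '"some text","test \",thing"'));
var_dump(split_two_fields($r_field, 'plain,"a,b"'));
var_dump(split_two_fields($r_field, '"a","b",c')); // three fields: NULL
```

Under this reading the first call yields the fields "some text" and
"test \",thing" (quotes included), because the escaped quote and the comma
after it both sit inside the second quoted field.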

Note that my expected output is an educated guess at what the actual
output would be; some of the other fields printed might be a little
different. The basics of the problem, however, are clearly demonstrated.

The final thing to note is that if I exclude quotes from the middle or end
of the unquoted string case the problem vanishes. This leads me to suspect
the problem is somehow related to regex's handling of quotes.

Even if there are problems with my regex (I am well aware it is not
optimal or particularly "good" in any sense; be aware this is a cut-down
test case only), this example clearly demonstrates PHP's regex engine
matching a string which contains characters that are explicitly excluded
by the part of the pattern which matches them.
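
One way to narrow this down is to test individual pieces of the pattern,
as the script prints it, in isolation. The sketch below does that for the
character class and the two escape alternatives. It assumes nowdoc syntax
(PHP 5.3 or later) so that each pattern reaches PCRE exactly as written;
the variable names are made up for this illustration.

```php
<?php
// The character class exactly as it appears in the printed pattern.
$class = <<<'RE'
/^[^\"]$/
RE;

// Inside a class, PCRE reads \" as a plain quote, so only " is excluded.
var_dump(preg_match($class, '\\')); // subject is one backslash: int(1)
var_dump(preg_match($class, ','));  // int(1)
var_dump(preg_match($class, '"'));  // int(0)

// The two remaining alternatives exactly as printed: \\ and \"
$alt = <<<'RE'
/^(\\|\")$/
RE;

var_dump(preg_match($alt, '\\')); // a single backslash: int(1)
var_dump(preg_match($alt, '"'));  // a bare, unescaped quote: int(1)
```

If these results hold on the affected build as well, it would be worth
checking whether the printed pattern excludes backslashes and bare quotes
from the quoted case at all.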

I've tested this with one field and it doesn't appear to be a problem
there - it seems to only affect two fields one after another.

Reproduce code:
---------------
<?php

        $s = '"some text","test \",thing"';

        $r_text = "(\"(([^\\\"]|\\\\|\\\")*)\"|[^\",][^,]*)";
        
        $r_twofields = "${r_text},${r_text}";
        preg_match("/^${r_twofields}\$/", $s, $line);
        
        echo "<pre>";
        echo $s . "\n";
        echo $r_twofields . "\n";
        var_dump($line);
        echo "</pre>";

?>
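
The pattern passes through two layers of escaping: PHP's double-quoted
string rules first, then PCRE's own. As a sketch, the literal from the
script above can be compared against the text PCRE should end up
receiving, written here as a nowdoc (an assumption: nowdoc needs PHP 5.3
or later, and $seen_by_pcre is a name invented for this check).

```php
<?php
// The double-quoted literal from the test case.
$r_text = "(\"(([^\\\"]|\\\\|\\\")*)\"|[^\",][^,]*)";

// The same pattern with PHP's escapes already resolved, taken verbatim.
$seen_by_pcre = <<<'RE'
("(([^\"]|\\|\")*)"|[^",][^,]*)
RE;

// PHP turns \\ into one backslash and \" into a quote before PCRE runs.
var_dump($r_text === $seen_by_pcre); // bool(true)
```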

Expected result:
----------------
"some text","test \",thing"
("(([^\"]|\\|\")*)"|[^",][^,]*),("(([^\"]|\\|\")*)"|[^",][^,]*)
array(5) {
  [0]=>
  string(27) ""some text","test \",thing""
  [1]=>
  string(20) ""some text","test \""
  [2]=>
  string(18) "some text","test \"
  [3]=>
  string(1) "\"
  [4]=>
  string(6) "thing""
}


Actual result:
--------------
"some text","test \", thing"
("(([^\"]|\\|\")*)"|[^",][^,]*),("(([^\"]|\\|\")*)"|[^",][^,]*)
array(5) {
  [0]=>
  string(28) ""some text","test \", thing""
  [1]=>
  string(20) ""some text","test \""
  [2]=>
  string(18) "some text","test \"
  [3]=>
  string(1) "\"
  [4]=>
  string(7) " thing""
}


-- 
Edit bug report at http://bugs.php.net/?id=33334&edit=1
