Edit report at https://bugs.php.net/bug.php?id=55413&edit=1
ID: 55413
Comment by: spidgorny at gmail dot com
Reported by: mathielen at gmail dot com
Summary: str_getcsv doesn't remove escape characters
Status: Open
Type: Bug
Package: Strings related
Operating System: ubuntu 11.04
PHP Version: 5.3.6
Block user comment: N
Private report: N
New Comment:
5.3.10 is affected too. A bug in a primitive function like this after years of
evolution should be embarrassing.
Previous Comments:
------------------------------------------------------------------------
[2012-04-27 03:08:46] darren at dcook dot org
Another way of looking at the code in comment 1 is that the behaviour is
correct (for parsing Excel-style csv), but the documentation is confusing. In
my testing the "" within quotes is being handled correctly (and the $escape
parameter is either not being used, or has not got in my way yet).
But as another viewpoint, if we take the original bug report example and do:
$line = '"A";"Some \"Stuff\"";"C"';
print_r(str_getcsv($line, ';', '"', 'x'));
(BTW, I'm using 'x' to mean no escaping; using a '' uses the default instead!!)
Output is:
Array
(
[0] => A
[1] => Some \Stuff\""
[2] => C
)
This almost makes sense if you consider it treated the second field as three
sub-strings:
"Some \"
Stuff\
""
The problem is that, if that were true, the 3rd sub-string was parsed wrongly:
it should have evaluated to an empty string.
Summary: something is wrong. Either there is a bug to fix, or the $escape
parameter should be removed completely, or the function needs to document the
intended behaviour for corner cases like these.
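To make the two quoting styles discussed above easier to compare, here is a
minimal sketch that runs both inputs through str_getcsv with the same
arguments. The variable names are illustrative only, and the backslash is
passed explicitly as $escape rather than relying on the default.

```php
<?php
// Same delimiter, enclosure and escape for both calls; only the input differs.
$doubled   = str_getcsv('"A";"Some ""Stuff""";"C"', ';', '"', '\\');
$backslash = str_getcsv('"A";"Some \"Stuff\"";"C"', ';', '"', '\\');

// Excel-style doubled enclosure: collapsed to a single quote character.
echo $doubled[1], "\n";

// Backslash-escaped enclosure: the quote is kept inside the field,
// but the backslashes stay in the result (the behaviour reported in this bug).
echo $backslash[1], "\n";
```

On the versions discussed in this report, the first line should print the
field with plain quotes and the second should still contain the backslashes.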
------------------------------------------------------------------------
[2011-11-27 13:58:49] xoneca at gmail dot com
The bug can be reproduced with any escape character except the quote character.
Test script:
---------------
$line = '"A";"Some ""Stuff""";"C"';
$tokens = str_getcsv( $line, ';', '"', '"' );
print_r( $tokens );
Actual and Expected Result:
---------------
Array
(
[0] => A
[1] => Some "Stuff"
[2] => C
)
------------------------------------------------------------------------
[2011-08-12 13:30:02] mathielen at gmail dot com
Description:
------------
Escape-characters should only escape the next character if it is the
delimiter-character. The Escape character itself should then be removed from
the result.
Test script:
---------------
$line = '"A";"Some \"Stuff\"";"C"';
$token = str_getcsv($line, ';', '"', '\\');
print_r($token);
Expected result:
----------------
Array
(
[0] => A
[1] => Some "Stuff"
[2] => C
)
Actual result:
--------------
Array
(
[0] => A
[1] => Some \"Stuff\"
[2] => C
)
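Until the behaviour is changed or documented, one possible workaround is to
strip the escape character after parsing. The sketch below assumes the
expected result above is what the caller wants; parse_csv_unescaped() is a
hypothetical helper, not part of PHP, and it does not handle an escaped
escape character (for example a literal backslash directly before a quote).

```php
<?php
// Hypothetical helper (not a PHP built-in): parse one line with str_getcsv()
// and then remove the escape character wherever it directly precedes the
// enclosure character.
function parse_csv_unescaped($line, $delimiter = ';', $enclosure = '"', $escape = '\\')
{
    $fields = str_getcsv($line, $delimiter, $enclosure, $escape);

    return array_map(
        function ($field) use ($enclosure, $escape) {
            // Leave non-string values (e.g. NULL for an empty line) untouched.
            if (!is_string($field)) {
                return $field;
            }
            return str_replace($escape . $enclosure, $enclosure, $field);
        },
        $fields
    );
}

$line  = '"A";"Some \"Stuff\"";"C"';
$token = parse_csv_unescaped($line);
print_r($token);
```

With the test line from this report, the second element should come out as
Some "Stuff", matching the expected result.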
------------------------------------------------------------------------