Edit report at https://bugs.php.net/bug.php?id=55413&edit=1

 ID:                 55413
 Comment by:         dan dot libby at gmail dot com
 Reported by:        mathielen at gmail dot com
 Summary:            str_getcsv doesnt remove escape characters
 Status:             Open
 Type:               Bug
 Package:            Strings related
 Operating System:   ubuntu 11.04
 PHP Version:        5.3.6
 Block user comment: N
 Private report:     N

 New Comment:

I just ran into this bug also.

I don't know the history and haven't reviewed the str_getcsv() source yet, but 
I am guessing that the *getcsv() functions were originally implemented with 
Excel-style double-quote escaping.  Somehow the escape='\\' param got added to 
the documentation, but seemingly not to the code.

Defaulting to escape='\\', as the documentation says, would potentially break 
apps that depend on escape='"'.  So that would be a breaking change, and a bad 
idea.

But leaving it as supporting only escape='"' is also bad, because it limits the 
utility of the function.  For example, I need to parse Apache logs, and Apache 
only supports escaping with \.  Whoops.

So I believe the correct fix would be to default to escape='"' so we don't 
break apps using it with defaults, but still support explicit use of 
escape='\\'.
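
To make the proposal concrete, here is a minimal userland sketch. It assumes a 
hypothetical helper, parse_csv_line(), which is not part of PHP and is not how 
str_getcsv() is implemented. It defaults to escape='"' and, when escape='\\' is 
passed explicitly, drops the escape character in front of an enclosure.

```php
<?php
// Hypothetical helper for illustration only; not a PHP built-in.
// Splits one CSV line into fields. Inside an enclosed field, the sequence
// <escape><enclosure> yields a literal enclosure and the escape is dropped.
function parse_csv_line($line, $delimiter = ',', $enclosure = '"', $escape = '"')
{
    $fields = array();
    $field = '';
    $inQuotes = false;
    $length = strlen($line);

    for ($i = 0; $i < $length; $i++) {
        $char = $line[$i];
        $next = ($i + 1 < $length) ? $line[$i + 1] : null;

        if ($inQuotes) {
            if ($char === $escape && $next === $enclosure) {
                // Escaped enclosure: keep the enclosure, skip the escape.
                $field .= $enclosure;
                $i++;
            } elseif ($char === $enclosure) {
                // Unescaped enclosure closes the quoted section.
                $inQuotes = false;
            } else {
                $field .= $char;
            }
        } elseif ($char === $enclosure) {
            $inQuotes = true;
        } elseif ($char === $delimiter) {
            $fields[] = $field;
            $field = '';
        } else {
            $field .= $char;
        }
    }
    $fields[] = $field;

    return $fields;
}

// Default (Excel-style) escaping, using the line from an earlier comment.
print_r(parse_csv_line('"A";"Some ""Stuff""";"C"', ';'));

// Explicit backslash escaping, using the line from the original report.
print_r(parse_csv_line('"A";"Some \"Stuff\"";"C"', ';', '"', '\\'));
```

Both calls should give A, Some "Stuff", C.  Note that in this sketch only the 
escape character that was passed in is honoured, so "" inside a field is not 
treated as an escaped quote when escape='\\'.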

Agree?  Disagree?


Previous Comments:
------------------------------------------------------------------------
[2012-05-14 14:30:33] spidgorny at gmail dot com

5.3.10 is affected too. A bug in a primitive function like this after years of 
evolution should be embarrassing.

------------------------------------------------------------------------
[2012-04-27 03:08:46] darren at dcook dot org

Another way of looking at the code in comment 1 is that the behaviour is 
correct (for parsing Excel-style csv), but the documentation is confusing. In 
my testing the "" within quotes is being handled correctly (and the $escape 
parameter is either not being used, or has not got in my way yet).

But as another viewpoint, if we take the original bug report example and do:
  $line = '"A";"Some \"Stuff\"";"C"';
  print_r(str_getcsv($line, ';', '"', 'x'));

(BTW, I'm using 'x' to mean no escaping; using a '' uses the default instead!!)

Output is:

Array
(
    [0] => A
    [1] => Some \Stuff\""
    [2] => C
)

This almost makes sense if you consider it treated the second field as three 
sub-strings:
  "Some \"
  Stuff\
  ""

The problem is, if that were true, the 3rd sub-string got parsed wrongly. The 
3rd sub-string should have evaluated to a blank string.

Summary: something is wrong. Either there is a bug to fix, or the $escape 
parameter should be removed completely, or the function needs to document the 
intended behaviour for corner cases like these.

------------------------------------------------------------------------
[2011-11-27 13:58:49] xoneca at gmail dot com

The bug can be reproduced with any escape character except the quote character.

Test script:
---------------
$line = '"A";"Some ""Stuff""";"C"';
$tokens = str_getcsv( $line, ';', '"', '"' );
print_r( $tokens );

Actual and Expected Result:
---------------
Array
(
    [0] => A
    [1] => Some "Stuff"
    [2] => C
)

------------------------------------------------------------------------
[2011-08-12 13:30:02] mathielen at gmail dot com

Description:
------------
Escape characters should only escape the next character if it is the 
delimiter character. The escape character itself should then be removed from 
the result.

Test script:
---------------
$line = '"A";"Some \"Stuff\"";"C"';
$token = str_getcsv($line, ';', '"', '\\');
print_r($token);

Expected result:
----------------
Array
(
    [0] => A
    [1] => Some "Stuff"
    [2] => C
)


Actual result:
--------------
Array
(
    [0] => A
    [1] => Some \"Stuff\"
    [2] => C
)
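
Possible workaround:
--------------------
Until the behaviour is settled, the escape character can be stripped in a 
second pass. This is only a sketch: str_getcsv_unescaped() is a hypothetical 
wrapper, not a PHP function, and it assumes that str_getcsv() leaves the escape 
character in place as shown in the actual result above.

```php
<?php
// Hypothetical wrapper for illustration only; not a PHP built-in.
// Calls str_getcsv() and then removes the escape character wherever it
// directly precedes the enclosure character in a returned field.
function str_getcsv_unescaped($line, $delimiter = ',', $enclosure = '"', $escape = '\\')
{
    $fields = str_getcsv($line, $delimiter, $enclosure, $escape);

    // With escape == enclosure ("" style) there is nothing left to strip.
    if ($escape === $enclosure) {
        return $fields;
    }

    $result = array();
    foreach ($fields as $field) {
        // str_getcsv() returns array(null) for an empty line; pass null through.
        $result[] = ($field === null)
            ? null
            : str_replace($escape . $enclosure, $enclosure, $field);
    }

    return $result;
}

$line = '"A";"Some \"Stuff\"";"C"';
print_r(str_getcsv_unescaped($line, ';', '"', '\\'));
```

The replacement is naive: it also rewrites a literal backslash that happens to 
sit in front of a quote, so it only suits input where \" always means an 
escaped quote.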


------------------------------------------------------------------------


