Edit report at https://bugs.php.net/bug.php?id=55413&edit=1

 ID:                 55413
 Comment by:         darren at dcook dot org
 Reported by:        mathielen at gmail dot com
 Summary:            str_getcsv doesnt remove escape characters
 Status:             Open
 Type:               Bug
 Package:            Strings related
 Operating System:   ubuntu 11.04
 PHP Version:        5.3.6
 Block user comment: N
 Private report:     N

 New Comment:

Yes, agree:
 1. change docs to say $escape defaults to '"'
 2. Change code to use $escape when it is something else
(NB. IIUC, this won't break backwards compatibility.)


Previous Comments:
------------------------------------------------------------------------
[2012-07-14 07:39:46] dan dot libby at gmail dot com

I just ran into this bug also.

I don't know the history, and haven't reviewed the str_getcsv() source yet but I am guessing that *getcsv() were originally implemented with excel style double-quote escaping. Somehow the escape='\\' param got added to the documentation, but seemingly not the code.

Defaulting escape='\\' as the documentation says would potentially break apps depending on escape='"'. So that would be a breaking change, and a bad idea.

But leaving it as supporting only escape='"' is also bad, because it limits the utility of the function. For example, I need to parse apache logs, and apache only supports escaping with \. whoops.

So I believe the correct fix would be to default to escape='"' so we don't break apps using it with defaults, but still support explicit use of escape='\\'.

agree? disagree?

------------------------------------------------------------------------
[2012-05-14 14:30:33] spidgorny at gmail dot com

5.3.10 is affected too. A bug in a primitive function like this after years of evolution should be embarrassing.

------------------------------------------------------------------------
[2012-04-27 03:08:46] darren at dcook dot org

Another way of looking at the code in comment 1 is that the behaviour is correct (for parsing Excel-style csv), but the documentation is confusing.
In my testing the "" within quotes is being handled correctly (and the $escape parameter is either not being used, or has not got in my way yet).

But as another viewpoint, if we take the original bug report example and do:

$line = '"A";"Some \"Stuff\"";"C"';
print_r(str_getcsv($line, ';', '"', 'x'));

(BTW, I'm using 'x' to mean no escaping; using a '' uses the default instead!!)

Output is:
Array
(
    [0] => A
    [1] => Some \Stuff\""
    [2] => C
)

This almost makes sense if you consider it treated the second field as three sub-strings:
   "Some \"
   Stuff\
   ""

The problem is, if that was true, the 3rd sub-string got parsed wrongly. The 3rd sub-string should have evaluated to a blank string.

Summary: something is wrong. Either there is a bug to fix, or the $escape parameter should be removed completely, or the function needs to document the intended behaviour for corner cases like these.

------------------------------------------------------------------------
[2011-11-27 13:58:49] xoneca at gmail dot com

The bug can be reproduced with any escape character except the quote character.

Test script:
---------------
$line = '"A";"Some ""Stuff""";"C"';
$tokens = str_getcsv( $line, ';', '"', '"' );
print_r( $tokens );

Actual and Expected Result:
---------------
Array
(
    [0] => A
    [1] => Some "Stuff"
    [2] => C
)

------------------------------------------------------------------------
[2011-08-12 13:30:02] mathielen at gmail dot com

Description:
------------
Escape-characters should only escape the next character if it is the delimiter-character. The Escape character itself should then be removed from the result.

Test script:
---------------
$line = '"A";"Some \"Stuff\"";"C"';
$token = str_getcsv($line, ';', '"', '\\');
print_r($token);

Expected result:
----------------
Array
(
    [0] => A
    [1] => Some "Stuff"
    [2] => C
)

Actual result:
--------------
Array
(
    [0] => A
    [1] => Some \"Stuff\"
    [2] => C
)

------------------------------------------------------------------------
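
For illustration only, the behaviour requested in the original report could be sketched in userland PHP roughly as follows. The function name parse_csv_line_sketch is hypothetical and not part of PHP. The sketch handles a single line, has no error handling for unterminated quotes, and assumes that the escape character is special only when it directly precedes the enclosure character inside a quoted field (which is what the report's example shows). It is a guess at the intended semantics, not a description of how str_getcsv() is implemented.

```php
<?php
// Hypothetical helper (not a PHP built-in): splits one CSV line into fields
// and drops the escape character when it precedes the enclosure character.
function parse_csv_line_sketch(
    string $line,
    string $delimiter = ';',
    string $enclosure = '"',
    string $escape = '\\'
): array {
    $fields = [];
    $field = '';
    $inQuotes = false;
    $length = strlen($line);

    for ($i = 0; $i < $length; $i++) {
        $char = $line[$i];
        $next = ($i + 1 < $length) ? $line[$i + 1] : null;

        if ($inQuotes) {
            if ($char === $escape && $next === $enclosure) {
                // Escape followed by enclosure: keep a literal enclosure,
                // drop the escape character, and skip the next character.
                $field .= $enclosure;
                $i++;
            } elseif ($char === $enclosure) {
                // Unescaped enclosure closes the quoted section.
                $inQuotes = false;
            } else {
                $field .= $char;
            }
        } elseif ($char === $enclosure) {
            // Enclosure outside quotes opens a quoted section.
            $inQuotes = true;
        } elseif ($char === $delimiter) {
            // Delimiter outside quotes ends the current field.
            $fields[] = $field;
            $field = '';
        } else {
            $field .= $char;
        }
    }

    // The last field has no trailing delimiter.
    $fields[] = $field;

    return $fields;
}

// Usage with the two inputs from this report:
print_r(parse_csv_line_sketch('"A";"Some \"Stuff\"";"C"', ';', '"', '\\'));
print_r(parse_csv_line_sketch('"A";"Some ""Stuff""";"C"', ';', '"', '"'));
```

With this sketch both calls give the fields A, Some "Stuff" and C, so the Excel-style doubled quote (escape set to '"') and the backslash style (escape set to '\\') are handled by the same rule, which matches the fix proposed in the comments above.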