Edit report at https://bugs.php.net/bug.php?id=55763&edit=1

 ID:                 55763
 Comment by:         alotacents at gmail dot com
 Reported by:        talk at alexmingoia dot com
 Summary:            str_getcsv incorrectly handles line-breaks inside
                     fields
 Status:             Open
 Type:               Bug
 Package:            Strings related
 Operating System:   OS X 10.6
 PHP Version:        5.3.8
 Block user comment: N
 Private report:     N

 New Comment:

To split the string into record lines, I used a regular expression that does 
not split inside double quotes, instead of calling str_getcsv with "\n" as the 
delimiter. Then I ran str_getcsv on each resulting line.

Example:

$s2=<<<EOD
Year,Make,Model,Description,Price
1997,Ford,E350,"ac, abs, moon",3000.00
1999,Chevy,"Venture ""Extended Edition""","",4900.00
1999,Chevy,"Venture ""Extended Edition, Very Large""","",5000.00
1996,Jeep,Grand Cherokee,"MUST SELL!
air, moon roof, loaded",4799.00
EOD;

$lines = preg_split('/[\r\n]{1,2}(?=(?:[^\"]*\"[^\"]*\")*(?![^\"]*\"))/',$s2);
print_r($lines);

It outputs:
Array (
 [0] => Year,Make,Model,Description,Price
 [1] => 1997,Ford,E350,"ac, abs, moon",3000.00
 [2] => 1999,Chevy,"Venture ""Extended Edition""","",4900.00
 [3] => 1999,Chevy,"Venture ""Extended Edition, Very Large""","",5000.00
 [4] => 1996,Jeep,Grand Cherokee,"MUST SELL!
air, moon roof, loaded",4799.00
)

To convert each line further into an array of fields:

$data = array();
foreach($lines as $row) {
 $data[] = str_getcsv($row);
}

print_r($data);

This outputs:

Array (
 [0] => Array (
   [0] => Year
   [1] => Make
   [2] => Model
   [3] => Description
   [4] => Price
 )
 [1] => Array (
   [0] => 1997
   [1] => Ford
   [2] => E350
   [3] => ac, abs, moon
   [4] => 3000.00
 )
 [2] => Array (
   [0] => 1999
   [1] => Chevy
   [2] => Venture "Extended Edition"
   [3] =>
   [4] => 4900.00
 )
 [3] => Array (
   [0] => 1999
   [1] => Chevy
   [2] => Venture "Extended Edition, Very Large"
   [3] => 
   [4] => 5000.00
 )
 [4] => Array (
   [0] => 1996
   [1] => Jeep
   [2] => Grand Cherokee
   [3] => MUST SELL!
air, moon roof, loaded
   [4] => 4799.00
 )
)
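
The two steps above can be folded into one helper. This is only a minimal 
sketch of the workaround, not a fix for str_getcsv itself. The function name 
csv_string_to_rows is made up for illustration, and the code assumes comma 
delimiters and double-quote enclosures, as in the sample data.

```php
<?php
/**
 * Hypothetical helper (not part of PHP): split a CSV string into records
 * with the quote-aware regular expression, then parse each record.
 */
function csv_string_to_rows($csv)
{
    // Split on line breaks only where an even number of double quotes
    // remains in the rest of the string, i.e. outside quoted fields.
    $pattern = '/[\r\n]{1,2}(?=(?:[^"]*"[^"]*")*(?![^"]*"))/';

    // PREG_SPLIT_NO_EMPTY drops the empty piece left by a trailing newline.
    $lines = preg_split($pattern, $csv, -1, PREG_SPLIT_NO_EMPTY);
    if ($lines === false) {
        throw new RuntimeException('preg_split failed on the CSV input');
    }

    $rows = array();
    foreach ($lines as $line) {
        // Each piece is one complete record, so str_getcsv can parse it,
        // including any line breaks inside quoted fields.
        $rows[] = str_getcsv($line, ',', '"', '\\');
    }
    return $rows;
}
```

Calling csv_string_to_rows($s2) with the sample above should give one array 
per record. Note that the lookahead rescans the rest of the string at every 
line break, so large inputs may be slow or may hit pcre.backtrack_limit, in 
which case preg_split returns false.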


Previous Comments:
------------------------------------------------------------------------
[2012-04-27 03:11:17] darren at dcook dot org

The problem can also be shown with the example from the Wikipedia page 
(http://en.wikipedia.org/wiki/Comma-separated_values):

$s2=<<<EOD
Year,Make,Model,Description,Price
1997,Ford,E350,"ac, abs, moon",3000.00
1999,Chevy,"Venture ""Extended Edition""","",4900.00
1999,Chevy,"Venture ""Extended Edition, Very Large""","",5000.00
1996,Jeep,Grand Cherokee,"MUST SELL!
air, moon roof, loaded",4799.00
EOD;

$lines=str_getcsv($s2,"\n");
print_r($lines);

It outputs:
Array
(
    [0] => Year,Make,Model,Description,Price
    [1] => 1997,Ford,E350,"ac, abs, moon",3000.00
    [2] => 1999,Chevy,"Venture ""Extended Edition""","",4900.00
    [3] => 1999,Chevy,"Venture ""Extended Edition, Very Large""","",5000.00
    [4] => 1996,Jeep,Grand Cherokee,"MUST SELL!
    [5] => air, moon roof, loaded",4799.00
)

But it should output:
Array
(
    [0] => Year,Make,Model,Description,Price
    [1] => 1997,Ford,E350,"ac, abs, moon",3000.00
    [2] => 1999,Chevy,"Venture ""Extended Edition""","",4900.00
    [3] => 1999,Chevy,"Venture ""Extended Edition, Very Large""","",5000.00
    [4] => 1996,Jeep,Grand Cherokee,"MUST SELL!
air, moon roof, loaded",4799.00
)
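
For comparison, one way to get record-level results like the expected output 
above is to let fgetcsv read from an in-memory stream, since fgetcsv keeps 
line breaks that fall inside quoted fields. The following is a minimal sketch 
under that assumption. The helper name csv_records_from_string is 
hypothetical, and it returns the parsed fields of each record rather than the 
raw record strings shown above.

```php
<?php
/**
 * Hypothetical helper (not part of PHP): parse a whole CSV string
 * record by record using fgetcsv on a temporary stream.
 */
function csv_records_from_string($csv)
{
    // Write the string to a temporary read/write stream and rewind it.
    $handle = fopen('php://temp', 'r+');
    fwrite($handle, $csv);
    rewind($handle);

    $rows = array();
    // fgetcsv reads one record at a time and returns false at end of input.
    while (($row = fgetcsv($handle, 0, ',', '"', '\\')) !== false) {
        // A blank line comes back as an array holding a single null.
        if ($row === array(null)) {
            continue;
        }
        $rows[] = $row;
    }
    fclose($handle);
    return $rows;
}
```

With the $s2 sample, the last record should keep "MUST SELL!" and "air, moon 
roof, loaded" together in a single Description field.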

------------------------------------------------------------------------
[2011-09-22 16:45:02] talk at alexmingoia dot com

Sorry... expected output should be

array(4) {
  [0]=>
  string(15) "Name,Desc,Email"
  [1]=>
  string(4) "Alex"
  [2]=>
  string(18) "Is a PHP
developer
"
  [3]=>
  string(16) "a...@example.com"
}

------------------------------------------------------------------------
[2011-09-22 16:41:15] talk at alexmingoia dot com

Description:
------------
RFC 4180 states that fields can contain line breaks as long as they are 
enclosed in double quotes.

str_getcsv treats line-breaks inside of enclosed fields as new records in the 
CSV.

Setting 'auto_detect_line_endings' to TRUE or using "\r\n" instead of "\n" still 
produces incorrect results.

Test script:
---------------
$csv = file_get_contents('test.csv');
$csvArray = str_getcsv($csv, "\n");
var_dump($csvArray);

Expected result:
----------------
array(4) {
  [0]=>
  string(15) "Name,Desc,Email"
  [1]=>
  string(4) "Alex"
  [2]=>
  string(18) "Is a PHP developer"
  [3]=>
  string(16) "a...@example.com"
}

Actual result:
--------------
array(4) {
  [0]=>
  string(15) "Name,Desc,Email"
  [1]=>
  string(14) "Alex,"Is a PHP"
  [2]=>
  string(9) "developer"
  [3]=>
  string(17) ",a...@example.com"
}


------------------------------------------------------------------------


