Thanks for replying, Micah.
But Spark is able to read this file without the commas for the missing data.
Is the behavior of spark.read.csv not the same as that of pyarrow's CSV reader?

This file is part of the Spark CSV tests:
https://github.com/apache/spark/blob/f36a5fb2b88620c1c490d087b0293c4e58d29979/sql/core/src/test/scala/org/apache/spark/sql/execution/datasources/csv/CSVSuite.scala#L120

Thanks
Sricheta.

From: Micah Kornfield <[email protected]>
Sent: Thursday, March 17, 2022 4:28 PM
To: [email protected]
Subject: [EXTERNAL] Re: [pyarrow] CSV parse error: Expected 5 columns, got 3

I believe the Arrow parser expects the last line to be:
"2015,Chevy,Volt,,"
(i.e. have commas for the missing data).
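
One way to get there without editing the file by hand is to pad the short rows before handing the data to pyarrow. The following is a minimal sketch using only the Python standard library; the helper name pad_short_rows and the inline sample are assumptions made for illustration, not part of pyarrow or of the original report.

```python
import csv
import io


def pad_short_rows(text, n_columns):
    """Return CSV text in which every row has at least n_columns fields.

    Hypothetical helper for illustration; not a pyarrow API.
    """
    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    for row in csv.reader(io.StringIO(text)):
        if not row:
            continue  # skip completely blank lines
        # Append empty fields so that short rows reach the expected width.
        row += [""] * (n_columns - len(row))
        writer.writerow(row)
    return out.getvalue()


# Same content as the cars.csv shown in this thread.
sample = (
    "year,make,model,comment,blank\n"
    '"2012","Tesla","S","No comment",\n'
    '1997,Ford,E350,"Go get one now they are going fast",\n'
    "2015,Chevy,Volt\n"
)
padded = pad_short_rows(sample, 5)
```

The padded text should then be readable by pyarrow, for example by wrapping it in a file-like object such as io.BytesIO(padded.encode("utf-8")) and passing that to pyarrow.csv.read_csv. Note that the standard-library csv module and pyarrow.csv share a name, so one of them needs an alias if both are imported in the same script.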

On Thu, Mar 17, 2022 at 3:23 PM Sricheta Ruj <[email protected]> wrote:
Hello.

I am using the pyarrow csv module.

from pyarrow import csv

fn = '/home/srruj/cars.csv'
column_names = ['year', 'make', 'model', 'comment', 'blank']

read_options = csv.ReadOptions(column_names=column_names)
convert_options = csv.ConvertOptions(include_columns=column_names,
                                     include_missing_columns=True,
                                     strings_can_be_null=True)

table = csv.read_csv(fn, read_options=read_options,
                     convert_options=convert_options)
table

I am getting the following error:
CSV parse error: Expected 5 columns, got 3

This is how the file looks:

year,make,model,comment,blank
"2012","Tesla","S","No comment",
1997,Ford,E350,"Go get one now they are going fast",
2015,Chevy,Volt

I am able to read this file with Spark using spark.read.csv(..), but not with
pyarrow.

Can you please help?

Thanks
Sricheta.

