The quote options seem to be related to escaping quotes, and the dataset
isn't escaping quotes. As I said, quoted strings with embedded commas are
something that pandas handles easily, and even Excel does as well.
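For reference, the pandas behavior being described can be demonstrated in a few lines. This is a minimal sketch with made-up sample data; the column names and values are illustrative only:

```python
import io
import pandas as pd

# A small CSV where one quoted field contains embedded commas.
data = 'id,name,notes\n1,"Doe, Jane","likes a, b, and c"\n'

# pandas parses the quoted field as a single value, commas intact.
df = pd.read_csv(io.StringIO(data))
print(df.loc[0, "name"])  # Doe, Jane
print(df.shape)           # (1, 3)
```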
Femi
On Sun, Nov 6, 2016 at 6:59 AM, Hyukjin Kwon wrote:
Hi Femi,
Have you maybe tried the quote related options specified in the
documentation?
http://spark.apache.org/docs/latest/api/python/pyspark.sql.html#pyspark.sql.DataFrameReader.csv
Thanks.
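To illustrate what the quote-related options control: Spark's `DataFrameReader.csv` takes `quote` and `escape` parameters, and Python's standard csv module has analogous `quotechar`/`doublequote` settings. The sketch below uses the stdlib csv module purely as an analogy for the parsing rules, not Spark itself; the sample line is made up. (In Spark 2.x, if the file escapes quotes by doubling them, RFC 4180 style, setting `escape='"'` is a commonly suggested fix.)

```python
import csv
import io

# One CSV line: a quoted field with embedded commas, and a field where
# an inner quote is escaped by doubling ("") per RFC 4180.
line = '1,"Doe, Jane","she said ""hi"""\n'

# quotechar/doublequote here mirror the roles of Spark's quote/escape
# options; the embedded commas and inner quotes survive parsing.
row = next(csv.reader(io.StringIO(line), quotechar='"', doublequote=True))
print(row)  # ['1', 'Doe, Jane', 'she said "hi"']
```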
2016-11-06 6:58 GMT+09:00 Femi Anthony :
Hi, I am trying to process a very large comma-delimited CSV file and I am
running into problems.
The main problem is that some fields contain quoted strings with embedded
commas.
It seems as if PySpark is unable to properly parse lines containing such
fields the way, say, pandas does.
Here is the code: