If you have found a parser that works, simply read the data as text files, 
apply the parser manually, and convert the result to a DataFrame (if needed at all).
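
A minimal sketch of that approach, assuming the file is small enough to read on the driver and that Python's csv module is the parser that works for you. The helper names parse_csv_text and to_dataframe are hypothetical, and spark is assumed to be an existing SparkSession:

```python
import csv
import io


def parse_csv_text(text, delimiter=","):
    """Parse a whole CSV document with Python's csv module.

    Returns (header, rows). Quoted fields may contain the delimiter,
    doubled quotes, or line breaks.
    """
    # newline="" leaves line endings untouched so the csv module can
    # handle line breaks inside quoted fields itself.
    reader = csv.reader(io.StringIO(text, newline=""), delimiter=delimiter)
    rows = list(reader)
    return rows[0], rows[1:]


def to_dataframe(spark, path):
    """Read a local file as text, parse it manually, build a DataFrame.

    Assumption: `spark` is an existing SparkSession and the file has a
    header row plus at least one data row.
    """
    with open(path, newline="") as f:
        header, rows = parse_csv_text(f.read())
    # Every column is a string here; cast afterwards if needed.
    return spark.createDataFrame([tuple(r) for r in rows], schema=header)
```

For files too large for the driver, the same parser could be applied per file on the executors instead; that variant is not shown here.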
________________________________
From: Saurabh Gulati <saurabh.gul...@fedex.com.INVALID>
Sent: Wednesday, January 4, 2023 3:45 PM
To: Sean Owen <sro...@gmail.com>
Cc: Mich Talebzadeh <mich.talebza...@gmail.com>; User <user@spark.apache.org>
Subject: [EXTERNAL] Re: Re: Incorrect csv parsing when delimiter used within 
the data


Hi @Sean Owen<mailto:sro...@gmail.com>
Probably the data is incorrect, and the source needs to fix it.
But using python's csv parser returns the correct results.

import csv

# newline="" lets the csv module handle line breaks inside quoted fields itself
with open("/tmp/test.csv", newline="") as c_file:
    csv_reader = csv.reader(c_file, delimiter=",")
    for row in csv_reader:
        print(row)

['a', 'b', 'c']
['1', '', ',see what "I did",\ni am still writing']
['2', '', 'abc']
Also, I don't understand why the outputs of df.show() and 
df.select("c").show() differ.

Mvg/Regards
Saurabh Gulati
Data Platform
________________________________
From: Sean Owen <sro...@gmail.com>
Sent: 04 January 2023 14:25
To: Saurabh Gulati <saurabh.gul...@fedex.com>
Cc: Mich Talebzadeh <mich.talebza...@gmail.com>; User <user@spark.apache.org>
Subject: Re: [EXTERNAL] Re: Incorrect csv parsing when delimiter used within 
the data

That input is just invalid as CSV for any parser. You end a quoted col without 
following with a col separator. What would the intended parsing be and how 
would it work?

On Wed, Jan 4, 2023 at 4:30 AM Saurabh Gulati 
<saurabh.gul...@fedex.com<mailto:saurabh.gul...@fedex.com>> wrote:

@Sean Owen<mailto:sro...@gmail.com> Also see the example below, with quotes 
added per the feedback:
"a","b","c"
"1","",",see what ""I did"","
"2","","abc"
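
The sample above escapes an embedded quote by doubling it (""), while Spark's CSV reader uses a backslash as its default escape character. A hedged sketch of reader options that may be worth trying, not verified against the file in this thread; CSV_OPTIONS and read_quoted_csv are hypothetical names, and spark is assumed to be an existing SparkSession:

```python
# Options for Spark's CSV reader, assuming the file escapes quotes by
# doubling them and may contain line breaks inside quoted fields.
CSV_OPTIONS = {
    "header": "true",     # first line holds the column names
    "quote": '"',         # fields are wrapped in double quotes
    "escape": '"',        # treat "" inside a quoted field as a literal quote
    "multiLine": "true",  # allow a quoted field to span several lines
}


def read_quoted_csv(spark, path):
    """Read `path` with the options above (spark: an existing SparkSession)."""
    return spark.read.options(**CSV_OPTIONS).csv(path)
```

If the rows still come out wrong with these options, the fallback is to parse the text outside Spark's CSV reader.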
