I don't think so; that data is inherently ambiguous and incorrectly
formatted. If you know something about the structure, you could rewrite
the middle column manually to escape the inner quotes and then reparse.
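
For example, something along these lines might work as a rough, untested
sketch. It assumes every record fits on one line and that column1/column3
are plain numbers wrapping the quoted JSON blob, so the regex, the schema,
and the column names are only illustrative:

import org.apache.spark.sql.functions._
import org.apache.spark.sql.types._

// Read each line as raw text instead of letting the CSV parser fight the inner quotes.
val raw = spark.read.text("/FileStore/tables/sample_file_structure.csv")

// Assumed outer structure: <number>,"<json with unescaped inner quotes>",<number>
val pattern = """^(\d+),"(.*)",(\d+)$"""

// Hypothetical schema for the JSON payload, based only on the sample rows shown below.
val jsonSchema = new StructType()
  .add("moveId", StringType)
  .add("dob", StringType)
  .add("username", StringType)
  .add("language", StringType)

val fixed = raw
  .filter(!col("value").startsWith("column1"))               // drop the header line
  .select(
    regexp_extract(col("value"), pattern, 1).as("column1"),
    regexp_extract(col("value"), pattern, 2).as("column2"),  // JSON text kept verbatim
    regexp_extract(col("value"), pattern, 3).as("column3"))
  .withColumn("column2", from_json(col("column2"), jsonSchema))

If the JSON can itself contain "," right after a quote, or can span multiple
lines, the regex approach breaks down and you are back to fixing the file
upstream.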

On Thu, May 28, 2020 at 10:25 AM elango vaidyanathan <elango...@gmail.com>
wrote:

> Is there any way I can handle it in code?
>
> Thanks,
> Elango
>
> On Thu, May 28, 2020, 8:52 PM Sean Owen <sro...@gmail.com> wrote:
>
>> Your data doesn't escape double-quotes.
>>
>> On Thu, May 28, 2020 at 10:21 AM elango vaidyanathan <elango...@gmail.com>
>> wrote:
>>
>>>
>>> Hi team,
>>>
>>> I am loading a CSV. One column contains a JSON value. I am unable to
>>> parse that column properly. Below are the details. Can you please take a look?
>>>
>>>
>>>
>>> val df1 = spark.read
>>>   .option("inferSchema", "true")
>>>   .option("header", "true")
>>>   .option("quote", "\"")
>>>   .option("escape", "\"")
>>>   .csv("/FileStore/tables/sample_file_structure.csv")
>>>
>>>
>>>
>>> sample data:
>>>
>>> ----------------
>>>
>>> column1,column2,column3
>>>
>>> 123456789,"{   "moveId" : "123456789",   "dob" : null,   "username" : "abcdef",   "language" : "en" }",11
>>> 123456789,"{   "moveId" : "123456789",   "dob" : null,   "username" : "ghi, jkl",   "language" : "en" }",12
>>> 123456789,"{   "moveId" : "123456789",   "dob" : null,   "username" : "mno, pqr",   "language" : "en" }",13
>>>
>>>
>>>
>>> output:
>>>
>>> -----------
>>>
>>> +---------+--------------------+---------------+
>>> |  column1|             column2|        column3|
>>> +---------+--------------------+---------------+
>>> |123456789|  "{ "moveId" : "...|   "dob" : null|
>>> |123456789|  "{ "moveId" : "...|   "dob" : null|
>>> +---------+--------------------+---------------+
>>>
>>>
>>>
>>> Thanks,
>>> Elango
>>>
>>
