Forgot to reply-all on my last message, whoops. Not very good at email.

You need to normalize the CSV with a parser that can escape commas inside of strings. Not sure if Spark has an option for this?
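To make the normalization idea concrete, here is a rough sketch in plain Python, run before the file reaches Spark. It rests on assumptions that are not confirmed anywhere in the thread: every real record starts with an ISO date followed by a comma, the file has no header line, and the free-text value sits between a fixed number of leading and trailing columns (4 and 1 for the 6-column sample). The names normalize_csv, leading and trailing are made up for illustration.

```python
import csv
import io
import re

# Assumption: a new record always begins with an ISO date followed by a comma.
RECORD_START = re.compile(r"\d{4}-\d{2}-\d{2},")


def normalize_csv(raw_text, leading=4, trailing=1):
    """Return CSV text in which the free-text column is properly quoted.

    `leading` and `trailing` are the numbers of fixed columns before and
    after the free-text column (hypothetical defaults for the sample data).
    """
    # Step 1: glue continuation lines onto the record they belong to.
    # Lines seen before the first record (for example a header) are skipped.
    records = []
    for line in raw_text.splitlines():
        if RECORD_START.match(line):
            records.append(line)
        elif records:
            records[-1] += "\n" + line

    # Step 2: peel the fixed columns off both ends; whatever remains in the
    # middle is the free text, including any commas or newlines it contains.
    out = io.StringIO()
    writer = csv.writer(out, quoting=csv.QUOTE_MINIMAL, lineterminator="\n")
    for record in records:
        parts = record.split(",")
        head = parts[:leading]
        tail = parts[len(parts) - trailing:]
        middle = ",".join(parts[leading:len(parts) - trailing])
        writer.writerow(head + [middle] + tail)
    return out.getvalue()
```

The csv writer wraps any field containing a comma or newline in double quotes, so the output should be readable by Spark with the multiLine option set to true and quote set to '"' (adding escape set to '"' if the text can itself contain double quotes). If the date rule or the column counts do not hold for the real file, the splitting logic would need to change.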
On Wed, May 25, 2022 at 4:37 PM Sid <flinkbyhe...@gmail.com> wrote:

> Thank you so much for your time.
>
> I have data like below which I tried to load by setting multiple options
> while reading the file but however, but I am not able to consolidate the
> 9th column data within itself.
>
> [image: image.png]
>
> I tried the below code:
>
> df = spark.read.option("header", "true").option("multiline", "true").option("inferSchema", "true").option("quote", '"').option("delimiter", ",").csv("path")
>
> What else I can do?
>
> Thanks,
> Sid
>
>
> On Thu, May 26, 2022 at 1:46 AM Apostolos N. Papadopoulos <
> papad...@csd.auth.gr> wrote:
>
>> Dear Sid,
>>
>> can you please give us more info? Is it true that every line may have a
>> different number of columns? Is there any rule followed by
>> every line of the file? From the information you have sent I cannot
>> fully understand the "schema" of your data.
>>
>> Regards,
>>
>> Apostolos
>>
>>
>> On 25/5/22 23:06, Sid wrote:
>> > Hi Experts,
>> >
>> > I have below CSV data that is getting generated automatically. I can't
>> > change the data manually.
>> >
>> > The data looks like below:
>> >
>> > 2020-12-12,abc,2000,,INR,
>> > 2020-12-09,cde,3000,he is a manager,DOLLARS,nothing
>> > 2020-12-09,fgh,,software_developer,I only manage the development part.
>> >
>> > Since I don't have much experience with the other domains.
>> >
>> > It is handled by the other people.,INR
>> > 2020-12-12,abc,2000,,USD,
>> >
>> > The third record is a problem. Since the value is separated by the new
>> > line by the user while filling up the form. So, how do I handle this?
>> >
>> > There are 6 columns and 4 records in total. These are the sample records.
>> >
>> > Should I load it as RDD and then may be using a regex should eliminate
>> > the new lines? Or how it should be? with ". /n" ?
>> >
>> > Any suggestions?
>> >
>> > Thanks,
>> > Sid
>>
>> --
>> Apostolos N. Papadopoulos, Associate Professor
>> Department of Informatics
>> Aristotle University of Thessaloniki
>> Thessaloniki, GREECE
>> tel: ++0030312310991918
>> email: papad...@csd.auth.gr
>> twitter: @papadopoulos_ap
>> web: http://datalab.csd.auth.gr/~apostol