Re: Databricks fails to read the csv file with blank line at the file header

Koert Kuipers Sat, 26 Mar 2016 19:05:49 -0700

To me this is expected behavior that I would not want fixed, but if you
look at the recent commits for spark-csv, one of them deals with this.
On Mar 26, 2016 21:25, "Mich Talebzadeh" <[email protected]> wrote:


>
> Hi,
>
> I have a standard CSV file (saved as CSV in HDFS) whose first line, above
> the header, is blank, as follows:
>
> [blank line]
> Date, Type, Description, Value, Balance, Account Name, Account Number
> [blank line]
> 22/03/2011,SBT,"'FUNDS TRANSFER , FROM A/C 1790999",200.00,200.00,"'BROWN AE","'638585-60125663",
>
> When I read this file using the following standard call
>
> val df = sqlContext.read.format("com.databricks.spark.csv").
>   option("inferSchema", "true").
>   option("header", "true").
>   load("hdfs://rhes564:9000/data/stg/accounts/ac/")
>
> it crashes.
>
> java.util.NoSuchElementException
>         at java.util.ArrayList$Itr.next(ArrayList.java:794)
>
> If I manually delete the first blank line, it works OK:
>
> val df = sqlContext.read.format("com.databricks.spark.csv").
>   option("inferSchema", "true").
>   option("header", "true").
>   load("hdfs://rhes564:9000/data/stg/accounts/ac/")
>
> df: org.apache.spark.sql.DataFrame = [Date: string,  Type: string,
> Description: string,  Value: double,  Balance: double,  Account Name:
> string,  Account Number: string]
>
> I can easily write a shell script to get rid of the blank line. I was
> wondering whether the Databricks CSV reader has a flag to skip the first
> blank line in a CSV file?
>
> P.S. If the file is stored as a DOS text file, this problem goes away.
>
> Thanks
>
> Dr Mich Talebzadeh
>
>
>
> LinkedIn:
> https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
>
>
>
> http://talebzadehmich.wordpress.com
>
>
>
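
For the shell-script workaround mentioned in the quoted message, a minimal
sketch follows. It assumes the only problem is one or more blank lines before
the header. The function name strip_leading_blank_lines and the HDFS paths in
the comment are hypothetical and do not come from the thread.

```shell
#!/bin/sh
# Sketch: drop leading blank (empty or whitespace-only) lines from a CSV
# stream read on stdin. Everything from the first non-blank line onward is
# passed through unchanged, including later blank lines.
strip_leading_blank_lines() {
    # NF is the number of fields on the line; it is 0 for a blank line.
    # Once a non-blank line has been seen, every line is printed.
    awk 'NF { seen = 1 } seen'
}

# Hypothetical usage against HDFS (placeholder paths):
#   hdfs dfs -cat /path/to/in.csv \
#     | strip_leading_blank_lines \
#     | hdfs dfs -put - /path/to/clean/in.csv
```

Only the leading blank lines are removed, so the blank line after the header
in the sample file is left for the CSV reader to handle, which matches the
manual fix described in the quoted message.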
