Databricks fails to read the csv file with blank line at the file header

Mich Talebzadeh Sat, 26 Mar 2016 18:25:42 -0700

Hi,

I have a standard csv file (saved as csv in HDFS) whose first line, above
the header, is blank, as follows:


[blank line]
Date, Type, Description, Value, Balance, Account Name, Account Number
[blank line]
22/03/2011,SBT,"'FUNDS TRANSFER , FROM A/C 1790999",200.00,200.00,"'BROWN AE","'638585-60125663",

When I read this file using the following standard call:

val df = sqlContext.read.format("com.databricks.spark.csv").
  option("inferSchema", "true").
  option("header", "true").
  load("hdfs://rhes564:9000/data/stg/accounts/ac/")

it crashes.

java.util.NoSuchElementException
        at java.util.ArrayList$Itr.next(ArrayList.java:794)

If I manually delete the first blank line, it works OK:

val df = sqlContext.read.format("com.databricks.spark.csv").
  option("inferSchema", "true").
  option("header", "true").
  load("hdfs://rhes564:9000/data/stg/accounts/ac/")

df: org.apache.spark.sql.DataFrame = [Date: string,  Type: string,
Description: string,  Value: double,  Balance: double,  Account Name:
string,  Account Number: string]

I can easily write a shell script to get rid of the blank line. I was
wondering whether the Databricks csv reader has a flag to skip the first
blank line in a csv file?
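
As an alternative to a shell script, the blank lines could also be dropped in Scala before the csv parser sees the file. The following is only a minimal sketch under assumptions: `BlankLineFilter` and `dropBlankLines` are hypothetical names, not part of spark-csv, and the sample rows are the ones shown above. It removes every whitespace-only line, so it would also remove blank lines inside multi-line quoted fields, if the data had any.

```scala
// Sketch: strip blank lines from raw csv text before it reaches the parser.
// BlankLineFilter and dropBlankLines are hypothetical names for illustration.
object BlankLineFilter {
  // Keep only lines that contain at least one non-whitespace character.
  def dropBlankLines(lines: Seq[String]): Seq[String] =
    lines.filter(_.trim.nonEmpty)

  def main(args: Array[String]): Unit = {
    // Sample input mirroring the file layout described in this post.
    val raw = Seq(
      "",
      "Date, Type, Description, Value, Balance, Account Name, Account Number",
      "",
      "22/03/2011,SBT,\"'FUNDS TRANSFER , FROM A/C 1790999\",200.00,200.00,\"'BROWN AE\",\"'638585-60125663\","
    )
    // Prints the header row followed by the data row, with no blank lines.
    dropBlankLines(raw).foreach(println)
  }
}
```

In the Spark shell the same predicate can be applied to the raw text, for example sc.textFile("hdfs://rhes564:9000/data/stg/accounts/ac/").filter(_.trim.nonEmpty), and the result written to a separate directory with saveAsTextFile so that the usual load call can read the cleaned copy.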

P.S. If the file is stored as a DOS text file, this problem goes away.

Thanks

Dr Mich Talebzadeh



LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw



http://talebzadehmich.wordpress.com
