Hi,
I have a standard CSV file (stored in HDFS) whose first line, above the
header, is blank, as follows:
[blank line]
Date, Type, Description, Value, Balance, Account Name, Account Number
[blank line]
22/03/2011,SBT,"'FUNDS TRANSFER , FROM A/C 1790999",200.00,200.00,"'BROWN AE","'638585-60125663",
When I read this file using the following standard code:
val df = sqlContext.read.format("com.databricks.spark.csv").option("inferSchema", "true").option("header", "true").load("hdfs://rhes564:9000/data/stg/accounts/ac/")
it crashes with:
java.util.NoSuchElementException
at java.util.ArrayList$Itr.next(ArrayList.java:794)
If I manually delete the first blank line, it works fine:
val df = sqlContext.read.format("com.databricks.spark.csv").option("inferSchema", "true").option("header", "true").load("hdfs://rhes564:9000/data/stg/accounts/ac/")
df: org.apache.spark.sql.DataFrame = [Date: string, Type: string, Description: string, Value: double, Balance: double, Account Name: string, Account Number: string]
I can easily write a shell script to strip the blank line, but I was
wondering whether the Databricks spark-csv package has an option to skip a
leading blank line in a CSV file?
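Without a confirmed option, one workaround is to drop blank lines before the CSV parser ever sees them. Here is a minimal plain-Scala sketch of that idea, run on the sample rows above. The names `SkipBlankLines`, `dropLeadingBlankLines` and `dropAllBlankLines` are hypothetical helpers for illustration only, not part of spark-csv.

```scala
// Hypothetical helpers for illustration; not part of spark-csv.
object SkipBlankLines {
  // Drop blank (or whitespace-only) lines from the start of the file only,
  // so the first remaining line can be treated as the CSV header.
  def dropLeadingBlankLines(lines: Seq[String]): Seq[String] =
    lines.dropWhile(_.trim.isEmpty)

  // Drop every blank line. This form does not depend on line order,
  // so it carries over to a distributed filter over partitions.
  def dropAllBlankLines(lines: Seq[String]): Seq[String] =
    lines.filter(_.trim.nonEmpty)

  def main(args: Array[String]): Unit = {
    // The sample from the file, including the leading blank line
    val sample = Seq(
      "",
      "Date, Type, Description, Value, Balance, Account Name, Account Number",
      "",
      "22/03/2011,SBT,\"'FUNDS TRANSFER , FROM A/C 1790999\",200.00,200.00,\"'BROWN AE\",\"'638585-60125663\","
    )

    val cleaned = dropLeadingBlankLines(sample)
    // Header column names, trimmed of the space that follows each comma
    val header = cleaned.head.split(",").map(_.trim).toList
    println(header)
    println(dropAllBlankLines(sample).size)
  }
}
```

In spark-shell the order-independent version would be the one to use, since `dropWhile` only makes sense on a single ordered sequence, e.g. `sc.textFile("hdfs://rhes564:9000/data/stg/accounts/ac/").filter(_.trim.nonEmpty)`. The resulting `RDD[String]` would then need to be handed to spark-csv; I believe its `CsvParser` class has a `csvRdd(sqlContext, rdd)` method for this, but that is an assumption to check against the installed version. Note that this also removes the blank line after the header, which seems harmless given that the file already parses with it present.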
P.S. If the file is saved as a DOS text file, the problem goes away.
Thanks
Dr Mich Talebzadeh
LinkedIn: https://www.linkedin.com/profile/view?id=AAEAAAAWh2gBxianrbJd6zP6AcPCCdOABUrV8Pw
http://talebzadehmich.wordpress.com