Re: CSV empty columns handling in Spark 2.0.2

2017-03-16 Thread Hyukjin Kwon
I think this was fixed by https://github.com/apache/spark/pull/15767.

It should be resolved in 2.1.0.
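Until an upgrade is possible, one workaround on 2.0.2 is to skip schema inference and cast the numeric columns explicitly, since `cast("int")` produces null for empty strings instead of failing inside `Integer.parseInt`. A minimal sketch (untested; the column name below is hypothetical, substitute your own):

```scala
// Workaround sketch for Spark 2.0.2. Reading without inferSchema keeps
// every column as StringType, so empty fields never reach
// Integer.parseInt during parsing.
val raw = sqlContext.read
  .option("header", true)
  .csv("file location")

// An explicit cast yields null for empty or missing values rather than
// throwing NumberFormatException. "someIntColumn" is a placeholder name.
val df = raw.withColumn("someIntColumn", raw("someIntColumn").cast("int"))
df.show()
```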


2017-03-17 3:28 GMT+09:00 George Obama :

> Hello,
>
>
>
> I am using Spark 2.0.2 to read a CSV file with empty columns and am
> hitting this issue:
>
> scala> val df = sqlContext.read.option("header", true).option("inferSchema", true).csv("file location")
>
> 17/03/13 07:26:26 WARN DataSource: Error while looking for metadata directory.
>
>
> scala> df.show()
>
> 17/03/13 07:26:41 ERROR Executor: Exception in task 0.0 in stage 3.0 (TID 4)
> java.lang.NumberFormatException: null
>
> at java.lang.Integer.parseInt(Integer.java:542)
> at java.lang.Integer.parseInt(Integer.java:615)
> at scala.collection.immutable.StringLike$class.toInt(StringLike.scala:272)
> at scala.collection.immutable.StringOps.toInt(StringOps.scala:29)
> at org.apache.spark.sql.execution.datasources.csv.CSVTypeCast$.castTo(CSVInferSchema.scala:241)
> at org.apache.spark.sql.execution.datasources.csv.CSVRelation$$anonfun$csvParser$3.apply(CSVRelation.scala:116)
> at org.apache.spark.sql.execution.datasources.csv.CSVRelation$$anonfun$csvParser$3.apply(CSVRelation.scala:85)
> at org.apache.spark.sql.execution.datasources.csv.CSVFileFormat$$anonfun$buildReader$1$$anonfun$apply$2.apply(CSVFileFormat.scala:128)
> at org.apache.spark.sql.execution.datasources.csv.CSVFileFormat$$anonfun$buildReader$1$$anonfun$apply$2.apply(CSVFileFormat.scala:127)
> at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
> at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
> at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
>
>
> Could you help me with this issue please?
>
>
> Besides, what does the warning mean?
>
> 17/03/13 07:26:26 WARN DataSource: Error while looking for metadata directory.
>
> Regards, John
>
>
>


CSV empty columns handling in Spark 2.0.2

2017-03-16 Thread George Obama
Hello,



I am using Spark 2.0.2 to read a CSV file with empty columns and am hitting
this issue:
scala> val df = sqlContext.read.option("header", true).option("inferSchema", true).csv("file location")
17/03/13 07:26:26 WARN DataSource: Error while looking for metadata directory.

scala> df.show()
17/03/13 07:26:41 ERROR Executor: Exception in task 0.0 in stage 3.0 (TID 4)
java.lang.NumberFormatException: null
at java.lang.Integer.parseInt(Integer.java:542)
at java.lang.Integer.parseInt(Integer.java:615)
at scala.collection.immutable.StringLike$class.toInt(StringLike.scala:272)
at scala.collection.immutable.StringOps.toInt(StringOps.scala:29)
at org.apache.spark.sql.execution.datasources.csv.CSVTypeCast$.castTo(CSVInferSchema.scala:241)
at org.apache.spark.sql.execution.datasources.csv.CSVRelation$$anonfun$csvParser$3.apply(CSVRelation.scala:116)
at org.apache.spark.sql.execution.datasources.csv.CSVRelation$$anonfun$csvParser$3.apply(CSVRelation.scala:85)
at org.apache.spark.sql.execution.datasources.csv.CSVFileFormat$$anonfun$buildReader$1$$anonfun$apply$2.apply(CSVFileFormat.scala:128)
at org.apache.spark.sql.execution.datasources.csv.CSVFileFormat$$anonfun$buildReader$1$$anonfun$apply$2.apply(CSVFileFormat.scala:127)
at scala.collection.Iterator$$anon$12.nextCur(Iterator.scala:434)
at scala.collection.Iterator$$anon$12.hasNext(Iterator.scala:440)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)

Could you help me with this issue please? 


Besides, what does the warning mean?
17/03/13 07:26:26 WARN DataSource: Error while looking for metadata directory.
Regards,
John