I have a simple people.csv and the following SimpleApp:


Working Code
object SimpleApp {
  case class Person(name: String, age: Long)

  def main(args: Array[String]): Unit = {
    val spark = SparkFactory.getSparkSession("PIPE2Dataset")
    import spark.implicits._ // brings the Person encoder into scope

    val peopleDS = spark.read
      .option("inferSchema", "true")
      .option("header", "true")
      .option("delimiter", ",")
      .csv("/people.csv")
      .as[Person]
  }
}

Fails for data with no header
Removing the header record "name,age" AND switching the header option off
with .option("header", "false") returns the error: *cannot resolve '`name`'
given input columns: [_c0, _c1]*
val peopleDS = spark.read
  .option("inferSchema", "true").option("header", "false")
  .option("delimiter", ",")
  .csv("/people.csv").as[Person]

Shouldn't this just assign the column names from the Person class?
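
For reference, a minimal sketch of two ways the column names might be supplied when the file has no header, assuming Spark 2.x or later. HeaderlessRead, readWithSchema and readWithRename are hypothetical names used only for illustration. Encoders.product[Person].schema derives a StructType from the case class, and toDF("name", "age") renames _c0 and _c1 by position.

```scala
import org.apache.spark.sql.{Dataset, Encoders, SparkSession}

case class Person(name: String, age: Long)

// Hypothetical helper object, for illustration only.
object HeaderlessRead {

  // Option 1: derive the schema from the case class and hand it to the reader.
  // No inferSchema pass is needed because the types come from Person.
  def readWithSchema(spark: SparkSession, path: String): Dataset[Person] = {
    import spark.implicits._
    spark.read
      .schema(Encoders.product[Person].schema)
      .option("header", "false")
      .option("delimiter", ",")
      .csv(path)
      .as[Person]
  }

  // Option 2: keep inferSchema, then rename _c0, _c1 by position
  // before converting to Dataset[Person].
  def readWithRename(spark: SparkSession, path: String): Dataset[Person] = {
    import spark.implicits._
    spark.read
      .option("inferSchema", "true")
      .option("header", "false")
      .option("delimiter", ",")
      .csv(path)
      .toDF("name", "age")
      .as[Person]
  }
}
```

Both variants rely on column position, so the field order in Person has to match the column order in the file.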

Invalid data
As I've specified *.as[Person]*, which does schema inference, isn't
*.option("inferSchema", "true")* then redundant and not needed?

And lastly, does .as[Person] check that each column value matches its data
type, i.e. would "age Long" fail if it gets a non-numeric value? I ask because
the input file could be millions of rows, so such a check could be very time
consuming.
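
A rough sketch of how such a check could be forced, assuming an explicit schema rather than inferSchema. As far as I understand, .as[Person] only verifies that column names and types are compatible when the query is analyzed; it does not scan the data. What happens to a bad value is decided by the CSV reader's "mode" option (PERMISSIVE, DROPMALFORMED or FAILFAST), and because reads are lazy the failure only shows up once an action runs. StrictRead and readStrict are hypothetical names.

```scala
import org.apache.spark.sql.{Dataset, Encoders, SparkSession}

case class Person(name: String, age: Long)

// Hypothetical helper object, for illustration only.
object StrictRead {

  // Reads a headerless CSV with the schema taken from Person.
  // FAILFAST makes the reader throw when a value cannot be parsed
  // into its declared type (e.g. "abc" for age: Long).
  def readStrict(spark: SparkSession, path: String): Dataset[Person] = {
    import spark.implicits._
    spark.read
      .schema(Encoders.product[Person].schema)
      .option("header", "false")
      .option("mode", "FAILFAST")
      .csv(path)
      .as[Person]
  }
}
```

With inferSchema switched on instead, Spark makes an extra pass over the file to guess the types, and a non-numeric age would turn the whole column into a string; .as[Person] would then be rejected up front because a string cannot be up-cast to a Long.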
