I have a simple people.csv and the following SimpleApp:
people.csv
----------
name,age
abc,22
xyz,32

********************************
Working Code
********************************

    object SimpleApp {
      case class Person(name: String, age: Long)

      def main(args: Array[String]): Unit = {
        val spark = SparkFactory.getSparkSession("PIPE2Dataset")
        import spark.implicits._

        val peopleDS = spark.read
          .option("inferSchema", "true")
          .option("header", "true")
          .option("delimiter", ",")
          .csv("/people.csv")
          .as[Person]
      }
    }

********************************
Fails for data with no header
********************************

Removing the header record "name,age" AND switching the header option off (.option("header", "false")) returns the error: *cannot resolve '`name`' given input columns: [_c0, _c1]*

    val peopleDS = spark.read
      .option("inferSchema", "true")
      .option("header", "false")
      .option("delimiter", ",")
      .csv("/people.csv")
      .as[Person]

Shouldn't this just assign the column names from the Person class?

********************************
Invalid data
********************************

Since I've specified *.as[Person]*, which I assume takes care of the schema, isn't *option("inferSchema","true")* redundant and not needed?

And lastly, does .as[Person] check that each column value matches its data type, i.e. would "age Long" fail if it gets a non-numeric value? I ask because the input file could be millions of rows, so such a check could be very time consuming.
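
For reference, here is a minimal sketch of one way the headerless case might be handled. It is an illustration under stated assumptions, not a confirmed fix. It derives a schema from the Person case class with Encoders.product and passes it to the reader, so the columns get the names name and age instead of _c0 and _c1, and no inferSchema pass over the file is needed. The object name HeaderlessCsvSketch, the app name and the local master are assumptions made for illustration (the original uses its own SparkFactory helper), and the "mode" option set to FAILFAST is one choice for surfacing non-numeric age values. As far as I understand, .as[Person] only checks column names and types when the query is analyzed, while bad values are only detected when the rows are actually parsed by an action.

```scala
import org.apache.spark.sql.{Dataset, Encoders, SparkSession}

// Top-level case class so Spark can derive an encoder for it.
case class Person(name: String, age: Long)

object HeaderlessCsvSketch {
  def main(args: Array[String]): Unit = {
    // Assumption: a plain local session stands in for SparkFactory.getSparkSession.
    val spark = SparkSession.builder()
      .appName("HeaderlessCsvSketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Schema derived from the case class: name as string, age as long.
    val personSchema = Encoders.product[Person].schema

    val peopleDS: Dataset[Person] = spark.read
      .schema(personSchema)        // explicit schema, so inferSchema is not used
      .option("header", "false")   // the file has no header record
      .option("delimiter", ",")
      .option("mode", "FAILFAST")  // raise an error on malformed rows instead of nulling fields
      .csv("/people.csv")
      .as[Person]

    // Parsing is lazy: malformed values only surface when an action runs.
    peopleDS.show()

    spark.stop()
  }
}
```

With the default mode (PERMISSIVE) a non-numeric age would not fail the job; the value would come back as null, which a non-nullable Long field in Person cannot hold, so choosing the mode explicitly seems worth doing when the input is not trusted.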