>
> As I've specified *.as[Person]*, which does schema inference, then
> *option("inferSchema", "true")* is redundant and not needed!
The resolution of fields is done by name, not by position, for case
classes. This is what allows us to support more complex things like JSON
or nested structures. If you just want to map by position, you can
do .as[(String, Long)] to map it to a tuple instead.
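To make the two options concrete, here is a minimal sketch. The file paths and the local SparkSession setup are assumptions for illustration; the asker's CSV has columns name,age.

```scala
import org.apache.spark.sql.SparkSession

// Case class must be visible to the encoder; top level of the file works.
case class Person(name: String, age: Long)

object MappingSketch {
  def main(args: Array[String]): Unit = {
    // Assumed local session for the sketch; any SparkSession will do.
    val spark = SparkSession.builder()
      .appName("MappingSketch")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // With a header, columns are named "name"/"age" and resolve by name:
    val byName = spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv("/people.csv")          // hypothetical path
      .as[Person]

    // Without a header, columns come back as _c0/_c1; map positionally
    // to a tuple instead:
    val byPosition = spark.read
      .option("inferSchema", "true")
      .csv("/people_no_header.csv") // hypothetical path
      .as[(String, Long)]

    // Or rename the columns first so name-based resolution can work:
    val renamed = spark.read
      .option("inferSchema", "true")
      .csv("/people_no_header.csv")
      .toDF("name", "age")
      .as[Person]

    spark.stop()
  }
}
```

The .toDF("name", "age") rename is also the usual workaround for the `cannot resolve '`name`'` error on headerless files mentioned below.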
> And lastly, does .as[Person] check that column values match the data
> type? i.e. "age Long" would fail if it gets a non-numeric value, because
> the input file could be millions of rows, which could be very time consuming.
No, this is a static check based on the schema. It does not scan the data
(though schema inference does).
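Since the only scan here comes from inference, one way to avoid it is to supply the schema explicitly. A minimal sketch, assuming a SparkSession `spark` with `spark.implicits._` imported and a `Person(name: String, age: Long)` case class as in the original code; the path and parser mode are illustrative:

```scala
import org.apache.spark.sql.types._

// Explicit schema: no inference pass over the file is needed.
val personSchema = StructType(Seq(
  StructField("name", StringType),
  StructField("age", LongType)
))

val peopleDS = spark.read
  .schema(personSchema)
  .option("header", "true")
  // How malformed values surface is governed by the CSV parser mode:
  // PERMISSIVE (default) nulls out bad fields, FAILFAST throws on them.
  .option("mode", "PERMISSIVE")
  .csv("/people.csv")              // hypothetical path
  .as[Person]
```

Either way, .as[Person] itself only checks names and types against the schema at analysis time; any per-row value checking happens lazily when the data is actually read.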
On Tue, Jan 10, 2017 at 11:34 AM, A Shaikh wrote:
> I have a simple people.csv and following SimpleApp
>
>
> people.csv
> --
> name,age
> abc,22
> xyz,32
>
>
> Working Code
>
> object SimpleApp {
>   case class Person(name: String, age: Long)
>
>   def main(args: Array[String]): Unit = {
>     val spark = SparkFactory.getSparkSession("PIPE2Dataset")
>     import spark.implicits._
>
>     val peopleDS = spark.read
>       .option("inferSchema", "true")
>       .option("header", "true")
>       .option("delimiter", ",")
>       .csv("/people.csv")
>       .as[Person]
>   }
> }
>
>
>
>
> Fails for data with no header
>
> Removing the header record "name,age" AND switching the header option off
> => .option("header", "false") returns the error => *cannot resolve '`name`'
> given input columns: [_c0, _c1]*
>
> val peopleDS = spark.read.option("inferSchema", "true").option("header",
> "false").option("delimiter", ",").csv("/people.csv").as[Person]
>
> Shouldn't this just assign the field names from the Person class?
>
>
>
> Invalid data
>
> As I've specified *.as[Person]*, which does schema inference, then
> *option("inferSchema", "true")* is redundant and not needed!
>
>
> And lastly, does .as[Person] check that column values match the data
> type? i.e. "age Long" would fail if it gets a non-numeric value, because
> the input file could be millions of rows, which could be very time consuming.
>