I tried the DataFrame option below; not sure what it is for, but it doesn't seem to work.
- nullValue: specifies a string that indicates a null value; nulls in the DataFrame will be written as this string.

On 11 January 2017 at 17:11, A Shaikh <shaikh.af...@gmail.com> wrote:

> How does Spark handle null values?
>
> case class AvroSource(name: String, age: Integer, sal: Long, col_float: Float,
>   col_double: Double, col_bytes: String, col_bool: Boolean)
>
> val userDS = spark.read.format("com.databricks.spark.avro")
>   .option("nullValue", "x").load("./users.avro")//.as[AvroSource]
> userDS.printSchema()
> userDS.show()
> userDS.createOrReplaceTempView("user")
> spark.sql("select * from user where xdouble is not null").show()
>
> [image: Inline images 2]
>
> Adding the following lines to the code returns an error, which seems to
> contradict the schema's nullable = true. How should null be handled here?
>
> val filteredDS = userDS.filter(_.age > 30)
> filteredDS.show(10)
>
> java.lang.RuntimeException: Null value appeared in non-nullable field:
> - field (class: "scala.Double", name: "col_double")
> - root class: "com.model.AvroSource"
> If the schema is inferred from a Scala tuple/case class, or a Java bean,
> please try to use scala.Option[_] or other nullable types (e.g.
> java.lang.Integer instead of int/scala.Int).
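For what it's worth, here is a minimal sketch of the fix the exception message itself points at: declare the primitive columns that may contain nulls as scala.Option[_] in the case class, so the Dataset encoder can represent missing values instead of throwing at decode time. The AvroSource name, column names, and file path come from the thread; the SparkSession setup, the choice of which fields to wrap in Option, and the Option-based filter are my assumptions.

    import org.apache.spark.sql.SparkSession

    object NullSafeAvro {
      // Primitive columns that may be null are declared as Option[_]
      // so the encoder has a way to represent missing values.
      case class AvroSource(
          name: String,
          age: Option[Int],
          sal: Option[Long],
          col_float: Option[Float],
          col_double: Option[Double],
          col_bytes: String,
          col_bool: Option[Boolean])

      def main(args: Array[String]): Unit = {
        // Assumes Spark 2.x with the com.databricks:spark-avro
        // package on the classpath, as in the thread.
        val spark = SparkSession.builder()
          .appName("null-safe-avro")
          .master("local[*]")
          .getOrCreate()
        import spark.implicits._

        val userDS = spark.read
          .format("com.databricks.spark.avro")
          .load("./users.avro")
          .as[AvroSource]

        // The null check is now explicit in the filter:
        // exists(_ > 30) is simply false when age is None.
        val filteredDS = userDS.filter(_.age.exists(_ > 30))
        filteredDS.show(10)
      }
    }

With Option fields the null handling moves into the filter itself (None rows are dropped by exists), rather than failing at runtime when the encoder meets a null in a field typed as a Scala primitive.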