How does Spark handle null values?

case class AvroSource(name: String, age: Integer, sal: Long, col_float: Float, col_double: Double, col_bytes: String, col_bool: Boolean)
val userDS = spark.read.format("com.databricks.spark.avro")
  .option("nullValue", "x")
  .load("./users.avro")
  .as[AvroSource]

userDS.printSchema()
userDS.show()

userDS.createOrReplaceTempView("user")
spark.sql("select * from user where col_double is not null").show()

Adding the following lines causes a runtime error, which seems to contradict printSchema(), which reports nullable = true for col_double. How should nulls be handled here?

val filteredDS = userDS.filter(_.age > 30)
filteredDS.show(10)

java.lang.RuntimeException: Null value appeared in non-nullable field:
- field (class: "scala.Double", name: "col_double")
- root class: "com.model.AvroSource"
If the schema is inferred from a Scala tuple/case class, or a Java bean, please try to use scala.Option[_] or other nullable types (e.g. java.lang.Integer instead of int/scala.Int).
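The error message itself points at the likely fix: the Dataset encoder maps Scala primitives such as Double and Long to non-nullable fields, so a null in the Avro data cannot be deserialized into them, no matter what the DataFrame schema reports. A minimal sketch of a nullable-friendly variant of the case class; the name AvroSourceNullable is just an illustrative stand-in, and the field names are assumed to match the Avro schema above:

// Sketch: wrap primitive columns that may contain null in Option[_].
case class AvroSourceNullable(
  name: String,                 // reference types are already nullable
  age: Option[Int],
  sal: Option[Long],            // scala.Long cannot hold null
  col_float: Option[Float],
  col_double: Option[Double],   // the field that triggered the RuntimeException
  col_bytes: String,
  col_bool: Option[Boolean]
)

val safeDS = spark.read.format("com.databricks.spark.avro")
  .load("./users.avro")
  .as[AvroSourceNullable]

// Typed operations now work; rows with a missing age are simply filtered out.
safeDS.filter(_.age.exists(_ > 30)).show(10)

Alternatively, if rows containing nulls are simply unwanted, one could drop them at the DataFrame level before converting to the typed Dataset and keep the original case class; untyped Column filters never deserialize into AvroSource, so they do not hit the null check:

import org.apache.spark.sql.functions.col

val cleanDS = spark.read.format("com.databricks.spark.avro")
  .load("./users.avro")
  .filter(col("col_double").isNotNull)  // or .na.drop() to drop rows with any null
  .as[AvroSource]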