Github user viirya commented on the issue: https://github.com/apache/spark/pull/21732 Yeah. The encoding rules regarding `Option` are: 1. For non `Product` types, no change is made. ``` Option[Int] in normal encoder -> a nullable int column Option[Int] in agg encoder -> a nullable int column Option[Int] in tupled encoder -> a nullable int column ``` E.g., ```scala val df = Seq(Some(1), Some(2), None).toDF() df.printSchema root |-- value: integer (nullable = true) val df2 = Seq((1, Some(1)), (2, Some(2)), (3, None)).toDF() df2.printSchema root |-- _1: integer (nullable = false) |-- _2: integer (nullable = true) ``` 2. For `Product` types, we don't support it at top level and aggregator for now. After this change, we can support it in normal encoder and aggregator. ``` Option[Product] in normal encoder => a nullable struct column ``` E.g., ```scala val df3 = Seq(Some(1 -> "a"), Some(2 -> "b"), None).toDF() df3.printSchema root |-- value: struct (nullable = true) | |-- _1: integer (nullable = false) | |-- _2: string (nullable = true) ``` ``` Option[Product] in agg encoder => a nullable struct column // topLevel = false Option[Product] in agg encoder => a nullable struct column nested in a struct column // topLevel = true ``` ```scala val df4 = Seq( OptionBooleanIntData("bob", Some((true, 1))), OptionBooleanIntData("bob", Some((false, 2))), OptionBooleanIntData("bob", None)).toDF() val group = df4 .groupBy("name") .agg(OptionBooleanIntAggregator("isGood").toColumn.alias("isGood")) group.printSchema // topLevel = false root |-- name: string (nullable = true) |-- isGood: struct (nullable = true) | |-- _1: boolean (nullable = false) | |-- _2: integer (nullable = false) // topLevel = true root |-- name: string (nullable = true) |-- isGood: struct (nullable = true) | |-- value: struct (nullable = true) | | |-- _1: boolean (nullable = false) | | |-- _2: integer (nullable = false) ``` 3. For `Option[Product]` in tupled encoder, no change is made too: ```scala val df5 = Seq((1, Some(1 -> "a")), (2, Some(2 -> "b")), (3, None)).toDF() df5.printSchema root |-- _1: integer (nullable = false) |-- _2: struct (nullable = true) | |-- _1: integer (nullable = false) | |-- _2: string (nullable = true) ```
--- --------------------------------------------------------------------- To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org For additional commands, e-mail: reviews-h...@spark.apache.org