Github user viirya commented on a diff in the pull request:

    https://github.com/apache/spark/pull/21732#discussion_r230739979

    --- Diff: sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/encoders/ExpressionEncoder.scala ---
    @@ -207,7 +198,7 @@ case class ExpressionEncoder[T](
       val serializer: Seq[NamedExpression] = {
         val clsName = Utils.getSimpleName(clsTag.runtimeClass)

    -    if (isSerializedAsStruct) {
    +    if (isSerializedAsStruct && !classOf[Option[_]].isAssignableFrom(clsTag.runtimeClass)) {
    --- End diff --

I think some other places need to check for Option too. I will also add a few more tests to cover some use cases.

One possible place is Dataset.groupByKey. Before that, I may need #22944 to be merged first, so that I can write something like:

```scala
val ds = Seq(Some(("a", 1)), Some(("b", 2)), Some(("c", 3))).toDS()
ds.groupByKey(_.map(_._2).getOrElse("d"))
  .agg(sum("value._2").as[Long], sum($"value._2" + 1).as[Long])
```
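
For reference, the Option check added in the diff, `classOf[Option[_]].isAssignableFrom(clsTag.runtimeClass)`, can be tried outside Spark. The following is a minimal sketch that assumes only the Scala standard library; `OptionCheck` and `isOptionClass` are hypothetical names used for illustration and are not part of Spark or of this PR.

```scala
import scala.reflect.{ClassTag, classTag}

object OptionCheck {
  // Hypothetical helper mirroring the condition in the diff:
  // true when the runtime class of T is Option or a subclass (Some, None.type).
  def isOptionClass[T: ClassTag]: Boolean =
    classOf[Option[_]].isAssignableFrom(classTag[T].runtimeClass)

  def main(args: Array[String]): Unit = {
    println(isOptionClass[Option[(String, Int)]]) // true
    println(isOptionClass[Some[String]])          // true
    println(isOptionClass[(String, Int)])         // false
  }
}
```

Because the check works on the erased runtime class, it only tells whether the top-level type is an Option; it says nothing about the element type inside it.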