Github user viirya commented on a diff in the pull request:
https://github.com/apache/spark/pull/21732#discussion_r236117313
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/encoders/ExpressionEncoder.scala
---
@@ -253,10 +247,24 @@ case class ExpressionEncoder[T](
})
/**
- * Returns true if the type `T` is serialized as a struct.
+ * Returns true if the type `T` is serialized as a struct by
`objSerializer`.
*/
def isSerializedAsStruct: Boolean =
objSerializer.dataType.isInstanceOf[StructType]
+ /**
+ * Returns true if the type `T` is an `Option` type.
+ */
+ def isOptionType: Boolean =
classOf[Option[_]].isAssignableFrom(clsTag.runtimeClass)
+
+ /**
+ * If the type `T` is serialized as a struct, when it is encoded to a
Spark SQL row, fields in
+ * the struct are naturally mapped to top-level columns in a row. In
other words, the serialized
+ * struct is flattened to row. But in case of the `T` is also an
`Option` type, it can't be
+ * flattened to top-level row, because in Spark SQL top-level row can't
be null. This method
+ * returns true if `T` is serialized as struct and is not `Option` type.
+ */
+ def isSerializedAsStructForTopLevel: Boolean = isSerializedAsStruct &&
!isOptionType
--- End diff --
ok.
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]