Github user mn-mikke commented on a diff in the pull request:
https://github.com/apache/spark/pull/21687#discussion_r200849422
--- Diff:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/expressions/Expression.scala
---
@@ -695,6 +695,56 @@ abstract class TernaryExpression extends Expression {
}
}
+/**
+ * A trait resolving nullable, containsNull, valueContainsNull flags of the output date type.
+ * This logic is usually utilized by expressions combining data from multiple child expressions
+ * of non-primitive types (e.g. [[CaseWhen]]).
+ */
+trait NonPrimitiveTypeMergingExpression extends Expression
+{
+ /**
+ * A collection of data types used for resolution the output type of the expression. By default,
+ * data types of all child expressions. The collection must not be empty.
+ */
+ @transient
+ lazy val inputTypesForMerging: Seq[DataType] = children.map(_.dataType)
+
+ /**
+ * A method determining whether the input types are equal ignoring nullable, containsNull and
+ * valueContainsNull flags and thus convenient for resolution of the final data type.
+ */
+ def areInputTypesForMergingEqual: Boolean = {
+ inputTypesForMerging.lengthCompare(1) <= 0 || inputTypesForMerging.sliding(2, 1).forall {
+ case Seq(dt1, dt2) => dt1.sameType(dt2)
+ }
+ }
+
+ private def mergeTwoDataTypes(dt1: DataType, dt2: DataType): DataType = (dt1, dt2) match {
+ case (t1, t2) if t1 == t2 => t1
+ case (ArrayType(et1, cn1), ArrayType(et2, cn2)) =>
+ ArrayType(mergeTwoDataTypes(et1, et2), cn1 || cn2)
+ case (MapType(kt1, vt1, vcn1), MapType(kt2, vt2, vcn2)) =>
+ MapType(mergeTwoDataTypes(kt1, kt2), mergeTwoDataTypes(vt1, vt2), vcn1 || vcn2)
+ case (StructType(fields1), StructType(fields2)) =>
+ val newFields = fields1.zip(fields2).map {
+ case (f1, f2) if f1 == f2 => f1
+ case (StructField(name, fdt1, nl1, _), StructField(_, fdt2, nl2, _)) =>
+ StructField(name, mergeTwoDataTypes(fdt1, fdt2), nl1 || nl2)
--- End diff --
The comment on the ```metadata``` field says:
> The metadata should be preserved during transformation if the content of
the column is not modified, e.g, in selection.
So I would say no, since the expressions inheriting from this trait
combine data from multiple columns into one.
If we decide to extend the definition and merge metadata from multiple
columns, it would make sense to also change the ```findTightestCommonType```
method to cover the cases where coercion rules are applied.
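
To illustrate what dropping the metadata means in practice, here is a minimal sketch. It assumes simplified stand-in classes rather than Spark's real `org.apache.spark.sql.types` classes: the `Map[String, String]` metadata, the `MergeSketch` object, and the `left`/`right`/`merged` values are all hypothetical, and the merge logic only mirrors the array and struct cases quoted in the diff above.

```scala
// Simplified stand-ins for illustration only; these are NOT Spark's real type classes.
sealed trait DataType
case object IntegerType extends DataType
case object StringType extends DataType
final case class ArrayType(elementType: DataType, containsNull: Boolean) extends DataType
final case class StructField(
    name: String,
    dataType: DataType,
    nullable: Boolean = true,
    metadata: Map[String, String] = Map.empty) // hypothetical metadata representation
final case class StructType(fields: Seq[StructField]) extends DataType

object MergeSketch {
  // Mirrors the shape of mergeTwoDataTypes from the diff: nullability flags are
  // OR-ed together, and the metadata of non-identical fields is not carried over.
  def merge(dt1: DataType, dt2: DataType): DataType = (dt1, dt2) match {
    case (t1, t2) if t1 == t2 => t1
    case (ArrayType(et1, cn1), ArrayType(et2, cn2)) =>
      ArrayType(merge(et1, et2), cn1 || cn2)
    case (StructType(fields1), StructType(fields2)) =>
      StructType(fields1.zip(fields2).map {
        // Identical fields (including metadata) are returned unchanged.
        case (f1, f2) if f1 == f2 => f1
        // Otherwise a new field is built and falls back to empty metadata.
        case (StructField(name, fdt1, nl1, _), StructField(_, fdt2, nl2, _)) =>
          StructField(name, merge(fdt1, fdt2), nl1 || nl2)
      })
    case _ =>
      throw new IllegalArgumentException(s"Cannot merge $dt1 and $dt2")
  }

  // Two struct types that differ in nullability and metadata of the same field.
  val left: DataType = StructType(Seq(
    StructField("id", IntegerType, nullable = false, Map("comment" -> "primary key"))))
  val right: DataType = StructType(Seq(
    StructField("id", IntegerType, nullable = true)))

  // The merged field is nullable and carries no metadata from either input.
  val merged: DataType = merge(left, right)
}
```

In this sketch `MergeSketch.merged` contains a nullable `id` field with empty metadata, which matches the reading of the ```metadata``` comment above: once the column content comes from several inputs, neither side's metadata is kept.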
---