cboumalh commented on code in PR #54236:
URL: https://github.com/apache/spark/pull/54236#discussion_r2801424618
##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/plans/logical/basicLogicalOperators.scala:
##########
@@ -566,9 +566,58 @@ abstract class UnionBase extends LogicalPlan {
}
}
+/**
+ * Trait for union operators that support standard optimizer rules like
flattening and
+ * projection push-down.
+ *
+ * IMPORTANT: Child order preservation semantics:
+ * - Optimizer rules operating on UnionLike MUST preserve the order of children
+ * - Some implementations (e.g., SequentialStreamingUnion) have semantically
significant
+ * child ordering where children are processed sequentially in order
+ * - Other implementations (e.g., Union) have no semantic ordering - children
are processed
+ * in parallel and their order is insignificant for correctness
+ * - Order-dependent optimizations (e.g., reordering children for performance)
should ONLY
+ * be applied to specific implementations where order is not significant,
NOT to UnionLike
+ * in general
+ *
+ * Optimizer rules that are safe for all UnionLike implementations:
+ * - Flattening nested unions (preserves child order within the flattened
sequence)
+ * - Pushing projections down to children (each child maintains its position)
+ * - Collapsing adjacent projects (order-preserving transformation)
+ *
+ * IMPORTANT: Union and SequentialStreamingUnion should never be merged with
each other
+ * as they have different execution semantics. Use `isSameType` to check
compatibility.
+ *
+ * This trait provides:
+ * - byName: controls whether column resolution is by name vs position
+ * - allowMissingCol: allows children to have different columns (requires
byName)
+ * - children: inherited from LogicalPlan via UnionBase
+ * - isSameType: checks if another UnionLike can be merged with this one
+ */
+trait UnionLike extends LogicalPlan { self: UnionBase =>
+ def byName: Boolean
+ def allowMissingCol: Boolean
+
+ /**
+ * Returns true if the other UnionLike is the same concrete type as this one.
+ * Used during flattening to ensure Union and SequentialStreamingUnion are
not merged.
+ */
+ def isSameType(other: UnionLike): Boolean
+
+ /**
+ * Creates a new instance of this UnionLike with the specified children.
+ * Implementations should delegate to their case class copy() method.
+ */
+ def withNewChildrenSeq(newChildren: Seq[LogicalPlan]): UnionLike
Review Comment:
Nice fix, also a small nit, can we just use the function that is provided by
TreeNode instead of adding it to the trait?
https://github.com/apache/spark/blob/ac9b01ecff6f26d6f06e97afa059cee0d075ebab/sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/trees/TreeNode.scala#L363
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]