cloud-fan commented on a change in pull request #24236: [SPARK-27314][SQL]
Deduplicate exprIds for Union.
URL: https://github.com/apache/spark/pull/24236#discussion_r270257528
##########
File path:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala
##########
@@ -956,6 +956,21 @@ class Analyzer(
i.copy(right = dedupRight(left, right))
case e @ Except(left, right, _) if !e.duplicateResolved =>
e.copy(right = dedupRight(left, right))
+ case u @ Union(children) if !u.duplicateResolved =>
+ // Use projection-based de-duplication for Union to avoid behavior changes in streaming.
Review comment:
Can we be more specific here? I think it's the checkpoint sharing feature
that forces us to handle Union specially.
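For context, here is a minimal sketch of what "projection-based de-duplication" could look like, using a toy plan model rather than Spark's Catalyst classes. Attr, Relation, Project, and dedupChildren are hypothetical names introduced only for illustration, and the rule in this PR may differ in its details:

```scala
// Toy model only: none of these types are Spark's real classes.
object UnionDedupSketch {
  // An output attribute identified by a unique expression ID.
  final case class Attr(name: String, exprId: Long)

  sealed trait Plan { def output: Seq[Attr] }

  // A leaf node that exposes a fixed set of attributes.
  final case class Relation(output: Seq[Attr]) extends Plan

  // A projection that re-exposes each child attribute under a new exprId.
  final case class Project(aliases: Seq[(Attr, Attr)], child: Plan) extends Plan {
    def output: Seq[Attr] = aliases.map(_._2)
  }

  // Hypothetical ID generator standing in for a real exprId allocator.
  private var nextId = 1000L
  private def freshId(): Long = { nextId += 1; nextId }

  // Wrap any child whose exprIds collide with those of an earlier child.
  // The child subtree itself is left untouched; only a Project is added on top.
  def dedupChildren(children: Seq[Plan]): Seq[Plan] = {
    val seen = scala.collection.mutable.Set.empty[Long]
    children.map { child =>
      val conflicts = child.output.exists(a => seen.contains(a.exprId))
      val resolved =
        if (conflicts) Project(child.output.map(a => a -> a.copy(exprId = freshId())), child)
        else child
      seen ++= resolved.output.map(_.exprId)
      resolved
    }
  }
}
```

In this sketch the conflicting child is wrapped rather than rewritten, so its original subtree stays as it was. Whether that property is what keeps checkpoint sharing working is the detail the code comment should spell out.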