Re: [PR] [SPARK-47955][SQL] Improve `DeduplicateRelations` performance [spark]

via GitHub Thu, 25 Apr 2024 09:04:11 -0700


viirya commented on code in PR #46183:
URL: https://github.com/apache/spark/pull/46183#discussion_r1579758611



##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/DeduplicateRelations.scala:
##########
@@ -38,28 +38,29 @@ case class RelationWrapper(cls: Class[_], outputAttrIds: Seq[Long])
 object DeduplicateRelations extends Rule[LogicalPlan] {
   override def apply(plan: LogicalPlan): LogicalPlan = {
     val newPlan = renewDuplicatedRelations(mutable.HashSet.empty, plan)._1
-    if (newPlan.find(p => p.resolved && p.missingInput.nonEmpty).isDefined) {
-      // Wait for `ResolveMissingReferences` to resolve missing attributes first
-      return newPlan
-    }
+
+    def noMissingInput(p: LogicalPlan) = !p.exists(_.missingInput.nonEmpty)
+
     newPlan.resolveOperatorsUpWithPruning(
       _.containsAnyPattern(JOIN, LATERAL_JOIN, AS_OF_JOIN, INTERSECT, EXCEPT, UNION, COMMAND),
       ruleId) {
       case p: LogicalPlan if !p.childrenResolved => p
       // To resolve duplicate expression IDs for Join.
-      case j @ Join(left, right, _, _, _) if !j.duplicateResolved =>
+      case j @ Join(left, right, _, _, _) if !j.duplicateResolved && noMissingInput(right) =>

Review Comment:
   The removed comment, "Wait for `ResolveMissingReferences` to resolve missing
attributes first", looks like a good reference for why we check for no missing
input here. Can we keep it?
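
   For illustration only, here is a minimal sketch of how the comment could sit
next to the new helper. `ToyPlan` is a hypothetical stand-in written for this
example, not Spark's `LogicalPlan`; only `noMissingInput` and the comment text
come from the diff above, and the second comment line is my own wording.

```scala
// Hypothetical toy plan node used only for this sketch; NOT Spark's LogicalPlan.
final case class ToyPlan(
    name: String,
    missingInput: Set[String] = Set.empty,
    children: Seq[ToyPlan] = Seq.empty) {
  // True if this node or any descendant satisfies `f`.
  def exists(f: ToyPlan => Boolean): Boolean =
    f(this) || children.exists(_.exists(f))
}

object NoMissingInputSketch {
  // Wait for `ResolveMissingReferences` to resolve missing attributes first:
  // a subtree that still has missing input is skipped for now.
  def noMissingInput(p: ToyPlan): Boolean = !p.exists(_.missingInput.nonEmpty)

  def main(args: Array[String]): Unit = {
    val resolved = ToyPlan("Project", children = Seq(ToyPlan("Relation")))
    val pending = ToyPlan(
      "Sort",
      children = Seq(ToyPlan("Project", missingInput = Set("b"))))

    assert(noMissingInput(resolved)) // no node reports missing input
    assert(!noMissingInput(pending)) // a descendant still has missing input
  }
}
```

   In the real rule the same comment would go directly above the
`def noMissingInput(p: LogicalPlan)` line added in this hunk.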


