Re: [PR] [SPARK-51259][SQL] Refactor and improve performance for natural and using join keys computation [spark]

via GitHub Wed, 19 Feb 2025 06:48:33 -0800


mihailotim-db commented on code in PR #50009:
URL: https://github.com/apache/spark/pull/50009#discussion_r1961806667



##########
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/analysis/Analyzer.scala:
##########
@@ -3564,55 +3564,17 @@ class Analyzer(override val catalogManager: CatalogManager) extends RuleExecutor
       hint: JoinHint): LogicalPlan = {
     import org.apache.spark.sql.catalyst.util._
 
-    val leftKeys = joinNames.map { keyName =>
-      left.output.find(attr => resolver(attr.name, keyName)).getOrElse {
-        throw QueryCompilationErrors.unresolvedUsingColForJoinError(
-          keyName, left.schema.fieldNames.sorted.map(toSQLId).mkString(", "), "left")
-      }
-    }
-    val rightKeys = joinNames.map { keyName =>
-      right.output.find(attr => resolver(attr.name, keyName)).getOrElse {
-        throw QueryCompilationErrors.unresolvedUsingColForJoinError(
-          keyName, right.schema.fieldNames.sorted.map(toSQLId).mkString(", "), "right")
-      }
-    }
-    val joinPairs = leftKeys.zip(rightKeys)
-
-    val newCondition = (condition ++ joinPairs.map(EqualTo.tupled)).reduceOption(And)
-
-    // columns not in joinPairs
-    val lUniqueOutput = left.output.filterNot(att => leftKeys.contains(att))
-    val rUniqueOutput = right.output.filterNot(att => rightKeys.contains(att))

Review Comment:
   I changed the return type of `computeKeysForNaturalOrUsingJoin` from `Seq` 
to `AttributeSet` to avoid quadratic lookups here: `filterNot` with 
`Seq.contains` scans the key list once per output attribute. I think it makes 
sense to make this change in this PR, but it can be moved to a follow-up. 
What do you think, @cloud-fan @vladimirg-db?
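
For illustration only, here is a minimal sketch of the lookup pattern in question, written without any Spark dependencies. `Attr` and `UniqueOutputSketch` are hypothetical stand-ins, not Catalyst classes, and keying the set by `exprId` is an assumption made for this example rather than a statement about how `AttributeSet` is implemented.

```scala
// Hypothetical stand-in for a Catalyst attribute; only name and exprId matter here.
final case class Attr(name: String, exprId: Long)

object UniqueOutputSketch {
  // Seq-based version, mirroring the removed lines: every `contains` call
  // walks `keys`, so the total cost is O(output.size * keys.size).
  def uniqueOutputSeq(output: Seq[Attr], keys: Seq[Attr]): Seq[Attr] =
    output.filterNot(att => keys.contains(att))

  // Set-based version: build a hash set of key ids once, then each
  // membership check is expected constant time.
  // Assumption: attributes are identified by exprId in this sketch.
  def uniqueOutputSet(output: Seq[Attr], keys: Seq[Attr]): Seq[Attr] = {
    val keyIds: Set[Long] = keys.map(_.exprId).toSet
    output.filterNot(att => keyIds.contains(att.exprId))
  }
}
```

Both functions return the same attributes in the same order; only the cost of the membership check differs, which matters mainly for wide plans with many join keys.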



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]


