viirya commented on code in PR #41875:
URL: https://github.com/apache/spark/pull/41875#discussion_r1260694736


##########
sql/core/src/main/scala/org/apache/spark/sql/execution/joins/HashJoin.scala:
##########
@@ -766,4 +598,331 @@ object HashJoin extends CastSupport with SQLConfHelper {
         ansiEnabled = false)
     }
   }
+
+  private def streamedBoundKeys(streamedKeys: Seq[Expression], streamedOutput: Seq[Attribute]) =
+    bindReferences(HashJoin.rewriteKeyExpr(streamedKeys), streamedOutput)
+  private def streamSideKeyGenerator(
+      streamedKeys: Seq[Expression],
+      streamedOutput: Seq[Attribute]): UnsafeProjection =
+    UnsafeProjection.create(streamedBoundKeys(streamedKeys, streamedOutput))
+
+  def boundCondition(
+      condition: Option[Expression],
+      joinType: JoinType,
+      buildSide: BuildSide,
+      buildPlanOutput: Seq[Attribute],
+      streamedPlanOutput: Seq[Attribute]): InternalRow => Boolean = if (condition.isDefined) {
+    if (joinType == FullOuter && buildSide == BuildLeft) {
+      // Put join left side before right side. This is to be consistent with
+      // `ShuffledHashJoinExec.fullOuterJoin`.
+      Predicate.create(condition.get, buildPlanOutput ++ streamedPlanOutput).eval _
+    } else {
+      Predicate.create(condition.get, streamedPlanOutput ++ buildPlanOutput).eval _
+    }
+  } else { (r: InternalRow) =>
+    true
+  }
+
+  private def createResultProjection(
+      joinType: JoinType,
+      output: Seq[Attribute],
+      buildPlanOutput: Seq[Attribute],
+      streamedPlanOutput: Seq[Attribute]): (InternalRow) => InternalRow = {
+    joinType match {
+      case LeftExistence(_) =>
+        UnsafeProjection.create(output, output)
+      case _ =>
+        // Always put the stream side on left to simplify implementation
+        // both of left and right side could be null
+        UnsafeProjection.create(
+          output, (streamedPlanOutput ++ buildPlanOutput).map(_.withNullability(true)))
+    }
+  }
+  def join(hashJoinParams: HashJoinParams): Iterator[InternalRow] = {
+
+    val streamedIter: Iterator[InternalRow] = hashJoinParams.streamedIter
+    val hashed: HashedRelation = hashJoinParams.hashedRelation
+    val streamedKeys: Seq[Expression] = hashJoinParams.streamedKeys
+    val streamedOutput: Seq[Attribute] = hashJoinParams.streamedOutput
+    val condition: Option[Expression] = hashJoinParams.condition
+    val joinType: JoinType = hashJoinParams.joinType
+    val buildSide: BuildSide = hashJoinParams.buildSide
+    val buildPlanOutput: Seq[Attribute] = hashJoinParams.buildPlanOutput
+    val streamedPlanOutput: Seq[Attribute] = hashJoinParams.streamedPlanOutput
+    val output: Seq[Attribute] = hashJoinParams.output
+    val numOutputRows: SQLMetric = hashJoinParams.numOutputRows
+
+    val joinedIter = joinType match {
+      case _: InnerLike =>
+        innerJoin(
+          streamedIter,
+          hashed,
+          streamedKeys,
+          streamedOutput,
+          condition,
+          joinType,
+          buildSide,
+          buildPlanOutput,
+          streamedPlanOutput)

Review Comment:
   Hmm, it seems these methods previously took only two parameters (the stream iterator and the hashed relation). Maybe we can initialize the other inputs before calling these methods, so they can keep taking two parameters as before?
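
   For example, something along these lines (just a rough sketch of the idea, not this PR's code; `HashJoinContext` is a name made up here for illustration, and it assumes it sits inside `object HashJoin` so the `streamSideKeyGenerator` / `boundCondition` helpers from this diff are in scope):

   ```scala
   // Hypothetical sketch: resolve the per-join state from HashJoinParams once,
   // so each join method keeps the old (streamedIter, hashed) two-parameter shape.
   private class HashJoinContext(params: HashJoinParams) {
     private val joinKeys = streamSideKeyGenerator(params.streamedKeys, params.streamedOutput)
     private val checkCondition = boundCondition(
       params.condition, params.joinType, params.buildSide,
       params.buildPlanOutput, params.streamedPlanOutput)

     // Same two-parameter signature the methods had before this change.
     def innerJoin(
         streamedIter: Iterator[InternalRow],
         hashed: HashedRelation): Iterator[InternalRow] = {
       val joinRow = new JoinedRow
       streamedIter.flatMap { srow =>
         joinRow.withLeft(srow)
         val matches = hashed.get(joinKeys(srow))
         if (matches != null) {
           matches.map(joinRow.withRight).filter(checkCondition)
         } else {
           Iterator.empty
         }
       }
     }
   }
   ```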



