[GitHub] [spark] wangyum commented on a diff in pull request #37930: [SPARK-40487][SQL] Make defaultJoin in BroadcastNestedLoopJoinExec running in parallel

2022-09-21 Thread GitBox


wangyum commented on code in PR #37930:
URL: https://github.com/apache/spark/pull/37930#discussion_r976127799


##
sql/core/src/test/scala/org/apache/spark/sql/JoinSuite.scala:
##
@@ -1440,4 +1440,25 @@ class JoinSuite extends QueryTest with SharedSparkSession with AdaptiveSparkPlan
       }
     }
   }
+
+  test("SPARK-40487: Make defaultJoin in BroadcastNestedLoopJoinExec running in parallel") {
+    withTable("t1", "t2") {
+      spark.range(5, 15).toDF("k").write.saveAsTable("t1")
+      spark.range(4, 8).toDF("k").write.saveAsTable("t2")
+
+      val queryBuildLeft =
+        s"""
+           |SELECT /*+ BROADCAST(t1) */ *  FROM t1 LEFT JOIN t2 on t1.k < t2.k
+           """.stripMargin

Review Comment:
   ```suggestion
         val queryBuildLeft = "SELECT /*+ BROADCAST(t1) */ * FROM t1 LEFT JOIN t2 ON t1.k < t2.k"
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



[GitHub] [spark] wangyum commented on a diff in pull request #37930: [SPARK-40487][SQL] Make defaultJoin in BroadcastNestedLoopJoinExec running in parallel

2022-09-21 Thread GitBox


wangyum commented on code in PR #37930:
URL: https://github.com/apache/spark/pull/37930#discussion_r976130112


##
sql/core/src/test/scala/org/apache/spark/sql/JoinSuite.scala:
##
@@ -1440,4 +1440,25 @@ class JoinSuite extends QueryTest with SharedSparkSession with AdaptiveSparkPlan
       }
     }
   }
+
+  test("SPARK-40487: Make defaultJoin in BroadcastNestedLoopJoinExec running in parallel") {
+    withTable("t1", "t2") {
+      spark.range(5, 15).toDF("k").write.saveAsTable("t1")
+      spark.range(4, 8).toDF("k").write.saveAsTable("t2")
+
+      val queryBuildLeft =
+        s"""
+           |SELECT /*+ BROADCAST(t1) */ *  FROM t1 LEFT JOIN t2 on t1.k < t2.k
+           """.stripMargin
+      val result1 = sql(queryBuildLeft)
+
+      val queryBuildRight =
+        s"""
+           |SELECT /*+ BROADCAST(t2) */ *  FROM t1 LEFT JOIN t2 on t1.k < t2.k
+           """.stripMargin

Review Comment:
   ```suggestion
         val queryBuildRight = "SELECT /*+ BROADCAST(t2) */ * FROM t1 LEFT JOIN t2 ON t1.k < t2.k"
   ```






[GitHub] [spark] wangyum commented on a diff in pull request #37930: [SPARK-40487][SQL] Make defaultJoin in BroadcastNestedLoopJoinExec running in parallel

2022-09-21 Thread GitBox


wangyum commented on code in PR #37930:
URL: https://github.com/apache/spark/pull/37930#discussion_r976130112


##
sql/core/src/test/scala/org/apache/spark/sql/JoinSuite.scala:
##
@@ -1440,4 +1440,25 @@ class JoinSuite extends QueryTest with SharedSparkSession with AdaptiveSparkPlan
       }
     }
   }
+
+  test("SPARK-40487: Make defaultJoin in BroadcastNestedLoopJoinExec running in parallel") {
+    withTable("t1", "t2") {
+      spark.range(5, 15).toDF("k").write.saveAsTable("t1")
+      spark.range(4, 8).toDF("k").write.saveAsTable("t2")
+
+      val queryBuildLeft =
+        s"""
+           |SELECT /*+ BROADCAST(t1) */ *  FROM t1 LEFT JOIN t2 on t1.k < t2.k
+           """.stripMargin
+      val result1 = sql(queryBuildLeft)
+
+      val queryBuildRight =
+        s"""
+           |SELECT /*+ BROADCAST(t2) */ *  FROM t1 LEFT JOIN t2 on t1.k < t2.k
+           """.stripMargin

Review Comment:
   ```suggestion
   val queryBuildRight = "SELECT /*+ BROADCAST(t2) */ * FROM t1 LEFT JOIN t2 ON t1.k < t2.k"
   ```






[GitHub] [spark] wangyum commented on a diff in pull request #37930: [SPARK-40487][SQL] Make defaultJoin in BroadcastNestedLoopJoinExec running in parallel

2022-09-21 Thread GitBox


wangyum commented on code in PR #37930:
URL: https://github.com/apache/spark/pull/37930#discussion_r976127799


##
sql/core/src/test/scala/org/apache/spark/sql/JoinSuite.scala:
##
@@ -1440,4 +1440,25 @@ class JoinSuite extends QueryTest with SharedSparkSession with AdaptiveSparkPlan
       }
     }
   }
+
+  test("SPARK-40487: Make defaultJoin in BroadcastNestedLoopJoinExec running in parallel") {
+    withTable("t1", "t2") {
+      spark.range(5, 15).toDF("k").write.saveAsTable("t1")
+      spark.range(4, 8).toDF("k").write.saveAsTable("t2")
+
+      val queryBuildLeft =
+        s"""
+           |SELECT /*+ BROADCAST(t1) */ *  FROM t1 LEFT JOIN t2 on t1.k < t2.k
+           """.stripMargin

Review Comment:
   ```suggestion
   val queryBuildLeft = "SELECT /*+ BROADCAST(t1) */ * FROM t1 LEFT JOIN t2 on t1.k < t2.k"
   ```






[GitHub] [spark] wangyum commented on a diff in pull request #37930: [SPARK-40487][SQL] Make defaultJoin in BroadcastNestedLoopJoinExec running in parallel

2022-09-21 Thread GitBox


wangyum commented on code in PR #37930:
URL: https://github.com/apache/spark/pull/37930#discussion_r976121993


##
sql/core/src/main/scala/org/apache/spark/sql/execution/joins/BroadcastNestedLoopJoinExec.scala:
##
@@ -342,6 +347,13 @@ case class BroadcastNestedLoopJoinExec(
   private def getMatchedBroadcastRowsBitSet(
       streamRdd: RDD[InternalRow],
       relation: Broadcast[Array[InternalRow]]): BitSet = {
+    getMatchedBroadcastRowsBitSetRDD(streamRdd, relation).
+      fold(new BitSet(relation.value.length))(_ | _)

Review Comment:
   ```suggestion
       getMatchedBroadcastRowsBitSetRDD(streamRdd, relation)
         .fold(new BitSet(relation.value.length))(_ | _)
   ```


