LantaoJin commented on a change in pull request #29021:
URL: https://github.com/apache/spark/pull/29021#discussion_r450674200



##########
File path: sql/core/src/test/scala/org/apache/spark/sql/execution/adaptive/AdaptiveQueryExecSuite.scala
##########
@@ -730,6 +713,67 @@ class AdaptiveQueryExecSuite
     }
   }
 
+  private def checkSkewJoin(
+      joins: Seq[SortMergeJoinExec],
+      leftSkewNum: Int,
+      rightSkewNum: Int): Unit = {
+    assert(joins.size == 1 && joins.head.isSkewJoin)
+    assert(joins.head.left.collect {
+      case r: CustomShuffleReaderExec => r
+    }.head.partitionSpecs.collect {
+      case p: PartialReducerPartitionSpec => p.reducerIndex
+    }.distinct.length == leftSkewNum)
+    assert(joins.head.right.collect {
+      case r: CustomShuffleReaderExec => r
+    }.head.partitionSpecs.collect {
+      case p: PartialReducerPartitionSpec => p.reducerIndex
+    }.distinct.length == rightSkewNum)
+  }
+
+  test("SPARK-32201: handle general skew join pattern") {

Review comment:
       In our internal Spark build, we added a configuration that disables this check in order to maximize the skew join optimization:
   `if (numShuffles > 0 && conf.getConf(SQLConf.SKEW_JOIN_NO_EXTRA_SHUFFLE))`. With that configuration, the unit test above works as expected. But that is a separate topic; alternatively, we could write a new unit test that avoids triggering this check.
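
   To make the suggested guard concrete, here is a minimal, self-contained sketch of the bail-out logic being discussed. It does not use real Spark classes: `Conf`, `skewJoinNoExtraShuffle`, and `shouldApplySkewJoin` are hypothetical stand-ins for `SQLConf.SKEW_JOIN_NO_EXTRA_SHUFFLE` and the internal check, used only to illustrate how such a flag would let the skew join optimization proceed even when extra shuffles are introduced.

   ```scala
   object SkewJoinGuardSketch {
     // Hypothetical stand-in for SQLConf.SKEW_JOIN_NO_EXTRA_SHUFFLE
     // (not a real Spark config key).
     final case class Conf(skewJoinNoExtraShuffle: Boolean)

     /**
      * Returns true when the skew join optimization should be kept.
      * Mirrors the guard `if (numShuffles > 0 && conf.getConf(...))`:
      * bail out only when extra shuffles would be introduced AND the
      * "no extra shuffle" check is enabled (the default behavior).
      */
     def shouldApplySkewJoin(numExtraShuffles: Int, conf: Conf): Boolean =
       !(numExtraShuffles > 0 && conf.skewJoinNoExtraShuffle)

     def main(args: Array[String]): Unit = {
       // No extra shuffles: optimization always applies.
       assert(shouldApplySkewJoin(0, Conf(skewJoinNoExtraShuffle = true)))
       // Extra shuffles with the check enabled: optimization is dropped.
       assert(!shouldApplySkewJoin(2, Conf(skewJoinNoExtraShuffle = true)))
       // Disabling the check keeps the optimization despite extra shuffles.
       assert(shouldApplySkewJoin(2, Conf(skewJoinNoExtraShuffle = false)))
       println("ok")
     }
   }
   ```

   Under this sketch, a test like the one above would pass with the check disabled, because the optimized skew join is retained even when the plan gains extra shuffles.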




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]
