viirya commented on a change in pull request #28961:
URL: https://github.com/apache/spark/pull/28961#discussion_r449234227
##########
File path:
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
##########
@@ -507,6 +507,15 @@ object SQLConf {
.bytesConf(ByteUnit.BYTE)
.createWithDefaultString("256MB")
+ val SKEW_JOIN_MAX_PARTITION_SPLITS =
+ buildConf("spark.sql.adaptive.skewJoin.maxPartitionSplits")
+ .doc("The max partition number produced in a skewed join. Generate the plan with too many " +
Review comment:
`The max number of partitions produced...`
##########
File path:
sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/OptimizeSkewedJoin.scala
##########
@@ -236,6 +236,10 @@ case class OptimizeSkewedJoin(conf: SQLConf) extends Rule[SparkPlan] {
} {
leftSidePartitions += leftSidePartition
rightSidePartitions += rightSidePartition
+ if (leftSidePartitions.length > conf.getConf(SQLConf.SKEW_JOIN_MAX_PARTITION_SPLITS)) {
+ throw new SparkException(s"Too many partition splits produced in handling data skew." +
+ s" The threshold is ${conf.getConf(SQLConf.SKEW_JOIN_MAX_PARTITION_SPLITS)}")
+ }
Review comment:
Besides this error message, should we also provide some hints, like
adjusting `advisoryPartitionSizeInBytes`, so users can try a workaround?
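
To illustrate the suggestion above, here is a minimal, self-contained sketch
(not the actual Spark patch) of an error message that includes such a hint.
The config values and the `tooManySplitsMessage` helper are hypothetical,
introduced only for this example:

```scala
// Hypothetical sketch: building a skew-join error message that also tells
// the user which config to tune as a workaround, as suggested in the review.
object SkewJoinErrorSketch {
  // Assumed threshold value for illustration; in Spark this would come from
  // conf.getConf(SQLConf.SKEW_JOIN_MAX_PARTITION_SPLITS).
  val maxPartitionSplits: Int = 100
  val advisorySizeKey: String = "spark.sql.adaptive.advisoryPartitionSizeInBytes"

  def tooManySplitsMessage(splits: Int): String =
    s"Too many partition splits ($splits) produced in handling data skew. " +
      s"The threshold is $maxPartitionSplits. " +
      s"Consider increasing '$advisorySizeKey' so each skewed partition " +
      "is split into fewer, larger pieces."

  def main(args: Array[String]): Unit =
    println(tooManySplitsMessage(150))
}
```

The point of the design is simply that the exception text names a concrete
knob, so a user hitting the limit knows what to try next without reading the
optimizer source.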
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]