viirya commented on a change in pull request #28961:
URL: https://github.com/apache/spark/pull/28961#discussion_r449234227
##########
File path:
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
##########
@@ -507,6 +507,15 @@ object SQLConf {
.bytesConf(ByteUnit.BYTE)
.createWithDefaultString("256MB")
+ val SKEW_JOIN_MAX_PARTITION_SPLITS =
+ buildConf("spark.sql.adaptive.skewJoin.maxPartitionSplits")
+ .doc("The max partition number produced in a skewed join. Generate the plan with too many " +
Review comment:
`The max number of partitions produced...`
##########
File path:
sql/core/src/main/scala/org/apache/spark/sql/execution/adaptive/OptimizeSkewedJoin.scala
##########
@@ -236,6 +236,10 @@ case class OptimizeSkewedJoin(conf: SQLConf) extends Rule[SparkPlan] {
} {
leftSidePartitions += leftSidePartition
rightSidePartitions += rightSidePartition
+ if (leftSidePartitions.length > conf.getConf(SQLConf.SKEW_JOIN_MAX_PARTITION_SPLITS)) {
+ throw new SparkException(s"Too many partition splits produced in handling data skew." +
+ s" The threshold is ${conf.getConf(SQLConf.SKEW_JOIN_MAX_PARTITION_SPLITS)}")
+ }
Review comment:
Besides this error message, should we also provide some hints, like
adjusting `advisoryPartitionSizeInBytes`, so users can try a workaround?
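
To illustrate the suggestion above, here is a minimal, self-contained sketch
(not the actual Spark patch) of an error message that includes such a hint.
The config values and the `tooManySplitsMessage` helper are hypothetical,
introduced only for this example:

```scala
// Hypothetical sketch: building a skew-join error message that also tells
// the user which config to tune as a workaround, as suggested in the review.
object SkewJoinErrorSketch {
  // Assumed threshold value for illustration; in Spark this would come from
  // conf.getConf(SQLConf.SKEW_JOIN_MAX_PARTITION_SPLITS).
  val maxPartitionSplits: Int = 100
  val advisorySizeKey: String = "spark.sql.adaptive.advisoryPartitionSizeInBytes"

  def tooManySplitsMessage(splits: Int): String =
    s"Too many partition splits ($splits) produced in handling data skew. " +
      s"The threshold is $maxPartitionSplits. " +
      s"Consider increasing '$advisorySizeKey' so each skewed partition " +
      "is split into fewer, larger pieces."

  def main(args: Array[String]): Unit =
    println(tooManySplitsMessage(150))
}
```

The point of the design is simply that the exception text names a concrete
knob, so a user hitting the limit knows what to try next without reading the
optimizer source.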
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]