AngersZhuuuu commented on a change in pull request #28032:
URL: https://github.com/apache/spark/pull/28032#discussion_r495736123
##########
File path:
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
##########
@@ -2038,6 +2038,15 @@ object SQLConf {
.booleanConf
.createWithDefault(true)
+ val REPARTITION_BEFORE_INSERT =
+ buildConf("spark.sql.execution.repartitionBeforeInsert")
Review comment:
> If this is better in 90% and worse in 10% cases, I might be okay. If
> it's better in 50% and worse in 50% cases, is it worthwhile?
For a dynamic partition write, the number of output files can be up to
`(shuffle partition size) * (table partition size)`, where "size" means
count. After repartitioning, the number of files is `(table partition size)`.
The shuffle added by `RepartitionByExpression` is quick, since it involves no
other computation. What we should be concerned about is data skew; with AQE
we can control each partition's size to match the expected file size.
IMO, it would be better if this PR included a test case with data skew that
shows how it behaves with AQE.
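
To make the file-count arithmetic above concrete, here is a minimal pure-Scala sketch. It does not use Spark and only models the assumption that each write task emits one file per distinct table partition value it holds. The object `FileCountSketch`, its methods, and the sample dates are hypothetical names chosen for illustration.

```scala
// Simplified model, not Spark's actual writer. All names are hypothetical.
object FileCountSketch {
  // Assumption: a task writes one file per distinct table partition value it
  // holds, so the file count is the number of distinct (task, value) pairs.
  def fileCount(assignments: Seq[(Int, String)]): Int =
    assignments.distinct.size

  // Rows cycle through the table partition values.
  private def rows(values: Seq[String], numRows: Int): Seq[String] =
    (0 until numRows).map(i => values(i % values.size))

  // No repartition: rows are spread over tasks regardless of partition value,
  // so every task ends up holding every value.
  def withoutRepartition(numTasks: Int, values: Seq[String], numRows: Int): Int =
    fileCount(rows(values, numRows).zipWithIndex.map {
      case (value, i) => ((i / values.size) % numTasks, value)
    })

  // Repartition by the partition value: all rows with the same value land in
  // the same task (hash partitioning is modelled with hashCode here).
  def withRepartition(numTasks: Int, values: Seq[String], numRows: Int): Int =
    fileCount(rows(values, numRows).map { value =>
      (Math.floorMod(value.hashCode, numTasks), value)
    })

  def main(args: Array[String]): Unit = {
    val values = Seq("2020-09-01", "2020-09-02", "2020-09-03")
    println(s"without repartition: ${withoutRepartition(4, values, 120)}") // 12
    println(s"with repartition:    ${withRepartition(4, values, 120)}")    // 3
  }
}
```

In this model, 4 tasks and 3 table partitions give 12 files without the repartition and 3 files with it. The sketch ignores skew: after the repartition, a single task holds all rows of one partition value, which is the case where AQE would matter.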