sunchao commented on a change in pull request #35657:
URL: https://github.com/apache/spark/pull/35657#discussion_r815017381
##########
File path:
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
##########
@@ -1234,6 +1234,15 @@ object SQLConf {
.booleanConf
.createWithDefault(true)
+ val V2_BUCKETING_ENABLED = buildConf("spark.sql.sources.v2.bucketing.enabled")
+ .doc(s"Similar to ${BUCKETING_ENABLED.key}, this config is used to enable bucketing for V2 " +
+ "data sources. When turned on, Spark will recognize the specific distribution " +
+ "reported by a V2 data source through SupportsReportPartitioning, and will try to " +
+ "avoid shuffle if necessary.")
+ .version("3.3.0")
+ .booleanConf
+ .createWithDefault(false)
Review comment:
This defaults to false. Previously, when a V2 data source reported
`DataSourcePartitioning`, Spark could potentially eliminate the shuffle in an
aggregation. With this config, users now have to turn the flag on in order
to get the same behavior.
My primary goal is to disable storage-partitioned join by default. So
perhaps I can introduce another flag to control the join behavior, use this
one to control the aggregate behavior, and set this one to true by default.
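To make the user-facing impact concrete, here is a rough sketch of what opting back in would look like while the default stays false. It assumes the config key lands exactly as written in the diff above; the app name and the local master are placeholders for illustration only.

```scala
import org.apache.spark.sql.SparkSession

// Sketch only: assumes the key from the diff above is merged unchanged.
// The app name and master below are illustrative placeholders.
val spark = SparkSession.builder()
  .appName("v2-bucketing-opt-in")
  .master("local[*]")
  // With a default of false, users must set this explicitly to keep the
  // shuffle elimination they previously got from a reported partitioning.
  .config("spark.sql.sources.v2.bucketing.enabled", "true")
  .getOrCreate()

// Read the value back to confirm what the session is using.
println(spark.conf.get("spark.sql.sources.v2.bucketing.enabled"))
```

Any existing job that relies on the reported partitioning to skip a shuffle before an aggregate would need a change like this after upgrading, which is the regression I would like to avoid.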
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]