[GitHub] [spark] sunchao commented on a change in pull request #35657: [SPARK-37377][SQL] Initial implementation of Storage-Partitioned Join

GitBox Fri, 25 Feb 2022 10:56:38 -0800


sunchao commented on a change in pull request #35657:
URL: https://github.com/apache/spark/pull/35657#discussion_r815017381




##########
File path: sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
##########
@@ -1234,6 +1234,15 @@ object SQLConf {
     .booleanConf
     .createWithDefault(true)
 
+  val V2_BUCKETING_ENABLED = buildConf("spark.sql.sources.v2.bucketing.enabled")
+      .doc(s"Similar to ${BUCKETING_ENABLED.key}, this config is used to enable bucketing for V2 " +
+        "data sources. When turned on, Spark will recognize the specific distribution " +
+        "reported by a V2 data source through SupportsReportPartitioning, and will try to " +
+        "avoid shuffle if necessary.")
+      .version("3.3.0")
+      .booleanConf
+      .createWithDefault(false)

Review comment:
       By default this is false. Previously, when a V2 data source reported `DataSourcePartitioning`, Spark could potentially eliminate the shuffle in aggregation. With this config, however, users now have to turn this flag on to get the same behavior.
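   For reference, opting back in for a session might look like the following spark-shell snippet. It assumes a Spark build that includes this patch, and `spark` is the `SparkSession` that spark-shell predefines:

```scala
// Opt in to V2 bucketing for the current session (the diff above defaults it to false).
spark.conf.set("spark.sql.sources.v2.bucketing.enabled", "true")
```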
   
   My primary goal is to disable storage-partitioned join by default. So perhaps I can introduce another flag to control the join behavior, use this one to control the aggregate behavior, and set it to true by default.
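   The proposal above can be modeled as a plain decision function. This is a hypothetical sketch, not Spark code: `Conf`, `storagePartitionedJoinEnabled`, and `canAvoidShuffle` are illustrative names. It also assumes that eliminating the shuffle for a join requires both flags, while aggregation needs only the bucketing flag, which now defaults to true.

```scala
// Hypothetical model of the proposed two-flag split; none of these names are Spark APIs.
object ShuffleDecision {
  // Stand-in for the relevant SQLConf entries.
  final case class Conf(
      v2BucketingEnabled: Boolean = true,            // proposed default: true (aggregates)
      storagePartitionedJoinEnabled: Boolean = false // hypothetical join flag, off by default
  )

  sealed trait Operator
  case object Aggregate extends Operator
  case object Join extends Operator

  // `reportsPartitioning` models a V2 source that implements SupportsReportPartitioning.
  def canAvoidShuffle(op: Operator, reportsPartitioning: Boolean, conf: Conf): Boolean =
    reportsPartitioning && conf.v2BucketingEnabled && (op match {
      case Aggregate => true                               // old aggregate behavior preserved
      case Join      => conf.storagePartitionedJoinEnabled // join stays opt-in
    })

  def main(args: Array[String]): Unit = {
    val defaults = Conf()
    println(canAvoidShuffle(Aggregate, reportsPartitioning = true, defaults)) // true
    println(canAvoidShuffle(Join, reportsPartitioning = true, defaults))      // false
  }
}
```

   With the defaults, the aggregate case keeps its shuffle elimination while storage-partitioned join remains disabled until the separate flag is turned on.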




-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

