prakharjain09 commented on a change in pull request #28424:
URL: https://github.com/apache/spark/pull/28424#discussion_r418940221
##########
File path:
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
##########
@@ -1682,6 +1682,22 @@ object SQLConf {
.doubleConf
.createWithDefault(0.9)
+ val OPTIMIZE_INTERSECT_ENABLED =
+ buildConf("spark.sql.cbo.optimizeIntersect.enabled")
+ .internal()
+ .doc("Whether to use optimized Intersect implementation or not. " +
+ "Optimized Intersect logic tries to pushdown Distinct through Join based on some stats")
+ .booleanConf
+ .createWithDefault(true)
+
+ val OPTIMIZE_INTERSECT_DISTINCT_REDUCTION_THRESHOLD =
+ buildConf("spark.sql.cbo.optimizeIntersect.distinctReductionThreshold")
+ .internal()
+ .doc("Ratio by which Distinct should reduce number of rows to qualify for pushdown" +
+ " in Intersect optimization")
+ .intConf
+ .createWithDefault(100)
Review comment:
No, I didn't find any performance regression with these stats-based triggers. I
kept everything behind a config, following the pattern of other configs used in
similar features (e.g. cost-based join reorder).
Should we remove the "spark.sql.cbo.optimizeIntersect.enabled" config here?
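To illustrate what `distinctReductionThreshold` would gate, here is a minimal, self-contained sketch (not code from this PR; the method name and parameters are hypothetical) of the ratio check the doc string describes: push Distinct through the Join only when stats suggest it reduces the row count by at least the configured factor (default 100).

```scala
object IntersectPushdownSketch {
  // Hypothetical helper: returns true when Distinct shrinks the input by at
  // least `reductionThreshold` times, so the extra aggregate below the Join
  // is likely to pay off. BigInt mirrors the row counts in Spark statistics.
  def shouldPushdownDistinct(
      rowCountBeforeDistinct: BigInt,
      distinctRowCount: BigInt,
      reductionThreshold: Int): Boolean = {
    distinctRowCount > 0 &&
      rowCountBeforeDistinct / distinctRowCount >= reductionThreshold
  }
}
```

For example, an input of 1,000,000 rows with 100 distinct values (a 10,000x reduction) qualifies under the default threshold of 100, while 1,000 rows with 100 distinct values (only 10x) does not.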
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
For queries about this service, please contact Infrastructure at:
[email protected]
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]