dilipbiswal commented on a change in pull request #28424:
URL: https://github.com/apache/spark/pull/28424#discussion_r419016053



##########
File path: 
sql/catalyst/src/main/scala/org/apache/spark/sql/internal/SQLConf.scala
##########
@@ -1682,6 +1682,22 @@ object SQLConf {
     .doubleConf
     .createWithDefault(0.9)
 
+  val OPTIMIZE_INTERSECT_ENABLED =
+    buildConf("spark.sql.cbo.optimizeIntersect.enabled")
+      .internal()
+      .doc("Whether to use optimized Intersect implementation or not. " +
+        "Optimized Intersect logic tries to pushdown Distinct through Join based on some stats")
+      .booleanConf
+      .createWithDefault(true)
+
+  val OPTIMIZE_INTERSECT_DISTINCT_REDUCTION_THRESHOLD =
+    buildConf("spark.sql.cbo.optimizeIntersect.distinctReductionThreshold")
+      .internal()
+      .doc("Ratio by which Distinct should reduce number of rows to qualify for pushdown" +
+        " in Intersect optimization")
+      .intConf
+      .createWithDefault(100)

Review comment:
       @prakharjain09 Hello, if the stats are wrong or incomplete (e.g. the cardinality has changed drastically since the last time we ran ANALYZE on the table), we could possibly slow the query down, especially when we push the Distinct to the right leg, no? @maropu wouldn't it be better to keep this under a flag?
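For illustration, a minimal sketch of the kind of stats-based gating the threshold conf implies. This is not the PR's actual rule: the object and method names, and the use of row count and distinct count as the inputs, are assumptions; only the default threshold of 100 comes from the diff above.

```scala
// Hypothetical sketch: gate the Distinct pushdown on an estimated
// reduction ratio, mirroring
// spark.sql.cbo.optimizeIntersect.distinctReductionThreshold (default 100).
// If the stats are stale (as the review comment worries), rowCount and
// distinctCount can be wrong and this check would approve a bad pushdown.
object IntersectPushdownSketch {
  val distinctReductionThreshold: BigInt = BigInt(100)

  // Push Distinct below the Join only if stats say it shrinks the input
  // by at least the threshold ratio.
  def shouldPushDistinct(rowCount: BigInt, distinctCount: BigInt): Boolean =
    distinctCount > 0 && rowCount / distinctCount >= distinctReductionThreshold

  def main(args: Array[String]): Unit = {
    // 200x estimated reduction: qualifies for pushdown.
    println(shouldPushDistinct(BigInt(1000000), BigInt(5000)))
    // Only 2x estimated reduction: does not qualify.
    println(shouldPushDistinct(BigInt(1000000), BigInt(500000)))
  }
}
```

A conf flag as suggested above would simply short-circuit this check when disabled, so users can opt out when their table stats are unreliable.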



