[GitHub] [spark] aokolnychyi commented on a change in pull request #30808: [WIP] [SPARK-33784][SQL] Rename dataSourceRewriteRules batch

GitBox Thu, 17 Dec 2020 03:14:15 -0800


aokolnychyi commented on a change in pull request #30808:
URL: https://github.com/apache/spark/pull/30808#discussion_r544908719




##########
File path: 
sql/core/src/main/scala/org/apache/spark/sql/SparkSessionExtensions.scala
##########
@@ -200,19 +200,20 @@ class SparkSessionExtensions {
     optimizerRules += builder
   }
 
-  private[this] val dataSourceRewriteRules = mutable.Buffer.empty[RuleBuilder]
+  private[this] val postOperatorOptimizationRules = 
mutable.Buffer.empty[RuleBuilder]
 
-  private[sql] def buildDataSourceRewriteRules(session: SparkSession): 
Seq[Rule[LogicalPlan]] = {
-    dataSourceRewriteRules.map(_.apply(session)).toSeq
+  private[sql] def buildPostOperatorOptimizationRules(
+      session: SparkSession): Seq[Rule[LogicalPlan]] = {
+    postOperatorOptimizationRules.map(_.apply(session)).toSeq
   }
 
   /**
-   * Inject an optimizer `Rule` builder that rewrites data source plans into 
the [[SparkSession]].
+   * Inject an optimizer `Rule` builder that rewrites logical plans into the 
[[SparkSession]].
    * The injected rules will be executed after the operator optimization batch 
and before rules
    * that depend on stats.
    */
-  def injectDataSourceRewriteRule(builder: RuleBuilder): Unit = {
-    dataSourceRewriteRules += builder
+  def injectPostOperatorOptimizationRule(builder: RuleBuilder): Unit = {

Review comment:
       I do share some of the points brought by @cloud-fan. There is no clear 
separation between CBO and RBO and we run quite some rule-based optimizations 
after the existing cost-based optimization rule. Also, we need to run this 
batch before early scan pushdown rules to capture possible rewrites, which, in 
turn, run before CBO. That’s why `preCBORules` does seem to ideally convey what 
this rule does.
   
   Our existing hook in the extensions API says `The injected rules will be 
executed during the operator optimization batch`. That was one of my 
motivations for the current name as we have resolution and post-hoc resolution 
rules.
   
   That said, I don’t mean `postOperatorOptimizationRules` is a perfect name 
either. It just seems to better reflect the place in the Spark optimizer. I’ll 
be happy to discuss and iterate further.




----------------------------------------------------------------
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at:
[email protected]



---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]

[GitHub] [spark] aokolnychyi commented on a change in pull request #30808: [WIP] [SPARK-33784][SQL] Rename dataSourceRewriteRules batch

Reply via email to