aokolnychyi commented on a change in pull request #30558:
URL: https://github.com/apache/spark/pull/30558#discussion_r543257534
##########
File path:
sql/catalyst/src/main/scala/org/apache/spark/sql/catalyst/optimizer/Optimizer.scala
##########
@@ -185,6 +185,9 @@ abstract class Optimizer(catalogManager: CatalogManager)
RemoveLiteralFromGroupExpressions,
RemoveRepetitionFromGroupExpressions) :: Nil ++
operatorOptimizationBatch) :+
+ // This batch rewrites data source plans and should be run after the operator
+ // optimization batch and before any batches that depend on stats.
+ Batch("Data Source Rewrite Rules", Once, dataSourceRewriteRules: _*) :+
Review comment:
I'd fix the name and cherry-pick it to 3.1 rather than revert the change
completely. Moreover, I'd suggest including the feature in full, not just the
new batch. PR https://github.com/apache/spark/pull/30577 was submitted before
3.1 was cut, as I wanted to keep the scope of PRs small, and it was merged a
couple of days after the 3.1 branch was created. We have many places to inject
custom rules, but there is no way to inject rules after operator optimization.
I feel it is going to be really useful for many customers. I've seen people
fork session builders just because of the lack of this functionality.
Something like `postOperatorOptimizationRules` or
`preCostBasedOptimizationRules` or `preCBORules` sounds OK to me.
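
To illustrate the ordering I have in mind, here is a rough, self-contained sketch. It is a toy model and does not use Spark's classes: `Plan`, `Rule`, `Batch`, the batch names, and the `preCBORules` parameter are all hypothetical stand-ins chosen for illustration. The only point is that user-supplied rules would run after the operator optimization batch and before anything that depends on stats.

```scala
// Toy model only: hypothetical stand-ins, not Spark's Catalyst classes.
object BatchOrderSketch {
  // A "plan" here is just a log of the rule names applied so far.
  type Plan = List[String]
  type Rule = Plan => Plan

  final case class Batch(name: String, rules: Seq[Rule])

  // Builds a rule that records its own name in the plan.
  def rule(name: String): Rule = plan => plan :+ name

  // Hypothetical extension point: caller-supplied rules are placed between
  // the operator optimization batch and the stats-dependent batch.
  def batches(preCBORules: Seq[Rule]): Seq[Batch] = Seq(
    Batch("Operator Optimization", Seq(rule("operatorRule"))),
    Batch("Pre-CBO Rules", preCBORules),
    Batch("Stats-Dependent Rules", Seq(rule("costBasedRule")))
  )

  // Runs every batch once, in order, applying each rule in sequence.
  def optimize(plan: Plan, bs: Seq[Batch]): Plan =
    bs.foldLeft(plan)((p, b) => b.rules.foldLeft(p)((q, r) => r(q)))

  def main(args: Array[String]): Unit = {
    val result = optimize(Nil, batches(Seq(rule("myCustomRewrite"))))
    // prints "operatorRule -> myCustomRewrite -> costBasedRule"
    println(result.mkString(" -> "))
  }
}
```

With an empty `preCBORules` the middle batch is a no-op, so existing behaviour would be unchanged for users who do not inject anything.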