Github user skambha commented on the issue:
https://github.com/apache/spark/pull/23206
Today in Spark, the extension points API's `injectOptimizerRule` method
appends the injected rules at the end of
`extendedOperatorOptimizationRules`, and this rule set is run as two batches
separated by the `InferFiltersFromConstraints` rule, i.e. "Operator
Optimization before Inferring Filters" and "Operator Optimization after
Inferring Filters". As you can see, even here there is a use case for
ordering, where we want the rules to kick in before the
`InferFiltersFromConstraints` rule.
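To make that layout concrete, here is a minimal Spark-free sketch (plain Scala, with string rule names standing in for `Rule[LogicalPlan]` and a simplified `Batch`) of how the operator optimization rule set, with extension rules appended at the end, runs both before and after the `InferFiltersFromConstraints` batch:

```scala
// Plain-Scala model of the batch layout described above; not Spark code.
object OperatorBatchSketch {
  final case class Batch(name: String, rules: Seq[String])

  // Extension rules injected via injectOptimizerRule are appended at the
  // end of the rule set, and the whole set runs in both surrounding batches.
  def operatorBatches(builtInRules: Seq[String],
                      extensionRules: Seq[String]): Seq[Batch] = {
    val ruleSet = builtInRules ++ extensionRules
    Seq(
      Batch("Operator Optimization before Inferring Filters", ruleSet),
      Batch("Infer Filters", Seq("InferFiltersFromConstraints")),
      Batch("Operator Optimization after Inferring Filters", ruleSet)
    )
  }
}
```

Note that an extension rule always lands at the tail of both fixed-point batches, so its position relative to the built-in rules is fixed.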
What this PR proposes is a method to inject a rule at a specific position
within a batch. For our use case, we have optimization rules that need to
kick in after a certain rule, much like the case above. At the position
where injected rules run by default today, the plan has already been altered
in a way that prevents our optimization rule from firing.
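As a sketch of the kind of positional injection being proposed (the helper name and shape below are illustrative only, not the actual API in this PR), inserting a rule immediately after a named anchor rule might look like:

```scala
// Illustrative helper: place newRule right after anchor in a rule list.
// A stand-alone sketch, not the signature proposed in the PR.
object PositionalInjectSketch {
  def injectAfter(rules: Seq[String],
                  anchor: String,
                  newRule: String): Seq[String] = {
    val idx = rules.indexOf(anchor)
    require(idx >= 0, s"anchor rule '$anchor' not found")
    val (before, after) = rules.splitAt(idx + 1)
    before ++ (newRule +: after)
  }
}
```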
The other proposed method adds a batch. This is similar to what exists
today in `postHocOptimizationBatches` and `preOptimizationBatches`, but
those are not exposed in the `SparkSessionExtensions` API. The proposed
`injectOptimizerBatch` method simply exposes this capability as part of the
extension points so we can make use of it.
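For reference, those existing hooks compose batches by simple concatenation; a plain-Scala model of where an extension-injected batch could slot in (again a sketch under my own assumptions, not Spark's actual Optimizer code, and the injected batches are appended at the end purely for illustration) is:

```scala
// Sketch of optimizer batch composition; Batch is a simplified stand-in.
object BatchOrderSketch {
  final case class Batch(name: String, rules: Seq[String])

  // preOptimizationBatches and postHocOptimizationBatches are overridable
  // hooks inside the optimizer; injectOptimizerBatch would expose a similar
  // extension-supplied list without subclassing the optimizer.
  def finalBatches(pre: Seq[Batch],
                   defaults: Seq[Batch],
                   postHoc: Seq[Batch],
                   injected: Seq[Batch]): Seq[Batch] =
    pre ++ defaults ++ postHoc ++ injected
}
```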
I agree this adds logic to the Optimizer to compute batches. However, the
new code is structured so that if these new inject methods are not used,
none of the new computation kicks in: there is a single check for whether
any new rules or batches need to be injected, and if not, the code path is
the same as before.
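That guard can be modeled like this (a sketch of the control flow, not the PR's code): a single emptiness check keeps the legacy path untouched when nothing was injected.

```scala
// Sketch of the fast-path guard; Batch and the map shape are illustrative.
object GuardSketch {
  final case class Batch(name: String, rules: Seq[String])

  // injectedRules maps a target batch name to extra rules for that batch.
  def batches(defaults: Seq[Batch],
              injectedRules: Map[String, Seq[String]],
              injectedBatches: Seq[Batch]): Seq[Batch] =
    if (injectedRules.isEmpty && injectedBatches.isEmpty) {
      defaults // fast path: exactly the pre-existing behavior
    } else {
      val patched = defaults.map { b =>
        b.copy(rules = b.rules ++ injectedRules.getOrElse(b.name, Nil))
      }
      patched ++ injectedBatches
    }
}
```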
Hope this helps some.
The `SparkSessionExtensions` API is experimental and intended for
third-party developers who want to add extensions to Spark without needing
to get their code merged into Spark. If there are other ways to achieve
this without changing Spark code, please share your thoughts. Thanks.