nikhilsheoran-db opened a new pull request, #46248:
URL: https://github.com/apache/spark/pull/46248
### What changes were proposed in this pull request?
- Instead of calling `conf.resolver` on every invocation inside
`resolveExpression`, this PR obtains the `resolver` once and reuses it.
### Why are the changes needed?
- Consider a view with a large number of columns (on the order of thousands).
Looking at the RuleExecutor metrics and the flamegraph for a query that only
does `DESCRIBE SELECT * FROM large_view`, a large fraction of the time is
spent in `ResolveReferences` and `ResolveRelations`. Of this, the majority of
the driver time goes into initializing the `conf` to obtain `conf.resolver`
for each of the columns in the view.
- Since the same `conf` is used in each of these calls, the repeated
`conf.resolver` calls can be avoided by initializing the resolver once and
reusing it.
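The pattern is simple loop-invariant hoisting. The sketch below is not the actual Spark code: `Conf` and its `resolver` member are hypothetical stand-ins that model how `SQLConf.resolver` is a `def` which rebuilds the comparison function on every call, so hoisting it out of the per-column work removes the repeated cost.

```scala
// Minimal sketch of the optimization, with a hypothetical Conf that models
// SQLConf: `resolver` is a def, so each call constructs a fresh function.
object ResolverHoisting {
  final case class Conf(caseSensitive: Boolean) {
    // Stand-in for SQLConf.resolver; allocating a new closure per call
    // models the per-column cost observed in the flamegraph.
    def resolver: (String, String) => Boolean =
      if (caseSensitive) (a, b) => a == b
      else (a, b) => a.equalsIgnoreCase(b)
  }

  // Before: columns.filter(c => conf.resolver(c, target)) called the
  // def once per column. After: the resolver is obtained once and reused.
  def resolveAll(columns: Seq[String], target: String, conf: Conf): Seq[String] = {
    val resolver = conf.resolver // hoisted out of the loop
    columns.filter(c => resolver(c, target))
  }

  def main(args: Array[String]): Unit = {
    val cols = Seq("ID", "name", "Value")
    println(resolveAll(cols, "id", Conf(caseSensitive = false)))
  }
}
```

The behavior is unchanged; only the number of `resolver` constructions drops from one per column to one per `resolveExpression` call.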
### Does this PR introduce _any_ user-facing change?
No
### How was this patch tested?
- Created a dummy view with a large number of columns.
- Observed the `RuleExecutor` metrics using `RuleExecutor.dumpTimeSpent()` and
saw a significant improvement.
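For reference, the measurement described above can be reproduced roughly as follows in `spark-shell`; this is a sketch that assumes a wide view named `large_view` has already been created in the session catalog.

```scala
// Sketch: compare analyzer rule timings before and after the change.
import org.apache.spark.sql.catalyst.rules.RuleExecutor

RuleExecutor.resetMetrics()                         // clear accumulated timings
spark.sql("DESCRIBE SELECT * FROM large_view").collect()
println(RuleExecutor.dumpTimeSpent())               // per-rule time breakdown
```

Running this against a build with and without the patch shows the time attributed to `ResolveReferences` and `ResolveRelations` dropping.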
### Was this patch authored or co-authored using generative AI tooling?
No
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org