nikhilsheoran-db opened a new pull request, #46248: URL: https://github.com/apache/spark/pull/46248
### What changes were proposed in this pull request?
- Instead of calling `conf.resolver` on every call to `resolveExpression`, this PR obtains the `resolver` once and reuses it.

### Why are the changes needed?
- Consider a view with a large number of columns (~1000s). Looking at the `RuleExecutor` metrics and a flamegraph for a query that only does `DESCRIBE SELECT * FROM large_view`, a large fraction of the time is spent in `ResolveReferences` and `ResolveRelations`. Of this, the majority of the driver time went into initializing the `conf` to obtain `conf.resolver` for each column in the view.
- Since the same `conf` is used in each of these calls, repeatedly calling `conf.resolver` can be avoided by initializing the resolver once and reusing it.

### Does this PR introduce _any_ user-facing change?
No

### How was this patch tested?
- Created a dummy view with a large number of columns.
- Observed the `RuleExecutor` metrics using `RuleExecutor.dumpTimeSpent()` and saw a significant improvement.

### Was this patch authored or co-authored using generative AI tooling?
No
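The optimization described above can be sketched as follows. This is a minimal, self-contained illustration, not the actual Spark source: `Resolver`, the `Conf` class, and the call counter are simplified stand-ins used to show why hoisting the `conf.resolver` lookup out of the per-column loop matters when a view has thousands of columns.

```scala
object ResolverReuseSketch {
  // Stand-in for Spark's Resolver: decides whether two names refer to the same column.
  type Resolver = (String, String) => Boolean

  // Stand-in for SQLConf. Building the resolver involves reading the conf,
  // which is relatively costly; the counter makes the call overhead visible.
  class Conf(caseSensitive: Boolean) {
    var resolverCalls = 0
    def resolver: Resolver = {
      resolverCalls += 1
      if (caseSensitive) (a, b) => a == b
      else (a, b) => a.equalsIgnoreCase(b)
    }
  }

  // Before the change: conf.resolver is re-obtained for every column.
  def resolveBefore(conf: Conf, columns: Seq[String], target: String): Seq[String] =
    columns.filter(c => conf.resolver(c, target))

  // After the change: the resolver is obtained once and reused for every column.
  def resolveAfter(conf: Conf, columns: Seq[String], target: String): Seq[String] = {
    val resolver = conf.resolver
    columns.filter(c => resolver(c, target))
  }

  def main(args: Array[String]): Unit = {
    val cols = (1 to 1000).map(i => s"col$i")
    val before = new Conf(caseSensitive = false)
    resolveBefore(before, cols, "COL500")
    val after = new Conf(caseSensitive = false)
    resolveAfter(after, cols, "COL500")
    // before: 1000 resolver lookups; after: 1 lookup for the same result
    println(s"before: ${before.resolverCalls}, after: ${after.resolverCalls}")
  }
}
```

With 1000 columns the first version performs 1000 resolver lookups while the second performs one, which mirrors the per-column cost the PR removes from `resolveExpression`.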