nikhilsheoran-db opened a new pull request, #46248:
URL: https://github.com/apache/spark/pull/46248

   ### What changes were proposed in this pull request?
   - Instead of calling `conf.resolver` for each call to `resolveExpression`, this PR obtains the `resolver` once and reuses it.
   
   ### Why are the changes needed?
   - Consider a view with a large number of columns (~1000s). Looking at the 
RuleExecutor metrics and the flamegraph for a query that only runs `DESCRIBE SELECT 
* FROM large_view`, a large fraction of the time is spent in 
`ResolveReferences` and `ResolveRelations`. Within these rules, most of the 
driver time goes to initializing `conf` in order to obtain `conf.resolver` for each 
column in the view.
   - Since the same `conf` is used in every one of these calls, the repeated 
`conf.resolver` lookups can be avoided by initializing the resolver once and 
reusing it, as sketched below.
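
   A minimal sketch of the idea, assuming a helper that resolves many column names against a set of candidates; the names and structure below are illustrative, not the actual analyzer code touched by this PR:

   ```scala
   import org.apache.spark.sql.internal.SQLConf

   object ResolverReuseSketch {
     // Before: conf.resolver is re-evaluated once per column being resolved.
     def resolveAllBefore(columns: Seq[String], candidates: Seq[String],
                          conf: SQLConf): Seq[Option[String]] =
       columns.map { col =>
         val resolver = conf.resolver          // repeated lookup, once per column
         candidates.find(c => resolver(c, col))
       }

     // After: obtain the resolver once and reuse it for every column.
     def resolveAllAfter(columns: Seq[String], candidates: Seq[String],
                         conf: SQLConf): Seq[Option[String]] = {
       val resolver = conf.resolver            // single lookup, hoisted out of the loop
       columns.map { col =>
         candidates.find(c => resolver(c, col))
       }
     }
   }
   ```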
   
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   
   ### How was this patch tested?
   - Created a dummy view with a large number of columns.
   - Compared the `RuleExecutor` metrics using `RuleExecutor.dumpTimeSpent()` before and 
after the change and saw a significant improvement in the time spent in the affected rules; see the sketch below.
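
   A rough sketch of how such a measurement could be reproduced in a `spark-shell`; the view name, column count, and base table are arbitrary assumptions:

   ```scala
   import org.apache.spark.sql.catalyst.rules.RuleExecutor

   // Build a temp view with many columns (1000 here, purely illustrative).
   val cols = (1 to 1000).map(i => s"id + $i AS c$i").mkString(", ")
   spark.range(10).createOrReplaceTempView("base")
   spark.sql(s"CREATE OR REPLACE TEMP VIEW large_view AS SELECT $cols FROM base")

   // Reset the accumulated rule timings, run the query, then dump
   // the per-rule time spent to compare before/after the change.
   RuleExecutor.resetMetrics()
   spark.sql("DESCRIBE SELECT * FROM large_view").collect()
   println(RuleExecutor.dumpTimeSpent())
   ```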
   
   ### Was this patch authored or co-authored using generative AI tooling?
   No

