Re: [PR] [SPARK-48010][SQL] Avoid repeated calls to conf.resolver in resolveExpression [spark]

2024-04-26 Thread via GitHub


dongjoon-hyun closed pull request #46248: [SPARK-48010][SQL] Avoid repeated 
calls to conf.resolver in resolveExpression
URL: https://github.com/apache/spark/pull/46248


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


-
To unsubscribe, e-mail: reviews-unsubscr...@spark.apache.org
For additional commands, e-mail: reviews-h...@spark.apache.org



Re: [PR] [SPARK-48010][SQL] Avoid repeated calls to conf.resolver in resolveExpression [spark]

2024-04-26 Thread via GitHub


dongjoon-hyun commented on PR #46248:
URL: https://github.com/apache/spark/pull/46248#issuecomment-2079911328

   Merged to master for Apache Spark 4.0.0.





[PR] [SPARK-48010][SQL] Avoid repeated calls to conf.resolver in resolveExpression [spark]

2024-04-26 Thread via GitHub


nikhilsheoran-db opened a new pull request, #46248:
URL: https://github.com/apache/spark/pull/46248

   ### What changes were proposed in this pull request?
   - Instead of calling `conf.resolver` on every invocation of 
`resolveExpression`, this PR obtains the `resolver` once and reuses it.
   
   ### Why are the changes needed?
   - Consider a view with a large number of columns (on the order of 
thousands). The RuleExecutor metrics and flamegraph for a query that only runs 
`DESCRIBE SELECT * FROM large_view` showed that a large fraction of the time is 
spent in `ResolveReferences` and `ResolveRelations`. Within those rules, most of 
the driver time went to initializing `conf` in order to obtain `conf.resolver`, 
once for each column in the view.
   - Since the same `conf` is used in each of these calls, the repeated 
`conf.resolver` calls can be avoided by obtaining the resolver once and reusing 
it.
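   A minimal, Spark-free sketch of the hoisting pattern described above. 
Spark's own `Resolver` is a `(String, String) => Boolean` alias; everything 
else here (`FakeConf`, `resolveOne`, `resolveAllRepeated`, `resolveAllHoisted`) 
is a hypothetical stand-in, not the analyzer's actual code. It assumes that 
obtaining the resolver carries a per-call cost and simply counts how often it 
is looked up.

```scala
object ResolverHoistingSketch {
  // Same shape as Spark's Resolver type alias.
  type Resolver = (String, String) => Boolean

  // Hypothetical config: `resolver` stands in for an accessor that is
  // assumed to be costly, so each lookup is counted.
  final class FakeConf(caseSensitive: Boolean) {
    var resolverCalls: Int = 0
    def resolver: Resolver = {
      resolverCalls += 1
      if (caseSensitive) (a: String, b: String) => a == b
      else (a: String, b: String) => a.equalsIgnoreCase(b)
    }
  }

  // Resolves one name against the available attribute names.
  def resolveOne(resolver: Resolver, name: String, attrs: Seq[String]): Option[String] =
    attrs.find(a => resolver(a, name))

  // Before: looks up conf.resolver once per name being resolved.
  def resolveAllRepeated(conf: FakeConf, names: Seq[String], attrs: Seq[String]): Seq[Option[String]] =
    names.map(n => resolveOne(conf.resolver, n, attrs))

  // After: looks up the resolver once and reuses it for every name.
  def resolveAllHoisted(conf: FakeConf, names: Seq[String], attrs: Seq[String]): Seq[Option[String]] = {
    val resolver = conf.resolver
    names.map(n => resolveOne(resolver, n, attrs))
  }

  def main(args: Array[String]): Unit = {
    val attrs = (1 to 1000).map(i => s"col$i")
    val names = attrs.map(_.toUpperCase)
    val before = new FakeConf(caseSensitive = false)
    val after = new FakeConf(caseSensitive = false)
    val r1 = resolveAllRepeated(before, names, attrs)
    val r2 = resolveAllHoisted(after, names, attrs)
    assert(r1 == r2) // identical resolution results either way
    println(s"repeated: ${before.resolverCalls} resolver lookups") // 1000
    println(s"hoisted:  ${after.resolverCalls} resolver lookup")   // 1
  }
}
```

   The results are identical; only the number of resolver lookups changes, 
from one per column to one per batch.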
   
   
   ### Does this PR introduce _any_ user-facing change?
   No
   
   
   ### How was this patch tested?
   - Created a dummy view with a large number of columns.
   - Observed the `RuleExecutor` metrics using `RuleExecutor.dumpTimeSpent()` 
and saw a significant improvement.
   
   ### Was this patch authored or co-authored using generative AI tooling?
   No

