[PR] [SPARK-54383][SQL] Create BoundInternalRowComparableWrapper util to avoid schema cache lookups [spark]

via GitHub Mon, 17 Nov 2025 09:02:47 -0800


chirag-s-db opened a new pull request, #53097:
URL: https://github.com/apache/spark/pull/53097


   ### What changes were proposed in this pull request?
   For KeyGroupedPartitioned scans, `InternalRowComparableWrapper` does a 
cache lookup keyed on the current data types to obtain a schema and ordering 
for those types. When there are a large number of partitions, this cache 
lookup (which occurs in many places, once per partition) can become a 
bottleneck in physical planning. This PR creates a new 
`BoundInternalRowComparableWrapper` util that requires a precomputed schema 
and ordering, so the schema and ordering can be computed once, before 
iterating over the partitions, instead of being created or looked up in the 
hot path.
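   
   The difference between the two styles can be illustrated with a small, 
self-contained model. This is only a sketch under stated assumptions: it does 
not use Spark's classes, and every name in it (`BoundWrapperSketch`, 
`UnboundWrapper`, `BoundWrapper`, `lookupOrdering`, the string type tags) is 
hypothetical and chosen for illustration. It counts how often a shared 
ordering cache is consulted when wrappers resolve their own ordering versus 
when the caller resolves it once and passes it in.

```scala
import scala.collection.mutable

// Simplified model of the idea; NOT Spark's API. All names are hypothetical.
object BoundWrapperSketch {
  type Row = Seq[Any]

  // Counts how often the shared cache is consulted.
  var cacheLookups: Int = 0

  private val orderingCache =
    mutable.HashMap.empty[Seq[String], Ordering[Row]]

  // Builds a row ordering for the given column types ("int" or "string" only).
  private def buildOrdering(types: Seq[String]): Ordering[Row] =
    new Ordering[Row] {
      def compare(a: Row, b: Row): Int =
        types.indices.iterator.map { i =>
          types(i) match {
            case "int" =>
              Integer.compare(a(i).asInstanceOf[Int], b(i).asInstanceOf[Int])
            case "string" =>
              a(i).asInstanceOf[String].compareTo(b(i).asInstanceOf[String])
            case other =>
              throw new IllegalArgumentException(s"unsupported type: $other")
          }
        }.find(_ != 0).getOrElse(0)
    }

  // Cache lookup keyed on the column types.
  def lookupOrdering(types: Seq[String]): Ordering[Row] = {
    cacheLookups += 1
    orderingCache.getOrElseUpdate(types, buildOrdering(types))
  }

  // Unbound style: every wrapper resolves its ordering through the cache.
  final class UnboundWrapper(val row: Row, types: Seq[String]) {
    private val ordering = lookupOrdering(types)
    override def equals(o: Any): Boolean = o match {
      case w: UnboundWrapper => ordering.equiv(row, w.row)
      case _                 => false
    }
    override def hashCode: Int = row.hashCode
  }

  // Bound style: the caller supplies an ordering it resolved once up front.
  final class BoundWrapper(val row: Row, ordering: Ordering[Row]) {
    override def equals(o: Any): Boolean = o match {
      case w: BoundWrapper => ordering.equiv(row, w.row)
      case _               => false
    }
    override def hashCode: Int = row.hashCode
  }

  def main(args: Array[String]): Unit = {
    val types = Seq("int", "string")
    val partitionKeys: Seq[Row] =
      (0 until 1000).map(i => Seq[Any](i % 10, "p"))

    cacheLookups = 0
    val unbound =
      partitionKeys.map(row => new UnboundWrapper(row, types)).distinct
    val unboundLookups = cacheLookups

    cacheLookups = 0
    val ordering = lookupOrdering(types) // resolved once, outside the loop
    val bound =
      partitionKeys.map(row => new BoundWrapper(row, ordering)).distinct
    val boundLookups = cacheLookups

    println(s"unbound: ${unbound.size} distinct keys, $unboundLookups lookups")
    println(s"bound:   ${bound.size} distinct keys, $boundLookups lookups")
  }
}
```

   In this model both styles produce the same set of distinct keys; only the 
number of cache lookups differs, growing with the number of partition keys in 
the unbound style and staying constant in the bound style.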
   
   ### Why are the changes needed?
   Removes a physical planning bottleneck.
   
   ### Does this PR introduce _any_ user-facing change?
   No.
   
   
   ### How was this patch tested?
   This change is not expected to alter behavior, so existing tests should suffice.
   
   ### Was this patch authored or co-authored using generative AI tooling?
   No.
   


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

