yadavay-amzn commented on PR #55928:
URL: https://github.com/apache/spark/pull/55928#issuecomment-4480005825

   @steveloughran Good analysis. The sort-then-binary-search approach is better 
when `k` lookups are expected on the same object, because the one-time sort 
cost is amortized across them. However, the Variant spec defines the sort bit 
specifically to avoid paying the sort cost on read: producers that sort at 
write time set the bit, and readers can then binary search without re-sorting. 
For unsorted objects (sort bit = 0), linear scan is the safe fallback per spec.
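
   To make the read path concrete, here is a minimal sketch of the dispatch 
described above. It is not the code in this PR: `find_key`, `keys`, and 
`is_sorted` are hypothetical names, and a plain Python list stands in for the 
keys decoded from the binary buffer.

```python
from bisect import bisect_left
from typing import Optional, Sequence


def find_key(keys: Sequence[str], target: str, is_sorted: bool) -> Optional[int]:
    """Return the index of `target` in `keys`, or None if it is absent.

    `is_sorted` plays the role of the sort bit (an assumption for this sketch):
    the reader trusts it and never re-sorts on its own.
    """
    if is_sorted:
        # Sort bit set by the producer: O(log n) binary search.
        i = bisect_left(keys, target)
        return i if i < len(keys) and keys[i] == target else None
    # Sort bit clear: O(n) linear scan, which is correct for any key order.
    for i, key in enumerate(keys):
        if key == target:
            return i
    return None
```

   With `k` lookups on an unsorted object this costs O(k * n), versus 
O(n log n + k log n) for sort-then-search, which is the trade-off under 
discussion.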
   
   A future optimization could sort on first access and cache the result, but 
that would change the object's memory model (currently a zero-copy view over 
the binary buffer), so I'm keeping it simple for now.
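
   For reference, a rough sketch of what sort-on-first-access could look like, 
and why it is no longer purely zero-copy. This is illustrative only: 
`LazySortedIndex` and its fields are hypothetical names, and the `keys` 
sequence stands in for a view over the binary buffer.

```python
from typing import List, Optional, Sequence


class LazySortedIndex:
    """Builds a sorted permutation of key positions on the first lookup."""

    def __init__(self, keys: Sequence[str]) -> None:
        self._keys = keys  # stand-in for the zero-copy view; never modified
        self._order: Optional[List[int]] = None  # extra cached state per object

    def find(self, target: str) -> Optional[int]:
        if self._order is None:
            # One-time O(n log n) cost; the buffer itself stays untouched.
            self._order = sorted(range(len(self._keys)), key=self._keys.__getitem__)
        order = self._order
        lo, hi = 0, len(order)
        # Binary search over the cached permutation instead of the raw keys.
        while lo < hi:
            mid = (lo + hi) // 2
            if self._keys[order[mid]] < target:
                lo = mid + 1
            else:
                hi = mid
        if lo < len(order) and self._keys[order[lo]] == target:
            return order[lo]  # position in the original, unsorted layout
        return None
```

   The cached `_order` list is the memory-model change: each object now 
carries mutable heap state alongside the buffer, and that state would need 
care if the object is shared across threads.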
   
   Will address the bitmask nit.


-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: [email protected]

For queries about this service, please contact Infrastructure at:
[email protected]

