yadavay-amzn commented on PR #55928: URL: https://github.com/apache/spark/pull/55928#issuecomment-4480005825
@steveloughran Good analysis. The sort-then-binary-search approach pays off when `k` lookups are expected on the same object, since the one-time sort cost is amortized across them. However, the Variant spec defines the sort bit specifically to avoid paying the sort cost on read: producers that sort at write time set the bit, and readers can then binary search without re-sorting. For unsorted objects (sort bit = 0), linear scan is the safe fallback per spec. A future optimization could sort on first access and cache the result, but that changes the object's memory model (currently zero-copy over the binary buffer). Keeping it simple for now. Will address the bitmask nit.
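
For illustration only, here is a minimal Python sketch of the read-side dispatch described above. It is not the code in this PR: `find_field`, `keys`, and `is_sorted` are hypothetical names, and the keys are modelled as an already-decoded list of strings rather than offsets into the binary buffer.

```python
from bisect import bisect_left
from typing import Optional, Sequence


def find_field(keys: Sequence[str], is_sorted: bool, target: str) -> Optional[int]:
    """Return the index of `target` in `keys`, or None if it is absent.

    `is_sorted` stands in for the sort bit (hypothetical modelling):
    True means the producer sorted the keys at write time.
    """
    if is_sorted:
        # Sort bit set: binary search in O(log n), with no re-sort on read.
        i = bisect_left(keys, target)
        return i if i < len(keys) and keys[i] == target else None
    # Sort bit unset: linear scan in O(n). Nothing is copied or reordered,
    # so the underlying data can stay as it was written.
    for i, key in enumerate(keys):
        if key == target:
            return i
    return None
```

The sketch assumes that the ordering used by the producer matches Python's default string comparison. A sort-on-first-access variant would need to build and hold a separate sorted index, which is the memory-model change mentioned above.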
