richardstartin opened a new pull request #24310: use bitmap appender to optimise bitmap incrementally and avoid binary search
URL: https://github.com/apache/spark/pull/24310

## What changes were proposed in this pull request?

This PR modifies `HighlyCompressedMapStatus` to use a new feature in RoaringBitmap that buffers insertions into 16-bit containers and appends each container to the underlying bitmap as late as possible. This has two effects. First, the best container type is chosen incrementally, so there is no need to call `runOptimize`. Second, there are never binary searches over the high 16 bits to locate the container a bit should be added to, which improves insertion performance. The performance gain is proportional to the number of empty blocks, but the call to `runOptimize` is avoided in every case.

## How was this patch tested?

This change was verified not to break existing unit tests. New tests were added to demonstrate that the new mechanism always builds a bitmap as compressed as one on which `runOptimize` has been called. They also justify, in terms of bitmap size, the existing decision to represent empty blocks rather than full blocks.
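To make the mechanism concrete, here is a minimal Python sketch of the idea. It is not RoaringBitmap's implementation or API. It assumes values are added in strictly increasing order, and it uses a simplified size model that ignores headers: 2 bytes per value for an array container, a fixed 8 KiB for a bitmap container, and 4 bytes per run for a run container. `Appender`, `build_then_optimize` and `best_container` are hypothetical names. The baseline locates each value's container with a binary search over the high 16 bits and picks container types in a final `runOptimize`-style pass. The appender buffers one container at a time and picks its type only when the high 16 bits change. The final comparison mirrors, under this model only, the property the new tests check: both approaches end up with the same containers.

```python
import bisect

# Simplified size model (an assumption; container headers are ignored).
ARRAY_BYTES_PER_VALUE = 2   # array container: sorted 16-bit values
BITMAP_BYTES = 8192         # bitmap container: 2^16 bits, fixed size
RUN_BYTES_PER_RUN = 4       # run container: one (start, length) pair per run


def count_runs(lows):
    """Count maximal runs of consecutive integers in a sorted list."""
    runs, prev = 0, None
    for v in lows:
        if prev is None or v != prev + 1:
            runs += 1
        prev = v
    return runs


def best_container(lows):
    """Pick the smallest container type; ties prefer array, then bitmap, then run."""
    sizes = {
        "array": ARRAY_BYTES_PER_VALUE * len(lows),
        "bitmap": BITMAP_BYTES,
        "run": RUN_BYTES_PER_RUN * count_runs(lows),
    }
    return min(sizes, key=sizes.get)


class Appender:
    """Hypothetical appender: buffers one container, commits it when the key changes."""

    def __init__(self):
        self.containers = {}  # high 16 bits -> (type, lows), in key order
        self.key = None       # high 16 bits of the container being buffered
        self.buffer = []      # low 16 bits buffered for self.key
        self.last = None      # last value added, to enforce ascending input

    def add(self, x):
        if self.last is not None and x <= self.last:
            raise ValueError("values must be added in strictly increasing order")
        self.last = x
        key, low = x >> 16, x & 0xFFFF
        if key != self.key:
            self.flush()      # keys arrive in order, so committing is an append
            self.key = key
        self.buffer.append(low)

    def flush(self):
        if self.buffer:
            # The container type is chosen once, when the container is complete.
            self.containers[self.key] = (best_container(self.buffer), self.buffer)
            self.buffer = []

    def get(self):
        self.flush()
        return self.containers


def build_then_optimize(values):
    """Baseline: binary-search for each value's container, optimise at the end."""
    keys, lows = [], []                        # parallel lists, keys kept sorted
    for x in values:
        key, low = x >> 16, x & 0xFFFF
        i = bisect.bisect_left(keys, key)      # binary search over the high 16 bits
        if i == len(keys) or keys[i] != key:
            keys.insert(i, key)
            lows.insert(i, [])
        j = bisect.bisect_left(lows[i], low)
        if j == len(lows[i]) or lows[i][j] != low:  # bitmaps are sets
            lows[i].insert(j, low)
    # runOptimize-style pass: choose every container type after the fact.
    return {k: (best_container(v), v) for k, v in zip(keys, lows)}


values = (
    list(range(0, 100))                        # key 0: one short run
    + list(range(65536, 65536 + 5000))         # key 1: one long run
    + [131072 + 2 * i for i in range(3000)]    # key 2: sparse, no runs
    + [196608 + 3 * i for i in range(10000)]   # key 3: many values, no runs
)

appender = Appender()
for x in values:
    appender.add(x)
fast = appender.get()
slow = build_then_optimize(values)
print({k: t for k, (t, _) in fast.items()})
```

Because the appender commits each container exactly once, in key order, it never revisits earlier containers. The trade-off is that inserts must arrive sorted, which is presumably the case when the bitmap is filled by scanning block IDs in order.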
----------------------------------------------------------------
This is an automated message from the Apache Git Service.
