Github user lemire commented on a diff in the pull request:
https://github.com/apache/spark/pull/9661#discussion_r44665152
--- Diff: core/src/main/scala/org/apache/spark/scheduler/MapStatus.scala ---
@@ -176,15 +179,17 @@ private[spark] object HighlyCompressedMapStatus {
// From a compression standpoint, it shouldn't matter whether we track
empty or non-empty
--- End diff --
This comment about compression is somewhat misleading. Even a plain BitSet
can use a different amount of memory once you flip the bits. Some compression
schemes (e.g., EWAH as used in Hive, see https://github.com/lemire/javaewah)
are mostly symmetric with respect to bit flips, but the RoaringBitmap library
is not (though Lucene's Roaring implementation appears to be symmetric). I'd
recommend either removing or qualifying this comment. (Strictly speaking, the
comment is not wrong, because it says "shouldn't".)
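
To illustrate the BitSet point, here is a minimal sketch using only
java.util.BitSet. It does not measure RoaringBitmap or EWAH, and the class
name, method names and block count are made up for the example. It builds a
bitmap with a single bit set and its flipped counterpart over the same range,
then compares how much space each one reports.

```java
import java.util.BitSet;

// Hypothetical demo class; not part of Spark or of any bitmap library.
class BitFlipFootprint {
    // Only block 0 is tracked, so only bit 0 is set.
    static BitSet onlyFirst() {
        BitSet bits = new BitSet();
        bits.set(0);
        return bits;
    }

    // The flipped view: every block except block 0 is tracked.
    static BitSet allButFirst(int numBlocks) {
        BitSet bits = new BitSet();
        bits.set(1, numBlocks); // sets bits 1 .. numBlocks - 1
        return bits;
    }

    public static void main(String[] args) {
        int numBlocks = 1_000_000; // assumed block count, for illustration only
        BitSet sparse = onlyFirst();
        BitSet flipped = allButFirst(numBlocks);

        // size() is the number of bits of backing storage currently in use;
        // toByteArray() only covers bytes up to the highest set bit.
        System.out.println("one bit set : size=" + sparse.size()
                + " bits, serialized=" + sparse.toByteArray().length + " bytes");
        System.out.println("flipped     : size=" + flipped.size()
                + " bits, serialized=" + flipped.toByteArray().length + " bytes");
    }
}
```

The two bitmaps carry the same information about the range, yet the flipped
one needs backing storage up to the highest set bit, so its footprint grows
with the number of blocks while the other stays at a single word.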