GitHub user JoshRosen commented on the pull request:

    https://github.com/apache/spark/pull/2866#issuecomment-60307770

Based on a [simple benchmark](https://gist.github.com/JoshRosen/5031568144f96475ed7b), it looks like Java BitSet performs slightly better for the types of access patterns that we use in Spark right now. These access patterns could be made more efficient, but that is a much more involved change that will need to be part of a larger refactoring of how we send map output statuses to reducers. Therefore, I'm going to replace my HashSet with a Java BitSet, wait for Jenkins, and then commit this so we can fix this blocker.
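To illustrate the kind of swap described in the comment above, here is a minimal sketch that tracks a set of integer indices first with a `HashSet<Integer>` and then with a `java.util.BitSet`. It is not the code from the pull request: the class name `EmptyBlockTracking`, both helper methods, and the idea that the set holds the indices of zero-sized blocks are all assumptions made for illustration, since the comment does not say what the set stores.

```java
import java.util.BitSet;
import java.util.HashSet;
import java.util.Set;

public class EmptyBlockTracking {

    // Hypothetical helper: collect the indices of zero-sized blocks in a HashSet.
    static Set<Integer> emptyBlocksAsHashSet(long[] blockSizes) {
        Set<Integer> empty = new HashSet<>();
        for (int i = 0; i < blockSizes.length; i++) {
            if (blockSizes[i] == 0L) {
                empty.add(i); // each index is boxed into an Integer
            }
        }
        return empty;
    }

    // Hypothetical helper: record the same indices as set bits in a BitSet.
    static BitSet emptyBlocksAsBitSet(long[] blockSizes) {
        BitSet empty = new BitSet(blockSizes.length);
        for (int i = 0; i < blockSizes.length; i++) {
            if (blockSizes[i] == 0L) {
                empty.set(i); // one bit per index, no boxing
            }
        }
        return empty;
    }

    public static void main(String[] args) {
        // Made-up block sizes, purely for illustration.
        long[] blockSizes = {0L, 1024L, 0L, 512L};

        Set<Integer> asSet = emptyBlocksAsHashSet(blockSizes);
        BitSet asBits = emptyBlocksAsBitSet(blockSizes);

        // Both structures should answer membership queries identically.
        for (int i = 0; i < blockSizes.length; i++) {
            if (asSet.contains(i) != asBits.get(i)) {
                throw new IllegalStateException("mismatch at index " + i);
            }
        }
        System.out.println("empty blocks: " + asBits);
    }
}
```

For dense, small integer ranges, a `BitSet` answers `get(i)` with a bit test on a `long[]`, while a `HashSet<Integer>` boxes each index and goes through hashing. This sketch only shows the two APIs side by side and does not reproduce the linked benchmark or its results.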