GitHub user JoshRosen commented on a diff in the pull request:

    https://github.com/apache/spark/pull/7924#discussion_r36327861
  
    --- Diff: core/src/main/java/org/apache/spark/unsafe/map/BytesToBytesMap.java ---
    @@ -256,6 +272,15 @@ public boolean hasNext() {
         @Override
         public Location next() {
           int totalLength = PlatformDependent.UNSAFE.getInt(pageBaseObject, offsetInPage);
    +      if (destructive) {
    +        MemoryLocation keyAddress = loc.getKeyAddress();
    +        Object keyBaseObject = keyAddress.getBaseObject();
    +        long keyBaseOffset = keyAddress.getBaseOffset();
    +
    +        int hashcode = HASHER.hashUnsafeWords(keyBaseObject, keyBaseOffset, loc.getKeyLength());
    +        int pos = hashcode & map.mask;
    +        this.map.bitset.unset(pos);
    +      }
    --- End diff --
    
    This block is what originally confused me. Re-hashing every record just to
clear its bit in the bitset is going to be a huge performance hit. Do we need
to clear the bitset at all? I suppose that skipping this leaves the map in a
slightly inconsistent state, but in my imagined use case for the destructive
iterator we would never call any methods on the map after iterating, so that
would be safe. Is there some consideration that I've overlooked here?

