GitHub user JoshRosen opened a pull request:

    https://github.com/apache/spark/pull/6159

    [SPARK-7251] Perform sequential scan when iterating over BytesToBytesMap

    This patch modifies `BytesToBytesMap.iterator()` to iterate through records 
in the order in which they appear in the data pages, rather than following the 
hash table's pointer array. This results in far fewer random memory accesses, 
significantly improving performance for scan-and-copy operations.
    
    This is possible because our data pages are laid out as sequences of 
`[keyLength][data][valueLength][data]` entries.  In order to mark the end of a 
partially-filled data page, we write `-1` as a special end-of-page length 
(BytesToBytesMap supports empty/zero-length keys and values, which is why we had 
to use a negative length).
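    The layout described above can be sketched roughly as follows. This is an 
illustrative toy, not Spark's actual `BytesToBytesMap` implementation: the 
class and method names (`PageScanSketch`, `writeRecord`, `scanKeys`) are made 
up, and a plain `ByteBuffer` stands in for Spark's off-heap data pages.

```java
import java.nio.ByteBuffer;
import java.util.ArrayList;
import java.util.List;

// Toy sketch of the page layout: each record is stored as
// [keyLength][keyBytes][valueLength][valueBytes], and a keyLength of -1
// marks the end of a partially-filled page. Because zero-length keys are
// legal, 0 cannot serve as the sentinel; hence the negative value.
public class PageScanSketch {
    static final int END_OF_PAGE = -1;

    // Append one [keyLength][key][valueLength][value] record to the page.
    static void writeRecord(ByteBuffer page, byte[] key, byte[] value) {
        page.putInt(key.length).put(key);
        page.putInt(value.length).put(value);
    }

    // Sequentially scan the page until the -1 sentinel, collecting keys.
    // No pointer array is consulted; records are read in storage order.
    static List<byte[]> scanKeys(ByteBuffer page) {
        List<byte[]> keys = new ArrayList<>();
        page.rewind();
        while (true) {
            int keyLen = page.getInt();
            if (keyLen == END_OF_PAGE) break;        // end-of-page marker
            byte[] key = new byte[keyLen];
            page.get(key);
            int valueLen = page.getInt();
            page.position(page.position() + valueLen); // skip value bytes
            keys.add(key);
        }
        return keys;
    }

    public static void main(String[] args) {
        ByteBuffer page = ByteBuffer.allocate(256);
        writeRecord(page, "a".getBytes(), "1".getBytes());
        writeRecord(page, new byte[0], "2".getBytes()); // zero-length key is valid
        page.putInt(END_OF_PAGE);                       // terminate the page
        List<byte[]> keys = scanKeys(page);
        System.out.println(keys.size());
        System.out.println(new String(keys.get(0)));
        System.out.println(keys.get(1).length);
    }
}
```

    Scanning this way touches memory strictly in ascending address order, 
which is what makes the sequential iterator cheaper than pointer-chasing 
through the hash table.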
    
    This patch incorporates / closes #5836.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/JoshRosen/spark SPARK-7251

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/6159.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #6159
    
----
commit 273b842cce2fc0afa06640723743869af0fe6f94
Author: Josh Rosen <[email protected]>
Date:   2015-05-14T21:55:49Z

    [SPARK-7251] Perform sequential scan when iterating over entries in 
BytesToBytesMap

----


