GitHub user davies opened a pull request:

    https://github.com/apache/spark/pull/11870

    [SPARK-14052] [SQL] build a BytesToBytesMap directly in HashedRelation

    ## What changes were proposed in this pull request?
    
    Currently, for keys that cannot fit within a long, we build a hash map
    for UnsafeHashedRelation, which is converted to a BytesToBytesMap after
    serialization and deserialization. We should build the BytesToBytesMap
    directly for better memory efficiency.
    
    To do that, BytesToBytesMap needs to support multiple (K, V) pairs
    with the same K. A new API called `append()` is added to BytesToBytesMap;
    it only looks for an empty slot and puts the key there. A `next()` method
    is also added to BytesToBytesMap.Location to check whether any pair
    following the current one has the same key.
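
For illustration only, the multi-value idea might be sketched with a toy open-addressing table over `long` keys and values. The class `MultiMapSketch`, its fixed-capacity linear probing, and all method signatures here are assumptions made for this example; it is not the actual BytesToBytesMap code, which stores variable-length byte keys in memory pages.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical toy model of an append-only multi-map; NOT Spark's BytesToBytesMap.
class MultiMapSketch {
    private final long[] keys;
    private final long[] values;
    private final boolean[] used;
    private final int mask;
    private int size;

    MultiMapSketch(int capacity) {
        if (capacity <= 0 || Integer.bitCount(capacity) != 1) {
            throw new IllegalArgumentException("capacity must be a power of two");
        }
        keys = new long[capacity];
        values = new long[capacity];
        used = new boolean[capacity];
        mask = capacity - 1;
    }

    private int startSlot(long key) {
        return Long.hashCode(key) & mask;
    }

    // append(): never compares keys, only probes for the first empty slot.
    void append(long key, long value) {
        if (size == keys.length) {
            throw new IllegalStateException("table is full");
        }
        int slot = startSlot(key);
        while (used[slot]) {
            slot = (slot + 1) & mask;
        }
        keys[slot] = key;
        values[slot] = value;
        used[slot] = true;
        size++;
    }

    // Cursor over the pairs that share one key, in probe order.
    final class Location {
        private final long key;
        private int slot;
        private int probed; // number of slots examined so far
        private boolean defined;

        Location(long key) {
            this.key = key;
            this.slot = (startSlot(key) - 1) & mask;
            this.defined = advance();
        }

        // Probe forward until a matching key or an empty slot is found.
        private boolean advance() {
            while (probed < keys.length) {
                slot = (slot + 1) & mask;
                probed++;
                if (!used[slot]) return false;
                if (keys[slot] == key) return true;
            }
            return false;
        }

        boolean isDefined() { return defined; }

        long value() {
            if (!defined) throw new IllegalStateException("no current pair");
            return values[slot];
        }

        // next(): move to the following pair with the same key, if any.
        boolean next() {
            defined = defined && advance();
            return defined;
        }
    }

    Location lookup(long key) {
        return new Location(key);
    }

    List<Long> getAll(long key) {
        List<Long> out = new ArrayList<>();
        Location loc = lookup(key);
        while (loc.isDefined()) {
            out.add(loc.value());
            loc.next();
        }
        return out;
    }

    public static void main(String[] args) {
        MultiMapSketch m = new MultiMapSketch(8);
        m.append(1L, 10L);
        m.append(9L, 90L); // collides with key 1 in this toy hash
        m.append(1L, 11L);
        System.out.println(m.getAll(1L)); // prints [10, 11]
    }
}
```

Because nothing is ever removed in this sketch, every pair for a key sits on the probe chain before the first empty slot, so `next()` can stop as soon as it reaches one.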
    
    ## How was this patch tested?
    
    Existing tests.
    
    


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/davies/spark map2

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/11870.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #11870
    
----
commit d18e5edda8b403ea3655a84e0bb7d8f00c8ebf9b
Author: Davies Liu <[email protected]>
Date:   2016-03-21T23:14:37Z

    build a BytesToBytesMap directly in HashedRelation

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---

---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]
