GitHub user davies opened a pull request:
https://github.com/apache/spark/pull/11870
[SPARK-14052] [SQL] build a BytesToBytesMap directly in HashedRelation
## What changes were proposed in this pull request?
Currently, for keys that cannot fit within a long, we build a hash map
for UnsafeHashedRelation, which is converted to a BytesToBytesMap only after
serialization and deserialization. We should build a BytesToBytesMap directly
for better memory efficiency.
To do that, BytesToBytesMap must support multiple (K, V) pairs with the
same K. A new API called `append()` is added to BytesToBytesMap; it only
looks for an empty slot and puts the key there. A `next()` method is also added
to BytesToBytesMap.Location to check whether any pair following the current one
has the same key.
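The idea can be illustrated with a toy open-addressing map. This is only a minimal sketch under simplifying assumptions (int keys and values, linear probing, fixed capacity, no deletion), not the code in this patch; the class `MultiValueProbeMap` and the helpers `lookup()` and `getAll()` are hypothetical names chosen for illustration. Only `append()` and `Location.next()` mirror the names used above.

```java
import java.util.ArrayList;
import java.util.List;

// Toy illustration only: NOT Spark's BytesToBytesMap.
final class MultiValueProbeMap {
    private final int[] keys;
    private final int[] values;
    private final boolean[] used;
    private final int mask;
    private int size;

    MultiValueProbeMap(int capacity) {
        if (capacity < 2 || Integer.bitCount(capacity) != 1) {
            throw new IllegalArgumentException("capacity must be a power of two >= 2");
        }
        keys = new int[capacity];
        values = new int[capacity];
        used = new boolean[capacity];
        mask = capacity - 1;
    }

    private int slotFor(int key) {
        return (key ^ (key >>> 16)) & mask;
    }

    // Never compares keys: probes for the first empty slot and stores the pair there,
    // so the same key may occupy several slots along its probe sequence.
    void append(int key, int value) {
        // Keep one slot empty so that probing in lookups always terminates.
        if (size >= keys.length - 1) {
            throw new IllegalStateException("map is full");
        }
        int pos = slotFor(key);
        while (used[pos]) {
            pos = (pos + 1) & mask;
        }
        used[pos] = true;
        keys[pos] = key;
        values[pos] = value;
        size++;
    }

    // Cursor over the pairs that share one key.
    final class Location {
        private final int key;
        private int pos;
        private boolean defined;

        private Location(int key) {
            this.key = key;
            this.pos = slotFor(key);
            this.defined = seek();
        }

        // Advances pos to the next used slot holding this key; stops at an empty slot.
        private boolean seek() {
            while (used[pos]) {
                if (keys[pos] == key) {
                    return true;
                }
                pos = (pos + 1) & mask;
            }
            return false;
        }

        boolean isDefined() {
            return defined;
        }

        int value() {
            if (!defined) {
                throw new IllegalStateException("location is not defined");
            }
            return values[pos];
        }

        // Moves to the following pair with the same key, if there is one.
        boolean next() {
            if (!defined) {
                return false;
            }
            pos = (pos + 1) & mask;
            defined = seek();
            return defined;
        }
    }

    Location lookup(int key) {
        return new Location(key);
    }

    // Convenience helper: collects every value appended under the given key.
    List<Integer> getAll(int key) {
        List<Integer> out = new ArrayList<>();
        Location loc = lookup(key);
        if (loc.isDefined()) {
            do {
                out.add(loc.value());
            } while (loc.next());
        }
        return out;
    }
}
```

In this sketch, appending (1, 10), (9, 90) and (1, 11) to a map of capacity 8 places all three pairs in the same probe sequence, since keys 1 and 9 hash to the same slot. `lookup(1)` followed by `next()` skips the pair for key 9 and visits only the two values stored under key 1.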
## How was this patch tested?
Existing tests.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/davies/spark map2
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/11870.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #11870
----
commit d18e5edda8b403ea3655a84e0bb7d8f00c8ebf9b
Author: Davies Liu <[email protected]>
Date: 2016-03-21T23:14:37Z
build a BytesToBytesMap directly in HashedRelation
----