GitHub user rxin opened a pull request:

    https://github.com/apache/spark/pull/7845

    [SPARK-9517][SQL] BytesToBytesMap should encode data the same way as UnsafeExternalSorter

    BytesToBytesMap currently encodes key/value data in the following format:
    ```
    8B key length, key data, 8B value length, value data
    ```
    
    UnsafeExternalSorter, on the other hand, encodes data this way:
    ```
    4B record length, data
    ```
    
    As a result, we cannot pass records encoded by BytesToBytesMap directly
    into UnsafeExternalSorter for sorting. However, if we rearrange the data
    slightly, we can pass the key/value records directly into
    UnsafeExternalSorter:
    ```
    4B key+value length, 4B key length, key data, value data
    ```
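
To make the rearranged layout concrete, here is a minimal Python sketch. It is an illustration only, not Spark's implementation. It assumes little-endian 4-byte integers and assumes that the leading length field counts every byte that follows it (the 4B key-length field plus key data plus value data), so that a reader that only understands "4B record length, data" can step over each record. The helper names `encode_kv`, `iter_records`, and `split_kv` are hypothetical.

```python
import struct


def encode_kv(key: bytes, value: bytes) -> bytes:
    """Encode one pair as: 4B record length, 4B key length, key data, value data."""
    # Assumption: the record length covers the key-length field, the key, and the value.
    record_len = 4 + len(key) + len(value)
    return struct.pack("<ii", record_len, len(key)) + key + value


def iter_records(buf: bytes):
    """Sorter-style view of the buffer: 4B record length, then opaque data."""
    offset = 0
    while offset < len(buf):
        (record_len,) = struct.unpack_from("<i", buf, offset)
        offset += 4
        yield buf[offset:offset + record_len]
        offset += record_len


def split_kv(record: bytes):
    """Map-style view of one record's data: 4B key length, key data, value data."""
    (key_len,) = struct.unpack_from("<i", record, 0)
    return record[4:4 + key_len], record[4 + key_len:]


# Two pairs written back to back, as they might sit in one data page.
page = encode_kv(b"k1", b"value-1") + encode_kv(b"key2", b"v2")
pairs = [split_kv(record) for record in iter_records(page)]
```

In this sketch the same bytes serve both consumers: `iter_records` needs only the leading length to walk the buffer, while `split_kv` uses the inner key length to recover the key and the value.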


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/rxin/spark kvsort-rebase

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/7845.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #7845
    
----
commit 0744d2ccefe3d84931a4893bdc4a225f1017fa56
Author: Reynold Xin <[email protected]>
Date:   2015-08-01T00:41:25Z

    Added a KV sorter interface.

commit 2d4ad0585b576b9b1f31c0e75d08fb3e02f5fdc1
Author: Reynold Xin <[email protected]>
Date:   2015-08-01T02:15:13Z

    Updated BytesToBytesMap's data encoding to put the key first.

----


