GitHub user ooq opened a pull request:

    https://github.com/apache/spark/pull/13960

    [SQL] Support null handling for vectorized hashmap during hash aggregate

    ## What changes were proposed in this pull request?
    
    The current impl of vectorized hashmap does not support null keys. This 
patch fixes the problem by adding a `generateFindOrInsertWithNullable()` method to 
`VectorizedHashMapGenerator.scala`, which code-generates another version of 
`findOrInsert` that handles null keys. 
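
    As a rough illustration of what a null-aware `findOrInsert` has to do, 
here is a minimal standalone sketch in Java (the language of the generated code). 
It is not the code this patch generates: the class name `NullableKeyMapSketch`, 
the single `long` key, the `boolean isNull` parameter, and the linear-probing 
layout are all assumptions made for the example. The idea shown is that a null 
flag is stored next to each key and takes part in both hashing and key comparison.

```java
import java.util.Arrays;

// Hypothetical sketch only; NOT the code emitted by VectorizedHashMapGenerator.
// Assumes a single long key column and a single long "sum" aggregate buffer.
class NullableKeyMapSketch {
    private final int capacity;         // number of buckets, a power of two
    private final int maxRows;          // kept at half the buckets so probing always ends
    private final int[] buckets;        // bucket -> row index, -1 when empty
    private final long[] keys;          // key column (value is ignored for null keys)
    private final boolean[] keyIsNull;  // null flags for the key column
    private final long[] sums;          // aggregate buffer column
    private int numRows = 0;

    NullableKeyMapSketch(int capacity) {
        if (Integer.bitCount(capacity) != 1) {
            throw new IllegalArgumentException("capacity must be a power of two");
        }
        this.capacity = capacity;
        this.maxRows = capacity / 2;
        this.buckets = new int[capacity];
        Arrays.fill(buckets, -1);
        this.keys = new long[maxRows];
        this.keyIsNull = new boolean[maxRows];
        this.sums = new long[maxRows];
    }

    // All null keys hash to the same value; the key argument is ignored for them.
    private static int hash(long key, boolean isNull) {
        return isNull ? 0 : Long.hashCode(key);
    }

    // Two keys match if both are null, or neither is null and the values are equal.
    private boolean sameKey(int row, long key, boolean isNull) {
        return keyIsNull[row] == isNull && (isNull || keys[row] == key);
    }

    // Returns the row for the key, inserting it if absent.
    // Returns -1 when the map is full, so a caller could fall back to another map.
    int findOrInsert(long key, boolean isNull) {
        int idx = hash(key, isNull) & (capacity - 1);
        for (int step = 0; step < capacity; step++) {
            int row = buckets[idx];
            if (row == -1) {
                if (numRows == maxRows) {
                    return -1;
                }
                keyIsNull[numRows] = isNull;
                keys[numRows] = isNull ? 0L : key;
                buckets[idx] = numRows;
                return numRows++;
            }
            if (sameKey(row, key, isNull)) {
                return row;
            }
            idx = (idx + 1) & (capacity - 1);  // linear probing
        }
        return -1;
    }

    // Adds value to the running sum for the key; returns false if the map is full.
    boolean add(long key, boolean isNull, long value) {
        int row = findOrInsert(key, isNull);
        if (row < 0) {
            return false;
        }
        sums[row] += value;
        return true;
    }

    long sumAt(int row) {
        return sums[row];
    }
}
```

    In this sketch a null key and the non-null key `0` land in the same 
bucket but still get separate rows, because the null flag is part of the 
comparison. A caller would treat a return value of `-1` as the signal to use 
a fallback path.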
    
    We need null support so that the aggregate logic does not have to fall back to 
BytesToBytesMap. This would also allow us to remove BytesToBytesMap completely.
    
    ## How was this patch tested?
    
    No additional tests were added. A simple benchmark test is included to show 
the performance gain.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/ooq/spark spt_nullable_in_vhm

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/13960.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #13960
    
----
commit a1b099c94db6b90983a51d6eaf2b465336b998ea
Author: Qifan Pu <[email protected]>
Date:   2016-06-29T00:54:09Z

    support null handling in vectorized hash map

commit 4355039cafa96de29c7b0add4a426691d5b4428c
Author: Qifan Pu <[email protected]>
Date:   2016-06-29T01:18:22Z

    add a benchmark test

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---

