GitHub user davies opened a pull request:

    https://github.com/apache/spark/pull/16844

    [SPARK-19500] [SQL] Fix off-by-one bug in BytesToBytesMap

    ## What changes were proposed in this pull request?
    
    Radix sort requires that half of the array be free (as temporary space), 
so we use 0.5 as the scale factor to make sure that BytesToBytesMap will not 
have more items than 1/2 of capacity. It turns out this does not hold: the 
current implementation of append() could leave 1 more item than the threshold 
(1/2 of capacity) in the array, which breaks the requirement of radix sort (it 
fails the assert in 2.2, or fails to insert into InMemorySorter in 2.1).
    
    This PR fixes the off-by-one bug in BytesToBytesMap.
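
To illustrate the kind of boundary error described above, here is a minimal toy sketch. It is not the actual BytesToBytesMap code: the class `ToyMap`, its fields, and the `inclusiveCheck` flag are hypothetical names introduced only for this example, and it assumes the bug comes down to a strict versus inclusive comparison against the threshold, which the description above does not confirm.

```java
// Toy model only: NOT the real BytesToBytesMap. All names here are hypothetical.
final class ToyMap {
    private final int growthThreshold;    // half of capacity, the limit radix sort needs
    private final boolean inclusiveCheck; // true = reject once the threshold is reached
    private int numKeys = 0;

    ToyMap(int capacity, boolean inclusiveCheck) {
        this.growthThreshold = (int) (capacity * 0.5);
        this.inclusiveCheck = inclusiveCheck;
    }

    /** Returns false when the map refuses the item (a real caller might spill instead). */
    boolean append() {
        boolean full = inclusiveCheck
            ? numKeys >= growthThreshold   // stops at exactly the threshold
            : numKeys > growthThreshold;   // lets one extra item slip in
        if (full) {
            return false;
        }
        numKeys++;
        return true;
    }

    int numKeys() {
        return numKeys;
    }

    /** Appends until rejected and returns how many items were accepted. */
    static int fill(int capacity, boolean inclusiveCheck) {
        ToyMap map = new ToyMap(capacity, inclusiveCheck);
        while (map.append()) {
            // keep appending until the map refuses
        }
        return map.numKeys();
    }

    public static void main(String[] args) {
        System.out.println(ToyMap.fill(64, false)); // 33: one more than capacity / 2
        System.out.println(ToyMap.fill(64, true));  // 32: exactly capacity / 2
    }
}
```

With a capacity of 64, the strict comparison accepts 33 items, leaving fewer than half of the slots free, while the inclusive comparison stops at 32.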
    
    ## How was this patch tested?
    
    Added a regression test.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/davies/spark off_by_one

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/16844.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #16844
    