GitHub user cloud-fan opened a pull request:

    [SPARK-23376][SQL] creating UnsafeKVExternalSorter with BytesToBytesMap may 

    ## What changes were proposed in this pull request?
    This is a long-standing bug in `UnsafeKVExternalSorter` that has been 
reported on the dev list multiple times.
    When creating `UnsafeKVExternalSorter` with `BytesToBytesMap`, we need to 
create an `UnsafeInMemorySorter` to sort the data in `BytesToBytesMap`. The data 
format of the sorter and the map is the same, so no data movement is required. 
However, both the sorter and the map need a pointer array for some bookkeeping.
    There is an optimization in `UnsafeKVExternalSorter`: reuse the pointer array 
between the sorter and the map, to avoid an extra memory allocation. This 
sounds like a reasonable optimization: the length of the `BytesToBytesMap` 
pointer array is at least 4 times the number of keys (to avoid hash 
collisions, the hash table size should be at least 2 times the number of 
keys, and each key occupies 2 slots). `UnsafeInMemorySorter` needs the pointer 
array size to be 4 times the number of entries, so reusing the pointer array 
looks safe.
    However, the number of keys in the map does not equal the number of 
entries in the map, because `BytesToBytesMap` supports duplicate keys. This 
breaks the assumption behind the above optimization: we may run out of space 
when inserting data into the sorter and hit the error
    java.lang.IllegalStateException: There is no space for new record
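    The sizing argument above can be checked with a small model. This is a 
sketch of the arithmetic only, not Spark code: the class and method names are 
hypothetical, and the map's array is modelled at its minimum size of 4 slots 
per key.

```java
// Simplified model of the sizing arithmetic in this description; not Spark code.
// All names here are hypothetical and exist only for illustration.
final class PointerArraySizing {
    // Minimum slots in the map's pointer array: the hash table is at least
    // 2x the number of keys, and each key occupies 2 slots.
    static long mapArraySlots(long numKeys) {
        return numKeys * 2 * 2;
    }

    // Slots the sorter needs, per the description: 4x the number of entries.
    static long sorterSlotsNeeded(long numEntries) {
        return numEntries * 4;
    }

    // True when the map's array is large enough to be reused by the sorter.
    static boolean canReuse(long numKeys, long numEntries) {
        return sorterSlotsNeeded(numEntries) <= mapArraySlots(numKeys);
    }

    public static void main(String[] args) {
        // Unique keys only: entries == keys, so the array is large enough.
        System.out.println(canReuse(100, 100)); // prints "true"
        // Duplicate keys: 150 entries over 100 keys need 600 slots, only 400 exist.
        System.out.println(canReuse(100, 150)); // prints "false"
    }
}
```

    In this model, reuse is safe only while the number of entries does not 
exceed the number of keys, which is exactly what duplicate keys violate.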
    This PR fixes the bug by creating a new pointer array if the existing one is 
not big enough.
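    The idea of the fix can be sketched as follows. This is a minimal 
illustration, not the actual patch: it uses a plain `long[]` in place of 
Spark's memory-managed array, the helper name is hypothetical, and the factor 
of 4 is taken from the description above rather than from the code.

```java
// Illustrative sketch of "reuse if big enough, otherwise allocate"; not the patch.
final class PointerArrayChoice {
    // Hypothetical helper: returns the map's array when it can hold the
    // sorter's bookkeeping for numEntries records, otherwise a fresh array.
    static long[] arrayForSorter(long[] mapArray, int numEntries) {
        // Assumed requirement from the description: 4 slots per entry.
        long required = 4L * numEntries;
        if (mapArray.length >= required) {
            return mapArray; // safe to reuse, no extra allocation
        }
        // Existing array is too small (e.g. duplicate keys), so allocate a new one.
        return new long[Math.toIntExact(required)];
    }

    public static void main(String[] args) {
        long[] mapArray = new long[400]; // sized for 100 keys in the model above
        System.out.println(arrayForSorter(mapArray, 100) == mapArray); // prints "true"
        System.out.println(arrayForSorter(mapArray, 150).length);      // prints "600"
    }
}
```

    The common case without duplicate keys keeps the allocation-free path, and 
only the oversubscribed case pays for a new array.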
    ## How was this patch tested?
    a new test

You can merge this pull request into a Git repository by running:

    $ git pull bug

Alternatively you can review and apply these changes as the patch at:

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #20561


