GitHub user aarondav opened a pull request:

    https://github.com/apache/spark/pull/1502

    SPARK-2047: Introduce an in-mem Sorter, and use it to reduce mem usage

    ### Why and what?
    Currently, the AppendOnlyMap performs an "in-place" sort by converting its 
array of [key, value, key, value] pairs into a an array of [(key, value), (key, 
value)] pairs. However, this causes us to allocate many Tuple2 objects, which 
come at a nontrivial overhead.
    
    This patch adds a Sorter API, intended for in memory sorts, which simply 
ports the Java OpenJDK 6 implementation of Arrays.sort() (which uses a merge 
sort) and abstracts the interface in a way which introduces no more than 1 
virtual function invocation of overhead at each abstraction point.
    
    Please compare our port of the Java 6 sort with the original 
implementation: http://www.diffchecker.com/kh9ufcqo
    
    ### Memory implications
    An AppendOnlyMap contains N kv pairs, which results in roughly 2N elements 
within its underlying array. Each of these elements is 4 bytes wide in a 
[compressed 
OOP](https://wikis.oracle.com/display/HotSpotInternals/CompressedOops) system, 
which is the default.
    
    Today's approach immediately allocates N Tuple2 objects, which take up 24N 
bytes in total (exposed via YourKit), and undergoes a Java sort. The Java 6 
version immediately copies the entire array (4N bytes here), while the Java 7 
version has a worst-case allocation of half the array (2N bytes).
    This results in a sorting overhead of 24N + 4N = 28N bytes (for Java 6).
    
    The Sorter does not require allocating any tuples, but since it uses the 
Java 6 merge sort algorithm, it does copy the entire array (and that is the 
entire array, not just the half needed for Tuples).
    This results in a sorting overhead of 8N bytes.
    
    Thus, we have reduced the overhead of the sort by roughly 20 bytes times 
the number of elements.
    
    ### Performance implications
    As the destructiveSortedIterator is used for spilling in an 
ExternalAppendOnlyMap, the purpose of this patch is to provide stability by 
reducing memory usage rather than improve performance.
    
    Indeed, this PR implements Java 6's merge sort rather than the Java 7 
Timsort, which is much more performant. A future optimization is to port the 
Timsort over, which the SortDataFormat API should support with minimal changes.
    
    Nevertheless, here are the results of a microbenchmark that sorted 25 
million, randomly distributed (Float, Int) pairs. The Java Arrays.sort() tests 
were run **only on the keys**, and thus moved less data. Our current 
implementation is called "Tuple-sort using Arrays.sort()". 
    
    <table>
    <tr><th>Test</th><th>First run (JDK6)</th><th>Average of 10 
(JDK6)</th><th>First run (JDK7)</th><th>Average of 10 (JDK7)</th></tr>
    <tr><td>primitive Arrays.sort()</td><td>3216 ms</td><td>1190 
ms</td><td>2724 ms</td><td>131 ms (!!)</td></tr>
    <tr><td>Arrays.sort()</td><td>18564 ms</td><td>2006 ms</td><td>13201 
ms</td><td>878 ms</td></tr>
    <tr><td>Tuple-sort using Arrays.sort()</td><td>31813 ms</td><td>3550 
ms</td><td>20990 ms</td><td>1919 ms</td></tr>
    <tr><td><b>KV-array using Sorter</b></td><td></td><td></td><td><b>18232 
ms</b></td><td><b>2030 ms</b></td></tr>
    <tr><td>Microbenchmarks are stupid</td><td></td><td></td><td>25708 
ms</td><td>2400 ms</td></tr>
    </table>
    
    Note that the final test was the same as "KV-array using Sorter", but with 
a second implementation of SortDataFormat loaded in the JVM (presumably causing 
a de-opt for the virtual function call shortcircuit).
    
    The results show that this Sorter performs exactly as expected -- it is 
about as fast as the Java 6 Arrays.sort() (which shares the same algorithm), 
but is significantly faster than the Tuple-sort on Java 6.
    
    The Java 7 Timsort provided a huge speedup in this benchmark, suggesting 
that using it instead would result in roughly a 2x speedup. However, the 
Tuple-based approach is still not significantly faster despite the much better 
algorithm.
    
    In short, this patch should significantly improve performance for users 
running Java 6, and provide a minor performance degradation for users running 
Java 7.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/aarondav/spark sort

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/1502.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #1502
    
----
commit 6307338402e7872368bf61993b1be4ed7302cb6e
Author: Aaron Davidson <[email protected]>
Date:   2014-07-19T19:21:29Z

    SPARK-2047: Introduce an in-mem Sorter, and use it to reduce mem usage
    
    Currently, the AppendOnlyMap performs an "in-place" sort by converting
    its array of [key, value, key, value] pairs into a an array of
    [(key, value), (key, value)] pairs. However, this causes us to allocate
    many Tuple2 objects, which come at a nontrivial overhead.
    
    This patch adds a Sorter API, intended for in memory sorts, which simply
    ports the Java OpenJDK 6 implementation of Arrays.sort() (which uses
    a merge sort) and abstracts the interface in a way which introduces no more
    than 1 virtual function invocation of overhead at each abstraction point.
    
    Please compare our port of the Java 6 sort with the original implementation:
        http://www.diffchecker.com/kh9ufcqo
    
    === Memory implications ===
    An AppendOnlyMap contains N kv pairs, which results in roughly 2N elements
    within its underlying array. Each of these elements is 4 bytes wide in a
    [compressed 
OOP](https://wikis.oracle.com/display/HotSpotInternals/CompressedOops) system, 
which is the default.
    
    Today's approach immediately allocates N Tuple2 objects, which take up 24N 
bytes
    in total (exposed via YourKit), and undergoes a Java sort. The Java 6 
version
    immediately copies the entire array (4N bytes here), while the Java 7
    version has a worst-case allocation of half the array (2N bytes).
    This results in a sorting overhead of 24N + 4N = 28N bytes (for Java 6).
    
    The Sorter does not require allocating any tuples, but since it uses the 
Java 6
    merge sort algorithm, it does copy the entire array (and that is the entire 
array,
    not just the half needed for Tuples).
    This results in a sorting overhead of 8N bytes.
    
    Thus, we have reduced the overhead of the sort by roughly 20 bytes times the
    number of elements.
    
    === Performance implications ===
    As the destructiveSortedIterator is used for spilling in an 
ExternalAppendOnlyMap,
    the purpose of this patch is to provide stability by reducing memory usage 
rather
    than improve performance.
    
    Indeed, this PR implements Java 6's merge sort rather than the Java 7 
Timsort, which
    is much more performant. A future optimization is to port the Timsort over, 
which the
    SortDataFormat API should support with minimal changes.
    
    Nevertheless, here are the results of a microbenchmark that sorted 25 
million, randomly
    distributed (Float, Int) pairs. The Java Arrays.sort() tests were run 
**only on the keys**,
    and thus moved less data. Our current implementation is called "Tuple-sort 
using Arrays.sort()".
    
    <table>
    <tr><th>Java version</th><th>Test</th><th>First run</th><th>Average of 
10</th></tr>
    <tr><td>6</td><td>primitive Arrays.sort()</td><td>3216 ms</td><td>1190 
ms</td></tr>
    <tr><td>6</td><td>Arrays.sort()</td><td>18564 ms</td><td>2006 ms</td></tr>
    <tr><td>6</td><td>Tuple-sort using Arrays.sort()</td><td>31813 
ms</td><td>3550 ms</td></tr>
    <tr><td>7</td><td>primitive Arrays.sort()</td><td>2724 ms</td><td>131 ms 
(!!)</td></tr>
    <tr><td>7</td><td>Arrays.sort()</td><td>13201 ms</td><td>878 ms</td></tr>
    <tr><td>7</td><td>Tuple-sort using Arrays.sort()</td><td>20990 
ms</td><td>1919 ms</td></tr>
    <tr><td>7</td><td>**KV-sort using Sorter**</td><td>**18232 
ms**</td><td>**2030 ms**</td></tr>
    <tr><td>7</td><td>Microbenchmarks are stupid</td><td>25708 ms</td><td>2400 
ms</td></tr>
    </table>
    
    Note that the final test was the same as KV-sort using Sorter, but with a 
second
    impelementation of SortDataFormat loaded in the JVM (presumably causing a 
de-opt
    for the virtual function call shortcircuit).
    
    The results show that this Sorter performs exactly as expected -- it is 
about as fast as the
    Java 6 Arrays.sort() (which shares the same algorithm), but is 
significantly faster than
    the Tuple-sort on Java 6.
    
    The Java 7 Timsort provided a huge speedup in this benchmark, suggesting 
that using it instead
    would result in roughly a 2x speedup. However, the Tuple-based approach is 
still not
    significantly faster despite the much better algorithm.
    
    In short, this patch should significantly improve performance on users 
running Java 6, and
    provide a minor performance degradation for users running Java 7.

----


---
If your project is set up for it, you can reply to this email and have your
reply appear on GitHub as well. If your project does not have this feature
enabled and wishes so, or if the feature is enabled but not working, please
contact infrastructure at [email protected] or file a JIRA ticket
with INFRA.
---

Reply via email to