GitHub user tejasapatil reopened a pull request:

    https://github.com/apache/spark/pull/14475

    [SPARK-16862] Configurable buffer size in `UnsafeSorterSpillReader`

    ## What changes were proposed in this pull request?
    
    Jira: https://issues.apache.org/jira/browse/SPARK-16862
    
    The `BufferedInputStream` in `UnsafeSorterSpillReader` reads data off disk using its default 8 KB buffer. This PR makes the buffer size configurable so that disk reads can be tuned for better performance. The default stays at 8 KB, the same as before, so current behavior does not change.
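A minimal sketch of the idea, assuming the configuration is available as a plain `Map<String, String>` standing in for `SparkConf`: look up a buffer size, fall back to 8 KB (the same size `BufferedInputStream` uses by default), and pass the result to the two-argument `BufferedInputStream(InputStream, int)` constructor. The class `SpillReaderBufferConfig` and the key `example.spill.reader.buffer.size` are hypothetical and are not the patch's actual code or config key.

```java
import java.io.BufferedInputStream;
import java.io.InputStream;
import java.util.Map;

// Hypothetical helper, not Spark code: resolves the spill-reader buffer size
// from a plain key/value config and opens a buffered stream with it.
final class SpillReaderBufferConfig {
  // Hypothetical config key; the real patch's key may differ.
  static final String BUFFER_SIZE_KEY = "example.spill.reader.buffer.size";
  // Same as BufferedInputStream's built-in default, so behavior is unchanged when unset.
  static final int DEFAULT_BUFFER_SIZE_BYTES = 8 * 1024;

  // Returns the configured size in bytes, or the 8 KB default when the key is absent.
  static int bufferSizeBytes(Map<String, String> conf) {
    String value = conf.get(BUFFER_SIZE_KEY);
    if (value == null) {
      return DEFAULT_BUFFER_SIZE_BYTES;
    }
    int size = Integer.parseInt(value.trim());
    if (size <= 0) {
      throw new IllegalArgumentException("Buffer size must be positive: " + size);
    }
    return size;
  }

  // Wraps the raw spill-file stream in a buffer of the configured size.
  static BufferedInputStream open(InputStream raw, Map<String, String> conf) {
    return new BufferedInputStream(raw, bufferSizeBytes(conf));
  }

  private SpillReaderBufferConfig() {}
}
```

With no entry in the map the reader keeps the old 8 KB buffer; setting the hypothetical key to `1048576` would give a 1 MB buffer. Rejecting non-positive sizes up front mirrors the fact that the `BufferedInputStream` constructor itself throws `IllegalArgumentException` for a size of zero or less.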
    
    ## How was this patch tested?
    
    I am relying on the existing unit tests.
    
    ## Performance
    
    After deploying this change to production with the buffer size set to 1 MB, there was a 12% reduction in CPU time and a 19.5% reduction in CPU reservation time.
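One plausible contributor to the CPU savings is that a larger `BufferedInputStream` buffer needs far fewer refill calls into the underlying stream, and for a `FileInputStream` each such call is a native read. The sketch below only illustrates that effect; it is not Spark code and does not reproduce the production numbers above. `BufferRefillDemo` and `CountingInputStream` are hypothetical helpers, and an in-memory `ByteArrayInputStream` stands in for a spill file.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.FilterInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

// Illustrative sketch: counts how often BufferedInputStream goes back to the
// underlying stream for a given buffer size.
final class BufferRefillDemo {

  // Delegates to the wrapped stream and counts every read call made on it.
  static final class CountingInputStream extends FilterInputStream {
    int reads = 0;

    CountingInputStream(InputStream in) {
      super(in);
    }

    @Override
    public int read() throws IOException {
      reads++;
      return super.read();
    }

    @Override
    public int read(byte[] b, int off, int len) throws IOException {
      reads++;
      return super.read(b, off, len);
    }
  }

  // Consumes `data` in small chunks through a buffer of `bufferSize` bytes and
  // returns how many reads reached the underlying stream.
  static int countUnderlyingReads(byte[] data, int bufferSize, int chunkSize) {
    CountingInputStream counting = new CountingInputStream(new ByteArrayInputStream(data));
    byte[] chunk = new byte[chunkSize];
    try (BufferedInputStream in = new BufferedInputStream(counting, bufferSize)) {
      while (in.read(chunk, 0, chunk.length) != -1) {
        // Bytes are discarded; only the refill count matters here.
      }
    } catch (IOException e) {
      throw new UncheckedIOException(e);
    }
    return counting.reads;
  }

  public static void main(String[] args) {
    byte[] spill = new byte[4 * 1024 * 1024]; // stand-in for a 4 MiB spill file
    System.out.println("8 KiB buffer: " + countUnderlyingReads(spill, 8 * 1024, 100));
    System.out.println("1 MiB buffer: " + countUnderlyingReads(spill, 1024 * 1024, 100));
  }
}
```

For a 4 MiB input the data-carrying refills work out to 4 MiB / 8 KiB = 512 with the default buffer versus 4 MiB / 1 MiB = 4 with a 1 MiB buffer; the printed counts may also include a final end-of-stream read.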

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/tejasapatil/spark spill_buffer

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/14475.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #14475
    
----
commit 6d6feef965d07cb18b6806b588c649637f52eb56
Author: Tejas Patil
Date:   2016-08-03T01:00:46Z

    [SPARK-16862] Configurable buffer size in `UnsafeSorterSpillReader`

commit cac4ebe1fa31f79e8c460bc9d86995a686659b65
Author: Tejas Patil
Date:   2016-08-03T05:35:16Z

    review comments #1

commit 6ce70a9995287aa19a042a80c77ecdbb1f56fe4f
Author: Tejas Patil
Date:   2016-08-03T07:35:55Z

    Handle test case failure due to SparkEnv being null
    
    ```
    [error] Test org.apache.spark.unsafe.map.BytesToBytesMapOffHeapSuite.spillInIterator failed: java.lang.NullPointerException: null, took 0.008 sec
    [error]     at org.apache.spark.util.collection.unsafe.sort.UnsafeSorterSpillReader.<init>(UnsafeSorterSpillReader.java:60)
    [error]     at org.apache.spark.util.collection.unsafe.sort.UnsafeSorterSpillWriter.getReader(UnsafeSorterSpillWriter.java:150)
    [error]     at org.apache.spark.unsafe.map.BytesToBytesMap$MapIterator.advanceToNextPage(BytesToBytesMap.java:291)
    [error]     at org.apache.spark.unsafe.map.BytesToBytesMap$MapIterator.next(BytesToBytesMap.java:320)
    [error]     at org.apache.spark.unsafe.map.AbstractBytesToBytesMapSuite.spillInIterator(AbstractBytesToBytesMapSuite.java:577)
    [error]     ...
    ```

----
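The `NullPointerException` in the last commit's trace is raised in the `UnsafeSorterSpillReader` constructor, and the commit message attributes it to `SparkEnv` being null in that unit test. A minimal sketch of one way to guard against this, assuming a process-wide holder whose `get()` may return `null` when no environment has been created: fall back to the default buffer size instead of dereferencing the missing environment. `RuntimeEnv`, `NullSafeBufferSize`, and the config key are hypothetical names, not the patch's code.

```java
import java.util.Map;

// Hypothetical stand-in for a process-wide environment such as SparkEnv:
// get() returns null until an environment has been set (e.g. in unit tests).
final class RuntimeEnv {
  private static volatile RuntimeEnv current;
  private final Map<String, String> conf;

  RuntimeEnv(Map<String, String> conf) {
    this.conf = conf;
  }

  static RuntimeEnv get() {
    return current;
  }

  static void set(RuntimeEnv env) {
    current = env;
  }

  Map<String, String> conf() {
    return conf;
  }
}

// Hypothetical resolver: never dereferences a null environment.
final class NullSafeBufferSize {
  static final String BUFFER_SIZE_KEY = "example.spill.reader.buffer.size"; // hypothetical key
  static final int DEFAULT_BUFFER_SIZE_BYTES = 8 * 1024;

  // Uses the default when there is no environment or the key is unset.
  static int resolve() {
    RuntimeEnv env = RuntimeEnv.get();
    if (env == null) {
      return DEFAULT_BUFFER_SIZE_BYTES;
    }
    String value = env.conf().get(BUFFER_SIZE_KEY);
    return value == null ? DEFAULT_BUFFER_SIZE_BYTES : Integer.parseInt(value.trim());
  }

  private NullSafeBufferSize() {}
}
```

Falling back to the old default keeps tests that never create an environment on the pre-patch behavior, while a running application still picks up the configured value.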


