GitHub user sitalkedia opened a pull request:

    https://github.com/apache/spark/pull/12309

    [SPARK-14542][CORE] PipeRDD should allow configurable buffer size for…

    ## What changes were proposed in this pull request?
    
    Currently, PipedRDD internally uses a PrintWriter to write data to the stdin 
of the piped process, which by default wraps a BufferedWriter with an 8k buffer. 
In our experiments, we have seen that an 8k buffer is too small and that the job 
spends a significant amount of CPU time in system calls to copy the data. We 
should have a way to configure the buffer size for the writer.
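
As a rough illustration of the idea above (not the code in this patch), the writer could be built by wrapping the child's stdin in a BufferedWriter with an explicit size rather than relying on PrintWriter's default. The class `PipedWriterSketch`, its methods, and the `bufferSize` parameter are hypothetical names used only for this sketch, and the stream is assumed to carry UTF-8 text.

```java
import java.io.BufferedWriter;
import java.io.OutputStream;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only; not Spark's actual PipedRDD code.
class PipedWriterSketch {

    // Builds a PrintWriter over the child's stdin with a caller-chosen buffer size.
    // new PrintWriter(OutputStream) would instead use BufferedWriter's default size.
    static PrintWriter stdinWriter(OutputStream stdin, int bufferSize) {
        return new PrintWriter(
            new BufferedWriter(
                new OutputStreamWriter(stdin, StandardCharsets.UTF_8),
                bufferSize)); // size is in chars and must be > 0
    }

    // Writes one record per line, then flushes so buffered data reaches the stream.
    // The caller keeps ownership of the stream and decides when to close it.
    static void writeRecords(OutputStream stdin, Iterable<String> records, int bufferSize) {
        PrintWriter out = stdinWriter(stdin, bufferSize);
        for (String record : records) {
            out.println(record);
        }
        out.flush();
    }
}
```

With a real child process, the stream would be `process.getOutputStream()`, and the buffer size could come from a configuration value; how much a larger buffer helps depends on the record sizes and the other buffering layers in between.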
    
    
    ## How was this patch tested?
    Ran PipedRDDSuite tests. 

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/sitalkedia/spark bufferedPipedRDD

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/12309.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #12309
    
----
commit 697433d49fde2b5f76ab2a7b986e133c435efdc3
Author: Sital Kedia <[email protected]>
Date:   2016-04-11T22:43:04Z

    [SPARK-14542][CORE] PipeRDD should allow configurable buffer size for the 
stdin writer

----


