GitHub user NathanHowell opened a pull request:

    https://github.com/apache/spark/pull/16089

    [SPARK-18658][SQL] Write text records directly to a FileOutputStream

    ## What changes were proposed in this pull request?
    
    This replaces uses of `TextOutputFormat` with an `OutputStream` that writes
    either directly to the filesystem or through a compressor, if one is
    configured. This avoids intermediate buffering.
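
As a rough illustration of the idea described above (not the code from this patch), the sketch below writes newline-terminated text records straight to an `OutputStream` that is optionally wrapped in a compressor. The class and method names are hypothetical, `GZIPOutputStream` stands in for whatever compression codec is configured, and a `ByteArrayOutputStream` stands in for the stream opened on the filesystem.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.List;
import java.util.zip.GZIPOutputStream;

// Hypothetical illustration only; none of these names come from the patch.
public class DirectTextWriter {
    // Returns the raw stream as-is, or wraps it in a compressor when requested.
    // Assumption: gzip stands in for the configured compression codec.
    public static OutputStream open(OutputStream raw, boolean compress) throws IOException {
        return compress ? new GZIPOutputStream(raw) : raw;
    }

    // Writes each record as UTF-8 bytes followed by a newline, directly to the stream.
    public static void writeRecords(OutputStream out, List<String> records) throws IOException {
        for (String record : records) {
            out.write(record.getBytes(StandardCharsets.UTF_8));
            out.write('\n');
        }
    }

    public static void main(String[] args) throws IOException {
        // Assumption: an in-memory sink stands in for a file on the filesystem.
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (OutputStream out = open(sink, true)) {
            writeRecords(out, Arrays.asList("first record", "second record"));
        }
        System.out.println(sink.size() + " compressed bytes written");
    }
}
```

Closing the outer stream is what finishes the compressed output, so the writer and the optional compressor share a single `try`-with-resources block.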
    
    The inverse of this (reading directly from a stream) is necessary for
    streaming large JSON records (when `wholeFile` is enabled), so I wanted to
    keep the read and write paths symmetric.
    
    ## How was this patch tested?
    
    Existing unit tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/NathanHowell/spark SPARK-18658

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/16089.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #16089
    
----
commit 66e02959dd5f750579d29c8d79b577844df58c0c
Author: Nathan Howell <nhow...@godaddy.com>
Date:   2016-11-30T22:39:53Z

    [SPARK-18658][SQL] Write text records directly to a FileOutputStream

----


