[GitHub] spark pull request #16089: [SPARK-18658][SQL] Write text records directly to...

NathanHowell Wed, 30 Nov 2016 14:55:31 -0800

GitHub user NathanHowell opened a pull request:

    https://github.com/apache/spark/pull/16089


    [SPARK-18658][SQL] Write text records directly to a FileOutputStream

    ## What changes were proposed in this pull request?
    
    This replaces uses of `TextOutputFormat` with an `OutputStream`, which will
    either write directly to the filesystem or indirectly via a compressor (if so
    configured). This avoids intermediate buffering.
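
As a rough illustration of the idea only (this is not the code in the pull request, and it does not use Spark or Hadoop classes), the sketch below writes newline-terminated text records to a byte stream. The stream is either the sink itself or a gzip compressor wrapped around the sink. The names `open_record_stream` and `write_records` and the `codec` parameter are hypothetical, chosen for this example.

```python
import gzip
import io
from typing import BinaryIO, Iterable, Optional


def open_record_stream(sink: BinaryIO, codec: Optional[str] = None) -> BinaryIO:
    """Pick the stream that records are written to (hypothetical helper).

    With no codec, records go straight to `sink`. With "gzip", they pass
    through a compressor that itself writes into `sink`.
    """
    if codec is None:
        return sink
    if codec == "gzip":
        return gzip.GzipFile(fileobj=sink, mode="wb")
    raise ValueError(f"unsupported codec: {codec}")


def write_records(sink: BinaryIO, records: Iterable[str],
                  codec: Optional[str] = None) -> None:
    """Write each record as UTF-8 text followed by a newline."""
    out = open_record_stream(sink, codec)
    for record in records:
        out.write(record.encode("utf-8"))
        out.write(b"\n")
    if out is not sink:
        # Closing the compressor writes its trailer; GzipFile leaves
        # the underlying `sink` open when it was passed as `fileobj`.
        out.close()
    else:
        out.flush()


if __name__ == "__main__":
    plain = io.BytesIO()
    write_records(plain, ["first", "second"])

    packed = io.BytesIO()
    write_records(packed, ["first", "second"], codec="gzip")

    # Both sinks hold the same records once the gzip layer is undone.
    print(plain.getvalue() == gzip.decompress(packed.getvalue()))
```

The caller only ever sees a single stream, so the record-writing loop is the same whether or not compression is configured.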
    
    The inverse of this (reading directly from a stream) is necessary for
    streaming large JSON records (when `wholeFile` is enabled) so I wanted to keep
    the read and write paths symmetric.
    
    ## How was this patch tested?
    
    Existing unit tests.

You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/NathanHowell/spark SPARK-18658

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/16089.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #16089
    
----
commit 66e02959dd5f750579d29c8d79b577844df58c0c
Author: Nathan Howell <[email protected]>
Date:   2016-11-30T22:39:53Z

    [SPARK-18658][SQL] Write text records directly to a FileOutputStream

----


