[GitHub] spark pull request #21381: refactor ExecuteWriteTask

gengliangwang Mon, 21 May 2018 06:05:31 -0700

GitHub user gengliangwang opened a pull request:

    https://github.com/apache/spark/pull/21381


    refactor ExecuteWriteTask

    ## What changes were proposed in this pull request?
    As I am working on the File data source V2 write path [in my repo](https://github.com/gengliangwang/spark/blob/47f39e1f54bc748e116ae9580413fae317898327/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/FileSourceWriter.scala#L78),
    I find it essential to refactor `ExecuteWriteTask` in `FileFormatWriter` so that it follows the `DataWriter` interface of Data Source V2:
    1. The same code can be reused by both `FileFormat` and Data Source V2.
    2. Better abstraction: callers only need to call `commit()` or `abort()` at the end of a task. There is also less code in `SingleDirectoryWriteTask` and `DynamicPartitionWriteTask`.
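    The abstraction described above can be sketched roughly as follows. This is a minimal, self-contained illustration under stated assumptions, not the code from this PR: `SimpleDataWriter`, `SketchFileFormatDataWriter`, `SketchSingleDirectoryWriter`, and `SketchDynamicPartitionWriter` are hypothetical names, and in-memory buffers stand in for real output files. The idea it shows is that the `commit()`/`abort()` bookkeeping lives in one base class, so the single-directory and dynamic-partition variants only decide which file each record goes to.

```scala
import scala.collection.mutable

// Hypothetical, simplified stand-in for the Data Source V2 writer contract:
// write each record, then call exactly one of commit() or abort().
trait SimpleDataWriter[T] {
  def write(record: T): Unit
  def commit(): Seq[String] // names of the files this task produced
  def abort(): Unit
}

// Shared logic lives in the base class; subclasses only choose the target file.
// "Files" are in-memory buffers keyed by name (an assumption for illustration).
abstract class SketchFileFormatDataWriter[T] extends SimpleDataWriter[T] {
  protected val files = mutable.LinkedHashMap.empty[String, mutable.ArrayBuffer[T]]
  protected var current: Option[mutable.ArrayBuffer[T]] = None

  // Opens a new output "file" and makes it the current write target.
  protected def newOutputFile(name: String): Unit = {
    val buf = mutable.ArrayBuffer.empty[T]
    files(name) = buf
    current = Some(buf)
  }

  override def commit(): Seq[String] = {
    current = None
    files.keys.toList // insertion order is preserved by LinkedHashMap
  }

  override def abort(): Unit = {
    current = None
    files.clear() // a real writer would delete the partially written files
  }
}

// Every record goes to a single file.
class SketchSingleDirectoryWriter[T] extends SketchFileFormatDataWriter[T] {
  newOutputFile("part-00000")
  override def write(record: T): Unit = current.foreach(_ += record)
}

// Opens a new file whenever the partition key changes.
// Assumes the input is already sorted by partition key.
class SketchDynamicPartitionWriter[T](partitionOf: T => String)
    extends SketchFileFormatDataWriter[T] {
  private var currentKey: Option[String] = None

  override def write(record: T): Unit = {
    val key = partitionOf(record)
    if (!currentKey.contains(key)) {
      currentKey = Some(key)
      newOutputFile(s"$key/part-00000")
    }
    current.foreach(_ += record)
  }
}
```

    For example, with the key function `r => s"p=${r._1}"`, writing `("a", 1)`, `("a", 2)`, `("b", 3)` to a `SketchDynamicPartitionWriter` and then calling `commit()` returns the two file names `p=a/part-00000` and `p=b/part-00000`; the caller never needs to know which variant it is holding.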
    
    This PR is part of the Data Source V2 migration. Definitions of the related classes are moved to a new file, and `ExecuteWriteTask` is renamed to `FileFormatDataWriter`.
    
    
    ## How was this patch tested?
    Existing unit tests.


You can merge this pull request into a Git repository by running:

    $ git pull https://github.com/gengliangwang/spark refactorExecuteWriteTask

Alternatively you can review and apply these changes as the patch at:

    https://github.com/apache/spark/pull/21381.patch

To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:

    This closes #21381
    
----
commit cbd4ce2959bdfe63dff32d0c36b2982fcde22aac
Author: Gengliang Wang <gengliang.wang@...>
Date:   2018-05-21T12:16:14Z

    refactor ExecuteWriteTask

----

