GitHub user gengliangwang opened a pull request:
https://github.com/apache/spark/pull/21381
refactor ExecuteWriteTask
## What changes were proposed in this pull request?
While working on the File Data Source V2 write path [in my repo](https://github.com/gengliangwang/spark/blob/47f39e1f54bc748e116ae9580413fae317898327/sql/core/src/main/scala/org/apache/spark/sql/execution/datasources/v2/FileSourceWriter.scala#L78),
I found it essential to refactor `ExecuteWriteTask` in `FileFormatWriter` to build on
`DataWriter` from Data Source V2, for two reasons:
1. The code can be reused by both `FileFormat` and Data Source V2.
2. The abstraction is cleaner: callers only need to call `commit()` or `abort()` at
the end of the task. There is also less code in `SingleDirectoryWriteTask` and
`DynamicPartitionWriteTask`.
This PR is part of the Data Source V2 migration. The definitions of the related classes
are moved to a new file, and `ExecuteWriteTask` is renamed to
`FileFormatDataWriter`.
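
To illustrate the calling pattern described above (write rows, then `commit()` on success or `abort()` on failure), here is a minimal, self-contained sketch. It is not the code in this PR: the real classes are written in Scala inside Spark, and `SimpleDataWriter`, `InMemoryWriter`, and `WriteTaskRunner` are hypothetical names invented for this example. The writer buffers rows in memory instead of writing files.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical, simplified stand-in for a DataWriter-style contract:
// write records one by one, then either commit or abort the task.
interface SimpleDataWriter<T> {
    void write(T record);
    String commit();
    void abort();
}

// Hypothetical writer that buffers rows in memory instead of writing files.
class InMemoryWriter implements SimpleDataWriter<String> {
    final List<String> buffered = new ArrayList<>();
    final List<String> committed = new ArrayList<>();
    boolean aborted = false;

    @Override
    public void write(String record) {
        // Simulate a failure on bad input so the abort path can be exercised.
        if (record == null) {
            throw new IllegalArgumentException("null record");
        }
        buffered.add(record);
    }

    @Override
    public String commit() {
        // Make the buffered rows visible and report how many were committed.
        committed.addAll(buffered);
        buffered.clear();
        return "committed " + committed.size() + " rows";
    }

    @Override
    public void abort() {
        // Discard everything written so far by this task.
        buffered.clear();
        aborted = true;
    }
}

// The caller only drives write() and then calls commit() or abort() once.
class WriteTaskRunner {
    static <T> String run(SimpleDataWriter<T> writer, Iterable<T> rows) {
        try {
            for (T row : rows) {
                writer.write(row);
            }
            return writer.commit();
        } catch (RuntimeException e) {
            writer.abort();
            throw e;
        }
    }

    public static void main(String[] args) {
        InMemoryWriter writer = new InMemoryWriter();
        System.out.println(run(writer, Arrays.asList("a", "b")));
    }
}
```

With this shape, the caller does not need to know whether the writer targets a single directory or dynamic partitions; that detail would live in the concrete writer class.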
## How was this patch tested?
Existing unit tests.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/gengliangwang/spark refactorExecuteWriteTask
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/21381.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #21381
----
commit cbd4ce2959bdfe63dff32d0c36b2982fcde22aac
Author: Gengliang Wang <gengliang.wang@...>
Date: 2018-05-21T12:16:14Z
refactor ExecuteWriteTask
----