GitHub user szhem opened a pull request:
https://github.com/apache/spark/pull/19294
[SPARK-21549][CORE] Respect OutputFormats with no output directory provided
## What changes were proposed in this pull request?
Fix for https://issues.apache.org/jira/browse/SPARK-21549 JIRA issue.
Since version 2.2, Spark does not respect OutputFormats that are configured
without an output path.
Examples of such formats are [Cassandra
OutputFormat](https://github.com/finn-no/cassandra-hadoop/blob/08dfa3a7ac727bb87269f27a1c82ece54e3f67e6/src/main/java/org/apache/cassandra/hadoop2/AbstractColumnFamilyOutputFormat.java),
[Aerospike
OutputFormat](https://github.com/aerospike/aerospike-hadoop/blob/master/mapreduce/src/main/java/com/aerospike/hadoop/mapreduce/AerospikeOutputFormat.java),
etc., which cannot roll back results written to an external system on job
failure.
Spark requires an output directory so that files can be committed to an
absolute output location; this requirement does not apply to output formats
that write data to external systems.
This pull request proposes using the FileSystem's working directory, which is
usually the user's home directory on distributed file systems, if no output
directory is provided in the job configuration.
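
The fallback can be illustrated with a minimal, self-contained sketch. It is not the code from this patch. The class `OutputDirFallback` and the method `resolveOutputDir` are hypothetical names. A plain `Map` stands in for the Hadoop job configuration, and the `workingDir` argument stands in for the value that Hadoop's `FileSystem.getWorkingDirectory()` would return. The paths are made up for the example.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;

public class OutputDirFallback {
    // Hadoop's configuration key for the job output directory.
    public static final String OUTPUT_DIR_KEY =
        "mapreduce.output.fileoutputformat.outputdir";

    // Hypothetical helper: returns the configured output directory,
    // or the working directory when none is configured.
    public static Path resolveOutputDir(Map<String, String> jobConf, Path workingDir) {
        String configured = jobConf.get(OUTPUT_DIR_KEY);
        if (configured == null || configured.trim().isEmpty()) {
            // No output path, e.g. an OutputFormat writing to an external system.
            return workingDir;
        }
        return Paths.get(configured);
    }

    public static void main(String[] args) {
        // Assumed stand-in for the file system's working directory.
        Path workingDir = Paths.get("/user/alice");

        // Job writing to an external system: no output directory configured.
        Map<String, String> externalSink = new HashMap<>();

        // Job writing files: output directory configured.
        Map<String, String> fileSink = new HashMap<>();
        fileSink.put(OUTPUT_DIR_KEY, "/data/out");

        System.out.println(resolveOutputDir(externalSink, workingDir));
        System.out.println(resolveOutputDir(fileSink, workingDir));
    }
}
```

In this sketch a job with no configured output path resolves to the working directory, while a job with a configured path is unaffected.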
## How was this patch tested?
Unit tests
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/szhem/spark SPARK-21549-abs-output-commits
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/19294.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #19294
----
commit b99344845a73b33d4ec319b6484c3104306c34ee
Author: Sergey Zhemzhitsky <[email protected]>
Date: 2017-09-20T13:07:20Z
[SPARK-21549][CORE] Respect empty output paths for files to be committed to
an absolute output location in case of custom output formats
commit 5c1474ab78f46a73236d971a23d9b112d8613405
Author: Sergey Zhemzhitsky <[email protected]>
Date: 2017-09-20T13:13:58Z
[SPARK-21549][CORE] Respect empty output paths for files to be committed to
an absolute location - reformatting imports
----