GitHub user steveloughran opened a pull request:
https://github.com/apache/spark/pull/18111
[SPARK-20886][CORE] HadoopMapReduceCommitProtocol to fail meaningfully if
FileOutputCommitter.getWorkPath==null
## What changes were proposed in this pull request?
Adds a `require()` check for the case where `FileOutputCommitter.getWorkPath()`
returns `null`. The failure message explains the problem and includes the
`toString` value of the committer for better diagnostics.
This occurs if the committer being passed in is a job committer, not a task
committer; that is, it was initialised with a `JobContext`, not a
`TaskAttemptContext`.
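A minimal sketch of the kind of check described above, not the patch itself. `StubCommitter` is a hypothetical stand-in for `FileOutputCommitter` so that the example runs without Hadoop on the classpath, and the message wording is an assumption:

```scala
// Hypothetical stand-in for Hadoop's FileOutputCommitter; not the real class.
final class StubCommitter(workPath: String) {
  // A null work path models a committer that was set up with a job context.
  def getWorkPath: String = workPath
  override def toString: String = s"StubCommitter{workPath=$workPath}"
}

object RequireSketch {
  // Fail fast with a diagnostic message rather than a bare NullPointerException.
  def stagingDir(committer: StubCommitter): String = {
    val workPath = committer.getWorkPath
    require(workPath != null, s"Committer has no work path: $committer")
    workPath
  }

  def main(args: Array[String]): Unit = {
    println(stagingDir(new StubCommitter("/tmp/out/_temporary/0")))
    // require() throws IllegalArgumentException, prefixing "requirement failed: ".
    try println(stagingDir(new StubCommitter(null)))
    catch { case e: IllegalArgumentException => println(e.getMessage) }
  }
}
```

Because the message interpolates the committer itself, whatever its `toString` reports ends up in the stack trace.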
The existing code does an `Option(workPath.toString).getOrElse(path)`,
which *may* be an attempt to handle the null path case. If so, it doesn't work,
because it's the `.toString()` call which fails, before `Option` ever sees the
value. If people do think that code should be resilient to null work paths,
that line could be changed. However, that may hide the underlying problem: the
committer is misconfigured.
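To illustrate why wrapping the result of `toString` in `Option` does not guard against a null reference, here is a small self-contained sketch. The names `eager` and `tolerant` are hypothetical, and `tolerant` only shows what a null-resilient form of that line could look like; it is not what this patch does:

```scala
import scala.util.Try

object OptionNullSketch {
  // Same shape as the existing expression: toString is invoked on the
  // reference before Option gets a chance to inspect it.
  def eager(workPath: AnyRef, fallback: String): String =
    Option(workPath.toString).getOrElse(fallback)

  // Null-tolerant shape: wrap the reference first, convert afterwards.
  def tolerant(workPath: AnyRef, fallback: String): String =
    Option(workPath).map(_.toString).getOrElse(fallback)

  def main(args: Array[String]): Unit = {
    // eager throws a NullPointerException for a null work path.
    println(Try(eager(null, "/out")).isFailure)
    // tolerant silently falls back to the supplied path.
    println(tolerant(null, "/out"))
  }
}
```

The silent fallback in `tolerant` is exactly the behaviour that could mask a misconfigured committer.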
It may be a rare occurrence today, but it is more likely with modified
subclasses of `FileOutputCommitter`, and it is also possible
with some ongoing work of mine in Hadoop to better support committing work to
cloud storage infrastructures.
## How was this patch tested?
Manually. The before & after stack traces are on the JIRA.
You can merge this pull request into a Git repository by running:
$ git pull https://github.com/steveloughran/spark cloud/SPARK-20886-committer-NPE
Alternatively you can review and apply these changes as the patch at:
https://github.com/apache/spark/pull/18111.patch
To close this pull request, make a commit to your master/trunk branch
with (at least) the following in the commit message:
This closes #18111
----
commit 02eb7bf0ee6b81841f22e3c46d822eaebb28e85c
Author: Steve Loughran <[email protected]>
Date: 2017-05-25T15:46:50Z
SPARK-20886 HadoopMapReduceCommitProtocol to fail with message if
FileOutputCommitter.getWorkPath==null
Add a requirement.
The existing code does an Option.getWorkpath.toString() which *may* be an
attempt to handle the null path case. If so, it doesn't work, because it's the
.toString() which is failing.
Change-Id: Idddf9813761e7008425542f96903bce12bedd978
----