Github user mridulm commented on a diff in the pull request:
https://github.com/apache/spark/pull/19487#discussion_r144633605
--- Diff: core/src/main/scala/org/apache/spark/internal/io/HadoopMapReduceCommitProtocol.scala ---
@@ -48,6 +49,16 @@ class HadoopMapReduceCommitProtocol(jobId: String, path: String)
   @transient private var committer: OutputCommitter = _

+  /**
+   * Checks whether there are files to be committed to a valid output location.
+   *
+   * As committing and aborting a job occurs on driver, where `addedAbsPathFiles` is always null,
+   * it is necessary to check whether a valid output path is specified.
+   * [[HadoopMapReduceCommitProtocol#path]] need not be a valid [[org.apache.hadoop.fs.Path]] for
+   * committers not writing to distributed file systems.
+   */
+  private val hasValidPath = Try { new Path(path) }.isSuccess
--- End diff --
That would depend on whether we want to support invalid paths or not (please see my comment below).
If we are not supporting invalid paths, I will change this to an explicit `null != path && "" != path` check, and have the driver throw `IllegalArgumentException` as part of `commitJob` or `abortJob`, as before.
If we do want to support invalid paths, then the exception is not relevant: it merely indicates that an explicitly invalid path was passed to the output committer (and the output committer will log appropriately when the parameter is invalid; it is not `HadoopMapReduceCommitProtocol`'s responsibility to do so).
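For illustration, the two alternatives could be sketched roughly as below. This is a hypothetical, self-contained Scala sketch, not the actual patch: `java.net.URI` stands in for `org.apache.hadoop.fs.Path` (which is not on a standalone classpath), and the names `OutputPathCheck` and `requireValidPath` are invented.

```scala
import scala.util.Try

object OutputPathCheck {

  // Option 1 (invalid paths unsupported): fail fast with an explicit check,
  // so the driver raises IllegalArgumentException from commitJob/abortJob.
  def requireValidPath(path: String): Unit = {
    if (null == path || "" == path) {
      throw new IllegalArgumentException(s"Invalid output path: $path")
    }
  }

  // Option 2 (invalid paths tolerated): swallow the construction failure and
  // only record whether a usable path was supplied, as in the diff above.
  // Note: URI construction fails for null like Path does, but unlike Path it
  // accepts an empty string, so this is only an approximation.
  def hasValidPath(path: String): Boolean =
    Try { new java.net.URI(path) }.isSuccess
}
```

The trade-off is where the failure surfaces: Option 1 reports the bad path at commit time on the driver, while Option 2 silently degrades to "nothing to commit" and leaves logging to the committer.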
---
---------------------------------------------------------------------
To unsubscribe, e-mail: [email protected]
For additional commands, e-mail: [email protected]