GitHub user vanzin commented on a diff in the pull request:
https://github.com/apache/spark/pull/19848#discussion_r154157813
--- Diff: core/src/main/scala/org/apache/spark/mapred/SparkHadoopMapRedUtil.scala ---
@@ -70,7 +79,14 @@ object SparkHadoopMapRedUtil extends Logging {
if (shouldCoordinateWithDriver) {
val outputCommitCoordinator = SparkEnv.get.outputCommitCoordinator
val taskAttemptNumber = TaskContext.get().attemptNumber()
- val canCommit = outputCommitCoordinator.canCommit(jobId, splitId, taskAttemptNumber)
+ var canCommit: Boolean = true
+ // This checks whether the commitTask provided by stageId, which if not the canCommit
+ // will use jobId as stageId to decide whether the commit should be possible
+ if (stageId != -1) {
--- End diff --
In which case would this happen? Would it be hard to change the API so that
the stage id is always provided to `commitTask`?
Mridul suggested in the previous PR to use the MR job configuration to
propagate this (which you can access in the `mrTaskContext` parameter above).
Any reason why you didn't go that route?
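
To make the suggestion concrete, here is a minimal sketch of propagating the stage id through the job configuration rather than through a `-1` sentinel argument. It is only an illustration under stated assumptions: the key name `spark.example.commit.stageId`, the `StageIdPropagation` object, and both helper methods are hypothetical, and a plain `Map[String, String]` stands in for the Hadoop configuration that would be reachable from `mrTaskContext`.

```scala
object StageIdPropagation {
  // Hypothetical key, chosen for illustration; not an existing Spark or Hadoop setting.
  val StageIdKey = "spark.example.commit.stageId"

  // Stand-in for the MR job configuration reachable from mrTaskContext.
  type Conf = Map[String, String]

  // Job setup side: record the stage id in the configuration.
  def withStageId(conf: Conf, stageId: Int): Conf =
    conf + (StageIdKey -> stageId.toString)

  // Task side: recover the stage id, falling back to jobId when the key is absent.
  def resolveStageId(conf: Conf, jobId: Int): Int =
    conf.get(StageIdKey).map(_.toInt).getOrElse(jobId)
}
```

With something along these lines, `commitTask` could keep its current signature and always hand a resolved stage id to `canCommit`, so the `stageId != -1` branch would not be needed.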