Github user squito commented on a diff in the pull request:
https://github.com/apache/spark/pull/16959#discussion_r103606657
--- Diff:
core/src/test/scala/org/apache/spark/scheduler/OutputCommitCoordinatorSuite.scala
---
@@ -195,6 +195,17 @@ class OutputCommitCoordinatorSuite extends SparkFunSuite with BeforeAndAfter {
sc.runJob(rdd,
OutputCommitFunctions(tempDir.getAbsolutePath).callCanCommitMultipleTimes _,
0 until rdd.partitions.size)
}
+
+ test("SPARK-19631: Do not allow failed attempts to be authorized for committing") {
+ val stage: Int = 1
+ val partition: Int = 1
+ val failedAttempt: Int = 0
+ outputCommitCoordinator.stageStart(stage, maxPartitionId = 1)
+ outputCommitCoordinator.taskCompleted(stage, partition, attemptNumber = failedAttempt,
+ reason = ExecutorLostFailure("0", exitCausedByApp = true, None))
+ assert(!outputCommitCoordinator.canCommit(stage, partition, failedAttempt))
--- End diff --
Sorry if I'm being really dense, but it still seems to me that in this
particular scenario we're taking broken behavior, fixing it in some cases, and
making it worse in others.
Suppose E1 got permission to commit, then lost connectivity to the driver
(or missed heartbeats, etc.), but kept trying to commit. Then E2 asks to
commit.
Before, we might have ended up in an infinite loop, where E1 never
finishes committing and E2 never gets to commit. Likewise, no future attempt
gets to commit either, yet we never even fail the task set, because
TaskCommitDenied doesn't count towards failing a task set; it just keeps
retrying.
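To make the "before" case concrete, here is a toy Scala model. It is a sketch under stated assumptions, not Spark's actual OutputCommitCoordinator: the names `OldStyleCoordinator`, `commitFinished` and `OldBehaviorDemo` are hypothetical, and it assumes the authorization for a partition is released only when the authorized attempt itself finishes committing. The retry loop is capped at `maxRetries` purely so the demo terminates.

```scala
import scala.collection.mutable

// Hypothetical toy model of the "before" scenario; NOT Spark's OutputCommitCoordinator.
// The first attempt to ask is authorized; the lock is released only when that
// same attempt reports that it finished committing.
final class OldStyleCoordinator {
  private val authorized = mutable.Map.empty[Int, Int] // partition -> authorized attempt

  def canCommit(partition: Int, attempt: Int): Boolean =
    authorized.get(partition) match {
      case Some(holder) => holder == attempt
      case None =>
        authorized(partition) = attempt
        true
    }

  // The only way the partition becomes free again in this model.
  def commitFinished(partition: Int, attempt: Int): Unit =
    if (authorized.get(partition).contains(attempt)) authorized.remove(partition)
}

object OldBehaviorDemo {
  // Returns how many retry attempts were denied the commit.
  def run(maxRetries: Int): Int = {
    val coord = new OldStyleCoordinator
    val partition = 0
    coord.canCommit(partition, 0) // E1 (attempt 0) is authorized, then goes silent
    var denials = 0
    for (attempt <- 1 to maxRetries) {
      // Each denial triggers another retry; denials are not counted towards
      // failing the task set, so nothing in this model stops the loop.
      if (!coord.canCommit(partition, attempt)) denials += 1
    }
    denials
  }
}
```

For example, `OldBehaviorDemo.run(5)` returns 5: because E1 never calls `commitFinished`, every later attempt is denied.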
After this change, E2 gets to commit immediately after E1 loses
connectivity to the driver. E1 may or may not commit at any time. If E1
doesn't commit, great. If E1 does commit, things will still be fine in most
scenarios, but sometimes the two commits will stomp on each other.
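A minimal Scala sketch of the "after" case, again a toy model rather than the PR's code: the names `NewStyleCoordinator`, `attemptFailed` and `NewBehaviorDemo` are hypothetical. It assumes that when the driver treats an attempt as failed, it records that attempt as failed (so it is never authorized again, which is what the test in the diff checks) and releases its lock. The point it illustrates is that the coordinator can only answer future requests; it cannot revoke a "yes" that E1 already received.

```scala
import scala.collection.mutable

// Hypothetical toy model of the "after" scenario; NOT the actual PR code.
final class NewStyleCoordinator {
  private val authorized = mutable.Map.empty[Int, Int]          // partition -> authorized attempt
  private val failed = mutable.Map.empty[Int, mutable.Set[Int]] // partition -> failed attempts

  def canCommit(partition: Int, attempt: Int): Boolean =
    if (failed.getOrElse(partition, mutable.Set.empty[Int]).contains(attempt)) false
    else authorized.get(partition) match {
      case Some(holder) => holder == attempt
      case None =>
        authorized(partition) = attempt
        true
    }

  // Driver-side view of a failure (e.g. executor lost): remember the attempt
  // as failed and release its lock if it held one.
  def attemptFailed(partition: Int, attempt: Int): Unit = {
    failed.getOrElseUpdate(partition, mutable.Set.empty[Int]) += attempt
    if (authorized.get(partition).contains(attempt)) authorized.remove(partition)
  }
}

object NewBehaviorDemo {
  // Returns the attempts that end up writing partition 0's output, in order.
  def run(): List[Int] = {
    val coord = new NewStyleCoordinator
    val writers = mutable.ListBuffer.empty[Int]
    val e1 = 0
    val e2 = 1
    val e1Authorized = coord.canCommit(0, e1) // E1 gets its "yes"
    coord.attemptFailed(0, e1)                // driver loses contact with E1
    if (coord.canCommit(0, e2)) writers += e2 // E2 is authorized right away
    // E1 never learns it was marked failed and still acts on its earlier "yes".
    if (e1Authorized) writers += e1
    writers.toList
  }
}
```

Here `NewBehaviorDemo.run()` returns `List(1, 0)`: both attempts write the same partition, which is the "stomp on each other" case.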
So we've narrowed the set of scenarios with incorrect behavior, but in the
remaining ones the behavior has gone from an infinite loop (bad) to jobs
appearing to succeed when they have actually written corrupt data (worse, IMO).