Github user squito commented on the pull request:
https://github.com/apache/spark/pull/9214#issuecomment-150704989
Well, as I've done some more testing and taken a closer look at some of the corner cases, I don't think it's easy to get this working. The major problem arises when executors are lost but then come back. Say executor 1 generates some shuffle output, then for whatever reason the driver thinks it's lost the executor. The driver unregisters the shuffle output, then queues up some tasks to regenerate that shuffle output. Meanwhile the executor comes back, and thanks to locality preferences it's likely to get some of those same tasks again. But the `ShuffleOutputCoordinator` will ignore the output of those tasks, since those files already exist. If we don't send some map status back when the task completes, the driver will still think that some shuffle output needs to be generated, because it's already removed the original shuffle output.
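To make the failure mode concrete, here is a rough sketch of the "first committer wins" behavior I'm describing; the object, method names, and file handling are just illustrative, not the actual code in this PR:

```scala
// Rough sketch only -- names and file handling are hypothetical, not the PR's code.
import java.io.File

object ShuffleOutputCoordinatorSketch {
  /**
   * Commit a map task's shuffle output, first-committer-wins. Returns true if
   * this attempt's files were the ones committed.
   */
  def commitOutput(tmpFiles: Seq[File], destFiles: Seq[File]): Boolean = synchronized {
    if (destFiles.forall(_.exists())) {
      // A prior attempt already committed, so this attempt's output is dropped.
      // The trouble in the scenario above: the caller has nothing to report for
      // the committed output, and the driver has already unregistered the
      // original map status, so it keeps waiting for output that already exists.
      tmpFiles.foreach(_.delete())
      false
    } else {
      tmpFiles.zip(destFiles).foreach { case (tmp, dest) => tmp.renameTo(dest) }
      true
    }
  }
}
```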
We could send back the map status based on the new (but uncommitted) shuffle output, but then it could be wrong about the zero-sized blocks if the data is non-deterministic. We could instead have the `ShuffleOutputCoordinator` remember the map status of all output it commits, so that it always returns the map status of the committed output -- this way the second attempt would return the map status for the committed outputs of the first attempt (rough sketch below). This adds complexity and memory overhead, though.
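A minimal sketch of that "remember the committed map status" idea, assuming a hypothetical tracker keyed by (shuffleId, mapId) and a stub standing in for Spark's `MapStatus`; none of these names are from the PR:

```scala
// Hypothetical bookkeeping sketch; not the PR's actual API.
import scala.collection.mutable

// Stand-in for Spark's MapStatus (executor location + block sizes).
case class MapStatusStub(executorId: String, blockSizes: Array[Long])

class CommittedMapStatusTracker {
  private val committed = mutable.HashMap.empty[(Int, Int), MapStatusStub]

  /** Remember the status of the output that actually got committed. */
  def recordCommit(shuffleId: Int, mapId: Int, status: MapStatusStub): Unit =
    synchronized { committed((shuffleId, mapId)) = status }

  /**
   * For a later attempt whose files were ignored, return the status of the
   * previously committed output instead of the fresh (uncommitted) one, so the
   * driver re-registers sizes that match the bytes actually on disk -- which
   * matters when the data is non-deterministic and block sizes differ.
   */
  def statusToReport(shuffleId: Int, mapId: Int, fresh: MapStatusStub): MapStatusStub =
    synchronized { committed.getOrElse((shuffleId, mapId), fresh) }
}
```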
And that still wouldn't solve the problem if the shuffle output files somehow got corrupted. You'd get a fetch failure, the executor would get removed, and then it would come right back. The same tasks would get scheduled on it, but it would never replace the corrupted shuffle output. You'd just spin until you hit your 4 failures, and the job would fail.
To get around this, the driver could add some logic to prevent executors
from re-registering, but to me that sounds like a much bigger change that needs
to be thought through very carefully.
In my experience, it is actually much more common for an executor to get removed and then come back than for a node to be permanently unreachable. There is just a long GC or something, the executor gets removed, but then it re-registers with the master. (And there are unit tests in "local" mode which simulate the executor getting removed, but of course it's the same executor that comes right back.)