Github user squito commented on the pull request:

    https://github.com/apache/spark/pull/9214#issuecomment-150704989
  
    Well, having done some more testing and taken a closer look at some of
the corner cases, I don't think it's easy to get this working.  The major
problem comes from executors that are lost but then come back.  Say executor 1
generates some shuffle output, then for whatever reason the driver thinks it's
lost the executor.  The driver unregisters the shuffle output, then queues up
some tasks to regenerate that shuffle output.  Meanwhile the executor comes
back, and thanks to locality preferences it's likely to get some of those same
tasks again.  But the `ShuffleOutputCoordinator` will ignore the output of
those tasks, since those files already exist.  If we don't send some map status
back when the task completes, the driver will still think that some shuffle
output needs to be generated, b/c it's already removed the original shuffle
output.
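    
    Roughly, the commit check I'm describing looks like the sketch below --
this is illustrative Scala only, with made-up names, not the actual coordinator
code in this PR:

    ```scala
    import java.io.File

    // Illustrative stand-in for the coordinator described above: it decides
    // whether to commit purely on whether the destination files already exist.
    object SimplifiedShuffleOutputCoordinator {
      // Returns true if this attempt's temp files were committed, false if the
      // attempt was discarded because committed output is already on disk.
      def commitOutputs(tmpToDest: Seq[(File, File)]): Boolean = {
        if (tmpToDest.forall { case (_, dest) => dest.exists() }) {
          // Committed output already exists: throw away this attempt's files.
          // The problem: no map status is produced here, so if the driver has
          // already unregistered the old output, it never learns that these
          // blocks are still available.
          tmpToDest.foreach { case (tmp, _) => tmp.delete() }
          false
        } else {
          // First attempt to commit: move the temp files into place.
          tmpToDest.foreach { case (tmp, dest) =>
            if (dest.exists()) dest.delete()
            tmp.renameTo(dest)
          }
          true
        }
      }
    }
    ```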
    
    We could send back the map status based on the new (but uncommitted)
shuffle output, but then it could be wrong about the zero-sized blocks if the
data is non-deterministic.  We could instead have the
`ShuffleOutputCoordinator` remember the map status of all output it commits, so
that it always returns the map status of the committed output -- this way the
second attempt would return the map status for the committed outputs of the
first attempt.  This adds complexity and memory overhead, though.
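    
    For what it's worth, a minimal sketch of that second option (hypothetical
class and method names, and a stand-in for Spark's `MapStatus`): the first
attempt to commit a given map output wins, and every later attempt gets back
the map status of whatever was actually committed:

    ```scala
    import java.util.concurrent.ConcurrentHashMap

    // Stand-in for Spark's internal MapStatus; just enough for the sketch.
    final case class MapStatusLike(blockSizes: Array[Long])

    // Hypothetical coordinator that remembers the map status of committed
    // output.
    class RememberingCoordinator {
      // Keyed by (shuffleId, mapId); this is the memory overhead mentioned
      // above.
      private val committed = new ConcurrentHashMap[(Int, Int), MapStatusLike]()

      // First committer wins; later attempts for the same map output get back
      // the status of the output that actually sits on disk, so the driver
      // re-learns about blocks it had unregistered.
      def commitOrGetExisting(shuffleId: Int, mapId: Int,
                              attempt: MapStatusLike): MapStatusLike = {
        val prior = committed.putIfAbsent((shuffleId, mapId), attempt)
        if (prior == null) attempt else prior
      }
    }
    ```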
    
    And that still wouldn't solve the problem if the shuffle output files got
corrupted somehow.  You'd get a fetch failure, the executor would get removed,
and then it would come right back.  The same tasks would get scheduled on it,
but it would never replace the corrupted shuffle output.  You'd just spin until
you hit the 4-failure limit, and the job would fail.
    
    To get around this, we could add some logic to the driver to prevent
executors from re-registering, but to me that sounds like a much bigger change,
one that needs to be thought through very carefully.
    
    In my experience, it is actually much more common for an executor to get
removed and then come back than for a node to be permanently unreachable.
There is just a long GC or something, the executor gets removed, but then it
re-registers w/ the master.  (And there are unit tests in "local" mode which
simulate the executor getting removed, but of course it's the same executor
which comes right back.)

