Github user squito commented on the issue:
https://github.com/apache/spark/pull/20640
thanks @IgorBerman, the description looks fine to me now; maybe I misread it
before.
your test sounds pretty good to me ... you could turn on debug logging for
MesosCoarseGrainedSchedulerBackend and look for these log messages:
https://github.com/apache/spark/blob/master/resource-managers/mesos/src/main/scala/org/apache/spark/scheduler/cluster/mesos/MesosSchedulerUtils.scala#L603
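For example, one way to turn that on (a minimal sketch, assuming the stock log4j 1.x setup that Spark ships with) is to raise the log level for the Mesos scheduler package in the driver's conf/log4j.properties:

```properties
# conf/log4j.properties on the driver -- assumes the default log4j 1.x template
# Enables DEBUG output for MesosCoarseGrainedSchedulerBackend and the
# MesosSchedulerUtils messages linked above
log4j.logger.org.apache.spark.scheduler.cluster.mesos=DEBUG
```

Then you can grep the driver log for the blacklisting messages while the test runs.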
What do you mean by "it didn't have much effect"? It sounds like it did exactly
the right thing.
Sorry, I don't really understand the description of the other bug you
mentioned. Why shouldn't it start a 2nd executor on the same slave for the
same application? That seems fine until you have enough failures for the node
blacklisting to take effect. There is also a small race (that is relatively
benign) that would allow you to get an executor on a node which you are in
the middle of blacklisting.