Github user timout commented on the issue:

    https://github.com/apache/spark/pull/17619
  
    That does exactly what it is supposed to do. And you are absolutely right that it 
is related to executors.
    I am sorry if that was not clear from my previous explanations.
    Let us say we have:
    A Spark Streaming app (a very long-running app):
     The driver, started by Marathon from a Docker image, schedules (in the Mesos 
sense) executors from 
     Docker images (net=HOST); every executor is started from a Docker image on some 
Mesos agent.
    Now suppose some recoverable error happens, for instance: 
    ExecutorLostFailure (executor 40 exited caused by one of the running tasks) 
Reason: Remote RPC client disassociated... (I do not know about others, but 
this happens relatively often in my environment.)
    As a result, the executor dies, and after 2 failures the Mesos agent node 
is added to the MesosCoarseGrainedSchedulerBackend blacklist, so the driver 
will never again schedule (in the Mesos sense) an executor on it. The app will 
therefore starve... and, note, it will not die.
    That is exactly what happened with my streaming apps before this patch.
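
    To make the starvation scenario concrete, here is a minimal sketch of the 
behaviour described above. It is not the actual MesosCoarseGrainedSchedulerBackend 
code: the class AgentBlacklist, its methods, and the agent names are hypothetical, 
and the threshold of 2 is simply taken from the "after 2 failures" in the description.

```python
# Hypothetical model of a per-agent failure blacklist that never expires.
# Names and structure are assumptions for illustration, not Spark's API.

MAX_AGENT_FAILURES = 2  # assumed threshold, from "after 2 failures" above


class AgentBlacklist:
    def __init__(self, max_failures=MAX_AGENT_FAILURES):
        self.max_failures = max_failures
        self.failures = {}  # agent id -> number of executor failures seen

    def record_executor_failure(self, agent_id):
        """Count one lost executor against the agent it ran on."""
        self.failures[agent_id] = self.failures.get(agent_id, 0) + 1

    def is_blacklisted(self, agent_id):
        """An agent is excluded once it reaches the failure threshold."""
        return self.failures.get(agent_id, 0) >= self.max_failures

    def usable_agents(self, offered_agents):
        """Agents on which the driver would still launch executors."""
        return [a for a in offered_agents if not self.is_blacklisted(a)]


if __name__ == "__main__":
    agents = ["agent-1", "agent-2", "agent-3"]
    blacklist = AgentBlacklist()

    # Over a long-running stream, every agent eventually loses two executors
    # to recoverable errors such as "Remote RPC client disassociated".
    for agent in agents:
        blacklist.record_executor_failure(agent)
        blacklist.record_executor_failure(agent)

    # No entry is ever removed, so no agent is left to schedule on:
    # the app keeps running but has no executors.
    print(blacklist.usable_agents(agents))
```

    Because nothing in this model ever clears or expires an entry, a long enough 
run excludes every agent, which matches the "starves but does not die" behaviour.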
    
    This patch may already be incompatible with master, but I can fix it if 
needed.
    


