Sorry if I was not clear, but this is what's triggering it: I am using Spark with Mesos. When the driver process is killed by the user (this is not in my control), the Mesos framework kills the slave tasks, and no signal is delivered and no atexit handler is called in the Python code, so we are not able to perform any cleanup. I am not sure if restarting the framework is an option here.
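For context, this is roughly the kind of cleanup hook we register in the driver; a minimal sketch, where cleanup_containers is a hypothetical helper of our own. Neither hook fires when Mesos tears the task down, presumably because the process is killed with SIGKILL, which cannot be trapped:

    import atexit
    import signal
    import sys

    def cleanup_containers():
        # Hypothetical helper: stop/remove the Docker containers we started.
        pass

    def on_term(signum, frame):
        cleanup_containers()
        sys.exit(0)

    # Runs on normal interpreter exit, but never when the process is SIGKILLed.
    atexit.register(cleanup_containers)

    # Catches SIGTERM/SIGINT; SIGKILL can never be caught by a handler.
    signal.signal(signal.SIGTERM, on_term)
    signal.signal(signal.SIGINT, on_term)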
Regards
Sumit Chawla

On Wed, Jul 19, 2017 at 4:41 PM, Tomek Janiszewski <[email protected]> wrote:

> When a framework dies prematurely it should be restarted and perform
> reconciliation. If shutdown is desired it should explicitly deregister.
>
> For a quick fix you may configure failover_timeout. It is the time to wait
> for a framework to re-register after a failure. After this time tasks will
> be killed.
>
> http://mesos.apache.org/documentation/latest/high-availability-framework-guide/
>
> On Thu, Jul 20, 2017 at 00:55, Chawla,Sumit <[email protected]> wrote:
>
>> We are using Mesos 0.27 to launch PySpark jobs.
>>
>> These jobs start Docker containers to do the processing. Whenever the
>> Mesos framework dies prematurely, the Docker containers are left running
>> on the machine, leading to space and memory issues over time. I am
>> looking for a solution: how can we get notified on framework shutdown so
>> that we can do proper cleanup of resources?
>>
>> Regards
>> Sumit Chawla
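For anyone following along, a minimal sketch of the failover_timeout quick fix Tomek describes, assuming the Mesos 0.27 Python bindings (mesos.interface / mesos.native) and a hypothetical MyScheduler class; this registers the framework directly rather than through Spark:

    from mesos.interface import mesos_pb2
    from mesos.native import MesosSchedulerDriver

    framework = mesos_pb2.FrameworkInfo()
    framework.user = ""  # Empty string lets Mesos fill in the current user.
    framework.name = "pyspark-jobs"
    # Seconds Mesos waits for the framework to re-register after a failure
    # before killing its tasks. The default is 0, i.e. tasks are killed as
    # soon as the framework disconnects.
    framework.failover_timeout = 3600.0

    driver = MesosSchedulerDriver(MyScheduler(), framework,
                                  "zk://master:2181/mesos")
    driver.run()

With a non-zero failover_timeout the slave tasks survive a driver death for that window, which at least leaves time for an external process to reconcile and clean up the orphaned containers.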

