> I need to wait until all tasks are done and during this time no new tasks
> should be started on this slave

This is exactly what maintenance mode is designed for, but it requires the cooperation of the frameworks. When the operator adds a maintenance schedule for a slave, the Mesos master first sends "inverse offers" to all frameworks that have tasks running on that slave, and the frameworks are "expected" to move those tasks to other slaves. However, a framework is free to ignore inverse offers; for example, I can't find any code in Marathon that handles them.

> Also the maintenance mode seems not to be an option:
>
> „When maintenance is triggered by the operator, all agents on the machine
> are told to shutdown

Be aware that maintenance is a two-phase process:

- The first step is "adding the maintenance schedule": the operator tells the master "I will take slaveX down for maintenance in 1 hour; please ask the frameworks to move their tasks to other slaves", as I described above.
- The second step is "starting the maintenance": the operator tells the master "I'm taking this slave down RIGHT NOW". The master then kills all tasks on that slave and asks the mesos-slave process to exit, as described in the paragraph you quoted in your original message.

In short, it mostly depends on the frameworks you use.

On Wed, Dec 30, 2015 at 7:43 PM, Mike Michel wrote:

> Hi,
>
> i need to update slaves from time to time and looking for a way to take
> them out of the cluster but without killing the running tasks. I need to
> wait until all tasks are done and during this time no new tasks should be
> started on this slave. My first idea was to set a constraint
> „status:online“ for every task i start and then change the attribute of the
> slave to „offline“, restart slave process while executer still runs the
> tasks but it seems if you change the attributes of a slave it can not
> connect to the cluster without rm -rf /tmp before which will kill all tasks.
>
> Also the maintenance mode seems not to be an option:
>
> „When maintenance is triggered by the operator, all agents on the machine
> are told to shutdown. These agents are subsequently removed from the master
> which causes tasks to be updated as TASK_LOST. Any agents from machines
> in maintenance are also prevented from registering with the master.“
>
> Is there another way?
>
> Cheers
>
> Mike
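
For reference, the two-phase workflow from the reply maps onto the master's maintenance HTTP endpoints. Below is a minimal Python sketch (stdlib only), assuming a master reachable at the hypothetical address `http://mesos-master.example:5050` and the `/maintenance/schedule` and `/machine/down` endpoints described in the Mesos maintenance documentation. The helper names (`machine_id`, `schedule_payload`, `post_json`, `schedule_maintenance`, `start_maintenance`) are made up for illustration. Phase 1 only announces the window, while phase 2 is the point where remaining tasks are lost.

```python
import json
import urllib.request

# Hypothetical master address; point this at your leading Mesos master.
MASTER = "http://mesos-master.example:5050"

NS_PER_SECOND = 1_000_000_000


def machine_id(hostname, ip):
    """Build a MachineID: the hostname and IP of the machine running the agent."""
    return {"hostname": hostname, "ip": ip}


def schedule_payload(machines, start_s, duration_s):
    """Phase 1 body: a single maintenance window covering `machines`.

    start_s is a Unix timestamp in seconds and duration_s a length in seconds;
    the master expects both as integer nanoseconds.
    """
    return {
        "windows": [
            {
                "machine_ids": list(machines),
                "unavailability": {
                    "start": {"nanoseconds": int(start_s * NS_PER_SECOND)},
                    "duration": {"nanoseconds": int(duration_s * NS_PER_SECOND)},
                },
            }
        ]
    }


def post_json(path, body):
    """POST `body` as JSON to the master and return the HTTP status code."""
    request = urllib.request.Request(
        MASTER + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return response.status


def schedule_maintenance(machines, start_s, duration_s):
    """Phase 1: announce the window; frameworks with tasks there get inverse offers.

    As far as I know, this endpoint replaces the whole existing schedule,
    so merge in any windows that are already scheduled.
    """
    return post_json("/maintenance/schedule",
                     schedule_payload(machines, start_s, duration_s))


def start_maintenance(machines):
    """Phase 2: take the machines down right now; remaining tasks are lost."""
    return post_json("/machine/down", list(machines))
```

A typical sequence would be `schedule_maintenance([machine_id("slaveX", "10.0.0.5")], time.time() + 3600, 2 * 3600)`, then waiting until the frameworks have drained the slave (which, as noted above, depends on them honoring inverse offers), and only then `start_maintenance(...)` with the same machine list.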

