If the currently running tasks do not have checkpointing turned on, they
cannot reconnect to a restarted slave no matter what.

And yes, currently you can't change a slave's resource roles without wiping
its metadata.

@vinodkone

> On Aug 14, 2015, at 6:14 AM, Mike Barborak <[email protected]> wrote:
>
> I’ve made the changes to my frameworks and Marathon to use roles. My question
> is, is there a way to change a slave’s role without restarting it? I ask
> because the slave I want to reconfigure is running frameworks that scheduled
> tasks that take a very long time to complete their work. These frameworks do
> not have checkpointing turned on. (I’ve changed the code so that they will in
> the future.) My understanding and experience tell me that to change the
> slave’s configuration, I have to restart the slave and that when I do that I
> will get a log message saying I have to rm -f /tmp/mesos/meta/slaves/latest.
> After I do that and restart, I believe the running frameworks will not
> reconnect with the slave (does that sound right?) and will time out and shut
> down along with the tasks they scheduled, and that is what I’m trying to avoid.
>
> Thanks,
> Mike
>
> From: Mike B
> Sent: Tuesday, July 14, 2015 5:33 PM
> To: [email protected]
> Subject: RE: resources not offered to framework
>
> I didn’t understand the difference between roles and attributes. That sounds
> like what I am looking for. Thanks for your help.
>
> -Mike
>
> From: Vinod Kone [mailto:[email protected]]
> Sent: Tuesday, July 14, 2015 4:37 PM
> To: [email protected]
> Subject: Re: resources not offered to framework
>
>
> On Tue, Jul 14, 2015 at 4:36 AM, Mike B <[email protected]> wrote:
> I could see the master processing ACCEPT calls for offers and I could see the
> resources associated with the new slave being recovered because none of the
> frameworks they were offered to wanted them. What I never saw was these new
> resources being offered to the framework that could have used them. Ideally,
> I would have liked these new resources to have been offered to that
> framework. (One note, another instance of the same framework was launched
> after seeing this problem and it was offered these new resources.)
>
>
> I imagine this could be possible with the built-in allocator if the framework
> (say F) that needed the "worker" resources had a high DRF share and other
> frameworks had a low DRF share. If the frameworks that do not need "worker"
> resources do not filter them for a long enough time (refuse_seconds is small),
> they might repeatedly become candidates for allocation, starving out F.
>
> Couple of options here.
> --> You can have frameworks that are not interested in "worker" resources
> decline offers (with "worker" resources) with a very long interval (say 1
> year).
>
> --> Instead of attributes, use roles (role: worker, role: workstation etc)
> and have framework F register with role "worker".
>
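
For reference, the first option above (declining unwanted offers with a very
long filter) might be sketched as follows. This is a minimal, self-contained
Python sketch and not Mesos API code: the Offer class, the "type" attribute
key, and choose_refuse_seconds are hypothetical names used only for
illustration. Only the refuse_seconds filter and its 5 second default come
from Mesos. In a real scheduler, the returned value would be set as
refuse_seconds on the filters passed along when declining the offer.

```python
from dataclasses import dataclass, field

# "Very long interval (say 1 year)", expressed in seconds.
ONE_YEAR_SECONDS = 365 * 24 * 60 * 60
# Mesos default for refuse_seconds when a framework sets no filter.
DEFAULT_REFUSE_SECONDS = 5.0


@dataclass
class Offer:
    """Hypothetical stand-in for a Mesos offer; only the fields used here."""
    offer_id: str
    attributes: dict = field(default_factory=dict)


def choose_refuse_seconds(offer, unwanted_type="worker"):
    """Pick the filter duration for an offer this framework is declining.

    Assumes slaves are labelled with a hypothetical "type" attribute.
    Offers from slaves of the unwanted type are filtered for a year so the
    allocator stops re-offering them to this framework; anything else gets
    the short default filter.
    """
    if offer.attributes.get("type") == unwanted_type:
        return float(ONE_YEAR_SECONDS)
    return DEFAULT_REFUSE_SECONDS
```

With the short default, the same "worker" resources come back to the
uninterested framework every few seconds, which is the repeated re-offering
described above. The long filter keeps them available for F.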

