Thanks. In testing my new setup using roles, I’m having a problem with Marathon not being offered any resources. I’ve posted a question on the Marathon forums:
https://groups.google.com/forum/?hl=en#!topic/marathon-framework/bQ2pO5Dk2MA
but am not getting any replies, so I was wondering whether there is guidance for understanding, from a Mesos perspective, why a framework (Marathon here) is not getting resource offers. (Btw, my own custom framework does get offers – just not Marathon.)

Is there a set of command line options that will expose the master’s resource offer process in enough detail to reveal the problem? Or is there a troubleshooting guide that explains why a framework might not be getting resource offers?

Sorry to be so non-specific – I’m a few days into this and only starting to grasp it.

Thanks,
Mike

From: Vinod Kone [mailto:[email protected]]
Sent: Friday, August 14, 2015 11:37 AM
To: [email protected]
Subject: Re: resources not offered to framework

If the currently running tasks do not have checkpointing turned on, they cannot reconnect to a restarted slave no matter what.

And yes, currently you can't change the slave's resource roles without wiping metadata.

@vinodkone

On Aug 14, 2015, at 6:14 AM, Mike Barborak <[email protected]> wrote:

I’ve made the changes to my frameworks and Marathon to use roles. My question is, is there a way to change a slave’s role without restarting it?

I ask because the slave I want to reconfigure is running frameworks that scheduled tasks that take a very long time to complete their work. These frameworks do not have checkpointing turned on. (I’ve changed the code so that they will in the future.)

My understanding and experience tell me that to change the slave’s configuration, I have to restart the slave, and that when I do that I will get a log message saying I have to rm -f /tmp/mesos/meta/slaves/latest. After I do that and restart, I believe the running frameworks will not reconnect with the slave (does that sound right?) and will time out and shut down along with the tasks they scheduled, and that is what I’m trying to avoid.
Thanks,
Mike

From: Mike B
Sent: Tuesday, July 14, 2015 5:33 PM
To: [email protected]
Subject: RE: resources not offered to framework

I didn’t understand the difference between roles and attributes. That sounds like what I am looking for. Thanks for your help.

-Mike

From: Vinod Kone [mailto:[email protected]]
Sent: Tuesday, July 14, 2015 4:37 PM
To: [email protected]
Subject: Re: resources not offered to framework

On Tue, Jul 14, 2015 at 4:36 AM, Mike B <[email protected]> wrote:

> I could see the master processing ACCEPT calls for offers and I could see the resources associated with the new slave being recovered because none of the frameworks they were offered to wanted them. What I never saw was these new resources being offered to the framework that could have used them. Ideally, I would have liked these new resources to have been offered to that framework. (One note, another instance of the same framework was launched after seeing this problem and it was offered these new resources.)

I imagine this could be possible with the built-in allocator if the framework (say F) that needed the "worker" resources had a high DRF share and other frameworks had a low DRF share. If the frameworks that do not need "worker" resources do not filter them for a long enough time (refuse_seconds is small), they might repeatedly become candidates for allocation, starving out F.

Couple of options here.

--> You can have frameworks that are not interested in "worker" resources decline offers (with "worker" resources) with a very long interval (say 1 year).

--> Instead of attributes, use roles (role: worker, role: workstation, etc.) and have framework F register with role "worker".
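
For the first option, here is a minimal sketch of what a long-interval decline might look like. It assumes the framework talks to the master through the call-based scheduler HTTP API (v1) and sends a JSON DECLINE call; the helper name build_decline_call and the IDs in the usage part are hypothetical and not from this thread. Frameworks using the driver-based bindings would set the same refuse_seconds filter when declining an offer.

```python
import json

# "say 1 year", as suggested in the reply above.
SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def build_decline_call(framework_id, offer_ids, refuse_seconds=SECONDS_PER_YEAR):
    """Build the JSON body of a DECLINE call with a long offer filter.

    Assumed shape: the v1 scheduler HTTP API, where the filter is carried
    in decline.filters.refuse_seconds. The function name is hypothetical.
    """
    return {
        "framework_id": {"value": framework_id},
        "type": "DECLINE",
        "decline": {
            "offer_ids": [{"value": offer_id} for offer_id in offer_ids],
            # While the filter is in effect, the allocator should not
            # re-offer these resources to this framework.
            "filters": {"refuse_seconds": float(refuse_seconds)},
        },
    }


if __name__ == "__main__":
    # Hypothetical IDs, for illustration only.
    call = build_decline_call("framework-not-wanting-workers", ["offer-1"])
    print(json.dumps(call, indent=2))
```

A framework that does not want "worker" resources would send a body like this for every offer that contains them, so those resources stay available for F instead of cycling back through the uninterested frameworks.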

