Hi,

I have a small Mesos cluster. Users launch frameworks onto the cluster that 
typically execute tens of tasks that take several hours to run. When all of 
these tasks are done, the framework ends. When offered resources, these 
frameworks accept only those that are appropriate. By appropriate I mean the 
following: we start each slave with an attribute "slavetype" set to "worker", 
"workstation" or "service", and a resource is appropriate for the frameworks 
described above only if it comes from a slave with "slavetype" set to 
"worker".
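
To make the filtering concrete, here is a rough sketch of the check each 
framework applies to its offers. It is a simplification and rests on 
assumptions: offers are modeled as plain dicts rather than Mesos protobuf 
messages, and the names is_appropriate and split_offers are made up for 
illustration, not part of any Mesos API.

```python
# Simplified illustration only: offers are plain dicts, not Mesos protobufs.
# The function names and the dict layout are hypothetical.

def is_appropriate(offer, wanted="worker"):
    """Return True if the offer's slave has attribute slavetype == wanted."""
    return offer.get("attributes", {}).get("slavetype") == wanted


def split_offers(offers, wanted="worker"):
    """Partition offers into those the framework would use and the rest."""
    usable = [o for o in offers if is_appropriate(o, wanted)]
    unwanted = [o for o in offers if not is_appropriate(o, wanted)]
    return usable, unwanted


offers = [
    {"id": "o1", "attributes": {"slavetype": "worker"}},
    {"id": "o2", "attributes": {"slavetype": "workstation"}},
    {"id": "o3", "attributes": {"slavetype": "service"}},
    {"id": "o4", "attributes": {}},  # slave started without the attribute
]

usable, unwanted = split_offers(offers)
print([o["id"] for o in usable])    # offers the framework would launch on
print([o["id"] for o in unwanted])  # offers handed back to the master
```

Only the offers in the first list would be used to launch tasks; everything 
in the second list goes back to the master.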

A problem that I saw was this: a user launched a framework. It created tasks 
until all the cluster's appropriate resources were used. At that point I added 
another slave to the cluster with a "slavetype" of "worker." The slave 
registered correctly with the master and its resources began being offered to 
frameworks. In the logs I could see a repeating pattern: offers being sent to 
existing frameworks, the master processing ACCEPT calls for those offers, and 
the resources associated with the new slave being recovered because none of 
the frameworks they were offered to wanted them. What I never saw was these 
new resources being offered to the framework that could have used them. 
Ideally, I would have liked these new resources to be offered to that 
framework. (One note: another instance of the same framework was launched 
after I saw this problem, and it was offered these new resources.)

Does DRF explain the behavior that I am seeing?

If so, can I get the desired behavior by using roles and weights for the 
framework?

If not, is the solution to write my own resource allocator for the master? Is 
there an allocator out there that might suit my needs?

Thanks,
Mike
