On 24/06/2015 16:31, Alex Gaudio wrote:
Does anyone have other ideas?
HTCondor deals with this with a "defrag" daemon, which periodically stops selected hosts from accepting small jobs so that their small slots can be coalesced into larger ones.

http://research.cs.wisc.edu/htcondor/manual/latest/3_5Policy_Configuration.html#sec:SMP-defrag

You can configure policies based on how many drained machines are already available, and how many can be draining at once.
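To make that policy concrete, here is a minimal sketch of one defrag cycle. It is not HTCondor's implementation. The names (Machine, machines_to_drain, max_whole_machines, max_concurrent_draining) are hypothetical, loosely modelled on HTCondor's DEFRAG_MAX_WHOLE_MACHINES and DEFRAG_MAX_CONCURRENT_DRAINING settings. The sketch also counts machines that are still draining toward the whole-machine target to avoid overshooting; that is a choice of the sketch, not necessarily HTCondor's behaviour.

```python
from dataclasses import dataclass


@dataclass
class Machine:
    name: str
    draining: bool = False        # stopped accepting new jobs, still running old ones
    fully_drained: bool = False   # no jobs left: the whole machine is free


def machines_to_drain(machines, max_whole_machines, max_concurrent_draining):
    """Return names of machines to start draining in this cycle (hypothetical policy)."""
    drained = sum(m.fully_drained for m in machines)
    draining = sum(m.draining for m in machines)
    # Cap 1: don't exceed the target number of whole machines
    # (machines already draining count toward it in this sketch).
    # Cap 2: don't have more than max_concurrent_draining draining at once.
    budget = min(max_whole_machines - drained - draining,
                 max_concurrent_draining - draining)
    if budget <= 0:
        return []
    candidates = [m for m in machines if not m.draining and not m.fully_drained]
    return [m.name for m in candidates[:budget]]
```

A real daemon would also rank candidates (e.g. by how cheaply they drain) rather than take them in list order, and would run this on a timer.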

Maybe it would help if Mesos could work out the largest job that any framework has waiting to run, so that it knows whether draining is required and how far to drain. This might take the form of a message to each framework: "Suppose I offered you all the resources on the cluster: what is the largest single job you would want to run, and which machine(s) could it run on?" Or something like that.
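As a rough illustration of what the allocator could do with the answers, here is a minimal Python sketch. Nothing here is a Mesos API: Resources, shortfall, drain_plan and best_host are hypothetical names, and "largest" is assumed to mean ordering by (cpus, mem_mb). Given the frameworks' pending jobs, each host's total and currently free resources, it finds the largest pending job, the hosts that could run it if fully drained, and how much would have to be freed on each.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Resources:
    cpus: float
    mem_mb: int

    def fits_in(self, other: "Resources") -> bool:
        return self.cpus <= other.cpus and self.mem_mb <= other.mem_mb


def shortfall(job: Resources, free: Resources) -> Resources:
    """How much must be freed on a host before `job` fits (zero if it already does)."""
    return Resources(max(0.0, job.cpus - free.cpus), max(0, job.mem_mb - free.mem_mb))


def drain_plan(pending, hosts_total, hosts_free):
    """Return (largest pending job, {host: resources to free}) for hosts big enough when empty."""
    job = max(pending, key=lambda r: (r.cpus, r.mem_mb), default=None)
    if job is None:
        return None, {}
    plan = {host: shortfall(job, hosts_free[host])
            for host, total in hosts_total.items() if job.fits_in(total)}
    return job, plan


def best_host(plan):
    """Host needing the least draining (None if no host could ever fit the job)."""
    return min(plan, key=lambda h: (plan[h].cpus, plan[h].mem_mb), default=None)
```

If the best host's shortfall is zero, no draining is needed; otherwise the shortfall says how far that host has to drain.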

Regards,

Brian.
