Can anyone explain how the Mesos allocation algorithm works? Does Mesos offer every free resource it has to one framework at a time, or does the allocator divide the maximum offer size by the number of active/registered frameworks?

In particular, consider this case: FW1 has a high dominant share (>50%), which it does not release. FW2 and FW3 have a lot of task churn; both have outstanding short-lived tasks in their queues (shorter than the Mesos allocation interval), and both accept all the resources Mesos offers them, if they get the offer.

Reading the DRF paper and presentation, am I right to assume that the online DRF algorithm would always favour FW2 and FW3 over FW1, since one of the two (FW2/FW3) will always, or at least most likely, have a lower dominant share than FW1? According to the DRF presentation, the framework with the lowest dominant share gets the offer. But this can lead to starvation, e.g. if a framework has already been allocated memory but needs a new offer with CPUs to actually do something. You might wonder why the framework didn't use memory AND CPU from the same offer, but Spark, for example, behaves exactly this way: it holds on to memory and needs further CPU offers later.

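To make the question concrete, here is a minimal Python sketch of how I read the DRF presentation. It is not the actual Mesos allocator: the cluster totals, the allocations, the 5-CPU task size and the names `dominant_share`, `next_offer_target` and `simulate` are all made up for illustration, and I assume each short task holds its CPUs for roughly one allocation round.

```python
from dataclasses import dataclass

# Hypothetical cluster totals, chosen only for illustration.
TOTAL = {"cpus": 100.0, "mem": 1000.0}


@dataclass
class Framework:
    name: str
    allocated: dict  # resource name -> amount currently held


def dominant_share(fw, total=TOTAL):
    """Largest fraction of any single resource the framework holds."""
    return max(fw.allocated[r] / total[r] for r in total)


def next_offer_target(frameworks, total=TOTAL):
    """My reading of DRF: the lowest dominant share gets the next offer."""
    return min(frameworks, key=lambda fw: dominant_share(fw, total))


def simulate(frameworks, rounds, task_cpus=5.0):
    """Assumption: a task keeps its CPUs for one round, then releases them."""
    order = []
    previous = None
    for _ in range(rounds):
        target = next_offer_target(frameworks)
        target.allocated["cpus"] += task_cpus
        order.append(target.name)
        if previous is not None:
            previous.allocated["cpus"] -= task_cpus  # earlier task finished
        previous = target
    return order


# FW1 sits on 60% of the memory; FW2 and FW3 only hold a few CPUs.
fw1 = Framework("FW1", {"cpus": 2.0, "mem": 600.0})
fw2 = Framework("FW2", {"cpus": 10.0, "mem": 50.0})
fw3 = Framework("FW3", {"cpus": 12.0, "mem": 50.0})

order = simulate([fw1, fw2, fw3], rounds=4)
print(order)
```

In this toy setup the offers alternate between FW2 and FW3, and FW1 never gets one, because its memory-dominated share stays at 0.6 regardless of how little CPU it holds. That is the starvation I am worried about.
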
To give some context, I think I'm seeing this behaviour with Spark in fine-grained mode. I have 4 long-lived Spark instances, emulating interactive queries. The first Spark instance to get an offer "installs" executors (with high memory demand) on every slave node it sees. The next framework tries to do the same, but for these later instances there's not always enough executor memory left. So I end up with one instance, the first to get an offer, holding a lot of memory that it doesn't let go, and that instance then gets far fewer CPU offers afterwards. In contrast, the later Spark instances, with fewer long-living executors, don't have a high memory usage and get relatively more CPU offers.

Of course, setting a maximum number of Spark executors per framework instance would mitigate this, but then I'm basically back to static allocation of resources.

Thanks in advance,
Hans

