On Mon, Aug 24, 2015 at 5:42 PM, Hans van den Bogert <[email protected]> wrote:
> Can anyone tell how the Mesos allocation algorithm works:
> Does Mesos offer every free resource it has to one framework at a time? Or
> does the allocator divide the max offer size by the number of
> active/registered frameworks?
>
> In case of: FW1 has a high dominant resource fraction (>50%), which it does
> not release. FW2 and FW3 have a lot of churn for their tasks; both have
> outstanding short-lived tasks in their queue (shorter than the Mesos
> allocation interval), and these two FWs accept all resources Mesos has to
> offer, if they get the offer.
>
> Reading the DRF paper and presentation, am I to assume the online DRF
> algorithm would always favour FW2 and FW3 before FW1? One of the two
> (FW2/3) will always (or at least is more likely to) have a lower dominant
> share than FW1. According to the presentation on DRF, the framework with
> the lowest dominant share gets the offer. But this is potential
> starvation: e.g., if a framework has allocated memory, but needs a new
> offer with CPUs to actually do something. You might wonder why the
> framework didn't use memory AND CPU from the same offer, but Spark for
> example does exactly this.
>
> I'd love to learn more from Mesos devs about the allocation algorithm.

In my limited understanding, you are correct.

> To give some context, I think I'm seeing this behaviour with Spark in
> fine-grained mode. I have 4 Spark instances which are long-lived, emulating
> interactive queries. The first Spark instance to get an offer "installs"
> executors (with high memory demand) on every slave node it sees. The next
> framework tries to do the same, but for these later instances there's not
> always enough executor memory. That's why I end up with one instance, the
> first to get the offer, holding a lot of memory it doesn't let go of, while
> it also gets far fewer offers for CPU afterwards. In contrast, the later
> Spark instances, with fewer long-living executors, do not have high memory
> usage and get relatively more CPU offers.
>
> Of course, setting a maximum number of Spark executors per framework
> instance would mitigate this, but then I'm basically back to static
> allocation of resources.

I've seen similar behavior with Spark's fine-grained mode; see my thread
from a couple of days ago. I would recommend using coarse-grained mode with
dynamic allocation (available in the upcoming 1.5 release). We worked
around this by using Mesos roles, and assigning Spark to a specific role.
It seems Mesos will allocate resources per role, if configured.
Unfortunately, `spark.mesos.role` is also a new configuration parameter
added in 1.5, so we needed to use the Spark 1.5 preview.

iulian

> Thanks in advance,
>
> Hans

--
Iulian Dragos
------
Reactive Apps on the JVM
www.typesafe.com
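For reference, the workaround described above roughly corresponds to a
spark-defaults.conf along these lines (check the Spark 1.5 running-on-Mesos
and dynamic-allocation docs for the exact keys; the role name "spark" is an
example, and dynamic allocation additionally requires the external shuffle
service on each slave):

```properties
# spark-defaults.conf -- sketch of the coarse-grained + roles workaround
spark.mesos.coarse               true
spark.dynamicAllocation.enabled  true
spark.shuffle.service.enabled    true   # required by dynamic allocation
spark.mesos.role                 spark  # role must be known to the master
```

On the Mesos side, the role has to be registered with the master (e.g. via
its `--roles` flag, and optionally weighted) before frameworks can register
under it.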
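To make the starvation concern above concrete, here is a minimal sketch of
online DRF selection. The resource totals, framework names, and allocations
are made up for illustration; this is not the actual Mesos allocator code,
which also handles roles, weights, and filters.

```python
# Sketch of online DRF: the next offer goes to the framework with the
# lowest dominant share. All numbers here are hypothetical.

TOTAL = {"cpus": 10.0, "mem": 100.0}  # assumed cluster totals

def dominant_share(allocated):
    # A framework's dominant share is its largest fraction of any
    # single resource type.
    return max(allocated[r] / TOTAL[r] for r in TOTAL)

def next_offer(frameworks):
    # Online DRF picks the framework with the lowest dominant share.
    return min(frameworks, key=lambda fw: dominant_share(frameworks[fw]))

frameworks = {
    "FW1": {"cpus": 1.0, "mem": 60.0},  # holds >50% of memory, won't release
    "FW2": {"cpus": 2.0, "mem": 5.0},
    "FW3": {"cpus": 1.0, "mem": 5.0},
}

print(next_offer(frameworks))  # FW3: dominant shares are 0.6, 0.2, 0.1
```

In this toy setup FW1's dominant share stays at 0.6 as long as it holds the
memory, so FW2 and FW3 keep winning offers, matching the behaviour described
in the question: holding one resource pushes you to the back of the queue
for all resources, including the CPUs you still need.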

