Hi,

While investigating fairness options in Mesos for Spark workloads, I am trying to achieve a fixed weight ratio between two frameworks, for example 4:1. Imagine a system with two Spark frameworks (running in fine-grained mode, if you're familiar with Spark), where I want one of the two frameworks to get four times as many resources as the other when both are contending for resources.
In Mesos I set up two roles, "F1" and "F2", with weights of 4 and 1 respectively. However, when both frameworks need resources, the latter gets close to zero offers. Having read up on DRF and investigated it more carefully, I understood that memory is the dominant resource for framework 2 (F2). Spark allocates that memory statically, i.e., it doesn't release it once acquired, and in my case it amounts to ~25% per slave. So the allocator thinks that F2 has received enough resources, since its dominant share is already above its weighted fair share. Thus all CPU offers go to framework 1 (F1).

To remedy this first hurdle I recalculated the ratio, although in a somewhat contrived way, to 3.2 (= 80% / 25%). With the 3.2:1 ratio things are a bit better, but when both frameworks have high resource demand, F2 still only gets half of the resources it should get.

At this point I was quite lost and tried changing several parameters. One of them was the allocation interval (master option --allocation_interval), which I set to a relatively low 50ms instead of the default 1000ms. Suddenly my ratio was honored perfectly and I was getting roughly a 4:1 CPU ratio between the two Spark frameworks. This confirmed that the 3.2:1 weight, chosen to work around Spark's static memory allocation, was doing its job.

Perhaps this is because I'm using only 10 physical nodes. However, I wrote unit tests in the Mesos source to mimic my case, and there I could verify that offers are made fairly according to the weights.

Why is fairness, expressed as staying as close as possible to the defined role weights, only honored when the allocation interval is relatively low? I hope someone can explain the phenomenon.

Thanks,
Hans
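
P.S. To make the reasoning behind the 3.2:1 weight concrete, here is a small Python sketch. It is a simplified model of weighted DRF, not the actual Mesos allocator code: it assumes that roles are ordered by dominant share divided by weight and that the lowest value is served first. The cluster totals, F1's memory usage, and the helper names are made up for illustration.

```python
# Simplified model of weighted DRF ordering (an assumption, not Mesos source code).

TOTAL = {"cpus": 100.0, "mem": 100.0}  # hypothetical cluster totals, in percent


def dominant_share(allocation, total=TOTAL):
    """Largest fraction of any single resource held by a role."""
    return max(allocation[r] / total[r] for r in total)


def weighted_share(allocation, weight, total=TOTAL):
    """Dominant share scaled down by the role weight; lowest is served first."""
    return dominant_share(allocation, total) / weight


def first_served(f1_cpus, f1_weight):
    """Return the role that would be offered resources next in this model."""
    roles = {
        # F1 is CPU-dominant; its memory figure is made up for illustration.
        "F1": ({"cpus": f1_cpus, "mem": 10.0}, f1_weight),
        # F2 statically holds ~25% of memory and currently has no CPUs.
        "F2": ({"cpus": 0.0, "mem": 25.0}, 1.0),
    }
    return min(roles, key=lambda name: weighted_share(*roles[name]))


if __name__ == "__main__":
    for weight in (4.0, 3.2):
        for cpus in (79.0, 81.0, 99.0):
            print(f"weight={weight} F1 cpus={cpus}% -> next: {first_served(cpus, weight)}")
```

In this model, with a weight of 4 F1 stays ahead of F2 even when it holds 99% of the CPUs, because F2's 25% memory share already counts against it. With a weight of 3.2, F2 is served first as soon as F1 holds more than 80% of the CPUs, which corresponds to the 4:1 CPU split I was aiming for.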

