Hi, 

While investigating fairness options in Mesos for Spark workloads, I’m 
trying to achieve, for example, a 4:1 weight ratio between two frameworks.
Imagine a system with two Spark frameworks (in fine-grained mode, if you’re 
familiar with Spark) where I want one of the two frameworks to get four times 
more resources than the other when both are contending for resources.

In Mesos I set up two roles, “F1” and “F2”, with weights of 4 and 1 respectively.

However, at times when both frameworks need resources, the latter gets close 
to zero offers. Having read up on DRF and investigated more carefully, I 
understood that memory is the dominant resource for framework 2 (F2). Spark 
sets memory statically, i.e., it doesn’t release it once acquired, and in my 
case that is ~25% per slave. So the allocator thinks that F2 has received 
enough resources, since its dominant resource is already above its weighted 
fair share. Thus all CPU offers go to framework 1 (F1).
To remedy this first hurdle I recalculated the ratio, in a somewhat contrived 
way, to 3.2 ( = 80% / 25% ). 
With the 3.2:1 ratio things are a bit better, but during high resource demand 
from both frameworks, framework 2 (F2) still gets only half of the resources 
it should get. 
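
To make the reasoning above concrete, here is a minimal Python sketch of how I 
understand weighted DRF to order roles: each role's dominant share is divided 
by its weight, and the role with the lowest result is offered resources next. 
This is an assumed simplification, not the Mesos allocator code. The function 
names and the cluster numbers (100 units per resource, F1 holding 90% of the 
CPUs) are made up for illustration.

```python
# Sketch of weighted DRF role ordering; NOT the actual Mesos allocator code.

def dominant_share(allocated, total):
    """Largest fraction of any single resource held by a role."""
    return max(allocated[name] / total[name] for name in total)

def weighted_share(allocated, total, weight):
    """Dominant share divided by the role weight (assumed model)."""
    return dominant_share(allocated, total) / weight

def next_role(allocations, total, weights):
    """Role that would be offered resources next: lowest weighted share."""
    return min(
        allocations,
        key=lambda role: weighted_share(allocations[role], total, weights[role]),
    )

# Hypothetical cluster, normalised to 100 units per resource.
TOTAL = {"cpus": 100.0, "mem": 100.0}
# F2 statically holds 25% of memory and currently no CPUs.
ALLOCATIONS = {
    "F1": {"cpus": 90.0, "mem": 10.0},
    "F2": {"cpus": 0.0, "mem": 25.0},
}

# Weights 4:1 -> F1: 0.9 / 4 = 0.225, F2: 0.25 / 1 = 0.25, so F1 is still next.
print(next_role(ALLOCATIONS, TOTAL, {"F1": 4.0, "F2": 1.0}))
# Weights 3.2:1 -> F1: 0.9 / 3.2 = 0.28125, F2: 0.25, so F2 is next.
print(next_role(ALLOCATIONS, TOTAL, {"F1": 3.2, "F2": 1.0}))
```

Under this model, with weights 4:1 F1 stays ahead of F2 until it holds 
essentially everything, while with 3.2:1 it stops being preferred once its 
dominant share passes 80% ( = 3.2 x 25% ).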

At this point I was quite lost and tried changing several parameters. One of 
them was the allocation interval (master option --allocation_interval), which 
I set to a relatively low 50ms instead of the default 1000ms.
Suddenly my ratio was being honored perfectly and I was getting roughly a 4:1 
CPU ratio between the two Spark frameworks. (This confirmed that my 3.2:1 
ratio, chosen to work around Spark’s static memory allocation, was working.)

Perhaps it’s because I’m using only 10 physical nodes. However, I wrote unit 
tests in the Mesos source to mimic my case, and there I could verify that 
the offers are made fairly according to the weights.

Why is fairness, expressed as being as close as possible to the defined 
role weights, only honored when the allocation interval is relatively low? I 
hope someone can explain the phenomenon. 


Thanks,

Hans
