Hi Mohit,

The scenario makes sense, and unfortunately this is really a bug in how we do allocations.
The default allocator in Mesos implements a weighted fair-sharing algorithm called dominant resource fairness. This does well when tasks are "short-lived" (or at least, don't live forever) and there is an adequate amount of resources to "go around". In your case, it sounds like there is a lot more computation (build jobs) than available resources.

I've filed https://issues.apache.org/jira/browse/MESOS-1086 and put up a patch at https://reviews.apache.org/r/19090.

Thanks for the detailed report, Mohit!

Ben.

On Mon, Mar 10, 2014 at 7:58 PM, Mohit Soni <mohitsoni1...@gmail.com> wrote:

> I was running a load test on a mesos-cluster, and observed that when mesos
> is running lots of frameworks, offer starvation occurs for certain
> frameworks, i.e. only a subset of frameworks registered with mesos gets
> offers. Let me describe the scenario below:
>
> First phase:
> At the beginning, there's only one framework registered with mesos, which
> is 'Marathon'. The load generator uses Marathon's API to launch let's say
> 50 Jenkins masters, with mesos-plugin installed. Once all 50 masters are
> launched, the mesos-cluster now has 51 frameworks registered in total,
> because the mesos-plugin registers itself with mesos-master as a framework.
>
> Second phase:
> Now, the load generator goes and triggers a couple of build jobs on each
> Jenkins master. Each framework's scheduler will now have let's say 2 items
> in its build queue. Once a framework gets a resource offer from the master,
> its scheduler can perform the build tasks, if the offer matches the
> resource constraints as specified by mesos-plugin.
>
> What I observed was, at the start of the second phase, some frameworks
> (Jenkins masters) got offers and got their tasks scheduled to run. But the
> rest of the frameworks didn't get resource offers from mesos-master, and
> the build jobs scheduled on those got starved. Tailing Jenkins logs on
> these masters never showed: 'Received offers'.
> Also, according to the mesos master logs, mesos was sending offers to only
> a handful of frameworks. The logs below show the messages from one minute,
> but I saw similar behavior at other times. I have added a line break after
> each group of frameworks getting offers:
>
> I0310 17:56:44.703126 1156 master.cpp:2250] Sending 24 offers to framework 201403032301-1255541002-5050-1126-0364
> I0310 17:56:45.722951 1156 master.cpp:2250] Sending 24 offers to framework 201403032301-1255541002-5050-1126-0371
> I0310 17:56:46.744184 1159 master.cpp:2250] Sending 24 offers to framework 201403032301-1255541002-5050-1126-0377
> I0310 17:56:47.768546 1158 master.cpp:2250] Sending 24 offers to framework 201403032301-1255541002-5050-1126-0380
> I0310 17:56:48.794517 1156 master.cpp:2250] Sending 24 offers to framework 201403032301-1255541002-5050-1126-0396
>
> I0310 17:56:49.813484 1157 master.cpp:2250] Sending 24 offers to framework 201403032301-1255541002-5050-1126-0364
> I0310 17:56:50.833155 1159 master.cpp:2250] Sending 24 offers to framework 201403032301-1255541002-5050-1126-0371
> I0310 17:56:51.859712 1158 master.cpp:2250] Sending 24 offers to framework 201403032301-1255541002-5050-1126-0377
> I0310 17:56:52.879678 1153 master.cpp:2250] Sending 24 offers to framework 201403032301-1255541002-5050-1126-0380
> I0310 17:56:53.904261 1156 master.cpp:2250] Sending 24 offers to framework 201403032301-1255541002-5050-1126-0396
>
> I0310 17:56:54.929472 1155 master.cpp:2250] Sending 24 offers to framework 201403032301-1255541002-5050-1126-0364
> I0310 17:56:55.947387 1153 master.cpp:2250] Sending 24 offers to framework 201403032301-1255541002-5050-1126-0371
> I0310 17:56:56.975060 1157 master.cpp:2250] Sending 24 offers to framework 201403032301-1255541002-5050-1126-0377
> I0310 17:56:57.996995 1159 master.cpp:2250] Sending 24 offers to framework 201403032301-1255541002-5050-1126-0380
> I0310 17:56:59.022555 1156 master.cpp:2250] Sending 24 offers to framework 201403032301-1255541002-5050-1126-0396
>
> A couple of questions:
> 1. Does running multiple frameworks (say more than 10) have an impact on
> the resource allocation strategy?
> 2. If a registered framework keeps declining mesos offers for a while,
> does mesos take that into account while sending offers?
>
> Links:
> 1. https://github.com/mesosphere/marathon
> 2. https://github.com/jenkinsci/mesos-plugin
>
> --
> Mohit
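
To make the weighted dominant resource fairness ordering mentioned in the reply concrete, here is a minimal sketch in Python. It is not the Mesos allocator code and does not reproduce the bug or the patch; it only shows the general idea that the framework with the lowest weighted dominant share is considered first. The function names, framework names, and resource numbers are hypothetical, and breaking ties by framework id is an assumption made for determinism.

```python
from typing import Dict, Optional

Resources = Dict[str, float]


def dominant_share(allocated: Resources, total: Resources) -> float:
    """Largest fraction of any single cluster resource held by a framework."""
    return max(
        (allocated.get(name, 0.0) / capacity
         for name, capacity in total.items() if capacity > 0),
        default=0.0,
    )


def next_framework(
    allocations: Dict[str, Resources],
    total: Resources,
    weights: Optional[Dict[str, float]] = None,
) -> str:
    """Pick the framework with the lowest weighted dominant share.

    Ties are broken by framework id (an assumption of this sketch).
    """
    weights = weights or {}
    return min(
        allocations,
        key=lambda fid: (
            dominant_share(allocations[fid], total) / weights.get(fid, 1.0),
            fid,
        ),
    )


# Hypothetical cluster and allocations, for illustration only.
total = {"cpus": 24.0, "mem": 48.0}
allocations = {
    "framework-a": {"cpus": 6.0, "mem": 4.0},   # dominant resource: cpus
    "framework-b": {"cpus": 2.0, "mem": 16.0},  # dominant resource: mem
    "framework-c": {},                          # holds nothing yet
}

print(next_framework(allocations, total))
```

In this sketch a framework that holds no resources always sorts first, so the ordering alone says nothing about whether that framework will accept what it is offered.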