Have you inspected the frameworks page/tab in the Mesos master web UI? Perhaps, as you already suspect, DRF is only handing out resources to frameworks that have a lower dominant share. So you could check whether your Spark instance has a high dominant share because its executors take up a lot of memory.
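To make the DRF reasoning concrete, here is a minimal sketch of how the ranking works. It is not the Mesos allocator code; the cluster totals, framework names and allocations are all made up for illustration. Each framework's dominant share is the largest fraction it holds of any single resource, and the framework with the lowest dominant share is first in line for the next offer.

```python
def dominant_share(allocated, total):
    """Return (share, resource_name) for the resource of which this
    framework holds the largest fraction of the cluster total."""
    shares = {
        name: allocated.get(name, 0.0) / total[name]
        for name in total
        if total[name] > 0
    }
    name = max(shares, key=shares.get)
    return shares[name], name


def next_offer_order(frameworks, total):
    """Order framework names by ascending dominant share, i.e. the
    order in which a DRF-style allocator would prefer to offer."""
    return sorted(frameworks, key=lambda fw: dominant_share(frameworks[fw], total)[0])


# Hypothetical cluster totals and per-framework allocations (mem in GB).
total = {"cpus": 32.0, "mem": 128.0}
frameworks = {
    "spark-a": {"cpus": 2.0, "mem": 96.0},  # memory-heavy executors
    "spark-b": {"cpus": 6.0, "mem": 8.0},
    "chronos": {"cpus": 1.0, "mem": 2.0},
}

for fw in next_offer_order(frameworks, total):
    share, resource = dominant_share(frameworks[fw], total)
    print(f"{fw}: dominant share {share:.3f} ({resource})")
```

With these numbers the memory-heavy framework ends up last in the ordering even though it uses very little CPU, which is the kind of situation where a Spark instance could wait a long time for offers.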
I'm having similar problems in an admittedly contrived environment with 4 long-running Spark instances. The first instance (first only by a small margin in time) gets offered all the Mesos slaves and runs the executor. The later instances have a lower chance of getting the same amount of memory, but because their dominant share (memory) is lower, they get offered CPU resources more often than that first instance. Counterintuitively, the first instance finishes last.

> On 19 Aug 2015, at 14:07, Iulian Dragoș <[email protected]> wrote:
>
> I am facing a problem with a framework not getting any resource offers for
> 15-20 minutes, while other frameworks (8-9 of them) continuously get offers.
>
> The framework is Spark (running in fine-grained mode), and is launched with
> Chronos. After a few tasks successfully executed, it stops getting offers,
> though looking at the master logs we see other frameworks getting offers
> every few seconds. For some reason, the Spark one isn't getting them for a
> very long period of time.
>
> Can such behaviour be explained by the DRF algorithm? How could I debug this?
>
> thanks,
> iulian
>
> --
> Iulian Dragos
>
> ------
> Reactive Apps on the JVM
> www.typesafe.com <http://www.typesafe.com/>

