Have you inspected the Frameworks page in the Mesos master web UI? Perhaps, 
as you already suspect, DRF is only offering resources to the frameworks with 
the lowest dominant share. You could check whether your Spark instance has a 
high dominant share because its executors are holding a lot of memory.
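To put numbers on the Frameworks page, here is a minimal sketch that computes each framework's dominant share from the master's state JSON. It assumes the endpoint is `/master/state.json` (the path used by Mesos versions of this era; newer releases serve `/master/state`), and that `slaves[].resources` and `frameworks[].used_resources` carry `cpus`, `mem` and `disk`. Check these field names against your Mesos version. The function names are hypothetical, and the sketch ignores roles and weights, which the Mesos allocator also applies.

```python
import json
import urllib.request

# Scalar resources compared here; others (e.g. ports) are ignored.
RESOURCES = ("cpus", "mem", "disk")


def fetch_state(master_url):
    """Download the master's state JSON (endpoint path is an assumption)."""
    url = master_url.rstrip("/") + "/master/state.json"
    with urllib.request.urlopen(url) as resp:
        return json.load(resp)


def dominant_shares(state):
    """Map each framework name to (dominant share, dominant resource)."""
    # Cluster capacity: sum of the resources advertised by every slave.
    totals = {r: 0.0 for r in RESOURCES}
    for slave in state.get("slaves", []):
        for r in RESOURCES:
            totals[r] += slave.get("resources", {}).get(r, 0.0)

    shares = {}
    for fw in state.get("frameworks", []):
        used = fw.get("used_resources", {})
        # Fraction of the cluster's capacity held, per resource type.
        fractions = {r: used.get(r, 0.0) / totals[r]
                     for r in RESOURCES if totals[r] > 0}
        if not fractions:
            # No capacity known (e.g. no slaves registered).
            shares[fw["name"]] = (0.0, None)
            continue
        dominant = max(fractions, key=fractions.get)
        shares[fw["name"]] = (fractions[dominant], dominant)
    return shares
```

For example, `dominant_shares(fetch_state("http://<master-host>:5050"))`, where 5050 is the default master port. A Spark framework whose dominant resource is `mem` with a share well above the other frameworks would be consistent with the starvation you describe.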

I'm having similar problems in an admittedly contrived environment with 4 
long-running Spark instances. The first instance (first only by a small 
margin of time) gets offered all the Mesos slaves and runs its executors on 
them. The later instances are less likely to get the same amount of memory, 
but because their dominant share (memory) is lower, they are offered CPU 
resources more often than that first instance. Counterintuitively, the first 
instance finishes last.
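The ordering behind this can be sketched in a few lines. This is a minimal illustration of plain DRF with made-up numbers (the framework names, cluster size and allocations are hypothetical, not measurements): the framework with the smallest dominant share is preferred for the next offer, so the instance holding the most memory comes last. The real Mesos allocator also sorts by role and applies weights, which this ignores.

```python
def dominant_share(used, totals):
    """Largest fraction of any single resource that a framework holds."""
    return max(used.get(r, 0.0) / totals[r] for r in totals)


def offer_order(allocations, totals):
    """Frameworks in the order plain DRF would prefer them for the next
    offer: smallest dominant share first, ties broken by name."""
    return sorted(allocations,
                  key=lambda fw: (dominant_share(allocations[fw], totals), fw))


# Hypothetical snapshot of four Spark instances: spark-1 started first and
# holds most of the cluster's memory through its fine-grained executors.
TOTALS = {"cpus": 16.0, "mem": 65536.0}
ALLOCATIONS = {
    "spark-1": {"cpus": 4.0, "mem": 40960.0},  # dominant: mem, 62.5%
    "spark-2": {"cpus": 4.0, "mem": 8192.0},   # dominant: cpus, 25%
    "spark-3": {"cpus": 2.0, "mem": 8192.0},   # 12.5% of both
    "spark-4": {"cpus": 1.0, "mem": 4096.0},   # 6.25% of both
}

print(offer_order(ALLOCATIONS, TOTALS))
# ['spark-4', 'spark-3', 'spark-2', 'spark-1']
```

Ties are broken by name only to keep the output deterministic; the actual allocator's tie-breaking may differ.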

> On 19 Aug 2015, at 14:07, Iulian Dragoș wrote:
> 
> I am facing a problem with a framework not getting any resource offers for 
> 15-20 minutes, while other frameworks (8-9 of them) continuously get offers.
> 
> The framework is Spark (running in fine-grained mode), and is launched with 
> Chronos. After a few tasks successfully executed, it stops getting offers, 
> though looking at the master logs we see other frameworks getting offers 
> every few seconds. For some reason, the Spark one isn't getting them for a 
> very long period of time.
> 
> Can such behaviour be explained by the DRF algorithm? How could I debug this?
> 
> thanks,
> iulian
> 
> --
> Iulian Dragos
> 
> ------
> Reactive Apps on the JVM
> www.typesafe.com <http://www.typesafe.com/>
