Hey Claudiu,

Mind posting some master logs with the simple setup that you described (3 shark-cli instances)? That would help us better diagnose the problem.
On Fri, May 30, 2014 at 1:59 AM, Claudiu Barbura <[email protected]> wrote:

> This is a critical issue for us, as we have to shut down frameworks for
> various components in our platform to work, and this has created more
> contention than before we deployed Mesos, when everyone had to wait in line
> for their MR/Hive jobs to run.
>
> Any guidance or ideas would be extremely helpful at this point.
>
> Thank you,
> Claudiu
>
> From: Claudiu Barbura <[email protected]>
> Reply-To: "[email protected]" <[email protected]>
> Date: Tuesday, May 27, 2014 at 11:57 PM
> To: "[email protected]" <[email protected]>
> Subject: Framework Starvation
>
> Hi,
>
> Following Ben’s suggestion at the Seattle Spark Meetup in April, I built
> and deployed the 0.18.1-rc1 branch, hoping that this would solve the
> framework starvation problem we have been seeing for the past 2 months now.
> The hope was that https://issues.apache.org/jira/browse/MESOS-1086 would
> also help us. Unfortunately it did not.
> This bug is preventing us from running multiple Spark and Shark servers
> (http, thrift), in load-balanced fashion, alongside Hadoop and Aurora in
> the same Mesos cluster.
>
> For example, if we start at least 3 frameworks, one Hadoop, one
> SparkJobServer (one Spark context in fine-grained mode) and one Http
> SharkServer (one JavaSharkContext that inherits from SparkContext, again
> in fine-grained mode), and we run queries on all three of them, very soon
> we notice the following behavior:
>
> - only the last two frameworks that we run queries against receive
>   resource offers (master.cpp log entries in log/mesos-master.INFO)
> - the other frameworks are ignored and not allocated any resources
>   until we kill one of the two privileged ones above
> - As soon as one of the privileged frameworks is terminated, one of the
>   starved frameworks takes its place
> - Any new Spark context created in coarse-grained mode (fixed number
>   of cores) will generally receive offers immediately (it rarely gets
>   starved)
> - Hadoop behaves slightly differently when starved: task trackers are
>   started but never released, which means that if the first job (Hive
>   query) is small in terms of number of input splits, only one task
>   tracker with a small number of allocated cores is created, and then all
>   subsequent queries, regardless of size, are run in a very limited mode
>   with this one “small” task tracker. Most of the time only the map phase
>   of a big query completes while the reduce phase hangs. Killing one of
>   the registered Spark contexts above releases resources for Mesos to
>   complete the query and gracefully shut down the task trackers (as
>   noticed in the master log)
>
> We are using the default settings for isolation, weights, etc. The only
> stand-out configuration would be the memory allocation for the slave
> (export MESOS_resources=mem:35840 in mesos-slave-env.sh), but I am not
> sure if this is ever enforced, as each framework has its own executor
> process (a JVM in our case) with its own memory allocation (we are not
> using cgroups yet).
>
> A very easy way to reproduce this bug is to start a minimum of 3 shark-cli
> instances in a Mesos cluster and notice that only two of them are offered
> resources and run queries successfully.
> I spent quite a bit of time in the Mesos, Spark and hadoop-mesos code in
> an attempt to find a possible workaround, but no luck so far.
>
> Any guidance would be very much appreciated.
>
> Thank you,
> Claudiu
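
In the meantime, for anyone wanting to reproduce the Spark side of this, here is a minimal sketch of how a fine-grained vs. coarse-grained context is typically set up against Mesos. Assumptions on my part: Spark 1.x, a master at mesos://master:5050, and an executor tarball URI; the object name and paths below are hypothetical and not taken from Claudiu's setup.

    import org.apache.spark.{SparkConf, SparkContext}

    object StarvationRepro {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf()
          .setMaster("mesos://master:5050")   // hypothetical Mesos master URL
          .setAppName("fine-grained-context")
          // where slaves can fetch the Spark executor (hypothetical location)
          .set("spark.executor.uri", "hdfs:///spark/spark-1.0.0-bin-hadoop2.tgz")

        // Fine-grained mode is the default on Mesos; uncommenting the two
        // properties below switches this context to coarse-grained mode with
        // a fixed core cap, the case Claudiu reports as generally still
        // receiving offers:
        // conf.set("spark.mesos.coarse", "true")
        // conf.set("spark.cores.max", "4")

        val sc = new SparkContext(conf)
        // Run a trivial job so the framework actually requests offers.
        println(sc.parallelize(1 to 1000).map(_ * 2).count())
        sc.stop()
      }
    }

Starting three such contexts (or three shark-cli instances, which do roughly the equivalent internally) and watching which framework IDs keep receiving offers in log/mesos-master.INFO should surface the pattern described above.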

