Hi,

Following Ben's suggestion at the Seattle Spark Meetup in April, I built and deployed the 0.18.1-rc1 branch, hoping this would solve the framework starvation problem we have been seeing for the past 2 months. The hope was that https://issues.apache.org/jira/browse/MESOS-1086 would also help us. Unfortunately it did not. This bug is preventing us from running multiple Spark and Shark servers (HTTP, Thrift) in a load-balanced fashion, alongside Hadoop and Aurora, in the same Mesos cluster.
For example, if we start at least 3 frameworks (one Hadoop, one SparkJobServer with one Spark context in fine-grained mode, and one HTTP SharkServer with one JavaSharkContext, which inherits from SparkContext, again in fine-grained mode) and we run queries on all three of them, very soon we notice the following behavior:

* Only the last two frameworks that we run queries against receive resource offers (master.cpp log entries in log/mesos-master.INFO).
* The other frameworks are ignored and not allocated any resources until we kill one of the two privileged frameworks above.
* As soon as one of the privileged frameworks is terminated, one of the starved frameworks takes its place.
* Any new Spark context created in coarse-grained mode (fixed number of cores) will generally receive offers immediately; it rarely gets starved. (A minimal sketch of the two modes is included at the end of this mail.)
* Hadoop behaves slightly differently when starved: task trackers are started but never released. This means that if the first job (a Hive query) is small in terms of number of input splits, only one task tracker with a small number of allocated cores is created, and all subsequent queries, regardless of size, run in this very limited mode with that one "small" task tracker. Most of the time only the map phase of a big query completes while the reduce phase hangs. Killing one of the registered Spark contexts above releases resources for Mesos to complete the query and gracefully shut down the task trackers (as seen in the master log).

We are using the default settings in terms of isolation, weights, etc. The only configuration that stands out is the memory allocation for the slave (export MESOS_resources=mem:35840 in mesos-slave-env.sh), but I am not sure this is ever enforced, as each framework has its own executor process (a JVM in our case) with its own memory allocation (we are not using cgroups yet).

A very easy way to reproduce this bug is to start a minimum of 3 shark-cli instances in a Mesos cluster and notice that only two of them are offered resources and run queries successfully.

I spent quite a bit of time in the Mesos, Spark and hadoop-mesos code in an attempt to find a possible workaround, but no luck so far. Any guidance would be very much appreciated.

Thank you,
Claudiu
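
P.S. For anyone trying to reproduce the coarse-grained vs. fine-grained difference mentioned above, here is a minimal sketch (not our actual code) of how the two kinds of contexts can be created against a Mesos master; it assumes the standard spark.mesos.coarse and spark.cores.max properties and a placeholder master URL:

    import org.apache.spark.{SparkConf, SparkContext}

    // Fine-grained mode (the Mesos default): each Spark task runs as its own Mesos
    // task, so cores are returned to Mesos as tasks finish. This is the mode the
    // SparkJobServer and SharkServer contexts above use, and the one that starves.
    val fineGrainedConf = new SparkConf()
      .setMaster("mesos://master-host:5050")  // placeholder master URL
      .setAppName("fine-grained-context")

    // Coarse-grained mode: a long-running Mesos task on each slave holds a fixed
    // number of cores for the lifetime of the context. These contexts generally
    // keep receiving offers.
    val coarseGrainedConf = new SparkConf()
      .setMaster("mesos://master-host:5050")  // placeholder master URL
      .setAppName("coarse-grained-context")
      .set("spark.mesos.coarse", "true")
      .set("spark.cores.max", "8")            // cap on the fixed core allocation

    // Each server process creates a single context from one of these confs:
    val sc = new SparkContext(coarseGrainedConf)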

