Hi,

Following Ben's suggestion at the Seattle Spark Meetup in April, I built and
deployed the 0.18.1-rc1 branch, hoping that this would solve the framework
starvation problem we have been seeing for the past 2 months. The hope was
that https://issues.apache.org/jira/browse/MESOS-1086 would also help us.
Unfortunately it did not.
This bug is preventing us from running multiple Spark and Shark servers (http,
thrift), in a load-balanced fashion, alongside Hadoop and Aurora in the same
Mesos cluster.

For example, if we start at least 3 frameworks, one Hadoop, one SparkJobServer
(one Spark context in fine-grained mode) and one Http SharkServer (one
JavaSharkContext, which inherits from SparkContext, again in fine-grained
mode), and we run queries on all three of them, we very soon notice the
following behavior:


  *   Only the last two frameworks that we run queries against receive resource
offers (master.cpp log entries in log/mesos-master.INFO)
  *   The other frameworks are ignored and not allocated any resources until we
kill one of the two privileged ones above
  *   As soon as one of the privileged frameworks is terminated, one of the
starved frameworks takes its place
  *   Any new Spark context created in coarse-grained mode (fixed number of
cores) will generally receive offers immediately; only rarely does it get
starved (see the sketch after this list)
  *   Hadoop behaves slightly differently when starved: task trackers are
started but never released. This means that if the first job (a Hive query) is
small in terms of number of input splits, only one task tracker with a small
number of allocated cores is created, and all subsequent queries, regardless of
size, run in this very limited mode with that one "small" task tracker. Most of
the time only the map phase of a big query completes while the reduce phase
hangs. Killing one of the registered Spark contexts above releases resources
for Mesos to complete the query and gracefully shut down the task trackers (as
seen in the master log)
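
For context, here is roughly how we create these contexts (a minimal sketch in
Scala; the ZooKeeper master URL, app names and core count are placeholders,
and spark.mesos.coarse / spark.cores.max are just the standard Spark-on-Mesos
settings, nothing custom on our side):

    import org.apache.spark.{SparkConf, SparkContext}

    // Minimal sketch. Fine-grained mode (spark.mesos.coarse=false, the
    // default) launches one Mesos task per Spark task, so cores should be
    // returned to Mesos as tasks finish. Coarse-grained mode holds a fixed
    // number of cores for the lifetime of the context.
    def makeContext(appName: String, coarse: Boolean): SparkContext = {
      val conf = new SparkConf()
        .setMaster("mesos://zk://zkhost:2181/mesos") // placeholder master URL
        .setAppName(appName)
        .set("spark.mesos.coarse", coarse.toString)
      if (coarse) conf.set("spark.cores.max", "4")   // fixed core count
      new SparkContext(conf)
    }

    // The fine-grained contexts that end up starved:
    val fineSc = makeContext("spark-job-server", coarse = false)
    // A coarse-grained context, which generally keeps receiving offers:
    // val coarseSc = makeContext("coarse-cli", coarse = true)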

We are using the default settings in terms of isolation, weights, etc. The only
standout configuration is the memory allocation for the slave (export
MESOS_resources=mem:35840 in mesos-slave-env.sh), but I am not sure this is
ever enforced, since each framework has its own executor process (a JVM in our
case) with its own memory allocation (we are not using cgroups yet).
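
To be concrete about the per-framework memory allocation, each Spark-based
framework sizes its own executor JVM via its own configuration (a minimal
sketch; the 4g value is a placeholder, not our production setting), and
without cgroups I do not believe the slave-level mem resource above is
enforced against it:

    import org.apache.spark.SparkConf

    // Minimal sketch: each framework sets its own executor JVM heap,
    // independently of the mem resource advertised by the slave.
    val conf = new SparkConf()
      .set("spark.executor.memory", "4g")  // per-executor JVM heap (placeholder)
      .set("spark.mesos.coarse", "false")  // fine-grained mode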

A very easy way to reproduce this bug is to start a minimum of 3 shark-cli
instances in a Mesos cluster and notice that only two of them are offered
resources and run queries successfully.
I have spent quite a bit of time in the Mesos, Spark and hadoop-mesos code in
an attempt to find a possible workaround, but no luck so far.

Any guidance would be very much appreciated.

Thank you,
Claudiu

