This sounds like you hit the same problem as I did:
https://issues.apache.org/jira/browse/MESOS-1688
Note that there is a patch for Spark as a workaround for this deadlock:
https://github.com/apache/spark/pull/1860
Regards,
Martin
On 20.08.2014 at 21:39, Cody Koeninger wrote:
I'm seeing situations where starting, e.g., a 4th Spark job on Mesos
results in none of the jobs making progress. This happens even with
--executor-memory set to values that should not come close to
exceeding the available memory per node, and even if the 4th job is
doing something completely trivial (e.g. parallelize 1 to 10000 and sum).
Killing one of the jobs typically allows the others to start proceeding.
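For reference, the trivial job is essentially the following, written
here as a standalone app (the master URL is a placeholder; this uses
the Spark-1.x-era implicits import):

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.SparkContext._  // implicits for sum() on RDD[Int]

    object TrivialSum {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf()
          .setAppName("trivial-sum")
          .setMaster("mesos://mesos-master:5050") // placeholder master URL
        val sc = new SparkContext(conf)
        // The trivial job described above: parallelize 1 to 10000 and sum.
        val total = sc.parallelize(1 to 10000).sum()
        println(s"sum = $total")
        sc.stop()
      }
    }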
While the jobs are hung, I see the following in the Mesos master logs:
I0820 19:28:02.651296 24666 master.cpp:2282] Sending 7 offers to
framework 20140820-170154-1315739402-5050-24660-0020
I0820 19:28:02.654502 24668 master.cpp:1578] Processing reply for
offers: [ 20140820-170154-1315739402-5050-24660-96624 ] on slave
20140724-150750-1315739402-5050-25405-6 (dn-04) for framework
20140820-170154-1315739402-5050-24660-0020
I0820 19:28:02.654722 24668 hierarchical_allocator_process.hpp:590]
Framework 20140820-170154-1315739402-5050-24660-0020 filtered slave
20140724-150750-1315739402-5050-25405-6 for 1secs
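(For context: the allocator prints "filtered slave ... for Nsecs" when
a framework declines an offer, or leaves part of it unused, with a
refuse filter attached. In the Mesos Java API a decline looks roughly
like the sketch below; the one-second refuse period matching the log
is an assumption about what the framework passed.)

    import org.apache.mesos.SchedulerDriver
    import org.apache.mesos.Protos.{Filters, OfferID}

    object DeclineExample {
      // Declining an offer with an explicit refuse filter; the allocator
      // then logs "Framework ... filtered slave ... for 1secs".
      def declineWithFilter(driver: SchedulerDriver, offerId: OfferID): Unit = {
        val filters = Filters.newBuilder().setRefuseSeconds(1.0).build()
        driver.declineOffer(offerId, filters)
      }
    }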
Am I correctly interpreting that to mean that Spark is being offered
resources, but is rejecting them? Is there a way (short of patching
Spark to add more logging) to figure out why resources are being rejected?
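(A minimal sketch of the kind of logging that would answer this,
assuming the fine-grained backend rejects an offer when it fails a
per-offer memory/CPU check; the names and thresholds here are
illustrative stand-ins, not Spark's actual code:)

    object OfferCheck {
      // Illustrative stand-in for the per-offer check a fine-grained
      // backend might apply before accepting an offer.
      case class ResourceOffer(slaveId: String, memMB: Double, cpus: Double)

      def acceptable(o: ResourceOffer, executorMemMB: Double,
                     minCpus: Double = 1.0): Boolean =
        o.memMB >= executorMemMB && o.cpus >= minCpus

      // Logging the reason for each rejection would show why offers
      // are being turned down while jobs appear hung.
      def logIfRejected(o: ResourceOffer, executorMemMB: Double,
                        minCpus: Double = 1.0): Unit =
        if (!acceptable(o, executorMemMB, minCpus))
          println(s"Rejected offer on ${o.slaveId}: mem=${o.memMB}MB " +
                  s"cpus=${o.cpus} (need mem >= ${executorMemMB}MB, " +
                  s"cpus >= $minCpus)")
    }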
This is in the default fine-grained mode.