Hi,

There's a "framework starvation" thread you should look up … we provided a patch for it in 0.18 and promised Vinod to write a detailed blog about it … but I've been swamped and did not get to it yet. One of these days ...
Claudiu

On 8/20/14, 1:23 PM, "Timothy Chen" <[email protected]> wrote:

>Can you share your spark / mesos configurations and the spark job? I'd
>like to repro it.
>
>Tim
>
>> On Aug 20, 2014, at 12:39 PM, Cody Koeninger <[email protected]> wrote:
>>
>> I'm seeing situations where starting e.g. a 4th spark job on Mesos
>> results in none of the jobs making progress. This happens even with
>> --executor-memory set to values that should not come close to exceeding
>> the availability per node, and even if the 4th job is doing something
>> completely trivial (e.g. parallelize 1 to 10000 and sum). Killing one
>> of the jobs typically allows the others to start proceeding.
>>
>> While jobs are hung, I see the following in the mesos master logs:
>>
>> I0820 19:28:02.651296 24666 master.cpp:2282] Sending 7 offers to
>> framework 20140820-170154-1315739402-5050-24660-0020
>> I0820 19:28:02.654502 24668 master.cpp:1578] Processing reply for
>> offers: [ 20140820-170154-1315739402-5050-24660-96624 ] on slave
>> 20140724-150750-1315739402-5050-25405-6 (dn-04) for framework
>> 20140820-170154-1315739402-5050-24660-0020
>> I0820 19:28:02.654722 24668 hierarchical_allocator_process.hpp:590]
>> Framework 20140820-170154-1315739402-5050-24660-0020 filtered slave
>> 20140724-150750-1315739402-5050-25405-6 for 1secs
>>
>> Am I correctly interpreting that to mean that spark is being offered
>> resources, but is rejecting them? Is there a way (short of patching
>> spark to add more logging) to figure out why resources are being
>> rejected?
>>
>> This is on the default fine-grained mode.
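For anyone wanting to repro: a job of the trivial kind described above is only a few lines of Spark's Scala API. The sketch below is illustrative; the Mesos master URL, memory setting, and app name are assumptions, not the configuration from the original report.

    // Minimal sketch of the kind of trivial job described above.
    // Master URL and memory value are hypothetical placeholders.
    import org.apache.spark.{SparkConf, SparkContext}

    object TrivialSum {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf()
          .setAppName("trivial-sum")
          .setMaster("mesos://zk://host:2181/mesos") // hypothetical master URL
          .set("spark.executor.memory", "512m")      // well under per-node availability
          .set("spark.mesos.coarse", "false")        // fine-grained mode (the default here)
        val sc = new SparkContext(conf)
        // parallelize 1 to 10000 and sum, as in the report above
        val total = sc.parallelize(1 to 10000).sum()
        println(s"sum = $total")
        sc.stop()
      }
    }

Launching several of these concurrently against the same Mesos master (e.g. via spark-submit) should recreate the multi-framework contention described in the thread.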

