Hey Claudiu,

I spent some time trying to understand the logs you posted. What's strange to me is that in the very beginning, when frameworks 1 and 2 are registered, only one framework gets offers for a period of 9s. It's not clear why this happens. I even wrote a test (https://reviews.apache.org/r/22714/) to repro it but wasn't able to.
It would probably be helpful to add more logging to the DRF sorting comparator function to understand why frameworks are sorted the way they are when their shares are the same (0). My expectation is that after each allocation, the 'allocations' count for a framework should increase, causing the sort function to behave correctly. But that doesn't seem to be happening in your case.

I0604 22:12:43.715530 22270 master.cpp:2282] Sending 4 offers to framework 20140604-221214-302055434-5050-22260-0000
I0604 22:12:44.276062 22273 master.cpp:2282] Sending 4 offers to framework 20140604-221214-302055434-5050-22260-0001
I0604 22:12:44.756918 22292 master.cpp:2282] Sending 4 offers to framework 20140604-221214-302055434-5050-22260-0000
I0604 22:12:45.794178 22276 master.cpp:2282] Sending 4 offers to framework 20140604-221214-302055434-5050-22260-0001
I0604 22:12:46.841629 22291 master.cpp:2282] Sending 4 offers to framework 20140604-221214-302055434-5050-22260-0001
I0604 22:12:47.884266 22262 master.cpp:2282] Sending 4 offers to framework 20140604-221214-302055434-5050-22260-0001
I0604 22:12:48.926856 22268 master.cpp:2282] Sending 4 offers to framework 20140604-221214-302055434-5050-22260-0001
I0604 22:12:49.966560 22280 master.cpp:2282] Sending 4 offers to framework 20140604-221214-302055434-5050-22260-0001
I0604 22:12:51.007143 22267 master.cpp:2282] Sending 4 offers to framework 20140604-221214-302055434-5050-22260-0001
I0604 22:12:52.047987 22280 master.cpp:2282] Sending 4 offers to framework 20140604-221214-302055434-5050-22260-0001
I0604 22:12:53.089340 22291 master.cpp:2282] Sending 4 offers to framework 20140604-221214-302055434-5050-22260-0001
I0604 22:12:54.130242 22263 master.cpp:2282] Sending 4 offers to framework 20140604-221214-302055434-5050-22260-0000

@vinodkone
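To make that concrete, here is a rough, self-contained sketch of the kind of logging being suggested for the DRF comparator. This is an illustration only, not the actual code in master/drf_sorter.hpp: the Client fields, the tie-breaking order, and the use of std::cout (instead of glog's LOG(INFO), which is what you would use inside Mesos) are all assumptions made for the example.

// Simplified, hypothetical DRF-style comparator with extra logging.
// Names and fields are illustrative; they are not copied from
// master/drf_sorter.hpp.
#include <algorithm>
#include <iostream>
#include <string>
#include <vector>

struct Client {
  std::string name;          // framework id
  double share;              // dominant share of this framework
  unsigned int allocations;  // how many times this client has been allocated to
};

// Sort by share; when shares are equal (e.g. both 0 right after
// registration), fall back to the allocation count so a framework that was
// just sent an offer drops behind one that was not.
struct DRFComparator {
  bool operator()(const Client& a, const Client& b) const {
    if (a.share != b.share) {
      return a.share < b.share;
    }
    if (a.allocations != b.allocations) {
      return a.allocations < b.allocations;
    }
    return a.name < b.name;
  }
};

void sortAndLog(std::vector<Client>& clients) {
  std::sort(clients.begin(), clients.end(), DRFComparator());

  // The extra logging: dump share and allocation count in sorted order
  // every time the sorter runs.
  for (const Client& c : clients) {
    std::cout << "client=" << c.name
              << " share=" << c.share
              << " allocations=" << c.allocations << std::endl;
  }
}

int main() {
  // Two frameworks with identical (zero) shares, as in the window right
  // after registration; only the allocation counter separates them.
  std::vector<Client> clients = {
    {"20140604-221214-302055434-5050-22260-0000", 0.0, 3},
    {"20140604-221214-302055434-5050-22260-0001", 0.0, 1},
  };

  // -0001 has been allocated to fewer times, so it should sort first and
  // receive the next offer.
  sortAndLog(clients);
  return 0;
}

If the real sorter logs the expected ordering but the same framework keeps receiving offers anyway, that would point at how the allocator consumes the sorted list (the HierarchicalAllocatorProcess<RoleSorter, FrameworkSorter>::allocate() path Claudiu mentions below) rather than at the comparator itself.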
On Fri, Jun 13, 2014 at 3:40 PM, Claudiu Barbura <[email protected]> wrote:

> Hi Vinod,
>
> Attached are the patch files. Hadoop has to be treated differently as it
> requires resources in order to shut down task trackers after a job is
> complete. Therefore we set the role name so that Mesos allocates resources
> for it first, ahead of the rest of the frameworks under the default role (*).
> This is not ideal; we are going to look into the Hadoop Mesos framework
> code and fix it if possible. Luckily, Hadoop is the only framework we use on
> top of Mesos that allows a configurable role name to be passed in when
> registering a framework (unlike Spark, Aurora, Storm etc.)
> For the non-Hadoop frameworks, we are making sure that once a framework is
> running its jobs, Mesos no longer offers resources to it. At the same time,
> once a framework completes its jobs, we make sure its “client allocations”
> value is updated so that it is placed back in the sorting list with a real
> chance of being offered again immediately (not starved!).
> What is also key is that mem type resources are ignored during share
> computation, as only cpus are a good indicator of which frameworks are
> actually running jobs in the cluster.
>
> Thanks,
> Claudiu
>
> From: Claudiu Barbura <[email protected]>
> Reply-To: "[email protected]" <[email protected]>
> Date: Thursday, June 12, 2014 at 6:20 PM
> To: "[email protected]" <[email protected]>
> Subject: Re: Framework Starvation
>
> Hi Vinod,
>
> We have a fix (more like a hack) that works for us, but it requires us
> to run each Hadoop framework with a different role, as we need to treat
> Hadoop differently than the rest of the frameworks (Spark, Shark, Aurora),
> which are running with the default role (*).
> We had to change the drf_sorter.cpp/hpp and
> hierarchical_allocator_process.cpp files.
>
> Let me know if you need more info on this.
>
> Thanks,
> Claudiu
>
> From: Claudiu Barbura <[email protected]>
> Reply-To: "[email protected]" <[email protected]>
> Date: Thursday, June 5, 2014 at 2:41 AM
> To: "[email protected]" <[email protected]>
> Subject: Re: Framework Starvation
>
> Hi Vinod,
>
> I attached the master log after adding more logging to the sorter code.
> I believe the problem lies somewhere else, however …
> in HierarchicalAllocatorProcess<RoleSorter, FrameworkSorter>::allocate()
>
> I will continue to investigate in the meantime.
>
> Thanks,
> Claudiu
>
> From: Vinod Kone <[email protected]>
> Reply-To: "[email protected]" <[email protected]>
> Date: Tuesday, June 3, 2014 at 5:16 PM
> To: "[email protected]" <[email protected]>
> Subject: Re: Framework Starvation
>
> Either should be fine. I don't think there are any changes in the allocator
> since 0.18.0-rc1.
>
>
> On Tue, Jun 3, 2014 at 4:08 PM, Claudiu Barbura <[email protected]> wrote:
>
>> Hi Vinod,
>>
>> Should we use the same 0-18.1-rc1 branch or trunk code?
>>
>> Thanks,
>> Claudiu
>>
>> From: Vinod Kone <[email protected]>
>> Reply-To: "[email protected]" <[email protected]>
>> Date: Tuesday, June 3, 2014 at 3:55 PM
>> To: "[email protected]" <[email protected]>
>> Subject: Re: Framework Starvation
>>
>> Hey Claudiu,
>>
>> Is it possible for you to run the same test but log more
>> information about the framework shares? For example, it would be really
>> insightful if you can log each framework's share in DRFSorter::sort()
>> (see: master/drf_sorter.hpp). This will help us diagnose the problem. I
>> suspect one of our open tickets around allocation (MESOS-1119
>> <https://issues.apache.org/jira/browse/MESOS-1119>, MESOS-1130
>> <https://issues.apache.org/jira/browse/MESOS-1130> and MESOS-1187
>> <https://issues.apache.org/jira/browse/MESOS-1187>) is the issue. But it
>> would be good to have that logging data regardless, to confirm.
>>
>>
>> On Mon, Jun 2, 2014 at 10:46 AM, Claudiu Barbura <[email protected]> wrote:
>>
>>> Hi Vinod,
>>>
>>> I tried to attach the logs (2MB) and the email (see below) did not go
>>> through. I emailed your gmail account separately.
>>>
>>> Thanks,
>>> Claudiu
>>>
>>> From: Claudiu Barbura <[email protected]>
>>> Date: Monday, June 2, 2014 at 10:00 AM
>>> To: "[email protected]" <[email protected]>
>>> Subject: Re: Framework Starvation
>>>
>>> Hi Vinod,
>>>
>>> I attached the master log snapshots during starvation and after
>>> starvation.
>>>
>>> There are 4 slave nodes and 1 master, all of the same ec2
>>> instance type (cc2.8xlarge, 32 cores, 60GB RAM).
>>> I am running 4 shark-cli instances from the same master node, and
>>> running queries on all 4 of them … then “starvation” kicks in (see attached
>>> log_during_starvation file).
>>> After I terminate 2 of the shark-cli instances, the starved ones are
>>> receiving offers and are able to run queries again (see attached
>>> log_after_starvation file).
>>>
>>> Let me know if you need the slave logs.
>>>
>>> Thank you!
>>> Claudiu
>>>
>>> From: Vinod Kone <[email protected]>
>>> Reply-To: "[email protected]" <[email protected]>
>>> Date: Friday, May 30, 2014 at 10:13 AM
>>> To: "[email protected]" <[email protected]>
>>> Subject: Re: Framework Starvation
>>>
>>> Hey Claudiu,
>>>
>>> Mind posting some master logs with the simple setup that you described
>>> (3 shark cli instances)? That would help us better diagnose the problem.
>>>
>>>
>>> On Fri, May 30, 2014 at 1:59 AM, Claudiu Barbura <[email protected]> wrote:
>>>
>>>> This is a critical issue for us, as we have to shut down frameworks
>>>> for various components in our platform to work, and this has created more
>>>> contention than before we deployed Mesos, when everyone had to wait in line
>>>> for their MR/Hive jobs to run.
>>>>
>>>> Any guidance or ideas would be extremely helpful at this point.
>>>>
>>>> Thank you,
>>>> Claudiu
>>>>
>>>> From: Claudiu Barbura <[email protected]>
>>>> Reply-To: "[email protected]" <[email protected]>
>>>> Date: Tuesday, May 27, 2014 at 11:57 PM
>>>> To: "[email protected]" <[email protected]>
>>>> Subject: Framework Starvation
>>>>
>>>> Hi,
>>>>
>>>> Following Ben’s suggestion at the Seattle Spark Meetup in April, I
>>>> built and deployed the 0-18.1-rc1 branch hoping that this would solve the
>>>> framework starvation problem we have been seeing for the past 2 months now.
>>>> The hope was that https://issues.apache.org/jira/browse/MESOS-1086 would
>>>> also help us. Unfortunately it did not.
>>>> This bug is preventing us from running multiple spark and shark servers
>>>> (http, thrift), in load balanced fashion, Hadoop and Aurora in the same
>>>> mesos cluster.
>>>>
>>>> For example, if we start at least 3 frameworks, one Hadoop, one
>>>> SparkJobServer (one Spark context in fine-grained mode) and one Http
>>>> SharkServer (one JavaSharkContext that inherits from Spark Contexts, again
>>>> in fine-grained mode) and we run queries on all three of them, very soon we
>>>> notice the following behavior:
>>>>
>>>>    - only the last two frameworks that we run queries against receive
>>>>    resource offers (master.cpp log entries in the log/mesos-master.INFO)
>>>>    - the other frameworks are ignored and not allocated any resources
>>>>    until we kill one of the two privileged ones above
>>>>    - As soon as one of the privileged frameworks is terminated, one of
>>>>    the starved frameworks takes its place
>>>>    - Any new Spark context created in coarse-grained mode (fixed
>>>>    number of cores) will generally receive offers immediately (rarely it
>>>>    gets starved)
>>>>    - Hadoop behaves slightly differently when starved: task trackers
>>>>    are started but never released, which means, if the first job (Hive query)
>>>>    is small in terms of number of input splits, only one task tracker with a
>>>>    small number of allocated cores is created, and then all subsequent queries,
>>>>    regardless of size, are only run in very limited mode with this one “small”
>>>>    task tracker. Most of the time only the map phase of a big query is
>>>>    completed while the reduce phase is hanging.
>>>>    Killing one of the registered
>>>>    Spark contexts above releases resources for Mesos to complete the query and
>>>>    gracefully shut down the task trackers (as noticed in the master log).
>>>>
>>>> We are using the default settings in terms of isolation, weights etc …
>>>> the only stand-out configuration would be the memory allocation for the slave
>>>> (export MESOS_resources=mem:35840 in mesos-slave-env.sh), but I am not sure
>>>> if this is ever enforced, as each framework has its own executor process
>>>> (JVM in our case) with its own memory allocation (we are not using cgroups
>>>> yet).
>>>>
>>>> A very easy way to reproduce this bug is to start a minimum of 3
>>>> shark-cli instances in a mesos cluster and notice that only two of them are
>>>> being offered resources and are running queries successfully.
>>>> I spent quite a bit of time in mesos, spark and hadoop-mesos code in an
>>>> attempt to find a possible workaround, but no luck so far.
>>>>
>>>> Any guidance would be very appreciated.
>>>>
>>>> Thank you,
>>>> Claudiu
>>>>
>>>>
>>>
>>
>
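A closing note on the share computation change described in the June 13 message at the top of the quoted thread: ignoring mem and counting only cpus amounts, conceptually, to something like the sketch below. This is only an illustration of the idea, not the actual Mesos sorter code or the patch itself; the struct names, the example framework, and the 80000 MB figure are invented, and the cluster capacities are taken loosely from the cc2.8xlarge setup described above.

// Conceptual sketch of a framework's share with and without mem.
// Not the actual Mesos sorter code or the patch discussed above; names
// and numbers are made up for illustration.
#include <algorithm>
#include <iostream>

struct Allocation {
  double cpus;
  double mem;  // MB
};

struct Totals {
  double cpus;
  double mem;  // MB
};

// Standard DRF-style dominant share: the max over resource types.
double dominantShare(const Allocation& a, const Totals& t) {
  return std::max(a.cpus / t.cpus, a.mem / t.mem);
}

// Variant described in the thread: only cpus are counted, on the theory
// that cpu usage better reflects which frameworks are actually running jobs.
double cpusOnlyShare(const Allocation& a, const Totals& t) {
  return a.cpus / t.cpus;
}

int main() {
  // 4 slaves of 32 cores / 35840 MB each (loosely the cc2.8xlarge setup
  // and MESOS_resources value mentioned earlier in the thread).
  Totals cluster{4 * 32.0, 4 * 35840.0};

  // A framework holding a lot of memory but almost no cpus (e.g. idle
  // between queries).
  Allocation idleButMemoryHeavy{2.0, 80000.0};

  std::cout << "dominant share:  " << dominantShare(idleButMemoryHeavy, cluster) << "\n"
            << "cpus-only share: " << cpusOnlyShare(idleButMemoryHeavy, cluster) << "\n";
  // The dominant share (~0.56) is driven entirely by mem and pushes the
  // framework to the back of the allocation order; the cpus-only share
  // (~0.016) keeps it near the front, so it is offered resources again.
  return 0;
}

The rationale in the thread is that only cpus are a good indicator of which frameworks are actually running jobs, so a framework that is holding memory while running nothing is not pushed to the back of the sort order; whether that is the right long-term fix for the allocator is a separate question.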

