On Thu, Jun 19, 2014 at 10:46 AM, Vinod Kone <[email protected]> wrote:
> Waiting to see your blog post :)
>
> That said, what baffles me is that in the very beginning, when only two frameworks are present and no tasks have been launched, one framework is getting more allocations than the other (see the log lines I posted in the earlier email), which is unexpected.
>
>
> @vinodkone
>
>
> On Tue, Jun 17, 2014 at 9:41 PM, Claudiu Barbura <[email protected]> wrote:
>
>> Hi Vinod,
>>
>> You are looking at logs I had posted before we implemented our fix (files attached in my last email).
>> I will write a detailed blog post on the issue … after the Spark Summit at the end of this month.
>>
>> What would happen before is that frameworks with the same share (0) would also have the smallest allocation in the beginning, and after sorting the list they would be at the top, always offered all the resources before other frameworks that had already been offered resources and were running tasks with a share and allocation > 0.
>>
>> Thanks,
>> Claudiu
>>
>> From: Vinod Kone <[email protected]>
>> Reply-To: "[email protected]" <[email protected]>
>> Date: Wednesday, June 18, 2014 at 4:54 AM
>> To: "[email protected]" <[email protected]>
>> Subject: Re: Framework Starvation
>>
>> Hey Claudiu,
>>
>> I spent some time trying to understand the logs you posted. What's strange to me is that in the very beginning, when frameworks 1 and 2 are registered, only one framework gets offers for a period of 9s. It's not clear why this happens. I even wrote a test (https://reviews.apache.org/r/22714/) to repro but wasn't able to.
>>
>> It would probably be helpful to add more logging to the DRF sorting comparator function to understand why frameworks are sorted in such a way when their share is the same (0). My expectation is that after each allocation, the 'allocations' count for a framework should increase, causing the sort function to behave correctly. But that doesn't seem to be happening in your case.
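[Editor's note: for reference, a minimal standalone sketch of the tie-breaking behavior being discussed, as a simplified stand-in for the comparator in master/drf_sorter.hpp rather than the actual Mesos code. The Client fields below are assumptions for illustration; the point is that with equal shares, the client with fewer past allocations should sort first, so if 'allocations' never increases after an offer, the same framework keeps winning the tie.]

#include <cstdint>
#include <string>

// Simplified per-framework entry, standing in for the sorter's client state.
struct Client
{
  std::string name;
  double share;          // dominant share (0 for frameworks with no tasks)
  uint64_t allocations;  // how many times this client has been allocated to
};

// DRF-style comparator: lower dominant share sorts first; on a tie (e.g. both
// shares are 0), the client that has received fewer allocations so far comes
// first. If 'allocations' is not bumped after each allocation, equal-share
// frameworks never swap places and the others starve.
struct DRFComparator
{
  bool operator()(const Client& a, const Client& b) const
  {
    if (a.share == b.share) {
      return a.allocations < b.allocations;
    }
    return a.share < b.share;
  }
};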
>>
>> I0604 22:12:43.715530 22270 master.cpp:2282] Sending 4 offers to framework 20140604-221214-302055434-5050-22260-0000
>> I0604 22:12:44.276062 22273 master.cpp:2282] Sending 4 offers to framework 20140604-221214-302055434-5050-22260-0001
>> I0604 22:12:44.756918 22292 master.cpp:2282] Sending 4 offers to framework 20140604-221214-302055434-5050-22260-0000
>> I0604 22:12:45.794178 22276 master.cpp:2282] Sending 4 offers to framework 20140604-221214-302055434-5050-22260-0001
>> I0604 22:12:46.841629 22291 master.cpp:2282] Sending 4 offers to framework 20140604-221214-302055434-5050-22260-0001
>> I0604 22:12:47.884266 22262 master.cpp:2282] Sending 4 offers to framework 20140604-221214-302055434-5050-22260-0001
>> I0604 22:12:48.926856 22268 master.cpp:2282] Sending 4 offers to framework 20140604-221214-302055434-5050-22260-0001
>> I0604 22:12:49.966560 22280 master.cpp:2282] Sending 4 offers to framework 20140604-221214-302055434-5050-22260-0001
>> I0604 22:12:51.007143 22267 master.cpp:2282] Sending 4 offers to framework 20140604-221214-302055434-5050-22260-0001
>> I0604 22:12:52.047987 22280 master.cpp:2282] Sending 4 offers to framework 20140604-221214-302055434-5050-22260-0001
>> I0604 22:12:53.089340 22291 master.cpp:2282] Sending 4 offers to framework 20140604-221214-302055434-5050-22260-0001
>> I0604 22:12:54.130242 22263 master.cpp:2282] Sending 4 offers to framework 20140604-221214-302055434-5050-22260-0000
>>
>>
>> @vinodkone
>>
>>
>> On Fri, Jun 13, 2014 at 3:40 PM, Claudiu Barbura <[email protected]> wrote:
>>
>>> Hi Vinod,
>>>
>>> Attached are the patch files. Hadoop has to be treated differently, as it requires resources in order to shut down task trackers after a job is complete. Therefore we set the role name so that Mesos allocates resources for it first, ahead of the rest of the frameworks under the default role (*).
>>> This is not ideal; we are going to look into the Hadoop Mesos framework code and fix it if possible. Luckily, Hadoop is the only framework we use on top of Mesos that allows a configurable role name to be passed in when registering a framework (unlike Spark, Aurora, Storm, etc.).
>>> For the non-Hadoop frameworks, we are making sure that once a framework is running its jobs, Mesos no longer offers resources to it. At the same time, once a framework completes its jobs, we make sure its “client allocations” value is updated so that it is placed back in the sorting list with a real chance of being offered resources again immediately (not starved!).
>>> What is also key is that mem-type resources are ignored during share computation, as only cpus are a good indicator of which frameworks are actually running jobs in the cluster.
>>>
>>> Thanks,
>>> Claudiu
>>>
>>> From: Claudiu Barbura <[email protected]>
>>> Reply-To: "[email protected]" <[email protected]>
>>> Date: Thursday, June 12, 2014 at 6:20 PM
>>> To: "[email protected]" <[email protected]>
>>> Subject: Re: Framework Starvation
>>>
>>> Hi Vinod,
>>>
>>> We have a fix (more like a hack) that works for us, but it requires us to run each Hadoop framework with a different role, as we need to treat Hadoop differently than the rest of the frameworks (Spark, Shark, Aurora), which are running with the default role (*).
>>> We had to change the drf_sorter.cpp/hpp and hierarchical_allocator_process.cpp files.
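[Editor's note: a rough, hypothetical illustration of one of the changes described above (computing a framework's share from cpus only, ignoring mem). This is not the actual patch to drf_sorter.cpp; it assumes resources are represented as simple name-to-quantity maps and is only meant to show the idea.]

#include <algorithm>
#include <map>
#include <string>

// Hypothetical share computation: the dominant share is taken over cpus (and
// any other resource) but mem is skipped, so a framework that holds memory
// while running no tasks does not look "busy" to the sorter.
double calculateShare(const std::map<std::string, double>& total,      // cluster totals, e.g. {"cpus": 128, "mem": 143360}
                      const std::map<std::string, double>& allocated)  // resources currently allocated to one framework
{
  double share = 0.0;

  for (const auto& entry : allocated) {
    const std::string& name = entry.first;

    if (name == "mem") {
      continue;  // ignore mem: only cpus indicate which frameworks run jobs
    }

    const auto it = total.find(name);
    if (it != total.end() && it->second > 0.0) {
      share = std::max(share, entry.second / it->second);
    }
  }

  return share;
}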
>>>
>>> Let me know if you need more info on this.
>>>
>>> Thanks,
>>> Claudiu
>>>
>>> From: Claudiu Barbura <[email protected]>
>>> Reply-To: "[email protected]" <[email protected]>
>>> Date: Thursday, June 5, 2014 at 2:41 AM
>>> To: "[email protected]" <[email protected]>
>>> Subject: Re: Framework Starvation
>>>
>>> Hi Vinod,
>>>
>>> I attached the master log after adding more logging to the sorter code. I believe the problem lies somewhere else, however … in HierarchicalAllocatorProcess<RoleSorter, FrameworkSorter>::allocate().
>>>
>>> I will continue to investigate in the meantime.
>>>
>>> Thanks,
>>> Claudiu
>>>
>>> From: Vinod Kone <[email protected]>
>>> Reply-To: "[email protected]" <[email protected]>
>>> Date: Tuesday, June 3, 2014 at 5:16 PM
>>> To: "[email protected]" <[email protected]>
>>> Subject: Re: Framework Starvation
>>>
>>> Either should be fine. I don't think there have been any changes in the allocator since 0.18.0-rc1.
>>>
>>>
>>> On Tue, Jun 3, 2014 at 4:08 PM, Claudiu Barbura <[email protected]> wrote:
>>>
>>>> Hi Vinod,
>>>>
>>>> Should we use the same 0.18.1-rc1 branch or trunk code?
>>>>
>>>> Thanks,
>>>> Claudiu
>>>>
>>>> From: Vinod Kone <[email protected]>
>>>> Reply-To: "[email protected]" <[email protected]>
>>>> Date: Tuesday, June 3, 2014 at 3:55 PM
>>>> To: "[email protected]" <[email protected]>
>>>> Subject: Re: Framework Starvation
>>>>
>>>> Hey Claudiu,
>>>>
>>>> Is it possible for you to run the same test but log more information about the framework shares? For example, it would be really insightful if you could log each framework's share in DRFSorter::sort() (see: master/drf_sorter.hpp). This will help us diagnose the problem. I suspect one of our open tickets around allocation (MESOS-1119 <https://issues.apache.org/jira/browse/MESOS-1119>, MESOS-1130 <https://issues.apache.org/jira/browse/MESOS-1130> and MESOS-1187 <https://issues.apache.org/jira/browse/MESOS-1187>) is the issue. But it would be good to have that logging data regardless, to confirm.
>>>>
>>>>
>>>> On Mon, Jun 2, 2014 at 10:46 AM, Claudiu Barbura <[email protected]> wrote:
>>>>
>>>>> Hi Vinod,
>>>>>
>>>>> I tried to attach the logs (2MB) and the email (see below) did not go through. I emailed your gmail account separately.
>>>>>
>>>>> Thanks,
>>>>> Claudiu
>>>>>
>>>>> From: Claudiu Barbura <[email protected]>
>>>>> Date: Monday, June 2, 2014 at 10:00 AM
>>>>> To: "[email protected]" <[email protected]>
>>>>> Subject: Re: Framework Starvation
>>>>>
>>>>> Hi Vinod,
>>>>>
>>>>> I attached the master log snapshots during starvation and after starvation.
>>>>>
>>>>> There are 4 slave nodes and 1 master, all of the same ec2 instance type (cc2.8xlarge, 32 cores, 60GB RAM).
>>>>> I am running 4 shark-cli instances from the same master node, and running queries on all 4 of them … then “starvation” kicks in (see the attached log_during_starvation file).
>>>>> After I terminate 2 of the shark-cli instances, the starved ones start receiving offers and are able to run queries again (see the attached log_after_starvation file).
>>>>>
>>>>> Let me know if you need the slave logs.
>>>>>
>>>>> Thank you!
>>>>> Claudiu
>>>>>
>>>>> From: Vinod Kone <[email protected]>
>>>>> Reply-To: "[email protected]" <[email protected]>
>>>>> Date: Friday, May 30, 2014 at 10:13 AM
>>>>> To: "[email protected]" <[email protected]>
>>>>> Subject: Re: Framework Starvation
>>>>>
>>>>> Hey Claudiu,
>>>>>
>>>>> Mind posting some master logs with the simple setup that you described (3 shark-cli instances)? That would help us better diagnose the problem.
>>>>>
>>>>>
>>>>> On Fri, May 30, 2014 at 1:59 AM, Claudiu Barbura <[email protected]> wrote:
>>>>>
>>>>>> This is a critical issue for us, as we have to shut down frameworks for various components in our platform to work, and this has created more contention than before we deployed Mesos, when everyone had to wait in line for their MR/Hive jobs to run.
>>>>>>
>>>>>> Any guidance or ideas would be extremely helpful at this point.
>>>>>>
>>>>>> Thank you,
>>>>>> Claudiu
>>>>>>
>>>>>> From: Claudiu Barbura <[email protected]>
>>>>>> Reply-To: "[email protected]" <[email protected]>
>>>>>> Date: Tuesday, May 27, 2014 at 11:57 PM
>>>>>> To: "[email protected]" <[email protected]>
>>>>>> Subject: Framework Starvation
>>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> Following Ben’s suggestion at the Seattle Spark Meetup in April, I built and deployed the 0.18.1-rc1 branch hoping that it would solve the framework starvation problem we have been seeing for the past 2 months now. The hope was that https://issues.apache.org/jira/browse/MESOS-1086 would also help us. Unfortunately it did not.
>>>>>> This bug is preventing us from running multiple Spark and Shark servers (http, thrift) in a load-balanced fashion, along with Hadoop and Aurora, in the same Mesos cluster.
>>>>>>
>>>>>> For example, if we start at least 3 frameworks, one Hadoop, one SparkJobServer (one Spark context in fine-grained mode) and one HTTP SharkServer (one JavaSharkContext, which inherits from SparkContext, again in fine-grained mode), and we run queries on all three of them, very soon we notice the following behavior:
>>>>>>
>>>>>> - only the last two frameworks that we run queries against receive resource offers (master.cpp log entries in log/mesos-master.INFO)
>>>>>> - the other frameworks are ignored and not allocated any resources until we kill one of the two privileged ones above
>>>>>> - as soon as one of the privileged frameworks is terminated, one of the starved frameworks takes its place
>>>>>> - any new Spark context created in coarse-grained mode (fixed number of cores) will generally receive offers immediately (rarely it gets starved)
>>>>>> - Hadoop behaves slightly differently when starved: task trackers are started but never released, which means that if the first job (Hive query) is small in terms of number of input splits, only one task tracker with a small number of allocated cores is created, and then all subsequent queries, regardless of size, are run only in very limited mode with this one “small” task tracker. Most of the time only the map phase of a big query is completed while the reduce phase hangs.
>>>>>> Killing one of the registered Spark contexts above releases resources for Mesos to complete the query and gracefully shut down the task trackers (as noticed in the master log).
>>>>>>
>>>>>> We are using the default settings in terms of isolation, weights, etc. … the only stand-out configuration would be the memory allocation for the slave (export MESOS_resources=mem:35840 in mesos-slave-env.sh), but I am not sure if this is ever enforced, as each framework has its own executor process (a JVM in our case) with its own memory allocation (we are not using cgroups yet).
>>>>>>
>>>>>> A very easy way to reproduce this bug is to start a minimum of 3 shark-cli instances in a Mesos cluster and notice that only two of them are being offered resources and are running queries successfully.
>>>>>> I spent quite a bit of time in the mesos, spark and hadoop-mesos code in an attempt to find a possible workaround, but no luck so far.
>>>>>>
>>>>>> Any guidance would be very much appreciated.
>>>>>>
>>>>>> Thank you,
>>>>>> Claudiu

