Christopher, have you tried adjusting the master's allocation_interval flag?
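
If it helps to put a number on the delay, here is a minimal Python sketch that measures the gap between the "Recovered" line and the next "Sending 1 offers" line from the master log quoted below. It is not part of Mesos: the helper names glog_time and offer_gap are my own, the two log lines are shortened copies of yours, and the year is an assumption because glog timestamps omit it. A gap below the 1s default interval that Vinod mentioned would be consistent with the allocator waiting for its next periodic batch allocation.

```python
import re
from datetime import datetime, timedelta

# glog prefix: severity letter, MMDD, then HH:MM:SS.microseconds.
GLOG_PREFIX = re.compile(r"^[IWEF](\d{4}) (\d{2}:\d{2}:\d{2}\.\d{6}) ")


def glog_time(line: str, year: int = 2015) -> datetime:
    """Parse the timestamp at the start of a glog-formatted line.

    The year is not present in the log, so it is passed in (assumed 2015 here).
    """
    match = GLOG_PREFIX.match(line)
    if match is None:
        raise ValueError(f"not a glog line: {line!r}")
    month_day, clock = match.groups()
    return datetime.strptime(f"{year}{month_day} {clock}", "%Y%m%d %H:%M:%S.%f")


def offer_gap(recovered_line: str, offer_line: str) -> timedelta:
    """Time between resources being recovered and the next offer being sent."""
    return glog_time(offer_line) - glog_time(recovered_line)


# Shortened copies of two lines from the quoted master log.
recovered = "I0617 11:34:08.582051 188174336 hierarchical.hpp:648] Recovered cpus(*):3.9"
offered = "I0617 11:34:09.446701 184954880 master.cpp:3760] Sending 1 offers to framework"

gap = offer_gap(recovered, offered)
print(f"recovered -> next offer: {gap.total_seconds()} s")
```

Running the same measurement again after lowering allocation_interval should show whether the pause shrinks along with it.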

On Thu, Jun 18, 2015 at 12:20 AM, Christopher Ketchum <[email protected]> wrote:

> Hi,
>
> I think those logs were misleading, sorry. I am running the tests in
> Pycharm, which aggregates all the logs onto one console so I selected only
> the mesos messages that explicitly said they were from master. Here are
> those logs without my editing. Again, the last two messages are almost a
> second apart. The resources are recovered very quickly, but not offered up
> for another second. Is this delay to try to increase the offer size? Is
> that delay adjustable?
>
> I0617 11:34:08.581996 184418304 master.cpp:4623] Updating the latest state
> of task 1 of framework 20150617-113405-16777343-5050-6614-0000 to
> TASK_FINISHED
>
> I0617 11:34:08.582051 188174336 hierarchical.hpp:648] Recovered
> cpus(*):3.9 (total allocatable: mem(*):7136; disk(*):109424;
> ports(*):[31000-32000]; cpus(*):3.9) on slave
> 20150617-113405-16777343-5050-6614-S0 from framework
> 20150617-113405-16777343-5050-6614-0000
>
> I0617 11:34:08.582778 185491456 master.cpp:4690] Removing task 1 with
> resources cpus(*):3.9 of framework 20150617-113405-16777343-5050-6614-0000
> on slave 20150617-113405-16777343-5050-6614-S0 at slave(1)@127.0.0.1:5051
> (localhost)
>
> I0617 11:34:08.582839 185491456 master.cpp:2787] Forwarding status update
> acknowledgement 3fec98f2-9f50-4968-9708-c7663f36b62d for task 1 of
> framework 20150617-113405-16777343-5050-6614-0000 (Test Framework (Python))
> at [email protected]:54818 to
> slave 20150617-113405-16777343-5050-6614-S0 at slave(1)@127.0.0.1:5051
> (localhost)
>
> I0617 11:34:08.583075 256425984 status_update_manager.cpp:389] Received
> status update acknowledgement (UUID: 3fec98f2-9f50-4968-9708-c7663f36b62d)
> for task 1 of framework 20150617-113405-16777343-5050-6614-0000
>
> I0617 11:34:09.446701 184954880 master.cpp:3760] Sending 1 offers to
> framework 20150617-113405-16777343-5050-6614-0000 (Test Framework (Python))
> at [email protected]:54818
>
> Thanks,
> Christopher
>
> On Jun 17, 2015, at 1:54 PM, Vinod Kone <[email protected]> wrote:
>
> Looks like the hierarchical allocator doesn't trigger an allocation when
> resources are recovered from a finished task (likely a bug; can you file a
> ticket?). Instead it depends on the periodic allocation interval (default
> 1s, configurable via flags.allocation_interval) for the next allocation. In
> the meantime, you can reduce the default allocation interval via the flag
> to speed it up.
>
> On Wed, Jun 17, 2015 at 11:59 AM, Christopher Ketchum <[email protected]>
> wrote:
>
>> You can see there is about a second delay between the last two messages.
>> It's not a huge amount of time but it is noticeable, especially when testing
>> with many short tasks.
>>
>> I0617 11:34:08.582778 185491456 master.cpp:4690] Removing task 1 with
>> resources cpus(*):3.9 of framework 20150617-113405-16777343-5050-6614-0000
>> on slave 20150617-113405-16777343-5050-6614-S0 at slave(1)@127.0.0.1:5051
>> (localhost)
>>
>> I0617 11:34:08.582839 185491456 master.cpp:2787] Forwarding status update
>> acknowledgement 3fec98f2-9f50-4968-9708-c7663f36b62d for task 1 of
>> framework 20150617-113405-16777343-5050-6614-0000 (Test Framework (Python))
>> at [email protected]:54818 to
>> slave 20150617-113405-16777343-5050-6614-S0 at slave(1)@127.0.0.1:5051
>> (localhost)
>>
>> I0617 11:34:09.446701 184954880 master.cpp:3760] Sending 1 offers to
>> framework 20150617-113405-16777343-5050-6614-0000 (Test Framework (Python))
>> at [email protected]:54818
>>
>> On Jun 17, 2015, at 10:18 AM, Vinod Kone <[email protected]> wrote:
>>
>> Can you paste the master logs for when the task is finished and the next
>> offer is sent?
>>
>> On Wed, Jun 17, 2015 at 9:11 AM, Christopher Ketchum <[email protected]>
>> wrote:
>>
>>> Hi everyone,
>>>
>>> Thanks for the responses. To clarify, I’m only running one framework
>>> with a single slave for testing purposes, and it is the re-offers that I am
>>> trying to adjust.
>>> When I watch the program run I see tasks updating to
>>> TASK_FINISHED, but there is a noticeable delay where my framework has the
>>> next task queued but the master has not yet reoffered those resources, so
>>> the program pauses until it gets the next offer.
>>>
>>> I am mainly concerned that I haven’t configured something properly, and
>>> when I scale up the delays will compound. Of course, it is also possible
>>> that with multiple slaves able to offer resources these delays will
>>> disappear.
>>>
>>> Thanks again,
>>> Christopher
>>>
>>> On Jun 14, 2015, at 8:11 AM, Alex Gaudio <[email protected]> wrote:
>>>
>>> Hi Christopher,
>>>
>>> To let a particular mesos framework receive more offers than other
>>> frameworks, we assign our frameworks weights. The higher the weight, the
>>> more frequently the framework will receive an offer. See the '--weights'
>>> and '--roles' options in the config:
>>> http://mesos.apache.org/documentation/latest/configuration/.
>>> Basically, a weight greater than 1 means more offers get sent to your
>>> framework. The mesos source code for how weighting works is shown here:
>>>
>>> https://github.com/apache/mesos/blob/9e7b890a917fcf0ac4cd1738f060ba97af847b65/src/master/allocator/sorter/drf/sorter.cpp#L306
>>> and
>>> https://github.com/apache/mesos/blob/9e7b890a917fcf0ac4cd1738f060ba97af847b65/src/master/allocator/sorter/drf/sorter.cpp#L41
>>> .
>>>
>>> What you may want to do is create a "role" called "development_mode" and
>>> then assign the role a high weight (like 40). You would then assign your
>>> framework to the "development_mode" role. What we've actually done is
>>> create roles named after the numbers 1,2,3,4,5,10,20,30,40, where each role
>>> maps to a weight of that number ... and we then allow frameworks to choose
>>> which role they start up as. At Mesoscon, I will be speaking about why we
>>> do this and how we are solving some general issues with the DRF algorithm,
>>> if you're interested!
>>>
>>> Alex
>>>
>>>
>>>
>>> On Sun, Jun 14, 2015 at 5:58 AM Alex Rukletsov <[email protected]>
>>> wrote:
>>>
>>>> Christopher,
>>>>
>>>> try adjusting the master allocation_interval flag. It specifies how often
>>>> the allocator performs batch allocations to frameworks. As Ondrej pointed
>>>> out, if your framework explicitly declines offers, it won't be re-offered
>>>> the same resources for some period of time.
>>>>
>>>> On Sat, Jun 13, 2015 at 8:30 PM, Ondrej Smola <[email protected]>
>>>> wrote:
>>>>
>>>>> Hi Christopher,
>>>>>
>>>>> I don't know of any way to speed up the first resource offer -
>>>>> in my experience new offers arrive almost immediately after framework
>>>>> registration. It depends on the infrastructure you are testing your
>>>>> framework on - are there any
>>>>> other frameworks running? As is discussed in another thread, offers
>>>>> should be sent to multiple frameworks at once. There may be a small
>>>>> delay based on initial registration and network delay. If you speak
>>>>> about "reoffers" - reoffering
>>>>> declined offers - there should be a param to set the interval for
>>>>> reoffers. For example, in Go you can decline an offer this way (it is
>>>>> also important to decline every unused offer):
>>>>>
>>>>> driver.DeclineOffer(offer.Id, &mesos.Filters{RefuseSeconds:
>>>>> proto.Float64(5)})
>>>>>
>>>>> Look at the mesos UI - it should give you information about what offers
>>>>> are offered to which frameworks; the mesos master logs also give you this
>>>>> information.
>>>>>
>>>>>
>>>>> 2015-06-13 18:23 GMT+02:00 Christopher Ketchum <[email protected]>:
>>>>> > Hi,
>>>>> >
>>>>> > I was wondering if there was any way to adjust the rate of resource
>>>>> > offers to the framework. I am writing a mesos framework, and when I am
>>>>> > testing it I am noticing a slight pause where the framework seems to be
>>>>> > waiting for another resource offer. I would like to know if there is any
>>>>> > way to speed these offers up, just to make testing a little faster.
>>>>> >
>>>>> > Thanks,
>>>>> > Chris

