Christopher,

Have you tried adjusting the master's allocation_interval flag?
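
To check that the pause lines up with the allocation interval rather than
with something in your framework, you could measure the gap between the
"Recovered" and "Sending 1 offers" lines in your master log. Below is a
minimal Python sketch using the two timestamps from your message. The helper
name glog_time is made up for illustration, and it assumes both lines were
logged on the same day.

```python
from datetime import datetime


def glog_time(line):
    """Parse the time of day from a glog line like 'I0617 11:34:08.582051 ...'."""
    # Field 0 is severity plus month/day; field 1 is HH:MM:SS.microseconds.
    return datetime.strptime(line.split()[1], "%H:%M:%S.%f")


# Line prefixes copied from the master log quoted in this thread.
recovered = "I0617 11:34:08.582051 188174336 hierarchical.hpp:648] Recovered"
offered = "I0617 11:34:09.446701 184954880 master.cpp:3760] Sending 1 offers to"

# Assumption: both lines are from the same day, so no midnight rollover.
gap = (glog_time(offered) - glog_time(recovered)).total_seconds()
print(f"gap between recovery and next offer: {gap:.6f}s")
```

For these two lines the gap is about 0.86s, which is under the 1s default
and consistent with the offer waiting for the next batch allocation. If that
is what is happening, starting the master with a smaller value (for example
--allocation_interval=100ms, an arbitrary value chosen for illustration)
should shrink the pause.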

On Thu, Jun 18, 2015 at 12:20 AM, Christopher Ketchum <[email protected]>
wrote:

> Hi,
>
> I think those logs were misleading, sorry. I am running the tests in
> PyCharm, which aggregates all the logs onto one console, so I selected only
> the Mesos messages that explicitly said they were from the master. Here are
> those logs without my editing. Again, the last two messages are almost a
> second apart. The resources are recovered very quickly, but not offered up
> for another second. Is this delay to try to increase the offer size? Is
> that delay adjustable?
>
> I0617 11:34:08.581996 184418304 master.cpp:4623] Updating the latest state
> of task 1 of framework 20150617-113405-16777343-5050-6614-0000 to
> TASK_FINISHED
>
> I0617 11:34:08.582051 188174336 hierarchical.hpp:648] Recovered
> cpus(*):3.9 (total allocatable: mem(*):7136; disk(*):109424;
> ports(*):[31000-32000]; cpus(*):3.9) on slave
> 20150617-113405-16777343-5050-6614-S0 from framework
> 20150617-113405-16777343-5050-6614-0000
>
> I0617 11:34:08.582778 185491456 master.cpp:4690] Removing task 1 with
> resources cpus(*):3.9 of framework 20150617-113405-16777343-5050-6614-0000
> on slave 20150617-113405-16777343-5050-6614-S0 at slave(1)@127.0.0.1:5051
> (localhost)
>
> I0617 11:34:08.582839 185491456 master.cpp:2787] Forwarding status update
> acknowledgement 3fec98f2-9f50-4968-9708-c7663f36b62d for task 1 of
> framework 20150617-113405-16777343-5050-6614-0000 (Test Framework (Python))
> at [email protected]:54818 to
> slave 20150617-113405-16777343-5050-6614-S0 at slave(1)@127.0.0.1:5051
> (localhost)
>
> I0617 11:34:08.583075 256425984 status_update_manager.cpp:389] Received
> status update acknowledgement (UUID: 3fec98f2-9f50-4968-9708-c7663f36b62d)
> for task 1 of framework 20150617-113405-16777343-5050-6614-0000
>
> I0617 11:34:09.446701 184954880 master.cpp:3760] Sending 1 offers to
> framework 20150617-113405-16777343-5050-6614-0000 (Test Framework (Python))
> at [email protected]:54818
>
> Thanks,
> Christopher
>
> On Jun 17, 2015, at 1:54 PM, Vinod Kone <[email protected]> wrote:
>
> Looks like the hierarchical allocator doesn't trigger an allocation when
> resources are recovered from a finished task (likely a bug; can you file a
> ticket?). Instead, it depends on the periodic allocation interval (default
> 1s, configurable via flags.allocation_interval) for the next allocation. In
> the meantime, you can reduce the allocation interval via the flag to speed
> it up.
>
> On Wed, Jun 17, 2015 at 11:59 AM, Christopher Ketchum <[email protected]>
> wrote:
>
>> You can see there is about a second's delay between the last two messages.
>> It's not a huge amount of time, but it is noticeable, especially when
>> testing with many short tasks.
>>
>> I0617 11:34:08.582778 185491456 master.cpp:4690] Removing task 1 with
>> resources cpus(*):3.9 of framework 20150617-113405-16777343-5050-6614-0000
>> on slave 20150617-113405-16777343-5050-6614-S0 at slave(1)@127.0.0.1:5051
>> (localhost)
>>
>> I0617 11:34:08.582839 185491456 master.cpp:2787] Forwarding status update
>> acknowledgement 3fec98f2-9f50-4968-9708-c7663f36b62d for task 1 of
>> framework 20150617-113405-16777343-5050-6614-0000 (Test Framework (Python))
>> at [email protected]:54818 to
>> slave 20150617-113405-16777343-5050-6614-S0 at slave(1)@127.0.0.1:5051
>> (localhost)
>>
>> I0617 11:34:09.446701 184954880 master.cpp:3760] Sending 1 offers to
>> framework 20150617-113405-16777343-5050-6614-0000 (Test Framework (Python))
>> at [email protected]:54818
>>
>> On Jun 17, 2015, at 10:18 AM, Vinod Kone <[email protected]> wrote:
>>
>> Can you paste the master logs for when the task is finished and the next
>> offer is sent?
>>
>> On Wed, Jun 17, 2015 at 9:11 AM, Christopher Ketchum <[email protected]>
>> wrote:
>>
>>> Hi everyone,
>>>
>>> Thanks for the responses. To clarify, I’m only running one framework
>>> with a single slave for testing purposes, and it is the re-offers that I am
>>> trying to adjust. When I watch the program run I see tasks updating to
>>> TASK_FINISHED, but there is a noticeable delay where my framework has the
>>> next task queued but the master has not yet reoffered those resources, so
>>> the program pauses until it gets the next offer.
>>>
>>> I am mainly concerned that I haven’t configured something properly, and
>>> when I scale up the delays will compound. Of course, it is also possible
>>> that with multiple slaves able to offer resources these delays will
>>> disappear.
>>>
>>> Thanks again,
>>> Christopher
>>>
>>> On Jun 14, 2015, at 8:11 AM, Alex Gaudio <[email protected]> wrote:
>>>
>>> Hi Christopher,
>>>
>>> To let a particular mesos framework receive more offers than other
>>> frameworks, we assign our frameworks weights.  The higher the weight, the
>>> more frequently the framework will receive an offer.  See the '--weights'
>>> and '--roles' options in the config:
>>> http://mesos.apache.org/documentation/latest/configuration/.
>>> Basically, a higher weight > 1 means more offers get sent to your
>>> framework.  The mesos source code for how weighting works is shown here:
>>>
>>> https://github.com/apache/mesos/blob/9e7b890a917fcf0ac4cd1738f060ba97af847b65/src/master/allocator/sorter/drf/sorter.cpp#L306
>>> and
>>> https://github.com/apache/mesos/blob/9e7b890a917fcf0ac4cd1738f060ba97af847b65/src/master/allocator/sorter/drf/sorter.cpp#L41
>>> .
>>>
>>> What you may want to do is create a "role" called "development_mode" and
>>> then assign the role a high weight (like 40).  You would then assign your
>>> framework to the "development_mode" role.  What we've actually done is
>>> created roles named the numbers 1,2,3,4,5,10,20,30,40, where each role maps
>>> to a weight of that number ... and we then allow frameworks to choose
>>> which role they start up as.  At Mesoscon, I will be speaking about why we
>>> do this and how we are solving some general issues with the DRF algorithm,
>>> if you're interested!
>>>
>>> Alex
>>>
>>>
>>>
>>> On Sun, Jun 14, 2015 at 5:58 AM Alex Rukletsov <[email protected]>
>>> wrote:
>>>
>>>> Christopher,
>>>>
>>>> try adjusting the master's allocation_interval flag. It specifies how
>>>> often the allocator performs batch allocations to frameworks. As Ondrej
>>>> pointed out, if your framework explicitly declines offers, it won't be
>>>> re-offered the same resources for some period of time.
>>>>
>>>> On Sat, Jun 13, 2015 at 8:30 PM, Ondrej Smola <[email protected]>
>>>> wrote:
>>>>
>>>>> Hi Christopher,
>>>>>
>>>>> I don't know of any way to speed up the first resource offer. In my
>>>>> experience, new offers arrive almost immediately after framework
>>>>> registration. It depends on the infrastructure you are testing your
>>>>> framework on: are there any other frameworks running? As discussed in
>>>>> another thread, offers should be sent to multiple frameworks at once.
>>>>> There may be a small delay from initial registration and network
>>>>> latency. If you mean "reoffers", that is, re-offering declined offers,
>>>>> there should be a parameter to set the reoffer interval. For example,
>>>>> in Go you can decline an offer this way (it is also important to
>>>>> decline every unused offer):
>>>>>
>>>>> driver.DeclineOffer(offer.Id, &mesos.Filters{RefuseSeconds:
>>>>> proto.Float64(5)})
>>>>>
>>>>> Look at the Mesos UI: it should show you which offers are made to
>>>>> which frameworks. The Mesos master logs also give you this
>>>>> information.
>>>>>
>>>>>
>>>>> 2015-06-13 18:23 GMT+02:00 Christopher Ketchum <[email protected]>:
>>>>> > Hi,
>>>>> >
>>>>> > I was wondering if there is any way to adjust the rate of resource
>>>>> > offers to the framework. I am writing a Mesos framework, and when I
>>>>> > test it I notice a slight pause where the framework seems to be
>>>>> > waiting for another resource offer. I would like to know if there is
>>>>> > any way to speed these offers up, just to make testing a little faster.
>>>>> >
>>>>> > Thanks,
>>>>> > Chris
>>>>>
>>>>
>>>>
>>>
>>
>>
>
>
