Re: Metric for tasks queued/waiting?

Niklas Nielsen Wed, 23 Sep 2015 16:24:07 -0700

Created a ticket for us to continue the discussion:
https://issues.apache.org/jira/browse/MESOS-3507


We can try to capture the explicit use-case from Aaron and maybe create
another ticket to track a more-or-less generic path we could go down.

Cheers,
Niklas

On 23 September 2015 at 15:55, Sharma Podila <[email protected]> wrote:

> Discussing in a separate place/JIRA ticket sounds good.
> Basically, a summary of pending resource requests from each framework
> could serve as contention hints to the mesos master. However, this gets
> into intricacies, not the least of which is the diversity of resource
> requests, qualified by queue depth.
> Another way to think of this could be that each framework could trigger a
> scale up individually (say, by hitting a mesos master or another
> independent service's endpoint to add additional agents/slaves). Even
> uncoordinated scale up actions from multiple frameworks should result in
> the same end result, modulo reservations/limits/etc. Then, mesos master
> needs to deal with only scale down, which it could perform based on offer
> rejections from frameworks, implying nobody needs that many agents/slaves.
>
> Maybe that's more detail than needed in this discussion...
>
>
>
> On Wed, Sep 23, 2015 at 2:05 PM, Niklas Nielsen <[email protected]>
> wrote:
>
>> I'd love to see this solved in a general way; "How does the framework
>> communicate (insert intent, metric, hint, etc) to mesos".
>>
>> In one way, the 'webui_url' in the framework info conveys "This is how
>> you get to my web ui", as providing a webui was a common pattern for the
>> frameworks.
>>
>> This could be expanded, so the framework can report an 'apiui_url' or
>> maybe even more specific "metrics_url" where the mesos master (or other
>> frameworks and 3rd party tooling) can get insights into queue depths,
>> resource preferences, etc.
>>
>> We can start discussing this further in a JIRA ticket :)
>>
>> Niklas
>>
>> On 23 September 2015 at 13:54, Alex Gaudio <[email protected]> wrote:
>>
>>> Hi Aaron,
>>>
>>> You might consider trying to solve the autoscaling problem with Relay, a
>>> Python tool I use to solve this problem.  Feel free to shoot me an email if
>>> you are interested.
>>>
>>> github.com/sailthru/relay
>>>
>>> Alex
>>>
>>> On Wed, Sep 23, 2015, 11:03 AM David Greenberg <[email protected]>
>>> wrote:
>>>
>>>> In addition, this technique could be implemented in the allocator with
>>>> an understanding of global demand:
>>>> https://www.youtube.com/watch?v=BkBMYUe76oI
>>>>
>>>> That would allow for tunable fair-sharing based on DRF-principles.
>>>>
>>>> On Wed, Sep 23, 2015 at 10:59 AM haosdent <[email protected]> wrote:
>>>>
>>>>> Feel free to open a story in jira if you think your ideas are awesome.
>>>>> :-)
>>>>>
>>>>> On Sep 23, 2015 10:54 PM, "Sharma Podila" <[email protected]> wrote:
>>>>>
>>>>>> Ah, OK, thanks. Yes, Fenzo is a Java library.
>>>>>>
>>>>>> It might be a nice addition to Mesos master to get a global view of
>>>>>> contention for resources. In addition to autoscaling, it would be useful
>>>>>> in the allocator.
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Wed, Sep 23, 2015 at 7:29 AM, Aaron Carey <[email protected]> wrote:
>>>>>>
>>>>>>> Thanks Sharma,
>>>>>>>
>>>>>>> I was in the audience for a talk you did about Fenzo at MesosCon :)
>>>>>>> It looked great but we're a Python shop primarily so the Java
>>>>>>> requirement would be a problem for us.
>>>>>>>
>>>>>>> The scaling in the scheduler makes total sense (obvious when you
>>>>>>> think about it!). I was naively hoping for some sort of knowledge of
>>>>>>> that back in the Mesos master, as we were hoping to have scaling be
>>>>>>> independent of schedulers. I think this'll need a re-think!
>>>>>>>
>>>>>>> Thanks for your help!
>>>>>>>
>>>>>>> Aaron
>>>>>>>
>>>>>>> ------------------------------
>>>>>>> *From:* Sharma Podila [[email protected]]
>>>>>>> *Sent:* 23 September 2015 15:22
>>>>>>>
>>>>>>> *To:* [email protected]
>>>>>>> *Subject:* Re: Metric for tasks queued/waiting?
>>>>>>>
>>>>>>> Jobs/tasks wait in framework schedulers, not mesos master.
>>>>>>> Autoscaling triggers must come from schedulers, not only because the
>>>>>>> scheduler knows the pending task set size, but also because it knows
>>>>>>> how many of them need to be launched right away, and on what kind of
>>>>>>> machines.
>>>>>>>
>>>>>>> We built such an autoscaling capability in our framework schedulers.
>>>>>>> The autoscaling is achieved by our library Fenzo
>>>>>>> <https://github.com/Netflix/Fenzo> which we open sourced recently.
>>>>>>> Also read about Fenzo autoscaling here
>>>>>>> <https://github.com/Netflix/Fenzo/wiki/Autoscaling>. You should
>>>>>>> look into using that if you are developing your own scheduler. Or, have
>>>>>>> your scheduler team pick up Fenzo for autoscaling.
>>>>>>>
>>>>>>> Also, note that scaling up is temptingly easy by watching the
>>>>>>> pending task queue. But, scaling down requires bin packing, etc. Other
>>>>>>> issues pop up as well, for example:
>>>>>>>
>>>>>>> - what if a user submits tasks that cannot be satisfied? Will
>>>>>>> autoscale keep increasing the cluster size unbounded?
>>>>>>> - what if you would like to have a heterogeneous mix of hosts and
>>>>>>> tasks? Which kind of hosts do you need to autoscale based on which tasks
>>>>>>> are pending?
>>>>>>>
>>>>>>> These are automatically addressed in Fenzo.
>>>>>>>
>>>>>>> Sharma
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Sep 23, 2015 at 4:56 AM, Aaron Carey <[email protected]> wrote:
>>>>>>>
>>>>>>>> No, I basically had the same question as Jim (but maybe didn't word
>>>>>>>> it so well ;))
>>>>>>>>
>>>>>>>> I'll have a look at your response there :)
>>>>>>>>
>>>>>>>> ------------------------------
>>>>>>>> *From:* haosdent [[email protected]]
>>>>>>>> *Sent:* 23 September 2015 10:12
>>>>>>>> *To:* [email protected]
>>>>>>>> *Subject:* Re: Metric for tasks queued/waiting?
>>>>>>>>
>>>>>>>> Does /metrics/snapshot not satisfy your requirement?
>>>>>>>>
>>>>>>>> On Wed, Sep 23, 2015 at 4:50 PM, Aaron Carey <[email protected]>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hi all,
>>>>>>>>>
>>>>>>>>> Is there any way to get a metric of all tasks currently
>>>>>>>>> waiting/queued in Mesos (across all schedulers)? The snapshot metrics
>>>>>>>>> seem to cover every other kind of task state. This would be quite
>>>>>>>>> useful for auto-scaling purposes.
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Aaron
>>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Best Regards,
>>>>>>>> Haosdent Huang
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>
>>
>
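
For reference, on the /metrics/snapshot question at the start of the thread:
below is a minimal Python sketch of reading that endpoint and pulling out the
per-state task counters. It assumes the master is reachable at a URL you pass
in and that the snapshot is a flat JSON object with keys such as
"master/tasks_running"; the helper names fetch_snapshot and task_state_counts
are made up for illustration. Whatever it returns, it cannot include a
pending/queued count, because those tasks live in the framework schedulers and
never reach the master.

```python
import json
from urllib.request import urlopen


def fetch_snapshot(master_url, timeout=5.0):
    """GET <master_url>/metrics/snapshot and decode the JSON object it returns."""
    url = master_url.rstrip("/") + "/metrics/snapshot"
    with urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def task_state_counts(snapshot):
    """Keep only the 'master/tasks_<state>' entries, keyed by <state>."""
    prefix = "master/tasks_"
    return {
        key[len(prefix):]: int(value)
        for key, value in snapshot.items()
        if key.startswith(prefix)
    }


if __name__ == "__main__":
    # Hypothetical master address; replace with your own.
    counts = task_state_counts(fetch_snapshot("http://localhost:5050"))
    for state, count in sorted(counts.items()):
        print(state, count)
```

The filter is kept separate from the HTTP call so it can be run against a saved
snapshot without a live master.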

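The two scale-up pitfalls Sharma lists (unsatisfiable tasks growing the cluster
without bound, and choosing host types for a heterogeneous mix) can be
illustrated with a small planner. This is only a sketch under stated
assumptions, not Fenzo's algorithm or API: the HostType and Task classes, the
plan_scale_up function, the first-fit packing, and the max_new_hosts cap are all
hypothetical names and choices made for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HostType:
    name: str
    cpus: float
    mem_mb: float


@dataclass(frozen=True)
class Task:
    cpus: float
    mem_mb: float


def plan_scale_up(pending, host_types, max_new_hosts):
    """Return ({host type name: hosts to add}, [tasks left unplaced]).

    Tasks that fit no host type, or that would push past max_new_hosts,
    are reported back instead of triggering further growth.
    """
    by_size = sorted(host_types, key=lambda h: (h.cpus, h.mem_mb))
    new_hosts = []  # each entry: [HostType, free_cpus, free_mem_mb]
    unplaced = []
    for task in pending:
        # First fit: reuse spare room on a host we already plan to add.
        for slot in new_hosts:
            if task.cpus <= slot[1] and task.mem_mb <= slot[2]:
                slot[1] -= task.cpus
                slot[2] -= task.mem_mb
                break
        else:
            # Otherwise pick the smallest host type the task fits on.
            fit = next(
                (h for h in by_size
                 if task.cpus <= h.cpus and task.mem_mb <= h.mem_mb),
                None,
            )
            if fit is None or len(new_hosts) >= max_new_hosts:
                unplaced.append(task)
                continue
            new_hosts.append([fit, fit.cpus - task.cpus, fit.mem_mb - task.mem_mb])
    plan = {}
    for host, _, _ in new_hosts:
        plan[host.name] = plan.get(host.name, 0) + 1
    return plan, unplaced
```

A scheduler could call plan_scale_up on its own pending queue and hand the
resulting plan to whatever provisions agents. Scale-down is deliberately left
out, since, as noted in the thread, it is the harder half of the problem.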