Re: Metric for tasks queued/waiting?

Niklas Nielsen Wed, 23 Sep 2015 14:07:00 -0700

I'd love to see this solved in a general way: "How does the framework
communicate (insert intent, metric, hint, etc.) to Mesos?"


In a way, the 'webui_url' in the framework info already conveys "This is how
you get to my web UI", since providing a web UI was a common pattern for
frameworks.

This could be expanded, so the framework can report an 'apiui_url' or maybe
even more specific "metrics_url" where the mesos master (or other
frameworks and 3rd party tooling) can get insights into queue depths,
resource preferences, etc.
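
As a rough sketch of what such a "metrics_url" could serve and how a consumer
might combine the answers from several frameworks: note that the field itself,
the JSON keys, and both function names below are hypothetical. Nothing like
this exists in the framework info today.

```python
import json


def framework_metrics(name, pending_tasks):
    """Summarize one scheduler's pending queue (hypothetical payload shape)."""
    return {
        "framework": name,
        "queue_depth": len(pending_tasks),
        "pending_cpus": sum(t["cpus"] for t in pending_tasks),
        "pending_mem_mb": sum(t["mem_mb"] for t in pending_tasks),
    }


def total_demand(reports):
    """Add up the per-framework reports into one cluster-wide view."""
    totals = {"queue_depth": 0, "pending_cpus": 0.0, "pending_mem_mb": 0}
    for report in reports:
        for key in totals:
            totals[key] += report[key]
    return totals


if __name__ == "__main__":
    # Made-up queues for two made-up frameworks.
    reports = [
        framework_metrics("batch", [{"cpus": 1.0, "mem_mb": 512},
                                    {"cpus": 0.5, "mem_mb": 256}]),
        framework_metrics("services", []),
    ]
    print(json.dumps(total_demand(reports), indent=2))
```

The master, an autoscaler, or other tooling would fetch one such report per
framework and aggregate them, which keeps the queue knowledge in the schedulers
while still giving a global view.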

We can start discussing this further in a JIRA ticket :)

Niklas

On 23 September 2015 at 13:54, Alex Gaudio <[email protected]> wrote:

> Hi Aaron,
>
> You might consider trying to solve the autoscaling problem with Relay, a
> Python tool I use to solve this problem.  Feel free to shoot me an email if
> you are interested.
>
> github.com/sailthru/relay
>
> Alex
>
> On Wed, Sep 23, 2015, 11:03 AM David Greenberg <[email protected]>
> wrote:
>
>> In addition, this technique could be implemented in the allocator with an
>> understanding of global demand:
>> https://www.youtube.com/watch?v=BkBMYUe76oI
>>
>> That would allow for tunable fair-sharing based on DRF-principles.
>>
>> On Wed, Sep 23, 2015 at 10:59 AM haosdent <[email protected]> wrote:
>>
>>> Feel free to open a story in JIRA if you think your ideas are awesome. :-)
>>>
>>> On Sep 23, 2015 10:54 PM, "Sharma Podila" <[email protected]> wrote:
>>>
>>>> Ah, OK, thanks. Yes, Fenzo is a Java library.
>>>>
>>>> It might be a nice addition to Mesos master to get a global view of
>>>> contention for resources. In addition to autoscaling, it would be useful in
>>>> the allocator.
>>>>
>>>>
>>>>
>>>> On Wed, Sep 23, 2015 at 7:29 AM, Aaron Carey <[email protected]> wrote:
>>>>
>>>>> Thanks Sharma,
>>>>>
>>>>> I was in the audience for a talk you did about Fenzo at MesosCon :) It
>>>>> looked great but we're a python shop primarily so the Java requirement
>>>>> would be a problem for us.
>>>>>
>>>>> The scaling in the scheduler makes total sense, (obvious when you
>>>>> think about it!), I was naively hoping for some sort of knowledge of that
>>>>> back in the Mesos master as we were hoping to have scaling be independent
>>>>> of schedulers. I think this'll need a re-think!
>>>>>
>>>>> Thanks for your help!
>>>>>
>>>>> Aaron
>>>>>
>>>>> ------------------------------
>>>>> *From:* Sharma Podila [[email protected]]
>>>>> *Sent:* 23 September 2015 15:22
>>>>>
>>>>> *To:* [email protected]
>>>>> *Subject:* Re: Metric for tasks queued/waiting?
>>>>>
>>>>> Jobs/tasks wait in framework schedulers, not mesos master. Autoscaling
>>>>> triggers must come from schedulers, not only because that's who knows the
>>>>> pending task set size, but, also because it knows how many of them need to
>>>>> be launched right away, on what kind of machines.
>>>>>
>>>>> We built such an autoscaling capability in our framework schedulers.
>>>>> The autoscaling is achieved by our library Fenzo
>>>>> <https://github.com/Netflix/Fenzo> which we open sourced recently.
>>>>> Also read about Fenzo autoscaling here
>>>>> <https://github.com/Netflix/Fenzo/wiki/Autoscaling>. You should look
>>>>> into using that if you are developing your own scheduler. Or, have your
>>>>> scheduler team pick up Fenzo for autoscaling.
>>>>>
>>>>> Also, note that scaling up is temptingly easy by watching the pending
>>>>> task queue. But, scaling down requires bin packing, etc. Other issues pop
>>>>> up as well, for example:
>>>>>
>>>>> - what if a user submits tasks that cannot be satisfied? Will
>>>>> autoscale keep increasing the cluster size unbounded?
>>>>> - what if you would like to have a heterogeneous mix of hosts and
>>>>> tasks? which kind of hosts do you need to autoscale based on which tasks
>>>>> are pending?
>>>>>
>>>>> These are automatically addressed in Fenzo.
>>>>>
>>>>> Sharma
>>>>>
>>>>>
>>>>> On Wed, Sep 23, 2015 at 4:56 AM, Aaron Carey <[email protected]> wrote:
>>>>>
>>>>>> No, I basically had the same question as Jim (but maybe didn't word
>>>>>> it so well ;))
>>>>>>
>>>>>> I'll have a look at your response there :)
>>>>>>
>>>>>> ------------------------------
>>>>>> *From:* haosdent [[email protected]]
>>>>>> *Sent:* 23 September 2015 10:12
>>>>>> *To:* [email protected]
>>>>>> *Subject:* Re: Metric for tasks queued/waiting?
>>>>>>
>>>>>> Does /metrics/snapshot not satisfy your requirement?
>>>>>>
>>>>>> On Wed, Sep 23, 2015 at 4:50 PM, Aaron Carey <[email protected]> wrote:
>>>>>>
>>>>>>> Hi all,
>>>>>>>
>>>>>>> Is there any way to get a metric of all tasks currently
>>>>>>> waiting/queued in Mesos (across all schedulers)? The snapshot metrics
>>>>>>> seem to cover every other kind of task state? This would be quite
>>>>>>> useful for auto-scaling purposes..
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Aaron
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Best Regards,
>>>>>> Haosdent Huang
>>>>>>
>>>>>
>>>>>
>>>>
