Discussing this in a separate place/JIRA ticket sounds good. Basically, a summary of pending resource requests from each framework could represent contention and serve as a hint to the Mesos master. However, this gets into intricacies, not the least of which is the diversity of resource requests, qualified by queue depth. Another way to think of this is that each framework could trigger a scale up individually (say, by hitting an endpoint on the Mesos master or on another independent service to add agents/slaves). Even uncoordinated scale up actions from multiple frameworks should lead to the same end result, modulo reservations/limits/etc. Then the Mesos master needs to deal only with scale down, which it could perform based on offer rejections from frameworks, which imply that nobody needs that many agents/slaves.
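To make the split a bit more concrete, here is a rough Python sketch of the idea. It is a toy model only, not Mesos code: the ToyAutoscaler class, its method names, and the max_agents cap are all made up for illustration, and "declined by every framework" is just one assumed way to read offer rejections.

```python
class ToyAutoscaler:
    """Toy model (hypothetical, not a Mesos API): frameworks push scale-up
    requests on their own; scale-down is inferred from declined offers."""

    def __init__(self, agents, frameworks, max_agents):
        self.agents = set(agents)
        self.frameworks = set(frameworks)
        self.max_agents = max_agents  # assumed cluster-wide limit
        self.declined = {}            # agent -> frameworks that declined its offer
        self._next_id = 0

    def scale_up(self, framework, extra_agents):
        """Called by any framework independently; the cap bounds the total.
        The framework name is accepted only so a real service could log it."""
        added = []
        for _ in range(extra_agents):
            if len(self.agents) >= self.max_agents:
                break
            name = f"new-agent-{self._next_id}"
            self._next_id += 1
            self.agents.add(name)
            added.append(name)
        return added

    def record_decline(self, framework, agent):
        """A framework rejected an offer for this agent."""
        self.declined.setdefault(agent, set()).add(framework)

    def record_accept(self, framework, agent):
        """Someone used the agent, so forget earlier rejections."""
        self.declined.pop(agent, None)

    def scale_down_candidates(self):
        """Agents whose offers every known framework has declined."""
        return sorted(
            a for a in self.agents
            if self.declined.get(a, set()) >= self.frameworks
        )


scaler = ToyAutoscaler(["a1", "a2"], ["fw1", "fw2"], max_agents=4)
scaler.scale_up("fw1", 1)  # fw1 asks for one more agent
scaler.scale_up("fw2", 5)  # fw2 asks for five, but the cap allows only one
scaler.record_decline("fw1", "a1")
scaler.record_decline("fw2", "a1")
print(scaler.scale_down_candidates())
```

The cap stands in for the reservations/limits caveat, and a real version would also have to deal with the bin packing that scale down requires.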
Maybe that's more detail than needed in this discussion...

On Wed, Sep 23, 2015 at 2:05 PM, Niklas Nielsen <[email protected]> wrote:

> I'd love to see this solved in a general way; "How does the framework
> communicate (insert intent, metric, hint, etc) to mesos".
>
> In one way, the 'webui_url' in the framework info conveys "This is how
> you get to my web ui", as providing a webui was a common pattern for the
> frameworks.
>
> This could be expanded, so the framework can report an 'apiui_url' or
> maybe even more specific "metrics_url" where the mesos master (or other
> frameworks and 3rd party tooling) can get insights into queue depths,
> resource preferences, etc.
>
> We can start discussing this further in a JIRA ticket :)
>
> Niklas
>
> On 23 September 2015 at 13:54, Alex Gaudio <[email protected]> wrote:
>
>> Hi Aaron,
>>
>> You might consider trying to solve the autoscaling problem with Relay, a
>> Python tool I use to solve this problem. Feel free to shoot me an email if
>> you are interested.
>>
>> github.com/sailthru/relay
>>
>> Alex
>>
>> On Wed, Sep 23, 2015, 11:03 AM David Greenberg <[email protected]>
>> wrote:
>>
>>> In addition, this technique could be implemented in the allocator with
>>> an understanding of global demand:
>>> https://www.youtube.com/watch?v=BkBMYUe76oI
>>>
>>> That would allow for tunable fair-sharing based on DRF-principles.
>>>
>>> On Wed, Sep 23, 2015 at 10:59 AM haosdent <[email protected]> wrote:
>>>
>>>> Feel free to open a story in jira if you think your ideas are awesome.
>>>> :-)
>>>>
>>> On Sep 23, 2015 10:54 PM, "Sharma Podila" <[email protected]> wrote:
>>>>
>>>>> Ah, OK, thanks. Yes, Fenzo is a Java library.
>>>>>
>>>>> It might be a nice addition to Mesos master to get a global view of
>>>>> contention for resources. In addition to autoscaling, it would be
>>>>> useful in the allocator.
>>>>>
>>>>> On Wed, Sep 23, 2015 at 7:29 AM, Aaron Carey <[email protected]> wrote:
>>>>>
>>>>>> Thanks Sharma,
>>>>>>
>>>>>> I was in the audience for a talk you did about Fenzo at MesosCon :)
>>>>>> It looked great but we're a python shop primarily so the Java requirement
>>>>>> would be a problem for us.
>>>>>>
>>>>>> The scaling in the scheduler makes total sense, (obvious when you
>>>>>> think about it!), I was naively hoping for some sort of knowledge of that
>>>>>> back in the Mesos master as we were hoping to have scaling be independent
>>>>>> of schedulers. I think this'll need a re-think!
>>>>>>
>>>>>> Thanks for your help!
>>>>>>
>>>>>> Aaron
>>>>>>
>>>>>> ------------------------------
>>>>>> *From:* Sharma Podila [[email protected]]
>>>>>> *Sent:* 23 September 2015 15:22
>>>>>> *To:* [email protected]
>>>>>> *Subject:* Re: Metric for tasks queued/waiting?
>>>>>>
>>>>>> Jobs/tasks wait in framework schedulers, not mesos master.
>>>>>> Autoscaling triggers must come from schedulers, not only because that's
>>>>>> who knows the pending task set size, but also because it knows how many of
>>>>>> them need to be launched right away, on what kind of machines.
>>>>>>
>>>>>> We built such an autoscaling capability in our framework schedulers.
>>>>>> The autoscaling is achieved by our library Fenzo
>>>>>> <https://github.com/Netflix/Fenzo> which we open sourced recently.
>>>>>> Also read about Fenzo autoscaling here
>>>>>> <https://github.com/Netflix/Fenzo/wiki/Autoscaling>. You should look
>>>>>> into using that if you are developing your own scheduler. Or, have your
>>>>>> scheduler team pick up Fenzo for autoscaling.
>>>>>>
>>>>>> Also, note that scaling up is temptingly easy by watching the pending
>>>>>> task queue. But, scaling down requires bin packing, etc. Other issues pop
>>>>>> up as well, for example:
>>>>>>
>>>>>> - what if a user submits tasks that cannot be satisfied? Will
>>>>>> autoscale keep increasing the cluster size unbounded?
>>>>>> - what if you would like to have a heterogeneous mix of hosts and
>>>>>> tasks? which kind of hosts do you need to autoscale based on which tasks
>>>>>> are pending?
>>>>>>
>>>>>> These are automatically addressed in Fenzo.
>>>>>>
>>>>>> Sharma
>>>>>>
>>>>>> On Wed, Sep 23, 2015 at 4:56 AM, Aaron Carey <[email protected]> wrote:
>>>>>>
>>>>>>> No, I basically had the same question as Jim (but maybe didn't word
>>>>>>> it so well ;))
>>>>>>>
>>>>>>> I'll have a look at your response there :)
>>>>>>>
>>>>>>> ------------------------------
>>>>>>> *From:* haosdent [[email protected]]
>>>>>>> *Sent:* 23 September 2015 10:12
>>>>>>> *To:* [email protected]
>>>>>>> *Subject:* Re: Metric for tasks queued/waiting?
>>>>>>>
>>>>>>> Does /metrics/snapshot not satisfy your requirement?
>>>>>>>
>>>>>>> On Wed, Sep 23, 2015 at 4:50 PM, Aaron Carey <[email protected]> wrote:
>>>>>>>
>>>>>>>> Hi all,
>>>>>>>>
>>>>>>>> Is there any way to get a metric of all tasks currently
>>>>>>>> waiting/queued in Mesos (across all schedulers)? The snapshot metrics
>>>>>>>> seem to cover every other kind of task state? This would be quite
>>>>>>>> useful for auto-scaling purposes..
>>>>>>>>
>>>>>>>> Thanks,
>>>>>>>> Aaron
>>>>>>>
>>>>>>> --
>>>>>>> Best Regards,
>>>>>>> Haosdent Huang

