Created a ticket for us to continue the discussion: https://issues.apache.org/jira/browse/MESOS-3507
We can try to capture the explicit use-case from Aaron and maybe create
another ticket to track a more-or-less generic path we could go down.

Cheers,
Niklas

On 23 September 2015 at 15:55, Sharma Podila <[email protected]> wrote:

> Discussing in a separate place/JIRA ticket sounds good.
> Basically, representing contention using a summary of pending resource
> requests from each framework could be the hints to mesos master. However,
> this gets into intricacies, not the least of which is diversity of resource
> requests, qualified by queue depth.
> Another way to think of this could be that each framework could trigger a
> scale up individually (say, by hitting a mesos master or another
> independent service's endpoint to add additional agents/slaves). Even
> uncoordinated scale up actions from multiple frameworks should result in
> the same end result, modulo reservations/limits/etc. Then, mesos master
> needs to deal with only scale down, which it could perform based on offer
> rejections from frameworks, implying nobody needs that many agents/slaves.
>
> Maybe that's more details than needed in this discussion...
>
>
>
> On Wed, Sep 23, 2015 at 2:05 PM, Niklas Nielsen <[email protected]>
> wrote:
>
>> I'd love to see this solved in a general way; "How does the framework
>> communicate (insert intent, metric, hint, etc) to mesos".
>>
>> In one way, the 'webui_url' in the framework info conveys "This is how
>> you get to my web ui". As providing a webui was a common pattern for the
>> frameworks.
>>
>> This could be expanded, so the framework can report an 'apiui_url' or
>> maybe even more specific "metrics_url" where the mesos master (or other
>> frameworks and 3rd party tooling) can get insights into queue depths,
>> resource preferences, etc.
>>
>> We can start discussing this further in a JIRA ticket :)
>>
>> Niklas
>>
>> On 23 September 2015 at 13:54, Alex Gaudio <[email protected]> wrote:
>>
>>> Hi Aaron,
>>>
>>> You might consider trying to solve the autoscaling problem with Relay, a
>>> Python tool I use to solve this problem. Feel free to shoot me an email if
>>> you are interested.
>>>
>>> github.com/sailthru/relay
>>>
>>> Alex
>>>
>>> On Wed, Sep 23, 2015, 11:03 AM David Greenberg <[email protected]>
>>> wrote:
>>>
>>>> In addition, this technique could be implemented in the allocator with
>>>> an understanding of global demand:
>>>> https://www.youtube.com/watch?v=BkBMYUe76oI
>>>>
>>>> That would allow for tunable fair-sharing based on DRF-principles.
>>>>
>>>> On Wed, Sep 23, 2015 at 10:59 AM haosdent <[email protected]> wrote:
>>>>
>>>>> Feel free to open a story in jira if you think your ideas are awesome.
>>>>> :-)
>>>>>
>>>> On Sep 23, 2015 10:54 PM, "Sharma Podila" <[email protected]> wrote:
>>>>>
>>>>>> Ah, OK, thanks. Yes, Fenzo is a Java library.
>>>>>>
>>>>>> It might be a nice addition to Mesos master to get a global view of
>>>>>> contention for resources. In addition to autoscaling, it would be useful
>>>>>> in the allocator.
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Wed, Sep 23, 2015 at 7:29 AM, Aaron Carey <[email protected]> wrote:
>>>>>>
>>>>>>> Thanks Sharma,
>>>>>>>
>>>>>>> I was in the audience for a talk you did about Fenzo at MesosCon :)
>>>>>>> It looked great but we're a python shop primarily so the Java
>>>>>>> requirement would be a problem for us.
>>>>>>>
>>>>>>> The scaling in the scheduler makes total sense, (obvious when you
>>>>>>> think about it!), I was naively hoping for some sort of knowledge of
>>>>>>> that back in the Mesos master as we were hoping to have scaling be
>>>>>>> independent of schedulers. I think this'll need a re-think!
>>>>>>>
>>>>>>> Thanks for your help!
>>>>>>>
>>>>>>> Aaron
>>>>>>>
>>>>>>> ------------------------------
>>>>>>> *From:* Sharma Podila [[email protected]]
>>>>>>> *Sent:* 23 September 2015 15:22
>>>>>>>
>>>>>>> *To:* [email protected]
>>>>>>> *Subject:* Re: Metric for tasks queued/waiting?
>>>>>>>
>>>>>>> Jobs/tasks wait in framework schedulers, not mesos master.
>>>>>>> Autoscaling triggers must come from schedulers, not only because that's
>>>>>>> who knows the pending task set size, but, also because it knows how
>>>>>>> many of them need to be launched right away, on what kind of machines.
>>>>>>>
>>>>>>> We built such an autoscaling capability in our framework schedulers.
>>>>>>> The autoscaling is achieved by our library Fenzo
>>>>>>> <https://github.com/Netflix/Fenzo> which we open sourced recently.
>>>>>>> Also read about Fenzo autoscaling here
>>>>>>> <https://github.com/Netflix/Fenzo/wiki/Autoscaling>. You should
>>>>>>> look into using that if you are developing your own scheduler. Or, have
>>>>>>> your scheduler team pick up Fenzo for autoscaling.
>>>>>>>
>>>>>>> Also, note that scaling up is temptingly easy by watching the
>>>>>>> pending task queue. But, scaling down requires bin packing, etc. Other
>>>>>>> issues pop up as well, for example:
>>>>>>>
>>>>>>> - what if a user submits tasks that cannot be satisfied? Will
>>>>>>> autoscale keep increasing the cluster size unbounded?
>>>>>>> - what if you would like to have a heterogeneous mix of hosts and
>>>>>>> tasks? which kind of hosts do you need to autoscale based on which
>>>>>>> tasks are pending?
>>>>>>>
>>>>>>> These are automatically addressed in Fenzo.
>>>>>>>
>>>>>>> Sharma
>>>>>>>
>>>>>>>
>>>>>>> On Wed, Sep 23, 2015 at 4:56 AM, Aaron Carey <[email protected]> wrote:
>>>>>>>
>>>>>>>> No, I basically had the same question as Jim (but maybe didn't word
>>>>>>>> it so well ;))
>>>>>>>>
>>>>>>>> I'll have a look at your response there :)
>>>>>>>>
>>>>>>>> ------------------------------
>>>>>>>> *From:* haosdent [[email protected]]
>>>>>>>> *Sent:* 23 September 2015 10:12
>>>>>>>> *To:* [email protected]
>>>>>>>> *Subject:* Re: Metric for tasks queued/waiting?
>>>>>>>>
>>>>>>>> Does /metrics/snapshot not satisfy your requirement?
>>>>>>>>
>>>>>>>> On Wed, Sep 23, 2015 at 4:50 PM, Aaron Carey <[email protected]>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hi all,
>>>>>>>>>
>>>>>>>>> Is there any way to get a metric of all tasks currently
>>>>>>>>> waiting/queued in Mesos (across all schedulers)? The snapshot metrics
>>>>>>>>> seem to cover every other kind of task state? This would be quite
>>>>>>>>> useful for auto-scaling purposes..
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>> Aaron
>>>>>>>>>
>>>>>>>>
>>>>>>>> --
>>>>>>>> Best Regards,
>>>>>>>> Haosdent Huang
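
For the Python side of this thread, here is a rough sketch of the scale-up pitfalls Sharma lists (unsatisfiable tasks, unbounded growth). It is not Fenzo's algorithm and not a Mesos API. Every name in it (`Resources`, `agents_to_add`, `max_agents`) is made up for illustration. It assumes the scheduler already knows its pending tasks, that none of them fit on existing agents, and that all new agents are the same size.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Resources:
    """Hypothetical resource vector; real tasks may also need disk, ports, etc."""
    cpus: float
    mem_mb: float

    def fits_in(self, other: "Resources") -> bool:
        return self.cpus <= other.cpus and self.mem_mb <= other.mem_mb


def agents_to_add(pending: List[Resources], agent_size: Resources,
                  current_agents: int, max_agents: int) -> int:
    """Return how many new agents to request for the pending tasks."""
    # Ignore tasks that no single agent could ever run, so an unsatisfiable
    # request cannot keep growing the cluster.
    satisfiable = [t for t in pending if t.fits_in(agent_size)]

    # First-fit decreasing bin packing: each bin tracks the free
    # resources left on one prospective new agent.
    bins: List[Resources] = []
    for task in sorted(satisfiable, key=lambda t: (t.cpus, t.mem_mb), reverse=True):
        for i, free in enumerate(bins):
            if task.fits_in(free):
                bins[i] = Resources(free.cpus - task.cpus, free.mem_mb - task.mem_mb)
                break
        else:
            # No open bin has room, so this task needs another agent.
            bins.append(Resources(agent_size.cpus - task.cpus,
                                  agent_size.mem_mb - task.mem_mb))

    # Hard ceiling on cluster size (the assumed max_agents limit).
    headroom = max(0, max_agents - current_agents)
    return min(len(bins), headroom)
```

A heterogeneous cluster would need one such calculation per agent type, and scale-down is a separate problem that this sketch does not attempt.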

