Hi All,

Only 9 have responded so far to the survey. Your response is really important for understanding the community's preference.
Thanks in advance,
Dhilip

On Mon, Aug 1, 2016 at 4:37 PM, DhilipKumar Sankaranarayanan <s.dhilipku...@gmail.com> wrote:

> Hi All,
>
> Sorry for the long gap. We had an interesting discussion last week at Mesosphere HQ again on this topic, before the Mesos SF Meetup.
>
> The discussion revolved around several areas and suggestions on the proposed design.
>
> One of the main items that popped up was the approach through which we should achieve Mesos Federation. The intent was to take the approach that is most sensible for the community and easiest for most to adopt.
>
> *Approach 1:* (Peer-to-peer with a separate Policy Engine) The already proposed design.
> *Approach 2:* (Hierarchical design) A design similar to Kubernetes Federation, where we introduce a federation layer in between the frameworks and the masters.
>
> Both designs have their unique advantages and disadvantages. So here is the survey link; please provide your feedback, as this should set the ball rolling for us.
>
> https://goo.gl/forms/DpVRV9Zh3kunhJkP2
>
> If you have a third approach to be included, please write to me; I'll be happy to add it to the survey.
>
> Regardless of the design chosen, the following enhancement to the master will help reduce "offers" traffic across continents.
>
> Enhancement: A framework will be able to send RequestResource(constraints) to the master; the master then only sends those offers that match the constraints.
>
> Regards,
> Dhilip
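For illustration, a minimal sketch of what the RequestResource enhancement proposed in the message above might look like through the v1 scheduler HTTP API. The REQUEST call and the /api/v1/scheduler endpoint exist in Mesos today; the "constraints" block, the attribute names, and the addresses are assumptions made up for this sketch and are not part of the current API.

import json
import urllib.request

# Placeholder master address and framework id for this sketch.
MASTER = "http://mesos-master.dc-west.example.com:5050"

call = {
    "framework_id": {"value": "my-framework-id"},
    "type": "REQUEST",
    "request": {
        "requests": [{
            "resources": [
                {"name": "cpus", "type": "SCALAR", "scalar": {"value": 8}},
                {"name": "mem", "type": "SCALAR", "scalar": {"value": 4096}},
            ],
            # Hypothetical extension: only offers from agents whose attributes
            # satisfy these constraints would be sent back to the framework.
            "constraints": [
                {"attribute": "datacenter", "operator": "EQ", "value": "dc-west"},
            ],
        }],
    },
}

request = urllib.request.Request(
    MASTER + "/api/v1/scheduler",
    data=json.dumps(call).encode(),
    # A real scheduler must also send the Mesos-Stream-Id header obtained
    # from its SUBSCRIBE response; elided here.
    headers={"Content-Type": "application/json"},
    method="POST",
)
urllib.request.urlopen(request)

With something along these lines, the master filters offers before they cross the WAN link, instead of every framework repeatedly declining remote offers it can never use.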
> On Fri, Jul 15, 2016 at 3:46 PM, DhilipKumar Sankaranarayanan <s.dhilipku...@gmail.com> wrote:
>
>> Hi All,
>>
>> I got a chance to bring this up during yesterday's Community Sync. It was great discussing with you all.
>>
>> As general feedback, the role of the Policy Engine in the design needs to be clearer; I will update the document with more information on the PE very soon.
>>
>> We have yet to get more insight on the license issues, like bringing a Mozilla 2.0 library into an Apache 2.0 project.
>>
>> It would be fantastic to get more thoughts on this from the community, so please share if you or your organisation has thought about it.
>>
>> Hi Alex,
>>
>> Thanks again.
>>
>> a) Yes, you are correct; that's exactly what we thought. A framework could simply query and learn about its next step (bursting or load balancing).
>> b) We are currently thinking that the framework will run in only one place and should be able to connect to other datacenters. Each datacenter could have some frameworks running locally and some as part of a federation.
>>
>> Regards,
>> Dhilip
>>
>> On Thu, Jul 14, 2016 at 9:17 AM, Alexander Gallego <agall...@concord.io> wrote:
>>
>>> On Thu, Jul 14, 2016 at 2:40 AM, DhilipKumar Sankaranarayanan <s.dhilipku...@gmail.com> wrote:
>>>
>>>> Hi Alex,
>>>>
>>>> Thanks for taking a look. We have simplified the design since the conference. The Allocation and Anonymous modules were only helping us to control the offers sent to the frameworks. Now we think that Roles and Quota in Mesos elegantly solve this problem, and we could take advantage of them.
>>>
>>> Sounds good. Given that the design is entirely different now, can you share some of these thoughts?
>>>
>>>> The current design does not propose Mesos Modules; the POC we demonstrated at MesosCon is slightly out of date in that respect.
>>>>
>>>> The current design only enforces that any Policy Engine implementation should honour certain REST APIs. This also takes Consul out of the picture, but at Huawei our implementation would pretty much consider Consul or something similar.
>>>>
>>>> 1) Failure semantics
>>>> I do agree it is not straightforward to declare that a DC is lost just because the framework lost the connection intermittently. By probing the 'Gossiper' we would know that the DC is still active but just not reachable to us; in that case it is worth the wait. Only if the DC in question is not reachable from every other DC could we come to such a conclusion.
>>>
>>> How do you envision frameworks integrating with this? Are you saying that frameworks should poll the HTTP endpoint of the Gossiper?
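For illustration, a rough sketch of the kind of Gossiper probe described above: before declaring a DC lost, the framework asks the other gossipers whether any of them can still reach it. The endpoint name, port, and response shape are assumptions made up for this sketch; the actual Gossiper API is not specified in the thread.

import json
import urllib.request

# Placeholder gossiper addresses, one per peer datacenter.
GOSSIPER_PEERS = [
    "http://gossiper.dc-east.example.com:8500",
    "http://gossiper.dc-west.example.com:8500",
]

def dc_alive_somewhere(dc_name):
    """Return True if at least one peer gossiper still reports `dc_name` as alive."""
    for peer in GOSSIPER_PEERS:
        try:
            with urllib.request.urlopen(peer + "/v1/datacenters", timeout=5) as resp:
                status = json.load(resp)  # assumed shape: {"dc-east": "alive", ...}
            if status.get(dc_name) == "alive":
                return True
        except OSError:
            continue  # this peer is unreachable; try the next one
    return False

# Per the thread: an intermittent loss of our own connection is worth the wait;
# only when no peer can reach the DC do we treat it as lost and fail over.
if not dc_alive_somewhere("dc-east"):
    print("dc-east is unreachable from every peer; fail over")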
>>>> 2) Can you share more details about the allocator modules?
>>>> As mentioned earlier, these modules are no longer relevant; we have a much simpler way to achieve this.
>>>>
>>>> 3) High Availability
>>>> I think you are talking about the section below?
>>>> "Sequence Diagram for High Availability (in case of local datacenter failure): very similar to the cloud-bursting use-case scenario."
>>>> The sequence diagram only represents the flow of events in case the current datacenter fails and the framework needs to connect to a new one. It is not talking about the approach you mentioned. I will update the doc with a couple more diagrams soon to make it more understandable. We would certainly like to have a federated K/V storage layer across the DCs, which is why Consul was considered in the first place.
>>>
>>> Does this mean that you have to run the actual framework code in all of the DCs, or have you yet to iron this out?
>>>
>>>> 4) Metrics / Monitoring - probably down the line
>>>> The experimental version of the gossiper already queries the master at a frequent interval, and the gossipers exchange this amongst themselves.
>>>>
>>>> Ultimately, DC federation is a hard problem to solve. We have plenty of use cases, which is why we wanted to reach out to the community, share our experience and build something that is useful for all of us.
>>>
>>> Thanks!! Excited about this work.
>>>
>>>> Regards,
>>>> Dhilip
>>>>
>>>> On Wed, Jul 13, 2016 at 7:58 PM, Alexander Gallego <agall...@concord.io> wrote:
>>>>
>>>>> This is very cool work; I had a chat with another company thinking about doing the exact same thing.
>>>>>
>>>>> I think the proposal is missing several details that make it hard to evaluate on paper (I also saw your presentation).
>>>>>
>>>>> 1) Failure semantics seem to be the same in the proposed design.
>>>>>
>>>>> As a framework author, how do you suggest dealing with tasks on multiple clusters? I.e., I feel like there have to be richer semantics about the task, at least at the mesos.proto level, where the state is STATUS_FAILED_DC_OUTAGE or something along those lines.
>>>>>
>>>>> We respawn operators, and having this information may allow me as a framework author to wait a little longer before declaring that task dead (KILLED/FAILED/LOST) if I spawn it on a different data center.
>>>>>
>>>>> Would love to get details on how you were thinking of extending the failure semantics for multiple datacenters.
>>>>>
>>>>> 2) Can you share more details about the allocator modules?
>>>>>
>>>>> After reading the proposal, I understand it as follows:
>>>>>
>>>>> [ gossiper ] -> [ allocator module ] -> [ mesos master ]
>>>>>
>>>>> Is this correct? If so, are you saying that you can tell the Mesos master to run a task that was fulfilled by a framework on a different data center?
>>>>>
>>>>> Is the constraint that you are forced to run a scheduler per framework on each data center?
>>>>>
>>>>> 3) High availability
>>>>>
>>>>> High availability in a multi-DC layout means something entirely different. So are all frameworks now on standby on every other cluster? The problem I see with this is that the metadata stored by each framework to support HA now has to span multiple DCs. It would perhaps be nice to extend/expose an API at the Mesos level for setting state.
>>>>>
>>>>> a) In the normal Mesos layout, this key=value data store would be ZooKeeper.
>>>>>
>>>>> b) In the multi-DC layout it could be ZooKeeper per data center, but then one can piggyback on the gossiper to replicate that state in the other data centers.
>>>>>
>>>>> 4) Metrics / Monitoring - probably down the line, but it would be good to also piggyback some of the Mesos master endpoints through the gossip architecture.
>>>>>
>>>>> Again, very cool work. Would love to get some more details on the actual implementation that you built, plus some of the points above.
>>>>>
>>>>> - Alex
>>>>>
>>>>> On Wed, Jul 13, 2016 at 6:11 PM, DhilipKumar Sankaranarayanan <s.dhilipku...@gmail.com> wrote:
>>>>>
>>>>>> Hi All,
>>>>>>
>>>>>> Please find the initial version of the Design Document for Federating Mesos Clusters:
>>>>>>
>>>>>> https://docs.google.com/document/d/1U4IY_ObAXUPhtTa-0Rw_5zQxHDRnJFe5uFNOQ0VUcLg/edit?usp=sharing
>>>>>>
>>>>>> We at Huawei have been working on this federation project for the past few months. We also got an opportunity to present it at the recent MesosCon 2016. From further discussions and the feedback we have received so far, we have greatly simplified the design.
>>>>>>
>>>>>> Also, I see that no one is assigned to this JIRA now; could I get it assigned to myself? It would also be great to know if there is anyone willing to shepherd this.
>>>>>>
>>>>>> I would also like to bring this up in the Community Sync that happens tomorrow.
>>>>>>
>>>>>> We would love to hear your thoughts, and we will be glad to collaborate with you on the implementation.
>>>>>>
>>>>>> Regards,
>>>>>> Dhilip
>>>>>>
>>>>>> Reference:
>>>>>> JIRA: https://issues.apache.org/jira/browse/MESOS-3548
>>>>>> Slides: http://www.slideshare.net/mKrishnaKumar1/federated-mesos-clusters-for-global-data-center-designs
>>>>>> Video: https://www.youtube.com/watch?v=kqyVQzwwD5E&index=17&list=PLGeM09tlguZQVL7ZsfNMffX9h1rGNVqnC
>>>>>
>>>>> --
>>>>> Alexander Gallego
>>>>> Co-Founder & CTO
>>>
>>> --
>>> Alexander Gallego
>>> Co-Founder & CTO
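For illustration, a small sketch of the richer failure semantics Alex asks about in point (1) of the Jul 13 message above: a framework-side status handler that waits before respawning a task whose loss is attributed to a whole-datacenter outage. The REASON_DC_OUTAGE value is hypothetical and does not exist in mesos.proto today; TASK_LOST/TASK_FAILED and the status `reason` field do, and the callbacks are assumed helpers.

import time

# Assumed grace period before giving up on a partitioned datacenter.
RESCHEDULE_GRACE_SECS = 300

def on_status_update(update, reschedule, defer):
    """Handle a task status update; `reschedule` and `defer` are
    framework-specific callbacks assumed for this sketch."""
    status = update["status"]
    state = status["state"]          # e.g. "TASK_LOST"
    reason = status.get("reason")    # e.g. "REASON_DC_OUTAGE" (hypothetical)

    if state in ("TASK_LOST", "TASK_FAILED") and reason == "REASON_DC_OUTAGE":
        # The DC may only be partitioned from us, not down; wait a little
        # longer before respawning the task in another datacenter.
        defer(status["task_id"], until=time.time() + RESCHEDULE_GRACE_SECS)
    elif state in ("TASK_LOST", "TASK_FAILED", "TASK_KILLED"):
        reschedule(status["task_id"])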