Re: [BULK]Re: New PMC Chair
Congratulations!

-- Grégoire

From: Andreas Peters
Sent: Thursday, April 29, 2021 6:36 PM
To: d...@mesos.apache.org; Vinod Kone; user
Subject: [BULK]Re: New PMC Chair

Great to hear. :-)

On 29.04.21 at 16:35, Vinod Kone wrote:
> Hi community,
>
> Just wanted to let you all know that the board passed the resolution to
> elect a new PMC chair!
>
> Hearty congratulations to *Qian Zhang* for becoming the new Apache Mesos
> PMC chair and VP of the project.
>
> Thanks,
Re: [BULK]Re: Call for new committers
Hello, still interested as well.

-- Grégoire

From: Charles-François Natali
Sent: Monday, March 15, 2021 9:53 AM
To: user
Subject: [BULK]Re: Call for new committers

Hi,

I'm still interested. Would it be possible to get a rough roadmap of the proposed course of action? So far there have been several email threads asking for feature requests, contributors, etc., but AFAICT no feedback, so it's hard to know exactly what's going on, and I imagine it would be easier for people to step up if there were a clear direction.

Cheers,

On Sun, 14 Mar 2021, 13:05 Stéphane Cottin <stephane.cot...@vixns.com> wrote:

Hi, I will contribute to Mesos and would like to become a Mesos committer.

Stéphane

On 14 Mar 2021, at 12:59, Qian Zhang wrote:
> Hi folks,
>
> Please reply to this mail if you plan to actively contribute to Mesos and
> want to become a new Mesos committer, thanks!
>
> Regards,
> Qian Zhang
Re: [BULK]Call for active contributors
Already answered in the other thread. I'm in.

-- Grégoire

From: Qian Zhang
Sent: Thursday, March 4, 2021 3:38 PM
To: mesos; user
Subject: [BULK]Call for active contributors

Hi folks,

Please reply to this mail if you plan to actively contribute to Mesos and want to become a committer and PMC member in the future.

Regards,
Qian Zhang
Re: Next Steps
Hello all,

Here at Criteo, we heavily use Mesos and plan to do so for the foreseeable future, alongside other alternatives. I am OK with becoming a committer and helping the project if you are looking for contributors. It seems finding committers will be doable, but finding a PMC chair will be difficult.

To give some context on our usage: Criteo is running 12 Mesos clusters on a light fork of Mesos 1.9.x. Each cluster has 10+ distinct Marathon frameworks, a Flink framework, an instance of Aurora, and an in-house framework. We strongly appreciate the ability to scale the number of nodes (3500 on the largest cluster and growing), the overall simplicity of the project, and its extensibility through modules.

-- Grégoire
Re: [BULK]Re: cgroup CPUSET for mesos agent
Hello,

I'd like to share our experience, because we worked on this last year. We used CFS bandwidth isolation for several years and encountered many issues (lack of predictability, bugs in old Linux kernels, and poor cache/memory locality). At some point, we implemented a custom isolator to manage cpusets (using https://github.com/criteo/mesos-command-modules/ as a base to write an isolator in a scripting language).

The isolator has a very simple behavior: upon a new task, look at which CPUs are not already in a cpuset cgroup, select (if possible) CPUs from the same NUMA node, and create a cpuset cgroup for the starting task. In practice, it provided a general decrease in CPU consumption (up to 8% on some CPU-intensive applications) and a better ability to reason about the CPU isolation model.

The allocation is optimistic: it tries to use CPUs from the same NUMA node, but if that's not possible, the task is spread across nodes. In practice this happens very rarely, thanks to one small optimization: assigning CPUs from the most loaded NUMA node first (which decreases fragmentation of available CPUs across NUMA nodes).

I'd be glad to give more details if you are interested.

-- Grégoire
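To make the selection heuristic above concrete, here is a minimal Python sketch of the allocation logic described. It is not the Criteo isolator itself; the function name, NUMA topology, and example values are made up for illustration.

def pick_cpus(num_cpus, numa_topology, used_cpus):
    """Pick `num_cpus` CPUs for a new task, preferring a single NUMA node.

    numa_topology: dict mapping NUMA node id -> set of CPU ids on that node.
    used_cpus: set of CPU ids already claimed by an existing cpuset cgroup.
    """
    # Free CPUs per NUMA node (CPUs not yet in any task's cpuset cgroup).
    free_by_node = {node: cpus - used_cpus for node, cpus in numa_topology.items()}

    # Prefer the most loaded node that can still fit the whole task:
    # this reduces fragmentation of free CPUs across NUMA nodes.
    candidates = [(node, free) for node, free in free_by_node.items()
                  if len(free) >= num_cpus]
    if candidates:
        _, free = min(candidates, key=lambda c: len(c[1]))
        return sorted(free)[:num_cpus]

    # Optimistic fallback: spread the task across nodes when no single
    # node has enough free CPUs (rare in practice).
    all_free = sorted(set().union(*free_by_node.values()))
    if len(all_free) < num_cpus:
        raise RuntimeError("not enough free CPUs for this task")
    return all_free[:num_cpus]


# Example: two NUMA nodes of 4 CPUs each; CPUs 0-2 already pinned.
topology = {0: {0, 1, 2, 3}, 1: {4, 5, 6, 7}}
print(pick_cpus(1, topology, used_cpus={0, 1, 2}))  # -> [3] (node 0 is fuller)
print(pick_cpus(2, topology, used_cpus={0, 1, 2}))  # -> [4, 5] (only node 1 fits)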
Re: [BULK]Task Pinning
Hello,

We had the same concern a few months ago when trying to address issues encountered with the CFS bandwidth mechanism. Eventually, we ended up implementing our own module to use cpuset cgroups for our tasks, using our generic isolator mechanism (https://github.com/criteo/mesos-command-modules/). Our observation is that using cpusets instead of CFS bandwidth reduced CPU consumption by 4-10% without any loss in performance. We are also currently working on leveraging NUMA topology (in a very simple manner) to better allocate CPUs to tasks.

We would be happy to discuss with other users facing the same kind of challenges!

-- Grégoire

From: Abel Souza
Sent: Monday, October 21, 2019 4:41 PM
To: user
Subject: [BULK]Task Pinning

Hi,

Does anyone know if pinning capabilities will ever be available in Mesos? Someone registered an issue in Jira (https://issues.apache.org/jira/browse/MESOS-5342) and started an implementation (https://github.com/ct-clmsn/mesos-cpusets), but apparently it never made it into mainline. I successfully compiled it in my testbed and loaded it into the Mesos master agent, but it keeps crashing the master during the submission process. So before moving on to potential fixes for these crashes, I would like to know if anyone knows of possible updates to this specific capability in future Mesos releases.

Thank you,
/Abel
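For context on what the cpuset approach boils down to at the OS level, here is a rough Python sketch of the cgroup v1 operations an isolator would perform when pinning a task. The mount point, naming, and helper function are assumptions for illustration; the actual module in mesos-command-modules is structured differently.

import os

CPUSET_ROOT = "/sys/fs/cgroup/cpuset/mesos"  # assumed cgroup v1 mount point

def pin_task(container_id, cpus, mems, pid):
    """Create a cpuset cgroup for a container and move its process into it."""
    cgroup = os.path.join(CPUSET_ROOT, container_id)
    os.makedirs(cgroup, exist_ok=True)

    # Restrict the task to the chosen CPUs and NUMA memory nodes.
    with open(os.path.join(cgroup, "cpuset.cpus"), "w") as f:
        f.write(",".join(str(c) for c in cpus))
    with open(os.path.join(cgroup, "cpuset.mems"), "w") as f:
        f.write(",".join(str(m) for m in mems))

    # Attach the container's init process; its children inherit the cpuset.
    with open(os.path.join(cgroup, "tasks"), "w") as f:
        f.write(str(pid))

# Example (requires root and a mounted cgroup v1 cpuset hierarchy):
# pin_task("my-container", cpus=[4, 5], mems=[1], pid=12345)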
Re: [BULK]Re: large task scheduling on multi-framework cluster
Hello Benjamin,

> Note that with the newest marathon that is capable of handling multiple
> roles, you would not need to run a dedicated marathon instance.

True, it is not strictly necessary. We use this as an easy way to deal with various needs:

- quota on some roles (a multi-role Marathon could address this; see the sketch below)
- easy authorization configuration (otherwise we would need to configure authorizations to only allow specific users to use some roles)
- general resiliency: if one Marathon has a random bug and starts deleting all its apps, at least the others are unlikely to do so at the same time!
- performance: each Marathon handles fewer tasks and fewer health checks

But this is not really the topic of my question; I gave it as context. Is anyone else encountering the same issue when scheduling large tasks?

-- Grégoire
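As an aside on the per-role quota mentioned in the first bullet, a guarantee for a dedicated role can be set through the master's /quota endpoint. A minimal sketch follows; the master address, role name, and guarantee values are made up, and authentication is omitted.

import json
import requests

MASTER = "http://mesos-master.example.com:5050"  # hypothetical master address

quota_request = {
    "role": "big-apps",  # hypothetical role used by the dedicated Marathon
    "guarantee": [
        {"name": "cpus", "type": "SCALAR", "scalar": {"value": 300}},
        {"name": "mem", "type": "SCALAR", "scalar": {"value": 1024000}},
    ],
}

# POST to the legacy /quota endpoint; real clusters usually require
# operator credentials, which are omitted here.
resp = requests.post(MASTER + "/quota", data=json.dumps(quota_request))
resp.raise_for_status()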
large task scheduling on multi-framework cluster
Hello,

I'm wondering how other Mesos users deal with the scheduling of large tasks (tasks using all the resources offered by most agents).

On our cluster, we have various applications launched mainly by Marathon. Some of these applications have large instances (30 CPUs) which use all the resources of an agent (most of our agents expose 30 CPUs to Mesos). Besides these large applications (many instances, many resources per instance), we have a lot more applications whose instances are of various sizes (from 1 to 10 CPUs).

Our issue lies with scheduling: since Marathon uses offers from Mesos as they come, it creates fragmentation. Most agents end up running small tasks, which prevents big tasks from being scheduled. In an ideal world, Mesos (or Marathon) would make sure some apps (let's say frameworks, if Mesos takes that responsibility) have guarantees on large offers. We also have non-Marathon in-house frameworks with similar needs to launch large tasks.

Our current solution is to:
* use a dedicated Marathon instance (and a dedicated role) for those big applications
* dedicate agents to this role

Of course, this requires extra work since our Mesos clusters are now sharded (it creates additional toil in terms of maintenance and capacity planning).

Our thinking is that the Mesos allocator might be improved to distribute offers with a better heuristic than the current one (offers are randomly sorted). Similar to what was suggested in http://mail-archives.apache.org/mod_mbox/mesos-user/201906.mbox/%3cCAHReGaiY0nJ0AevMvKbxAZsy2Xc=jmtszcucdxryzbvwkvv...@mail.gmail.com%3e, we could imagine sorting offers (offers from the most used agents first).

So I'm curious how other users handle this kind of need!

Regards,

-- Grégoire Seux
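To illustrate the heuristic suggested at the end, here is a toy Python model of "offer the most used agents first". It is not the Mesos allocator; the agent representation and numbers are invented for the example.

def sort_agents_for_offers(agents):
    """Order agents so that offers from the most used agents are sent first.

    Each agent is a dict with 'total' and 'allocated' CPU counts. Packing
    small tasks onto already-busy agents keeps some agents fully free, so
    large (whole-agent) tasks are less likely to be blocked by fragmentation.
    """
    return sorted(agents, key=lambda a: a["allocated"] / a["total"], reverse=True)


agents = [
    {"id": "agent-1", "total": 30, "allocated": 2},
    {"id": "agent-2", "total": 30, "allocated": 25},
    {"id": "agent-3", "total": 30, "allocated": 0},
]

# agent-2 (most used) is offered first, agent-3 (fully free) last,
# so a 30-CPU task still has a chance to land on agent-3.
for agent in sort_agents_for_offers(agents):
    print(agent["id"], agent["total"] - agent["allocated"], "cpus free")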