Re: [BULK]Re: New PMC Chair

2021-04-29 Thread Grégoire Seux
Congratulations!

--
Grégoire


From: Andreas Peters
Sent: Thursday, April 29, 2021 6:36 PM
To: d...@mesos.apache.org; Vinod Kone; user
Subject: [BULK]Re: New PMC Chair

Great to hear. :-)

On 29.04.21 at 16:35, Vinod Kone wrote:
> Hi community,
>
> Just wanted to let you all know that the board passed the resolution to
> elect a new PMC chair!
>
> Hearty congratulations to *Qian Zhang* for becoming the new Apache Mesos
> PMC chair and VP of the project.
>
> Thanks,
>



Re: [BULK]Re: Call for new committers

2021-03-15 Thread Grégoire Seux
Hello,

Still interested as well.

--
Grégoire

From: Charles-François Natali 
Sent: Monday, March 15, 2021 9:53 AM
To: user 
Subject: [BULK]Re: Call for new committers

Hi,

I'm still interested.

Would it be possible to get a rough roadmap of the proposed course of action?

So far there have been several email threads asking for feature requests,
contributors, etc., but AFAICT no feedback, so it's hard to know exactly what's
going on. I imagine it would make it easier for people to step up if there were
a clear direction.

Cheers,



On Sun, 14 Mar 2021, 13:05 Stéphane Cottin <stephane.cot...@vixns.com> wrote:
Hi,

I will contribute to Mesos and would like to become a Mesos committer.

Stéphane

On 14 Mar 2021, at 12:59, Qian Zhang wrote:

> Hi folks,
>
> Please reply to this mail if you plan to actively contribute to Mesos and
> want to become a new Mesos committer, thanks!
>
>
> Regards,
> Qian Zhang


Re: [BULK]Call for active contributors

2021-03-04 Thread Grégoire Seux
Already answered in the other thread. I'm in.

--
Grégoire

From: Qian Zhang 
Sent: Thursday, March 4, 2021 3:38 PM
To: mesos ; user 
Subject: [BULK]Call for active contributors

Hi folks,

Please reply to this mail if you plan to actively contribute to Mesos and want
to become a committer and PMC member in the future.


Regards,
Qian Zhang


Re: Next Steps

2021-02-26 Thread Grégoire Seux
Hello all,

Here at Criteo, we use Mesos heavily and plan to do so for the foreseeable
future, alongside other alternatives.
I am happy to become a committer and help the project if you are looking for
contributors.
It seems finding committers will be doable, but finding a PMC chair will be
difficult.

To give some context on our usage, Criteo runs 12 Mesos clusters on a light
fork of Mesos 1.9.x.
Each cluster has 10+ distinct Marathon frameworks, a Flink framework, an
instance of Aurora, and an in-house framework.
We strongly appreciate the ability to scale the number of nodes (3,500 on the
largest cluster and growing), the overall simplicity of the project, and its
extensibility through modules.

--
Grégoire


Re: [BULK]Re: cgroup CPUSET for mesos agent

2020-07-07 Thread Grégoire Seux
Hello,

I'd like to share our experience, since we worked on this last year.
We used CFS bandwidth isolation for several years and ran into many issues
(lack of predictability, bugs in old Linux kernels, and poor cache/memory
locality). At some point we implemented a custom isolator to manage cpusets
(using https://github.com/criteo/mesos-command-modules/ as a base to write an
isolator in a scripting language).

The isolator has a very simple behavior: when a new task starts, look at which
CPUs are not already within a cpuset cgroup, select (if possible) CPUs from the
same NUMA node, and create a cpuset cgroup for the starting task.
In practice it provided a general decrease in CPU consumption (up to 8% for
some CPU-intensive applications) and a better ability to reason about the CPU
isolation model.
The allocation is optimistic: it tries to use CPUs from the same NUMA node, but
if that is not possible the task is spread across nodes. In practice this
happens very rarely, thanks to one small optimization: assigning CPUs from the
most loaded NUMA node, which decreases fragmentation of available CPUs across
NUMA nodes.
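
To make that behavior concrete, below is a minimal sketch (in Python) of the
heuristic. The cgroup layout, the way running tasks are discovered and the
NUMA topology input are assumptions for illustration, not our actual module:

import os

# Assumed root of the per-task cpuset cgroups managed by the isolator.
CPUSET_ROOT = "/sys/fs/cgroup/cpuset/mesos"

def read_cpus(path):
    """Parse a cpuset.cpus file (e.g. "0-3,8") into a set of CPU ids."""
    cpus = set()
    with open(path) as f:
        for part in f.read().strip().split(","):
            if not part:
                continue
            if "-" in part:
                lo, hi = part.split("-")
                cpus.update(range(int(lo), int(hi) + 1))
            else:
                cpus.add(int(part))
    return cpus

def allocate(task_id, ncpus, numa_topology):
    """numa_topology maps a NUMA node id to the set of CPU ids on that node."""
    # CPUs already claimed by an existing task cgroup.
    used = set()
    for entry in os.listdir(CPUSET_ROOT):
        cpus_file = os.path.join(CPUSET_ROOT, entry, "cpuset.cpus")
        if os.path.isfile(cpus_file):
            used |= read_cpus(cpus_file)

    # Prefer the most loaded NUMA node that can still fit the whole task,
    # so that free CPUs on the other nodes stay unfragmented.
    candidates = []
    for node, node_cpus in numa_topology.items():
        free = node_cpus - used
        if len(free) >= ncpus:
            candidates.append((len(node_cpus & used), node, free))

    if candidates:
        _, _, free = max(candidates)
        chosen = sorted(free)[:ncpus]
    else:
        # Optimistic fallback: spread the task across NUMA nodes.
        all_free = sorted(set().union(*numa_topology.values()) - used)
        chosen = all_free[:ncpus]

    # Create the task's cpuset cgroup and pin it (a real isolator would also
    # have to set cpuset.mems and move the task's processes into the cgroup).
    cgroup = os.path.join(CPUSET_ROOT, task_id)
    os.makedirs(cgroup, exist_ok=True)
    with open(os.path.join(cgroup, "cpuset.cpus"), "w") as f:
        f.write(",".join(str(c) for c in chosen))
    return chosen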

I'd be glad to give more details if you are interested.

--
Grégoire


Re: [BULK]Task Pinning

2019-10-21 Thread Grégoire Seux
Hello,

We had the same concern a few months ago when trying to address issues
encountered with the CFS bandwidth mechanism.
Eventually, we ended up implementing our own module to use cpuset cgroups for
our tasks, using our generic isolator mechanism
(https://github.com/criteo/mesos-command-modules/).
Our observation is that using cpusets instead of CFS bandwidth reduced CPU
consumption by 4-10% without any loss in performance.
We are also currently working on leveraging NUMA topology (in a very simple
manner) to better allocate CPUs to tasks.
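
For illustration, here is a hypothetical hook (in Python) in the spirit of the
external scripts that mesos-command-modules can run; the stdin payload layout
and the cgroup paths are assumptions, not the module's real interface:

#!/usr/bin/env python3
# Hypothetical "isolate" hook: pin a container to dedicated CPUs via a cpuset
# cgroup instead of throttling it through cpu.cfs_quota_us.
import json
import os
import sys

CPUSET_ROOT = "/sys/fs/cgroup/cpuset/mesos"

def main():
    # Assumed payload, e.g. {"container_id": "abc-123", "pid": 4242, "cpus": "4-7"}
    payload = json.load(sys.stdin)
    cgroup = os.path.join(CPUSET_ROOT, payload["container_id"])
    os.makedirs(cgroup, exist_ok=True)

    # Pin the container to its CPUs and, in this simple sketch, to a single
    # memory node.
    with open(os.path.join(cgroup, "cpuset.cpus"), "w") as f:
        f.write(payload["cpus"])
    with open(os.path.join(cgroup, "cpuset.mems"), "w") as f:
        f.write("0")

    # Move the container's init process into the new cgroup.
    with open(os.path.join(cgroup, "cgroup.procs"), "w") as f:
        f.write(str(payload["pid"]))

if __name__ == "__main__":
    main()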

Would be happy to discuss with other users having the same kind of challenges!

--
Grégoire

From: Abel Souza 
Sent: Monday, October 21, 2019 4:41 PM
To: user 
Subject: [BULK]Task Pinning

Hi,

Does anyone know if pinning capabilities will ever be available in Mesos?

Someone registered an issue at Jira
(https://issues.apache.org/jira/browse/MESOS-5342) and started an
implementation (https://github.com/ct-clmsn/mesos-cpusets), but apparently it
never went through mainline. I successfully compiled it in my testbed and
loaded it into the Mesos master agent, but it keeps crashing the master during
the submission process.

So before moving on to potential fixes for these crashes, I would like to know
if anyone knows about possible updates to this specific capability in future
Mesos releases.

Thank you,

/Abel



Re: [BULK]Re: large task scheduling on multi-framework cluster

2019-10-07 Thread Grégoire Seux
Hello Benjamin,

> Note that with the newest marathon that is capable of handling multiple 
> roles, you would not need to run a dedicated marathon instance.
True, it is not strictly necessary. We use this as an easy way to deal with
various needs:
- quota on some roles (a multi-role Marathon could address this)
- easy authorization configuration (otherwise we would need to configure
authorizations to only allow specific users to use some roles)
- general resiliency: if one Marathon has a random bug and starts deleting all
its apps, at least the others are unlikely to do the same at the same time!
- performance: each Marathon handles fewer tasks and fewer health checks

But that is not really the topic of my question; I mentioned it only as
context.

Is anyone encountering the same issue when scheduling large tasks?

-- 
Grégoire





large task scheduling on multi-framework cluster

2019-10-01 Thread Grégoire Seux
Hello,

I'm wondering how other Mesos users deal with scheduling large tasks (tasks
that use all the resources offered by most agents).

On our cluster, we have various applications launched mainly by Marathon. Some
of those applications have large instances (30 CPUs) which use all the
resources of an agent (most of our agents expose 30 CPUs to Mesos). Beyond
these large applications (many instances, many resources per instance), we
have many more applications whose instances are of various sizes (from 1 to
10 CPUs).

Our issue lies with scheduling: since Marathon uses offers from Mesos as they
come, this creates fragmentation. Most agents end up running small tasks, which
prevents big tasks from being scheduled. In an ideal world, Mesos (or Marathon)
would make sure that some apps (let's say frameworks, if Mesos takes that
responsibility) have guarantees on large offers. We also have non-Marathon
in-house frameworks with similar needs to launch large tasks.

Our current solution is to:

  *   use a dedicated Marathon instance (and a dedicated role) for those big
applications
  *   dedicate agents to this role

Of course, this requires extra work since our Mesos clusters are now sharded
(it creates additional toil in terms of maintenance and capacity planning).
Our thinking is that the Mesos allocator might be improved to distribute offers
with a better heuristic than the current one (offers are randomly sorted).
Somewhat similar to what was suggested in
http://mail-archives.apache.org/mod_mbox/mesos-user/201906.mbox/%3cCAHReGaiY0nJ0AevMvKbxAZsy2Xc=jmtszcucdxryzbvwkvv...@mail.gmail.com%3e,
we could imagine sorting offers so that offers from the most used agents come
first.
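
To illustrate the idea, here is a minimal sketch (in Python) of what such a
heuristic could look like; the data structures are invented for illustration,
and the real Mesos allocator is C++ and considerably more involved:

def sort_agents_for_offers(agents):
    """
    agents: list of dicts such as
    {"id": "agent-1", "total_cpus": 30, "allocated_cpus": 22}.
    Returns agent ids in the order offers should be generated: random ordering
    spreads small tasks everywhere and starves 30-CPU tasks, while offering the
    most used agents first keeps nearly empty agents whole for large tasks.
    """
    return [
        a["id"]
        for a in sorted(
            agents,
            key=lambda a: a["allocated_cpus"] / a["total_cpus"],
            reverse=True,
        )
    ]

agents = [
    {"id": "agent-1", "total_cpus": 30, "allocated_cpus": 2},
    {"id": "agent-2", "total_cpus": 30, "allocated_cpus": 25},
    {"id": "agent-3", "total_cpus": 30, "allocated_cpus": 14},
]
print(sort_agents_for_offers(agents))  # ['agent-2', 'agent-3', 'agent-1']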

So I'm curious how other users handle this kind of need!

Regards,

--
Grégoire Seux