Re: Can you help provide a future for Apache Mesos?

2023-11-13 Thread Charles-François Natali
Still using Mesos - it's stable, boring - in a good way - and great for our
specific use case.

I'm a committer, happy to continue reviewing and merging small changes or
address security issues.

Cheers,





On Mon, Nov 13, 2023, 13:34 Andreas Peters  wrote:

> I'm still with Mesos and can keep doing what I have done in recent years:
> keep an eye on issues and give support via Slack and Jira.
>
>
> On 13.11.23 at 13:51, Shane Curcuru wrote:
>
> As a mature and successful project, Mesos hasn't seen much new development
> in the past couple of years.  The question now for everyone on these lists
> is:
>
> - Is Mesos still a maintained project, where even if no new features are
> developed, there's at least a group to respond to security issues and make
> new releases?  Or is it time to 'deprecate' Mesos, and move the project as
> a whole to the Apache Attic? [1]
>
> It feels like there are still plenty of users who rely on Mesos; what we
> need now is for enough people here to step up and volunteer to stick around
> and be available to fix security issues in the future.
>
> Thanks to Qian for raising this question in March [2], where several
> people did speak up.  I'd like to clarify what the ASF board's requirements
> are for an 'active' Apache project.
>
> We don't actually need people doing active development on a project.
> What's really needed is at least three PMC members who are monitoring the
> project's lists and issues, and who could be available in the future *if* a
> serious security issue or other major bug were discovered.
>
> So we're not looking for people with time to do active development - just
> enough reliable volunteers who could monitor for major issues that are
> reported, and make a new release if security fixes are needed.  Does that
> help, and does that make sense?
>
> We will also be running a Roll Call of the PMC [3] now, so the board can
> understand how many PMC members (who have access to security issue details,
> for example) could still stick around to monitor lists.  Along with that
> roll call, we'll also be reminding the existing PMC that they can vote in
> any existing committers who will also step up and volunteer.
>
> * What can you do?
>
> If you are still using Mesos, have enough time to check the mailing
> lists/issue tracker periodically for any security or giant breaking bugs
> (i.e. not small bugs), and might be able to help someday with a fix or
> making a new release of Mesos, then speak up now!  Be sure to say what
> specific kinds of tasks you might be able to take on if they arise.
>
> Remember: we don't need active development, just some folks making sure
> any security bugs are addressed in the future (if they come up).
>
>


Re: Seeking contributions to fix Mesos website build

2022-06-27 Thread Charles-François Natali
The Jenkins configuration has been fixed; it now only fails on the
RubyGems errors:
https://ci-builds.apache.org/job/Mesos/job/Mesos-Websitebot/1307/console

Cheers,


On Fri, 24 Jun 2022 at 06:07, Andreas Peters  wrote:
>
> I will have a look over weekend. :-)
>
>
> On 24.06.22 00:38, Charles-François Natali wrote:
> > I changed the configuration to clone from GitHub because cloning from ASF
> > gitbox would randomly fail with a network error.
> >
> > I cannot reconcile what you say with the extracts below, though: it looks
> > like cloning cannot find the revision from the ASF gitbox, not from GitHub.
> > Are they mixed up somehow?
> >
> > Looking at the extracts below, the GitHub URL looks wrong: it should be
> > mesos-site.git, not mesos.git. I'll fix it.
> >
> > However, the original issue regarding the dependency updates might be
> > different altogether; it might be good if someone - maybe Andreas? - could
> > have a look.
> >
> >
> > Cheers,
> >
> >
> >
> > On Thu, Jun 23, 2022, 23:36 Tomek Janiszewski  wrote:
> >
> >> It looks like the job configuration has changed: it's using the GitHub repo
> >> now and cannot find the revision, so the build does not even start.
> >>
> >> Now:
> >>> Cloning repository https://gitbox.apache.org/repos/asf/mesos-site.git
> >>> Avoid second fetch
> >>   > git rev-parse origin/asf-site^{commit} # timeout=10
> >>   > git rev-parse asf-site^{commit} # timeout=10
> >> ERROR: Couldn't find any revision to build. Verify the repository and
> >> branch configuration for this job.
> >> Finished: FAILURE
> >>
> >> Before:
> >>
> >>> Cloning repository https://github.com/apache/mesos.git
> >>> Avoid second fetch
> >>   > git rev-parse origin/asf-site^{commit} # timeout=10
> >> Checking out Revision f14e3b07149b7213aa3552c50ff059155b33effd
> >> (origin/asf-site)
> >>   > git config core.sparsecheckout # timeout=10
> >>   > git checkout -f f14e3b07149b7213aa3552c50ff059155b33effd # timeout=10
> >> Commit message: "Updated
> >>
> >> On Thu, 23 Jun 2022, 23:16, Dave Lester wrote:
> >>
> >>> Hi all,
> >>>
> >>> The build for the Mesos website is currently broken. It looks like several
> >>> dependency updates via dependabot were merged into the codebase on May 22nd,
> >>> causing the issue along the way.
> >>>
> >>> The last successful website build
> >>> https://ci-builds.apache.org/job/Mesos/job/Mesos-Websitebot/1268/ is
> >>> based on the revision cd71826, which was committed May 4th.
> >>>
> >>> Can the two dependabot changes to “/site” be rolled back (9c7ccc2 and
> >>> 9019e3a) or better yet is there a community member with Middleman / Ruby
> >>> debugging experience that can help fix the build?
> >>>
> >>> Thanks,
> >>> Dave
> >>>
> >>
> >
>
> --
> AVENTER UG (haftungsbeschraenkt)
> Köllner Chaussee 144
> 25337 Kölln-Reisiek
>
> Phone: (+49)4121 - 235582 0
> E-Mail: a...@aventer.biz
> Internet: https://www.aventer.biz
> Git: https://www.github.com/AVENTER-UG
>
> Geschäftsführer/CEO: Andreas Peters
> Unternehmenssitz/City: Kölln-Reisiek
> Handelsregister beim Amtsgericht/Legal Court: Itzehoe
> Handelsregister-Nummer/Company Registration Number: HRB 9995 PI


Re: Seeking contributions to fix Mesos website build

2022-06-23 Thread Charles-François Natali
I changed the configuration to clone from GitHub because cloning from ASF
gitbox would randomly fail with a network error.

I cannot reconcile what you say with the extracts below, though: it looks
like cloning cannot find the revision from the ASF gitbox, not from GitHub.
Are they mixed up somehow?

Looking at the extracts below, the GitHub URL looks wrong: it should be
mesos-site.git, not mesos.git. I'll fix it.

However, the original issue regarding the dependency updates might be
different altogether; it might be good if someone - maybe Andreas? - could
have a look.


Cheers,



On Thu, Jun 23, 2022, 23:36 Tomek Janiszewski  wrote:

> It looks like the job configuration has changed: it's using the GitHub repo
> now and cannot find the revision, so the build does not even start.
>
> Now:
> > Cloning repository https://gitbox.apache.org/repos/asf/mesos-site.git
> > Avoid second fetch
>  > git rev-parse origin/asf-site^{commit} # timeout=10
>  > git rev-parse asf-site^{commit} # timeout=10
> ERROR: Couldn't find any revision to build. Verify the repository and
> branch configuration for this job.
> Finished: FAILURE
>
> Before:
>
> > Cloning repository https://github.com/apache/mesos.git
> > Avoid second fetch
>  > git rev-parse origin/asf-site^{commit} # timeout=10
> Checking out Revision f14e3b07149b7213aa3552c50ff059155b33effd
> (origin/asf-site)
>  > git config core.sparsecheckout # timeout=10
>  > git checkout -f f14e3b07149b7213aa3552c50ff059155b33effd # timeout=10
> Commit message: "Updated
>
> On Thu, 23 Jun 2022, 23:16, Dave Lester wrote:
>
> > Hi all,
> >
> > The build for the Mesos website is currently broken. It looks like several
> > dependency updates via dependabot were merged into the codebase on May 22nd,
> > causing the issue along the way.
> >
> > The last successful website build
> > https://ci-builds.apache.org/job/Mesos/job/Mesos-Websitebot/1268/ is
> > based on the revision cd71826, which was committed May 4th.
> >
> > Can the two dependabot changes to “/site” be rolled back (9c7ccc2 and
> > 9019e3a) or better yet is there a community member with Middleman / Ruby
> > debugging experience that can help fix the build?
> >
> > Thanks,
> > Dave
> >
>


Re: cgroupv2 and mesos

2022-05-11 Thread Charles-François Natali
Hi,


On Wed, May 11, 2022, 05:22 Andreas Peters  wrote:

> Hi,
>
> if we run mesos under Linux with cgroupsV2 then the agent throws the
> following error message:
>
> "Failed to create docker: Failed to find a mounted cgroups hierarchy for
> the 'cpu' subsystem;"
>
> Well, since cgroupsV2 does not mount the subsystems like cpu, memory and
> so on, maybe we should remove these check lines?
>
>
> https://github.com/apache/mesos/blob/cd71826ab244db1f73e78fafe8a42181758a41e8/src/docker/docker.cpp#L146
>
>
> I did this in my test environment and have not found an issue so far.
> Newer versions of Docker run very well under cgroupsV2. That's why I'm
> thinking that maybe the "precheck" makes no sense at that point.
>

The problem is that without cgroup v1 many things will break: CPU and memory
limiting, and also the freezer which is used to atomically kill tasks etc.

Ideally we'd add support for cgroup v2 but that's a significant amount of
work.

Cheers



> What do u think?
>
> Regards,
> Andreas
>


Re: Mesos Packaging

2021-09-04 Thread Charles-François Natali
Benjamin Bannier has kindly invited me to join the
https://github.com/orgs/mesos/ organisation, so I now have the
required permissions - I'll get in touch with Renán to get this new
repository up-and-running.

Cheers,

Charles

On Thu, 2 Sep 2021 at 19:13, Charles-François Natali wrote:
>
>
>
> On Wed, Sep 1, 2021, 20:56 Renán Del Valle  wrote:
>>
>> Hi folks,
>>
>> The packaging repo https://github.com/mesosphere/mesos-deb-packaging is
>> no longer maintained by D2iQ and could use a new maintainer and/or home.
>>
>> Since D2iQ is pretty much out of the picture, it'd probably be ideal to
>> create a fork at https://github.com/mesos/ in order to add patches here
>> and there to keep it compatible and up to date with newer distros.
>>
>> If someone can make this happen, I'd like to volunteer to maintain it for now.
>
>
> Thanks Renán, that would be great!
>
> Qian, do you know who owns https://github.com/orgs/mesos/ ?
> Looking at the members I can see some overlap with PMC members, but since 
> it's not related to Apache I'm not sure who runs it.
>
> Alternatively we could host it with the ASF but I think that simply hosting 
> on github is simpler and lower-overhead.
>
> Cheers,
>
>
>
>>
>> -Renán
>>
>>


Re: Mesos Packaging

2021-09-02 Thread Charles-François Natali
On Wed, Sep 1, 2021, 20:56 Renán Del Valle  wrote:

> Hi folks,
>
> The packaging repo https://github.com/mesosphere/mesos-deb-packaging is
> no longer maintained by D2iQ and could use a new maintainer and/or home.
>
> Since D2iQ is pretty much out of the picture, it'd probably be ideal to
> create a fork at https://github.com/mesos/ in order to add patches here
> and there to keep it compatible and up to date with newer distros.
>
> If someone can make this happen, I'd like to volunteer to maintain it for now.
>

Thanks Renán, that would be great!

Qian, do you know who owns https://github.com/orgs/mesos/ ?
Looking at the members I can see some overlap with PMC members, but since
it's not related to Apache I'm not sure who runs it.

Alternatively we could host it with the ASF but I think that simply hosting
on github is simpler and lower-overhead.

Cheers,




> -Renán
>
>
>


Re: Welcome Charles-François Natali as Mesos Committer and PMC Member!

2021-08-15 Thread Charles-François Natali
Thanks!

Cheers,

Charles


On Sun, Aug 15, 2021, 21:32 Andreas Peters  wrote:

> Congratulations Charles. :-) I'm very happy for you.
>
> Cheers,
> Andreas
>
> On 15.08.21 21:32, Andrei Sekretenko wrote:
> > Hi all,
> > Please welcome Charles-François as a committer and PMC member of the
> > Apache Mesos project!
> >
> > Charles started contributing to Mesos a bit more than a year ago.
> > During this time, he has discovered and fixed a significant number of
> > issues in all corners of Mesos, from libprocess and the build system
> > to the cgroups isolator, thus becoming the most active contributor
> > since November 2020.
> >
> > Also, Charles has been helping a lot with JIRA triage, mentoring other
> > contributors and moving forward a number of technical discussions.
> >
> > Charles, thanks for all your contributions so far and looking forward to
> > continuing to work with you on the project!
> >
> > - Andrei
> >
>
> --
> AVENTER UG (haftungsbeschraenkt)
> Köllner Chaussee 144
> 25337 Kölln-Reisiek
>
> Phone: (+49)4121 - 235582 0
> E-Mail: a...@aventer.biz
> Internet: https://www.aventer.biz
> Git: https://www.github.com/AVENTER-UG
>
> Geschäftsführer/CEO: Andreas Peters
> Unternehmenssitz/City: Kölln-Reisiek
> Handelsregister beim Amtsgericht/Legal Court: Itzehoe
> Handelsregister-Nummer/Company Registration Number: HRB 9995 PI


Re: Apache Mesos CI on ARM64

2021-07-01 Thread Charles-François Natali
Can't we just use Martin's VM?

On Thu, 1 Jul 2021, 17:15 Tomek Janiszewski,  wrote:

> It looks like we will not get a machine this month, but I was pointed to
> https://www.oracle.com/cloud/free/#always-free. It will be super slow, but
> we can try.
>
> On Tue, 29 Jun 2021 at 14:59, Tomek Janiszewski wrote:
>
>> I can handle that. Let's wait for the machine, then we only need to
>> change the IP from the old arm machine to the new one and everything should
>> work as before.
>>
>> On Tue, 29 Jun 2021 at 14:56, Qian Zhang wrote:
>>
>>> I think currently Mesos CI only covers CentOS 7 and Ubuntu 16.04 on
>>> x86_64:
>>> https://ci-builds.apache.org//job/Mesos/job/Mesos-Buildbot/. I am not
>>> sure how to add a new platform to it; maybe create a JIRA ticket for the
>>> ASF Infrastructure team? It seems we did it before:
>>> https://issues.apache.org/jira/browse/INFRA-15261.
>>>
>>>
>>> Regards,
>>> Qian Zhang
>>>
>>>
>>> On Tue, Jun 29, 2021 at 3:26 AM Tomek Janiszewski 
>>> wrote:
>>>
>>> > I created a request to Works On ARM
>>> > https://github.com/WorksOnArm/cluster/issues/278
>>> > If you are interested in the history of Mesos on ARM
>>> > https://twitter.com/search?q=ARM%20%40janiszt
>>> >
>>> > On Mon, 28 Jun 2021 at 19:40, Charles-François Natali
>>> > wrote:
>>> >
>>> > > Hey,
>>> > >
>>> > > I'm not sure who can arrange that - Qian maybe? - but yes having an
>>> > > ARM64 CI would be great.
>>> > >
>>> > > Having access to Martin's VM really made it much simpler to debug the
>>> > > recent issue with libunwind.
>>> > >
>>> > > Generally having more CI hosts/pipelines would help, e.g. to build
>>> > > and test with ASAN and UBSAN.
>>> > >
>>> > >
>>> > >
>>> > > Charles
>>> > >
>>> > >
>>> > > On Mon, 28 Jun 2021, 16:20 Andreas Peters, 
>>> > wrote:
>>> > >
>>> > >> Hi!
>>> > >>
>>> > >> > Should we ask Mesos on ARM for new machine? Is anybody interested in
>>> > >> > having ARM CI?
>>> > >>
>>> > >>
>>> > >> I think, if it does not take too much time for us, then it would be a
>>> > >> good idea to build the packages for ARM again. :-)
>>> > >>
>>> > >> Best,
>>> > >> Andreas
>>> > >>
>>> > >>
>>> >
>>>
>>


Re: Apache Mesos CI on ARM64

2021-06-28 Thread Charles-François Natali
Hey,

I'm not sure who can arrange that - Qian maybe? - but yes having an ARM64
CI would be great.

Having access to Martin's VM really made it much simpler to debug the
recent issue with libunwind.

Generally having more CI hosts/pipelines would help, e.g. to build and test
with ASAN and UBSAN.



Charles


On Mon, 28 Jun 2021, 16:20 Andreas Peters,  wrote:

> Hi!
>
> > Should we ask Mesos on ARM for new machine? Is anybody interested in
> > having ARM CI?
>
>
> I think, if it does not take too much time for us, then it would be a good
> idea to build the packages for ARM again. :-)
>
> Best,
> Andreas
>
>


Re: [jira] [Comment Edited] (MESOS-10224) [test] CSIVersion/StorageLocalResourceProviderTest.OperationUpdate fails.

2021-06-23 Thread Charles-François Natali
The last option is fine, i.e. checking that we don't reach past the end of
the file.

Weakening this check is fine since... it's not correct anymore.
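
To illustrate the weakened check: the sketch below walks a made-up record
layout (a 32-bit count followed by fixed-size entries, NOT the real
ld.so.cache format) and only rejects input that would require reading past
the end of the buffer; trailing bytes are tolerated. Header, Entry and
parseEntries are hypothetical names.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <string>

// Hypothetical record layout, for illustration only.
struct Header { uint32_t count; };
struct Entry { uint32_t key; uint32_t value; };

// Returns true and stores the entry count in `parsed` if the header and
// all announced entries fit inside `buffer`. Bytes left over after the
// last entry are accepted; reading past the end is not.
bool parseEntries(const std::string& buffer, size_t* parsed)
{
  const char* data = buffer.data();
  const char* end = data + buffer.size();

  if (static_cast<size_t>(end - data) < sizeof(Header)) {
    return false;
  }

  // Copy instead of casting the pointer, to avoid alignment issues.
  Header header;
  std::memcpy(&header, data, sizeof(header));
  data += sizeof(header);

  // Compare against what is left rather than advancing first, so that
  // `data` never points past `end` and the size cannot overflow.
  if (header.count > static_cast<size_t>(end - data) / sizeof(Entry)) {
    return false;
  }

  // Skip over the entries; a real parser would decode each one here.
  data += header.count * sizeof(Entry);

  // Weakened check: at this point `data <= end` holds; we no longer
  // require `data == end`.
  *parsed = header.count;
  return true;
}
```

The division-based comparison is a deliberate choice: it rejects a bogus
count before any pointer arithmetic happens.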

On Wed, 23 Jun 2021, 17:52 Saad Ur Rahman (Jira),  wrote:

>
> [
> https://issues.apache.org/jira/browse/MESOS-10224?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=17367775#comment-17367775
> ]
>
> Saad Ur Rahman edited comment on MESOS-10224 at 6/23/21, 4:51 PM:
> --
>
> [~cf.natali], I am finally getting around to patching the issue here.
>
> My understanding of the routine is that it parses the linker library to
> generate a vector of library names and paths. It does this by casting
> memory blocks into structs to give them parsable structure.
>
> The failing conditional on line _#227_ is because of the droppings Ubuntu
> leaves at the end of the file. The data pointer should point to the end of
> the file to indicate complete parsing. The following conditional on line
> _#235_ ensures NUL termination.
>
> The solutions I can think of are to adjust for the end of data
> specifically for Ubuntu (if we can) by setting:
> {code:java}
> data = buffer->size();
> {code}
> Or we can do this if the data pointer is not at the end:
> {code:java}
> if ((size_t)(data - buffer->data()) < buffer->size())
> {code}
> Another thing we can do is let it slide if the data pointer is less than or
> equal to the buffer size on line _#227_:
> {code:java}
> if ((size_t)(data - buffer->data()) > buffer->size()) {
>   return Error("Invalid format");
> }
> {code}
> What are your thoughts? All of the above are quick adjustments but they
> weaken the original checks.
>
>
> was (Author: surahman):
> [~cf.natali], I am finally getting around to patching the issue here.
>
> My understanding of the routine is that it parses the linker library to
> generate a vector of library names and paths. It does this by casting
> memory blocks into structs to give them parsable structure.
>
> The failing conditional on line _#227_ is because of the droppings Ubuntu
> leaves at the end of the file. The data pointer should point to the end of
> the file to indicate complete parsing. The following conditional on line
> _#235_ ensures NUL termination.
>
> The solutions I can think of are to adjust for the end of data
> specifically for Ubuntu (if we can) by setting:
> {code:java}
> data = buffer->size();
> {code}
> Or we can do this if the data pointer is not at the end:
> {code:java}
> if ((size_t)(data - buffer->data()) < buffer->size())
> {code}
> Another thing we can do is let it slide if the data pointer is strictly less
> than the buffer size on line _#227_:
> {code:java}
> if ((size_t)(data - buffer->data()) > buffer->size()) {
>   return Error("Invalid format");
> }
> {code}
> What are your thoughts? All of the above are quick adjustments.
>
> > [test] CSIVersion/StorageLocalResourceProviderTest.OperationUpdate fails.
> > -
> >
> > Key: MESOS-10224
> > URL: https://issues.apache.org/jira/browse/MESOS-10224
> > Project: Mesos
> >  Issue Type: Bug
> >  Components: test
> >Affects Versions: 1.11.0
> >Reporter: Saad Ur Rahman
> >Priority: Major
> > Attachments: ld.so.cache
> >
> >
> > *OS:* Ubuntu 21.04
> > *Command:*
> > {code:java}
> > make -j 6 V=0 check{code}
> > Fails during the build and test suite run on two different machines with
> the same OS.
> > {code:java}
> > 3: [   OK ] CSIVersion/StorageLocalResourceProviderTest.Update/v0
> (479 ms)
> > 3: [--] 14 tests from
> CSIVersion/StorageLocalResourceProviderTest (27011 ms total)
> > 3:
> > 3: [--] Global test environment tear-down
> > 3: [==] 575 tests from 178 test cases ran. (202572 ms total)
> > 3: [  PASSED  ] 573 tests.
> > 3: [  FAILED  ] 2 tests, listed below:
> > 3: [  FAILED  ] LdcacheTest.Parse
> > 3: [  FAILED  ]
> CSIVersion/StorageLocalResourceProviderTest.OperationUpdate/v0, where
> GetParam() = "v0"
> > 3:
> > 3:  2 FAILED TESTS
> > 3:   YOU HAVE 34 DISABLED TESTS
> > 3:
> > 3:
> > 3:
> > 3: [FAIL]: 4 shard(s) have failed tests
> > 3/3 Test #3: MesosTests ...***Failed  1173.43 sec
> > {code}
> > Are there any pre-requisites required to get the build/tests to pass? I
> am trying to get all the tests to pass to make sure my build environment is
> setup correctly for development.
>
>
>
> --
> This message was sent by Atlassian Jira
> (v8.3.4#803005)
>


Re: *****SPAM***** Re: State of the Project

2021-05-31 Thread Charles-François Natali
Hey,

Could we please try to stay on topic?


Thanks,



On Mon, 31 May 2021, 20:55 Zahoor,  wrote:

> hmmm... Ethereum has a popular client in golang and attracts many rookie
> developers every day.
> But still one of the most complex and nicely maintained software on Earth.
>
> On Mon, May 31, 2021 at 7:08 PM Marc  wrote:
>
>> >
>> > Better to rewrite/redesign Mesos with a more popular language (like
>> > golang) to attract more developers.
>> > Just a mind voice.
>> >
>> > ../Zahoor
>>
>> H, I would argue exactly the opposite.
>> Going to these easier entry level languages attracts rookies coders and
>> rookie thinking. I have seen some really weird design and errors in
>> csi-ceph and influx.
>>
>
>
> --
> ./zahoor
>
> Web: http://zahoor.in
> Twit: @jmohamedzahoor
>


Re: Old Jira Tasks

2021-05-31 Thread Charles-François Natali
Hey,

I'm not sure, I was wondering about this.
The problem with closing old tickets is that some of them are still
serious bugs and valid, so it'd be a bit sad to lose them.

Closing all tickets which look like user questions/errors/etc on the
other hand is fine I think.

Cheers,


On Mon, 31 May 2021 at 09:05, Andreas Peters wrote:
>
> Hi,
>
> just to discuss! :-) Does it make sense to close all abandoned bug
> reports older than (maybe) two years?
>
> I have no idea if the bugs still exist, and it's not even easy to get
> feedback on much younger Jira tasks.
>
> What do u think?
>
> Cheers,
> Andreas
>
>


Re: State of the Project

2021-05-30 Thread Charles-François Natali
On Sun, 30 May 2021 at 16:09, Qian Zhang wrote:
> [...] So given the
> active committers and contributors that we have in the community, I do not
> think we can do anything big in the short term, instead we should do small
> things to gradually activate the community. Here are what in my mind:
> 1. Review and merge the outstanding PRs.
> 2. Review the tickets in JIRA and select some high priority ones to work on.
> 3. Add at least one new committer.

Yes I think it's important to mention, in response to Javi's point,
that one doesn't need to be a hard-core C++ dev to contribute.
The code base is actually very clean and easy to read; the main
problem is the use of the libprocess/actor model, which takes some getting
used to, especially for people who are more used to reactor, green
thread and similar models. The libprocess doc [1] gives a good overview. The
stout doc [2] is also worth a read, although there is nothing surprising
about its design.
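
For readers coming from those other models, here is a rough illustration of
the actor idea in plain C++11. It does not use the libprocess API; Actor and
Counter are hypothetical names. Each actor owns a mailbox and a single
thread, so its state is only ever touched by one message at a time and needs
no lock of its own.

```cpp
#include <condition_variable>
#include <functional>
#include <future>
#include <memory>
#include <mutex>
#include <queue>
#include <thread>
#include <utility>

// Minimal actor: one mailbox, one thread, messages run one at a time.
class Actor
{
public:
  Actor() : done(false), worker(&Actor::run, this) {}

  ~Actor()
  {
    // Ask the worker to stop once everything already queued has run.
    dispatch([this]() { done = true; });
    worker.join();
  }

  // Enqueue a message; it runs later on the actor's own thread.
  void dispatch(std::function<void()> message)
  {
    {
      std::lock_guard<std::mutex> lock(mailboxMutex);
      mailbox.push(std::move(message));
    }
    ready.notify_one();
  }

private:
  void run()
  {
    while (!done) {
      std::function<void()> message;
      {
        std::unique_lock<std::mutex> lock(mailboxMutex);
        ready.wait(lock, [this]() { return !mailbox.empty(); });
        message = std::move(mailbox.front());
        mailbox.pop();
      }
      message();  // Runs without the mailbox lock held.
    }
  }

  std::mutex mailboxMutex;
  std::condition_variable ready;
  std::queue<std::function<void()>> mailbox;
  bool done;           // Only touched on the worker thread once it runs.
  std::thread worker;  // Declared last so it starts after the other members.
};

// State owned by an actor: callers never touch `value` directly.
class Counter
{
public:
  Counter() : value(0) {}

  void increment() { actor.dispatch([this]() { ++value; }); }

  // The reply comes back as a future instead of a blocking call.
  std::future<int> get()
  {
    auto promise = std::make_shared<std::promise<int>>();
    std::future<int> result = promise->get_future();
    actor.dispatch([this, promise]() { promise->set_value(value); });
    return result;
  }

private:
  int value;    // Only touched on the actor's thread, so no lock is needed.
  Actor actor;  // Declared last: destroyed first, while `value` still exists.
};
```

Any thread can call counter.increment(), and counter.get().get() waits for
the reply. Building with -pthread may be needed depending on the toolchain.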

But in any case I think there's a lot of valuable work which doesn't
require any C++, for example as Qian mentions going through the huge
backlog of JIRA issues.
I know it doesn't sound like the most exciting thing but it would
actually help a lot to do some triage, try to reproduce bugs, close
stale tickets, respond to user questions etc.

I remember seeing a few other people answering Qian's call for
contributors a couple months ago [3], it'd be great if they could
reach out if they're still interested - if not it's fine, I know we're
all busy with our lives :).

Cheers,

Charles


[1] https://github.com/apache/mesos/tree/master/3rdparty/libprocess
[2] https://github.com/apache/mesos/tree/master/3rdparty/stout
[3] 
https://mail-archives.apache.org/mod_mbox/mesos-dev/202103.mbox/%3CCABY6VOb%3DT8VxehVaS1YBrC5_odEwKhZzj3R4o3b-ykCytDw3JA%40mail.gmail.com%3E


>
> Please let me know for any comments / suggestions, thanks!
>
>
> Regards,
> Qian Zhang
>
>
> On Sun, May 30, 2021 at 7:59 PM Zahoor  wrote:
>
> > Hi
> >
> > Better to rewrite/redesign Mesos with a more popular language (like
> > golang) to attract more developers.
> > Just a mind voice.
> >
> > ../Zahoor
> >
> >
> > On Sun, May 30, 2021 at 4:08 PM Javi Roman 
> > wrote:
> >
> >> Totally agree that the main problem with this project is trying to
> >> increase the developer community.
> >>
> >> From my point of view, attracting new developers to a project of this
> >> complexity is difficult (C++ low level developers, creating Java and Python
> >> bindings is not easy). However, if we try to broaden the objectives of the
> >> project we may be able to attract other developers (not only C++
> >> developers) who can help.
> >>
> >> One idea I have always had is to incorporate the concept and technology
> >> of D2IQ DC/OS [1], in this way we would continue the abandoned work of D2IQ
> >> by extending Apache Mesos to a more user-friendly technology and broaden
> >> the base of developers with interest in (ReactJS, Go, Scala, databases).
> >>
> >> I would be interested in contributing in this line, being able to apply
> >> my knowledge in other areas, beyond C++ (which unfortunately I am not
> >> proficient in).
> >>
> >> [1] https://github.com/dcos
> >> --
> >> Javi Roman
> >>
> >> Twitter: @javiromanrh
> >> GitHub: github.com/javiroman
> >> Linkedin: es.linkedin.com/in/javiroman
> >> Big Data Blog: dataintensive.info
> >>
> >>
> >> On Sat, May 29, 2021 at 12:47 PM Charles-François Natali <
> >> cf.nat...@gmail.com> wrote:
> >>
> >>> Hi Renan,
> >>>
> >>> > Renaming the topic because apparently we need to have this discussion
> >>> again.
> >>>
> >>> Thanks for bringing this up again, because it is indeed still a problem.
> >>>
> >>> > Therefore, the PMC *must* add members or the project  *will* fizzle out
> >>> > and die.
> >>> >
> >>> > I'd also be curious to see if we even have enough PMC members to form a
> >>> > quorum at the moment as I only see Andrei Sekretenko reviewing pull
> >>> > requests on Github and the new chair Qian Zhang on emails. The project
> >>> > needs three PMC members for the project to be considered in a good
> >>> state
> >>> > according to the Apache guidelines [0].
> >>> >
> >>>
> >>> I must say I'm also a bit confused.
> >>> The new project chair was elected exactly a month ago [1].
> >>> Since then, the only t

Re: State of the Project

2021-05-29 Thread Charles-François Natali
Hi Renan,

> Renaming the topic because apparently we need to have this discussion again.

Thanks for bringing this up again, because it is indeed still a problem.

> Therefore, the PMC *must* add members or the project  *will* fizzle out
> and die.
>
> I'd also be curious to see if we even have enough PMC members to form a
> quorum at the moment as I only see Andrei Sekretenko reviewing pull
> requests on Github and the new chair Qian Zhang on emails. The project
> needs three PMC members for the project to be considered in a good state
> according to the Apache guidelines [0].
>

I must say I'm also a bit confused.
The new project chair was elected exactly a month ago [1].
Since then, the only thing I have seen - there might be more going on
behind the scenes - is a single thread calling for input on new
technical direction [2], which as several people mentioned before is
not the most important issue the project is facing right now.
As far as I can tell, nothing has been done by the PMC/project chair to
address the more fundamental issue of the health of the community.
Now, Andrei has been doing a great job at reviewing MRs, but as
mentioned before he only has so much time available, and the project
can't have only one active committer.
So it would be good to hear from the project chair what they are
planning to do, if anything, to address this situation.
From some private conversations I know that they have been busy with
other obligations in the past month, so maybe it's just bad timing
and a transient state. However, I don't think it's viable to
continue if even the project chair doesn't have any time to dedicate
to the project - not even to reply to this thread.

> At this point I suggest the PMC does a roll call and get Apache board
> members involved so that they can be aware of the situation.

I'm not familiar with the ASF but yes, it does sound like a possible
course of action?

Cheers,

Charles



[1] 
https://mail-archives.apache.org/mod_mbox/mesos-dev/202104.mbox/%3CCAE0xwObaHPiSFM3KrY1SL--E864L48o_LF2E7PP2%3DUu3rk99gQ%40mail.gmail.com%3E
[2] 
https://mail-archives.apache.org/mod_mbox/mesos-dev/202105.mbox/%3CCABY6VOaOxSp%2BeMJm_jSTdY%3DD5Qp%3DT%2B89Cvaxqw7GLbFYr1qzew%40mail.gmail.com%3E









>
> -Renan
>
> [0]https://www.apache.org/dev/pmc.html
>
> On 5/24/21 10:21 AM, Charles-François Natali wrote:
> > Hey,
> >
> > On Mon, 24 May 2021 at 14:12, Qian Zhang wrote:
> >>> several fixing bugs which basically make Mesos unusable on a recent Linux
> >> distro
> >> Can you please elaborate a bit on this? Do you mean Mesos not working on a
> >> recent Linux distro? If so, I think we can start to fix the issues and
> >> maybe do a patch release for that.
> > Yes, there are several issues on recent Linux distributions, e.g.
> > Debian Bullseye:
> > - https://github.com/apache/mesos/pull/387: compilation error,
> > although it's only in master, not in the last release
> > - https://github.com/apache/mesos/pull/388: problem with the freezer
> > cgroup based task killer which causes over a dozen tests to fail and
> > can leave the freezer frozen, tasks in uninterruptible state etc
> > - https://github.com/apache/mesos/pull/384: problem parsing
> > ld.so.cache which also breaks a lot of things
> >
> > You were tagged in some of these MRs; I have now tagged you in all of
> > them. It'd be great if you could have a look :).
> >
> > Cheers,
> >
> >>
> >> Regards,
> >> Qian Zhang
> >>
> >>
> >> On Fri, May 21, 2021 at 2:57 AM Charles-François Natali 
> >> 
> >> wrote:
> >>
> >>> Hey,
> >>>
> >>> Sorry for being a killjoy and repeating myself, but as mentioned in
> >>> the past, I don't think that technical direction is the most important
> >>> problem right now - community is.
> >>> Coming up with a medium/long-term technical roadmap doesn't do much if
> >>> there are no contributors to implement it, and no users to use it.
> >>>
> >>> The following issues which have been brought up are still not resolved:
> >>> - very few committers willing to review and merge MRs - currently only
> >>> Andrei Sekretenko is doing that, and I'm sure he's busy with his day
> >>> job so only has so much bandwidth
> >>> - very few people contribute MRs and triage/address JIRA issues -
> >>> AFAICT it's pretty much Andreas and me
> >>>
> >>> So I think the first thing to do would be to address those problems.
> >>> Some suggestions which come to mind:
> >>> - to the remaining committers who'd still like to salvage the projec

Re: [BULK]Discuss the possible technical directions of Mesos

2021-05-24 Thread Charles-François Natali
Hey,

On Mon, 24 May 2021 at 14:12, Qian Zhang wrote:
> > several fixing bugs which basically make Mesos unusable on a recent Linux
> distro
> Can you please elaborate a bit on this? Do you mean Mesos not working on a
> recent Linux distro? If so, I think we can start to fix the issues and
> maybe do a patch release for that.

Yes, there are several issues on recent Linux distributions, e.g.
Debian Bullseye:
- https://github.com/apache/mesos/pull/387: compilation error,
although it's only in master not in the last release
- https://github.com/apache/mesos/pull/388: problem with the freezer
cgroup based task killer which causes over a dozen tests to fail and
can leave the freezer frozen, tasks in uninterruptible state etc
- https://github.com/apache/mesos/pull/384: problem parsing
ld.so.cache which also breaks a lot of things

You were tagged in some of these MRs, I tagged you in all of them, it'd
be great if you could have a look :).

Cheers,

>
>
> Regards,
> Qian Zhang
>
>
> On Fri, May 21, 2021 at 2:57 AM Charles-François Natali 
> wrote:
>
> > Hey,
> >
> > Sorry for being a killjoy and repeating myself, but as mentioned in
> > the past, I don't think that technical direction is the most important
> > problem right now - community is.
> > Coming up with medium/long-term technical roadmap doesn't do much if
> > there are no contributors to implement them, and users to use them.
> >
> > The following issues which have been brought up are still not resolved:
> > - very few committers willing to review and merge MRs - currently only
> > Andrei Sekretenko is doing that, and I'm sure he's busy with his day
> > job so only has that much bandwidth
> > - very few people contribute MRs and triage/address JIRA issues -
> > AFAICT it's pretty much Andreas and me
> >
> > So I think the first thing to do would be to address those problems.
> > Some suggestions which come to mind:
> > - to the remaining committers who'd still like to salvage the project,
> > please take some time to review and merge MRs -
> > https://github.com/apache/mesos/pulls has a few open, several fixing
> > bugs which basically make Mesos unusable on a recent Linux distro
> > - to the various users who've said they were interested in keeping the
> > project alive: start contributing. It doesn't have to be anything big,
> > just get familiar with the code base:
> >   * start going through JIRA and triage bugs, closing invalid/stale
> > ones, tackling small issues
> >   * submit MRs so that the test suite passes on your OS
> >   * submit MRs to merge various commits you have in your private repos
> > if applicable
> >
> > Then in a few months, once the project  is back to having a small
> > active contributors base, they can together decide how to take the
> > project forward, and start addressing larger projects.
> >
> > Cheers,
> >
> > Charles
> >
> >
> >
> >
> >
> >
> > On Thu, 20 May 2021 at 18:16, Gregoire Seux wrote:
> > >
> > > Hi,
> > >
> > > Interesting set of suggestions! Here are a few comments:
> > >
> > >   *   Mesos feels simple to deploy (only very few components: zookeeper,
> > masters and agents), customization is done mostly through configuration
> > files. I don't think there is a strong need to make it easier (even though
> > I've used Mesos for years, so I'm pretty used to the difficulty if any)
> > >   *   Having to manage Zookeeper adds some complexity but since
> > Zookeeper piece is required to operate Marathon (which is our main
> > framework), I don't see much value in the investment required to get rid of
> > this dependency.
> > >   *   Taking advantage of NUMA topology by default would be a good
> > addition although I don't see it as strategic (at least we have solved this
> > on our clusters with custom modules)
> > >   *   I would love to see improvement on masters scalability for large
> > clusters (our largest cluster is 3500 nodes and may start to suffer from
> > the actor model)
> > >
> > > Something that I see as a very significant drawback to the ecosystem at
> > large is the difficulty to write frameworks. In addition to this, most
> > open-source frameworks feel abandoned. Without good frameworks, Mesos value
> > really decreases a lot (although it is very technically strong).
> > > I think, making Mesos thrive would necessarily go through a solution to
> > this issue.
> > >
> > > Something that I'd see as strategic would be the ability to deploy
> > complex workloads on Mesos without having to write a new framework. Random
> > idea: make Mesos really usable as a backend for Kubernetes (as a virtual
> > kubelet). This would remove a lot of barriers to use Mesos as a strong
> > engine to operate a fleet of servers while allowing to use the Kubernetes
> > API that apparently everybody loves.
> > >
> > > What do you think?
> > >
> > > --
> > > Grégoire Seux
> > >
> >


Re: [BULK]Discuss the possible technical directions of Mesos

2021-05-20 Thread Charles-François Natali
Hey,

Sorry for being a killjoy and repeating myself, but as mentioned in
the past, I don't think that technical direction is the most important
problem right now - community is.
Coming up with a medium/long-term technical roadmap doesn't do much if
there are no contributors to implement it, and no users to use it.

The following issues which have been brought up are still not resolved:
- very few committers willing to review and merge MRs - currently only
Andrei Sekretenko is doing that, and I'm sure he's busy with his day
job so only has that much bandwidth
- very few people contribute MRs and triage/address JIRA issues -
AFAICT it's pretty much Andreas and me

So I think the first thing to do would be to address those problems.
Some suggestions which come to mind:
- to the remaining committers who'd still like to salvage the project,
please take some time to review and merge MRs -
https://github.com/apache/mesos/pulls has a few open, several fixing
bugs which basically make Mesos unusable on a recent Linux distro
- to the various users who've said they were interested in keeping the
project alive: start contributing. It doesn't have to be anything big,
just get familiar with the code base:
  * start going through JIRA and triage bugs, closing invalid/stale
ones, tackling small issues
  * submit MRs so that the test suite passes on your OS
  * submit MRs to merge various commits you have in your private repos
if applicable

Then in a few months, once the project is back to having a small
active contributor base, they can together decide how to take the
project forward, and start addressing larger projects.

Cheers,

Charles






On Thu, 20 May 2021 at 18:16, Gregoire Seux wrote:
>
> Hi,
>
> Interesting set of suggestions! Here are a few comments:
>
>   *   Mesos feels simple to deploy (only very few components: zookeeper, 
> masters and agents), customization is done mostly through configuration 
> files. I don't think there is a strong need to make it easier (even though 
> I've used Mesos for years, so I'm pretty used to the difficulty if any)
>   *   Having to manage Zookeeper adds some complexity but since Zookeeper 
> piece is required to operate Marathon (which is our main framework), I don't 
> see much value in the investment required to get rid of this dependency.
>   *   Taking advantage of NUMA topology by default would be a good addition 
> although I don't see it as strategic (at least we have solved this on our 
> clusters with custom modules)
>   *   I would love to see improvement on masters scalability for large 
> clusters (our largest cluster is 3500 nodes and may start to suffer from the 
> actor model)
>
> Something that I see as a very significant drawback to the ecosystem at large 
> is the difficulty to write frameworks. In addition to this, most open-source 
> frameworks feel abandoned. Without good frameworks, Mesos value really 
> decreases a lot (although it is very technically strong).
> I think, making Mesos thrive would necessarily go through a solution to this 
> issue.
>
> Something that I'd see as strategic would be the ability to deploy complex 
> workloads on Mesos without having to write a new framework. Random idea: make 
> Mesos really usable as a backend for Kubernetes (as a virtual kubelet). This 
> would remove a lot of barriers to use Mesos as a strong engine to operate a 
> fleet of servers while allowing to use the Kubernetes API that apparently 
> everybody loves.
>
> What do you think?
>
> --
> Grégoire Seux
>


Re: New PMC Chair

2021-04-29 Thread Charles-François Natali
Congratulations!



On Thu, 29 Apr 2021, 23:37 Andreas Peters,  wrote:

> Great to hear. :-)
>
> On 29.04.21 at 16:35, Vinod Kone wrote:
> > Hi community,
> >
> > Just wanted to let you all know that the board passed the resolution to
> > elect a new PMC chair!
> >
> > Hearty congratulations to *Qian Zhang* for becoming the new Apache Mesos
> > PMC chair and VP of the project.
> >
> > Thanks,
> >
>
>


Re: [VOTE] Move Apache Mesos to Attic

2021-04-08 Thread Charles-François Natali
Hey Andrei,

You make some very good points.

> I do not see how the approach "let's move the project out of ASF and
> lower the bar there" benefits the _future_ of the project, compared to
> lowering the committer bar while remaining in ASF.
> (I will reconsider my vote if someone explains how the latter is less
> harmful to the project than the former)

I guess some committers could prefer this approach if they are
attached to the brand name/history of the Mesos project, and fear that
this would tarnish the legacy if lowering the bar within the ASF
causes the project quality to go down.
That's a legitimate concern, but given that the alternative risks
effectively killing the project - especially with the legal
considerations that a fork would apparently entail - I think it's
worth trying to keep it in the ASF.
The main reason I voted in favor of moving it to the Attic was so that
interested people could actually get a chance to contribute, because
as-is, it just seems really difficult - I actually made some
contributions and want to contribute more but the absence of support
when trying to e.g. close old tickets or get MRs reviewed turned me
off.

> To continue any development (including maintenance), the entry bar for
> the people allowed to merge code and participate in project decisions
> has to be lowered, this way or another.

Agreed.

FWIW, I'm a long-time committer for the reference Python
implementation (CPython), although I haven't contributed in a while.
And I remember a long time ago we had some discussions on the bar to
accept new contributors and committers, and at the time I was on the
side of being conservative and keeping the bar high. But some people
said that it was much better for the long-term viability of the
project to actually be fairly liberal and quickly grant bug tracker
permissions, and even commit permissions after a few contributions.
And they were right: it's a great way to get new people involved.
Waiting a few years to give commit rights to someone doesn't make
sense given how quickly people can change jobs, projects etc, so
having a high bar might just turn away some people who might
otherwise have become involved. And it doesn't have to affect the
quality as long as a core group of people are willing to take time to
mentor and review junior committers.
So my suggestion would be to be more liberal in granting people bug
tracker permissions and commit permissions - but of course some
minimum mentoring and code reviews would still be needed.

>  - I'm ready to provide technical mentoring (and, hopefully,
> reasonably quick reviews) in areas I'm familiar with (this,
> unfortunately, excludes most of the inner workings of the agent, for
> example)
>  - I can provide mentoring and code reviews in areas I'm less familiar
> with, but do not expect quick feedback from me there, as meaningfully
> reviewing an unfamiliar code can be 2 to 100 times slower

If you - and I assume Qian since he mentioned still being interested
and even willing to step up as chair - are willing to do some minimum
mentoring and code reviews then that would be great - I'd be happy to
submit some more MRs I want to work on, go through the tracker and
follow up/close tickets etc. Hopefully the other people who said they
were interested, like the guys from Criteo, Andreas, Javi, etc would
help too.

Also, to answer your questions, I don't really have a long-term plan
either - all I can tell you is that at work we're very happy with
Mesos: it fits exactly our needs, is rock-solid, scales perfectly well
and requires very little maintenance. It was actually the only product
fitting our criteria when we did a survey, and AFAICT this still
holds today. We have absolutely no reason to replace it.
So personally I'm happy with it continuing as a low-level distributed
operating system, doing maintenance, adding small features - or
potentially larger ones like NUMA topology support, etc. Like I said
before, I don't think curl is gaining new features every day, but I
still find it invaluable and plan to keep using it for the foreseeable
future.

Cheers,

Charles



On Thu, 8 Apr 2021 at 23:56, Andrei Sekretenko wrote:
>
> Well, one of the strengths of this project used to be effective
> collaboration between people sitting in different locations and
> different time zones (and, at certain intervals of time, working in
> different companies).
>
> Talking about COVID... the pandemic, assuming that it will subside,
> should probably be one more reason to vote -1 at this point in time.
> I don't know about everyone, but the additional overhead it creates
> everywhere is, for example, absolutely not helping me spend any time
> outside my working hours on any kind of knowledge transfer in Mesos
> (including discussions on the future of the project, for that matter).
>
> On Thu, 8 Apr 2021 at 18:29, Andreas Peters  wrote:
> >
> > As I see, Andrei you are sitting close to me (Hamburg, Germany), that's
> > just 30km away 

Re: [VOTE] Move Apache Mesos to Attic

2021-04-08 Thread Charles-François Natali
I must say I'm really confused as well - what constitutes a fork in this
context?



On Thu, 8 Apr 2021, 21:07 Andreas Peters,  wrote:

> Hi Shane!
>
>
> > Forks are *NOT* allowed to use the ASF's brands or trademarks, so if
> > you're planning to advertise a new fork to do active development, my bet
> > is you'll need a new name.
>
> Does it mean, "my" fork cannot be called Mesos?
>
> If so, then please do not move Mesos to the attic. There are
> people who would like to care for Mesos, but if we have to rename it, then it's
> like you kill it.
>
> Cheers,
> Andreas
>
>


Re: [VOTE] Move Apache Mesos to Attic

2021-04-06 Thread Charles-François Natali
Hi Rich,

FWIW, I'm one of those people who said they were interested, and I
still voted to move it to the attic (even though my vote is non
binding as I'm not a committer).

Initially I also thought that we could try to revive it within the
ASF, but it quickly became clear that *none* of the current committers
is willing to go down that route, i.e. put in the effort needed to
onboard new committers. And without that, there's just no way forward.
Various people voiced other concerns as well, such as viability of the
project when other alternatives like Kubernetes exist, lack of clear
technical direction for the future, etc.
While they're relevant questions, I think currently they don't really
make sense since the current Mesos community is basically dead.
Finally, I think that the project should be moved to the Attic de
facto because AFAICT the Apache rules require at least 3 *active*
committers, and that's definitely not the case.

However I still do believe in the project for the reasons I outlined
in some of the previous threads, and I'm still interested in
contributing: I just think that the current structure of the project
is not suited for that anymore. And to be honest, I just want to move
on, I'm tired of those endless discussions - it's been almost 2 months
since the first thread started, and nothing happened.

It's a shame that we won't be able to continue using
https://github.com/apache/mesos though, as it creates a much higher
barrier to continuing the project.

However if that's really not possible, then I guess that leaves no
other option: once the vote has passed, I guess I'll start a final
thread to gather people who'd be interested to create a new project
forked off master on github, so we can start from scratch with our own
repository, bug tracker etc. I hope those people who said they're
actually interested will be willing to take an active part.

Cheers,

Charles





On Wed, 7 Apr 2021 at 02:50, Rich Bowen wrote:
>
> I hope y'all can forgive me for sticking my nose in, as a concerned member. 
> Color me confused by this vote.
>
> A month ago, on this same list - 
> https://lists.apache.org/thread.html/r307db648e201182fcf39b0de63ba224b94965501e20e6cbcecc085e4%40%3Cdev.mesos.apache.org%3E
>  - Qian asked who was still interested in keeping the project going. SIX 
> people responded that, given the chance, they'd step up and keep it going.
>
> Around that same time - 
> https://lists.apache.org/thread.html/raed89cc5ab78531c48f56aa1989e1e7eb05f89a6941e38e9bc8803ff%40%3Cdev.mesos.apache.org%3E
>  - Vinod observed that the too-high barrier to granting committer rights has 
> been a major factor in the slowdown of the project.
>
> And yet, y'all are voting to attic the project.
>
> So, again, it's not my project, and I don't have a vote here, but the reason 
> the Board asks projects to have these attic conversations on the Dev list is 
> *specifically* so that interested people can say, hey, don't attic it, we'll 
> take it from here. Which six people, plus Qian, have done.
>
> Maybe it's time to lower the barrier to entry, and let these willing people 
> take the project forward, do so. The Board can work out the picky little 
> details of re-forming the PMC, if that's a difficulty.
>
>


Re: [VOTE] Move Apache Mesos to Attic

2021-04-06 Thread Charles-François Natali
+1 (non binding)

Hopefully this will allow the project to continue its life outside the ASF.

Can you confirm whether it'll be possible to continue using
https://github.com/apache/mesos and give write permissions to the various
people who expressed interest?

On Tue, 6 Apr 2021, 14:47 Alex R,  wrote:

> +1 (binding)
>
> Great times having worked with you folks!
>
> On Mon, 5 Apr 2021 at 21:50, Andrew Schwartzmeyer 
> wrote:
>
> > It seems to be the best (and really only) move at this point, +1.
> >
> > It was nice working with you all!
> >
> > Andy
> >
> > On 2021/04/05 18:31:13, Benjamin Bannier  wrote:
> > > With a heavy heart, but also curiosity about what will come next, +1.
> > >
> > >
> > > Benjamin
> > >
> > > On Mon, Apr 5, 2021 at 7:58 PM Vinod Kone 
> wrote:
> > >
> > > > Hi folks,
> > > >
> > > > Based on the recent conversations
> > > > <
> >
> https://lists.apache.org/thread.html/raed89cc5ab78531c48f56aa1989e1e7eb05f89a6941e38e9bc8803ff%40%3Cuser.mesos.apache.org%3E
> > >
> > > > on our mailing list, it seems to me that the majority consensus among
> > the
> > > > existing PMC is to move the project to the attic
> > > >  and let the interested community members
> > > > collaborate on a fork in Github.
> > > >
> > > > I would like to call a vote to dissolve the PMC and move the project
> to
> > > > the attic.
> > > >
> > > > Please reply to this thread with your vote. Only binding votes from
> > > > PMC/committers count towards the final tally but everyone in the
> > community
> > > > is encouraged to vote. See process here
> > > > .
> > > >
> > > > Thanks,
> > > >
> > >
> >
>


Re: [BULK]Re: [BULK]Call for active contributors

2021-03-04 Thread Charles-François Natali
Also in - looking forward to contributing further.

On Thu, 4 Mar 2021 at 16:50, Tomek Janiszewski wrote:
>
> I'm in
> I can help with ARM CI, UI and some C++
>
> On Thu, 4 Mar 2021 at 16:04, Thomas Langé wrote:
>>
>> Same for me, I'm available to share my knowledge and actively contribute to 
>> Mesos project.
>>
>> Thomas
>> 
>> From: Haijiang Chen 
>> Sent: Thursday, 4 March 2021 16:01
>> To: u...@mesos.apache.org 
>> Cc: mesos 
>> Subject: [BULK]Re: [BULK]Call for active contributors
>>
>> I am willing to contribute to Mesos on windows based on my current 
>> experiences.
>>
>> - Haijiang
>>
>> On Thu, Mar 4, 2021 at 10:56 PM Grégoire Seux  wrote:
>>
>> Already answered to the other thread. I'm in.
>>
>> --
>> Grégoire
>> 
>> From: Qian Zhang 
>> Sent: Thursday, March 4, 2021 3:38 PM
>> To: mesos ; user 
>> Subject: [BULK]Call for active contributors
>>
>> Hi folks,
>>
>> Please reply to this mail if you plan to actively contribute to Mesos and 
>> want to become a committer and PMC member in future.
>>
>>
>> Regards,
>> Qian Zhang


Re: Feature requests for Mesos

2021-03-01 Thread Charles-François Natali
I couldn't agree more.



On Mon, 1 Mar 2021, 15:08 Benjamin Bannier,  wrote:

> Hi Charles-François,
>
> thanks for your detailed message, you captured important points, and I
> think I agree with your sentiment here. Mesos might still have a place, and
> before thinking about what new features to add, the project first needs to
> solve more fundamental issues.
>
> My previous pessimistic assessment on this list came from a similar angle
> but I think with wider scope: a healthy project requires a healthy
> community where users can find help, but also can have some hope that
> important issues will get fixed. I have not been able to spend much time on
> Mesos in the last year, but was following Slack and the mailing lists (the
> ones with humans and the ones with bots). On the mailing lists I see users
> ask for help with issues they run into or questions, but only rarely will
> get a response from committers or other community members. Few new JIRA
> issues were filed since fall 2020, but hardly any of them have been
> triaged let alone fixed (this is on top of the existing bug backlog). I do
> not think one needs to be a committer to improve on that situation if one
> can get help getting patches discussed, reviewed and ultimately merged. It
> looks like Andrei and Qian have committed to help on the latter, but I have
> only rarely seen community members volunteer for the former.
>
> When I wrote that I thought starting a new project on top of Apache Mesos
> today might be not a good idea, I mainly came from that angle. While the
> software does work for many use cases it seems to be unmaintained with
> hardly any folks active in taking it further globally, beyond their own
> immediate needs, and willing to take on the needed work. Being a top-level
> Apache project with a strong history, Apache Mesos still has a brand, but I
> don't think it has lived up to the associated expectations. Similarly, big
> ownership gaps (technical and project-wise) have developed which neither
> active committers nor community members have filled. Again, one would not
> need to be a committer to develop expertise and contribute, and actually
> the natural and historic process was for folks to do exactly that with
> committership being a thing only after getting involved (see
> https://community.apache.org/newcommitter.html for Apache's high-level
> view
> on that). This is the issue of continued trust Renan mentioned in their
> message to the user mailing list which I also believe is critical so the
> project can live up to its promise (this is integral to being an Apache
> project, see e.g., https://www.apache.org/theapacheway).
>
> As a non-user with emotional attachment to the historic Apache Mesos brand,
> my list of areas in need of improvement to resurrect this project would be:
>
> - willingness of remaining active committers to be active on a regular
> basis in engagements with the community, both on the user and contributor
> side (in PRs, review requests, on mailing lists),
> - transparent and active discussions in the community, among committers and
> contributors, and among committers, in applicable form, beyond roll calls,
> - timely and consistent process to address user issues, and
> - consistent ownership of the bug and feature backlog.
>
> Note that work on new feature requests is absent from my list. That folks
> want to discuss that here and now seems to me to be another sign that the
> Mesos community is not in a good place given all its existing non-technical
> issues.
>
>
> Best,
>
> Benjamin
>


Re: Feature requests for Mesos

2021-02-28 Thread Charles-François Natali
I'm not sure that the comparison to Kubernetes is very apt - I don't think
anyone is under the illusion that Mesos is a contender for it, that ship
has long sailed.

Also, features availability is only one of the many aspects which should be
considered when choosing among potential candidate technologies:
reliability, scalability, operational complexity, extensibility and fitness
to the actual use case are all aspects which drove us to choose Mesos over
the many alternatives back when we made the decision - which we haven't
regretted since.

To give a very bad analogy, Firefox is a much better and feature complete
browser than Curl, however it doesn't mean that the latter doesn't have its
use cases, even in 2021.

As far as I'm concerned, I'd be quite happy with Mesos continuing as a
"distributed systems kernel" as described on http://mesos.apache.org/,
allowing interested individuals and organisations to build on top of it -
we're certainly happy with this paradigm at my company.

Now obviously it doesn't mean that we shouldn't consider adding new
features - for example something which was requested by some users a few
months ago and which we ended up implementing ourselves at the framework,
agent resource and custom executor level was support for NUMA topology
awareness, which could probably make sense to add to Mesos.
For more inspiration for new features, I think Nomad and Slurm might be
worth looking into.

Also, simply continuing to address existing bug reports and MRs would I
think be a good starting point to try to revive the project, get potential
new contributors familiar with the code base, etc.
Merging some changes maintained in third-party forks, such as the ones
mentioned by the people from Criteo who also expressed an interest in
keeping Mesos going, would help too.

People like Javi Roman seem to have some idea on potential new features -
those would be great to hear as well!

However the main barrier to any progress I can see right now - and which
was discussed in the previous thread - is that none of the current
maintainers seem to have any time to dedicate to the project, including
reviewing existing patches/MR. I still haven't seen any suggestion from the
current project members on ways to address that.
I know some people objected that electing new committers might not be
enough to revive the project - I most certainly agree it might very well
not be sufficient, however I think it is a necessary condition, unless some
of the current maintainers are willing to dedicate some time to onboarding
potential new contributors.

Cheers,

Charles





On Sun, 28 Feb 2021, 16:42 Jorge Machado,  wrote:

> Hi Samuel,
>
> To be honest, I would not invest any more Time on Mesos. The features from
> Kubernetes are just way better. :)
>
> > On 28. Feb 2021, at 12:54, Samuel Marks  wrote:
> >
> > Decouple Apache ZooKeeper, enabling Apache Mesos to run completely
> without
> > ZooKeeper. Specifically enable a choice between ZooKeeper, etcd, and
> consul.
> >
> > My organisation is somewhat interested in contributing this. We tried in
> > the past but came across some hurdles on the Mesos organisation end. Open
> > to trying again, but will need a clear pathway to getting this accepted.
> >
> > Samuel Marks
> > Charity  | consultancy <
> https://offscale.io>
> > | open-source  | LinkedIn
> > 
> >
> >
> > On Sun, Feb 28, 2021 at 7:39 PM Qian Zhang  wrote:
> >
> >> Hi Folks,
> >>
> >> To reboot this awesome project, I'd like to collect feature requests for
> >> Mesos. Please let us know your requirements for Mesos and whether you or
> >> your organization would like to contribute to the implementation of the
> >> requirements. Thanks!
> >>
> >>
> >> Regards,
> >> Qian Zhang
> >>
>
>


Re: Next Steps

2021-02-26 Thread Charles-François Natali
As mentioned before I'd also be happy to contribute.

Concretely, what's the next step to move this forward?



On Fri, 26 Feb 2021, 11:15 Thomas Langé,  wrote:

> Hello,
>
> I'm part of Criteo team as well, and as Grégoire said, we plan to support
> Mesos internally for some time. I would like to
> propose my help as well as a committer, and contribute as much as I can to
> this project.
>
> Br,
>
> Thomas
> --
> *From:* Grégoire Seux 
> *Sent:* Friday, 26 February 2021 11:12
> *To:* priv...@mesos.apache.org ; dev <
> dev@mesos.apache.org>; user 
> *Subject:* Re: Next Steps
>
> Hello all,
>
> here at Criteo, we heavily use Mesos and plan to do so for a foreseeable
> future alongside other alternatives.
> I am ok to become committer and help the project if you are looking for
> contributors.
> It seems finding committers will be doable but finding a PMC chair will be
> difficult.
>
> To give some context on our usage, Criteo is running 12 Mesos clusters
> running a light fork of Mesos 1.9.x.
> Each cluster has 10+ distinct Marathon frameworks, a Flink framework, an
> instance of Aurora and an in-house framework.
> We strongly appreciate the ability to scale the number of nodes (3500 on
> the largest cluster and growing), the simplicity of the project overall and
> the extensibility through modules.
>
> --
> Grégoire
>


Re: Next Steps

2021-02-18 Thread Charles-François Natali
Speaking as someone who contributed a few patches and would like to get
more involved, I find it a bit difficult to get MRs reviewed and merged.
I think it's probably because the current committers have other priorities
now that D2iQ focus has shifted, which is understandable but makes it
harder for outsiders to contribute.
Is there anything which could be done about that?

Cheers,



On Thu, 18 Feb 2021, 14:30 Qian Zhang,  wrote:

> Hi Vinod,
>
> I am still interested in the project. As other folks said, we need to have
> a direction for the project. I think there are still a lot of Mesos
> users/customers in the mail list, can you please send another mail to
> collect their requirements / pain points on Mesos, and then we can try to
> set up a roadmap for the project to move forward.
>
>
> Regards,
> Qian Zhang
>
>
> On Thu, Feb 18, 2021 at 9:16 PM Andrei Sekretenko 
> wrote:
>
>> IIUC, Attic is not intended for projects which still have active users
>> and thus might be in need of fixing bugs.
>>
>> Key items about moving project to Attic:
>> > It is not intended to:
>> > - Rebuild community
>> > - Make bugfixes
>> > - Make releases
>>
>> >Projects whose PMC are unable to muster 3 votes for a release, who have
>> no active committers or are unable to fulfill their reporting duties to the
>> board are all good candidates for the Attic.
>>
>> As a D2iQ employee, I can say that if we find a bug critical for our
>> customers, we will be interested in fixing that. Should the project be
>> moved into Attic, the fix will be present only in forks (which might
>> mean our internal forks).
>>
>> I could imagine that other entities and people using Mesos are in a
>> similar position with regards to bugfixes.
>> If this is true, then moving the project to Attic in the near future
>> is not a proper solution to the issue of insufficient bandwidth of the
>> active PMC members/chair.
>>
>> ---
>> A long-term future of the project is a different story, which, in my
>> personal view, will "end" either in moving the project into Attic or
>> in shifting the project direction from what it used to be in the
>> recent few years to something substantially different. IMO, this
>> requires a  _separate_ discussion.
>>
>> Damien's questions sound like a good starting point for that
>> discussion, I'll try to answer them from my committer/PMC member
>> perspective when I have enough time.
>>
>> On Thu, 18 Feb 2021 at 12:49, Charles-François Natali
>>  wrote:
>> >
>> > Thanks Tomek, that's what I suspected.
>> > It would therefore make it much more difficult for anyone to carry on
>> since it would effectively have to be a fork, etc.
>> > I think it'd be a bit of a shame, but I understand Benjamin's point.
>> > I hope it can be avoided.
>> >
>> >
>> > Cheers,
>> >
>> >
>> >
>> > On Thu, 18 Feb 2021, 11:02 Tomek Janiszewski, 
>> wrote:
>> >>
>> >> Moving to the Attic makes the project read-only
>> >> https://attic.apache.org/
>> >> https://attic.apache.org/projects/aurora.html
>> >>
>> >> On Thu, 18 Feb 2021, 11:56 Charles-François Natali <
>> cf.nat...@gmail.com> wrote:
>> >>>
>> >>> I'm not familiar with the attic, but would it still be possible to
>> >>> actually develop, make commits to the repository, etc.?
>> >>>
>> >>>
>> >>> On Thu, 18 Feb 2021, 08:27 Benjamin Bannier, 
>> wrote:
>> >>>
>> >>> > Hi Vinod,
>> >>> >
>> >>> > > I would like to start a discussion around the future of the Mesos
>> >>> > project.
>> >>> > >
>> >>> > > As you are probably aware, the number of active committers and
>> >>> > contributors
>> >>> > > to the project have declined significantly over time. As of today,
>> >>> > there's
>> >>> > > no active development of any features or a public release
>> planned. On the
>> >>> > > flip side, I do know there are a few companies who are still
>> actively
>> >>> > using
>> >>> > > Mesos.
>> >>> >
>> >>> > Thanks for starting this discussion Vinod. Looking at Slack, mailing
>> >>> > lists, JIRA and reviewboard/github the project has wound down a lot
>> in
>> >>> > 

Re: Next Steps

2021-02-18 Thread Charles-François Natali
Thanks Tomek, that's what I suspected.
It would therefore make it much more difficult for anyone to carry on since
it would effectively have to be a fork, etc.
I think it'd be a bit of a shame, but I understand Benjamin's point.
I hope it can be avoided.


Cheers,



On Thu, 18 Feb 2021, 11:02 Tomek Janiszewski,  wrote:

> Moving to the Attic makes the project read-only
> https://attic.apache.org/
> https://attic.apache.org/projects/aurora.html
>
> On Thu, 18 Feb 2021, 11:56 Charles-François Natali <
> cf.nat...@gmail.com> wrote:
>
>> I'm not familiar with the attic, but would it still be possible to
>> actually develop, make commits to the repository, etc.?
>>
>>
>> On Thu, 18 Feb 2021, 08:27 Benjamin Bannier,  wrote:
>>
>> > Hi Vinod,
>> >
>> > > I would like to start a discussion around the future of the Mesos
>> > project.
>> > >
>> > > As you are probably aware, the number of active committers and
>> > contributors
>> > > to the project have declined significantly over time. As of today,
>> > there's
>> > > no active development of any features or a public release planned. On
>> the
>> > > flip side, I do know there are a few companies who are still actively
>> > using
>> > > Mesos.
>> >
>> > Thanks for starting this discussion Vinod. Looking at Slack, mailing
>> > lists, JIRA and reviewboard/github the project has wound down a lot in
>> > the last 12+ months.
>> >
>> > > Given that, we need to assess if there's interest in the community to
>> > keep
>> > > this project moving forward. Specifically, we need some active
>> committers
>> > > and PMC members who are going to manage the project. Ideally, these
>> would
>> > > be people who are using Mesos in some capacity and can make code
>> > > contributions.
>> >
>> > While I have seen a few non-committer folks contribute patches in the
>> > last months, I feel it might be too late to bootstrap an active
>> > community at this point.
>> >
>> > Apache Mesos is still mentioned prominently in the docs of a number of
>> > other projects which gives off the impression of an active and
>> > maintained project. In reality almost nobody is working on issues or
>> > available to help users, and basing a new project on Apache Mesos these
>> > days is probably not a good idea. I honestly do not see that changing
>> > should new people step up and IMO the most honest way forward would be
>> > to move the project to the attic to clearly communicate that the project
>> > has moved into another phase; this wouldn't preclude folks from using or
>> > further developing Apache Mesos, but would give a clear signal to users.
>> >
>> > > If there is no active interest, we will likely need to figure out
>> steps
>> > for
>> > > retiring the project.
>> > >
>> > > *Call for action: If you are interested in becoming a committer/PMC
>> > member
>> > > (including PMC chair) and actively maintain the project, please reply
>> to
>> > > this email.*
>> >
>> > Like I wrote above, I would be in favor of a vote to move Apache Mesos
>> > to the attic.
>> >
>> >
>> > Cheers,
>> >
>> > Benjamin
>> >
>>
>


Re: Next Steps

2021-02-18 Thread Charles-François Natali
I'm not familiar with the attic, but would it still be possible to
actually develop, make commits to the repository, etc.?


On Thu, 18 Feb 2021, 08:27 Benjamin Bannier,  wrote:

> Hi Vinod,
>
> > I would like to start a discussion around the future of the Mesos
> project.
> >
> > As you are probably aware, the number of active committers and
> contributors
> > to the project have declined significantly over time. As of today,
> there's
> > no active development of any features or a public release planned. On the
> > flip side, I do know there are a few companies who are still actively
> using
> > Mesos.
>
> Thanks for starting this discussion Vinod. Looking at Slack, mailing
> lists, JIRA and reviewboard/github the project has wound down a lot in
> the last 12+ months.
>
> > Given that, we need to assess if there's interest in the community to
> keep
> > this project moving forward. Specifically, we need some active committers
> > and PMC members who are going to manage the project. Ideally, these would
> > be people who are using Mesos in some capacity and can make code
> > contributions.
>
> While I have seen a few non-committer folks contribute patches in the
> last months, I feel it might be too late to bootstrap an active
> community at this point.
>
> Apache Mesos is still mentioned prominently in the docs of a number of
> other projects which gives off the impression of an active and
> maintained project. In reality almost nobody is working on issues or
> available to help users, and basing a new project on Apache Mesos these
> days is probably not a good idea. I honestly do not see that changing
> should new people step up and IMO the most honest way forward would be
> to move the project to the attic to clearly communicate that the project
> has moved into another phase; this wouldn't preclude folks from using or
> further developing Apache Mesos, but would give a clear signal to users.
>
> > If there is no active interest, we will likely need to figure out steps
> for
> > retiring the project.
> >
> > *Call for action: If you are interested in becoming a committer/PMC
> member
> > (including PMC chair) and actively maintain the project, please reply to
> > this email.*
>
> Like I wrote above, I would be in favor of a vote to move Apache Mesos
> to the attic.
>
>
> Cheers,
>
> Benjamin
>


Re: Next Steps

2021-02-17 Thread Charles-François Natali
Hi,

We're using Mesos at work, and are very happy with it.
I'd be interested in becoming a committer. I could probably get some other
colleagues interested as well but from a diversification point of view it'd
probably be better if more individuals/organisations got involved.

Happy to discuss further,

Charles


Re: Subject: [RESULT][VOTE] Release Apache Mesos 1.10.0 (rc1)

2020-06-19 Thread Charles-François Natali
Hey,

The website doesn't seem to have been updated to point to the most recent
release.

Cheers,



On Thu, 28 May 2020, 17:11 Andrei Sekretenko,  wrote:

> Hi all,
>
> The vote for Mesos 1.10.0 (rc1) has passed with the
> following votes.
>
> +1 (Binding)
> --
> Vinod Kone
> Benjamin Mahler
> Qian Zhang
> Greg Mann
>
> There were no 0 or -1 votes.
>
> Please find the release at:
> https://dist.apache.org/repos/dist/release/mesos/1.10.0
>
> It is recommended to use a mirror to download the release:
> http://www.apache.org/dyn/closer.cgi
>
> The CHANGELOG for the release is available at:
>
> https://gitbox.apache.org/repos/asf?p=mesos.git;a=blob_plain;f=CHANGELOG;hb=1.10.0
>
> The mesos-1.10.0.jar has been released to:
> https://repository.apache.org
>
> The website (http://mesos.apache.org) will be updated shortly to reflect
> this release.
>
> Thanks,
> Andrei Sekretenko
>


Re: 1.10 release is nearing - please check `Target Version`s in JIRA

2020-05-09 Thread Charles-François Natali
Perfect, thanks!

Cheers,

Charles


On Fri, 8 May 2020, 17:56 Andrei Sekretenko, 
wrote:

> Hi,
> sorry, I probably should have kept things more transparent, here are the
> recent updates:
>  - 1.10.x was branched off yesterday.  All the work that lands on master
> from now on goes into 1.11 unless backported into 1.10. The preliminary
> 1.10 changelog can be seen here:
> https://github.com/apache/mesos/blob/1.10.x/CHANGELOG
>  - I'm wrapping up the final checks before tagging 1.10.0-rc1 release
> candidate and initiating a release vote (hope that nothing comes up in the
> remaining checks).
>
> On Thu, May 7, 2020 at 10:28 PM Charles-François Natali <
> cf.nat...@gmail.com>
> wrote:
>
> > Hey,
> >
> > Sorry if I missed something, but I didn't see any follow up on the
> release
> > - is there an ETA?
> >
> > Cheers,
> >
> >
> > On Fri, 20 Mar 2020, 22:57 Andrei Sekretenko, 
> > wrote:
> >
> > > Hi all,
> > > as some of you probably know, Mesos 1.10 release is near; expect that
> > next
> > > week the release candidate will be prepared and voted.
> > >
> > > At this point, please make sure that the work that needs to be landed
> > into
> > > 1.10 release but has not landed in master branch yet, is labelled by
> > > TargetVersion = 1.10.0 in the corresponding ASF JIRA ticket.
> > > ---
> > > There seem to be unresolved issues in JIRA that had recent activity,
> but
> > > are not labelled with TargetVersion = 1.10.0.
> > > It might be the case that some of them actually need to get to 1.10,
> but
> > > are effectively orphaned.
> > >
> > > Manually browsing through the whole project in search of these "1.10
> > > orphans" is not practical, so I attempted to filter two lists of issues
> > > that have an increased probability of containing the orphans:
> > > https://jira.apache.org/jira/issues/?filter=12348429
> > > https://jira.apache.org/jira/issues/?filter=12348426
> > >
> > > Most of these issues do not look like 1.10 blockers to me, but if you
> > find
> > > among them (or elsewhere!) some that absolutely need to block 1.10
> > release,
> > > please label them with a proper TargetVersion.
> > >
> > > Best,
> > > Andrei Sekretenko
> > >
> >
>
>
> --
> --
> Andrei Sekretenko
>


Re: 1.10 release is nearing - please check `Target Version`s in JIRA

2020-05-07 Thread Charles-François Natali
Hey,

Sorry if I missed something, but I didn't see any follow up on the release
- is there an ETA?

Cheers,


On Fri, 20 Mar 2020, 22:57 Andrei Sekretenko,  wrote:

> Hi all,
> as some of you probably know, Mesos 1.10 release is near; expect that next
> week the release candidate will be prepared and voted.
>
> At this point, please make sure that the work that needs to be landed into
> 1.10 release but has not landed in master branch yet, is labelled by
> TargetVersion = 1.10.0 in the corresponding ASF JIRA ticket.
> ---
> There seem to be unresolved issues in JIRA that had recent activity, but
> are not labelled with TargetVersion = 1.10.0.
> It might be the case that some of them actually need to get to 1.10, but
> are effectively orphaned.
>
> Manually browsing through the whole project in search of these "1.10
> orphans" is not practical, so I attempted to filter two lists of issues
> that have an increased probability of containing the orphans:
> https://jira.apache.org/jira/issues/?filter=12348429
> https://jira.apache.org/jira/issues/?filter=12348426
>
> Most of these issues do not look like 1.10 blockers to me, but if you find
> among them (or elsewhere!) some that absolutely need to block 1.10 release,
> please label them with a proper TargetVersion.
>
> Best,
> Andrei Sekretenko
>


Re: how is the agent available memory computed/updated?

2020-05-03 Thread Charles-François Natali
Thanks!

On Fri, 1 May 2020 at 00:11, Vinod Kone wrote:
>
> I commented on the JIRA.
>
> On Thu, Apr 30, 2020 at 3:02 PM Charles-François Natali 
> wrote:
>
> > Thanks Vinod.
> >
> > Yes, I understand that Mesos assumes it's the only process managing
> > resources, makes sense.
> > Looking at the code and testing shows the agent reports as available
> > memory the total memory of the host, minus 1GB (or half the total
> > memory if the total memory is below 2GB)
> > (
> > https://github.com/apache/mesos/blob/master/src/slave/containerizer/containerizer.cpp#L152
> > ).
> > So basically it means that it assumes that the OS doesn't use more
> > than 1GB. I guess if it's not the case one can just specify the memory
> > manually to the agent, so that's fine.
> >
> > Actually the reason I was wondering about this is because we recently
> > had a problem where containers couldn't be destroyed because of tasks
> > stuck in uninterruptible (D) state, which caused the memory to be
> > basically leaked, i.e. the agent was advertising the memory free while
> > it was still being used by the stuck processes. We ran into a similar
> > issue with GPUs - it's a known issue
> > https://issues.apache.org/jira/browse/MESOS-8038 - I posted an
> > analysis and potential fix, it'd be great if someone could have a look
> > :).
> >
> > Cheers,
> >
> > Charles
> >
> > On Thu, 30 Apr 2020 at 15:36, Vinod Kone wrote:
> > >
> > > Mesos assumes that it is the only process managing resources of a box
> > (cpu,
> > > mem, disk). So if you have out of band processes using up resources it
> > > won't be reflected in the resource offers and the box can be
> > overcommitted.
> > > There is no runtime periodic check of available resources, it's only
> > > calculated once at startup.
> > >
> > > Resource detection logic is here:
> > >
> > https://github.com/apache/mesos/blob/master/src/slave/containerizer/containerizer.cpp#L65
> > >
> > > On Thu, Apr 30, 2020 at 8:17 AM Charles-François Natali <
> > cf.nat...@gmail.com>
> > > wrote:
> > >
> > > > Hi,
> > > >
> > > > Could someone point me to some code/documentation explaining how the
> > > > agent available memory is computed, and when it is refreshed?
> > > >
> > > > For example, if I have an agent started, with some outstanding offers,
> > > > and I then start a process - not as a task managed by Mesos, but as an
> > > > external process which just allocates a lot of memory - and touches
> > > > it, not just committed - I can see the machine available memory go
> > > > down (as reported by free, and MemAvailable in /proc/meminfo), but the
> > > > agent doesn't rescind any offer, and never seems to actually refresh
> > > > it - even after starting/stopping tasks.
> > > >
> > > > Cheers,
> > > >
> > > > Charles
> > > >
> >


Re: how is the agent available memory computed/updated?

2020-04-30 Thread Charles-François Natali
Thanks Vinod.

Yes, I understand that Mesos assumes it's the only process managing
resources, makes sense.
Looking at the code and testing shows the agent reports as available
memory the total memory of the host, minus 1GB (or half the total
memory if the total memory is below 2GB)
(https://github.com/apache/mesos/blob/master/src/slave/containerizer/containerizer.cpp#L152).
So basically it means that it assumes that the OS doesn't use more
than 1GB. I guess if it's not the case one can just specify the memory
manually to the agent, so that's fine.
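
For illustration, the default rule described above can be sketched as
follows. This is only a Python rendering of the behaviour as I understand
it, not the actual Mesos code (which is C++); the function name
default_agent_mem_mb and the rounding on small hosts are assumptions.

```python
GB_IN_MB = 1024


def default_agent_mem_mb(total_mem_mb: int) -> int:
    """Memory (in MB) the agent would advertise by default.

    Hypothetical helper mirroring the rule described above:
    - hosts with at least 2GB keep 1GB back for the OS;
    - smaller hosts advertise half of their total memory.
    Integer division on small hosts is an assumption of this sketch.
    """
    if total_mem_mb < 0:
        raise ValueError("total memory cannot be negative")
    if total_mem_mb >= 2 * GB_IN_MB:
        return total_mem_mb - GB_IN_MB
    return total_mem_mb // 2
```

With this rule a 16GB host would advertise 15GB and a 1GB host 512MB,
regardless of what the OS or other processes actually use.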

Actually the reason I was wondering about this is because we recently
had a problem where containers couldn't be destroyed because of tasks
stuck in uninterruptible (D) state, which caused the memory to be
basically leaked, i.e. the agent was advertising the memory free while
it was still being used by the stuck processes. We ran into a similar
issue with GPUs - it's a known issue
https://issues.apache.org/jira/browse/MESOS-8038 - I posted an
analysis and potential fix, it'd be great if someone could have a look
:).

Cheers,

Charles

On Thu, 30 Apr 2020 at 15:36, Vinod Kone wrote:
>
> Mesos assumes that it is the only process managing resources of a box (cpu,
> mem, disk). So if you have out of band processes using up resources it
> won't be reflected in the resource offers and the box can be overcommitted.
> There is no runtime periodic check of available resources, it's only
> calculated once at startup.
>
> Resource detection logic is here:
> https://github.com/apache/mesos/blob/master/src/slave/containerizer/containerizer.cpp#L65
>
> On Thu, Apr 30, 2020 at 8:17 AM Charles-François Natali 
> wrote:
>
> > Hi,
> >
> > Could someone point me to some code/documentation explaining how the
> > agent available memory is computed, and when it is refreshed?
> >
> > For example, if I have an agent started, with some outstanding offers,
> > and I then start a process - not as a task managed by Mesos, but as an
> > external process which just allocates a lot of memory - and touches
> > it, not just committed - I can see the machine available memory go
> > down (as reported by free, and MemAvailable in /proc/meminfo), but the
> > agent doesn't rescind any offer, and never seems to actually refresh
> > it - even after starting/stopping tasks.
> >
> > Cheers,
> >
> > Charles
> >


how is the agent available memory computed/updated?

2020-04-30 Thread Charles-François Natali
Hi,

Could someone point me to some code/documentation explaining how the
agent available memory is computed, and when it is refreshed?

For example, if I have an agent started, with some outstanding offers,
and I then start a process - not as a task managed by Mesos, but as an
external process which just allocates a lot of memory - and touches
it, not just committed - I can see the machine available memory go
down (as reported by free, and MemAvailable in /proc/meminfo), but the
agent doesn't rescind any offer, and never seems to actually refresh
it - even after starting/stopping tasks.
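
In case it helps to reproduce the observation, here is a small sketch of
how the host-side number can be read. It is a plain Python helper, not
anything from Mesos; the function names parse_meminfo and
mem_available_kb are made up for this example.

```python
def parse_meminfo(text: str) -> dict:
    """Parse /proc/meminfo-style text into {field name: integer value}.

    Most fields are reported in kB; a few (e.g. HugePages_Total) are
    plain counts and carry no unit.
    """
    values = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "Name: value" line
        parts = rest.split()
        if parts and parts[0].isdigit():
            values[name.strip()] = int(parts[0])
    return values


def mem_available_kb(path: str = "/proc/meminfo"):
    """Return MemAvailable in kB, or None if the kernel does not report it."""
    with open(path) as f:
        return parse_meminfo(f.read()).get("MemAvailable")
```

Calling mem_available_kb() before and after starting the external
process shows the drop on the host, which can then be compared with the
memory the agent keeps offering.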

Cheers,

Charles


Re: Scheduler driver doesn't detect loss of connection to the master without zookeeper

2019-12-30 Thread Charles-François Natali
Perfect, thanks!



On Mon, 30 Dec 2019, 13:42 Vinod Kone,  wrote:

> In latest versions of mesos that is handled via heartbeats.
>
> Thanks,
> Vinod
>
> > On Dec 30, 2019, at 4:37 AM, Charles-François Natali <
> cf.nat...@gmail.com> wrote:
> >
> > Thanks.
> >
> > That's what I thought. The problem though is that it is probably possible
> > that the zookeeper detector doesn't detect the failure while the
> connection
> > to the master fails. One way this could happen would be for example
> because
> > of a firewall causing the TCP connection from the framework to the master
> > to fail, while the zookeeper connections (from master to zk and framework
> > to zk) still work. Unlikely but possible I think. Having the driver
> detect
> > and fail upon EOF/socket error would guard against that.
> >
> >
> >
> >
> >
> >> On Thu, 26 Dec 2019, 18:07 Vinod Kone,  wrote:
> >>
> >> IIRC, the standalone master detector (the detector that's used when
> using a
> >> local ip address of the master and not zk) doesn't re-detect when master
> >> process restarts. It's a limitation of that detector since it's mainly
> used
> >> for testing purposes and not recommended for production use. For
> >> production, please use zookeeper master detector (this detector is used
> >> when using zookeeper).
> >>
> >> On Fri, Dec 20, 2019 at 5:11 AM Charles-François Natali <
> >> cf.nat...@gmail.com>
> >> wrote:
> >>
> >>> Hi,
> >>>
> >>> It seems that the C++ scheduler driver doesn't detect loss of the
> >>> connection to the master when not using zookeeper.
> >>>
> >>> A simple way to reproduce this is to start a server passing it e.g.
> >>> "--ip=127.0.0.1", start the scheduler driver passing it "
> 127.0.0.1:5050
> >> ",
> >>> and then send a SIGKILL to the master. The scheduler logs the
> following:
> >>>
> >>>
> >>> I1220 10:56:11.679347 10635 process.cpp:2928] Resuming
> >>> __reaper__(1)@192.168.65.76:34345 at 2019-12-20
> >>> 10:56:11.679366144+00:00
> >>> I1220 10:56:11.679392 10635 clock.cpp:279] Created a timer for
> >>> __reaper__(1)@192.168.65.76:34345 in 100ms in the future (2019-12-20
> >>> 10:56:11.779389952+00:00)
> >>> I1220 10:56:11.690646 10631 process.cpp:2928] Resuming
> >>> scheduler-6a93a8e3-5a8f-4195-bde2-718b5832d317@192.168.65.76:34345 at
> >>> 2019-12-20 10:56:11.690665984+00:00
> >>> I1220 10:56:11.690775 10632 process.cpp:2928] Resuming
> >>> __http__(1)@192.168.65.76:34345 at 2019-12-20 10:56:11.690784000+00:00
> >>> I1220 10:56:11.690806 10632 process.cpp:3088] Cleaning up
> >>> __http__(1)@192.168.65.76:34345
> >>> I1220 10:56:11.690914 10632 process.cpp:2928] Resuming
> >>> help@192.168.65.76:34345 at 2019-12-20 10:56:11.690921984+00:00
> >>>
> >>> An strace confirms that the process receives EOF when reading from the
> >>> socket, but Scheduler::disconnected isn't called.
> >>> Is that expected?
> >>>
> >>> Or is it assumed that the scheduler relies on zookeeper for detection?
> >>>
> >>> Cheers,
> >>>
> >>> Charles
> >>>
> >>
>


Re: Scheduler driver doesn't detect loss of connection to the master without zookeeper

2019-12-30 Thread Charles-François Natali
Thanks.

That's what I thought. The problem though is that it is probably possible
that the zookeeper detector doesn't detect the failure while the connection
to the master fails. One way this could happen would be for example because
of a firewall causing the TCP connection from the framework to the master
to fail, while the zookeeper connections (from master to zk and framework
to zk) still work. Unlikely but possible I think. Having the driver detect
and fail upon EOF/socket error would guard against that.
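
To make the idea concrete, here is a minimal sketch of what I mean by
failing upon EOF/socket error. It is generic Python socket code, not the
libprocess or scheduler driver implementation; watch_connection and
on_disconnected are hypothetical names used only for this illustration.

```python
import socket


def watch_connection(sock, on_disconnected):
    """Read from sock until the peer goes away, then call on_disconnected.

    An empty read means the peer closed the connection (EOF); an OSError
    covers connection resets and similar socket errors.
    """
    while True:
        try:
            data = sock.recv(4096)
        except OSError as exc:
            on_disconnected(f"socket error: {exc}")
            return
        if not data:
            on_disconnected("EOF")
            return
        # A real driver would hand `data` to its message decoder here.


if __name__ == "__main__":
    # Simulate the master going away: close its end of a connected pair.
    master_side, scheduler_side = socket.socketpair()
    master_side.close()
    events = []
    watch_connection(scheduler_side, events.append)
    scheduler_side.close()
    print(events)
```

The point is that the callback fires from the connection itself,
independently of whatever the zookeeper detector reports.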





On Thu, 26 Dec 2019, 18:07 Vinod Kone,  wrote:

> IIRC, the standalone master detector (the detector that's used when using a
> local ip address of the master and not zk) doesn't re-detect when master
> process restarts. It's a limitation of that detector since it's mainly used
> for testing purposes and not recommended for production use. For
> production, please use zookeeper master detector (this detector is used
> when using zookeeper).
>
> On Fri, Dec 20, 2019 at 5:11 AM Charles-François Natali <
> cf.nat...@gmail.com>
> wrote:
>
> > Hi,
> >
> > It seems that the C++ scheduler driver doesn't detect loss of the
> > connection to the master when not using zookeeper.
> >
> > A simple way to reproduce this is to start a server passing it e.g.
> > "--ip=127.0.0.1", start the scheduler driver passing it "127.0.0.1:5050
> ",
> > and then send a SIGKILL to the master. The scheduler logs the following:
> >
> >
> > I1220 10:56:11.679347 10635 process.cpp:2928] Resuming
> > __reaper__(1)@192.168.65.76:34345 at 2019-12-20
> > 10:56:11.679366144+00:00
> > I1220 10:56:11.679392 10635 clock.cpp:279] Created a timer for
> > __reaper__(1)@192.168.65.76:34345 in 100ms in the future (2019-12-20
> > 10:56:11.779389952+00:00)
> > I1220 10:56:11.690646 10631 process.cpp:2928] Resuming
> > scheduler-6a93a8e3-5a8f-4195-bde2-718b5832d317@192.168.65.76:34345 at
> > 2019-12-20 10:56:11.690665984+00:00
> > I1220 10:56:11.690775 10632 process.cpp:2928] Resuming
> > __http__(1)@192.168.65.76:34345 at 2019-12-20 10:56:11.690784000+00:00
> > I1220 10:56:11.690806 10632 process.cpp:3088] Cleaning up
> > __http__(1)@192.168.65.76:34345
> > I1220 10:56:11.690914 10632 process.cpp:2928] Resuming
> > help@192.168.65.76:34345 at 2019-12-20 10:56:11.690921984+00:00
> >
> > An strace confirms that the process receives EOF when reading from the
> > socket, but Scheduler::disconnected isn't called.
> > Is that expected?
> >
> > Or is it assumed that the scheduler relies on zookeeper for detection?
> >
> > Cheers,
> >
> > Charles
> >
>


Scheduler driver doesn't detect loss of connection to the master without zookeeper

2019-12-20 Thread Charles-François Natali
Hi,

It seems that the C++ scheduler driver doesn't detect loss of the
connection to the master when not using zookeeper.

A simple way to reproduce this is to start a server passing it e.g.
"--ip=127.0.0.1", start the scheduler driver passing it "127.0.0.1:5050",
and then send a SIGKILL to the master. The scheduler logs the following:


I1220 10:56:11.679347 10635 process.cpp:2928] Resuming
__reaper__(1)@192.168.65.76:34345 at 2019-12-20
10:56:11.679366144+00:00
I1220 10:56:11.679392 10635 clock.cpp:279] Created a timer for
__reaper__(1)@192.168.65.76:34345 in 100ms in the future (2019-12-20
10:56:11.779389952+00:00)
I1220 10:56:11.690646 10631 process.cpp:2928] Resuming
scheduler-6a93a8e3-5a8f-4195-bde2-718b5832d317@192.168.65.76:34345 at
2019-12-20 10:56:11.690665984+00:00
I1220 10:56:11.690775 10632 process.cpp:2928] Resuming
__http__(1)@192.168.65.76:34345 at 2019-12-20 10:56:11.690784000+00:00
I1220 10:56:11.690806 10632 process.cpp:3088] Cleaning up
__http__(1)@192.168.65.76:34345
I1220 10:56:11.690914 10632 process.cpp:2928] Resuming
help@192.168.65.76:34345 at 2019-12-20 10:56:11.690921984+00:00

An strace confirms that the process receives EOF when reading from the
socket, but Scheduler::disconnected isn't called.
Is that expected?

Or is it assumed that the scheduler relies on zookeeper for detection?

Cheers,

Charles


[MESOS-10007] random "Failed to get exit status for Command" for short-lived commands

2019-10-19 Thread Charles-François Natali
Hi,

I'm wondering if there's anything I could do to help
https://issues.apache.org/jira/browse/MESOS-10007 move forward?

Basically it's a race condition in libprocess/command executor causing
spurious errors to be reported for short-lived tasks.
I've got a detailed code path of the race and a repro, however I'm not
sure what the best way to fix it is - any suggestions?

Cheers,

Charles