New PMC Chair

2021-04-29 Thread Vinod Kone
Hi community,

Just wanted to let you all know that the board passed the resolution to
elect a new PMC chair!

Hearty congratulations to *Qian Zhang* for becoming the new Apache Mesos
PMC chair and VP of the project.

Thanks,


Re: [VOTE] Move Apache Mesos to Attic

2021-04-14 Thread Vinod Kone
To circle back here, the vote to install a new PMC chair has passed and the
request for the same is sent to the ASF Board. It will become official once
the board meets and passes the resolution. I am expecting this to happen
around the end of the month.

On Fri, Apr 9, 2021 at 9:33 AM Andreas Peters wrote:

> Thanks Vinod. :-)
>
>
>
>
> On 09.04.21 at 01:55, Vinod Kone wrote:
> > Hi folks,
> >
> > Thanks for further responses and strong interest from a few folks in
> > keeping the project going!
> >
> > I also had a chance to talk to some ASF members and it sounds like the
> > ASF preferred option forward in this situation is to keep the project
> > going in ASF.
> >
> > Given all the above, I'm cancelling this vote thread.
> >
> > I will start another thread to elect a new PMC chair and let them handle
> > adding new PMC members / committers.
> >
> > Thanks,
> >
> >
> > On Thu, Apr 8, 2021 at 4:30 PM Samuel Marks wrote:
> >
> >> In reference to new responses:
> >> 0. If you need a 501c3 equivalent to hold the copyright, I run a charity
> >> concentrating on facilitating large-scale screening programmes and
> >> intend to use Mesos to handle multiple clusters on the same set of
> >> servers (https://sydneyscientific.org), and would be happy to be that
> >> placeholder and potentially fund further R&D
> >> 1. On the moving CI side I remember years ago adding an issue about OOM
> >> for building it on a small AWS instance and it being closed as wontfix.
> >> Hopefully time is spent on refactoring and modularity to get over this,
> >> making moving to anyone's CI viable
> >> 2. The Apache brand alone makes Mesos more attractive; so would hate to
> >> lose it
> >>
> >> Just my 2¢
> >>
> >> Samuel Marks
> >> Charity | consultancy | open-source | LinkedIn
> >>
> >> On Fri, 9 Apr 2021, 5:02 am Andreas Peters wrote:
> >>
> >>>> Hopefully the other people who said they
> >>>> were interested, like the guys from Criteo, Andreas, Javi, etc would
> >>>> help too.
> >>>
> >>> I will and I'm sure the others too. I already try to help with the
> >>> tickets in jira. Thanks to Benjamin Mahler who told me that it's ok if
> >>> I do it.
> >>>
> >>> Andreas
> >>>
> >>>
> >>
> >
>
>


Re: [VOTE] Move Apache Mesos to Attic

2021-04-08 Thread Vinod Kone
Hi folks,

Thanks for further responses and strong interest from a few folks in
keeping the project going!

I also had a chance to talk to some ASF members and it sounds like the ASF
preferred option forward in this situation is to keep the project going in
ASF.

Given all the above, I'm cancelling this vote thread.

I will start another thread to elect a new PMC chair and let them handle
adding new PMC members / committers.

Thanks,


On Thu, Apr 8, 2021 at 4:30 PM Samuel Marks wrote:

> In reference to new responses:
> 0. If you need a 501c3 equivalent to hold the copyright, I run a charity
> concentrating on facilitating large-scale screening programmes and intend
> to use Mesos to handle multiple clusters on the same set of servers (
> https://sydneyscientific.org), and would be happy to be that placeholder
> and potentially fund further R&D
> 1. On the moving CI side I remember years ago adding an issue about OOM for
> building it on a small AWS instance and it being closed as wontfix.
> Hopefully time is spent on refactoring and modularity to get over this,
> making moving to anyone's CI viable
> 2. The Apache brand alone makes Mesos more attractive; so would hate to
> lose it
>
> Just my 2¢
>
> Samuel Marks
> Charity | consultancy | open-source | LinkedIn
>
> On Fri, 9 Apr 2021, 5:02 am Andreas Peters wrote:
>
> > > Hopefully the other people who said they
> > > were interested, like the guys from Criteo, Andreas, Javi, etc would
> > > help too.
> >
> > I will and I'm sure the others too. I already try to help with the
> > tickets in jira. Thanks to Benjamin Mahler who told me that it's ok if I
> > do it.
> >
> > Andreas
> >
> >
>


Re: [VOTE] Move Apache Mesos to Attic

2021-04-06 Thread Vinod Kone
Hi Rich,

Thanks for chiming in and providing your perspective.

Charles already did a great job summarizing some of the current sentiments
in the community above. I wanted to add a couple more points based on my
discussions with folks in the community and PMC.

Yes, there are some folks who are still interested in making some (minor)
contributions but for that they just need a single repo to collaborate on.
ASF has been a great home and steward for the Mesos project, but at this
stage in its lifecycle, Mesos project could actually benefit from an ultra
lightweight process and collaboration model. A public GitHub repo with
requisite permissions for collaborators would serve these purposes well
compared to the ASF process (PMC, voting, board reports etc).

As an aside, would the ASF Board have any issue with the community forking
the project and collaborating at https://github.com/mesos/mesos ?

Thanks,
Vinod

On Tue, Apr 6, 2021 at 9:01 PM Samuel Marks wrote:

> Who runs this one? https://github.com/mesos
>
> Samuel Marks
> Charity | consultancy | open-source | LinkedIn
>
>
> On Wed, Apr 7, 2021 at 11:42 AM Charles-François Natali
> <cf.nat...@gmail.com> wrote:
>
> > Hi Rich,
> >
> > FWIW, I'm one of those people who said they were interested, and I
> > still voted to move it to the attic (even though my vote is non
> > binding as I'm not a committer).
> >
> > Initially I also thought that we could try to revive it within the
> > ASF, but it quickly became clear that *none* of the current committers
> > is willing to go down that route, i.e. put in the effort needed to
> > onboard new committers. And without that, there's just no way forward.
> > Various people voiced other concerns as well, such as viability of the
> > project when other alternatives like Kubernetes exist, lack of clear
> > technical direction for the future, etc.
> > While they're relevant questions, I think currently they don't really
> > make sense since the current Mesos community is basically dead.
> > Finally, I think that the project should be moved to the Attic de
> > facto because AFAICT the Apache rules require at least 3 *active*
> > committers, and that's definitely not the case.
> >
> > However I still do believe in the project for the reasons I outlined
> > in some of the previous threads, and I'm still interested in
> > contributing: I just think that the current structure of the project
> > is not suited for that anymore. And to be honest, I just want to move
> > on, I'm tired of those endless discussions - it's been almost 2 months
> > since the first thread started, and nothing happened.
> >
> > It's a shame that we won't be able to continue using
> > https://github.com/apache/mesos though, as it creates a much higher
> > barrier to continuing the project.
> >
> > However if that's really not possible, then I guess that leaves no
> > other option: once the vote has passed, I guess I'll start a final
> > thread to gather people who'd be interested to create a new project
> > forked off master on github, so we can start from scratch with our own
> > repository, bug tracker etc. I hope those people who said they're
> > actually interested will be willing to take an active part.
> >
> > Cheers,
> >
> > Charles
> >
> >
> >
> >
> >
> > On Wed, 7 Apr 2021 at 02:50, Rich Bowen wrote:
> > >
> > > I hope y'all can forgive me for sticking my nose in, as a concerned
> > > member. Color me confused by this vote.
> > >
> > > A month ago, on this same list -
> > > https://lists.apache.org/thread.html/r307db648e201182fcf39b0de63ba224b94965501e20e6cbcecc085e4%40%3Cdev.mesos.apache.org%3E
> > > - Qian asked who was still interested in keeping the project going. SIX
> > > people responded that, given the chance, they'd step up and keep it
> > > going.
> > >
> > > Around that same time -
> > > https://lists.apache.org/thread.html/raed89cc5ab78531c48f56aa1989e1e7eb05f89a6941e38e9bc8803ff%40%3Cdev.mesos.apache.org%3E
> > > - Vinod observed that the too-high barrier to granting committer rights
> > > has been a major factor in the slowdown of the project.
> > >
> > > And yet, y'all are voting to attic the project.
> > >
> > > So, again, it's not my project, and I don't have a vote here, but the
> > > reason the Board asks projects to have these attic conversations on the
> > > Dev list is *specifically* so that interested people can say, hey,
> > > don't attic it, we'll take it from here. Which six people, plus Qian,
> > > have done.
> > >
> > > Maybe it's time to lower the barrier to entry, and let these willing
> > > people take the project forward. The Board can work out the picky
> > > little details of re-forming the PMC, if that's a difficulty.
> > >
> > >
> >
>


[VOTE] Move Apache Mesos to Attic

2021-04-05 Thread Vinod Kone
Hi folks,

Based on the recent conversations on our mailing list, it seems to me that
the majority consensus among the existing PMC is to move the project to the
attic and let the interested community members collaborate on a fork in
GitHub.

I would like to call a vote to dissolve the PMC and move the project to the
attic.

Please reply to this thread with your vote. Only binding votes from
PMC/committers count towards the final tally but everyone in the community
is encouraged to vote. See process here.

Thanks,


Re: Next Steps

2021-03-15 Thread Vinod Kone
>
>
> How many man hours were spent on mesos in 2020, 2019 and 2018?
>
>
Roughly 5-6 ppl (in 2020), 10-11 (in 2019), 16-18 (in 2018)


Re: Next Steps

2021-03-15 Thread Vinod Kone
Hi folks,

Sorry for the radio silence on my part for the last couple weeks. My Apache
emails were not getting delivered to my inbox due to some filter mixup on
my end. Sorry about that.

I've read through the various threads and here's how I summarize the
situation. We basically have 2 camps:

*Attic:*
Most existing PMC members who have chimed in so far seemed to be in favor
of moving the project to Attic. The exception is Qian (who is willing to
step up to be the new PMC chair, thanks Qian!). The main argument for this
seems to be that it'll be hard to re-activate the project at this juncture
with new PMC members / committers. Also that it signals the current state
of the project more accurately.

*Re-activate:*
There are some active users in the community who would like to see this
project stay alive and are even willing to step up to become committers /
contributors. Some of these users are working for companies who are using
Mesos in production. They would like to know potential new roadmap (there
is a separate thread going on for this) and manpower needed (my take is 6-8
ppl to cover different areas of the project).

*My take:*

In addition to the public threads, we've had a thread on our private
mailing list to see which of the current committers are interested in being
active. And so far that thread has gotten *0* responses. This is
unfortunate because, except for Qian, no existing committer/PMC member is
willing or able to contribute or mentor new contributors.

Additionally, the current guidelines we have for adding new committers
set a pretty high bar and I don't think any of the
current contributors would be immediately eligible to be voted in as
committers. This means we either need to change the guidelines or we should
have some existing committers mentor some of the contributors into
committers. Given the lack of commitment from most of the existing PMC,
this will fall solely on Qian's shoulders which is quite a burden.

Since the existing committers are unable or unwilling to mentor new
contributors into new committers, I think moving the project to attic is
the right move. If there is no objection to this, I'm happy to call a vote
for this.

We could still explore the possibility of activating
"https://github.com/mesos/mesos" as the one true fork outside of ASF so that
the interested parties can still contribute and collaborate. And if the
project continues to thrive here, we can reach back out to ASF to
re-activate the project, down the line.

Thanks,


On Sat, Feb 27, 2021 at 7:45 AM Damien GERARD wrote:

> On 2021-02-26 09:05 PM, Charles-François Natali wrote:
> > As mentioned before I'd also be happy to contribute.
> >
> > Concretely, what's the next step to move this forward?
> >
> > On Fri, 26 Feb 2021, 11:15 Thomas Langé wrote:
> >
> >> Hello,
> >>
> >> I'm part of Criteo team as well, and as Grégoire said, we plan to
> >> support Mesos internally for some time. I would like to
> >> propose my help as well as a committer, and contribute as much as I
> >> can to this project.
>
> At Rakuten we also have a couple of clusters. As also mentioned before,
> happy to contribute.
> But yeah, need a plan of action :p
>
>
> >>
> >> Br,
> >>
> >> Thomas
> >>
> >> -
> >>
> >> From: Grégoire Seux
> >> Sent: Friday, 26 February 2021 11:12
> >> To: priv...@mesos.apache.org; dev; user
> >> Subject: Re: Next Steps
> >>
> >> Hello all,
> >>
> >> here at Criteo, we heavily use Mesos and plan to do so for a
> >> foreseeable future alongside other alternatives.
> >> I am ok to become committer and help the project if you are looking
> >> for contributors.
> >> It seems finding committers will be doable but finding a PMC chair
> >> will be difficult.
> >>
> >> To give some context on our usage, Criteo is running 12 Mesos
> >> clusters running a light fork of Mesos 1.9.x.
> >> Each cluster has 10+ distinct Marathon frameworks, a Flink
> >> framework, an instance of Aurora and an in-house framework.
> >> We strongly appreciate the ability to scale the number of nodes
> >> (3500 on the largest cluster and growing), the simplicity of the
> >> project overall and the extensibility through modules.
> >>
> >> --
> >>
> >> Grégoire
>
> --
> Damien GERARD
>


Re: Next Steps

2021-02-18 Thread Vinod Kone
Good to see some interest in helping with project maintenance. 

Qian can you start a new email about figuring out the roadmap for the project?

Thanks,
Vinod

> On Feb 18, 2021, at 11:18 AM, Charles-François Natali wrote:
> 
> Speaking as someone who contributed a few patches and would like to get
> more involved, I find it a bit difficult to get MRs reviewed and merged.
> I think it's probably because the current committers have other priorities
> now that D2iQ focus has shifted, which is understandable but makes it
> harder for outsiders to contribute.
> Is there anything which could be done about that?
> 
> Cheers,
> 
> 
> 
>> On Thu, 18 Feb 2021, 14:30 Qian Zhang wrote:
>> 
>> Hi Vinod,
>> 
>> I am still interested in the project. As other folks said, we need to have
>> a direction for the project. I think there are still a lot of Mesos
>> users/customers in the mail list, can you please send another mail to
>> collect their requirements / pain points on Mesos, and then we can try to
>> set up a roadmap for the project to move forward.
>> 
>> 
>> Regards,
>> Qian Zhang
>> 
>> 
>> On Thu, Feb 18, 2021 at 9:16 PM Andrei Sekretenko wrote:
>> 
>>> IIUC, Attic is not intended for projects which still have active users
>>> and thus might be in need of fixing bugs.
>>> 
>>> Key items about moving project to Attic:
>>>> It is not intended to:
>>>> - Rebuild community
>>>> - Make bugfixes
>>>> - Make releases
>>>
>>>> Projects whose PMC are unable to muster 3 votes for a release, who have
>>>> no active committers or are unable to fulfill their reporting duties to
>>>> the board are all good candidates for the Attic.
>>> 
>>> As a D2iQ employee, I can say that if we find a bug critical for our
>>> customers, we will be interested in fixing that. Should the project be
>>> moved into Attic, the fix will be present only in forks (which might
>>> mean our internal forks).
>>> 
>>> I could imagine that other entities and people using Mesos are in a
>>> similar position with regards to bugfixes.
>>> If this is true, then moving the project to Attic in the near future
>>> is not a proper solution to the issue of insufficient bandwidth of the
>>> active PMC members/chair.
>>> 
>>> ---
>>> A long-term future of the project is a different story, which, in my
>>> personal view, will "end" either in moving the project into Attic or
>>> in shifting the project direction from what it used to be in the
>>> recent few years to something substantially different. IMO, this
>>> requires a  _separate_ discussion.
>>> 
>>> Damien's questions sound like a good starting point for that
>>> discussion, I'll try to answer them from my committer/PMC member
>>> perspective when I have enough time.
>>> 
>>> On Thu, 18 Feb 2021 at 12:49, Charles-François Natali wrote:
>>>>
>>>> Thanks Tomek, that's what I suspected.
>>>> It would therefore make it much more difficult for anyone to carry on
>>>> since it would effectively have to be a fork, etc.
>>>> I think it'd be a bit of a shame, but I understand Benjamin's point.
>>>> I hope it can be avoided.
>>>>
>>>> Cheers,
>>>>
>>>> On Thu, 18 Feb 2021, 11:02 Tomek Janiszewski wrote:
>>>>>
>>>>> Moving to attic is making project read only
>>>>> https://attic.apache.org/
>>>>> https://attic.apache.org/projects/aurora.html
>>>>>
>>>>> On Thu, 18 Feb 2021, 11:56 Charles-François Natali
>>>>> <cf.nat...@gmail.com> wrote:
>>>>>>
>>>>>> I'm not familiar with the attic but would it still allow to actually
>>>>>> develop, make commits to the repository etc?
>>>>>>
>>>>>> On Thu, 18 Feb 2021, 08:27 Benjamin Bannier wrote:
>>>>>>
>>>>>>> Hi Vinod,
>>>>>>>
>>>>>>>> I would like to start a discussion around the future of the Mesos
>>>>>>>> project.
>>>>>>>>
>>>>>>>> As you are probably aware, the number of active committers and
>>>>>>>> contributors to the project have declined significantly over time.
>>>>>>>> As of today, there's no active development of any features or a
>>>>>>>> public release planned. On the flip side, I do know there are a few
>>>>>>>> companies who are still actively using Mesos.
>>>>>>>
>>>>>>> Thanks for starting this discussion Vinod. Looking at Slack, mailing
>>>>>>> lists, JIRA and reviewboard/github the project has wound down a lot
>>>>>>> in the last 12+ months.
>>>>>>>
>>>>>>>> Given that, we need to assess if there's interest in the community
>>>>>>>> to keep this project moving forward. Specifically, we need some
>>>>>>>> active committers and PMC members who are going to manage the
>>>>>>>> project. Ideally, these would be people who are using Mesos in some
>>>>>>>> capacity and can make code contributions.
>>>>>>>
>>>>>>> While I have seen a few non-committer folks contribute patches in
>>>>>>> the last months, I feel it might be too late to bootstrap an active
>>>>>>> community at this point.
>>>>>>>
>>>>>>> Apache Mesos is still mentioned

Next Steps

2021-02-17 Thread Vinod Kone
Hi folks,

I would like to start a discussion around the future of the Mesos project.

As you are probably aware, the number of active committers and contributors
to the project have declined significantly over time. As of today, there's
no active development of any features or a public release planned. On the
flip side, I do know there are a few companies who are still actively using
Mesos.

Given that, we need to assess if there's interest in the community to keep
this project moving forward. Specifically, we need some active committers
and PMC members who are going to manage the project. Ideally, these would
be people who are using Mesos in some capacity and can make code
contributions.

If there is no active interest, we will likely need to figure out steps for
retiring the project.

*Call for action: If you are interested in becoming a committer/PMC member
(including PMC chair) and actively maintain the project, please reply to
this email.*

I personally don't foresee myself being very active in the Mesos project
going forward, so I'm planning to step down from my chair role as soon as
we find a replacement.

Thanks,
Vinod


Re: Subject: [VOTE] Release Apache Mesos 1.11.0 (rc1)

2020-11-17 Thread Vinod Kone
+1 (binding)

We have this version running in an internal cluster without any issues.

On Tue, Nov 17, 2020 at 8:53 AM Andrei Sekretenko wrote:

> Hi all,
>
> Please vote on releasing the following candidate as Apache Mesos 1.11.0.
>
> 1.11.0 includes the following:
>
>
>   * CSI external volumes support: now, Mesos Containerizer supports using
>     pre-provisioned external CSI storage volumes by means of the new
>     `volume/csi` isolator. Also, the latter significantly extends the range
>     of compatible 3rd party CSI plugins compared to the previous SLRP-based
>     solution (MESOS-10141).
>
>   * Constraints-based offer filtering: the Scheduler API adds an interface
>     allowing frameworks to put constraints on agent attributes in resource
>     offers to help "picky" frameworks significantly reduce scheduling
>     latency when close to being out of quota (MESOS-10161).
>
>   * CMake build becomes usable for deploying in production (MESOS-898).
>
> The CHANGELOG for the release is available at:
>
> https://gitbox.apache.org/repos/asf?p=mesos.git;a=blob_plain;f=CHANGELOG;hb=1.11.0-rc1
>
> 
>
> The candidate for Mesos 1.11.0 release is available at:
> https://dist.apache.org/repos/dist/dev/mesos/1.11.0-rc1/mesos-1.11.0.tar.gz
>
> The tag to be voted on is 1.11.0-rc1:
> https://gitbox.apache.org/repos/asf?p=mesos.git;a=commit;h=1.11.0-rc1
>
> The SHA512 checksum of the tarball can be found at:
>
> https://dist.apache.org/repos/dist/dev/mesos/1.11.0-rc1/mesos-1.11.0.tar.gz.sha512
>
> The signature of the tarball can be found at:
>
> https://dist.apache.org/repos/dist/dev/mesos/1.11.0-rc1/mesos-1.11.0.tar.gz.asc
>
> The PGP key used to sign the release is here:
> https://dist.apache.org/repos/dist/release/mesos/KEYS
>
> The JAR is in a staging repository here:
> https://repository.apache.org/content/repositories/orgapachemesos-1260
>
> Please vote on releasing this package as Apache Mesos 1.11.0!
>
> The vote is open until 2020 Nov 20th 15:00 UTC at least, and passes if
> a majority of at least 3 +1 PMC votes are cast.
>
> [ ] +1 Release this package as Apache Mesos 1.11.0
> [ ] -1 Do not release this package because ...
>
> Thanks,
> Andrei Sekretenko
>

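The passing rule quoted above (at least 3 +1 PMC votes, and a majority) can be sketched as a small tally function. This is only an illustration: the function name, the (value, binding) encoding of a vote, and the reading of "majority" as more binding +1 than binding -1 votes are assumptions made here, not part of any ASF or Mesos tooling.

```python
def release_vote_passes(votes):
    """Return True if a release vote passes under the quoted rule.

    `votes` is an iterable of (value, binding) pairs, where value is
    +1, 0 or -1 and binding is True for PMC votes. Assumption: only
    binding votes count, and "majority" means more +1 than -1.
    """
    plus = sum(1 for value, binding in votes if binding and value == 1)
    minus = sum(1 for value, binding in votes if binding and value == -1)
    return plus >= 3 and plus > minus
```

Non-binding votes are accepted in the input but ignored by the tally, which mirrors how community votes are encouraged but do not decide the outcome.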

Re: [AREA1 SUSPICIOUS] [OFFER] Remove ZooKeeper as hard-dependency, support etcd, Consul, OR ZooKeeper

2020-07-06 Thread Vinod Kone
on.
> >> >> > >   - The semantics of the “erase” operation: ZooKeeper fails with
> >> >> > >     ZNOTEMPTY if node has children, while liboffkv removes the
> >> >> > >     subtree recursively. As neither of users ever attempts to
> >> >> > >     remove node with children, we propose to change the interface
> >> >> > >     so that it declares (and actually implements) the
> >> >> > >     liboffkv-compatible semantics.
> >> >> > >   - Return of ZooKeeper-specific Stat structures instead of just
> >> >> > >     versions. As both users only use the version field of this
> >> >> > >     structure, we propose to simply alter the interface so that
> >> >> > >     only the version is returned.
> >> >> > >   - Explicit “session drop” operation that also immediately erases
> >> >> > >     all the “leased” nodes. We propose to implement this in
> >> >> > >     liboffkv.
> >> >> > >   - Check if the node being created has leased parent. Currently,
> >> >> > >     liboffkv declares this to be unspecified behavior: it may
> >> >> > >     either throw (if ZooKeeper is used as the back-end) or
> >> >> > >     successfully create the node (otherwise). As neither of users
> >> >> > >     ever attempts to create such a node, we propose to leave this
> >> >> > >     as is.
> >> >> > >
> >> >> > > Estimates
> >> >> > > We estimate that, including tests, this will be ready by the end
> >> >> > > of next month.
> >> >> > > --
> >> >> > >
> >> >> > > Open to alternative suggestions, otherwise we'll begin.
> >> >> > > Samuel Marks
> >> >> > > Charity <https://sydneyscientific.org> | consultancy <
> >> >> > https://offscale.io>
> >> >> > > | open-source <https://github.com/offscale> | LinkedIn
> >> >> > > <https://linkedin.com/in/samuelmarks>
> >> >> > >
> >> >> > >
> >> >> > > On Sat, May 2, 2020 at 4:04 AM Benjamin Mahler
> >> >> > > <bmah...@apache.org> wrote:
> >> >> > >
> >> >> > > > So it sounds like:
> >> >> > > >
> >> >> > > > Zookeeper: Official C library has an async API. Are we gaining a
> >> >> > > > lot with the third party C++ wrapper you pointed to? Maybe it
> >> >> > > > "just works", but it looks very inactive and it's hard to tell
> >> >> > > > how maintained it is.
> >> >> > > >
> >> >> > > > Consul: No official C or C++ library. Only some third party C++
> >> >> > > > ones that look pretty inactive. The ppconsul one you linked to
> >> >> > > > does have an issue about an async API, I commented on it:
> >> >> > > > https://github.com/oliora/ppconsul/issues/26.
> >> >> > > >
> >> >> > > > etcd: Can use gRPC c++ client async API.
> >> >> > > >
> >> >> > > > Since 2 of 3 provide an async API already, I would lean more
> >> >> > > > towards an async API so that we don't have to change anything
> >> >> > > > with the mesos code when the last one gets an async
> >> >> > > > implementation. However, we currently use the synchronous ZK API

Re: Subject: [VOTE] Release Apache Mesos 1.10.0 (rc1)

2020-05-26 Thread Vinod Kone
+1 (binding)

Thanks for looking into it. Let's fix this in a point release.

On Tue, May 26, 2020 at 8:31 AM Andrei Sekretenko wrote:

> Thanks for checking this!
>
> The first one (centos, non-SSL, gcc, autotools) seems to be a race between
> several instances of `javah` attempting to check for existence and create
> the output directory.
> I believe there were no related changes in 1.10.x compared to 1.9.x.
>
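The javah failure described above is a check-then-create race: two processes both see that the output directory is missing, and the slower one then fails when it tries to create it. A minimal Python sketch of the pattern and of one race-free alternative follows; the helper names are hypothetical, and this is not the workaround that was applied to the Mesos build.

```python
import os
import tempfile
import threading


def racy_ensure_dir(path):
    # Check-then-create: another process can create `path` between the
    # existence check and the mkdir, making os.mkdir raise FileExistsError.
    if not os.path.exists(path):
        os.mkdir(path)


def safe_ensure_dir(path):
    # Let the OS arbitrate: an already existing directory is not an error.
    os.makedirs(path, exist_ok=True)


def create_concurrently(path, workers=8):
    # Run several creators at once and collect any errors they hit.
    errors = []

    def worker():
        try:
            safe_ensure_dir(path)
        except OSError as exc:
            errors.append(exc)

    threads = [threading.Thread(target=worker) for _ in range(workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return errors
```

The same idea applies to a Makefile rule: create the directory unconditionally with a tolerant command instead of testing for it first.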
> The second one (ubuntu, SSL, clang, autotools) is somewhat tricky.
> The immediate cause of the failure seems to be an attempt to compile
> src/tests/http_tests.proto with a not yet built protoc.
> src/tests/http_tests.proto has been added in 1.10; there were no
> tests-only protobuf definitions in Mesos before that.
> However, I'm not quite getting how protobuf compilation in the automake
> build is supposed to work at all with a bundled protoc.
>
> When the bundled protobuf is used, I don't see any dependency on protoc
> injected into the pb.cc/pb.h targets in the generated Makefile.
> Neither do I see how src/Makefile.am is supposed to introduce this
> dependency.
> (See
> https://github.com/apache/mesos/blob/5a04a1693e4f1d51007c23728f1884a307e229a1/src/Makefile.am#L499
> and below).
> Looks like all other protobufs (usually?) compile due to sheer luck.
>
> The workaround for the javah race and the fix for missing dependency on
> protoc seem to be rather straightforward.
> If any of these two should be considered a blocker for 1.10.0, please vote
> -1.
>
>
>
>
>
> On Tue, May 19, 2020 at 6:55 PM Vinod Kone wrote:
>
>> Ran it in Apache CI. Found 2 build issues (issue 1
>> <https://builds.apache.org/view/M-R/view/Mesos/job/Mesos-Release/77/BUILDTOOL=autotools,COMPILER=gcc,CONFIGURATION=--verbose%20--disable-libtool-wrappers%20--disable-parallel-test-execution,ENVIRONMENT=GLOG_v=1%20MESOS_VERBOSE=1,OS=centos%3A7,label_exp=(docker%7C%7CHadoop%7C%7Cbeam)&&(!ubuntu-us1)&&(!ubuntu-eu2)/console>,
>> issue 2
>> <https://builds.apache.org/view/M-R/view/Mesos/job/Mesos-Release/77/BUILDTOOL=autotools,COMPILER=clang,CONFIGURATION=--verbose%20--disable-libtool-wrappers%20--disable-parallel-test-execution%20--enable-libevent%20--enable-ssl,ENVIRONMENT=GLOG_v=1%20MESOS_VERBOSE=1,OS=ubuntu%3A16.04,label_exp=(docker%7C%7CHadoop%7C%7Cbeam)&&(!ubuntu-us1)&&(!ubuntu-eu2)/console>)
>> which seem to be related to race condition due to parallel build.
>>
>> @Andrei Sekretenko  Can you confirm this is
>> not a regression in the build system?
>>
>> *Revision*: 1fb36dcc5a0099f147cd01bd82cd7b4f0aec2256
>>
>>- refs/tags/1.10.0-rc1
>>
>> Configuration Matrix gcc clang
>> centos:7 --verbose --disable-libtool-wrappers
>> --disable-parallel-test-execution --enable-libevent --enable-ssl
>> autotools
>> [image: Success]
>> <https://builds.apache.org/view/M-R/view/Mesos/job/Mesos-Release/77/BUILDTOOL=autotools,COMPILER=gcc,CONFIGURATION=--verbose%20--disable-libtool-wrappers%20--disable-parallel-test-execution%20--enable-libevent%20--enable-ssl,ENVIRONMENT=GLOG_v=1%20MESOS_VERBOSE=1,OS=centos%3A7,label_exp=(docker%7C%7CHadoop%7C%7Cbeam)&&(!ubuntu-us1)&&(!ubuntu-eu2)/>
>> [image: Not run]
>> cmake
>> [image: Success]
>> <https://builds.apache.org/view/M-R/view/Mesos/job/Mesos-Release/77/BUILDTOOL=cmake,COMPILER=gcc,CONFIGURATION=--verbose%20--disable-libtool-wrappers%20--disable-parallel-test-execution%20--enable-libevent%20--enable-ssl,ENVIRONMENT=GLOG_v=1%20MESOS_VERBOSE=1,OS=centos%3A7,label_exp=(docker%7C%7CHadoop%7C%7Cbeam)&&(!ubuntu-us1)&&(!ubuntu-eu2)/>
>> [image: Not run]
>> --verbose --disable-libtool-wrappers --disable-parallel-test-execution
>> autotools
>> [image: Failed]
>> <https://builds.apache.org/view/M-R/view/Mesos/job/Mesos-Release/77/BUILDTOOL=autotools,COMPILER=gcc,CONFIGURATION=--verbose%20--disable-libtool-wrappers%20--disable-parallel-test-execution,ENVIRONMENT=GLOG_v=1%20MESOS_VERBOSE=1,OS=centos%3A7,label_exp=(docker%7C%7CHadoop%7C%7Cbeam)&&(!ubuntu-us1)&&(!ubuntu-eu2)/>
>> [image: Not run]
>> cmake
>> [image: Success]
>> <https://builds.apache.org/view/M-R/view/Mesos/job/Mesos-Release/77/BUILDTOOL=cmake,COMPILER=gcc,CONFIGURATION=--verbose%20--disable-libtool-wrappers%20--disable-parallel-test-execution,ENVIRONMENT=GLOG_v=1%20MESOS_VERBOSE=1,OS=centos%3A7,label_exp=(docker%7C%7CHadoop%7C%7Cbeam)&&(!ubuntu-us1)&&(!ubuntu-eu2)/>
>> [image: Not run]
>> ubuntu:16.04 --verbose --disable-libtool-wrappers
>> --disable-parallel-test-exec

Re: Subject: [VOTE] Release Apache Mesos 1.10.0 (rc1)

2020-05-19 Thread Vinod Kone
Ran it in Apache CI. Found 2 build issues (issue 1, issue 2) which seem to
be related to a race condition due to parallel build.

@Andrei Sekretenko Can you confirm this is not a regression in the build
system?

*Revision*: 1fb36dcc5a0099f147cd01bd82cd7b4f0aec2256

   - refs/tags/1.10.0-rc1

Configuration Matrix

OS / configuration / build tool                         gcc        clang

centos:7
  --verbose --disable-libtool-wrappers --disable-parallel-test-execution --enable-libevent --enable-ssl
    autotools                                           Success    Not run
    cmake                                               Success    Not run
  --verbose --disable-libtool-wrappers --disable-parallel-test-execution
    autotools                                           Failed     Not run
    cmake                                               Success    Not run
ubuntu:16.04
  --verbose --disable-libtool-wrappers --disable-parallel-test-execution --enable-libevent --enable-ssl
    autotools                                           Success    Failed
    cmake                                               Success    Success
  --verbose --disable-libtool-wrappers --disable-parallel-test-execution
    autotools                                           Success    Success

Re: [VOTE] Release Apache Mesos 1.7.3 (rc1)

2020-05-06 Thread Vinod Kone
+1 (binding)

Tested on ASF CI. All builds passed!

*Revision*: 5f617044c969ebcfca281d043a2474c1a6b39f23

   - refs/tags/1.7.3-rc1

Configuration Matrix

OS / configuration / build tool                         gcc        clang

centos:7
  --verbose --disable-libtool-wrappers --disable-parallel-test-execution --enable-libevent --enable-ssl
    autotools                                           Success    Not run
    cmake                                               Success    Not run
  --verbose --disable-libtool-wrappers --disable-parallel-test-execution
    autotools                                           Success    Not run
    cmake                                               Success    Not run
ubuntu:16.04
  --verbose --disable-libtool-wrappers --disable-parallel-test-execution --enable-libevent --enable-ssl
    autotools                                           Success    Success
    cmake                                               Success    Success
  --verbose --disable-libtool-wrappers --disable-parallel-test-execution
    autotools                                           Success    Success
    cmake                                               Success    Success



On Mon, May 4, 2020 at 12:48 PM Greg Mann  wrote:

> Hi all,
>
> Please vote on releasing the following 

Re: how is the agent available memory computed/updated?

2020-04-30 Thread Vinod Kone
I commented on the JIRA.

On Thu, Apr 30, 2020 at 3:02 PM Charles-François Natali 
wrote:

> Thanks Vinod.
>
> Yes, I understand that Mesos assumes it's the only process managing
> resources, makes sense.
> Looking at the code and testing shows the agent reports as available
> memory the total memory of the host, minus 1GB (or half the total
> memory if the total memory is below 2GB)
> (
> https://github.com/apache/mesos/blob/master/src/slave/containerizer/containerizer.cpp#L152
> ).
> So basically it means that it assumes that the OS doesn't use more
> than 1GB. I guess if it's not the case one can just specify the memory
> manually to the agent, so that's fine.
>
> Actually the reason I was wondering about this is because we recently
> had a problem where containers couldn't be destroyed because of tasks
> stuck in uninterruptible (D) state, which caused the memory to be
> basically leaked, i.e. the agent was advertising the memory free while
> it was still being used by the stuck processes. We ran into a similar
> issue with GPUs - it's a known issue
> https://issues.apache.org/jira/browse/MESOS-8038 - I posted an
> analysis and potential fix, it'd be great if someone could have a look
> :).
>
> Cheers,
>
> Charles
>
> Le jeu. 30 avr. 2020 à 15:36, Vinod Kone  a écrit :
> >
> > Mesos assumes that it is the only process managing resources of a box
> > (cpu, mem, disk). So if you have out of band processes using up
> > resources it won't be reflected in the resource offers and the box can
> > be overcommitted. There is no runtime periodic check of available
> > resources, it's only calculated once at startup.
> >
> > Resource detection logic is here:
> > https://github.com/apache/mesos/blob/master/src/slave/containerizer/containerizer.cpp#L65
> >
> > On Thu, Apr 30, 2020 at 8:17 AM Charles-François Natali
> > <cf.nat...@gmail.com> wrote:
> >
> > > Hi,
> > >
> > > Could someone point me to some code/documentation explaining how the
> > > agent available memory is computed, and when it is refreshed?
> > >
> > > For example, if I have an agent started, with some outstanding offers,
> > > and I then start a process - not as a task managed by Mesos, but as an
> > > external process which just allocates a lot of memory - and touches
> > > it, not just committed - I can see the machine available memory go
> > > down (as reported by free, and MemAvailable in /proc/meminfo), but the
> > > agent doesn't rescind any offer, and never seems to actually refresh
> > > it - even after starting/stopping tasks.
> > >
> > > Cheers,
> > >
> > > Charles
> > >
>


Re: how is the agent available memory computed/updated?

2020-04-30 Thread Vinod Kone
Mesos assumes that it is the only process managing the resources of a box
(cpu, mem, disk). So if you have out-of-band processes using up resources,
that won't be reflected in the resource offers and the box can be
overcommitted. There is no periodic runtime check of available resources;
they are only calculated once at startup.

Resource detection logic is here:
https://github.com/apache/mesos/blob/master/src/slave/containerizer/containerizer.cpp#L65
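
For illustration, the default described elsewhere in this thread (total host
memory minus 1GB, or half the total when the host has less than 2GB) can be
sketched in Python. This is a minimal sketch under that stated assumption,
not the code in containerizer.cpp; the function name
default_agent_mem_bytes is hypothetical.

```python
GB = 1024 ** 3  # bytes per gigabyte (binary)


def default_agent_mem_bytes(total_bytes: int) -> int:
    """Return the memory an agent would advertise by default.

    Assumption taken from this thread: leave 1GB for the OS, or offer
    half of the total when the host has less than 2GB. The value is
    computed once at startup and not refreshed afterwards.
    """
    if total_bytes < 2 * GB:
        return total_bytes // 2
    return total_bytes - GB


if __name__ == "__main__":
    for total in (1 * GB, 2 * GB, 16 * GB):
        offered = default_agent_mem_bytes(total)
        print(f"{total / GB:.1f} GB total, {offered / GB:.1f} GB offered")
```

If the OS or other out-of-band processes need more than that, the memory can
be specified manually to the agent instead of relying on this default.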

On Thu, Apr 30, 2020 at 8:17 AM Charles-François Natali 
wrote:

> Hi,
>
> Could someone point me to some code/documentation explaining how the
> agent available memory is computed, and when it is refreshed?
>
> For example, if I have an agent started, with some outstanding offers,
> and I then start a process - not as a task managed by Mesos, but as an
> external process which just allocates a lot of memory - and touches
> it, not just committed - I can see the machine available memory go
> down (as reported by free, and MemAvailable in /proc/meminfo), but the
> agent doesn't rescind any offer, and never seems to actually refresh
> it - even after starting/stopping tasks.
>
> Cheers,
>
> Charles
>


Re: [AREA1 SUSPICIOUS] [OFFER] Remove ZooKeeper as hard-dependency, support etcd, Consul, OR ZooKeeper

2020-04-17 Thread Vinod Kone
Hi Samuel,

Thanks for showing interest in contributing to the project. Having
optionality between ZooKeeper and Etcd would be great for the project and
something that has been brought up a few times before, as you noted.

I echo everything that BenM said. As part of the design it would be great
to see the migration path for users currently using Mesos with ZooKeeper to
Etcd. Ideally, the migration can happen without much user intervention.

Additionally, from our past experience, efforts like these are more
successful if the people writing the code have experience with how things
work in Mesos code base. So I would recommend starting small, maybe have a
few engineers work on a couple "newbie" tickets and do some small projects
and have those committed to the project. That gives the committers some
level of confidence about the quality of the code and makes them more open
to bigger changes like etcd integration. It would also help contributors get
a better feeling for the lay of the land and see if they are truly
interested in maintaining this piece of integration for the long haul. This
is a bit of a longer path but I think it would be a more fruitful one.

Looking forward to seeing new contributions to Mesos including the above
design!

Thanks,

On Fri, Apr 17, 2020 at 4:52 PM Samuel Marks  wrote:

> Happy to build a design doc,
>
> To answer your question on what Offscale.io is, it's my software and
> biomedical engineering consultancy. Currently it's still rather small, with
> only 8 engineers, but I'm expecting & preparing to grow rapidly.
>
> My philosophy is always open-source and patent-free, so that's what my
> consultancy (and, for that matter, the charitable research that I fund
> through it) follows.
>
> The goal of everything we create is: interoperable (cross-platform,
> cross-technology, cross-language, multi-cloud); open-source (Apache-2.0 OR
> MIT); with a view towards scaling:
>
>    - teams;
>    - software-development;
>    - infrastructure [this proposed Mesos contribution + our DevOps
>      tooling];
>    - [in the charity's case] facilitating very large-scale medical
>      diagnostic screening.
>
> Technologies like Mesos we expect to both optimise resource
> allocation—reducing costs and increasing data locality—and award us
> 'bragging rights' with which we can gain clients that are already using
> Mesos (which, from my experience, is always big corporates… though
> hopefully contributions like these will make it attractive to small
> companies also).
>
> So no, we're not going anywhere, and are planning to maintain this library
> into the future.
>
> PS: Once accepted by Mesos, we'll be making similar contributions to other
> Mesos ecosystem projects like Chronos, Marathon, and Aurora, as well as to
> unrelated projects (e.g., removing etcd as a hard-dependency from
> Kubernetes … enabling them to choose between ZooKeeper, etcd, and Consul).
>
> Thanks for your continual feedback,
>
> *SAMUEL MARKS*
> Sydney Medical School | Westmead Institute for Medical Research |
> https://linkedin.com/in/samuelmarks
> Director | Sydney Scientific Foundation Ltd 
> | Offscale.io of Sydney Scientific Pty Ltd 
>
>
> On Sat, Apr 18, 2020 at 6:58 AM Benjamin Mahler 
> wrote:
>
> > Oh ok, could you tell us a little more about how you're using Mesos? And
> > what offscale.io is?
> >
> > Strictly speaking, we don't really need packaging and releases as we can
> > bundle the dependency in our repo and that's what we do for many of our
> > dependencies.
> > To me, the most important thing is the commitment to maintain the library
> > and address issues that come up.
> > I also would lean more towards a run-time flag rather than a build level
> > flag, if possible.
> >
> > I think the best place to start would be to put together a design doc.
> > The act of writing that will force the author to think through the
> > details (and there are a lot of them!), and we'll then get a chance to
> > give feedback. You can look through the mailing list for past examples
> > of design docs (in terms of which sections to include, etc).
> >
> > How does that sound?
> >
> > On Tue, Apr 14, 2020 at 8:44 PM Samuel Marks  wrote:
> >
> > > Dear Benjamin Mahler [and *Developers mailing-list for Apache Mesos*],
> > >
> > > Thanks for responding so quickly.
> > >
> > > Actually, I invested in this entire project (time & money, including
> > > a development team) explicitly in order to contribute it to Apache
> > > Mesos. So no releases yet, because I wanted to ensure it was up to
> > > the specification requirements referenced in dev@mesos.apache.org
> > > before proceeding with packaging and releases.
> > >
> > > Tests have been setup in Travis CI for Linux (Ubuntu 18.04) and 

Re: Scheduler driver doesn't detect loss of connection to the master without zookeeper

2019-12-30 Thread Vinod Kone
In the latest versions of Mesos, that is handled via heartbeats.

Thanks,
Vinod
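
To make the idea concrete, here is a minimal sketch of heartbeat-based
liveness detection, written in Python for brevity. It is not the Mesos
implementation: the class name, the interval, and the "missed heartbeats"
threshold are illustrative assumptions.

```python
import time


class HeartbeatMonitor:
    """Treat the master as disconnected once heartbeats stop arriving."""

    def __init__(self, interval_secs: float, max_missed: int = 5,
                 clock=time.monotonic):
        self.interval_secs = interval_secs  # expected gap between heartbeats
        self.max_missed = max_missed        # tolerated consecutive misses
        self._clock = clock                 # injectable for testing
        self._last_seen = clock()

    def on_heartbeat(self) -> None:
        # Call this whenever a heartbeat is received from the master.
        self._last_seen = self._clock()

    def is_disconnected(self) -> bool:
        # Disconnected when the silence exceeds the tolerated window.
        silence = self._clock() - self._last_seen
        return silence > self.interval_secs * self.max_missed
```

Because the check depends only on elapsed time since the last heartbeat, it
also covers the firewall scenario quoted below, where the TCP connection to
the master fails while the zookeeper connections still work.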

> On Dec 30, 2019, at 4:37 AM, Charles-François Natali  
> wrote:
> 
> Thanks.
> 
> That's what I thought. The problem though is that it is probably possible
> that the zookeeper detector doesn't detect the failure while the connection
> to the master fails. One way this could happen would be for example because
> of a firewall causing the TCP connection from the framework to the master
> to fail, while the zookeeper connections (from master to zk and framework
> to zk) still work. Unlikely but possible I think. Having the driver detect
> and fail upon EOF/socket error would guard against that.
> 
> 
> 
> 
> 
>> On Thu, 26 Dec 2019, 18:07 Vinod Kone,  wrote:
>> 
>> IIRC, the standalone master detector (the detector that's used when using a
>> local ip address of the master and not zk) doesn't re-detect when master
>> process restarts. It's a limitation of that detector since it's mainly used
>> for testing purposes and not recommended for production use. For
>> production, please use zookeeper master detector (this detector is used
>> when using zookeeper).
>> 
>> On Fri, Dec 20, 2019 at 5:11 AM Charles-François Natali <
>> cf.nat...@gmail.com>
>> wrote:
>> 
>>> Hi,
>>> 
>>> It seems that the C++ scheduler driver doesn't detect loss of the
>>> connection to the master when not using zookeeper.
>>> 
>>> A simple way to reproduce this is to start a server passing it e.g.
>>> "--ip=127.0.0.1", start the scheduler driver passing it "127.0.0.1:5050",
>>> and then send a SIGKILL to the master. The scheduler logs the following:
>>> 
>>> 
>>> I1220 10:56:11.679347 10635 process.cpp:2928] Resuming
>>> __reaper__(1)@192.168.65.76:34345 at 2019-12-20
>>> 10:56:11.679366144+00:00
>>> I1220 10:56:11.679392 10635 clock.cpp:279] Created a timer for
>>> __reaper__(1)@192.168.65.76:34345 in 100ms in the future (2019-12-20
>>> 10:56:11.779389952+00:00)
>>> I1220 10:56:11.690646 10631 process.cpp:2928] Resuming
>>> scheduler-6a93a8e3-5a8f-4195-bde2-718b5832d317@192.168.65.76:34345 at
>>> 2019-12-20 10:56:11.690665984+00:00
>>> I1220 10:56:11.690775 10632 process.cpp:2928] Resuming
>>> __http__(1)@192.168.65.76:34345 at 2019-12-20 10:56:11.690784000+00:00
>>> I1220 10:56:11.690806 10632 process.cpp:3088] Cleaning up
>>> __http__(1)@192.168.65.76:34345
>>> I1220 10:56:11.690914 10632 process.cpp:2928] Resuming
>>> help@192.168.65.76:34345 at 2019-12-20 10:56:11.690921984+00:00
>>> 
>>> An strace confirms that the process receives EOF when reading from the
>>> socket, but Scheduler::disconnected isn't called.
>>> Is that expected?
>>> 
>>> Or is it assumed that the scheduler relies on zookeeper for detection?
>>> 
>>> Cheers,
>>> 
>>> Charles
>>> 
>> 


Re: Scheduler driver doesn't detect loss of connection to the master without zookeeper

2019-12-26 Thread Vinod Kone
IIRC, the standalone master detector (the detector that's used when using a
local ip address of the master and not zk) doesn't re-detect when master
process restarts. It's a limitation of that detector since it's mainly used
for testing purposes and not recommended for production use. For
production, please use zookeeper master detector (this detector is used
when using zookeeper).
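
As a rough illustration of the distinction above, the sketch below classifies
a master address string by the detector it would imply. It is written in
Python and is an assumption-laden simplification: the function name is
hypothetical, and it only distinguishes the two cases discussed here (a
"zk://" URL versus a plain host:port address).

```python
def detector_kind(master: str) -> str:
    """Classify a master address by the detector it implies.

    Assumption: a "zk://" URL selects the zookeeper detector; anything
    else (e.g. "127.0.0.1:5050") is treated as a standalone address.
    """
    if master.startswith("zk://"):
        return "zookeeper"
    return "standalone"


if __name__ == "__main__":
    for address in ("127.0.0.1:5050", "zk://zk1:2181/mesos"):
        print(address, "uses the", detector_kind(address), "detector")
```

A deployment check could use something like this to warn when a production
framework is configured with a standalone address.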

On Fri, Dec 20, 2019 at 5:11 AM Charles-François Natali 
wrote:

> Hi,
>
> It seems that the C++ scheduler driver doesn't detect loss of the
> connection to the master when not using zookeeper.
>
> A simple way to reproduce this is to start a server passing it e.g.
> "--ip=127.0.0.1", start the scheduler driver passing it "127.0.0.1:5050",
> and then send a SIGKILL to the master. The scheduler logs the following:
>
>
> I1220 10:56:11.679347 10635 process.cpp:2928] Resuming
> __reaper__(1)@192.168.65.76:34345 at 2019-12-20
> 10:56:11.679366144+00:00
> I1220 10:56:11.679392 10635 clock.cpp:279] Created a timer for
> __reaper__(1)@192.168.65.76:34345 in 100ms in the future (2019-12-20
> 10:56:11.779389952+00:00)
> I1220 10:56:11.690646 10631 process.cpp:2928] Resuming
> scheduler-6a93a8e3-5a8f-4195-bde2-718b5832d317@192.168.65.76:34345 at
> 2019-12-20 10:56:11.690665984+00:00
> I1220 10:56:11.690775 10632 process.cpp:2928] Resuming
> __http__(1)@192.168.65.76:34345 at 2019-12-20 10:56:11.690784000+00:00
> I1220 10:56:11.690806 10632 process.cpp:3088] Cleaning up
> __http__(1)@192.168.65.76:34345
> I1220 10:56:11.690914 10632 process.cpp:2928] Resuming
> help@192.168.65.76:34345 at 2019-12-20 10:56:11.690921984+00:00
>
> An strace confirms that the process receives EOF when reading from the
> socket, but Scheduler::disconnected isn't called.
> Is that expected?
>
> Or is it assumed that the scheduler relies on zookeeper for detection?
>
> Cheers,
>
> Charles
>


Re: [VOTE] Release Apache Mesos 1.9.0 (rc3)

2019-09-03 Thread Vinod Kone
+1 (binding)

Tested on ASF CI.

*Revision*: 5e79a584e6ec3e9e2f96e8bf418411df9dafac2e

   - refs/tags/1.9.0-rc3

Configuration Matrix (result per compiler)

centos:7
  --verbose --disable-libtool-wrappers --disable-parallel-test-execution
  --enable-libevent --enable-ssl
    autotools   gcc: Success   clang: Not run
    cmake       gcc: Success   clang: Not run
  --verbose --disable-libtool-wrappers --disable-parallel-test-execution
    autotools   gcc: Success   clang: Not run
    cmake       gcc: Success   clang: Not run
ubuntu:16.04
  --verbose --disable-libtool-wrappers --disable-parallel-test-execution
  --enable-libevent --enable-ssl
    autotools   gcc: Success   clang: Success
    cmake       gcc: Success   clang: Success
  --verbose --disable-libtool-wrappers --disable-parallel-test-execution
    autotools   gcc: Success   clang: Success
    cmake       gcc: Success   clang: Success


On Sun, Sep 1, 2019 at 10:16 PM Qian Zhang  wrote:

> Hi all,
>
> Please vote on releasing the following candidate as Apache Mesos 

Re: [VOTE] Release Apache Mesos 1.9.0 (rc1)

2019-08-27 Thread Vinod Kone
I see. That reduces the risk considerably compared to what I originally
thought, but I guess it's still risky to introduce it so late?

On Tue, Aug 27, 2019 at 1:28 PM Benjamin Mahler  wrote:

> > We upgraded the version of the bundled boost very late in the release
> > cycle
>
> Did we? We still bundle boost 1.65.0, just like we did during 1.8.x. We
> just adjusted our special stripped bundle to include additional headers.
>
> On Tue, Aug 27, 2019 at 1:39 PM Vinod Kone  wrote:
>
>> -1
>>
>> We upgraded the version of the bundled boost very late in the release
>> cycle which doesn't give downstream customers (who also depend on boost)
>> enough time to vet any compatibility/perf/other issues. I propose we
>> revert the boost upgrade (and the corresponding code changes depending on
>> the upgrade) in 1.9.x branch but keep it in the master branch.
>>
>> On Tue, Aug 27, 2019 at 4:18 AM Qian Zhang  wrote:
>>
>> > Hi all,
>> >
>> > Please vote on releasing the following candidate as Apache Mesos 1.9.0.
>> >
>> >
>> > 1.9.0 includes the following:
>> >
>> >
>> 
>> > * Agent draining
>> > * Support configurable /dev/shm and IPC namespace.
>> > * Containerizer debug endpoint.
>> > * Add `no-new-privileges` isolator.
>> > * Client side SSL certificate verification in Libprocess.
>> >
>> > The CHANGELOG for the release is available at:
>> >
>> >
>> https://gitbox.apache.org/repos/asf?p=mesos.git;a=blob_plain;f=CHANGELOG;hb=1.9.0-rc1
>> >
>> >
>> 
>> >
>> > The candidate for Mesos 1.9.0 release is available at:
>> >
>> https://dist.apache.org/repos/dist/dev/mesos/1.9.0-rc1/mesos-1.9.0.tar.gz
>> >
>> > The tag to be voted on is 1.9.0-rc1:
>> > https://gitbox.apache.org/repos/asf?p=mesos.git;a=commit;h=1.9.0-rc1
>> >
>> > The SHA512 checksum of the tarball can be found at:
>> >
>> >
>> https://dist.apache.org/repos/dist/dev/mesos/1.9.0-rc1/mesos-1.9.0.tar.gz.sha512
>> >
>> > The signature of the tarball can be found at:
>> >
>> >
>> https://dist.apache.org/repos/dist/dev/mesos/1.9.0-rc1/mesos-1.9.0.tar.gz.asc
>> >
>> > The PGP key used to sign the release is here:
>> > https://dist.apache.org/repos/dist/release/mesos/KEYS
>> >
>> > The JAR is in a staging repository here:
>> > https://repository.apache.org/content/repositories/orgapachemesos-1255
>> >
>> > Please vote on releasing this package as Apache Mesos 1.9.0!
>> >
>> > The vote is open until Friday, April 30 and passes if a majority of at
>> > least 3 +1 PMC votes are cast.
>> >
>> > [ ] +1 Release this package as Apache Mesos 1.9.0
>> > [ ] -1 Do not release this package because ...
>> >
>> >
>> > Thanks,
>> > Qian and Gilbert
>> >
>>
>


Re: [VOTE] Release Apache Mesos 1.9.0 (rc1)

2019-08-27 Thread Vinod Kone
-1

We upgraded the version of the bundled boost very late in the release cycle
which doesn't give downstream customers (who also depend on boost) enough
time to vet any compatibility/perf/other issues. I propose we revert the
boost upgrade (and the corresponding code changes depending on the upgrade)
in 1.9.x branch but keep it in the master branch.

On Tue, Aug 27, 2019 at 4:18 AM Qian Zhang  wrote:

> Hi all,
>
> Please vote on releasing the following candidate as Apache Mesos 1.9.0.
>
>
> 1.9.0 includes the following:
>
> 
> * Agent draining
> * Support configurable /dev/shm and IPC namespace.
> * Containerizer debug endpoint.
> * Add `no-new-privileges` isolator.
> * Client side SSL certificate verification in Libprocess.
>
> The CHANGELOG for the release is available at:
>
> https://gitbox.apache.org/repos/asf?p=mesos.git;a=blob_plain;f=CHANGELOG;hb=1.9.0-rc1
>
> 
>
> The candidate for Mesos 1.9.0 release is available at:
> https://dist.apache.org/repos/dist/dev/mesos/1.9.0-rc1/mesos-1.9.0.tar.gz
>
> The tag to be voted on is 1.9.0-rc1:
> https://gitbox.apache.org/repos/asf?p=mesos.git;a=commit;h=1.9.0-rc1
>
> The SHA512 checksum of the tarball can be found at:
>
> https://dist.apache.org/repos/dist/dev/mesos/1.9.0-rc1/mesos-1.9.0.tar.gz.sha512
>
> The signature of the tarball can be found at:
>
> https://dist.apache.org/repos/dist/dev/mesos/1.9.0-rc1/mesos-1.9.0.tar.gz.asc
>
> The PGP key used to sign the release is here:
> https://dist.apache.org/repos/dist/release/mesos/KEYS
>
> The JAR is in a staging repository here:
> https://repository.apache.org/content/repositories/orgapachemesos-1255
>
> Please vote on releasing this package as Apache Mesos 1.9.0!
>
> The vote is open until Friday, April 30 and passes if a majority of at
> least 3 +1 PMC votes are cast.
>
> [ ] +1 Release this package as Apache Mesos 1.9.0
> [ ] -1 Do not release this package because ...
>
>
> Thanks,
> Qian and Gilbert
>


Re: Restarting mesos-agent kills executors

2019-08-01 Thread Vinod Kone
Need agent and executor logs to diagnose. Can you share them?

Thanks,
Vinod

> On Aug 1, 2019, at 6:05 AM, Jorge Machado  wrote:
> 
> Hi Guys, 
> 
> I was reading about agent restarts on 
> http://mesos.apache.org/documentation/latest/agent-recovery/ 
> 
> From what I understood, if I had a task running and we restart the
> mesos-agent, I should not lose any running task.
> This is not the case with systemctl (or with the service command) on Ubuntu
> 18.04. Our framework has checkpointing active...
> 
> My config: 
> 
> [Unit]
> Description=Mesos Agent
> After=network.target
> Wants=network.target
> 
> [Service]
> Environment=LIBPROCESS_SSL_ENABLED=true
> Environment=LIBPROCESS_SSL_SUPPORT_DOWNGRADE=false
> Environment=LIBPROCESS_SSL_CIPHERS=AES128-SHA:AES256-SHA:DHE-RSA-AES128-SHA:DHE-DSS-AES128-SHA:DHE-RSA-AES256-SHA:DHE-DSS-AES256-SHA
> Environment=LIBPROCESS_SSL_KEY_FILE=/etc/ssl/private/server_2048.key
> Environment=LIBPROCESS_SSL_CERT_FILE=/etc/ssl/server.crt
> Environment=LIBPROCESS_SSL_CA_FILE=/etc/pki/trust/anchors/it4ad.pem
> 
> ExecStart=/usr/local/sbin/mesos-agent \
>     --master= \
>     --work_dir=/data/mesos/work \
>     --log_dir=/var/log/mesos \
>     --executor_registration_timeout=20mins \
>     --executor_environment_variables=file:///etc/mesos/executor_envs.json \
>     --resources=file:///etc/mesos/resources.txt \
>     --image_gc_config=file:///etc/mesos/image-gc-config.json \
>     --isolation=cgroups/cpu,cgroups/mem,cgroups/devices,filesystem/linux,gpu/nvidia,docker/runtime,namespaces/pid,namespaces/ipc \
>     --image_providers=docker \
>     --docker_store_dir=/data/mesos/store/docker \
>     --gc_delay=3weeks \
>     --attributes=
> 
> KillMode=control-cgroup
> Restart=always
> RestartSec=20
> LimitNOFILE=infinity
> CPUAccounting=true
> MemoryAccounting=true
> TasksMax=infinity
> 
> [Install]
> WantedBy=multi-user.target
> 
> 
> 
> Any tips? Thanks.
> 
> 
> 
> Jorge Machado
> www.jmachado.me
> 
> 
> 
> 
> 


Re: [VOTE] Release Apache Mesos 1.8.1 (rc1)

2019-07-10 Thread Vinod Kone
+1 (binding).

Tested in ASF CI. One build failed due to a known flaky test:
https://issues.apache.org/jira/browse/MESOS-9594


*Revision*: 4ae06448466408d9ec96ede953208057609f0744

   - refs/tags/1.8.1-rc1

Configuration Matrix (result per compiler)

centos:7
  --verbose --disable-libtool-wrappers --disable-parallel-test-execution
  --enable-libevent --enable-ssl
    autotools   gcc: Success   clang: Not run
    cmake       gcc: Success   clang: Not run
  --verbose --disable-libtool-wrappers --disable-parallel-test-execution
    autotools   gcc: Success   clang: Not run
    cmake       gcc: Success   clang: Not run
ubuntu:16.04
  --verbose --disable-libtool-wrappers --disable-parallel-test-execution
  --enable-libevent --enable-ssl
    autotools   gcc: Success   clang: Success
    cmake       gcc: Success   clang: Failed
  --verbose --disable-libtool-wrappers --disable-parallel-test-execution
    autotools   gcc: Success   clang: Success
    cmake       gcc: Success   clang: Success





On Wed, Jul 10, 2019 at 11:54 AM Benno Evers  wrote:

> Hi all,
>
> Please vote on releasing the following candidate as Apache Mesos 1.8.1.
>
> We had a lot 

Daily snapshot builds through Jenkins

2019-07-10 Thread Vinod Kone
Hi there,

We would like to automate publishing of the Mesos snapshot jar to
repository.apache.org through a Jenkins job in builds.a.o. Right now we do
this manually through a script.

I have a couple of questions that I was hoping you could help me with.

1) IIUC, all Jenkins build agents already have the required credentials
configured in "~/.m2" to be able to push to the snapshot repo of RAO when
doing `mvn deploy`. Can you confirm if that's the case? Do we have to set
anything in the PATH for this to work?

2) Our POM file uses GPG signing through the maven gpg plugin. Do the Jenkins
agents also have pre-configured GPG keys or do we have to inject them
somehow (any examples?) or should we disable signing for snapshots?

Thanks,
Vinod


Re: '*.json' endpoints removed in 1.7

2019-05-10 Thread Vinod Kone
I propose that we revert this change and keep the ".json" endpoints in
master branch and 1.8.x

My reasoning is that, we have ecosystem components (e.g., mesos-dns which
is yet to have a release with fix) and anecdotally a bunch of custom
tooling at user sites that depend on these ".json" endpoints (esp.
/state.json). The amount of techdebt that we saved or consistency we
achieved in the codebase by doing this is not worth the tradeoff of
breaking some user/tooling, in my opinion. We could revisit this if and
when we do a Mesos 2.0.
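
Custom tooling that has to talk to masters on both sides of this change could
try the non-".json" path first and fall back to the ".json" one. The sketch
below, in Python, only computes the candidate paths and issues no HTTP
requests; the function name is hypothetical.

```python
from typing import List


def endpoint_candidates(path: str) -> List[str]:
    """Return endpoint paths to try, preferring the non-'.json' form.

    Works whether the caller passes "/state" or "/state.json".
    """
    suffix = ".json"
    base = path[:-len(suffix)] if path.endswith(suffix) else path
    return [base, base + suffix]


if __name__ == "__main__":
    print(endpoint_candidates("/state.json"))
    print(endpoint_candidates("/state"))
```

A client would request each candidate in order and use the first one that
does not return a 404.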

On Wed, Aug 8, 2018 at 9:25 AM Alex Rukletsov  wrote:

> Folks,
>
> The long ago deprecated '*.json' endpoints will be removed in Mesos 1.7.0.
> Please use their non-'.json' counterparts instead.
>
> Commit:
> https://github.com/apache/mesos/commit/42551cb5290b7b04101f7d800b4b8fd573e47b91
> JIRA ticket: https://issues.apache.org/jira/browse/MESOS-4509
>
> Alex.
>


Re: [VOTE] Release Apache Mesos 1.8.0 (rc3)

2019-04-26 Thread Vinod Kone
+1 (binding)

1 failed build was due to a known flaky test.
*Revision*: acefa90695a32f8e8d6361f8192a6522aeaadbb9

   - refs/tags/1.8.0-rc3

Configuration Matrix (result per compiler)

centos:7
  --verbose --disable-libtool-wrappers --disable-parallel-test-execution
  --enable-libevent --enable-ssl
    autotools   gcc: Success   clang: Not run
    cmake       gcc: Success   clang: Not run
  --verbose --disable-libtool-wrappers --disable-parallel-test-execution
    autotools   gcc: Success   clang: Not run
    cmake       gcc: Failed    clang: Not run
ubuntu:16.04
  --verbose --disable-libtool-wrappers --disable-parallel-test-execution
  --enable-libevent --enable-ssl
    autotools   gcc: Success   clang: Success
    cmake       gcc: Success   clang: Success
  --verbose --disable-libtool-wrappers --disable-parallel-test-execution
    autotools   gcc: Success   clang: Success
    cmake       gcc: Success   clang: Success


On Fri, Apr 26, 2019 at 1:04 PM Benno Evers  wrote:

> Addendum:
> The vote is open until Thursday, May 2nd.
>
> On Fri, Apr 26, 2019 at 6:28 PM Benno Evers  wrote:
>
> > Hi all,
> >
> > Please vote on releasing the following 

Re: [VOTE] Release Apache Mesos 1.8.0 (rc2)

2019-04-18 Thread Vinod Kone
+1 (binding)

Ran on ASF CI.

*Revision*: f5920ad1a7cbcd2423c30465dcf14948e392081b

   - refs/tags/1.8.0-rc2

Configuration Matrix (result per compiler)

centos:7
  --verbose --disable-libtool-wrappers --disable-parallel-test-execution
  --enable-libevent --enable-ssl
    autotools   gcc: Success   clang: Not run
    cmake       gcc: Success   clang: Not run
  --verbose --disable-libtool-wrappers --disable-parallel-test-execution
    autotools   gcc: Success   clang: Not run
    cmake       gcc: Success   clang: Not run
ubuntu:16.04
  --verbose --disable-libtool-wrappers --disable-parallel-test-execution
  --enable-libevent --enable-ssl
    autotools   gcc: Success   clang: Success
    cmake       gcc: Success   clang: Success
  --verbose --disable-libtool-wrappers --disable-parallel-test-execution
    autotools   gcc: Success   clang: Success
    cmake       gcc: Success   clang: Success


On Thu, Apr 18, 2019 at 8:00 AM Benno Evers  wrote:

> Hi all,
>
> Please vote on releasing the following candidate as Apache Mesos 1.8.0.
>
>
> 1.8.0 includes the following:
>
> 

Re: Subject: [VOTE] Release Apache Mesos 1.8.0 (rc1)

2019-04-15 Thread Vinod Kone
+1 (binding)

Ran it on ASF CI.

*Revision*: 85462fc183a60ae18d85729bccb1fffb59aa572c

   - refs/tags/1.8.0-rc1

Configuration Matrix (result per compiler)

centos:7
  --verbose --disable-libtool-wrappers --disable-parallel-test-execution
  --enable-libevent --enable-ssl
    autotools   gcc: Success   clang: Not run
    cmake       gcc: Success   clang: Not run
  --verbose --disable-libtool-wrappers --disable-parallel-test-execution
    autotools   gcc: Success   clang: Not run
    cmake       gcc: Success   clang: Not run
ubuntu:16.04
  --verbose --disable-libtool-wrappers --disable-parallel-test-execution
  --enable-libevent --enable-ssl
    autotools   gcc: Success   clang: Success
    cmake       gcc: Success   clang: Success
  --verbose --disable-libtool-wrappers --disable-parallel-test-execution
    autotools   gcc: Success   clang: Success
    cmake       gcc: Success   clang: Success


On Mon, Apr 15, 2019 at 1:26 PM Benno Evers  wrote:

> Hi all,
>
> Please vote on releasing the following candidate as Apache Mesos 1.8.0.
>
>
> 1.8.0 includes the following:
>
> 

Re: Mesos on ssl

2019-04-11 Thread Vinod Kone
Hi Jorge. We are hoping to cut 1.8.0 RC within a week.

-- Vinod


On Fri, Apr 5, 2019 at 12:06 PM Jorge Machado  wrote:

> Thanks Hsiao, I think it is this, as a build from master works fine for
> me. When are we releasing 1.8.0 ?
>
>
>
> > On 5 Apr 2019, at 16:52, Chun-Hung Hsiao  wrote:
> >
> > I'm not sure if this is related:
> > https://issues.apache.org/jira/browse/MESOS-7076
> >
> > In summary, Ubuntu 18.04 ships libevent 2.1.x (for OpenSSL 1.1.x
> > support). But libevent 2.1.x has an unknown bug that caused some Mesos
> > tests to fail. As a workaround, the current Mesos master branch (will be
> > 1.8 soon) bundled libevent 2.0.x (with a magic patch from Debian 8 for
> > OpenSSL 1.1.x). So Mesos 1.8 will be the first official release
> > supporting SSL on Ubuntu 18.04.
> > That said, I'm not sure what you encountered is exactly the same bug that
> > caused the Mesos tests to fail though. Just a guess ;)
> >
> > On Fri, Apr 5, 2019, 12:58 AM Jorge Machado 
> wrote:
> >
> >> Hi Guys,
> >>
> >> I'm having issues with mesos versions from tar.gz compared with a build
> >> from git master when using ssl.
> >> With a build from git, the ssl agent is fine and, for example, the
> >> endpoint https://mesos-agent:5051/ returns a 404, which is fine.
> >> With a build from tar.gz (1.7.1 or 1.7.2) the same endpoint does not
> >> work and it just hangs. No logs, nothing...
> >> I'm testing this on ubuntu 18.04.
> >>
> >> Any tips?
> >> thanks
> >> Jorge
> >>
> >>
> >> Jorge Machado
> >> www.jmachado.me
> >>
> >>
> >>
> >>
> >>
> >>
>
>


Re: Apache ReviewBoard not accepting new users

2019-03-27 Thread Vinod Kone
Good news! Looks like the LDAP requirement will be reverted.

https://issues.apache.org/jira/browse/INFRA-18071

On Wed, Mar 6, 2019 at 8:16 AM Benno Evers  wrote:

> Hi everyone,
>
> I just wanted to share something that caught us off-guard yesterday: it is
> currently not possible to register new user accounts on
> http://reviews.apache.org - only committers and people with an existing
> account are allowed to log in.
>
> It looks like this is a side-effect of Apache switching to LDAP logins.
> Since ReviewBoard supports exactly one sign-up backend, "classic" signup
> was disabled when they enabled LDAP. We're still trying to get some
> background on this decision from the ASF Infra team.
>
> Best regards,
> --
> Benno Evers
> Software Engineer, Mesosphere
>


Re: Bundled glog update from 0.3.3 to 0.4.0

2019-03-27 Thread Vinod Kone
Another thing to keep in mind is that we are very close to releasing 1.8.0.
Is this glog upgrade potentially risky? If yes, maybe we should wait until
1.8 is branched off and then do it on master / 1.9.

On Wed, Mar 27, 2019 at 10:00 AM Benjamin Mahler  wrote:

> Thanks Andrei!
>
> Some interesting changes for us from what I see:
>   - Looks like there are some potential memory allocation reduction changes
> which is nice. ("reduce dynamic allocation from 3 to 1 per log message" in
> 0.3.4)
>   - https://github.com/google/glog/pull/245 (this will change the log file
> names for those that see 'invalid-user' in the filenames, which I recall
> seeing often)
>   - https://github.com/google/glog/pull/145 (this fixes the issue we filed
> https://github.com/google/glog/issues/84 where we've had to disable
> GLOG_drop_log_memory).
>
> After this update we should be able to remove our special case disablement
> of GLOG_drop_log_memory:
>
> https://github.com/apache/mesos/blob/1.7.2/src/logging/logging.cpp#L184-L194
>
> Is there a ticket for the glog upgrade to 0.4.0? I filed
> https://issues.apache.org/jira/browse/MESOS-9680 but couldn't find the
> 0.4.0 ticket to link that it's blocked by the upgrade.
>
> On Tue, Mar 26, 2019 at 9:17 AM Andrei Sekretenko <
> asekrete...@mesosphere.com> wrote:
>
> > Hi all,
> > We are intending to update the bundled glog from 0.3.3 to 0.4.0.
> >
> > If you have any objections/concerns, or know about any issues introduced
> > into glog between 0.3.3 and 0.4.0, please raise them.
> >
> > Corresponding glog changelogs:
> > https://github.com/google/glog/releases/tag/v0.4.0
> > https://github.com/google/glog/releases/tag/v0.3.5
> > https://github.com/google/glog/releases/tag/v0.3.4
> >
> > Regards,
> > Andrei Sekretenko
> >
>


Re: [VOTE] Release Apache Mesos 1.5.3 (rc1)

2019-03-07 Thread Vinod Kone
+1 (binding)

Ran in ASF CI. Saw some flaky tests but otherwise looks good.

*Revision*: b1dbba03af23b0222d11f2b7ae936d77ef42650d

   - refs/tags/1.5.3-rc1

Configuration Matrix (gcc / clang)

centos:7
  --verbose --disable-libtool-wrappers --disable-parallel-test-execution
  --enable-libevent --enable-ssl
    autotools: gcc Success, clang Not run
    cmake:     gcc Success, clang Not run
  --verbose --disable-libtool-wrappers --disable-parallel-test-execution
    autotools: gcc Success, clang Not run
    cmake:     gcc Success, clang Not run

ubuntu:16.04
  --verbose --disable-libtool-wrappers --disable-parallel-test-execution
  --enable-libevent --enable-ssl
    autotools: gcc Success, clang Success
    cmake:     gcc Success, clang Success
  --verbose --disable-libtool-wrappers --disable-parallel-test-execution
    autotools: gcc Success, clang Success
    cmake:     gcc Success, clang Failed


On Wed, Mar 6, 2019 at 7:33 AM Gilbert Song  wrote:

>  Hi all,
>
> Please vote on releasing the following candidate as Apache Mesos 1.5.3.
>
> 1.5.3 includes the following:
>
> 

Re: [VOTE] Release Apache Mesos 1.6.2 (rc1)

2019-02-20 Thread Vinod Kone
+1 (binding)

Thanks for the update Greg.

On Wed, Feb 20, 2019 at 11:41 AM Greg Mann  wrote:

> It appears to be a flaky test; that particular failure hasn't come up in
> the CI builds that I ran, or in my own manual testing. Just now, I was able
> to get that test to fail after many repetitions, but with a different
> error. I filed ticket MESOS-9589
> <https://issues.apache.org/jira/browse/MESOS-9589> to track.
>
> Cheers,
> Greg
>
> On Tue, Feb 19, 2019 at 2:41 PM Vinod Kone  wrote:
>
> > Found a flaky test
> > <https://builds.apache.org/view/M-R/view/Mesos/job/Mesos-Release/65/BUILDTOOL=cmake,COMPILER=gcc,CONFIGURATION=--verbose%20--disable-libtool-wrappers%20--disable-parallel-test-execution,ENVIRONMENT=GLOG_v=1%20MESOS_VERBOSE=1,OS=ubuntu:16.04,label_exp=(docker%7C%7CHadoop)&&(!ubuntu-us1)&&(!ubuntu-eu2)/console>
> > in ASF CI. Doesn't seem to be a known issue according to JIRA.
> >
> > @Greg Mann can you please confirm if this is a flaky test or something
> > new?
> >
> >
> >
> > On Tue, Feb 19, 2019 at 1:56 PM Greg Mann  wrote:
> >
> > > Hi all,
> > >
> > > Please vote on releasing the following candidate as Apache Mesos 1.6.2.
> > >
> > >
> > > 1.6.2 includes a number of bug fixes since 1.6.1; the CHANGELOG for the
> > > release is available at:
> > >
> > > https://gitbox.apache.org/repos/asf?p=mesos.git;a=blob_plain;f=CHANGELOG;hb=1.6.2-rc1
> > >
> > > The candidate for Mesos 1.6.2 release is available at:
> > > https://dist.apache.org/repos/dist/dev/mesos/1.6.2-rc1/mesos-1.6.2.tar.gz
> > >
> > > The tag to be voted on is 1.6.2-rc1:
> > > https://gitbox.apache.org/repos/asf?p=mesos.git;a=commit;h=1.6.2-rc1
> > >
> > > The SHA512 checksum of the tarball can be found at:
> > > https://dist.apache.org/repos/dist/dev/mesos/1.6.2-rc1/mesos-1.6.2.tar.gz.sha512
> > >
> > > The signature of the tarball can be found at:
> > > https://dist.apache.org/repos/dist/dev/mesos/1.6.2-rc1/mesos-1.6.2.tar.gz.asc
> > >
> > > The PGP key used to sign the release is here:
> > > https://dist.apache.org/repos/dist/release/mesos/KEYS
> > >
> > > The JAR is in a staging repository here:
> > > https://repository.apache.org/content/repositories/orgapachemesos-1246
> > >
> > > Please vote on releasing this package as Apache Mesos 1.6.2!
> > >
> > > The vote is open until Fri Feb 22 11:54 PST 2019, and passes if a
> > > majority of at least 3 +1 PMC votes are cast.
> > >
> > > [ ] +1 Release this package as Apache Mesos 1.6.2
> > > [ ] -1 Do not release this package because ...
> > >
> > > Thanks,
> > > Greg
> > >
> >
>
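
The pass rule quoted in the vote mail above (a majority, with at least 3 +1
PMC votes) can be sketched as a small tally. This is only an illustration,
not ASF or Mesos tooling: the `Vote` type and `vote_passes` function are
hypothetical names, and the sketch assumes that only binding (PMC) votes
count toward the result.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vote:
    voter: str
    value: int      # +1 or -1 (0 for an abstention)
    binding: bool   # True when cast by a PMC member


def vote_passes(votes, minimum_plus_ones=3):
    """Return True if binding +1 votes reach the minimum and outnumber binding -1 votes."""
    plus = sum(1 for v in votes if v.binding and v.value > 0)
    minus = sum(1 for v in votes if v.binding and v.value < 0)
    return plus >= minimum_plus_ones and plus > minus


# Hypothetical ballot: three binding +1 votes and one non-binding +1.
votes = [
    Vote("pmc-a", +1, True),
    Vote("pmc-b", +1, True),
    Vote("pmc-c", +1, True),
    Vote("contributor-d", +1, False),  # non-binding, not counted
]
print(vote_passes(votes))
```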


Re: [VOTE] Release Apache Mesos 1.7.2 (rc1)

2019-02-20 Thread Vinod Kone
+1

Ran this on ASF CI.

The red builds are a flaky infra issue and a known flaky test.

*Revision*: 58cc918e9acc2865bb07047d3d2dff156d1708b2

   - refs/tags/1.7.2-rc1

Configuration Matrix (gcc / clang)

centos:7
  --verbose --disable-libtool-wrappers --disable-parallel-test-execution
  --enable-libevent --enable-ssl
    autotools: gcc Failed,  clang Not run
    cmake:     gcc Success, clang Not run
  --verbose --disable-libtool-wrappers --disable-parallel-test-execution
    autotools: gcc Success, clang Not run
    cmake:     gcc Success, clang Not run

ubuntu:16.04
  --verbose --disable-libtool-wrappers --disable-parallel-test-execution
  --enable-libevent --enable-ssl
    autotools: gcc Success, clang Success
    cmake:     gcc Success, clang Success
  --verbose --disable-libtool-wrappers --disable-parallel-test-execution
    autotools: gcc Success, clang Success
    cmake:     gcc Success, clang Failed



On Tue, Feb 19, 2019 at 5:00 PM Gastón Kleiman  wrote:

> Hi all,
>
> Please vote on releasing the following candidate as Apache Mesos 1.7.2.
>
>

Re: [VOTE] Release Apache Mesos 1.6.2 (rc1)

2019-02-19 Thread Vinod Kone
Found a flaky test in ASF CI. Doesn't seem to be a known issue according to
JIRA.

@Greg Mann can you please confirm if this is a flaky test or something new?



On Tue, Feb 19, 2019 at 1:56 PM Greg Mann  wrote:

> Hi all,
>
> Please vote on releasing the following candidate as Apache Mesos 1.6.2.
>
>
> 1.6.2 includes a number of bug fixes since 1.6.1; the CHANGELOG for the
> release is available at:
>
> https://gitbox.apache.org/repos/asf?p=mesos.git;a=blob_plain;f=CHANGELOG;hb=1.6.2-rc1
>
> 
>
> The candidate for Mesos 1.6.2 release is available at:
> https://dist.apache.org/repos/dist/dev/mesos/1.6.2-rc1/mesos-1.6.2.tar.gz
>
> The tag to be voted on is 1.6.2-rc1:
> https://gitbox.apache.org/repos/asf?p=mesos.git;a=commit;h=1.6.2-rc1
>
> The SHA512 checksum of the tarball can be found at:
>
> https://dist.apache.org/repos/dist/dev/mesos/1.6.2-rc1/mesos-1.6.2.tar.gz.sha512
>
> The signature of the tarball can be found at:
>
> https://dist.apache.org/repos/dist/dev/mesos/1.6.2-rc1/mesos-1.6.2.tar.gz.asc
>
> The PGP key used to sign the release is here:
> https://dist.apache.org/repos/dist/release/mesos/KEYS
>
> The JAR is in a staging repository here:
> https://repository.apache.org/content/repositories/orgapachemesos-1246
>
> Please vote on releasing this package as Apache Mesos 1.6.2!
>
> The vote is open until Fri Feb 22 11:54 PST 2019, and passes if a majority
> of at least 3 +1 PMC votes are cast.
>
> [ ] +1 Release this package as Apache Mesos 1.6.2
> [ ] -1 Do not release this package because ...
>
> Thanks,
> Greg
>
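
Checking a downloaded tarball against the published SHA512 file, as the vote
mail above asks voters to do, can be sketched as follows. This is a minimal
sketch under assumptions: the function names are hypothetical, and it assumes
the first whitespace-separated token of the `.sha512` file is the hex digest
(the actual file layout may differ). Verifying the `.asc` signature needs a
PGP tool and is not shown.

```python
import hashlib


def sha512_of(path, chunk_size=1 << 20):
    """Compute the SHA-512 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha512()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_matches(tarball_path, checksum_path):
    """Compare the tarball's digest with the first token of the checksum file."""
    with open(checksum_path, "r", encoding="utf-8") as f:
        tokens = f.read().split()
    if not tokens:
        return False  # empty checksum file: nothing to compare against
    return sha512_of(tarball_path) == tokens[0].lower()
```

For the candidate above this would be called as
checksum_matches("mesos-1.6.2.tar.gz", "mesos-1.6.2.tar.gz.sha512") after
downloading both files.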


Re: [VOTE] Release Apache Mesos 1.4.3 (rc2)

2019-02-15 Thread Vinod Kone
+1 (binding)

Tested on ASF CI. Red builds are known flaky tests or unrelated infra
issues.


*Revision*: 1fee9b5365bf2424e4768dc1d5209c6c78dfece6

   - refs/tags/1.4.3-rc2

Configuration Matrix (gcc / clang)

centos:7
  --verbose --disable-libtool-wrappers --disable-parallel-test-execution
  --enable-libevent --enable-ssl
    autotools: gcc Failed,  clang Not run
    cmake:     gcc Failed,  clang Not run
  --verbose --disable-libtool-wrappers --disable-parallel-test-execution
    autotools: gcc Failed,  clang Not run
    cmake:     gcc Success, clang Not run

ubuntu:16.04
  --verbose --disable-libtool-wrappers --disable-parallel-test-execution
  --enable-libevent --enable-ssl
    autotools: gcc Success, clang Success
    cmake:     gcc Success, clang Success
  --verbose --disable-libtool-wrappers --disable-parallel-test-execution
    autotools: gcc Failed,  clang Success
    cmake:     gcc Success, clang Failed


On Wed, Feb 13, 2019 at 8:49 PM Meng Zhu  wrote:

> Hi all,
>
> Please vote on releasing the following candidate as Apache Mesos 1.4.3.
>
> 1.4.3 includes the following:
>
> 

Re: need help http api /endpoints

2019-02-11 Thread Vinod Kone
There is no operator API (i.e., http api endpoint in your parlance) to
add/kill tasks.

Your job master can talk to the framework that launched all your tasks in
the first place (marathon?), to add/kill tasks. Most existing mesos
frameworks expose add/kill tasks through their own API.

Alternatively, you can rewrite your job master as a Mesos framework. A
framework can add/kill tasks through the scheduler API. But this is likely a
much bigger undertaking.
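
For the framework route, killing a task means sending a KILL call to the
master's v1 scheduler endpoint. Below is a minimal sketch of the request body
only, assuming the v1 scheduler HTTP API; `build_kill_call` is a hypothetical
helper and the IDs are placeholders. Actually sending the call also requires a
subscribed framework and its stream ID, which is not shown here.

```python
import json


def build_kill_call(framework_id, task_id, agent_id=None):
    """Build the JSON body of a v1 scheduler KILL call (hypothetical helper)."""
    call = {
        "framework_id": {"value": framework_id},
        "type": "KILL",
        "kill": {"task_id": {"value": task_id}},
    }
    if agent_id is not None:
        # The agent ID is optional in the call; include it only when known.
        call["kill"]["agent_id"] = {"value": agent_id}
    return call


# Placeholder IDs, for illustration only.
body = json.dumps(build_kill_call("example-framework-id", "example-task-id"))
print(body)
```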



On Mon, Feb 11, 2019 at 6:15 AM Gurhan Gunduz 
wrote:

> I would be really glad if you could tell me what to do in the following
> situation: I am running a job that consists of tasks running in docker
> containers on agents. One of these tasks is the job master, which decides
> when to kill the framework, kill a few of the tasks, or add new tasks to the
> job. I can kill the framework using the endpoint
> <http://mesos.apache.org/documentation/latest/endpoints/master/teardown/>.
> What should I do in order to add or kill tasks? As I mentioned, one of the
> tasks (the job master) will decide on this dynamically.
> Thanks,
> Gurhan
>


Re: Welcome Benno Evers as committer and PMC member!

2019-01-30 Thread Vinod Kone
Congratulations Benno!

On Wed, Jan 30, 2019 at 3:21 PM Alex R  wrote:

> Folks,
>
> Please welcome Benno Evers as an Apache committer and PMC member of the
> Apache Mesos!
>
> Benno has been active in the project for more than a year now and has made
> significant contributions, including:
>   * Agent reconfiguration, MESOS-1739
>   * Memory profiling, MESOS-7944
>   * "/state" performance improvements, MESOS-8345
>
> I have been working closely with Benno, paired up on, and shepherded some
> of his work. Benno has very strong technical knowledge in several areas and
> he is willing to share it with others and help his peers.
>
> Benno, thanks for all your contributions so far and looking forward to
> continuing to work with you on the project!
>
> Alex.
>


Re: [VOTE] Release Apache Mesos 1.4.3 (rc1)

2019-01-29 Thread Vinod Kone
+1

Tested in ASF CI. Red builds are known flakes.

*Revision*: fcfe1904e45726ca96fc6707d8b227a16664f4f8

   - refs/tags/1.4.3-rc1

Configuration Matrix (gcc / clang)

centos:7
  --verbose --disable-libtool-wrappers --disable-parallel-test-execution
  --enable-libevent --enable-ssl
    autotools: gcc Failed,  clang Not run
    cmake:     gcc Success, clang Not run
  --verbose --disable-libtool-wrappers --disable-parallel-test-execution
    autotools: gcc Success, clang Not run
    cmake:     gcc Success, clang Not run

ubuntu:16.04
  --verbose --disable-libtool-wrappers --disable-parallel-test-execution
  --enable-libevent --enable-ssl
    autotools: gcc Failed,  clang Success
    cmake:     gcc Success, clang Success
  --verbose --disable-libtool-wrappers --disable-parallel-test-execution
    autotools: gcc Failed,  clang Failed
    cmake:     gcc Success, clang Success



On Mon, Jan 28, 2019 at 2:48 AM Alex Rukletsov  wrote:

> This will be the last official 1.4.x release. Even though we agreed to
> keep the branch and occasionally back port fixes to it post last release,
> maybe it makes sense to 

Re: Shut down modules@mesos mailing list ?

2019-01-25 Thread Vinod Kone
SGTM 

Thanks,
Vinod

> On Jan 25, 2019, at 5:31 AM, sebb  wrote:
> 
> Does anyone else agree?
> Does anyone disagree?
> 
>> On Mon, 21 Jan 2019 at 19:20, Till Toenshoff  
>> wrote:
>> 
>> Agreed.
>> 
>>> On 20. Jan 2019, at 23:34, sebb  wrote:
>>> 
>>> The modules@ mailing list looks as though it is not needed and should
>>> be shut down.
>>> 
>>> It has had very few postings - none in 2018.
>>> 
>>> Agreed?
>>> 
>>> Sebb.
>> 


Re: [VOTE] Release Apache Mesos 1.7.1 (rc2)

2019-01-16 Thread Vinod Kone
+1 (binding)

Tested on ASF CI. Failing builds are due to missed SSL dep in the docker
build file and a flaky test.

*Revision*: d5678c3c5500cec72e22e775d9d048c55c128954

   - refs/tags/1.7.1-rc2

Configuration Matrix (gcc / clang)

centos:7
  --verbose --disable-libtool-wrappers --disable-parallel-test-execution
  --enable-libevent --enable-ssl
    autotools: gcc Success, clang Not run
    cmake:     gcc Success, clang Not run
  --verbose --disable-libtool-wrappers --disable-parallel-test-execution
    autotools: gcc Success, clang Not run
    cmake:     gcc Success, clang Not run

ubuntu:16.04
  --verbose --disable-libtool-wrappers --disable-parallel-test-execution
  --enable-libevent --enable-ssl
    autotools: gcc Failed,  clang Failed
    cmake:     gcc Failed,  clang Failed
  --verbose --disable-libtool-wrappers --disable-parallel-test-execution
    autotools: gcc Success, clang Failed
    cmake:     gcc Success, clang Success


On Tue, Jan 15, 2019 at 8:30 PM Chun-Hung Hsiao  wrote:

> Hi all,
>
> Please vote on releasing the following candidate as Apache Mesos 1.7.1.
>
>
> 1.7.1 includes the 

Re: [VOTE] Release Apache Mesos 1.5.2 (rc3)

2019-01-16 Thread Vinod Kone
+1  (binding)

Passed in ASF CI. Known flaky tests, but otherwise builds look good.

*Revision*: 3088295d4156eb58d092ad9b3529b85fd33bd36e

   - refs/tags/1.5.2-rc3

Configuration Matrix (gcc / clang)

centos:7
  --verbose --enable-libevent --enable-ssl
    autotools: gcc Failed,  clang Not run
    cmake:     gcc Success, clang Not run
  --verbose
    autotools: gcc Failed,  clang Not run
    cmake:     gcc Success, clang Not run

ubuntu:14.04
  --verbose --enable-libevent --enable-ssl
    autotools: gcc Failed,  clang Success
    cmake:     gcc Success, clang Success
  --verbose
    autotools: gcc Success, clang Success
    cmake:     gcc Success, clang Success



On Wed, Jan 16, 2019 at 11:04 AM Jie Yu  wrote:

> +1
>
> make dist check on macOS Mojave
>
> On Tue, Jan 15, 2019 at 12:57 AM Gilbert Song  wrote:
>
>>  Hi all,
>>
>> Please vote on releasing the following candidate as Apache Mesos 1.5.2.
>>
>> 1.5.2 includes the following:
>>
>> 
>> *Announce major bug fixes here*
>> https://jira.apache.org/jira/issues/?filter=12345443
>>
>> The CHANGELOG for the release is available at:
>>
>> https://gitbox.apache.org/repos/asf?p=mesos.git;a=blob_plain;f=CHANGELOG;hb=1.5.2-rc3
>>
>> 
>>
>> The candidate for Mesos 1.5.2 release is available at:
>> https://dist.apache.org/repos/dist/dev/mesos/1.5.2-rc3/mesos-1.5.2.tar.gz
>>
>> The tag to be voted on is 1.5.2-rc3:
>> https://gitbox.apache.org/repos/asf?p=mesos.git;a=commit;h=1.5.2-rc3
>>
>> The SHA512 checksum of the tarball can be found at:
>>
>> https://dist.apache.org/repos/dist/dev/mesos/1.5.2-rc3/mesos-1.5.2.tar.gz.sha512
>>
>> The signature of the tarball can be found at:
>>
>> 

Re: [Community WG] Reminder: Meeting today at 10:30 AM PST

2019-01-14 Thread Vinod Kone
Cloud recording for those who missed it:
https://zoom.us/recording/play/8z2oHhJZIkf0xnJZ40-NtzlNdwn9ev_FuGQnlYkbdp4AFqpHbfWXdO46Us3-MyNu?continueMode=true

On Mon, Jan 14, 2019 at 11:49 AM Vinod Kone  wrote:

> Hi folks,
>
> This is a reminder that we have community WG meeting today at 10:30 AM PST.
>
> The agenda for the meeting is here
> <https://docs.google.com/document/d/1vgi434dYkkZHs49EK4F4eMmM-3JG4f3qg-N5En-4ubg/edit#>.
> Please feel free to add more items to the agenda.
>
> See you there,
>
> Vinod
>


[DISCUSS] Updating the support and release policy

2019-01-14 Thread Vinod Kone
Hi folks,

As discussed in the Community WG meeting today, I wanted to send out a
proposal for updating the current support and release policy.

Context: According to our release policy, the latest released version and
last 2 released versions are supported at any given time. With an expected
timeline of a minor release every 3 months, that means a minor release is
typically supported for 9 months. So far, we've indicated that a release is
unsupported by deleting the corresponding release branch in our repository.
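
The arithmetic behind the 9-month figure can be sketched as follows. This is
only an illustration of the policy as described in this mail: the function
names and the version list are hypothetical, and the sketch assumes releases
arrive exactly on the 3-month cadence.

```python
def supported_releases(minor_releases, supported_count=3):
    """Return the newest `supported_count` minor releases (latest plus the previous two)."""
    ordered = sorted(minor_releases, key=lambda v: tuple(int(p) for p in v.split(".")))
    return ordered[-supported_count:]


def support_window_months(release_interval_months=3, supported_count=3):
    """A release stays supported until `supported_count` newer releases have shipped."""
    return release_interval_months * supported_count


# Illustrative version list, not an authoritative release history.
print(supported_releases(["1.4", "1.5", "1.6", "1.7"]))  # ['1.5', '1.6', '1.7']
print(support_window_months())                           # 9
```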

The new proposal is as follows:

   - Keep the unsupported release branches and not delete them. Instead, we
   would make it clear in the CHANGELOG and also on the downloads page on our
   website which releases are supported and which are not.
   - If a committer would like to backport a fix to an unsupported release
   branch, they can do so. Such a backport is not required but a committer can
   do it if they wish. Contributor and committer should have a dialog regarding
   this.
   - CI will keep running against both supported and unsupported release
   branches (as it is today) and any issues that might arise will be fixed on
   a best effort basis.
   - A committer can ask a contributor to submit a backport review in case
   the backport is complicated. Our review tooling (post-reviews and
   reviewbot) will be updated to make this possible.

Based on our experience with the current policy in the last couple of years
and the reality of how some of the organizations are using Mesos, we
believe these tweaks will make it more practical and useful.

Please let us know your thoughts by replying here or chatting in the
#community channel in our Slack.

Thanks,
Vinod (on behalf of Community WG)


[Community WG] Reminder: Meeting today at 10:30 AM PST

2019-01-14 Thread Vinod Kone
Hi folks,

This is a reminder that we have community WG meeting today at 10:30 AM PST.

The agenda for the meeting is here
<https://docs.google.com/document/d/1vgi434dYkkZHs49EK4F4eMmM-3JG4f3qg-N5En-4ubg/edit#>.
Please feel free to add more items to the agenda.

See you there,

Vinod


Re: [VOTE] Release Apache Mesos 1.7.1 (rc1)

2019-01-02 Thread Vinod Kone
Also, another error
<https://builds.apache.org/view/M-R/view/Mesos/job/Mesos-Release/57/BUILDTOOL=cmake,COMPILER=gcc,CONFIGURATION=--verbose%20--enable-libevent%20--enable-ssl,ENVIRONMENT=GLOG_v=1%20MESOS_VERBOSE=1,OS=ubuntu:14.04,label_exp=(docker%7C%7CHadoop)&&(!ubuntu-us1)&&(!ubuntu-eu2)/console>.

/mesos/build/3rdparty/grpc-1.10.0/src/grpc-1.10.0/src/core/tsi/ssl_transport_security.cc:
In function 'tsi_result ssl_handshaker_extract_peer(tsi_handshaker*,
tsi_peer*)':
/mesos/build/3rdparty/grpc-1.10.0/src/grpc-1.10.0/src/core/tsi/ssl_transport_security.cc:1011:71:
error: 'SSL_get0_alpn_selected' was not declared in this scope
   SSL_get0_alpn_selected(impl->ssl, _selected, _selected_len);
   ^
/mesos/build/3rdparty/grpc-1.10.0/src/grpc-1.10.0/src/core/tsi/ssl_transport_security.cc:
In function 'tsi_result tsi_create_ssl_client_handshaker_factory(const
tsi_ssl_pem_key_cert_pair*, const char*, const char*, const char**,
uint16_t, tsi_ssl_client_handshaker_factory**)':
/mesos/build/3rdparty/grpc-1.10.0/src/grpc-1.10.0/src/core/tsi/ssl_transport_security.cc:1417:73:
error: 'SSL_CTX_set_alpn_protos' was not declared in this scope
   static_cast(impl->alpn_protocol_list_length))) {
 ^
/mesos/build/3rdparty/grpc-1.10.0/src/grpc-1.10.0/src/core/tsi/ssl_transport_security.cc:
In function 'tsi_result
tsi_create_ssl_server_handshaker_factory_ex(const
tsi_ssl_pem_key_cert_pair*, size_t, const char*,
tsi_client_certificate_request_type, const char*, const char**,
uint16_t, tsi_ssl_server_handshaker_factory**)':
/mesos/build/3rdparty/grpc-1.10.0/src/grpc-1.10.0/src/core/tsi/ssl_transport_security.cc:1557:79:
error: 'SSL_CTX_set_alpn_select_cb' was not declared in this scope

server_handshaker_factory_alpn_callback, impl);
   ^
make[7]: *** [CMakeFiles/grpc.dir/src/core/tsi/ssl_transport_security.cc.o]
Error 1
make[7]: Leaving directory
`/mesos/build/3rdparty/grpc-1.10.0/src/grpc-1.10.0-build'
make[6]: *** [CMakeFiles/grpc.dir/all] Error 2
make[6]: Leaving directory
`/mesos/build/3rdparty/grpc-1.10.0/src/grpc-1.10.0-build'
make[5]: *** [CMakeFiles/grpc.dir/rule] Error 2
make[5]: Leaving directory
`/mesos/build/3rdparty/grpc-1.10.0/src/grpc-1.10.0-build'
make[4]: *** [grpc] Error 2
make[4]: Leaving directory
`/mesos/build/3rdparty/grpc-1.10.0/src/grpc-1.10.0-build'
make[3]: *** [3rdparty/grpc-1.10.0/src/grpc-1.10.0-stamp/grpc-1.10.0-build]
Error 2
make[3]: Leaving directory `/mesos/build'
make[2]: *** [3rdparty/CMakeFiles/grpc-1.10.0.dir/all] Error 2
make[2]: *** Waiting for unfinished jobs



On Wed, Jan 2, 2019 at 3:35 PM Vinod Kone  wrote:

> I see an issue
> <https://builds.apache.org/view/M-R/view/Mesos/job/Mesos-Release/57/BUILDTOOL=autotools,COMPILER=clang,CONFIGURATION=--verbose%20--enable-libevent%20--enable-ssl,ENVIRONMENT=GLOG_v=1%20MESOS_VERBOSE=1,OS=ubuntu:14.04,label_exp=(docker%7C%7CHadoop)&&(!ubuntu-us1)&&(!ubuntu-eu2)/console>
> with clang compiler when running it in ASF CI. Is this a known issue?
>
> ../../src/resource_provider/storage/provider.cpp:3190:5: error: conditional 
> expression is ambiguous; 'Future>' can be converted 
> to 'Future>' and vice versa
> ? createVolume(
> ^ ~
>
>
>
> On Wed, Jan 2, 2019 at 2:11 PM Benjamin Mahler  wrote:
>
>> +1 (binding)
>>
>> make check passes on macOS 10.14.2
>>
>> $ clang++ --version
>> Apple LLVM version 10.0.0 (clang-1000.10.44.4)
>> Target: x86_64-apple-darwin18.2.0
>> Thread model: posix
>> InstalledDir: /Library/Developer/CommandLineTools/usr/bin
>>
>> $ ./configure CC=clang CXX=clang++ CXXFLAGS="-Wno-deprecated-declarations"
>> --disable-python --disable-java --with-apr=/usr/local/opt/apr/libexec
>> --with-svn=/usr/local/opt/subversion && make check -j12
>> ...
>> [  PASSED  ] 1956 tests.
>>
>> On Fri, Dec 21, 2018 at 5:48 PM Chun-Hung Hsiao 
>> wrote:
>>
>> > Hi all,
>> >
>> > Please vote on releasing the following candidate as Apache Mesos 1.7.1.
>> >
>> >
>> > 1.7.1 includes the following:
>> >
>> >
>> 
>> > * This is a bug fix release. Also includes performance and API
>> >   improvements:
>> >
>> >   * **Allocator**: Improved allocation cycle time substantially
>> > (see MESOS-9239 and MESOS-9249). These reduce the allocation
>> > cycle time in some benchmarks by 80%.
>> >
>> >   * **Scheduler API**: Improved the experimental `C

Re: [VOTE] Release Apache Mesos 1.7.1 (rc1)

2019-01-02 Thread Vinod Kone
I see an issue with the clang compiler when running it in ASF CI. Is this a
known issue?

../../src/resource_provider/storage/provider.cpp:3190:5: error:
conditional expression is ambiguous; 'Future>'
can be converted to 'Future>' and vice versa
? createVolume(
^ ~



On Wed, Jan 2, 2019 at 2:11 PM Benjamin Mahler  wrote:

> +1 (binding)
>
> make check passes on macOS 10.14.2
>
> $ clang++ --version
> Apple LLVM version 10.0.0 (clang-1000.10.44.4)
> Target: x86_64-apple-darwin18.2.0
> Thread model: posix
> InstalledDir: /Library/Developer/CommandLineTools/usr/bin
>
> $ ./configure CC=clang CXX=clang++ CXXFLAGS="-Wno-deprecated-declarations"
> --disable-python --disable-java --with-apr=/usr/local/opt/apr/libexec
> --with-svn=/usr/local/opt/subversion && make check -j12
> ...
> [  PASSED  ] 1956 tests.
>
> On Fri, Dec 21, 2018 at 5:48 PM Chun-Hung Hsiao 
> wrote:
>
> > Hi all,
> >
> > Please vote on releasing the following candidate as Apache Mesos 1.7.1.
> >
> >
> > 1.7.1 includes the following:
> >
> >
> 
> > * This is a bug fix release. Also includes performance and API
> >   improvements:
> >
> >   * **Allocator**: Improved allocation cycle time substantially
> > (see MESOS-9239 and MESOS-9249). These reduce the allocation
> > cycle time in some benchmarks by 80%.
> >
> >   * **Scheduler API**: Improved the experimental `CREATE_DISK` and
> > `DESTROY_DISK` operations for CSI volume recovery (see MESOS-9275
> > and MESOS-9321). Storage local resource providers now return disk
> > resources with the `source.vendor` field set, so frameworks needs to
> > upgrade the `Resource` protobuf definitions.
> >
> >   * **Scheduler API**: Offer operation feedbacks now present their agent
> > IDs and resource provider IDs (see MESOS-9293).
> >
> >
> > The CHANGELOG for the release is available at:
> >
> >
> https://gitbox.apache.org/repos/asf?p=mesos.git;a=blob_plain;f=CHANGELOG;hb=1.7.1-rc1
> >
> >
> 
> >
> > The candidate for Mesos 1.7.1 release is available at:
> >
> https://dist.apache.org/repos/dist/dev/mesos/1.7.1-rc1/mesos-1.7.1.tar.gz
> >
> > The tag to be voted on is 1.7.1-rc1:
> > https://gitbox.apache.org/repos/asf?p=mesos.git;a=commit;h=1.7.1-rc1
> >
> > The SHA512 checksum of the tarball can be found at:
> >
> >
> https://dist.apache.org/repos/dist/dev/mesos/1.7.1-rc1/mesos-1.7.1.tar.gz.sha512
> >
> > The signature of the tarball can be found at:
> >
> >
> https://dist.apache.org/repos/dist/dev/mesos/1.7.1-rc1/mesos-1.7.1.tar.gz.asc
> >
> > The PGP key used to sign the release is here:
> > https://dist.apache.org/repos/dist/release/mesos/KEYS
> >
> > The JAR is in a staging repository here:
> >
> >
> https://repository.apache.org/content/repositories/releases/org/apache/mesos/mesos/1.7.1-rc1/
> >
> > Please vote on releasing this package as Apache Mesos 1.7.1!
> >
> > To accommodate for the holidays, the vote is open until Mon Dec 31
> > 14:00:00 PST 2018 and passes if a majority of at least 3 +1 PMC votes are
> > cast.
> >
> > [ ] +1 Release this package as Apache Mesos 1.7.1
> > [ ] -1 Do not release this package because ...
> >
> > Thanks,
> > Chun-Hung & Gaston
> >
>


Re: FW: full Zookeeper authentication

2018-12-06 Thread Vinod Kone
Dmitrii.

That approach sounds reasonable. Would you like to work on this? Are you
looking for a reviewer/shepherd?

On Thu, Dec 6, 2018 at 11:28 AM Kishchukov, Dmitrii (NIH/NLM/NCBI) [C] <
dmitrii.kishchu...@nih.gov> wrote:

> Mesos allows only the digest authentication scheme for Zookeeper, which
> is bad because Zookeeper has quite a flexible security model.
> It is easy to make your own authenticator with its own scheme name.
>
> To fully support Zookeeper authentication, Mesos has to pass two items to
> Zookeeper:
> scheme and credentials.
> Credentials can have a different format depending on the authentication
> scheme. For the digest scheme it is ‘login:password’.
>
> All Mesos should do is pass the scheme and credentials to Zookeeper.
>
> Another improvement might be to configure credentials via a file instead
> of the URI.
>
> For example, there could be two command line options:
> --zk_auth_scheme and --zk_auth_credentials
>
> They could be used like this:
> --zk_auth_scheme=some_custom_scheme --zk_auth_credentials=filename
>
> --zk_auth_credentials can just take the entire contents of the file as the
> credentials string.
>
> The Authentication class in Mesos already contains all that we need. The
> problem is what Mesos passes to the constructor.
>
>
> --
>
> Dmitrii Kishchukov.
>
>
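
The proposal above boils down to handing ZooKeeper a (scheme, credentials)
pair, with the credentials taken from a file. A minimal sketch of that idea
follows. It is not Mesos code: `load_zk_auth` is a hypothetical helper, the
--zk_auth_scheme and --zk_auth_credentials options exist only as a proposal in
this thread, and stripping a trailing newline from the file is an assumption
of the sketch.

```python
import os
import tempfile


def load_zk_auth(scheme, credentials_file):
    """Return (scheme, credentials), reading the credentials verbatim from a file.

    The credentials are kept as bytes and are not parsed, since their format
    depends on the scheme (for "digest" it is login:password).
    """
    with open(credentials_file, "rb") as f:
        credentials = f.read().rstrip(b"\r\n")  # assumption: drop a trailing newline
    return scheme, credentials


# Demo with a throwaway credentials file.
with tempfile.NamedTemporaryFile("w", suffix=".creds", delete=False) as f:
    f.write("login:password\n")
    path = f.name

try:
    scheme, credentials = load_zk_auth("digest", path)
    print(scheme, credentials)
finally:
    os.remove(path)
```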


Re: New scheduler API proposal: unsuppress and clear_filter

2018-12-03 Thread Vinod Kone
Thanks Meng for the explanation.

I imagine most frameworks do not remember what they filtered, much less
figure out how previously filtered offers can satisfy new operations.
That sounds complicated!

But I like your example. So a suggestion we could make to frameworks could
be to use CLEAR_FILTERS when they have new work, e.g., scale up/down, new
app (they might want to use this even if they aren't suppressed!); and to
use UNSUPPRESS when they are rescheduling old work?

Thoughts?
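
The rule of thumb from Meng's reply (keep the filters when the new operation
needs the same or more, clear them when some requirement is loosened) can be
sketched as a small predicate. This is an illustration only: the function
name and the resource dictionaries are hypothetical, and real frameworks
would also have to account for non-resource constraints.

```python
def should_clear_filters(previous, new):
    """Decide whether existing filters are stale for the next operation.

    `previous` and `new` map a resource name to the amount an operation needs.
    If any earlier requirement is lowered or dropped, offers that were
    declined before might fit now, so the filters should be cleared.
    """
    for name, previous_amount in previous.items():
        if new.get(name, 0) < previous_amount:
            return True
    return False


first_task = {"cpus": 4, "mem": 8192}

# Replacement for a failed task: same profile, so keep the filters.
print(should_clear_filters(first_task, {"cpus": 4, "mem": 8192}))  # False

# Different, smaller profile: previously declined offers may now fit.
print(should_clear_filters(first_task, {"cpus": 1, "mem": 512}))   # True
```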

On Mon, Dec 3, 2018 at 6:51 PM Meng Zhu  wrote:

> Hi Vinod:
>
> Yeah, `CLEAR_FILTERS` sounds good.
>
> UNSUPPRESS should be used whenever a currently suppressed framework wants to
> resume getting offers after a previous SUPPRESS call.
>
> As for `CLEAR_FILTERS`, the short (but not very useful) suggestion is to
> call it whenever the framework wants to clear all the existing filters.
>
> To elaborate, a framework declines offers and accumulates filters while it
> is trying to satisfy a particular set of requirements/constraints to perform
> an operation. Once the operation is done and the next operation comes, if
> the new operation has the same (or strictly more) resource
> requirements/constraints compared to the last one, then it is more
> efficient to KEEP the existing filters instead of getting useless offers
> and rebuilding the filters again.
>
> On the other hand, if the requirements/constraints are different (i.e. some
> of the previous requirements could be loosened), then the existing filters
> no longer make sense. Then it might be a good idea to clear all the
> existing filters to improve the chance of getting more offers.
>
> Note, although we introduce `CLEAR_FILTERS` as part of decoupling the
> `REVIVE` call, its usage should be independent of suppression/revival. The
> decision to clear the filters only depends on whether the existing filters
> make sense for the current operation constraints/requirements.
>
> Examples:
> If a framework first launches a task, then wants to launch a replacement
> task (because the first task failed), then it should keep the filters built
> up during the first launch. However, if the framework wants to launch a
> second task with a completely different resource profile, then clearing
> filters might help to get more (otherwise filtered) offers and hence speed
> up the deployment.
>
> -Meng
>
> On Mon, Dec 3, 2018 at 12:40 PM Vinod Kone  wrote:
>
> > Hi Meng,
> >
> > What would be the recommendation for framework authors on when to use
> > UNSUPPRESS vs CLEAR_FILTER?
> >
> > Also, should it be CLEAR_FILTERS instead of CLEAR_FILTER?
> >
> > On Mon, Dec 3, 2018 at 2:26 PM Meng Zhu  wrote:
> >
> >> Hi:
> >>
> >> tl;dr: We are proposing to add two new V1 scheduler APIs: unsuppress and
> >> clear_filter in order to decouple the dual-semantics of the current
> >> revive call.
> >>
> >> As pointed out in the Mesos framework scalability guide
> >> <http://mesos.apache.org/documentation/latest/app-framework-development-guide/#multi-scheduler-scalability>,
> >> utilizing the suppress
> >> <http://mesos.apache.org/documentation/latest/scheduler-http-api/#suppress>
> >> call is the key to get your cluster to a large number of frameworks
> >> <https://schd.ws/hosted_files/mesoscon18/84/Scaling%20Mesos%20to%20Thousands%20of%20Frameworks.pdf>.
> >> In short, when a framework is idling with no intention to launch any
> >> tasks, it should suppress to inform Mesos to stop sending any more
> >> offers. And the framework should revive
> >> <http://mesos.apache.org/documentation/latest/scheduler-http-api/#revive>
> >> when new work arrives. This way, the allocator will skip the framework
> >> when performing resource allocations. As a result, thorny issues such as
> >> offer starvation and resource fragmentation would be greatly mitigated.
> >>
> >> That being said, the suppress/revive calls currently are a little bit
> >> unwieldy due to MESOS-9028
> >> <https://issues.apache.org/jira/browse/MESOS-9028>:
> >>
> >> The revive call has two semantics. It unsuppresses the framework AND
> >> clears all the existing filters. The latter makes the revive call
> >> non-idempotent. And sometimes users may want to keep the existing
> >> filters when reviving, which is not possible atm.
> >>
> >> To decouple the semantics, as suggested in the ticket, we propose to add
> >> two new V1 scheduler calls:
> >>
> >> (1) `UNSUPPRESS` call requests the Mesos to 

Re: New scheduler API proposal: unsuppress and clear_filter

2018-12-03 Thread Vinod Kone
Hi Meng,

What would be the recommendation for framework authors on when to use
UNSUPPRESS vs CLEAR_FILTER?

Also, should it be CLEAR_FILTERS instead of CLEAR_FILTER?

On Mon, Dec 3, 2018 at 2:26 PM Meng Zhu  wrote:

> Hi:
>
> tl;dr: We are proposing to add two new V1 scheduler APIs: unsuppress and
> clear_filter in order to decouple the dual-semantics of the current revive
> call.
>
> As pointed out in the Mesos framework scalability guide, utilizing the
> suppress call is the key to scaling your cluster to a large number of
> frameworks. In short, when a framework is idling with no intention to
> launch any tasks, it should suppress to tell Mesos to stop sending any
> more offers. And the framework should revive
> <http://mesos.apache.org/documentation/latest/scheduler-http-api/#revive>
> when new work arrives. This way, the allocator will skip the framework when
> performing resource allocations. As a result, thorny issues such as offer
> starvation and resource fragmentation would be greatly mitigated.
>
> That being said, the suppress/revive calls currently are a little bit
> unwieldy due to MESOS-9028
> <https://issues.apache.org/jira/browse/MESOS-9028>:
>
> The revive call has two semantics. It unsuppresses the framework AND
> clears all the existing filters. The latter makes the revive call
> non-idempotent. And sometimes users may want to keep the existing filters
> when reviving, which is not possible atm.
>
> To decouple the semantics, as suggested in the ticket, we propose to add
> two new V1 scheduler calls:
>
> (1) `UNSUPPRESS` call requests Mesos to resume sending offers;
> (2) `CLEAR_FILTER` call will explicitly clear all the existing filters.
>
> To make life easier, both calls will return 200 OK (as opposed to 202
> returned by most existing scheduler calls, including `SUPPRESS` and
> `REVIVE`).
>
> We will keep the revive call and its semantics (i.e. unsuppress AND clear
> filters) for backward compatibility.
>
> Note, the changes are proposed for V1 API only. Thus, once the changes are
> landed, framework developers are encouraged to move to V1 API to take
> advantage of the new calls (among many other benefits).
>
> Any feedback/comments are welcome.
>
> -Meng
>
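
To make the proposed split concrete, here is a minimal sketch of the JSON bodies a framework might send. It assumes the usual V1 scheduler call shape (`framework_id` plus `type`); `SUPPRESS` and `REVIVE` exist today, while `UNSUPPRESS` and `CLEAR_FILTER` are only proposed in this thread and are treated as hypothetical. The helper names `build_call` and `calls_on_new_work` are made up for illustration.

```python
import json

# Existing V1 scheduler call types.
EXISTING_CALLS = {"SUPPRESS", "REVIVE"}
# Call types proposed in this thread; hypothetical until they land.
PROPOSED_CALLS = {"UNSUPPRESS", "CLEAR_FILTER"}


def build_call(framework_id: str, call_type: str) -> str:
    """Return the JSON body for a V1 scheduler call of the given type."""
    if call_type not in EXISTING_CALLS | PROPOSED_CALLS:
        raise ValueError(f"unknown call type: {call_type}")
    body = {"framework_id": {"value": framework_id}, "type": call_type}
    return json.dumps(body, sort_keys=True)


def calls_on_new_work(framework_id: str, keep_filters: bool) -> list:
    """Pick the calls a framework would send when new work arrives."""
    if keep_filters:
        # Resume offers but leave existing filters in place (proposed call).
        return [build_call(framework_id, "UNSUPPRESS")]
    # Current behavior: REVIVE unsuppresses AND clears all filters.
    return [build_call(framework_id, "REVIVE")]
```

With this split, a framework that wants to keep its filters would send only `UNSUPPRESS`, and one that wants the old behavior keeps sending `REVIVE`.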


Re: Propose to create a Kubernetes framework for Mesos

2018-11-28 Thread Vinod Kone
Cameron and Michal: I would love to understand your motivations and use
cases for a k8s Mesos framework in a bit more detail. Looks like you are
willing to rewrite your existing app definitions into k8s API spec. At this
point, why are you still interested in Mesos as a CAAS backend? Is it
because of scalability / reliability? Or is it because you still want to
run non-k8s workloads/frameworks in this world? What are these workloads?

In general, I'm in favor of Mesos shipping with a default scheduler.
I think it might help with adoption, similar to what happened with the
command/default executor. In hindsight, we should've done this a long time
ago. But, oh well, we were too optimistic that a single "default" scheduler
would rule the ecosystem, which didn't quite pan out.

However, I'm not sure if re-implementing the k8s scheduler as a Mesos
framework is the right approach. I imagine the k8s scheduler is a significant
piece of code which we would need to re-implement, and on top of that, as new
API objects are added to the k8s API, we would need to keep pace with the k8s
scheduler for parity. The approach we (in the community) took with Spark (and
Jenkins to some extent) was to let the scheduling innovation happen in the
Spark community: we just let Spark launch Spark executors via Mesos and let
Spark launch its tasks out of band of Mesos. We used to have a version of the
Spark framework (fine grained mode?) where Spark tasks were launched via Mesos
offers, but that was deprecated, partly because of maintainability. Will this
k8s framework have a similar problem? Sounds like one of the problems with the
existing k8s framework implementations is the pre-launching of kubelets; can
we use the k8s autoscaler to solve that problem?

Also, I think (I might be wrong) most k8s users are not directly creating
pods via the API but rather using higher level abstractions like replica
sets, stateful sets, daemon sets etc. How will that fit into this
architecture? Will the framework need to re-implement those controllers as
well?

Is there an integration point in k8s ecosystem where we can reuse the
existing k8s schedulers and controllers but run the pods with mesos
container runtime?

All in all, I'm +1 to explore the ideas in a WG.


On Wed, Nov 28, 2018 at 2:05 PM Paulo Pires  wrote:

> Hello all,
>
> As a Kubernetes fan, I am excited about this proposal.
> However, I would challenge this community to think more abstractly about
> the problem you want to address and any solution requirements before
> discussing implementation details, such as adopting VK.
>
> Don't get me wrong, VK is a great concept: a Kubernetes node that
> delegates container management to someone else.
> But allow me to clarify a few things about it:
>
> - VK simply provides a very limited subset of the kubelet functionality,
> namely the Kubernetes node registration and the observation of Pods that
> have been assigned to it. It doesn't do pod (intra or inter) networking nor
> delegates to CNI, doesn't do volume mounting, and so on.
> - Like the kubelet, VK doesn't implement scheduling. It also doesn't
> understand anything else than a Pod and its dependencies (e.g. ConfigMap or
> Secret), meaning other primitives, such as DaemonSet, Deployment,
> StatefulSet, or extensions, such as CRDs are unknown to the VK.
> - While the kubelet manages containers through CRI API (Container Runtime
> Interface), the VK does it through its own Provider API.
> - kubelet translates from Kubernetes primitives to CRI primitives, so CRI
> implementations only need to understand CRI. However, the VK does no
> translation and passes Kubernetes primitives directly to a provider,
> requiring the VK provider to understand Kubernetes primitives.
> - kubelet talks to CRI implementations through a gRPC socket. VK talks to
> providers in-process and is highly-opinionated about the fact a provider
> has no lifecycle (there's no _start_ or _stop_, as there would be for a
> framework). There are talks about having the Provider API over gRPC, but it's not
> trivial to decide[2].
>
> Now, if you are still thinking about implementation details, and having
> some experience trying to create a VK provider for Mesos[1], I can tell you
> the VK, as is today, is not a seamless fit.
> That said, I am willing to help you figure out the design and pick the
> right pieces to execute, if this is indeed something you want to do.
>
> 1 -
> https://github.com/pires/virtual-kubelet/tree/mesos_integration/providers/mesos
> 2 - https://github.com/virtual-kubelet/virtual-kubelet/issues/160
>
> Cheers,
> Pires
>
> On Wed, Nov 28, 2018 at 5:38 AM Jie Yu  wrote:
>
>> + user list as well to hear more feedback from Mesos users.
>>
>> I am +1 on this proposal to create a Mesos framework that exposes k8s
>> API, and provide nodeless experience to users.
>>
>> Creating Mesos framework that provides k8s API is not a new idea. For
>> instance, the 

Re: Welcome Meng Zhu as PMC member and committer!

2018-10-31 Thread Vinod Kone
Congrats Meng!

Thanks,
Vinod

> On Oct 31, 2018, at 4:26 PM, Gilbert Song  wrote:
> 
> Well deserved, Meng!
> 
>> On Wed, Oct 31, 2018 at 2:36 PM Benjamin Mahler  wrote:
>> Please join me in welcoming Meng Zhu as a PMC member and committer!
>> 
>> Meng has been active in the project for almost a year and has been very 
>> productive and collaborative. He is now one of the few people who understand 
>> the allocator code well, as well as the roadmap for this area of the 
>> project. He has also found and fixed bugs, and helped users in slack.
>> 
>> Thanks for all your work so far Meng, I'm looking forward to more of your 
>> contributions in the project.
>> 
>> Ben


Re: Propose to run debug container as the same user of its parent container by default

2018-10-25 Thread Vinod Kone
Sounds good to me.

If I understand correctly, you want to treat this as a bug and backport the
fix to previous release branches? So, you are also asking whether backporting
this fix will be considered a breaking change for any existing users?

On Thu, Oct 25, 2018 at 11:46 AM James Peach  wrote:

>
>
> On Oct 23, 2018, at 7:47 PM, Qian Zhang  wrote:
>
> Hi all,
>
> Currently when launching a debug container (e.g., via `dcos task exec` or
> command health check) to debug a task, by default Mesos agent will use the
> executor's user as the debug container's user. There are actually 2 cases:
> 1. Command task: Since the command executor's user is the same as the command
> task's user, the debug container will be launched as the same user as
> the command task.
> 2. The task in a task group: The default executor's user is the same as the
> framework user, so in this case the debug container will be launched as the
> same user as the framework rather than the task.
>
> Basically I think the behavior of case 1 is correct. For case 2, we may
> run into a situation where the task is run as one user (e.g., root), but the
> debug container used to debug that task is run as another user (e.g., a
> normal user, supposing the framework is run as a normal user). This may not
> be what the user expects.
>
> So I created MESOS-9332 and
> propose to run debug container as the same user of its parent container
> (i.e., the task to be debugged) by default. Please let me know if you have
> any comments, thanks!
>
>
> This sounds like a sensible default to me. I can imagine for debug use
> cases you might want to run the debug container as root or give it elevated
> capabilities, but that should not be the default.
>
> J
>
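
The proposed default can be sketched as a small selection function. This is only an illustration of the ordering discussed above, not the agent's actual code: the function name, its parameters, the explicit override, and the fallback to the executor's user when the parent has no user are all assumptions.

```python
from typing import Optional


def debug_container_user(
    parent_user: Optional[str],
    executor_user: Optional[str],
    requested_user: Optional[str] = None,
) -> Optional[str]:
    """Choose the user for a debug container.

    Order: an explicitly requested user, then the parent container's user
    (the proposed default), then the executor's user (the current default).
    """
    if requested_user is not None:
        return requested_user
    if parent_user is not None:
        return parent_user
    return executor_user
```

For case 2 above, a task running as root under a framework running as a normal user would get a root debug container by default, matching the task being debugged.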


Re: [VOTE] Release Apache Mesos 1.5.2 (rc1)

2018-10-24 Thread Vinod Kone
-1

Tested on ASF CI. Looks like Clang builds are failing with a build error.
See example build output below:

libtool: compile:  clang++-3.5 -DPACKAGE_NAME=\"mesos\"
-DPACKAGE_TARNAME=\"mesos\" -DPACKAGE_VERSION=\"1.5.2\"
"-DPACKAGE_STRING=\"mesos 1.5.2\"" -DPACKAGE_BUGREPORT=\"\"
-DPACKAGE_URL=\"\" -DPACKAGE=\"mesos\" -DVERSION=\"1.5.2\"
-DSTDC_HEADERS=1 -DHAVE_SYS_TYPES_H=1 -DHAVE_SYS_STAT_H=1
-DHAVE_STDLIB_H=1 -DHAVE_STRING_H=1 -DHAVE_MEMORY_H=1
-DHAVE_STRINGS_H=1 -DHAVE_INTTYPES_H=1 -DHAVE_STDINT_H=1
-DHAVE_UNISTD_H=1 -DHAVE_DLFCN_H=1 -DLT_OBJDIR=\".libs/\"
-DHAVE_CXX11=1 -DHAVE_PTHREAD_PRIO_INHERIT=1 -DHAVE_PTHREAD=1
-DHAVE_FTS_H=1 -DHAVE_APR_POOLS_H=1 -DHAVE_LIBAPR_1=1 -DHAVE_LIBCURL=1
-DMESOS_HAS_JAVA=1 -DHAVE_EVENT2_EVENT_H=1 -DHAVE_LIBEVENT=1
-DHAVE_EVENT2_THREAD_H=1 -DHAVE_LIBEVENT_PTHREADS=1 -DHAVE_LIBSASL2=1
-DHAVE_OPENSSL_SSL_H=1 -DHAVE_EVENT2_BUFFEREVENT_SSL_H=1
-DHAVE_LIBEVENT_OPENSSL=1 -DUSE_SSL_SOCKET=1 -DHAVE_SVN_VERSION_H=1
-DHAVE_LIBSVN_SUBR_1=1 -DHAVE_SVN_DELTA_H=1 -DHAVE_LIBSVN_DELTA_1=1
-DHAVE_ZLIB_H=1 -DHAVE_LIBZ=1 -DHAVE_PYTHON=\"2.7\"
-DMESOS_HAS_PYTHON=1 -I. -I../../src -Werror
-DLIBDIR=\"/mesos/mesos-1.5.2/_inst/lib\"
-DPKGLIBEXECDIR=\"/mesos/mesos-1.5.2/_inst/libexec/mesos\"
-DPKGDATADIR=\"/mesos/mesos-1.5.2/_inst/share/mesos\"
-DPKGMODULEDIR=\"/mesos/mesos-1.5.2/_inst/lib/mesos/modules\"
-I../../include -I../include -I../include/mesos -DPICOJSON_USE_INT64
-D__STDC_FORMAT_MACROS -isystem ../3rdparty/boost-1.53.0 -isystem
../3rdparty/concurrentqueue-7b69a8f -I../3rdparty/elfio-3.2
-I../3rdparty/glog-0.3.3/src -I../3rdparty/leveldb-1.19/include
-I../../3rdparty/libprocess/include -I../3rdparty/nvml-352.79
-I../3rdparty/picojson-1.3.0 -I../3rdparty/protobuf-3.5.0/src
-I../../3rdparty/stout/include
-I../3rdparty/zookeeper-3.4.8/src/c/include
-I../3rdparty/zookeeper-3.4.8/src/c/generated -isystem
/usr/include/subversion-1 -isystem /usr/include/apr-1 -isystem
/usr/include/apr-1.0 -pthread -Wall -Wsign-compare -Wformat-security
-fstack-protector-strong -fPIC -g1 -O0 -std=c++11 -MT
slave/containerizer/libmesos_no_3rdparty_la-containerizer.lo -MD -MP
-MF slave/containerizer/.deps/libmesos_no_3rdparty_la-containerizer.Tpo
-c ../../src/slave/containerizer/containerizer.cpp  -fPIC -DPIC -o
slave/containerizer/.libs/libmesos_no_3rdparty_la-containerizer.o
In file included from ../../src/slave/http.cpp:30:
In file included from ../../include/mesos/authorizer/authorizer.hpp:25:
../../3rdparty/libprocess/include/process/future.hpp:1089:3: error: no
matching member function for call to 'set'
  set(u);
  ^~~
../../src/slave/http.cpp:3196:10: note: in instantiation of function
template specialization
'process::Future::Future > >' requested here
  return slave->containerizer->attach(containerId)
 ^
../../3rdparty/libprocess/include/process/future.hpp:597:8: note:
candidate function not viable: no known conversion from 'const
process::Future >' to
'const process::http::Response' for 1st argument
  bool set(const T& _t);
   ^
../../3rdparty/libprocess/include/process/future.hpp:598:8: note:
candidate function not viable: no known conversion from 'const
process::Future >' to
'process::http::Response' for 1st argument
  bool set(T&& _t);
   ^







On Mon, Oct 22, 2018 at 12:53 AM Gilbert Song  wrote:

> Hi all,
>
> Please vote on releasing the following candidate as Apache Mesos 1.5.2.
>
> 1.5.2 includes the following:
>
> 
>   * [MESOS-3790] - ZooKeeper connection should retry on `EAI_NONAME`.
>   * [MESOS-8128] - Make os::pipe file descriptors O_CLOEXEC.
>   * [MESOS-8418] - mesos-agent high cpu usage because of numerous
> /proc/mounts reads.
>   * [MESOS-8545] -
> AgentAPIStreamingTest.AttachInputToNestedContainerSession is flaky.
>   * [MESOS-8568] - Command checks should always call
> `WAIT_NESTED_CONTAINER` before `REMOVE_NESTED_CONTAINER`.
>   * [MESOS-8620] - Containers stuck in FETCHING possibly due to
> unresponsive server.
>   * [MESOS-8830] - Agent gc on old slave sandboxes could empty persistent
> volume data.
>   * [MESOS-8871] - Agent may fail to recover if the agent dies before
> image store cache checkpointed.
>   * [MESOS-8904] - Master crash when removing quota.
>   * [MESOS-8906] - `UriDiskProfileAdaptor` fails to update profile
> selectors.
>   * [MESOS-8917] - Agent leaking file descriptors into forked processes.
>   * [MESOS-8921] - Autotools don't work with newer OpenJDK versions.
>   * [MESOS-8935] - Quota limit "chopping" can lead to cpu-only and
> memory-only offers.
>   * [MESOS-8936] - Implement a Random Sorter for offer allocations.
>   * [MESOS-8942] - Master streaming API does not send 

Re: Request for Comments - Health Check API Proposal

2018-10-18 Thread Vinod Kone
I understand and am in agreement that `HealthCheckStatusInfo` will have
more information than `CheckStatusInfo`.

I would like us to put a little more thought into what that would look like
to be doubly sure that what we are introducing today will be evolvable into
that envisioned future. We have to live with API changes for a long time,
so I would like to see more rigor here (e.g., has the note on top of the
`HealthCheckStatusInfo` in the doc
<https://docs.google.com/document/d/1VLdaH7i7UDT3_38aOlzTOtH7lwH-laB8dCwNzte0DkU/edit#heading=h.lessdcojxc5v>
been discussed/resolved?) to avoid costly changes/deprecations.

On Thu, Oct 18, 2018 at 4:04 AM Alex Rukletsov  wrote:

> Thanks for the thoughts, Vinod! Answers inlined.
>
> On Wed, Oct 17, 2018 at 8:55 PM Vinod Kone  wrote:
>
> > One of the things we discussed when we added `CheckInfo` and
> > `CheckStatusInfo` was to make the older `HealthCheck` and `bool healthy`
> > field (inside `TaskStatus`) consistent with the new `Check` format.
> >
> Correct.
>
> >
> > IIRC, some of the changes we wanted to do were
> >
> >- Deprecate `HealthCheck` and introduce a new `HealthCheckInfo` proto
> >
> Correct.
>
> >- The nested messages inside `HealthCheck` (e.g., `HTTPCheckInfo`)
> >should be named differently in `HealthCheckInfo` (e.g., `Http`)
> >
> Likely, yes.
>
> >- Deprecate `bool healthy` in `TaskStatus` and introduce a new
> >`HealthCheckStatusInfo` which looks similar to `CheckStatusInfo`
> >
> Correct.
>
> >
> > Right now, the proposal seems to only address the last point without
> > addressing the first two, which feels weird to me. I would prefer to see
> > them addressed in one shot.
> >
> Can you please explain why? Is there any problem you foresee if we do it
> step by step? Introducing `HealthCheckStatusInfo` now solves an important
> problem and does not seem to introduce new issues.
>
> >
> > Additionally, the proposed `HealthCheckStatusInfo` proto looks completely
> > different from `CheckStatusInfo`. Is that intentional? I hope we are not
> > thinking of deprecating it again when we come around to fix `HealthCheck`
> > proto to be consistent with `CheckInfo` ?
> >
> What do you think it should look like? Why will we deprecate it?
>
> Health checks are different from checks in the way the result of a check is
> interpreted on the agent. In other words health check is an extra step on
> top of a check. We might include `CheckStatusInfo` or its contents into
> `HealthCheckStatusInfo`, but... should we think about this now? It is nice
> to have lower level info from the check in the health status update, but it
> also means more data to transfer. But the interpretation (health) we
> definitely need.
>
> Greg, I'm +1 on your proposal.
>
> >
> > Thanks,
> >
> > On Wed, Oct 17, 2018 at 1:26 PM Greg Mann  wrote:
> >
> > > Hi all,
> > > Some users have recently reported issues with our current
> implementation
> > > of health checks. See this ticket
> > > <https://issues.apache.org/jira/browse/MESOS-6417> for an introduction
> > to
> > > the issue.
> > >
> > > To summarize: we currently use a single 'optional bool healthy' field
> > > within the 'TaskStatus' message to indicate the result of a health
> check.
> > > This allows us to expose 3 health states to users:
> > > 1) 'healthy' field is unset = no health check specified, or health
> check
> > > failed but grace period has not yet elapsed, or health check has not
> yet
> > > been attempted
> > > 2) 'healthy' field is set to 'false' = a health check is specified and
> it
> > > returned 'false'
> > > 3) 'healthy' field is set to 'true' = a health check is specified and
> it
> > > returned 'true'
> > >
> > > The issue is that some users need to distinguish between the three
> > > scenarios in #1: no health check is specified, OR the task is not yet
> > > healthy but we are in the grace period. An example use case would be a
> > load
> > > balancer which needs to wait for a healthy status to route traffic, but
> > > which immediately routes traffic to tasks which have no health check
> > > defined.
> > >
> > > This issue was recognized during the design of Mesos generalized
> checks;
> > > for those checks, we use the presence of the 'check_status' field to
> > > indicate whether or not a check is defined for the task. While
> consumers
> > > could make use of generalized checks as a workaround, this does not
> allow
>

Re: Mesos Flakiness Statistics

2018-10-17 Thread Vinod Kone
This is great. Thanks Benno for sharing!

What did you use to do the analysis? I would love it if we can have graphs
that we can run on TVs.

On Mon, Oct 15, 2018 at 5:23 AM Benno Evers  wrote:

> > Is there any reason the first portion of the test name is being
> truncated?
>
> There is, although it is slightly embarrassing: We currently only store the
> detailed data including full test case name and platform
> for about a week, for anything older than that the abridged version is the
> best I could find. The data should still be good, though,
> since we hopefully don't have two tests with the same name that are both
> frequently flaky.
>
> In particular, the ResourceStatistics refers to the
> 'MesosContainerizerSlaveRecoveryTest.ResourceStatistics' test tracked
> in MESOS-5048.
>
> On Fri, Oct 12, 2018 at 7:03 PM Benjamin Mahler 
> wrote:
>
> > Thanks for sending this Benno! I for one would love to see more regular
> > communication about the state of CI, especially so that I know how I can
> > help fix tests (right now I don't know which flaky tests are in areas I
> am
> > maintaining).
> >
> > Is there any reason the first portion of the test name is being
> truncated?
> > For example, ResourceStatistics matches several tests:
> >
> > $ grep -R ' ResourceStatistics)' src/tests
> > src/tests/containerizer/xfs_quota_tests.cpp:TEST_F(ROOT_XFS_QuotaTest,
> > ResourceStatistics)
> >
> >
> src/tests/slave_recovery_tests.cpp:TEST_F(MesosContainerizerSlaveRecoveryTest,
> > ResourceStatistics)
> > src/tests/disk_quota_tests.cpp:TEST_F(DiskQuotaTest, ResourceStatistics)
> >
> > Did we actually fix the flaky tests or did we disable them? I see only 22
> > disabled tests, which is better than I expected, but I hope there's good
> > tracking on getting these un-disabled again:
> >
> > $ grep -R DISABLED src/tests | grep -v DISABLED_ON_WINDOWS | grep -v
> > NestedQuota | grep -v ChildRole | grep -v NestedRoles | grep -v
> > environment.cpp | wc -l
> >   22
> >
> > On Fri, Oct 12, 2018 at 7:38 AM Benno Evers 
> wrote:
> >
> > > Hey all,
> > >
> > > as you might know, we've set up an internal CI system that is running
> > `make
> > > check` on a variety of different platforms and configurations, 16 in
> > total.
> > >
> > > As we've experienced more and more pain maintaining a green master,
> I've
> > > compiled some statistics about which tests are most flaky. I thought
> > other
> > > people might also be interested to have a look at that data:
> > >
> > > Last Week:
> > >
> > > # CI Statistics since 2018-10-05 14:22:35.422882 for branches
> > > containing 'asf/master'
> > > Total: 41 failing tests, 28 unique. (avg 0.14236111 failing
> tests
> > > per build)
> > >
> > > Top 5 failing tests:
> > > 6x: [empty]
> > > 4x: ResourceStatistics
> > > 2x: CreateDestroyDiskRecovery
> > > 2x: INTERNET_CURL_InvokeFetchByName
> > > 2x: RecoverNestedContainer
> > >
> > > Last Month:
> > >
> > > # CI Statistics since 2018-09-12 14:23:36.272031 for branches
> > > containing 'asf/master'
> > > Total: 320 failing tests, 75 unique. (avg 0.285714285714 failing
> > tests
> > > per build)
> > >
> > > Top 5 failing tests:
> > > 57x: Used
> > > 32x: LongLivedDefaultExecutorRestart
> > > 27x: PythonFramework
> > > 23x: ROOT_CGROUPS_LaunchNestedContainerSessionsInParallel
> > > 22x: ResourceStatistics
> > >
> > > Last year:
> > >
> > > # CI Statistics since 2017-10-12 14:24:31.639792 for branches
> > > containing 'asf/master'
> > > Total: 3045 failing tests, 225 unique. (avg 0.184054642166 failing
> > > tests per build)
> > >
> > > Top 5 failing tests:
> > > 292x: [empty]
> > > 272x:
> > ROOT_LOGROTATE_UNPRIVILEGED_USER_RotateWithSwitchUserTrueOrFalse
> > > 136x: LOGROTATE_RotateInSandbox
> > > 136x: LOGROTATE_CustomRotateOptions
> > > 131x: ResourceStatistics
> > >
> > >
> > > I don't really have a point with all of this, but some observations:
> > >  - [empty] means that the `mesos-tests` binary crashed
> > >  - The data also includes "real", i.e. non-flaky test failures, but
> they
> > > should not appear in the top 5 lists because we would hopefully either
> > > revert or fix them before they can accumulate dozens of failures
> > >  - Over the whole year, we seem to be pretty good at fixing the
> nastiest
> > > flakes, with only one of the top 5 still appearing in this week's test
> > > results
> > >  - Sadly, the fail percentage isn't as different between now and then
> as
> > we
> > > might have hoped.
> > >
> > > Hope this was interesting, and best regards,
> > > --
> > > Benno Evers
> > > Software Engineer, Mesosphere
> > >
> >
>
>
> --
> Benno Evers
> Software Engineer, Mesosphere
>
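
For anyone who wants to reproduce this kind of summary from their own CI data, here is a minimal sketch. It is not the tool used for the numbers above; the function name and the input shape (one test name per recorded failure, with an empty name standing for a crash of the test binary) are assumptions.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def summarize_failures(
    failures: Iterable[str], num_builds: int, top_n: int = 5
) -> Tuple[int, int, float, List[Tuple[str, int]]]:
    """Return (total, unique, average per build, top-N) for failing tests.

    An empty test name stands for a crash of the test binary and is
    reported as "[empty]".
    """
    counts = Counter(name if name else "[empty]" for name in failures)
    total = sum(counts.values())
    average = total / num_builds if num_builds else 0.0
    return total, len(counts), average, counts.most_common(top_n)
```

As a sanity check on the weekly numbers, 41 failures at an average of 0.14236111 per build is consistent with 288 builds (41 / 288), although the build count itself is not stated in the report.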


Re: Request for Comments - Health Check API Proposal

2018-10-17 Thread Vinod Kone
One of the things we discussed when we added `CheckInfo` and
`CheckStatusInfo` was to make the older `HealthCheck` and `bool healthy`
field (inside `TaskStatus`) consistent with the new `Check` format.

IIRC, some of the changes we wanted to do were

   - Deprecate `HealthCheck` and introduce a new `HealthCheckInfo` proto
   - The nested messages inside `HealthCheck` (e.g., `HTTPCheckInfo`)
   should be named differently in `HealthCheckInfo` (e.g., `Http`)
   - Deprecate `bool healthy` in `TaskStatus` and introduce a new
   `HealthCheckStatusInfo` which looks similar to `CheckStatusInfo`

Right now, the proposal seems to only address the last point without
addressing the first two, which feels weird to me. I would prefer to see
them addressed in one shot.

Additionally, the proposed `HealthCheckStatusInfo` proto looks completely
different from `CheckStatusInfo`. Is that intentional? I hope we are not
thinking of deprecating it again when we come around to fix `HealthCheck`
proto to be consistent with `CheckInfo` ?

Thanks,

On Wed, Oct 17, 2018 at 1:26 PM Greg Mann  wrote:

> Hi all,
> Some users have recently reported issues with our current implementation
> of health checks. See this ticket
> <https://issues.apache.org/jira/browse/MESOS-6417> for an introduction to
> the issue.
>
> To summarize: we currently use a single 'optional bool healthy' field
> within the 'TaskStatus' message to indicate the result of a health check.
> This allows us to expose 3 health states to users:
> 1) 'healthy' field is unset = no health check specified, or health check
> failed but grace period has not yet elapsed, or health check has not yet
> been attempted
> 2) 'healthy' field is set to 'false' = a health check is specified and it
> returned 'false'
> 3) 'healthy' field is set to 'true' = a health check is specified and it
> returned 'true'
>
> The issue is that some users need to distinguish between the scenarios
> grouped under #1, e.g., no health check is specified versus the task is not
> yet healthy but still in the grace period. An example use case would be a load
> balancer which needs to wait for a healthy status to route traffic, but
> which immediately routes traffic to tasks which have no health check
> defined.
>
> This issue was recognized during the design of Mesos generalized checks;
> for those checks, we use the presence of the 'check_status' field to
> indicate whether or not a check is defined for the task. While consumers
> could make use of generalized checks as a workaround, this does not allow
> them to both detect the presence of a check AND achieve the task-killing
> behavior that health checks provide.
>
> In order to address this, I would like to propose the following new
> message, and an addition to the 'TaskStatus' message:
>
> message HealthCheckStatusInfo {
>   enum Status {
>     UNKNOWN = 0;
>     HEALTHY = 1;
>     UNHEALTHY = 2;
>   }
>
>   required Status status = 1;
> }
>
> message TaskStatus {
>   . . .
>
>   optional HealthCheckStatusInfo health_check_status = 17;
>
>   . . .
> }
>
> The semantics of these fields would be as follows:
>
> 'health_check_status' field:
> - If set, a health check has been set
> - If unset, a health check has not been set
>
> 'health_check_status.status' field:
> - UNKNOWN: The task has not become healthy but is still within its grace
> period (this state is also used if an internal error prevents us from
> running the health check successfully)
> - HEALTHY: The health check indicates the task is healthy
> - UNHEALTHY: The health check indicates the task is not healthy
>
> This change would also involve deprecating the existing 'healthy' field.
> In accordance with our deprecation policy, I believe we could not remove
> the deprecated field until we have a new major release (2.x).
>
> I'd love to hear feedback on this proposal, thanks in advance! I'll also
> add this as an agenda item to our upcoming API working group meeting on
> Tuesday, Oct. 16 at 11am PST.
>
> Cheers,
> Greg
>
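
The load balancer use case from the proposal can be sketched as follows. This only models the proposed semantics in plain Python rather than using the generated protobuf classes; the `Status` enum mirrors the proposed message, `None` stands in for an unset field, and `should_route_traffic` is a made-up helper name.

```python
from enum import Enum
from typing import Optional


class Status(Enum):
    """Mirror of the proposed HealthCheckStatusInfo.Status enum."""
    UNKNOWN = 0
    HEALTHY = 1
    UNHEALTHY = 2


def should_route_traffic(health_check_status: Optional[Status]) -> bool:
    """Decide whether a load balancer should route traffic to a task.

    None models an unset field, meaning no health check is defined
    for the task.
    """
    if health_check_status is None:
        return True  # No health check defined: route immediately.
    # UNKNOWN (grace period or internal error) and UNHEALTHY both wait.
    return health_check_status is Status.HEALTHY
```

The point of the new message is visible in the first branch: with the old `optional bool healthy` field, "no health check" and "still in the grace period" were indistinguishable.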


Re: [dcos] Vote now for MesosCon 2018 proposals!

2018-09-20 Thread Vinod Kone
Voted!

I see some really good proposals in there. Really looking forward to the
final program!

On Thu, Sep 20, 2018 at 11:51 AM Jörg Schad  wrote:

> Dear Mesos Community,
>
> Please take a few minutes over the next few days and review what members
> of the community have submitted for MesosCon 2018 (which will be held in
> San Francisco between November 5th-7th)!
> To make voting easier, we structured the voting following the different
> tracks.
> Please visit the following links and submit your responses. Look through
> as few or as many talks as you'd like to, and give us your feedback on
> these talks.
>
> Core Track: https://www.surveymonkey.com/r/mesoscon18-core
> Ecosystem Track: https://www.surveymonkey.com/r/mesoscon18-ecosystem
> DC/OS Track: https://www.surveymonkey.com/r/mesoscon18-dcos
> Frameworks Track: https://www.surveymonkey.com/r/mesoscon18-frameworks
> Operations Track: https://www.surveymonkey.com/r/mesoscon18-operations
> Misc Track: https://www.surveymonkey.com/r/mesoscon18-misc
> User Track: https://www.surveymonkey.com/r/mesoscon18-users
>
> Please submit your votes by Wednesday, Sept 26th 11:59 PM PDT, so you
> have one week to vote and make your voice heard!
>
> Thank you for your help and looking forward to a great MesosCon!
> Your MesosCon PC
>


Re: [VOTE] Release Apache Mesos 1.7.0 (rc2)

2018-08-29 Thread Vinod Kone
I prefer 1) since you already have the fix. 

Thanks,
Vinod

> On Aug 29, 2018, at 8:44 PM, Chun-Hung Hsiao  wrote:
> 
> I found two issues when compiling with clang 3.5:
> 
> 1. The `-Wno-inconsistent-missing-override` option added in 
> https://reviews.apache.org/r/67953/
> is not recognized by clang 3.5.
> 2. The same issue described in https://reviews.apache.org/r/55400/ would make
> `src/resource_provider/storage/provider.cpp` fail to compile.
> 
> I put up two patches to resolve the above issues (no review posted yet):
> https://github.com/chhsia0/mesos/commit/1f60aa3b3a7eede4a2a5ddf1288efff6a801ea97
> https://github.com/chhsia0/mesos/commit/84d13a0468f34726e4a920915cdda7e0e0a829b8
> 
> However, I'm not sure if this is worth blocking a release. We have 2 options:
> 1. Fail this vote and cut rc3 with the above patches to support clang 3.5.
> 2. Keep rc2 but bump the version requirement for clang on the website. (If 
> so, then the above patches are not needed.)
> 
> I was wondering which option would be more appropriate, so I'd like to ask for 
> some feedback. Thanks!
> 
>> On Wed, Aug 29, 2018 at 10:18 AM James Peach  wrote:
>> +1 (binding)
>> 
>> Built and tested on Fedora 28 (clang).
>> 
>>> On Aug 24, 2018, at 4:42 PM, Chun-Hung Hsiao  wrote:
>>> 
>>> Hi all,
>>> 
>>> Please vote on releasing the following candidate as Apache Mesos 1.7.0.
>>> 
>>> 
>>> 1.7.0 includes the following:
>>> 
>>> * Performance Improvements:
>>>   * Master `/state` endpoint: ~130% throughput improvement through RapidJSON
>>>   * Allocator: Improved allocator cycle significantly
>>>   * Agent `/containers` endpoint: Fixed a performance issue
>>>   * Agent container launch / destroy throughput is significantly improved
>>> * Containerization:
>>>   * **Experimental** Supported docker image tarball fetching from HDFS
>>>   * Added new `cgroups/all` and `linux/devices` isolators
>>>   * Added metrics for `network/cni` isolator and docker pull latency
>>> * Windows:
>>>   * Added support to libprocess for the Windows Thread Pool API
>>> * Multi-Framework Workloads:
>>>   * **Experimental** Added per-framework metrics to the master
>>>   * A new weighted random sorter was added as an alternative to the DRF 
>>> sorter
>>> 
>>> The CHANGELOG for the release is available at:
>>> https://gitbox.apache.org/repos/asf?p=mesos.git;a=blob_plain;f=CHANGELOG;hb=1.7.0-rc2
>>> 
>>> 
>>> The candidate for Mesos 1.7.0 release is available at:
>>> https://dist.apache.org/repos/dist/dev/mesos/1.7.0-rc2/mesos-1.7.0.tar.gz
>>> 
>>> The tag to be voted on is 1.7.0-rc2:
>>> https://gitbox.apache.org/repos/asf?p=mesos.git;a=commit;h=1.7.0-rc2
>>> 
>>> The SHA512 checksum of the tarball can be found at:
>>> https://dist.apache.org/repos/dist/dev/mesos/1.7.0-rc2/mesos-1.7.0.tar.gz.sha512
>>> 
>>> The signature of the tarball can be found at:
>>> https://dist.apache.org/repos/dist/dev/mesos/1.7.0-rc2/mesos-1.7.0.tar.gz.asc
>>> 
>>> The PGP key used to sign the release is here:
>>> https://dist.apache.org/repos/dist/release/mesos/KEYS
>>> 
>>> The JAR is in a staging repository here:
>>> https://repository.apache.org/content/repositories/orgapachemesos-1233
>>> 
>>> Please vote on releasing this package as Apache Mesos 1.7.0!
>>> 
>>> The vote is open until Mon Aug 27 16:37:35 PDT 2018 and passes if a 
>>> majority of at least 3 +1 PMC votes are cast.
>>> 
>>> [ ] +1 Release this package as Apache Mesos 1.7.0
>>> [ ] -1 Do not release this package because ...
>>> 
>>> Thanks,
>>> Chun-Hung & Gaston
>> 
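
For anyone checking a release candidate like the one above, the tarball can be compared against its published `.sha512` file before voting. The following is a minimal Python sketch, not part of the release tooling: the function names and local file paths are hypothetical, and it assumes the checksum file begins with the hex digest. It does not check the PGP signature, which would still need to be verified separately against the KEYS file.

```python
import hashlib
from pathlib import Path


def sha512_of(path, chunk_size=1 << 20):
    """Return the hex SHA-512 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha512()
    with open(path, "rb") as f:
        # Read fixed-size chunks so large tarballs are not loaded into memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def checksum_matches(tarball, checksum_file):
    """Compare the tarball's digest with the first token of the checksum file.

    Assumption: the checksum file starts with the hex digest, optionally
    followed by a file name. Other layouts would need different parsing.
    """
    expected = Path(checksum_file).read_text().split()[0].lower()
    return sha512_of(tarball) == expected
```

With both files downloaded locally, `checksum_matches("mesos-1.7.0.tar.gz", "mesos-1.7.0.tar.gz.sha512")` would return `True` only if the digests agree.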


Re: MesosCon 2018 Location Change

2018-08-26 Thread Vinod Kone
+1 for Bay area 

Thanks,
Vinod

> On Aug 26, 2018, at 12:02 AM, Vaibhav Khanduja  
> wrote:
> 
> +1 for bay area.
> 
> Thx
> 
>> On Sat, Aug 25, 2018, 3:20 PM Jörg Schad  wrote:
>> 
>> Just one more comment on the reasoning here:
>> We (i.e., the PC) want MesosCon to be a user-driven conference and hence
>> have the conference at a location where we can gather most users.
>> We understand it might be more difficult to travel to the Bay Area from
>> Europe, but we are already considering EU-timezone-friendly working group
>> meetings which could be joined remotely. Stay tuned here.
>> We understand this is a beyond-last-minute change, but we are considering it
>> as a result of community (i.e., everyone here) feedback.
>> 
>> Please also consider that this is the first time we are organizing MesosCon
>> as a community ourselves (in previous years it was organized by the Linux
>> Foundation), and so far I must say kudos to everyone involved. It is great
>> to see everyone working on making it a great Mesos (+Marathon, + Paasta, +
>> ...) community conference!
>> 
>> Also feel free to reach out personally if you have questions!
>> 
>> 
>> 
>>> On Fri, Aug 24, 2018 at 2:23 PM, Sunil Shah  wrote:
>>> 
>>> Hey everyone,
>>> 
>>> As we continue to organise this year's MesosCon, I wanted to ask for your
>>> preferences on location of the conference. Several community members have
>>> expressed a desire to have the conference in the Bay Area (as opposed to
>>> New York, as currently planned).
>>> 
>>> As a reminder, this year's MesosCon is a community run conference and is
>>> planned for November 5th to 7th.
>>> 
>>> Please let me know if you have any strong feelings one way or another and
>>> I'll take a summary back to the MesosCon Committee.
>>> 
>>> Cheers,
>>> 
>>> Sunil
>>> (P.S., If you haven't submitted a talk already, please do!)
>>> 
>>> 
>>> 
>> 


Re: This Month in Mesos: August 2018

2018-08-15 Thread Vinod Kone
This is great. Thanks for the update Greg!

On Wed, Aug 15, 2018 at 5:29 PM Greg Mann  wrote:

> Hi all,
> My apologies for the lack of emails during the last few months - I'm going
> to try to get back into the routine! Here's your August update on recent
> developments in the Mesos community, organized by working group:
>
> Containerization
> This has continued to be an area of active development, with the following
> features recently merged:
>
>- Automatic image garbage collection for Mesos containerizer
>- HDFS fetching of Docker images in Mesos containerizer
>- Auto cgroup support
>- Container cgroup FS mounts
>- Many bug fixes!
>
> Find more info in the agenda/notes document.
>
>
> Performance
> Performance improvements have landed in a variety of components within the
> codebase including metrics, containerization, and resource allocation:
>
>- Faster generation of metrics snapshots
>- Benchmark testing of containerizer performance
>- Quota-related performance improvements in the allocator
>- Parallel processing of master state requests
>
> More information is in the agenda/notes document.
>
>
> Community
> The biggest news on the community front is the progress on organizing the
> next MesosCon! MesosCon 2018 will be held in New York City from Nov. 5-7.
> Talk proposals are being accepted until Aug. 27th, submit yours at
> https://mesoscon2018.org/ !
>
> We also recently moved the Mesos repository to gitbox, which allows us to
> integrate better with GitHub and will hopefully enable some improvements to
> our committers' tooling in the near future.
>
> More information in the agenda/notes document.
>
>
> API
> Just a couple items to report here:
>
>- Persistent volumes can now be resized with the GROW_VOLUME and
>SHRINK_VOLUME operations
>- Per-framework metrics have been added which provide useful stats for
>every framework that registers with the master
>
> More information in the agenda/notes document.
>
>
> Operations
> Many thanks to Gastón Kleiman for spearheading this new working group! The
> first meeting was held recently, with the next one coming up on Aug. 28 at
> 9am PST.
>
> One notable change which came out of the first meeting is the movement of
> the 'mesos_exporter' metrics processing tool into the Mesos GitHub org; it
> can now be found at https://github.com/mesos/mesos_exporter.
>
> More information in the agenda/notes document.
>
>
> Mesos 1.7.0
> Chun-Hung and Gastón are managing the 1.7.0 release, which is just around
> the corner! They're planning to cut the first release candidate on Monday,
> Aug. 20th. Keep your eyes peeled for their email, and please help test and
> vote!
>
>
> That's it for this month, thanks for all the hard work everyone! See you
> at the next working group meetings :)
>
> Cheers,
> Greg
>


Re: [VOTE] Release Apache Mesos 1.4.2 (rc1)

2018-08-14 Thread Vinod Kone
I see some flaky tests in ASF CI that I don't see already reported.

@Kapil Arya, can you take a look at
https://builds.apache.org/view/M-R/view/Mesos/job/Mesos-Release/53 and see
if the flaky tests are due to bugs in the test code rather than the source?

*Revision*: 612ec2c63a68b4d5b60d1d864e6703fde1c2a023

   - refs/tags/1.4.2-rc1

Configuration Matrix                                               gcc      clang
centos:7      --verbose --enable-libevent --enable-ssl  autotools  Failed   Not run
                                                        cmake      Success  Not run
              --verbose                                 autotools  Success  Not run
                                                        cmake      Success  Not run
ubuntu:14.04  --verbose --enable-libevent --enable-ssl  autotools  Failed   Failed
                                                        cmake      Failed   Success
              --verbose                                 autotools  Success  Failed
                                                        cmake      Success  Success


On Mon, Aug 13, 2018 at 7:41 PM Benjamin Mahler  wrote:

> +1 (binding)
>
> make check passes on macOS 10.13.6 with Apple LLVM version 9.1.0
> (clang-902.0.39.2).
>
> Thanks Kapil!
>
> On Wed, Aug 8, 2018 at 3:06 PM, Kapil Arya  wrote:
>
> > Hi all,
> >
> > Please vote on releasing the following candidate as Apache Mesos 1.4.2.
> >
> > 1.4.2 is a bug fix release. The CHANGELOG for the release is available at:
> > https://gitbox.apache.org/repos/asf?p=mesos.git;a=blob_plain;f=CHANGELOG;hb=1.4.2-rc1
> >
> > The candidate for Mesos 1.4.2 release is available at:
> > https://dist.apache.org/repos/dist/dev/mesos/1.4.2-rc1/mesos-1.4.2.tar.gz
> >
> > The tag to be voted on is 1.4.2-rc1:
> > https://gitbox.apache.org/repos/asf?p=mesos.git;a=commit;h=1.4.2-rc1
> >
> > The SHA512 checksum of the tarball can be found at:
> > https://dist.apache.org/repos/dist/dev/mesos/1.4.2-rc1/mesos-1.4.2.tar.gz.sha512
> >
> > The signature of the tarball can be found at:
> > https://dist.apache.org/repos/dist/dev/mesos/1.4.2-rc1/mesos-1.4.2.tar.gz.asc
> >

Re: Build failed in Jenkins: Mesos-Reviewbot #23005

2018-08-07 Thread Vinod Kone
Should be fixed now.

On Tue, Aug 7, 2018 at 9:58 AM Apache Jenkins Server <
jenk...@builds.apache.org> wrote:

> See 
>
> --
> Started by user vinodkone
> [EnvInject] - Loading node environment variables.
> Building remotely on H24 (ubuntu xenial) in workspace <
> https://builds.apache.org/job/Mesos-Reviewbot/ws/>
>  > git rev-parse --is-inside-work-tree # timeout=10
> Fetching changes from the remote Git repository
>  > git config remote.origin.url
> https://gitbox.apache.org/repos/asf/mesos.git # timeout=10
> Fetching upstream changes from
> https://gitbox.apache.org/repos/asf/mesos.git
>  > git --version # timeout=10
>  > git fetch --tags --progress
> https://gitbox.apache.org/repos/asf/mesos.git
> +refs/heads/*:refs/remotes/origin/*
> ERROR: Error fetching remote repo 'origin'
> hudson.plugins.git.GitException: Failed to fetch from
> https://gitbox.apache.org/repos/asf/mesos.git
> at hudson.plugins.git.GitSCM.fetchFrom(GitSCM.java:888)
> at hudson.plugins.git.GitSCM.retrieveChanges(GitSCM.java:1155)
> at hudson.plugins.git.GitSCM.checkout(GitSCM.java:1186)
> at hudson.scm.SCM.checkout(SCM.java:504)
> at hudson.model.AbstractProject.checkout(AbstractProject.java:1208)
> at
> hudson.model.AbstractBuild$AbstractBuildExecution.defaultCheckout(AbstractBuild.java:574)
> at
> jenkins.scm.SCMCheckoutStrategy.checkout(SCMCheckoutStrategy.java:86)
> at
> hudson.model.AbstractBuild$AbstractBuildExecution.run(AbstractBuild.java:499)
> at hudson.model.Run.execute(Run.java:1794)
> at hudson.model.FreeStyleBuild.run(FreeStyleBuild.java:43)
> at
> hudson.model.ResourceController.execute(ResourceController.java:97)
> at hudson.model.Executor.run(Executor.java:429)
> Caused by: hudson.plugins.git.GitException: Command "git fetch --tags
> --progress https://gitbox.apache.org/repos/asf/mesos.git
> +refs/heads/*:refs/remotes/origin/*" returned status code 128:
> stdout:
> stderr: error: missing object referenced by 'refs/tags/1.6.1'
> error: Could not read 163dd77ef8702fce4d6bcadfbcee2f2aeeb0c5f5
> error: Could not read 2f6f3812c14b4272b168d2b6264058df16f4b693
> error: Could not read d52a898027238edefcb0048640089df08cb546eb
> error: Could not read 6e2dfd06c8c7780d408a59980ce9df0d8b0cbed9
> remote: Counting objects: 9865, done.
> remote: Compressing objects:   0% (1/3556) ... 45% (1601/3556)
> remote: 

Re: Getting write access to our GitHub repo

2018-07-27 Thread Vinod Kone
Filed: https://issues.apache.org/jira/browse/INFRA-16832

On Mon, Jul 23, 2018 at 6:11 PM Vinod Kone  wrote:

> Hi Benjamin,
>
> The main reason for moving to gitbox is to take better advantage of GitHub
> integration (e.g., closing stale PRs, merging directly from GH if wanted, a
> lower barrier to entry for new contributors, better integration with CI, etc.).
> AFAICT, this will necessitate us having write access to our GH repo.
>
> Since we need write access to GH, I'm wondering if there is a strong
> reason for us to have write access to the ASF repo as well? Because having
> two writable repos could be painful (slow sync causing merge conflicts that
> need to be resolved), I'm trying to see if we can avoid that if possible.
> And this is not set in stone; we can always open up write access to both repos
> in the future if we want/need to (e.g., GH goes poof).
>
> And just to be clear, making the GH repo the source of truth doesn't
> change our relationship with ASF. GH is just a hosting location with better
> tooling that we don't need to reinvent and/or maintain. All our existing
> tooling should work just fine.
>
> HTH,
>
> On Mon, Jul 23, 2018 at 12:43 PM Benjamin Bannier <
> benjamin.bann...@mesosphere.io> wrote:
>
>> Hi Vinod,
>>
>> We (Jie, James, me) briefly discussed this topic and some implications
>> over Slack:
>>
>> * I mentioned I was surprised how a vote on _moving the project repo to
>> ASF gitbox_ turned into _moving the project repo to Github_.
>> * Jie mentioned that this would simplify (enable?) how we could close
>> Github PRs. He also mentioned infra reliability.
>> * I mentioned that I believed that while it was in ASF’s interest to
>> support us as long as ASF was around, I wasn’t sure the same would hold for
>> Github.
>> * I wrote that personally I’d prefer improving limitations in our tooling
>> over moving to Github.
>>
>> That said, I’d prefer if we’d keep an ASF infra repo as source of truth
>> like agreed on in the vote. We should get a clearer understanding of the
>> limitations and limits of what ASF can provide before considering Github as
>> source of truth. I personally do not yet see a true need.
>>
>>
>> Cheers,
>>
>> Benjamin
>>
>>
>> > On Jul 23, 2018, at 8:44 PM, Jie Yu  wrote:
>> >
>> >>
>> >> 1) Merge strategy on GH. I think we want to use the "rebase and merge
>> >> <https://help.github.com/articles/about-pull-request-merges/#rebase-and-merge-your-pull-request-commits>"
>> >> strategy only (i.e., disable other strategies) to avoid merge commits. This
>> >> will be in parity with our RB based workflow.
>> >
>> >
>> > Sounds good! And we can "ban" the rest in github setting.
>> >
>> >> 2) One writable repo. Do we want to keep both github and gitbox repos as
>> >> writable repos or do we want to make github the only writable repo (and
>> >> make gitbox a read only mirror)? One advantage is that this will avoid
>> >> conflicts (that need to be manually resolved) when people commit to both
>> >> repos independently and there is slowness in synchronization.
>> >
>> >
>> > +1 on making only github writable.
>> >
>> >> 3) Our RB server currently points to yet another mirror "
>> >> git.apache.org/mesos" which has occasionally given us issues when posting
>> >> reviews due to synchronization issues. Should we move our RB to point to
>> >> github too?
>> >
>> >
>> > +1 on switching to github
>> >
>> > - Jie
>> >
>> > On Mon, Jul 23, 2018 at 10:49 AM, Vinod Kone wrote:
>> >
>> >> Few things we need to finalize before the gitbox move.
>> >>
>> >> 1) Merge strategy on GH. I think we want to use the "rebase and merge
>> >> <https://help.github.com/articles/about-pull-request-merges/#rebase-and-merge-your-pull-request-commits>"
>> >> strategy only (i.e., disable other strategies) to avoid merge commits. This
>> >> will be in parity with our RB based workflow.
>> >>
>> >> 2) One writable repo. Do we want to keep both github and gitbox repos as
>> >> writable repos or do we want to make github the only writable repo (and
>> >> make gitbox a read only mirror)? One advantage is that this will avoid
>> >> conflicts (that need to be manually resolved) when people comm

Re: Getting write access to our GitHub repo

2018-07-23 Thread Vinod Kone
Hi Benjamin,

The main reason for moving to gitbox is to take better advantage of GitHub
integration (e.g., closing stale PRs, merging directly from GH if wanted, a
lower barrier to entry for new contributors, better integration with CI, etc.).
AFAICT, this will necessitate us having write access to our GH repo.

Since we need write access to GH, I'm wondering if there is a strong reason
for us to have write access to the ASF repo as well? Because having two
writable repos could be painful (slow sync causing merge conflicts that
need to be resolved), I'm trying to see if we can avoid that if possible.
And this is not set in stone; we can always open up write access to both repos
in the future if we want/need to (e.g., GH goes poof).

And just to be clear, making the GH repo the source of truth doesn't change
our relationship with ASF. GH is just a hosting location with better
tooling that we don't need to reinvent and/or maintain. All our existing
tooling should work just fine.

HTH,

On Mon, Jul 23, 2018 at 12:43 PM Benjamin Bannier <
benjamin.bann...@mesosphere.io> wrote:

> Hi Vinod,
>
> We (Jie, James, me) briefly discussed this topic and some implications over
> Slack:
>
> * I mentioned I was surprised how a vote on _moving the project repo to
> ASF gitbox_ turned into _moving the project repo to Github_.
> * Jie mentioned that this would simplify (enable?) how we could close
> Github PRs. He also mentioned infra reliability.
> * I mentioned that I believed that while it was in ASF’s interest to
> support us as long as ASF was around, I wasn’t sure the same would hold for
> Github.
> * I wrote that personally I’d prefer improving limitations in our tooling
> over moving to Github.
>
> That said, I’d prefer if we’d keep an ASF infra repo as source of truth
> like agreed on in the vote. We should get a clearer understanding of the
> limitations and limits of what ASF can provide before considering Github as
> source of truth. I personally do not yet see a true need.
>
>
> Cheers,
>
> Benjamin
>
>
> > On Jul 23, 2018, at 8:44 PM, Jie Yu  wrote:
> >
> >>
> >> 1) Merge strategy on GH. I think we want to use the "rebase and merge
> >> <https://help.github.com/articles/about-pull-request-merges/#rebase-and-merge-your-pull-request-commits>"
> >> strategy only (i.e., disable other strategies) to avoid merge commits. This
> >> will be in parity with our RB based workflow.
> >
> >
> > Sounds good! And we can "ban" the rest in github setting.
> >
> >> 2) One writable repo. Do we want to keep both github and gitbox repos as
> >> writable repos or do we want to make github the only writable repo (and
> >> make gitbox a read only mirror)? One advantage is that this will avoid
> >> conflicts (that need to be manually resolved) when people commit to both
> >> repos independently and there is slowness in synchronization.
> >
> >
> > +1 on making only github writable.
> >
> >> 3) Our RB server currently points to yet another mirror "
> >> git.apache.org/mesos" which has occasionally given us issues when posting
> >> reviews due to synchronization issues. Should we move our RB to point to
> >> github too?
> >
> >
> > +1 on switching to github
> >
> > - Jie
> >
> > On Mon, Jul 23, 2018 at 10:49 AM, Vinod Kone wrote:
> >
> >> Few things we need to finalize before the gitbox move.
> >>
> >> 1) Merge strategy on GH. I think we want to use the "rebase and merge
> >> <https://help.github.com/articles/about-pull-request-merges/#rebase-and-merge-your-pull-request-commits>"
> >> strategy only (i.e., disable other strategies) to avoid merge commits. This
> >> will be in parity with our RB based workflow.
> >>
> >> 2) One writable repo. Do we want to keep both github and gitbox repos as
> >> writable repos or do we want to make github the only writable repo (and
> >> make gitbox a read only mirror)? One advantage is that this will avoid
> >> conflicts (that need to be manually resolved) when people commit to both
> >> repos independently and there is slowness in synchronization.
> >>
> >> 3) Our RB server currently points to yet another mirror "
> >> git.apache.org/mesos" which has occasionally given us issues when posting
> >> reviews due to synchronization issues. Should we move our RB to point to
> >> github too?
> >>
> >> Thanks,
> >>
> >> On Sun, Jul 15, 2018 at 9:26 PM Jie Yu  wrote:
> >>
> >>> Vinod, can you start a VOTE thread per our discussion during the
> >>> committer's meeting.
> >>>
> >>> On Sun, Jul 15, 2018 at 1:34 AM, Gastón Kleiman wrote:
> >>>
> >>>> On Wed, Jun 20, 2018 at 7:59 PM Vinod Kone wrote:
> >>>>
> >>>>> Hi folks,
> >>>>>
> >>>>> Looks like ASF now supports <https://gitbox.apache.org/> giving write
> >>>>> access to committers for their GitHub mirrors, which means we can merge
> >>>>> PRs directly on GitHub!
> >>>>>
> >>>>
> >>>> +1. Not only does it allow merging PRs directly on GitHub, but it also
> >>>> allows committers to close stale PRs!
> >>>>
> >>>> -Gastón
> >>>>
> >>>
> >>
>
>


Re: Build failed in Jenkins: Mesos-Tidybot » -DENABLE_LIBEVENT=OFF -DENABLE_SSL=OFF,(docker||Hadoop)&&(!ubuntu-us1)&&(!ubuntu-eu2) #1341

2018-07-23 Thread Vinod Kone
Oh ok. I searched JIRA but couldn't find anything.

On Mon, Jul 23, 2018 at 11:19 AM Benjamin Bannier <
benjamin.bann...@mesosphere.io> wrote:

> > Hmm. Is this new?
>
> This is about a week old. There’s a fix in progress,
> https://reviews.apache.org/r/68001/.
>
> @jpeach @drexin
>
>
> > On Mon, Jul 23, 2018 at 11:04 AM Apache Jenkins Server <
> > jenk...@builds.apache.org> wrote:
> >
> >> See <
> >>
> https://builds.apache.org/job/Mesos-Tidybot/CMAKE_ARGS=-DENABLE_LIBEVENT=OFF%20-DENABLE_SSL=OFF,label_exp=(docker%7C%7CHadoop)&&(!ubuntu-us1)&&(!ubuntu-eu2)/1341/display/redirect?page=changes
> >>>
> >>
> >> Changes:
> >>
> >> [vinodkone] Document SUPPRESS HTTP call [MESOS-7211].
> >>
> >> --
> >> [...truncated 392.74 KB...]
> >> /usr/bin/make -f 3rdparty/CMakeFiles/googletest-1.8.0.dir/build.make
> >> 3rdparty/CMakeFiles/googletest-1.8.0.dir/depend
> >> make[3]: Entering directory '/BUILD'
> >> cd /BUILD && /usr/local/bin/cmake -E cmake_depends "Unix Makefiles"
> >> /tmp/SRC /tmp/SRC/3rdparty /BUILD /BUILD/3rdparty
> >> /BUILD/3rdparty/CMakeFiles/http_parser-2.6.2.dir/DependInfo.cmake
> --color=
> >> make[3]: Entering directory '/BUILD'
> >> cd /BUILD && /usr/local/bin/cmake -E cmake_depends "Unix Makefiles"
> >> /tmp/SRC /tmp/SRC/3rdparty /BUILD /BUILD/3rdparty
> >> /BUILD/3rdparty/CMakeFiles/libarchive-3.3.2.dir/DependInfo.cmake
> --color=
> >> make[3]: Entering directory '/BUILD'
> >> cd /BUILD && /usr/local/bin/cmake -E cmake_depends "Unix Makefiles"
> >> /tmp/SRC /tmp/SRC/3rdparty /BUILD /BUILD/3rdparty
> >> /BUILD/3rdparty/CMakeFiles/glog-0.3.3.dir/DependInfo.cmake --color=
> >> make[3]: Entering directory '/BUILD'
> >> cd /BUILD && /usr/local/bin/cmake -E cmake_depends "Unix Makefiles"
> >> /tmp/SRC /tmp/SRC/3rdparty /BUILD /BUILD/3rdparty
> >> /BUILD/3rdparty/CMakeFiles/boost-1.65.0.dir/DependInfo.cmake --color=
> >> make[3]: Entering directory '/BUILD'
> >> cd /BUILD && /usr/local/bin/cmake -E cmake_depends "Unix Makefiles"
> >> /tmp/SRC /tmp/SRC/3rdparty /BUILD /BUILD/3rdparty
> >> /BUILD/3rdparty/CMakeFiles/libev-4.22.dir/DependInfo.cmake --color=
> >> make[3]: Entering directory '/BUILD'
> >> cd /BUILD && /usr/local/bin/cmake -E cmake_depends "Unix Makefiles"
> >> /tmp/SRC /tmp/SRC/3rdparty /BUILD /BUILD/3rdparty
> >> /BUILD/3rdparty/CMakeFiles/concurrentqueue-7b69a8f.dir/DependInfo.cmake
> >> --color=
> >> make[3]: Entering directory '/BUILD'
> >> cd /BUILD && /usr/local/bin/cmake -E cmake_depends "Unix Makefiles"
> >> /tmp/SRC /tmp/SRC/3rdparty /BUILD /BUILD/3rdparty
> >> /BUILD/3rdparty/CMakeFiles/picojson-1.3.0.dir/DependInfo.cmake --color=
> >> make[3]: Entering directory '/BUILD'
> >> cd /BUILD && /usr/local/bin/cmake -E cmake_depends "Unix Makefiles"
> >> /tmp/SRC /tmp/SRC/3rdparty /BUILD /BUILD/3rdparty
> >> /BUILD/3rdparty/CMakeFiles/protobuf-3.5.0.dir/DependInfo.cmake --color=
> >> make[3]: Entering directory '/BUILD'
> >> cd /BUILD && /usr/local/bin/cmake -E cmake_depends "Unix Makefiles"
> >> /tmp/SRC /tmp/SRC/3rdparty /BUILD /BUILD/3rdparty
> >> /BUILD/3rdparty/CMakeFiles/elfio-3.2.dir/DependInfo.cmake --color=
> >> make[3]: Entering directory '/BUILD'
> >> cd /BUILD && /usr/local/bin/cmake -E cmake_depends "Unix Makefiles"
> >> /tmp/SRC /tmp/SRC/3rdparty /BUILD /BUILD/3rdparty
> >> /BUILD/3rdparty/CMakeFiles/googletest-1.8.0.dir/DependInfo.cmake
> --color=
> >> make[3]: Leaving directory '/BUILD'
> >> /usr/bin/make -f 3rdparty/CMakeFiles/libarchive-3.3.2.dir/build.make
> >> 3rdparty/CMakeFiles/libarchive-3.3.2.dir/build
> >> make[3]: Leaving directory '/BUILD'
> >> /usr/bin/make -f 3rdparty/CMakeFiles/libev-4.22.dir/build.make
> >> 3rdparty/CMakeFiles/libev-4.22.dir/build
> >> make[3]: Leaving directory '/BUILD'
> >> /usr/bin/make -f 3rdparty/CMakeFiles/protobuf-3.5.0.dir/build.make
> >> 3rdparty/CMakeFiles/protobuf-3.5.0.dir/build
> >> make[3]: Leaving directory '/BUILD'
> >> make[3]: Leaving directory '/BUILD'
> >> make[3]: Leaving directory '/BUILD'
> >> /usr/bin/make -f 3rdparty/CMakeFiles/glog-0.3.3.dir/build.make
> >> 3rdparty/CMakeFiles/glog-0.3.3.dir/build
> >> make[3]: Leaving directory '/BUILD'
> >> /usr/bin/make -f 3rdparty/CMakeFiles/elfio-3.2.dir/build.make
> >> 3rdparty/CMakeFiles/elfio-3.2.dir/build
> >> /usr/bin/make -f 3rdparty/CMakeFiles/http_parser-2.6.2.dir/build.make
> >> 3rdparty/CMakeFiles/http_parser-2.6.2.dir/build
> >> /usr/bin/make -f 3rdparty/CMakeFiles/boost-1.65.0.dir/build.make
> >> 3rdparty/CMakeFiles/boost-1.65.0.dir/build
> >> make[3]: Leaving directory '/BUILD'
> >> make[3]: Leaving directory '/BUILD'
> >> /usr/bin/make -f 3rdparty/CMakeFiles/picojson-1.3.0.dir/build.make
> >> 3rdparty/CMakeFiles/picojson-1.3.0.dir/build
> >> Scanning dependencies of target concurrentqueue-7b69a8f
> >> make[3]: Entering directory '/BUILD'
> >> make[3]: Nothing to be done for
> >> '3rdparty/CMakeFiles/libarchive-3.3.2.dir/build'.
> >> /usr/bin/make -f 3rdparty/CMakeFiles/googletest-1.8.0.dir/build.make
> >> 

Re: Build failed in Jenkins: Mesos-Tidybot » -DENABLE_LIBEVENT=OFF -DENABLE_SSL=OFF,(docker||Hadoop)&&(!ubuntu-us1)&&(!ubuntu-eu2) #1341

2018-07-23 Thread Vinod Kone
Hmm. Is this new?

On Mon, Jul 23, 2018 at 11:04 AM Apache Jenkins Server <
jenk...@builds.apache.org> wrote:

> See <
> https://builds.apache.org/job/Mesos-Tidybot/CMAKE_ARGS=-DENABLE_LIBEVENT=OFF%20-DENABLE_SSL=OFF,label_exp=(docker%7C%7CHadoop)&&(!ubuntu-us1)&&(!ubuntu-eu2)/1341/display/redirect?page=changes
> >
>
> Changes:
>
> [vinodkone] Document SUPPRESS HTTP call [MESOS-7211].
>
> --
> [...truncated 392.74 KB...]
> /usr/bin/make -f 3rdparty/CMakeFiles/googletest-1.8.0.dir/build.make
> 3rdparty/CMakeFiles/googletest-1.8.0.dir/depend
> make[3]: Entering directory '/BUILD'
> cd /BUILD && /usr/local/bin/cmake -E cmake_depends "Unix Makefiles"
> /tmp/SRC /tmp/SRC/3rdparty /BUILD /BUILD/3rdparty
> /BUILD/3rdparty/CMakeFiles/http_parser-2.6.2.dir/DependInfo.cmake --color=
> make[3]: Entering directory '/BUILD'
> cd /BUILD && /usr/local/bin/cmake -E cmake_depends "Unix Makefiles"
> /tmp/SRC /tmp/SRC/3rdparty /BUILD /BUILD/3rdparty
> /BUILD/3rdparty/CMakeFiles/libarchive-3.3.2.dir/DependInfo.cmake --color=
> make[3]: Entering directory '/BUILD'
> cd /BUILD && /usr/local/bin/cmake -E cmake_depends "Unix Makefiles"
> /tmp/SRC /tmp/SRC/3rdparty /BUILD /BUILD/3rdparty
> /BUILD/3rdparty/CMakeFiles/glog-0.3.3.dir/DependInfo.cmake --color=
> make[3]: Entering directory '/BUILD'
> cd /BUILD && /usr/local/bin/cmake -E cmake_depends "Unix Makefiles"
> /tmp/SRC /tmp/SRC/3rdparty /BUILD /BUILD/3rdparty
> /BUILD/3rdparty/CMakeFiles/boost-1.65.0.dir/DependInfo.cmake --color=
> make[3]: Entering directory '/BUILD'
> cd /BUILD && /usr/local/bin/cmake -E cmake_depends "Unix Makefiles"
> /tmp/SRC /tmp/SRC/3rdparty /BUILD /BUILD/3rdparty
> /BUILD/3rdparty/CMakeFiles/libev-4.22.dir/DependInfo.cmake --color=
> make[3]: Entering directory '/BUILD'
> cd /BUILD && /usr/local/bin/cmake -E cmake_depends "Unix Makefiles"
> /tmp/SRC /tmp/SRC/3rdparty /BUILD /BUILD/3rdparty
> /BUILD/3rdparty/CMakeFiles/concurrentqueue-7b69a8f.dir/DependInfo.cmake
> --color=
> make[3]: Entering directory '/BUILD'
> cd /BUILD && /usr/local/bin/cmake -E cmake_depends "Unix Makefiles"
> /tmp/SRC /tmp/SRC/3rdparty /BUILD /BUILD/3rdparty
> /BUILD/3rdparty/CMakeFiles/picojson-1.3.0.dir/DependInfo.cmake --color=
> make[3]: Entering directory '/BUILD'
> cd /BUILD && /usr/local/bin/cmake -E cmake_depends "Unix Makefiles"
> /tmp/SRC /tmp/SRC/3rdparty /BUILD /BUILD/3rdparty
> /BUILD/3rdparty/CMakeFiles/protobuf-3.5.0.dir/DependInfo.cmake --color=
> make[3]: Entering directory '/BUILD'
> cd /BUILD && /usr/local/bin/cmake -E cmake_depends "Unix Makefiles"
> /tmp/SRC /tmp/SRC/3rdparty /BUILD /BUILD/3rdparty
> /BUILD/3rdparty/CMakeFiles/elfio-3.2.dir/DependInfo.cmake --color=
> make[3]: Entering directory '/BUILD'
> cd /BUILD && /usr/local/bin/cmake -E cmake_depends "Unix Makefiles"
> /tmp/SRC /tmp/SRC/3rdparty /BUILD /BUILD/3rdparty
> /BUILD/3rdparty/CMakeFiles/googletest-1.8.0.dir/DependInfo.cmake --color=
> make[3]: Leaving directory '/BUILD'
> /usr/bin/make -f 3rdparty/CMakeFiles/libarchive-3.3.2.dir/build.make
> 3rdparty/CMakeFiles/libarchive-3.3.2.dir/build
> make[3]: Leaving directory '/BUILD'
> /usr/bin/make -f 3rdparty/CMakeFiles/libev-4.22.dir/build.make
> 3rdparty/CMakeFiles/libev-4.22.dir/build
> make[3]: Leaving directory '/BUILD'
> /usr/bin/make -f 3rdparty/CMakeFiles/protobuf-3.5.0.dir/build.make
> 3rdparty/CMakeFiles/protobuf-3.5.0.dir/build
> make[3]: Leaving directory '/BUILD'
> make[3]: Leaving directory '/BUILD'
> make[3]: Leaving directory '/BUILD'
> /usr/bin/make -f 3rdparty/CMakeFiles/glog-0.3.3.dir/build.make
> 3rdparty/CMakeFiles/glog-0.3.3.dir/build
> make[3]: Leaving directory '/BUILD'
> /usr/bin/make -f 3rdparty/CMakeFiles/elfio-3.2.dir/build.make
> 3rdparty/CMakeFiles/elfio-3.2.dir/build
> /usr/bin/make -f 3rdparty/CMakeFiles/http_parser-2.6.2.dir/build.make
> 3rdparty/CMakeFiles/http_parser-2.6.2.dir/build
> /usr/bin/make -f 3rdparty/CMakeFiles/boost-1.65.0.dir/build.make
> 3rdparty/CMakeFiles/boost-1.65.0.dir/build
> make[3]: Leaving directory '/BUILD'
> make[3]: Leaving directory '/BUILD'
> /usr/bin/make -f 3rdparty/CMakeFiles/picojson-1.3.0.dir/build.make
> 3rdparty/CMakeFiles/picojson-1.3.0.dir/build
> Scanning dependencies of target concurrentqueue-7b69a8f
> make[3]: Entering directory '/BUILD'
> make[3]: Nothing to be done for
> '3rdparty/CMakeFiles/libarchive-3.3.2.dir/build'.
> /usr/bin/make -f 3rdparty/CMakeFiles/googletest-1.8.0.dir/build.make
> 3rdparty/CMakeFiles/googletest-1.8.0.dir/build
> make[3]: Leaving directory '/BUILD'
> make[3]: Entering directory '/BUILD'
> make[3]: Nothing to be done for '3rdparty/CMakeFiles/glog-0.3.3.dir/build'.
> make[3]: Leaving directory '/BUILD'
> make[3]: Entering directory '/BUILD'
> make[3]: Entering directory '/BUILD'
> make[3]: Nothing to be done for '3rdparty/CMakeFiles/elfio-3.2.dir/build'.
> make[3]: Nothing to be done for '3rdparty/CMakeFiles/libev-4.22.dir/build'.
> make[3]: Leaving directory '/BUILD'
> make[3]: Leaving directory '/BUILD'
> 

Re: Getting write access to our GitHub repo

2018-07-23 Thread Vinod Kone
Few things we need to finalize before the gitbox move.

1) Merge strategy on GH. I think we want to use the "rebase and merge
<https://help.github.com/articles/about-pull-request-merges/#rebase-and-merge-your-pull-request-commits>"
strategy only (i.e., disable other strategies) to avoid merge commits. This
will be in parity with our RB based workflow.

2) One writable repo. Do we want to keep both github and gitbox repos as
writable repos or do we want to make github the only writable repo (and
make gitbox a read only mirror)? One advantage is that this will avoid
conflicts (that need to be manually resolved) when people commit to both
repos independently and there is slowness in synchronization.

3) Our RB server currently points to yet another mirror "
git.apache.org/mesos" which has occasionally given us issues when posting
reviews due to synchronization issues. Should we move our RB to point to
github too?

Thanks,

On Sun, Jul 15, 2018 at 9:26 PM Jie Yu  wrote:

> Vinod, can you start a VOTE thread per our discussion during the
> committer's meeting.
>
> On Sun, Jul 15, 2018 at 1:34 AM, Gastón Kleiman 
> wrote:
>
> > On Wed, Jun 20, 2018 at 7:59 PM Vinod Kone  wrote:
> >
> > > Hi folks,
> > >
> > > Looks like ASF now supports <https://gitbox.apache.org/> giving write
> > > access to committers for their GitHub mirrors, which means we can merge
> > PRs
> > > directly on GitHub!
> > >
> >
> > +1. Not only does it allow merging PRs directly on GitHub, but it also
> > allows committers to close stale PRs!
> >
> > -Gastón
> >
>


[RESULT] [VOTE] Move the project repos to gitbox

2018-07-20 Thread Vinod Kone
Hi,

This vote has passed with 7 +1s and no 0s or -1s!

+1 (binding)
-
Vinod Kone
James Peach
Zhitao Li
Andrew Schwartzmeyer
Jie Yu
Greg Mann
Gaston Kleiman

I'll file an INFRA ticket to get the process in motion.

Thanks,
Vinod


On Tue, Jul 17, 2018 at 8:27 PM Gastón Kleiman  wrote:

> On Tue, Jul 17, 2018 at 7:59 AM Vinod Kone  wrote:
>
>> Hi,
>>
>> As discussed in another thread and in the committers sync, there seems to
>> be
>> heavy interest in moving our project repos ("mesos", "mesos-site") from
>> the
>> "git-wip" git server to the new "gitbox" server to better avail GitHub
>> integrations.
>>
>> Please vote +1, 0, -1 regarding the move to gitbox. The vote will close in
>> 3 business days.
>>
>
> +1
>


Re: [Performance WG] Meeting Notes - July 18

2018-07-18 Thread Vinod Kone
Awesome. Thanks for the write up, Ben!

On Wed, Jul 18, 2018 at 2:55 PM Benjamin Mahler  wrote:

> For folks that missed it, here are my own notes. Thanks to alexr and dario
> for presenting!
>
> (1) I discussed a high agent cpu usage issue when hitting the /containers
> endpoint:
>
> https://issues.apache.org/jira/browse/MESOS-8418
>
> This was resolved, but it didn't get attention for months until I noticed a
> recent complaint about it in slack. It highlights the need to periodically
> check for new performance tickets in the backlog.
>
>
> (2) alexr presented slides on some ongoing work to improve the state
> serving performance:
>
>
> https://docs.google.com/presentation/d/10VczNGAPZDOYF1zd5b4qe-Q8Tnp-4pHrjOCF5netO3g
>
> This included measurements from clusters with many frameworks. The short
> term plan (hopefully in 1.7.0) is to investigate batching / parallel
> processing of state requests (still on the master actor), and halving the
> queueing time via authorizing outside of the master actor. There are
> potential longer term plans, but these short term improvements should take
> us pretty far, along with (3).
>
>
> (3) I presented some results from adapting our jsonify library to use
> rapidjson under the covers, and it cuts our state serving time in half:
>
>
> https://docs.google.com/spreadsheets/d/1tZ17ws88jIIhuY6kH1rVkR_QxNG8rYL4DX_T6Te_nQo
>
> The code is mainly done but there are a few things left to get it in a
> reviewable state.
>
>
> (4) I briefly mentioned some various other performance work:
>
>   (a) Libprocess metrics scalability: Greg, Gilbert and I undertook some
> benchmarking and improvements were made to better handle a large number of
> metrics, in support of per-framework metrics:
>
> https://issues.apache.org/jira/browse/MESOS-9072 (and see related tickets)
>
> There's still more open work that can be done here, but a more critical
> user-facing improvement at this point is the migration to push gauges in
> the master and allocator:
>
> https://issues.apache.org/jira/browse/MESOS-8914
>
>   (b) JSON parsing cost was cut in half by avoiding conversion through an
> intermediate format and instead directly parsing into our data structures:
>
> https://issues.apache.org/jira/browse/MESOS-9067
>
>
> (5) Till, Kapil, Meng Zhu, Greg Mann, Gaston and I have been working on
> benchmarking and making performance improvements to the allocator to speed
> up allocation cycle time and to address "offer starvation". In our
> multi-framework scale testing we saw allocation cycle time go down from 15
> secs to 5 secs, and there's still lots of low hanging fruit:
>
> https://issues.apache.org/jira/browse/MESOS-9087
>
> For offer starvation, we fixed an offer fragmentation issue due to quota
> "chopping" and we introduced the choice of a random weighted shuffle sorter
> as an alternative to ensure that high share frameworks don't get starved.
> We may also investigate introducing a round-robin sorter that shuffles
> between rounds if needed:
>
> https://issues.apache.org/jira/browse/MESOS-8935
> https://issues.apache.org/jira/browse/MESOS-8936
>
>
> (6) Dario talked about the MPSC queue that was recently added to libprocess
> for use in Process event queues. This needs to be enabled at configure-time
> as is currently the case for the lock free structures, and should provide a
> throughput improvement to libprocess. We still need to chart a path to
> turning these libprocess performance enhancing features on by default.
>
>
> (7) I can draft a 1.7.0 performance improvements blog post that features
> all of these topics and more. We may need to pull out some of the more
> lengthy content into separate blog posts if needed, but I think from the
> user perspective, highlighting what they get in 1.7.0 performance wise will
> be nice.
>
> Agenda Doc:
>
> https://docs.google.com/document/d/12hWGuzbqyNWc2l1ysbPcXwc0pzHEy4bodagrlNGCuQU
>
> Ben
>
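
The "random weighted shuffle" idea from item (5) of the notes can be illustrated with a minimal sketch. This is not the Mesos sorter implementation; it is plain weighted sampling without replacement, and the function name, the `weights` argument, and the client names in the usage note are assumptions made for illustration only.

```python
import random

def weighted_shuffle(items, weights, rng=random):
    """Return items in a random order where higher weight means a higher
    chance of appearing earlier. Weights must be positive numbers."""
    remaining = list(zip(items, weights))
    order = []
    while remaining:
        total = sum(w for _, w in remaining)
        pick = rng.uniform(0, total)
        acc = 0.0
        chosen = len(remaining) - 1  # fallback guards against float rounding
        for i, (_, w) in enumerate(remaining):
            acc += w
            if pick <= acc:
                chosen = i
                break
        # Remove the chosen entry so it cannot be drawn again.
        order.append(remaining.pop(chosen)[0])
    return order
```

For example, `weighted_shuffle(["fw-a", "fw-b"], [9.0, 1.0])` puts "fw-a" first most of the time, but "fw-b" still comes first occasionally, which is the property that keeps any one client from being starved indefinitely.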


Re: [AREA1 SPOOF] 1.7 release manager?

2018-07-17 Thread Vinod Kone
+dev

-- Vinod


On Tue, Jul 17, 2018 at 4:20 PM Chun-Hung Hsiao 
wrote:

> I could volunteer unless someone has been waiting for this :)
>
> On Tue, Jul 17, 2018 at 2:09 PM Greg Mann  wrote:
>
> > Hey folks!
> > The question just came up here in the office: who is managing the 1.7.0
> > release? 1.6.0 came out on May 11, so according to our quarterly release
> > policy, we should aim for 1.7 to come out some time around mid-August.
> >
> > AFAIK, nobody has volunteered yet? I thought I'd start a thread to see if
> > anybody is interested - any volunteers?
> >
> > Cheers,
> > Greg
> >
>
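
The date arithmetic behind "mid-August" can be checked with a small sketch. It assumes "quarterly" means three calendar months after the previous release; the helper name `add_months` is hypothetical and not part of any Mesos tooling.

```python
from datetime import date

def add_months(d, months):
    """Return the same day-of-month `months` later.

    Raises ValueError if that day does not exist in the target month
    (e.g. the 31st), which is fine for this illustration.
    """
    month_index = d.month - 1 + months
    return d.replace(year=d.year + month_index // 12,
                     month=month_index % 12 + 1)

release_1_6_0 = date(2018, 5, 11)           # date given in the thread
target_1_7_0 = add_months(release_1_6_0, 3)  # one quarter later
print(target_1_7_0)  # 2018-08-11
```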


[VOTE] Move the project repos to gitbox

2018-07-17 Thread Vinod Kone
Hi,

As discussed in another thread and in the committers sync, there seems to be
heavy interest in moving our project repos ("mesos", "mesos-site") from the
"git-wip" git server to the new "gitbox" server to better avail GitHub
integrations.

Please vote +1, 0, -1 regarding the move to gitbox. The vote will close in
3 business days.

Thanks,
Vinod


Re: Backport Policy

2018-07-16 Thread Vinod Kone
[...] unforeseen consequences, which I
> >>> believe is something to be actively avoided in already released
> versions.
> >>> The reason for backporting patches to fix regressions is the same as
> the
> >>> reason to avoid backporting as much as possible: keep behavior
> consistent
> >>> (and safe) within a release. With that as the goal of a branch in
> >>> maintenance mode, it makes sense to fix regressions, and make
> exceptions to
> >>> fix CVEs and other critical/blocking issues.
> >>>
> >>> As for who should decide what to backport, I lean toward Ben's view of
> >>> the burden being on the committer. I don't think we should add more
> work
> >>> for release managers, and I think the committer/shepherd obviously has
> the
> >>> most understanding of the context around changes proposed for backport.
> >>>
> >>> Here's an example of a recent bugfix which I backported:
> >>> https://reviews.apache.org/r/67587/ (for MESOS-3790)
> >>>
> >>> While normally I believe this change falls under "avoid due to
> >>> unforeseen consequences," I made an exception as the bug was old, circa
> >>> 2015, (indicating it had been an issue for others), and was causing
> >>> recurring failures in testing. The fix itself was very small, meaning
> it
> >>> was easier to evaluate for possible side effects, so I felt a little
> safer
> >>> in that regard. The effect of not having the fix was a fatal and
> undesired
> >>> crash, which furthermore left troublesome side effects on the system
> (you
> >>> couldn't bring the agent back up). And lastly, a dependent project
> (DC/OS)
> >>> wanted it in their next bump, which necessitated backporting to the
> release
> >>> they were pulling in.
> >>>
> >>> I think in general we should backport only as necessary, and leave it
> on
> >>> the committers to decide if backporting a particular change is
> necessary.
> >>>
> >>>
> >>> On 07/13/2018 12:54 am, Alex Rukletsov wrote:
> >>>
> >>>> This is exactly where our views differ, Ben : )
> >>>>
> >>>> Ideally, I would like a release manager to have more ownership and
> less
> >>>> manual work. In my imagination, a release manager has more power and
> >>>> control about dates, features, backports and everything that is
> related
> >>>> to
> >>>> "their" branch. I would also like us to back port as little as
> >>>> possible, to
> >>>> simplify testing and releasing patch versions.
> >>>>
> >>>> On Fri, Jul 13, 2018 at 1:17 AM, Benjamin Mahler 
> >>>> wrote:
> >>>>
> >>>> +user, it would probably be good to hear from users as well.
> >>>>>
> >>>>> Please see the original proposal as well as Alex's proposal and let
> us
> >>>>> know
> >>>>> your thoughts.
> >>>>>
> >>>>> To continue the discussion from where Alex left off:
> >>>>>
> >>>>> > Other bugs and significant improvements, e.g., performance, may be
> >>>>> back
> >>>>> ported,
> >>>>> the release manager should ideally be the one who decides on this.
> >>>>>
> >>>>> I'm a little puzzled by this, why is the release manager involved? As
> >>>>> we
> >>>>> already document, backports occur when the bug is fixed, so this
> >>>>> happens in
> >>>>> the steady state of development, not at release time. The release
> >>>>> manager
> >>>>> only comes in at the time of the release itself, at which point all
> >>>>> backports have already happened and the release manager handles the
> >>>>> release
> >>>>> process. Only blocker level issues can stop the release and while the
> >>>>> release manager has a strong say, we should generally agree on what
> >>>>> consists of a release blocking issue.
> >>>>>
> >>>>> Just to clarify my workflow, I generally backport every bug fix I
> >>>>> commit
> >>>>> that applies cleanly, right after I commit it to master (with the
> >>>>> exceptions I listed below).
> >>>>>

Re: [VOTE] Release Apache Mesos 1.6.1 (rc2)

2018-07-13 Thread Vinod Kone
+1 (binding)

Ran through ASF CI. Red builds were due to known flaky health check / check tests.

*Revision*: ae82dd5cc6f415916702897acfd3085b6387b118

   - refs/tags/1.6.1-rc2

Configuration Matrix

OS            Configuration                             Build      gcc      clang
centos:7      --verbose --enable-libevent --enable-ssl  autotools  Failed   Not run
centos:7      --verbose --enable-libevent --enable-ssl  cmake      Success  Not run
centos:7      --verbose                                 autotools  Failed   Not run
centos:7      --verbose                                 cmake      Success  Not run
ubuntu:14.04  --verbose --enable-libevent --enable-ssl  autotools  Failed   Failed
ubuntu:14.04  --verbose --enable-libevent --enable-ssl  cmake      Success  Success
ubuntu:14.04  --verbose                                 autotools  Success  Success
ubuntu:14.04  --verbose                                 cmake      Success  Success



On Fri, Jul 13, 2018 at 12:48 PM Chun-Hung Hsiao 
wrote:

> +1 (binding)
>
> Tested on our internal CI. All green.
> Tested on my Mac with both autotools and CMake, with gRPC enabled.
> Failed tests:
>
> HealthCheckTest.ROOT_INTERNET_CURL_HealthyTaskViaHTTPWithContainerImage
> HealthCheckTest.ROOT_INTERNET_CURL_HealthyTaskViaHTTPSWithContainerImage
> HealthCheckTest.ROOT_INTERNET_CURL_HealthyTaskViaTCPWithContainerImage
> FetcherCacheTest.LocalUncachedExtract
> FetcherCacheHttpTest.HttpMixed
>
> MesosContainerizer/DefaultExecutorTest.ROOT_INTERNET_CURL_DockerTaskWithFileURI
> MesosContainerizer/DefaultExecutorTest.ROOT_LaunchGroupFailure
>
> LauncherAndIsolationParam/PersistentVolumeDefaultExecutor.ROOT_PersistentResources
>
> LauncherAndIsolationParam/PersistentVolumeDefaultExecutor.ROOT_TaskSandboxPersistentVolume
>
> LauncherAndIsolationParam/PersistentVolumeDefaultExecutor.ROOT_TasksSharingViaSandboxVolumes
>
> LauncherAndIsolationParam/PersistentVolumeDefaultExecutor.ROOT_TaskGroupsSharingViaSandboxVolumes
>
> LauncherAndIsolationParam/PersistentVolumeDefaultExecutor.ROOT_HealthCheckUsingPersistentVolume
>
> All of the above tests require the `filesystem/linux` isolator so are
> supposed to fail 

Re: Backport Policy

2018-07-11 Thread Vinod Kone
Ben, thanks for the clarification. I'm in agreement with the points you
made.

Once we have consensus, would you mind updating the doc?

On Wed, Jul 11, 2018 at 5:15 PM Benjamin Mahler  wrote:

> I realized recently that we aren't all on the same page with backporting.
> We currently only document the following:
>
> "Typically the fix for an issue that is affecting supported releases lands
> on the master branch and is then backported to the release branch(es). In
> rare cases, the fix might directly go into a release branch without landing
> on master (e.g., fix / issue is not applicable to master)." [1]
>
> This leaves room for interpretation about what lies outside of "typical".
> Here's the simplest way I can explain what I stick to, and I'd like to hear
> what others have in mind:
>
> * By default, bug fixes at any level should be backported to existing
> release branches if it affects those releases. Especially important:
> crashes, bugs in non-experimental features.
>
> * Exceptional cases that can omit backporting: difficult to backport fixes
> (especially if the bugs are deemed of low priority), bugs in experimental
> features.
>
> * Exceptional non-bug cases that can be backported: performance
> improvements.
>
> I realize that there is a ton of subtlety here (even in terms of which
> things are defined as bugs). But I hope we can lay down a policy that gives
> everyone the right mindset for common cases and then discuss corner cases
> on-demand in the future.
>
> [1] http://mesos.apache.org/documentation/latest/versioning/
>
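
The three bullets in the proposal above can be read as a small decision rule. The sketch below is only one interpretation of that wording, not an agreed policy; the function name, parameter names, and the three result labels are all assumptions.

```python
def backport_decision(is_bug_fix, affects_release,
                      experimental=False, hard_to_backport=False,
                      is_performance_improvement=False):
    """Return "backport", "optional", or "skip" for a change.

    "optional" covers the exceptional cases where the committer may
    decide either way (priority is left to that judgment call).
    """
    if not affects_release:
        return "skip"
    if is_bug_fix:
        # Default: bug fixes are backported, unless an exception applies.
        if experimental or hard_to_backport:
            return "optional"
        return "backport"
    if is_performance_improvement:
        # Exceptional non-bug case that can be backported.
        return "optional"
    return "skip"
```

A crash fix in a non-experimental feature yields "backport", while a new feature that is neither a bug fix nor a performance improvement yields "skip".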


Re: [Proposal] Replicated log storage compaction

2018-07-06 Thread Vinod Kone
I don't know about the replicated log, but the proposal seems fine to me.

Jie/BenM, do you guys have an opinion?

On Mon, Jul 2, 2018 at 10:57 PM Santhosh Kumar Shanmugham
 wrote:

> +1. Aurora will hugely benefit from this change.
>
> On Mon, Jul 2, 2018 at 4:49 PM Ilya Pronin 
> wrote:
>
> > Hi everyone,
> >
> > I'd like to propose adding "manual" LevelDB compaction to the
> > replicated log truncation process.
> >
> > Motivation
> >
> > Mesos Master and Aurora Scheduler use the replicated log to persist
> > information about the cluster. This log is periodically truncated to
> > prune outdated log entries. However the replicated log storage is not
> > compacted and grows without bounds. This leads to problems like
> > synchronous failover of all master/scheduler replicas happening
> > because all of them ran out of disk space.
> >
> > The only time when log storage compaction happens is during recovery.
> > Because of that periodic failovers are required to control the
> > replicated log storage growth. But this solution is suboptimal.
> > Failovers are not instant: e.g. Aurora Scheduler needs to recover the
> > storage which depending on the cluster can take several minutes.
> > During the downtime tasks cannot be (re-)scheduled and users cannot
> > interact with the service.
> >
> > Proposal
> >
> > In MESOS-184 John Sirois pointed out that our usage pattern doesn’t
> > work well with LevelDB background compaction algorithm. Fortunately,
> > LevelDB provides a way to force compaction with DB::CompactRange()
> > method. Replicated log storage can trigger it after persisting learned
> > TRUNCATE action and deleting truncated log positions. The compacted
> > range will be from previous first position of the log to the new first
> > position (the one the log was truncated up to).
> >
> > Performance impact
> >
> > Mesos Master and Aurora Scheduler have 2 different replicated log
> > usage profiles. For Mesos Master every registry update (agent
> > (re-)registration/marking, maintenance schedule update, etc.) induces
> > writing a complete snapshot which depending on the cluster size can
> > get pretty big (in a scale test fake cluster with 55k agents it is
> > ~15MB). Every snapshot is followed by a truncation of all previous
> > entries, which doesn't block the registrar and happens kind of in the
> > background. In the scale test cluster with 55k agents compactions
> > after such truncations take ~680ms.
> >
> > To reduce the performance impact for the Master compaction can be
> > triggered only after more than some configurable number of keys were
> > deleted.
> >
> > Aurora Scheduler writes incremental changes of its storage to the
> > replicated log. Every hour a storage snapshot is created and persisted
> > to the log, followed by a truncation of all entries preceding the
> > snapshot. Therefore, storage compactions will be infrequent but will
> > deal with potentially large number of keys. In the scale test cluster
> > such compactions took ~425ms each.
> >
> > Please let me know what you think about it.
> >
> > Thanks!
> >
> > --
> > Ilya Pronin
> >
>
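
The threshold idea from the proposal (compact only after more than a configurable number of keys were deleted) can be sketched as follows. This is a simulation under stated assumptions, not the replicated log code: the class name, the `compact_range` callback, and the one-key-per-log-position model are hypothetical. In the real storage the callback would be where LevelDB's DB::CompactRange() gets invoked.

```python
class TruncationCompactor:
    """Tracks deleted keys and decides when to request a manual compaction."""

    def __init__(self, threshold, compact_range):
        self.threshold = threshold          # configurable number of deleted keys
        self.compact_range = compact_range  # callback(begin, end), assumed
        self.first_position = 0             # current first position of the log
        self.range_start = 0                # first position at last compaction
        self.deleted_since_compaction = 0

    def truncate(self, new_first_position):
        """Record a truncation; return True if a compaction was requested."""
        if new_first_position <= self.first_position:
            return False
        # Assumption: each truncated log position corresponds to one key.
        self.deleted_since_compaction += new_first_position - self.first_position
        self.first_position = new_first_position
        if self.deleted_since_compaction > self.threshold:
            # Compact everything deleted since the last compaction.
            self.compact_range(self.range_start, new_first_position)
            self.range_start = new_first_position
            self.deleted_since_compaction = 0
            return True
        return False
```

Because truncations below the threshold are skipped, the compacted range here starts at the first position recorded at the previous compaction rather than at the immediately preceding first position, so no deleted keys are left uncompacted.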


Re: [VOTE] Release Apache Mesos 1.6.1 (rc1)

2018-06-27 Thread Vinod Kone
Hmm. A lot of tests failed when I ran this through ASF CI. Not sure if all of
these are known flaky tests?

https://builds.apache.org/view/M-R/view/Mesos/job/Mesos-Release/50/BUILDTOOL=autotools,COMPILER=gcc,CONFIGURATION=--verbose%20--enable-libevent%20--enable-ssl,ENVIRONMENT=GLOG_v=1%20MESOS_VERBOSE=1,OS=centos%3A7,label_exp=(docker%7C%7CHadoop)&&(!ubuntu-us1)&&(!ubuntu-eu2)/console

https://builds.apache.org/view/M-R/view/Mesos/job/Mesos-Release/50/BUILDTOOL=autotools,COMPILER=gcc,CONFIGURATION=--verbose,ENVIRONMENT=GLOG_v=1%20MESOS_VERBOSE=1,OS=ubuntu%3A14.04,label_exp=(docker%7C%7CHadoop)&&(!ubuntu-us1)&&(!ubuntu-eu2)/console

On Wed, Jun 27, 2018 at 11:59 AM Jie Yu  wrote:

> +1
>
> Passed on our internal CI that has the following matrix. I looked into the
> only failed test, looks to be a flaky test due to a race in the test.
>
>
>
> On Tue, Jun 26, 2018 at 7:02 PM, Greg Mann  wrote:
>
>> Hi all,
>>
>> Please vote on releasing the following candidate as Apache Mesos 1.6.1.
>>
>>
>> 1.6.1 includes the following:
>>
>> 
>> *Announce major features here*
>> *Announce major bug fixes here*
>>
>> The CHANGELOG for the release is available at:
>>
>> https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=blob_plain;f=CHANGELOG;hb=1.6.1-rc1
>>
>> 
>>
>> The candidate for Mesos 1.6.1 release is available at:
>> https://dist.apache.org/repos/dist/dev/mesos/1.6.1-rc1/mesos-1.6.1.tar.gz
>>
>> The tag to be voted on is 1.6.1-rc1:
>> https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=commit;h=1.6.1-rc1
>>
>> The SHA512 checksum of the tarball can be found at:
>>
>> https://dist.apache.org/repos/dist/dev/mesos/1.6.1-rc1/mesos-1.6.1.tar.gz.sha512
>>
>> The signature of the tarball can be found at:
>>
>> https://dist.apache.org/repos/dist/dev/mesos/1.6.1-rc1/mesos-1.6.1.tar.gz.asc
>>
>> The PGP key used to sign the release is here:
>> https://dist.apache.org/repos/dist/release/mesos/KEYS
>>
>> The JAR is in a staging repository here:
>> https://repository.apache.org/content/repositories/orgapachemesos-1229
>>
>> Please vote on releasing this package as Apache Mesos 1.6.1!
>>
>> The vote is open until Fri Jun 29 18:46:28 PDT 2018 and passes if a
>> majority of at least 3 +1 PMC votes are cast.
>>
>> [ ] +1 Release this package as Apache Mesos 1.6.1
>> [ ] -1 Do not release this package because ...
>>
>> Thanks,
>> Greg
>>
>
>
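
The passing condition quoted above ("a majority of at least 3 +1 PMC votes") can be written down as a small check. This is one reading of that sentence, assuming only binding PMC votes count and that "majority" means more +1 than -1 votes; the function name is hypothetical.

```python
def release_vote_passes(binding_votes):
    """binding_votes is a list of +1, 0, or -1 values cast by PMC members."""
    plus = sum(1 for v in binding_votes if v == 1)
    minus = sum(1 for v in binding_votes if v == -1)
    # At least three +1 votes, and more +1 than -1 votes.
    return plus >= 3 and plus > minus
```

Under this reading, three +1 votes and one -1 pass, while two +1 votes alone do not.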


Re: Getting write access to our GitHub repo

2018-06-23 Thread Vinod Kone
That’s right. Reviewboard will still be supported after the move. It just makes
the GH side of things better.

Thanks,
Vinod

> On Jun 23, 2018, at 4:48 AM, Chun-Hung Hsiao  wrote:
> 
> I also find GitHub hard to do code review. If we put multiple commits in a
> PR where each commit has a specific purpose, then after the author revises
> each commit, it would be hard to see what has been updated between two
> revisions of "the same commit." If we put a review chain into multiple PRs
> where each PR has a specific purpose and make each revision a new commit,
> then it's hard to specify dependencies between PRs.
> 
>> On Fri, Jun 22, 2018, 10:23 PM Yan Xu  wrote:
>> 
>> IIUC this wouldn't necessarily rule out RB reviews just better support for
>> Github PRs?
>> 
>> On Fri, Jun 22, 2018 at 9:13 PM Andrew Schwartzmeyer <
>> and...@schwartzmeyer.com> wrote:
>> 
>>> GitHub PR code reviews have gotten _significantly_ better over the last
>>> two years. You can actually open addressable issues now (like
>>> ReviewBoard), and assign reviewers, and "officially" mark it as
>>> signed-off (ship-it) too. They used to suck so bad that I preferred
>>> inline email comments to PRs, but they've improved.
>>> 
>>> On 06/22/2018 9:01 pm, James Peach wrote:
>>>>> On Jun 22, 2018, at 7:34 PM, Jie Yu  wrote:
>>>>> 
>>>>> +1
>>>>> 
>>>>> Does this mean we can add CI webhooks to the git repo?
>>>> 
>>>> FWIW, I'm hugely -1 on doing code reviews on GitHub. I'm cautiously
>>>> optimistic about other kinds of integration though.
>>>> 
>>>>> On Thu, Jun 21, 2018 at 3:45 PM, James Peach 
>> wrote:
>>>>> 
>>>>>> 
>>>>>> 
>>>>>>> On Jun 20, 2018, at 7:58 PM, Vinod Kone 
>>>>>>> wrote:
>>>>>>> 
>>>>>>> Hi folks,
>>>>>>> 
>>>>>>> Looks like ASF now supports <https://gitbox.apache.org/> giving
>>>>>>> write
>>>>>>> access to committers for their GitHub mirrors, which means we can
>>>>>>> merge
>>>>>> PRs
>>>>>>> directly on GitHub!
>>>>>> 
>>>>>> Are you proposing that we move to Github generally?
>>>>>> 
>>>>>>> FWICT, this requires us moving our repo to a new gitbox server by
>>>>>>> filing
>>>>>> an
>>>>>>> INFRA ticket. We probably need to update our CI and other tooling
>>>>>>> that
>>>>>>> references our git repo directly, so there will be work involved on
>>>>>>> our
>>>>>> end
>>>>>>> as well.
>>>>>>> 
>>>>>>> This has been one of the long requested features from several
>>>>>>> committers,
>>>>>>> so I'm gauging interest to see if folks think we should go down this
>>>>>> route
>>>>>>> (several projects seem to be already moving
>>>>>>> <https://issues.apache.org/jira/issues/?jql=text%20~%20%22gitbox%22>)
>>>>>>> too.
>>>>>>> 
>>>>>>> If there is enough interest, we could start a vote.
>>>>>>> 
>>>>>>> Thanks,
>>>>>>> Vinod
>>>>>> 
>>>>>> 
>>> 
>> 


Getting write access to our GitHub repo

2018-06-20 Thread Vinod Kone
Hi folks,

Looks like ASF now supports <https://gitbox.apache.org/> giving write
access to committers for their GitHub mirrors, which means we can merge PRs
directly on GitHub!

FWICT, this requires us moving our repo to a new gitbox server by filing an
INFRA ticket. We probably need to update our CI and other tooling that
references our git repo directly, so there will be work involved on our end
as well.

This has been one of the long requested features from several committers,
so I'm gauging interest to see if folks think we should go down this route
(several projects seem to be already moving
<https://issues.apache.org/jira/issues/?jql=text%20~%20%22gitbox%22>) too.

If there is enough interest, we could start a vote.

Thanks,
Vinod


Re: [VOTE] Release Apache Mesos 1.3.3 (rc1)

2018-05-31 Thread Vinod Kone
=
I0529 21:04:38.781270 28418 openssl.cpp:429] Will not verify peer certificate!
NOTE: Set LIBPROCESS_SSL_VERIFY_CERT=1 to enable peer certificate verification
I0529 21:04:38.781277 28418 openssl.cpp:435] Will only verify peer
certificate if presented!
NOTE: Set LIBPROCESS_SSL_REQUIRE_CERT=1 to require peer certificate verification
E0529 21:04:38.781814 28435 process.cpp:956] Failed to accept socket:
future discarded
*** Aborted at 1527627878 (unix time) try "date -d @1527627878" if you
are using GNU date ***
PC: @ 0x7fcbd5615dd6 __memcpy_ssse3_back
*** SIGSEGV (@0x5cabd78) received by PID 28418 (TID 0x7fcbcc6dd700)
from PID 97172856; stack trace: ***
I0529 21:04:38.797348 28418 process.cpp:1272] libprocess is
initialized on 172.17.0.3:47350 with 16 worker threads
@ 0x7fcbd66dd6d0 (unknown)
@ 0x7fcbd5615dd6 __memcpy_ssse3_back
@ 0x7fcbd5e636f0 (unknown)
@ 0x7fcbd5e63d9c (unknown)
@   0x42af09 process::UPID::UPID()
I0529 21:04:38.803799 29172 process.cpp:3741] Handling HTTP event for
process '(77)' with path: '/(77)/body'
@   0x8edfaa process::DispatchEvent::DispatchEvent()
@   0x8e6560 process::internal::dispatch()
I0529 21:04:38.809983 29176 process.cpp:3741] Handling HTTP event for
process '(77)' with path: '/(77)/pipe'
@   0x900ad8 process::dispatch<>()
@   0x8e548c process::ProcessBase::route()
I0529 21:04:38.821267 29181 process.cpp:3741] Handling HTTP event for
process '(77)' with path: '/(77)/body'
I0529 21:04:38.821970 29182 process.cpp:3798] Failed to process
request for '/(77)/body': failure
I0529 21:04:38.821995 29172 process.cpp:1482] Returning '500 Internal
Server Error' for '/(77)/body' (failure)
[   OK ] Scheme/HTTPTest.Endpoints/0 (227 ms)
[ RUN  ] Scheme/HTTPTest.Endpoints/1
@   0x9d823d process::ProcessBase::route<>()
@   0x9d4480 process::Help::initialize()
@   0x8de9e8 process::ProcessManager::resume()
@   0x8db3be _ZZN7process14ProcessManager12init_threadsEvENKUt_clEv
@   0x8ed63e
_ZNSt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUt_vEE9_M_invokeIIEEEvSt12_Index_tupleIIXspT_EEE
@   0x8ed582
_ZNSt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUt_vEEclEv
@   0x8ed50c
_ZNSt6thread5_ImplISt12_Bind_simpleIFZN7process14ProcessManager12init_threadsEvEUt_vEEE6_M_runEv
@ 0x7fcbd5e5a070 (unknown)
@ 0x7fcbd66d5e25 start_thread
@ 0x7fcbd55bebad __clone
make[7]: *** [check-local] Segmentation fault





On Tue, May 29, 2018 at 12:28 PM, Benjamin Mahler 
wrote:

> +1 (binding)
>
> Make check passes on macOS 10.13.4 with Apple LLVM version 9.1.0
> (clang-902.0.39.1).
>
> On Wed, May 23, 2018 at 10:00 PM, Michael Park  wrote:
>
> > The tarball has been fixed, please vote now!
> >
> > 'twas BSD `tar` issues... :(
> >
> > Thanks,
> >
> > MPark
> >
> > On Wed, May 23, 2018 at 11:39 AM, Michael Park  wrote:
> >
> >> Huh... 🤔 Super weird. I'll look into it.
> >>
> >> Thanks for checking!
> >>
> >> MPark
> >>
> >> On Wed, May 23, 2018 at 11:34 AM Vinod Kone 
> wrote:
> >>
> >>> It's empty for me too!
> >>>
> >>> On Wed, May 23, 2018 at 11:32 AM, Benjamin Mahler 
> >>> wrote:
> >>>
> >>>> Thanks Michael!
> >>>>
> >>>> Looks like the tar.gz is empty, is it just me?
> >>>>
> >>>> On Tue, May 22, 2018 at 10:09 PM, Michael Park 
> >>>> wrote:
> >>>>
> >>>>> Hi all,
> >>>>>
> >>>>> Please vote on releasing the following candidate as Apache Mesos 1.3.3.
> >>>>>
> >>>>> The CHANGELOG for the release is available at:
> >>>>> https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=blob_plain;f=CHANGELOG;hb=1.3.3-rc1
> >>>>>
> >>>>> The candidate for Mesos 1.3.3 release is available at:
> >>>>> https://dist.apache.org/repos/dist/dev/mesos/1.3.3-rc1/mesos-1.3.3.tar.gz
> >>>>>
> >>>>> The tag to be voted on is 1.3.3-rc1:
> >>>>> https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=commit;h=1.3.3-rc1
> >>>>>
> >>>>> The SHA512 checksum of the tarball can be found at:
> >>>>> https://dist.apache.org/repos/dist/dev/mesos/1.3.3-rc1/mesos-1.3.3.tar.gz.sha512
> >>>>>
> >>>>> The signature of the tarball can be found at:
> >>>>> https://dist.apache.org/repos/dist/dev/mesos/1.3.3-rc1/mesos-1.3.3.tar.gz.asc
> >>>>>
> >>>>> The PGP key used to sign the release is here:
> >>>>> https://dist.apache.org/repos/dist/release/mesos/KEYS
> >>>>>
> >>>>> The JAR is up in Maven in a staging repository here:
> >>>>> https://repository.apache.org/content/repositories/orgapachemesos-1226
> >>>>>
> >>>>> Please vote on releasing this package as Apache Mesos 1.3.3!
> >>>>>
> >>>>> The vote is open until Fri May 25 22:07:39 PDT 2018 and passes if a
> >>>>> majority of at least 3 +1 PMC votes are cast.
> >>>>>
> >>>>> [ ] +1 Release this package as Apache Mesos 1.3.3
> >>>>> [ ] -1 Do not release this package because ...
> >>>>>
> >>>>> Thanks,
> >>>>>
> >>>>> MPark
> >>>>>
> >>>>
> >>>>
> >>>
> >
>


Re: Upgrading our support scripts to Python 3

2018-05-25 Thread Vinod Kone
Have we turned python3 on in our CI? Would be great to test. 

Sent from my phone

> On May 24, 2018, at 11:26 PM, Armand Grillet  wrote:
> 
> Hi all,
> 
> Python 2.7 will retire on January 1, 2020 and we currently use it for
> our support scripts, our Python bindings, and our new CLI.
> 
> Starting July 1, 2018 you will need to have Python 3.6 on your computer
> in order to use the support scripts. It is available on all the
> operating systems we support and even preinstalled on most recent Linux
> distributions. We're making this change due to issues with our support
> scripts on Windows that have been fixed with Python 3.
> 
> If you already have Python 3.6 installed on your machine, great.
> Otherwise, you will see a deprecation message when you use the support
> scripts and the related git hooks. Don't worry, these messages and the
> switch to Python 3 do not change how the scripts work.
> 
> We now have Python 3 support scripts alongside the existing ones. Having
> a duplicated codebase is not sustainable and we thus plan on deprecating
> the Python 2 support scripts by July 1st. We want to have a few weeks to
> test these new scripts thoroughly and let you install Python 3.6, this
> is why we have decided to have both codebases for a while.
> 
> If you want to use the new scripts, set the environment variable
> `MESOS_SUPPORT_PYTHON` to `3` and rerun the bash support script
> `build-virtualenv`. You will then use the Python 3 scripts by default.
> 
> If you have any questions, please reply to this thread or join the
> Mesos Slack channel #python3.
> 
> PS: This Python 3 switch does not apply to the rest of our codebase yet.
> As we have seen in a previous thread, some developers still rely on the
> Python 2 bindings and we do not want to disturb that.
> 
> -- 
> Armand Grillet
> Software Engineer, Mesosphere


Re: [VOTE] Release Apache Mesos 1.3.3 (rc1)

2018-05-23 Thread Vinod Kone
It's empty for me too!

On Wed, May 23, 2018 at 11:32 AM, Benjamin Mahler 
wrote:

> Thanks Michael!
>
> Looks like the tar.gz is empty, is it just me?
>
> On Tue, May 22, 2018 at 10:09 PM, Michael Park  wrote:
>
>> Hi all,
>>
>> Please vote on releasing the following candidate as Apache Mesos 1.3.3.
>>
>> The CHANGELOG for the release is available at:
>> https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=blob_plain;f=CHANGELOG;hb=1.3.3-rc1
>>
>> The candidate for Mesos 1.3.3 release is available at:
>> https://dist.apache.org/repos/dist/dev/mesos/1.3.3-rc1/mesos-1.3.3.tar.gz
>>
>> The tag to be voted on is 1.3.3-rc1:
>> https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=commit;h=1.3.3-rc1
>>
>> The SHA512 checksum of the tarball can be found at:
>> https://dist.apache.org/repos/dist/dev/mesos/1.3.3-rc1/mesos-1.3.3.tar.gz.sha512
>>
>> The signature of the tarball can be found at:
>> https://dist.apache.org/repos/dist/dev/mesos/1.3.3-rc1/mesos-1.3.3.tar.gz.asc
>>
>> The PGP key used to sign the release is here:
>> https://dist.apache.org/repos/dist/release/mesos/KEYS
>>
>> The JAR is up in Maven in a staging repository here:
>> https://repository.apache.org/content/repositories/orgapachemesos-1226
>>
>> Please vote on releasing this package as Apache Mesos 1.3.3!
>>
>> The vote is open until Fri May 25 22:07:39 PDT 2018 and passes if a
>> majority of at least 3 +1 PMC votes are cast.
>>
>> [ ] +1 Release this package as Apache Mesos 1.3.3
>> [ ] -1 Do not release this package because ...
>>
>> Thanks,
>>
>> MPark
>>
>
>
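
Checking a downloaded release tarball against its published SHA512 value can be sketched with the Python standard library. The helper names are hypothetical, and the sketch assumes the expected digest has already been extracted from the .sha512 file as a plain hex string (the exact layout of that file is not shown in this thread).

```python
import hashlib

def sha512_of_file(path, chunk_size=1 << 20):
    """Compute the SHA512 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha512()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def checksum_matches(path, expected_hex):
    """Compare the file's digest with an expected hex string."""
    return sha512_of_file(path) == expected_hex.strip().lower()
```

A digest check alone would not have flagged the empty tarball discussed above if the checksum was generated from the same empty file, so it complements rather than replaces unpacking and building the candidate.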


Re: [jira] [Commented] (MESOS-8927) Default executor cannot kill tasks if `LAUNCH_NESTED_CONTAINER` is stuck.

2018-05-16 Thread Vinod Kone
Can you paste some logs here too if you have?

On Wed, May 16, 2018 at 5:53 PM, Chun-Hung Hsiao (JIRA) 
wrote:

>
> [ https://issues.apache.org/jira/browse/MESOS-8927?page=
> com.atlassian.jira.plugin.system.issuetabpanels:comment-
> tabpanel=16478318#comment-16478318 ]
>
> Chun-Hung Hsiao commented on MESOS-8927:
> 
>
> I'd like to add some notes here. This problem is actually nontrivial,
> because AFAIK we don't have a reliable way to kill a container at any state.
>
> > Default executor cannot kill tasks if `LAUNCH_NESTED_CONTAINER` is stuck.
> >
> > Key: MESOS-8927
> > URL: https://issues.apache.org/jira/browse/MESOS-8927
> > Project: Mesos
> >  Issue Type: Bug
> >  Components: executor
> >Affects Versions: 1.5.1, 1.6.0
> >Reporter: Chun-Hung Hsiao
> >Priority: Critical
> >  Labels: default-executor, mesosphere
> >
> > In the default executor, if the {{LAUNCH_NESTED_CONTAINER}} call never
> > returns, {{container->launched}} won't be set, so a follow-up {{KILL}}
> > event will be ignored:
> >  [https://github.com/apache/mesos/blob/40b40d9b73221388e583fc140280f1eb2b48b832/src/launcher/default_executor.cpp#L1091]
> > This could lead to tasks stuck in {{TASK_STARTING}}.
>
>
>
> --
> This message was sent by Atlassian JIRA
> (v7.6.3#76005)
>


Re: [VOTE] Release Apache Mesos 1.6.0 (rc1)

2018-05-11 Thread Vinod Kone
Love the votes based on test clusters! Keep them coming.

On Fri, May 11, 2018 at 9:55 AM, Andrew Schwartzmeyer <
and...@schwartzmeyer.com> wrote:

> +1 (binding)
>
> Built and tested on Windows manually and through the Windows CI. All tests
> passed, test cluster worked as expected.
>
>
> On 05/11/2018 9:08 am, Zhitao Li wrote:
>
>> +1 (binding)
>>
>> Tested with both make check (with and without root), and deployed to a
>> small testing cluster.
>>
>> On Thu, May 10, 2018 at 9:09 PM, James Peach <jpe...@apache.org> wrote:
>>
>> +1 (binding)
>>>
>>> Checked the signatures, build and tests on Fedora 27
>>>
>>> > On May 10, 2018, at 9:06 AM, Chun-Hung Hsiao <chhs...@mesosphere.io>
>>> wrote:
>>> >
>>> > +1 (binding)
>>> >
>>> > Tested on our internal CI (sudo make check) on Mac, CentOS 6/7, Debian
>>> 8/9
>>> > and Ubuntu 14/16/17, with gRPC/SSL disabled/enabled.
>>> > Also manually tested "make distcheck" w/ autotools, and "ninja check"
>>> w/
>>> > CMake on Mac and CentOS 7 with gRPC enabled.
>>> >
>>> > Observed the following failures:
>>> > https://issues.apache.org/jira/browse/MESOS-8884
>>> > https://issues.apache.org/jira/browse/MESOS-8875
>>> >
>>> > The first one is a test flakiness, and the second one is related to
>>> > MESOS-2407 which is a known problem.
>>> >
>>> > On Wed, May 9, 2018 at 11:00 AM, Vinod Kone <vinodk...@apache.org>
>>> wrote:
>>> >
>>> >> +1 (binding)
>>> >>
>>> >> Ran it on ASF CI. The only failures observed were known flaky command
>>> check
>>> >> tests.
>>> >>
>>> >> *Revision*: c7df5eadc075adcf525ea091f65786aaffb9b072
>>> >>
>>> >>   - refs/tags/1.6.0-rc1
>>> >>
>>> >> Configuration Matrix gcc clang
>>> >> centos:7 --verbose --enable-libevent --enable-ssl autotools
>>> >> [image: Failed]
>>> >> <https://builds.apache.org/view/M-R/view/Mesos/job/Mesos-Rel
>>> >> ease/48/BUILDTOOL=autotools,COMPILER=gcc,CONFIGURATION=--
>>> >> verbose%20--enable-libevent%20--enable-ssl,ENVIRONMENT=
>>> >> GLOG_v=1%20MESOS_VERBOSE=1,OS=centos%3A7,label_exp=(docker%
>>> >> 7C%7CHadoop)&&(!ubuntu-us1)&&(!ubuntu-eu2)/>
>>> >> [image: Not run]
>>> >> cmake
>>> >> [image: Success]
>>> >> <https://builds.apache.org/view/M-R/view/Mesos/job/Mesos-Rel
>>> >> ease/48/BUILDTOOL=cmake,COMPILER=gcc,CONFIGURATION=--verbose
>>> >> %20--enable-libevent%20--enable-ssl,ENVIRONMENT=GLOG_v=
>>> >> 1%20MESOS_VERBOSE=1,OS=centos%3A7,label_exp=(docker%7C%
>>> >> 7CHadoop)&&(!ubuntu-us1)&&(!ubuntu-eu2)/>
>>> >> [image: Not run]
>>> >> --verbose autotools
>>> >> [image: Failed]
>>> >> <https://builds.apache.org/view/M-R/view/Mesos/job/Mesos-Rel
>>> >> ease/48/BUILDTOOL=autotools,COMPILER=gcc,CONFIGURATION=--
>>> >> verbose,ENVIRONMENT=GLOG_v=1%20MESOS_VERBOSE=1,OS=centos%
>>> >> 3A7,label_exp=(docker%7C%7CHadoop)&&(!ubuntu-us1)&&(!ubuntu-eu2)/>
>>> >> [image: Not run]
>>> >> cmake
>>> >> [image: Success]
>>> >> <https://builds.apache.org/view/M-R/view/Mesos/job/Mesos-Rel
>>> >> ease/48/BUILDTOOL=cmake,COMPILER=gcc,CONFIGURATION=--verbose
>>> >> ,ENVIRONMENT=GLOG_v=1%20MESOS_VERBOSE=1,OS=centos%3A7,label_
>>> >> exp=(docker%7C%7CHadoop)&&(!ubuntu-us1)&&(!ubuntu-eu2)/>
>>> >> [image: Not run]
>>> >> ubuntu:14.04 --verbose --enable-libevent --enable-ssl autotools
>>> >> [image: Failed]
>>> >> <https://builds.apache.org/view/M-R/view/Mesos/job/Mesos-Rel
>>> >> ease/48/BUILDTOOL=autotools,COMPILER=gcc,CONFIGURATION=--
>>> >> verbose%20--enable-libevent%20--enable-ssl,ENVIRONMENT=
>>> >> GLOG_v=1%20MESOS_VERBOSE=1,OS=ubuntu%3A14.04,label_exp=(
>>> >> docker%7C%7CHadoop)&&(!ubuntu-us1)&&(!ubuntu-eu2)/>
>>> >> [image: Success]
>>> >> <https://builds.apache.org/view/M-R/view/Mesos/job/Mesos-Rel
>>> >> ease/48/BUILDTOOL=autotools,COMPILER=clang,CONFIGURATION=-
>>> >> 

Re: Deprecating the Python bindings

2018-05-09 Thread Vinod Kone
One production user I know of that used to depend on the Python
bindings was https://github.com/douban.

Also, Apache Aurora used to have an executor that depended on the Python
bindings.

I don't know what their dependencies are these days w.r.t. the Python bindings.

On Wed, May 9, 2018 at 11:51 AM, Andrew Schwartzmeyer <
and...@schwartzmeyer.com> wrote:

> Hi all,
>
> There are two parallel efforts underway that would both benefit from
> officially deprecating (and then removing) the Python bindings. The first
> effort is the move to the CMake system: adding support to generate the
> Python bindings was investigated but paused (see MESOS-8118), and the
> second effort is the move to Python 3: producing Python 3 compatible
> bindings is under investigation but not in progress (see MESOS-7163).
>
> Benjamin Bannier, Joseph Wu, and I have all at some point just wondered
> how the community would fare if the Python bindings were officially
> deprecated and removed. So please, if this would negatively impact you or
> your project, let me know in this thread.
>
> Thanks,
>
> Andrew Schwartzmeyer
>


Re: [VOTE] Release Apache Mesos 1.6.0 (rc1)

2018-05-09 Thread Vinod Kone
+1 (binding)

Ran it on ASF CI. The only failures observed were known flaky command check
tests.

*Revision*: c7df5eadc075adcf525ea091f65786aaffb9b072

   - refs/tags/1.6.0-rc1

Configuration Matrix (build result per compiler):

OS            Configuration                             Build tool  gcc      clang
centos:7      --verbose --enable-libevent --enable-ssl  autotools   Failed   Not run
centos:7      --verbose --enable-libevent --enable-ssl  cmake       Success  Not run
centos:7      --verbose                                 autotools   Failed   Not run
centos:7      --verbose                                 cmake       Success  Not run
ubuntu:14.04  --verbose --enable-libevent --enable-ssl  autotools   Failed   Success
ubuntu:14.04  --verbose --enable-libevent --enable-ssl  cmake       Success  Success
ubuntu:14.04  --verbose                                 autotools   Success  Success
ubuntu:14.04  --verbose                                 cmake       Success  Success



On Mon, May 7, 2018 at 8:48 PM, Greg Mann  wrote:

> Hi all,
>
> Please vote on releasing the following candidate as Apache Mesos 1.6.0.
>
>
> 1.6.0 includes the following:
> 
> 
> * Resizing of persistent volumes for agent default resources
> * Offer operation feedback for resource provider resources
> * Docker executor/containerizer improvements for graceful handling of
> Docker failures
> * Support for jemalloc on Linux
>
> The CHANGELOG for the release is available at:
> https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=blob_plain;f=CHANGELOG;hb=1.6.0-rc1
>
> The candidate for Mesos 1.6.0 release is available at:
> https://dist.apache.org/repos/dist/dev/mesos/1.6.0-rc1/mesos-1.6.0.tar.gz
>
> The tag to be voted on is 1.6.0-rc1:
> https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=commit;h=1.6.0-rc1
>
> The SHA512 checksum of the tarball can be found at:
> https://dist.apache.org/repos/dist/dev/mesos/1.6.0-rc1/mesos-1.6.0.tar.gz.sha512
>
> The signature of the tarball can be found at:
> 

Re: Add hostname or agentid in rescind offers callback

2018-05-02 Thread Vinod Kone
Can I ask why you are indexing the offers by hostname? Is it to better
handle agent removal / unreachable signal?

Looking at the code, I think the master has the requested information
(agent id, hostname), so we can include it in the rescind message!

But there are a couple of things to discuss.

First, the extra information to be included in the rescind message is
technically redundant. So we need to figure out a guideline on what
information should be included / not included (e.g., should we include the
agent IP too) in such calls.

Second, adding this extra information to the v1 scheduler API would be
relatively easy. But adding it to the v0 API would be hard. Which API do you
need to be updated?


On Wed, May 2, 2018 at 10:31 AM, Varun Gupta  wrote:

> Hi,
>
> Currently in our implementation we maintain two maps.
>
> Hostname -> []Offers
>
> offerID -> Hostname
>
> Second map is needed because rescind offers callback only provides offerid
> and we need hostname to do performant lookup in first map.
>
> Is it feasible to add hostname or agentid in rescind offers?
>
> Thanks,
> Varun
>
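
For reference, the two-map bookkeeping Varun describes can be sketched roughly
as below. This is only an illustration under assumptions: `OfferIndex` and its
member names are hypothetical and not part of any Mesos API, offer ids and
hostnames are modeled as plain strings, and the code needs C++17 for
`std::optional`.

```cpp
#include <cstddef>
#include <optional>
#include <string>
#include <unordered_map>
#include <unordered_set>

// Hypothetical framework-side index; none of these names come from Mesos.
class OfferIndex {
public:
  // Record a newly received offer under the host it came from.
  void add(const std::string& offerId, const std::string& hostname) {
    offersByHost_[hostname].insert(offerId);
    hostByOffer_[offerId] = hostname;
  }

  // The rescind callback only carries the offer id, so the reverse map is
  // needed to find the host bucket. Returns the hostname the offer belonged
  // to, or nullopt if the offer id is unknown.
  std::optional<std::string> rescind(const std::string& offerId) {
    auto offerIt = hostByOffer_.find(offerId);
    if (offerIt == hostByOffer_.end()) {
      return std::nullopt;
    }

    std::string hostname = offerIt->second;
    hostByOffer_.erase(offerIt);

    auto hostIt = offersByHost_.find(hostname);
    if (hostIt != offersByHost_.end()) {
      hostIt->second.erase(offerId);
      if (hostIt->second.empty()) {
        offersByHost_.erase(hostIt);
      }
    }

    return hostname;
  }

  // Number of outstanding offers currently tracked for a host.
  std::size_t offerCount(const std::string& hostname) const {
    auto it = offersByHost_.find(hostname);
    return it == offersByHost_.end() ? 0 : it->second.size();
  }

private:
  std::unordered_map<std::string, std::unordered_set<std::string>> offersByHost_;
  std::unordered_map<std::string, std::string> hostByOffer_;
};
```

If the rescind message carried the hostname or agent id, as requested in this
thread, the reverse map (`hostByOffer_` in the sketch) would no longer be
needed.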


Re: Convention for Backward Compatibility for New Operations in Mesos 1.6

2018-04-16 Thread Vinod Kone
Crashing the agent is definitely not a viable option IMO.

Why can't we use agent capabilities instead of agent version and reject
such operations at master? This is one of the main reasons we introduced
the concept of framework, master, agent capabilities.

On Mon, Apr 16, 2018 at 2:04 PM, Chun-Hung Hsiao  wrote:

> Hi all,
>
> As some of you might already know, we are currently working on patches to
> implement the new GROW_VOLUME and SHRINK_VOLUME operations [1].
>
> One problem that surfaces is that, since the new operations are not supported
> in Mesos 1.5, they will lead to an agent crash during the operation
> application cycle if a Mesos 1.6 master sends these operations to a Mesos 1.5
> agent [2].
>
> We are now considering two possibilities to address this compatibility
> problem:
>
> 1) The Mesos 1.6 master should check the agent's Mesos version in
> `Master::accept` [3]. Moving forward, if we add new operations in future
> Mesos releases, we would have code like the following:
>
> ```
> // Get the Mesos version of the slave of the offer.
> Version slaveVersion = ...;
> switch (operation.type()) {
>   ...
>   case SOME_NEW_OPERATION: {
>     if (slaveVersion < minVersionForSomeNewOperation) {
>       ... // Drop the operation.
>     }
>     break;
>   }
>   ...
> }
> ```
>
> Pros and cons:
> + The new operation won't go into the operation application cycle since it
>   is rejected at the very beginning. This means no resource metadata is
>   touched.
> - Explicit slave version checks on the master side make the code less
>   clean, and we will need to update this list every time we add a new
>   operation.
>
> 2) Treat this issue as an agent crash bug. The Mesos master would forward
> the operation to the agent, regardless of the agent's Mesos version. In the
> agent, we deploy and backport the following logic in
> `Slave::applyOperation` [4]:
>
> ```
> if (message.operation_info().type() == OPERATION_UNKNOWN) {
>   ... // Drop the operation and trigger a re-registration or send an
>       // `UpdateSlaveMessage` to force the master to update the total
>       // resource of the slave.
> }
> ```
>
> Pros and cons:
> + Easier to add new operations since no new logic needs to be added for
>   backward compatibility.
> - Since the old agent won't know whether the new operations are speculative
>   or not, a re-registration or an `UpdateSlaveMessage` is required.
> - Mesos 1.5.0 agents will still have the bug and crash when a new master
>   sends a new operation to them.
>
> Since both options are viable and there seems to be no clear winner, we'd
> like to check with the community to see which convention is preferable.
> Please let us know what you think. Thanks!
>
> Best,
> Chun-Hung
>
>
> [1] https://issues.apache.org/jira/browse/MESOS-4965
> [2] https://github.com/apache/mesos/blob/1.5.x/src/common/protobuf_utils.cpp#L851
> [3] https://github.com/apache/mesos/blob/master/src/master/master.cpp#L3899
> [4] https://github.com/apache/mesos/blob/1.5.x/src/slave/slave.cpp#L4359
>
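
To make the capability-based alternative raised at the top of this message
concrete, here is a minimal sketch of how a master-side check could key off
what the agent advertises instead of comparing version numbers. Everything in
it is an assumption for illustration: the enums, the `Agent` struct, and
`masterShouldForward` are hypothetical stand-ins and do not mirror the actual
Mesos protobufs or `Master::accept`. It needs C++17 for `std::optional`.

```cpp
#include <optional>
#include <set>

// Hypothetical stand-ins for operation types and agent capabilities.
enum class OperationType { RESERVE, CREATE, GROW_VOLUME, SHRINK_VOLUME };
enum class AgentCapability { MULTI_ROLE, RESIZE_VOLUME };

// Hypothetical view of an agent as seen by the master.
struct Agent {
  std::set<AgentCapability> capabilities;
};

// Maps an operation to the capability an agent must advertise to handle it.
// Operations that every agent understands require no capability.
std::optional<AgentCapability> requiredCapability(OperationType type) {
  switch (type) {
    case OperationType::GROW_VOLUME:
    case OperationType::SHRINK_VOLUME:
      return AgentCapability::RESIZE_VOLUME;
    case OperationType::RESERVE:
    case OperationType::CREATE:
      return std::nullopt;
  }
  return std::nullopt;
}

// The master forwards an operation only if the agent advertises the
// capability it needs; otherwise the operation is dropped at the master and
// never reaches an agent that would crash on it.
bool masterShouldForward(const Agent& agent, OperationType type) {
  const std::optional<AgentCapability> required = requiredCapability(type);
  return !required.has_value() || agent.capabilities.count(*required) > 0;
}
```

Compared with option 1 in the quoted message, the table of required
capabilities still has to be extended for each new operation, but the check no
longer depends on knowing which release introduced it.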


Re: This Month in Mesos - March 2018

2018-03-30 Thread Vinod Kone
Thanks for the update Greg!

Sent from my phone

> On Mar 30, 2018, at 3:08 PM, Greg Mann  wrote:
> 
> Oh hai there Apache Mesos Community!
> 
> Back again with your monthly update on current events in the Mesosverse:
> 
> 
> *Working Groups*
> 
> Below you'll find a brief summary of the group meetings from this past
> month, as well as some info about related work that's been happening in the
> project. Working group meetings can be found on the Mesos community calendar,
> and you should feel free to add agenda items beforehand!
> 
> 
> *API Working Group*
> 
> [Agenda Doc]
> 
> Next Meeting: April 3 @ 11am PST
> 
> In March we held the first two meetings of the new API working group! This
> has brought about a revival of our perennial discussion on the preferred
> Mesos release cadence; you can expect an updated release policy in our
> documentation shortly. It's looking like the new policy will be in line
> with what we have been doing in practice for the last few releases, so no
> big changes there.
> 
> 
> Zhitao also presented his ongoing work on new operations which will allow
> the growing/shrinking of persistent volumes. You can find his design doc
> here.
> 
> 
> *Containerization Working Group*
> 
> [Agenda Doc]
> 
> Next meeting: April 5 @ 9am PST
> 
> Two big items in the containerization space this month:
> 
> 
>   - Improvements to the Docker containerizer/executor to more gracefully
>   handle bugs in the Docker daemon: MESOS-8572
>   - Configurable network namespaces for nested containers: MESOS-8534
> 
> *Community Working Group*
> 
> [Agenda Doc]
> 
> Next Meeting: April 9 @ 10:30am PST
> 
> Community working group had a preliminary discussion about the next
> quarterly doc-a-thon, and discussed the possibility of spinning up a new
> Releases Working Group. We also discussed plans for the next MesosCon, and
> how we may want to evolve that event going forward.
> 
> 
> *Performance Working Group*
> 
> [Agenda Doc]
> 
> Next meeting: April 18 @ 10am PST
> 
> We now have a performance dashboard which lets you view tickets in ASF JIRA
> which have been marked as performance-related - take a look!
> 
> 
> Some additional copy elimination patches have been
> merged, with more yet to come. The group also discussed the near-term
> performance roadmap, which includes optimization of
> authentication/authorization, master state computation, and the libprocess
> HTTP code; see the agenda document for more details.
> 
> 
> 
> Until next time,
> -Greg


Re: Adding a `FLAKY` label to flaky unit tests

2018-03-29 Thread Vinod Kone
Would the CI run FLAKY tests or would it filter them out? I'm assuming it
still runs them, based on your observation above.

What are the other reasons tests are DISABLED today?

On Thu, Mar 29, 2018 at 10:35 AM, Meng Zhu  wrote:

> +1, the advantages are appealing.
>
> Though I am afraid that this will probably reduce the incentive to fix
> flaky tests.
>
> -Meng
>
> On Thu, Mar 29, 2018 at 9:45 AM, Benno Evers 
> wrote:
>
> > Hi all,
> >
> > if you're regularly running Mesos unit tests, e.g. because you've set up a
> > CI system, you probably noticed that there is a lot of noise in the results
> > due to flaky tests.
> >
> > As a measure to ease the pain, what do you think about adding a `FLAKY`
> > label to known flaky unit tests, similar to how we have `ROOT`, `INTERNET`,
> > `DISABLED`, etc. right now?
> >
> > The advantages, in my opinion, would be:
> >  - Looking at test results, it would be immediately visible whether a test
> >    failure was known flaky or not without going to JIRA
> >  - People who want to reduce noise can disable all known flaky tests by a
> >    simple gtest filter
> >  - People who want to can still run the flaky tests easier than if they get
> >    disabled outright
> >  - With a little bit of scripting, it would be possible to add logic like
> >    "for flaky tests, run them 10 times and only report a failure if more than
> >    x% of the runs fail."
> >
> > What do you think?
> >
> > Best regards,
> > --
> > Benno Evers
> > Software Engineer, Mesosphere
> >
>
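
The "run them 10 times and only report a failure if more than x% of the runs
fail" idea from the proposal could look roughly like the sketch below. It is
an illustration only: it assumes the label would appear as a `FLAKY_`
substring in the test name (the proposal does not fix the exact spelling), and
`isFlaky` and `passesWithRetries` are hypothetical helpers, not part of gtest
or of the Mesos test harness.

```cpp
#include <functional>
#include <string>

// Assumption: a flaky test carries a `FLAKY_` marker somewhere in its name,
// analogous to the existing name-based labels mentioned in the thread.
bool isFlaky(const std::string& testName) {
  return testName.find("FLAKY_") != std::string::npos;
}

// Runs `runTest` `runs` times and reports success unless more than
// `maxFailurePercent` percent of the runs failed. Integer arithmetic is used
// so that the threshold comparison is exact.
bool passesWithRetries(
    const std::function<bool()>& runTest,
    int runs,
    int maxFailurePercent) {
  int failures = 0;
  for (int i = 0; i < runs; ++i) {
    if (!runTest()) {
      ++failures;
    }
  }
  return failures * 100 <= maxFailurePercent * runs;
}
```

A wrapper script would apply `passesWithRetries` only to tests for which
`isFlaky` is true and run everything else once, so that a single failure of a
non-flaky test is still reported.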


Re: 1.6 Release Manager

2018-03-27 Thread Vinod Kone
I would suggest shooting for the end of April for the first RC, given our
history of the first RC rarely being the final one.

Sent from my phone

> On Mar 27, 2018, at 10:49 AM, Greg Mann  wrote:
> 
> Hey folks,
> I'd like to volunteer to manage the 1.6 release. AlexR has kindly offered
> to help me through the process, since this is my first time. Thanks Alex!
> 
> It seems like we're converging on a quarterly release cadence in the email
> thread on that topic, so I'll tentatively plan on a release date of May 8,
> which is 3 months after the 1.5 release on Feb. 8. Looking ahead to that
> time, I'll be unavailable on May 1st and 2nd, so I may wait until May 3rd
> to cut the first RC. This means we would hit the May 8 target if all goes
> well with that RC, or release shortly thereafter if we find some issues.
> 
> If you have any feedback on this projected timeline and how it might impact
> you or your organization, please let me know!
> 
> Cheers,
> Greg


Re: Release policy and 1.6 release schedule

2018-03-23 Thread Vinod Kone
I’m +1 for quarterly. 

Most importantly I want us to adhere to a predictable cadence. 

Sent from my phone

> On Mar 23, 2018, at 9:21 PM, Jie Yu  wrote:
> 
> Supporting multiple releases is a burden.
> 
> 1.2 was released March, 2017 (1 year ago), and I know that some users are 
> still on that version
> 1.3 was released June, 2017 (9 months ago), and we're still maintaining it
> (we were still backporting patches several days ago, which some users asked for)
> 1.4 was released Sept, 2017 (6 months ago).
> 1.5 was released Feb, 2018 (1 month ago).
> 
> As you can see, users expect a release to be supported 6-9 months (e.g., 
> backports are still needed for 1.3 release, which is 9 months old). If we 
> were to do monthly minor release, we'll probably need to maintain 6-9 release 
> branches? That's too much of an ask for committers and maintainers.
> 
> I also agree with folks that there are benefits to doing releases more
> frequently. Given the historical data, I'd suggest we do quarterly releases,
> and maintain three release branches.
> 
> - Jie
> 
>> On Fri, Mar 23, 2018 at 10:03 AM, Greg Mann  wrote:
>> The best motivation I can think of for a shorter release cycle is this: if
>> the release cadence is fast enough, then developers will be less likely to
>> rush a feature into a release. I think this would be a real benefit, since
>> rushing features in hurts stability. *However*, I'm not sure if every two
>> months is fast enough to bring this benefit. I would imagine that a
>> two-month wait is still long enough that people wouldn't want to wait an
>> entire release cycle to land their feature. Just off the top of my head, I
>> might guess that a release cadence of 1 month or shorter would be often
>> enough that it would always seem reasonable for a developer to wait until
>> the next release to land a feature. What do y'all think?
>> 
>> Other motivating factors that have been raised are:
>> 1) Many users upgrade on a longer timescale than every ~2 months. I think
>> that this doesn't need to affect our decision regarding release timing -
>> since we guarantee compatibility of all releases with the same major
>> version number, there is no reason that a user needs to upgrade minor
>> releases one at a time. It's fine to go from 1.N to 1.(N+3), for example.
>> 2) Backporting will be a burden if releases are too short. I think that in
>> practice, backporting will not take too much longer. If there was a
>> conflict back in the tree somewhere, then it's likely that after resolving
>> that conflict once, the same diff can be used to backport the change to
>> previous releases as well.
>> 3) Adhering strictly to a time-based release schedule will help users plan
>> their deployments, since they'll be able to rely on features being released
>> on-schedule. However, if we do strict time-based releases, then it will be
>> less certain that a particular feature will land in a particular release,
>> and users may have to wait a release cycle to get the feature.
>> 
>> Personally, I find the idea of preventing features from being rushed into a
>> release very compelling. From that perspective, I would love to see
>> releases every month. However, if we're not going to release that often,
>> then I think it does make sense to adjust our release schedule to
>> accommodate the features that community members want to land in a
>> particular release.
>> 
>> 
>> Jie, I'm curious why you suggest a *minimal* interval between releases.
>> Could you elaborate a bit on your motivations there?
>> 
>> Cheers,
>> Greg
>> 
>> 
>> On Fri, Mar 16, 2018 at 2:01 PM, Jie Yu  wrote:
>> 
>> > Thanks Greg for starting this thread!
>> >
>> >
>> >> My primary motivation here is to bring our documented policy in line
>> >> with our practice, whatever that may be
>> >
>> >
>> > +100
>> >
>> > Do people think that we should attempt to bring our release cadence more
>> >> in line with our current stated policy, or should the policy be changed
>> >> to reflect our current practice?
>> >
>> >
>> > I think a minor release every 2 months is probably too aggressive. I don't
>> > have concrete data, but my feeling is that the frequency that folks upgrade
>> > Mesos is low. I know that many users are still on 1.2.x.
>> >
>> > I'd actually suggest that we have a *minimal* interval between two
>> > releases (e.g., 3 months), and provide some buffer for the release process.
>> > (so we're expecting about 3 releases per year, this matches what we did
>> > last year).
>> >
>> > And we use our dev sync to coordinate on a release after the minimal
>> > release interval has elapsed (and elect a release manager).
>> >
>> > - Jie
>> >
>> > On Wed, Mar 14, 2018 at 9:51 AM, Zhitao Li  wrote:
>> >
>> >> An additional data point is how long it takes from first RC being cut to
>> >> the final release tag vote passes. That probably indicates smoothness of
>> >> the release process 

Re: On disabled tests

2018-03-21 Thread Vinod Kone
Thanks for doing this, Alex! Are you proposing a policy that every disabled test
should have an associated ticket that is linked in the comment above the test?
I’m all for it.

Sent from my phone

> On Mar 21, 2018, at 9:42 AM, Alex Rukletsov  wrote:
> 
> Folks,
> 
> to increase visibility into disabled tests, I've added a "disabled-test"
> label. Whenever you disable a test, please add this label. A TODO comment
> before the test mentioning the corresponding jira helps too.
> 
> At the moment we have 20+ disabled tests in 18 tickets [1]. Some tests were
> disabled for a "brief period of time" before the release and stayed in that
> state for years. It would be great to audit all of them and either fix and
> re-enable or remove altogether. Any help is appreciated and volunteers are
> sought!
> 
> [1] https://issues.apache.org/jira/issues/?filter=12343497


Re: The state of our CI

2018-02-07 Thread Vinod Kone
Thanks for the reminder MPark. And also huge thanks (to you and others who
chipped in) for digging into the blocking fd issue plaguing our CI and
fixing it!

Please please make sure keeping our CI green is among your top priorities.
It's the responsibility of all of us.

For new contributors and committers, the emails from CI go to
bui...@mesos.apache.org. Please subscribe to that list if you haven't
already.

On Wed, Feb 7, 2018 at 8:13 PM, Michael Park  wrote:

> Last week I noticed that our CI (
> https://builds.apache.org/job/Mesos-Buildbot) has been failing for some
> time (I've heard somewhere between 2 weeks and a month). It seems like none
> of us (including me of course) are paying much attention to the
> builds.apache.org emails. Hard to blame ourselves for filtering these
> emails and not paying much attention to the failures given the history of
> flaky tests.
>
> However, I would like to point out that Alex Rukletsov has done a fantastic
> job over the last 4 months or so of trying to keep our CI in a sane state.
> He's been identifying, tracking, and keeping authors accountable for the
> flaky tests and there has been great progress. I would love it if we can
> reap the benefits of his (and others!) efforts simply by us keeping a
> closer eye on our CI.
>
> Thanks,
>
> MPark
>


Re: Introducing `support/mesos-build.sh`

2018-02-07 Thread Vinod Kone
Yay, thanks MPark! Has the change landed already?

On Wed, Feb 7, 2018 at 8:23 PM, Michael Park  wrote:

> Many of you probably know that we currently have `support/docker-build.sh`
> to power our CI for our various configurations. One of the problems for us
> has been that we create a `Dockerfile` ad-hoc and invoke `docker build`
> with it. This is very inefficient and also leads to flaky issues around
> `apt-get install`.
>
> I've introduced `support/mesos-build.sh` which operates off of docker
> images hosted on Dockerhub instead, and should aid in bringing us faster
> and more stable CI results!
>
> As a bonus, we now also test Clang on CentOS 7!
>
> Thanks,
>
> MPark
>


Re: Soliciting Hackathon Ideas

2018-02-06 Thread Vinod Kone
Versioned documentation!

Sent from my iPhone

> On Feb 6, 2018, at 4:37 PM, Benjamin Mahler  wrote:
> 
> A couple of ideas from the performance related working group:
> 
> -Use protobuf arenas for all non-trivial outbound master messages (easy)
> This can be done piecemeal.
> -Use move semantics (take a Message&&) in all of the master message
> handlers to reduce copying (medium) This one can be done piecemeal. For
> example Master::statusUpdate would be a good one to start with.
> -Audit the Registrar code to use move semantics to reduce copying (medium)
> 
> If there are any UI programmers:
> 
> -Consider a webui "refresh", try to find a new set of fonts and style,
> could be fun.
> 
> On Fri, Feb 2, 2018 at 12:47 PM, Andrew Schwartzmeyer <
> and...@schwartzmeyer.com> wrote:
> 
>> Hello all,
>> 
>> Next month I'll be attending HackIllinois (https://hackillinois.org/) as
>> an open-source mentor. It's a huge student-run hackathon at the University
>> of Illinois at Urbana-Champaign, running from February 23rd to the 25th.
>> Students from a multitude of schools will be attending (they even bus them
>> in). The hackathon has an open-source focus, and while there will be many
>> projects for the students to work on, I want to make sure Mesos gets some
>> attention too.
>> 
>> I am asking you all for open issues and new ideas for small,
>> beginner-friendly projects that could fit a two-day Hackathon project. For
>> Mesos, I'm looking through our open issues labeled "easyfix", "beginner",
>> or "newbie", which actually returns 74 results! If you have anything in
>> particular that you think would be a good fit, please let me know. I'd like
>> to go with a list of vetted issues so I don't accidentally start some
>> students in on a giant can of worms. Our excellent new Beginner Contributor
>> Guide will be a huge help too.
>> 
>> Thanks,
>> 
>> Andy
>> 
>> P.S. If any of you also want to attend, let me know, and I'll get you in
>> touch with their director.
>> 
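
For anyone picking up the move-semantics idea listed above, the pattern looks
roughly like the sketch below. It is only an illustration under assumptions:
`StatusUpdateMessage` and `UpdateLog` are hypothetical stand-ins, not the real
protobuf message or the `Master::statusUpdate` handler.

```cpp
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a (potentially large) message.
struct StatusUpdateMessage {
  std::string taskId;
  std::string payload;
};

// Hypothetical handler that keeps the messages it receives.
class UpdateLog {
public:
  // Taking `Message&&` lets the handler take over the message's buffers
  // instead of copying them; the caller opts in with std::move.
  void statusUpdate(StatusUpdateMessage&& update) {
    updates_.push_back(std::move(update));
  }

  std::size_t size() const { return updates_.size(); }

  const StatusUpdateMessage& last() const { return updates_.back(); }

private:
  std::vector<StatusUpdateMessage> updates_;
};
```

A caller would write `log.statusUpdate(std::move(update));` and must not rely
on the contents of `update` afterwards.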


Re: [VOTE] Release Apache Mesos 1.5.0 (rc2)

2018-02-05 Thread Vinod Kone
+1 (binding)

Tested on ASF CI. The red builds were known flaky tests regarding
checks/health checks.

*Revision*: f7e3872b0359c6095f8eeaefe408cb7dcef5bb83

   - refs/tags/1.5.0-rc2

Configuration Matrix (build result per compiler):

OS            Configuration                             Build tool  gcc      clang
centos:7      --verbose --enable-libevent --enable-ssl  autotools   Failed   Not run
centos:7      --verbose --enable-libevent --enable-ssl  cmake       Success  Not run
centos:7      --verbose                                 autotools   Failed   Not run
centos:7      --verbose                                 cmake       Success  Not run
ubuntu:14.04  --verbose --enable-libevent --enable-ssl  autotools   Success  Success
ubuntu:14.04  --verbose --enable-libevent --enable-ssl  cmake       Success  Success
ubuntu:14.04  --verbose                                 autotools   Success  Success
ubuntu:14.04  --verbose                                 cmake       Success  Success

On Sat, Feb 3, 2018 at 11:11 AM, Zhitao Li  wrote:

> +1 (non-binding)
>
> Tested with running all tests on Debian/jessie server on AWS.
>
> On Fri, Feb 2, 2018 at 3:25 PM, Jie Yu  wrote:
>
>> +1
>>
>> Verified in our internal CI that `sudo make check` passed in CentOS 6,
>> CentOS7, Debian 8, Ubuntu 14.04, Ubuntu 16.04 (both w/ or w/o SSL
>> enabled).
>>
>>
>> On Thu, Feb 1, 2018 at 5:36 PM, Gilbert Song  wrote:
>>
>> > Hi all,
>> >
>> > Please vote on releasing the following candidate as Apache Mesos 1.5.0.
>> >
>> > 1.5.0 includes the following:
>> > 
>> > 
>> >   * Support Container Storage Interface (CSI).
>> >   * Agent reconfiguration policy.
>> >   * Auto GC docker images in Mesos Containerizer.
>> >   * Standalone containers.
>> >   * Support gRPC client.
>> >   * Non-leading VOTING replica catch-up.
>> >
>> >
>> > The CHANGELOG for the release is available at:
>> > https://git-wip-us.apache.org/repos/asf?p=mesos.git;a=blob_plain;f=CHANGELOG;hb=1.5.0-rc2

Re: No more Issue update emails

2018-02-02 Thread Vinod Kone
Yes, that includes most updates to ticket metadata (assignee, comments, etc.
have special rules). Watchers will still get those updates.

See our notification scheme here:
https://issues.apache.org/jira/plugins/servlet/project-config/MESOS/notifications

On Thu, Feb 1, 2018 at 3:16 PM, Benjamin Mahler <bmah...@apache.org> wrote:

> This includes any updates to description, priority, target versions? If I
> watch a ticket will I still get update emails?
>
> On Thu, Feb 1, 2018 at 12:29 PM Vinod Kone <vinodk...@apache.org> wrote:
>
> > Hi folks,
> >
> > Just a heads up that I've had our JIRA notification scheme changed so that
> > the issues@ list is no longer emailed when an issue is updated. So no more
> > emails for story point changes or sprint changes! Hopefully you will
> > appreciate the reduction in email to the list.
> >
> > Thanks,
> > Vinod
> >
>


No more Issue update emails

2018-02-01 Thread Vinod Kone
Hi folks,

Just a heads up that I've had our JIRA notification scheme changed so that
the issues@ list is no longer emailed when an issue is updated. So no more
emails for story point changes or sprint changes! Hopefully you will
appreciate the reduction in email to the list.

Thanks,
Vinod

