On Mesos versioning and deprecation policy

2016-10-12 Thread Alex Rukletsov
Folks,

There have been a bunch of online [1, 2] and offline discussions about our
deprecation and versioning policy. I found that people—including
myself—read the versioning doc [3] differently; moreover some aspects are
not captured there. I would like to start a discussion around this topic by
sharing my confusions and suggestions. This will hopefully help us stay on
the same page and have similar expectations. The second goal is to
eliminate ambiguities from the versioning doc (thanks Vinod for
volunteering to update it).

1. API vs. semantic changes.
The current versioning guide treats features (e.g. flags, metrics, endpoints)
and the API differently: incompatible changes to the former are allowed after
a 6-month deprecation cycle, while for the latter they require bumping the
major version. I suggest we consolidate these policies.
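
A minimal sketch of the two rules as described above, to make the difference
concrete. All names here (Kind, DEPRECATION_CYCLE, incompatible_change_allowed)
are hypothetical and not part of Mesos, and "6 months" is approximated as 182
days purely for illustration.

```python
from datetime import date, timedelta
from enum import Enum
from typing import Optional


class Kind(Enum):
    FEATURE = "feature"  # e.g. flags, metrics, endpoints
    API = "api"


# Assumption: the 6-month deprecation cycle approximated as 182 days.
DEPRECATION_CYCLE = timedelta(days=182)


def incompatible_change_allowed(
    kind: Kind,
    current_major: int,
    target_major: int,
    deprecated_on: Optional[date] = None,
    today: Optional[date] = None,
) -> bool:
    """Return True if an incompatible change may ship in target_major."""
    if kind is Kind.API:
        # API changes require a major version bump.
        return target_major > current_major
    if deprecated_on is None:
        # A feature must be announced as deprecated first.
        return False
    today = today or date.today()
    # Feature changes are allowed once the deprecation cycle has elapsed.
    return today - deprecated_on >= DEPRECATION_CYCLE
```

Consolidating the policies would amount to collapsing the two branches into a
single rule; which one to keep is exactly the open question.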

We should also define and clearly explain what changes require bumping the
major version. I have no strong opinion here and would love to hear what
people think. The original motivation for maintaining backwards
compatibility is to make sure vN schedulers can correctly work with vN API
without being updated. But what about semantic changes that do not touch
the API? For example, what if we decide to send fewer task health updates to
schedulers based on some health policy? That influences the flow of task
status updates; should such a change be considered compatible? Taking it to
an extreme, we may not even be able to fix some bugs because someone may
already rely on this behaviour!

Another tightly related thing we should explicitly call out is
upgradability and rollback capabilities inside a major release. Committing
to this may significantly limit what we can change within a major release;
on the other hand, it will give users more time and a better experience
using and maintaining Mesos clusters.

2. Versioned vs. unversioned protobufs.
Currently we have v1 and unnamed protobufs, which simultaneously mean v0,
v2, and internal. I am sometimes confused about the right way to update or
introduce a field or message there. Do people feel the same? How about
splitting the unnamed version into explicit v0, v2, and internal?

Food for thought: it would be great if we could maintain only "diffs" to the
internal protobufs in the code, instead of duplicating them altogether.

3. API and feature labelling.
I suggest introducing explicit labels for the API and features, to ensure
users have the right assumptions about their lifetime while engineers
have the ability to change a WIP feature in a non-compatible way. I
propose the following:
API: stable, non-stable, pure (not used by Mesos components)
Feature: experimental, normal.

Looking forward to your thoughts and suggestions.
AlexR

[1] https://www.mail-archive.com/user@mesos.apache.org/msg08025.html
[2] https://www.mail-archive.com/dev@mesos.apache.org/msg36621.html
[3] https://github.com/apache/mesos/blob/b2beef37f6f85a8c75e968136caa7a1f292ba20e/docs/versioning.md


Re: How to shutdown mesos-agent gracefully?

2016-10-12 Thread Klaus Ma
I'd like to notify the framework to kill its tasks and then terminate the
mesos-agent. As for the Maintenance feature, I cannot remember whether the
slave info will be cleaned up if that slave never re-registers.

On Wed, Oct 12, 2016 at 10:13 PM Alex Rukletsov  wrote:

> To make sure: you are aware of SIGUSR1?
>
> On Tue, Oct 11, 2016 at 5:37 PM, tommy xiao  wrote:
>
> > Hi Ma,
> >
> > Could you please give more background on why the Maintenance feature is
> > not the best option for your request?
> >
> > 2016-10-11 14:47 GMT+08:00 haosdent :
> >
> > > Does gracefully mean not affecting running tasks?
> > >
> > > On Tue, Oct 11, 2016 at 2:36 PM, Klaus Ma wrote:
> > >
> > >> It seems there's no way to shut down mesos-agent gracefully.
> > >> The Maintenance feature expects the agents to re-register in the future.
> > >>
> > >> Thanks
> > >> Klaus
> > >> --
> > >>
> > >> Regards,
> > >> 
> > >> Da (Klaus), Ma (马达), PMP® | Software Architect
> > >> IBM Platform Development & Support, STG, IBM GCG
> > >> +86-10-8245 4084 | mad...@cn.ibm.com | http://k82.me
> > >>
> > >
> > >
> > >
> > > --
> > > Best Regards,
> > > Haosdent Huang
> > >
> >
> >
> >
> > --
> > Deshi Xiao
> > Twitter: xds2000
> > E-mail: xiaods(AT)gmail.com
> >
>
-- 

Regards,

Da (Klaus), Ma (马达), PMP® | Software Architect
IBM Platform Development & Support, STG, IBM GCG
+86-10-8245 4084 | mad...@cn.ibm.com | http://k82.me


Re: How to shutdown mesos-agent gracefully?

2016-10-12 Thread Alex Rukletsov
To make sure: you are aware of SIGUSR1?
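
A minimal sketch of sending that signal from a script. It assumes the agent's
PID is already known and that the agent treats SIGUSR1 as a shutdown request,
which this thread does not confirm; request_agent_shutdown and its parameters
are hypothetical names used only for illustration.

```python
import os
import signal
from typing import Callable


def request_agent_shutdown(
    pid: int,
    kill: Callable[[int, int], None] = os.kill,
) -> None:
    """Send SIGUSR1 to the process with the given PID.

    `kill` is injectable so the call can be exercised without signalling
    a real process; by default it is os.kill.
    """
    kill(pid, signal.SIGUSR1)
```

From a shell, the equivalent is `kill -USR1 <pid>`. What happens to running
tasks afterwards depends on the agent's behaviour for your Mesos version.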

On Tue, Oct 11, 2016 at 5:37 PM, tommy xiao  wrote:

> Hi Ma,
>
> Could you please give more background on why the Maintenance feature is
> not the best option for your request?
>
> 2016-10-11 14:47 GMT+08:00 haosdent :
>
> > Does gracefully mean not affecting running tasks?
> >
> > On Tue, Oct 11, 2016 at 2:36 PM, Klaus Ma wrote:
> >
> >> It seems there's no way to shut down mesos-agent gracefully.
> >> The Maintenance feature expects the agents to re-register in the future.
> >>
> >> Thanks
> >> Klaus
> >> --
> >>
> >> Regards,
> >> 
> >> Da (Klaus), Ma (马达), PMP® | Software Architect
> >> IBM Platform Development & Support, STG, IBM GCG
> >> +86-10-8245 4084 | mad...@cn.ibm.com | http://k82.me
> >>
> >
> >
> >
> > --
> > Best Regards,
> > Haosdent Huang
> >
>
>
>
> --
> Deshi Xiao
> Twitter: xds2000
> E-mail: xiaods(AT)gmail.com
>


Re: 1.1.0 release

2016-10-12 Thread Alex Rukletsov
Folks,

We have 23 unresolved tickets targeted for the Mesos 1.1.0 release, including 7
blockers and 3 epics (MESOS-5344, MESOS-3421, MESOS-2449), which turns 23
into 55. Obviously, we can’t make a cut today.

Shepherds, please either commit your blockers by Thu EOD PST or declare them
as non-blockers. For unfinished epics, please transition all unresolved
tickets to a new epic (see previous email) or retarget the epic. Make sure
CHANGELOG is in good shape.

We aim to cut the release on Fri Oct 14 around 13:00 CEST. At that time
we will bulk-transition all unresolved tickets to 1.2.

Rigorously,
Alex & Till

On Tue, Oct 11, 2016 at 5:30 PM, Alex Rukletsov  wrote:

> Folks,
>
> In preparation for the Mesos 1.1.0 release, we would like to ask people who
> have worked on features in 1.1.0 to either:
> * update the CHANGELOG and declare the feature implemented or
> experimental, make sure documentation is updated as well;
> * postpone to 1.2 and update the related epic;
> * promote an experimental feature to stable if necessary.
>
> If you think you need to land something in 1.1.0, please mark the
> respective JIRA as a blocker and set the target version to 1.1.0. Bear in
> mind that the release will be cut *tomorrow*, Oct 12 2016.
>
> For experimental features, consider creating a separate epic and moving
> all unresolved tickets there, while marking the original epic as resolved
> for 1.1.0. For example, see MESOS-2449 (pods) and MESOS-6355
> (pods-improvements).
>
> Below is the list of candidates for the CHANGELOG update with their
> respective owners:
> MESOS-6014 CNI port-mapping Avinash, Jie
> MESOS-2449 Pods, subtopics: nested containers, nested isolators, default
> executor Vinod
> MESOS-5676 New Mesos CLI Kevin
> MESOS-4697 Unified Cgroups isolator Haosdent, Jie
> MESOS-6007 v1 API Anand, Vinod
> MESOS-3302 - // -
> MESOS-4855 - // -
> MESOS-4791 - // -
> MESOS-4766 Allocator performance BenM
> MESOS-4936 Container security Jie
> MESOS-4936 Capabilities and container security Benjamin Bannier, Jie
> MESOS-3421 Shared resources Yan Xu
> MESOS-5344 Partition awareness  Neil
>
> Below is the list of features marked as experimental in 1.0. Are they
> ready to be promoted and called out in the CHANGELOG?
> MESOS-4312 Power PC Vinod
> MESOS-4828 XFS disk isolator Yan Xu
> MESOS-4641 Network CNI isolator Qian, Jie
> MESOS-3094 Mesos tasks on Windows Joseph
> MESOS-4355 Docker volume isolator Guangya, Qian, Jie
>
> This one has never even been called experimental. Joseph, is it time to do
> so?
> MESOS-898 CMake (never declared even experimental) Joseph
>
> Thanks in advance for cooperation,
> Till and AlexR
>
> On Fri, Oct 7, 2016 at 7:47 PM, Vinod Kone  wrote:
>
>> I think you need to clean up the JIRA a bit.
>>
>> 1) Make sure unresolved tickets do not have fix version (1.1.0) set.
>> 2) Move "Fix version 1.1.0" to "Target version 1.1.0".
>>
>> 2) might obviate the need for 1).
>>
>>
>>
>> On Fri, Oct 7, 2016 at 7:24 AM, Till Toenshoff  wrote:
>>
>>> Hi everyone!
>>>
>>> It's us who will be the Release Managers for 1.1.0: Alex and Till!
>>>
>>> We are planning to cut the next release (1.1.0) within three workdays;
>>> that would be Wednesday next week. So, if you have any patches that need to
>>> get into 1.1.0, make sure that each is either already in the master branch
>>> or has a corresponding ticket with the target version set to 1.1.0.
>>>
>>> The release dashboard:
>>> https://issues.apache.org/jira/secure/Dashboard.jspa?selectPageId=12329720
>>>
>>> Alex & Till
>>>
>>
>>
>