Re: 1.1.0 release

2016-10-14 Thread Till Toenshoff
Latest news: we still have blocking issues, so the release cut is being
delayed once again, until end of day today. That gives us the weekend to
prepare a release candidate.

Some blocking issues that did not show any progress or communication have now
been re-targeted by silent consensus.

Alex & Till

> On Oct 12, 2016, at 2:11 PM, Alex Rukletsov wrote:
> 
> Folks,
> 
> we have 23 unresolved tickets targeted for Mesos 1.1.0 release, including 7
> blockers and 3 epics (MESOS-5344, MESOS-3421, MESOS-2449), which turns 23
> into 55. Obviously, we can’t make a cut today.
> 
> Shepherds, please either commit your blockers by Thu EOD PST or declare them
> as non-blockers. For unfinished epics, please transition all unresolved
> tickets to a new epic (see previous email) or retarget the epic. Make sure
> CHANGELOG is in good shape.
> 
> We strive to cut the release on Fri Oct 14 around 13:00 CEST. At that time
> we will bulk-transition all unresolved tickets to 1.2.
> 
> Rigorously,
> Alex & Till
> 
> On Tue, Oct 11, 2016 at 5:30 PM, Alex Rukletsov  wrote:
> 
>> Folks,
>> 
>> in preparation for Mesos 1.1.0 release we would like to ask people who
>> have worked on features in 1.1.0 to either:
>> * update the CHANGELOG and declare the feature implemented or
>> experimental, make sure documentation is updated as well;
>> * postpone to 1.2 and update the related epic;
>> * promote an experimental feature to stable if necessary.
>> 
>> If you think you need to land something in 1.1.0, please mark the
>> respective JIRA as a blocker and set the target version to 1.1.0. Bear in
>> mind the release will be cut *tomorrow*, Oct 12, 2016.
>> 
>> For experimental features, consider creating a separate epic and moving
>> all unresolved tickets there, while marking the original epic as resolved
>> for 1.1.0. For example, see MESOS-2449 (pods) and MESOS-6355
>> (pods-improvements).
>> 
>> Below is the list of candidates for the CHANGELOG update with their
>> respective owners:
>> MESOS-6014  CNI port-mapping                       Avinash, Jie
>> MESOS-2449  Pods, subtopics: nested containers,    Vinod
>>             nested isolators, default executor
>> MESOS-5676  New Mesos CLI                          Kevin
>> MESOS-4697  Unified Cgroups isolator               Haosdent, Jie
>> MESOS-6007  v1 API                                 Anand, Vinod
>> MESOS-3302  - // -
>> MESOS-4855  - // -
>> MESOS-4791  - // -
>> MESOS-4766  Allocator performance                  BenM
>> MESOS-4936  Container security                     Jie
>> MESOS-4936  Capabilities and container security    Benjamin Bannier, Jie
>> MESOS-3421  Shared resources                       Yan Xu
>> MESOS-5344  Partition awareness                    Neil
>> 
>> Below is the list of features marked as experimental in 1.0. Are they
>> ready to be promoted and called out in the CHANGELOG?
>> MESOS-4312  Power PC                  Vinod
>> MESOS-4828  XFS disk isolator         Yan Xu
>> MESOS-4641  Network CNI isolator      Qian, Jie
>> MESOS-3094  Mesos tasks on Windows    Joseph
>> MESOS-4355  Docker volume isolator    Guangya, Qian, Jie
>> 
>> This one has never even been called experimental. Joseph, is it time to do
>> so?
>> MESOS-898   CMake (never even declared experimental)    Joseph
>> 
>> Thanks in advance for cooperation,
>> Till and AlexR
>> 
>> On Fri, Oct 7, 2016 at 7:47 PM, Vinod Kone  wrote:
>> 
>>> I think you need to clean up the JIRA a bit.
>>> 
>>> 1) Make sure unresolved tickets do not have fix version (1.1.0) set.
>>> 2) Move "Fix version 1.1.0" to "Target version 1.1.0".
>>> 
>>> 2) might obviate the need for 1).
>>> 
>>> 
>>> 
>>> On Fri, Oct 7, 2016 at 7:24 AM, Till Toenshoff  wrote:
>>> 
>>>> Hi everyone!
>>>>
>>>> It's us, Alex and Till, who will be the Release Managers for 1.1.0!
>>>>
>>>> We are planning to cut the next release (1.1.0) within three workdays -
>>>> that would be Wednesday next week. So, if you have any patches that need
>>>> to get into 1.1.0, make sure that each one is either already in the
>>>> master branch or has a corresponding ticket with its target version set
>>>> to 1.1.0.
>>>>
>>>> The release dashboard:
>>>> https://issues.apache.org/jira/secure/Dashboard.jspa?selectPageId=12329720
>>>>
>>>> Alex & Till
>>>>
>>> 
>>> 
>> 



Design doc for rlimit support in Mesos

2016-10-14 Thread Benjamin Bannier
Hi,

we are interested in exposing user resource limits (rlimits) to Mesos so
executors can prepare environments for tasks with differing limit requirements.
The design doc can be found here:


https://docs.google.com/document/d/148og6TlknWIG2d-VmyCG01eliiOGhNEc12mG4TWsfHU/edit?usp=sharing

Feedback welcome!


Cheers,

Benjamin

Non-checkpointing frameworks

2016-10-14 Thread Neil Conway
Hi folks,

I'd like input from individuals who currently use frameworks but do
not enable checkpointing.

Background: "checkpointing" is a parameter that can be enabled in
FrameworkInfo; if enabled, the agent will write the framework PID,
executor PIDs, and status updates to disk for any tasks started by
that framework. This checkpointed information means that these tasks
can survive an agent crash: if the agent exits (whether due to
crashing or as part of an upgrade procedure), a restarted agent can
use this information to reconnect to executors started by the previous
instance of the agent. The downside is that checkpointing requires
some additional disk I/O at the agent.
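
A minimal sketch of what enabling this looks like, assuming the framework
talks to the master through the v1 scheduler HTTP API: the body of the
SUBSCRIBE call carries a FrameworkInfo whose `checkpoint` field is set. The
helper name `subscribe_call` and the user and framework names are
placeholders chosen for illustration only.

```python
import json


def subscribe_call(user, name, checkpoint=True):
    """Build the JSON body of a v1 scheduler API SUBSCRIBE call.

    `user` and `name` are placeholder values picked by the framework author.
    """
    return {
        "type": "SUBSCRIBE",
        "subscribe": {
            "framework_info": {
                "user": user,
                "name": name,
                # FrameworkInfo.checkpoint is false when the field is omitted.
                "checkpoint": checkpoint,
            }
        },
    }


if __name__ == "__main__":
    # Print the request body a checkpointing framework would send.
    print(json.dumps(subscribe_call("nobody", "example-framework"), indent=2))
```

Passing `checkpoint=False` (or leaving the field out of FrameworkInfo) gives
the non-checkpointing behaviour this thread is asking about.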

Checkpointing is not currently the default, but in my experience it is
often enabled for production frameworks. As part of the work on
supporting partition-aware Mesos frameworks (see MESOS-4049), we are
considering:

(a) requiring that partition-aware frameworks must also enable
checkpointing, and/or
(b) enabling checkpointing by default

If you have intentionally decided to disable checkpointing for your
Mesos framework, I'd be curious to hear more about your use-case and
why you haven't enabled it.

Thanks!

Neil


Re: On Mesos versioning and deprecation policy

2016-10-14 Thread Yan Xu
Thanks Alex for starting this!

In addition to the comments below, I think it'll be helpful to keep the
existing versioning doc concise and user-friendly while having a dedicated
doc for the "implementation details" where precise requirements and
procedures go. Maybe some duplication/cross-referencing is needed, but Mesos
developers will find the latter much more helpful while users/framework
developers will find the former easier to read.

e.g., a similar split:
https://github.com/kubernetes/kubernetes/blob/master/docs/api.md
https://github.com/kubernetes/kubernetes/blob/master/docs/devel/api_changes.md
(which has a lot of details on how the kubernetes community is thinking
about similar issues, which we can learn from)

Jiang Yan Xu 

On Wed, Oct 12, 2016 at 9:34 AM, Alex Rukletsov  wrote:

> Folks,
>
> There have been a bunch of online [1, 2] and offline discussions about our
> deprecation and versioning policy. I found that people—including
> myself—read the versioning doc [3] differently; moreover some aspects are
> not captured there. I would like to start a discussion around this topic by
> sharing my confusions and suggestions. This will hopefully help us stay on
> the same page and have similar expectations. The second goal is to
> eliminate ambiguities from the versioning doc (thanks Vinod for
> volunteering to update it).
>

+1 Let me know if there are things I can help with.


>
> 1. API vs. semantic changes.
> The current versioning guide treats features (e.g. flags, metrics,
> endpoints) and the API differently: incompatible changes for the former are
> allowed after a 6-month deprecation cycle, while for the latter they require
> bumping a major version. I suggest we consolidate these policies.
>

I feel that the distinction is not API vs. semantic changes. A
backwards-compatible API guarantee should imply backwards-compatible
semantics (of the API), i.e., if a change in the API doesn't cause the
message to be dropped on the floor but leads to a behavior change that causes
problems in the system, it still breaks compatibility.

IMO the distinction is more between:
- Compatibility between components that are impossible/very unpleasant to
upgrade in lockstep - high priority for compatibility guarantee.
- Compatibility between components that are generally bundled (modules) or
things that usually aren't built into automated tooling (e.g., the /state
endpoint) - more relaxed for now but we should explicitly exclude them from
the guarantee.


>
> We should also define and clearly explain what changes require bumping the
> major version. I have no strong opinion here and would love to hear what
> people think. The original motivation for maintaining backwards
> compatibility is to make sure vN schedulers can correctly work with vN API
> without being updated. But what about semantic changes that do not touch
> the API? For example, what if we decide to send fewer task health updates to
> schedulers based on some health policy? It influences the flow of task
> status updates; should such a change be considered compatible? Taking it to
> an extreme, we may not even be able to fix some bugs because someone may
> already rely on this behaviour!
>

API changes should warrant a major version bump. Also the API is not just
what the machine reads but all the documentation associated with it, right?
It depends on what the documentation says; what the user _should_ expect.

That said, I feel that these things are hard to talk about in the
abstract. Even with a guideline, we still need to make case-by-case
decisions. (e.g., has the documentation precisely defined this particular
behavior? If not, is it reasonable for the users to expect some behavior
because it's common sense? How bad is it if some behavior just changes a
tiny bit?) Therefore we need to make sure the process for API changes is
more rigorously defined.

Whether something is a bug depends on whether the API does what it says
it'll do. The line may sometimes be blurry but in general I don't feel it's
a problem. If someone is relying on the behavior that is a bug, we should
still help them fix it but the bug shouldn't count as "our guarantee".


>
> Another tightly related thing we should explicitly call out is
> upgradability and rollback capabilities inside a major release. Committing
> to this may significantly limit what we can change within a major release;
> on the other hand, it will give users more time and a better experience
> using and maintaining Mesos clusters.
>

According to the versioning doc, upgradability depends on whether you depend
on deprecated/removed features.

That paragraph should be explained more precisely:
- "deprecated" means your system won't break but warnings are shown. (Maybe
we should use some standard deprecation warning keywords so the operator
can monitor the log for such warnings!)
- "removed" means it may break.
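
The log-monitoring idea could look roughly like the sketch below. The
`DEPRECATED:` keyword is purely hypothetical (no such standard marker is
defined in this thread), and the sample log lines and the `--old_flag` name
are made up for illustration.

```python
# Hypothetical marker; not an existing Mesos logging convention.
DEPRECATION_KEYWORD = "DEPRECATED:"


def find_deprecation_warnings(log_lines, keyword=DEPRECATION_KEYWORD):
    """Return (line_number, message) pairs for lines containing the keyword."""
    hits = []
    for number, line in enumerate(log_lines, start=1):
        # partition() yields an empty separator when the keyword is absent.
        _, found, message = line.partition(keyword)
        if found:
            hits.append((number, message.strip()))
    return hits


if __name__ == "__main__":
    # Made-up log lines, only to exercise the function.
    sample = [
        "I1014 15:37:00 agent started",
        "W1014 15:37:01 DEPRECATED: flag --old_flag will be removed",
    ]
    for number, message in find_deprecation_warnings(sample):
        print(f"line {number}: {message}")
```

With a single agreed keyword, an operator's tooling only needs one pattern to
alert on, whatever component emits the warning.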

If you deprecate a flag/env that interfaces with operator tooling in the
next minor release, the operator basicall

Re: On Mesos versioning and deprecation policy

2016-10-14 Thread Yan Xu
On Fri, Oct 14, 2016 at 3:37 PM, Yan Xu  wrote:


Re: On Mesos versioning and deprecation policy

2016-10-14 Thread Vinod Kone
We will chat about this in the upcoming community sync (Thursday, 3 PM). So
please make sure to attend if you are interested.

On Fri, Oct 14, 2016 at 3:44 PM, Yan Xu  wrote:
