Thanks Alex for starting this!

In addition to the comments below, I think it'll be helpful to keep the existing versioning doc concise and user-friendly while having a dedicated doc for the "implementation details" where precise requirements and procedures go. Some duplication/cross-referencing may be needed, but Mesos developers will find the latter much more helpful while users and framework developers will find the former easy to read.
e.g., a similar split:
https://github.com/kubernetes/kubernetes/blob/master/docs/api.md
https://github.com/kubernetes/kubernetes/blob/master/docs/devel/api_changes.md
(which has a lot of details on how the Kubernetes community is thinking about similar issues, which we can learn from)

Jiang Yan Xu

On Wed, Oct 12, 2016 at 9:34 AM, Alex Rukletsov <a...@mesosphere.com> wrote:

> Folks,
>
> There have been a bunch of online [1, 2] and offline discussions about our
> deprecation and versioning policy. I found that people, including
> myself, read the versioning doc [3] differently; moreover some aspects are
> not captured there. I would like to start a discussion around this topic by
> sharing my confusions and suggestions. This will hopefully help us stay on
> the same page and have similar expectations. The second goal is to
> eliminate ambiguities from the versioning doc (thanks Vinod for
> volunteering to update it).
>

+1 Let me know if there are things I can help with.

>
> 1. API vs. semantic changes.
> The current versioning guide treats features (e.g. flags, metrics, endpoints)
> and API differently: incompatible changes for the former are allowed after
> a 6 month deprecation cycle, while for the latter they require bumping a
> major version. I suggest we consolidate these policies.
>

I feel that the distinction is not API vs. semantic changes. A backwards-compatible API guarantee should imply backwards-compatible semantics (of the API), i.e., if a change in the API doesn't cause the message to be dropped on the floor but leads to a behavior change that causes problems in the system, it still breaks compatibility.

IMO the distinction is more between:
- Compatibility between components that are impossible/very unpleasant to upgrade in lockstep - high priority for compatibility guarantee.
- Compatibility between components that are generally bundled (modules) or things that usually aren't built into automated tooling (e.g., the /state endpoint) - more relaxed for now but we should explicitly exclude them from the guarantee.

>
> We should also define and clearly explain what changes require bumping the
> major version. I have no strong opinion here and would love to hear what
> people think. The original motivation for maintaining backwards
> compatibility is to make sure vN schedulers can correctly work with vN API
> without being updated. But what about semantic changes that do not touch
> the API? For example, what if we decide to send fewer task health updates to
> schedulers based on some health policy? It influences the flow of task
> status updates, should such a change be considered compatible? Taking it to
> an extreme, we may not even be able to fix some bugs because someone may
> already rely on this behaviour!
>

API changes should warrant a major version bump. Also, the API is not just what the machine reads but all the documentation associated with it, right? So whether a semantic change is compatible depends on what the documentation says, i.e., what the user _should_ expect.

That said, I feel that these things are hard to talk about in the abstract. Even with a guideline, we still need to make case-by-case decisions (e.g., has the documentation precisely defined this behavior? If not, is it reasonable for users to expect some behavior because it's common sense? How bad is it if some behavior changes just a tiny bit?). Therefore we need to make sure the process for API changes is more rigorously defined.

Whether something is a bug depends on whether the API does what it says it'll do. The line may sometimes be blurry but in general I don't feel it's a problem. If someone is relying on behavior that is a bug, we should still help them fix it but the bug shouldn't count as "our guarantee".
>
> Another tightly related thing we should explicitly call out is
> upgradability and rollback capabilities inside a major release. Committing
> to this may significantly limit what we can change within a major release;
> on the other side it will give users more time and a better experience
> about using and maintaining Mesos clusters.
>

According to the versioning doc, upgradability depends on whether you depend on deprecated/removed features. That paragraph should be explained more precisely:
- "deprecated" means your system won't break but warnings are shown. (Maybe we should use some standard deprecation warning keywords so the operator can monitor the log for such warnings!)
- "removed" means it may break.

If you deprecate a flag/env variable that interfaces with operator tooling in the next minor release, the operator basically has 6 months from that release to change her tooling. I feel this is pretty acceptable.

If you deprecate a flag/env variable that interfaces with the framework (executor) in the next minor release, I feel 6 months may not be enough and it probably warrants a major version bump. So perhaps the API shouldn't be just the protos.

> 2. Versioned vs. unversioned protobufs.
> Currently we have v1 and unnamed protobufs, which simultaneously mean v0,
> v2, and internal. I am sometimes confused about what is the right way to
> update or introduce a field or message there, do people feel the same? How
> about splitting the unnamed version into explicit v0, v2, and internal?
>

As haosdent mentioned, we have captured this in MESOS-6268. The benefit is clear, but I guess people will be more motivated once we find some v2 feature that can't be made compatible with the v0 API (Anand's point in MESOS-6016). On the other hand, if we cut v0 API access before that happens (is the v0 API obsolete, and should it be removed 6 months after 1.0?) then we don't need to worry about v0 and can use the unversioned protos as "internal"?

> Food for thought. It would be great if we can only maintain "diffs" to the
> internal protobufs in the code, instead of duplicating them altogether.
>
> 3. API and feature labelling.
> I suggest to introduce explicit labels for API and features, to ensure
> users have the right assumptions about their lifetime while engineers
> have the ability to change a wip feature in a non-compatible way. I
> propose the following:
> API: stable, non-stable, pure (not used by Mesos components)
> Feature: experimental, normal.
>

+1 on formalizing the terminology. Historically the distinction between the following two has not been clear:
1. The API has no compatibility guarantee at all.
2. The feature provided by this API is experimental.

IMO it's OK to say that we don't distinguish the two (the API has no compatibility guarantee until the feature is fully released), but we have to make it clear. If we don't make such a distinction, ALL API additions should be marked as unstable first and be changed to stable later (as a formal process).

>
> Looking forward to your thoughts and suggestions.
> AlexR
>
> [1] https://firstname.lastname@example.org/msg08025.html
> [2] https://email@example.com/msg36621.html
> [3] https://github.com/apache/mesos/blob/b2beef37f6f85a8c75e968136caa7a1f292ba20e/docs/versioning.md
>
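
To make the deprecation-keyword idea from the upgradability comment a bit more concrete, here is a rough sketch of how operator tooling could scan a log for such warnings. Everything specific in it is an assumption for illustration only: the `DEPRECATED` keyword, the sample log lines, and the `--foo`/`--bar` flags are made up, since this thread only proposes having a standard keyword and does not define one.

```python
import re
from typing import Iterable, List

# Hypothetical keyword: the thread only suggests that a standard
# deprecation keyword should exist, it does not name one.
DEPRECATION_PATTERN = re.compile(r"\bDEPRECATED\b:?\s*(?P<detail>.*)")


def find_deprecation_warnings(lines: Iterable[str]) -> List[str]:
    """Return the text following the keyword for every matching log line."""
    hits = []
    for line in lines:
        match = DEPRECATION_PATTERN.search(line)
        if match:
            hits.append(match.group("detail").strip())
    return hits


if __name__ == "__main__":
    # Made-up sample log lines, not real Mesos output.
    sample_log = [
        "I1012 09:34:00 Starting up",
        "W1012 09:34:01 DEPRECATED: flag --foo will be removed; use --bar",
    ]
    for detail in find_deprecation_warnings(sample_log):
        print(detail)
```

With a single agreed keyword, the operator only needs one pattern to catch every deprecation notice during the 6 month window, instead of tracking the wording of each individual warning.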