+1 for the summary. It is clear to me now.

On Sat, Oct 29, 2016 at 6:45 AM, Vinod Kone <vinodk...@apache.org> wrote:
> We had an extended discussion around this in the last community sync. Thanks to those who participated!
>
> To sum up the discussion:
>
> --> As Mesos devs, we should strive to not make incompatible changes in APIs, flags, environment variables.
>
> --> In the rare case where an incompatible change is preferred (e.g., code complexity), we should give users a clear 6-month heads-up that a breaking change is going to take place.
>
> --> Breaking changes do not necessitate a major version bump. This is because we want to allow live upgrades between major versions (e.g., 1.10 to 2.0).
>
> --> Compatibility guarantees do not apply to experimental features (incl. APIs).
>
> --> We need to have clear documentation about the procedure that devs could follow when deprecating/removing stable features and adding experimental features.
>
> --> We need to improve upgrades.md to make it easy for operators to know what features are deprecated/removed between versions X and Y.
>
> --> We should decouple internal protos used by Mesos from the unversioned protos used by driver-based frameworks.
>
> I will spend some time in the next few weeks to create/update the documentation reflecting these points.
>
> Anything else I missed?
>
> Thanks,
>
> On Sat, Oct 15, 2016 at 11:47 AM, haosdent <haosd...@gmail.com> wrote:
>
> > Thanks for @yan's great input! I agree with almost all of it.
> >
> > > Also the API is not just what the machine reads but all the documentation associated with it, right? It depends on what the documentation says; what the user _should_ expect.
> >
> > I think different users may have different expectations. And the person who developed an API may understand it differently from some users as well. Our documentation should cover most cases.
> >
> > But if we forgot to state something explicitly in the documentation, should we give up on updating the API?
> > It is like user Alice saying this is a BUG while user Bob says it is a feature. I think we still need to raise such changes case by case to ensure most users are not affected by breaking API changes.
> >
> > On Sat, Oct 15, 2016 at 6:55 AM, Vinod Kone <vinodk...@apache.org> wrote:
> >
> > > We will chat about this in the upcoming community sync (Thursday 3 PM). So, please make sure to attend if you are interested.
> > >
> > > On Fri, Oct 14, 2016 at 3:44 PM, Yan Xu <xuj...@apple.com> wrote:
> > >
> > >>
> > >> On Fri, Oct 14, 2016 at 3:37 PM, Yan Xu <xuj...@apple.com> wrote:
> > >>
> > >>> Thanks Alex for starting this!
> > >>>
> > >>> In addition to comments below, I think it'll be helpful to keep the existing versioning doc concise and user-friendly while having a dedicated doc for the "implementation details" where precise requirements and procedures go. Maybe some duplication/cross-referencing is needed but Mesos developers will find the latter much more helpful while users and framework developers will find the former easy to read.
> > >>>
> > >>> e.g., a similar split:
> > >>> https://github.com/kubernetes/kubernetes/blob/master/docs/api.md
> > >>> https://github.com/kubernetes/kubernetes/blob/master/docs/devel/api_changes.md (which has a lot of details on how the kubernetes community is thinking about similar issues, which we can learn from)
> > >>>
> > >>> Jiang Yan Xu
> > >>>
> > >>> On Wed, Oct 12, 2016 at 9:34 AM, Alex Rukletsov <a...@mesosphere.com> wrote:
> > >>>
> > >>>> Folks,
> > >>>>
> > >>>> There have been a bunch of online [1, 2] and offline discussions about our deprecation and versioning policy. I found that people, including myself, read the versioning doc [3] differently; moreover some aspects are not captured there.
> > >>>> I would like to start a discussion around this topic by sharing my confusions and suggestions. This will hopefully help us stay on the same page and have similar expectations. The second goal is to eliminate ambiguities from the versioning doc (thanks Vinod for volunteering to update it).
> > >>>>
> > >>>
> > >>> +1 Let me know if there are things I can help with.
> > >>>
> > >>>>
> > >>>> 1. API vs. semantic changes.
> > >>>> The current versioning guide treats features (e.g., flags, metrics, endpoints) and the API differently: incompatible changes for the former are allowed after a 6-month deprecation cycle, while for the latter they require bumping the major version. I suggest we consolidate these policies.
> > >>>>
> > >>>
> > >>> I feel that the distinction is not API vs. semantic changes. A backwards-compatible API guarantee should imply backwards-compatible semantics (of the API), i.e., if a change in the API doesn't cause the message to be dropped on the floor but leads to a behavior change that causes problems in the system, it still breaks compatibility.
> > >>>
> > >>> IMO the distinction is more between:
> > >>> - Compatibility between components that are impossible/very unpleasant to upgrade in lockstep - high priority for compatibility guarantee.
> > >>> - Compatibility between components that are generally bundled (modules) or things that usually aren't built into automated tooling (e.g., the /state endpoint) - more relaxed for now but we should explicitly exclude them from the guarantee.
> > >>>
> > >>>>
> > >>>> We should also define and clearly explain what changes require bumping the major version. I have no strong opinion here and would love to hear what people think.
> > >>>> The original motivation for maintaining backwards compatibility is to make sure vN schedulers can correctly work with the vN API without being updated. But what about semantic changes that do not touch the API? For example, what if we decide to send fewer task health updates to schedulers based on some health policy? It influences the flow of task status updates; should such a change be considered compatible? Taking it to an extreme, we may not even be able to fix some bugs because someone may already rely on this behaviour!
> > >>>>
> > >>>
> > >>> API changes should warrant a major version bump. Also the API is not just what the machine reads but all the documentation associated with it, right? It depends on what the documentation says; what the user _should_ expect.
> > >>>
> > >>> That said, I feel that these things are hard to talk about in the abstract. Even with a guideline, we still need to make case-by-case decisions. (e.g., has the documentation precisely defined this behavior? If not, is it reasonable for the users to expect some behavior because it's common sense? How bad is it if some behavior just changes a tiny bit?) Therefore we need to make sure the process for API changes is more rigorously defined.
> > >>>
> > >>> Whether something is a bug depends on whether the API does what it says it'll do. The line may sometimes be blurry but in general I don't feel it's a problem. If someone is relying on behavior that is a bug, we should still help them fix it but the bug shouldn't count as "our guarantee".
> > >>>
> > >>>>
> > >>>> Another tightly related thing we should explicitly call out is upgradability and rollback capabilities inside a major release.
> > >>>> Committing to this may significantly limit what we can change within a major release; on the other hand it will give users more time and a better experience using and maintaining Mesos clusters.
> > >>>>
> > >>>
> > >>> According to the versioning doc, upgradability depends on whether you depend on deprecated/removed features.
> > >>>
> > >>> That paragraph should be explained more precisely:
> > >>> - "deprecated" means your system won't break but warnings are shown. (Maybe we should use some standard deprecation warning keywords so the operator can monitor the log for such warnings!)
> > >>> - "removed" means it may break.
> > >>>
> > >>> If you deprecate a flag/env variable that interfaces with operator tooling in the next minor release, the operator basically has 6 months from the next minor release to change her tooling. I feel this is pretty acceptable.
> > >>> If you deprecate a flag/env variable that interfaces with the framework (executor) in the next minor release, I feel it may not be enough and it probably warrants a major version bump. So perhaps the API shouldn't be just the protos.
> > >>>
> > >>>> 2. Versioned vs. unversioned protobufs.
> > >>>> Currently we have v1 and unnamed protobufs, which simultaneously mean v0, v2, and internal. I am sometimes confused about what is the right way to update or introduce a field or message there; do people feel the same? How about splitting the unnamed version into explicit v0, v2, and internal?
> > >>>>
> > >>>
> > >>> As haosdent mentioned, we have captured this in MESOS-6268. The benefit is clear but I guess people will be more motivated when we find some v2 feature can't be made compatible with the v0 API. (Anand's point in MESOS-6016).
> > >>> On the other hand, if we cut v0 API access before that happens (is the v0 API obsolete and should it be removed 6 months after 1.0?) then we don't need to worry about v0 and can use unversioned protos as "internal"?
> > >>>
> > >>>> Food for thought. It would be great if we could maintain only "diffs" to the internal protobufs in the code, instead of duplicating them altogether.
> > >>>>
> > >>>> 3. API and feature labelling.
> > >>>> I suggest introducing explicit labels for API and features, to ensure users have the right assumptions about their lifetime while engineers have the ability to change a WIP feature in a non-compatible way. I propose the following:
> > >>>> API: stable, non-stable, pure (not used by Mesos components)
> > >>>> Feature: experimental, normal.
> > >>>>
> > >>>
> > >>> +1 on formalizing the terminologies.
> > >>>
> > >>> Historically the distinction is not clear for the following:
> > >>>
> > >>> 1. The API has no compatibility guarantee at all.
> > >>> 2. The feature provided by this API is experimental.
> > >>>
> > >>
> > >> To add to this point: because 2) logically doesn't apply to the "pure (not used by Mesos components)" fields in the API, it could be more confusing and thus requires a more precise definition.
> > >>
> > >>>
> > >>> IMO it's OK that we say that we don't distinguish the two (the API has no compatibility guarantee until the feature is fully released) but we have to make it clear.
> > >>> If we don't make such a distinction, ALL API additions should be marked as unstable first and be changed to stable later (as a formal process).
> > >>>
> > >>>>
> > >>>> Looking forward to your thoughts and suggestions.
> > >>>> AlexR
> > >>>>
> > >>>> [1] https://www.mail-archive.com/user@mesos.apache.org/msg08025.html
> > >>>> [2] https://www.mail-archive.com/dev@mesos.apache.org/msg36621.html
> > >>>> [3] https://github.com/apache/mesos/blob/b2beef37f6f85a8c75e968136caa7a1f292ba20e/docs/versioning.md
> > >>>>
> >
> > --
> > Best Regards,
> > Haosdent Huang
>

--
Best Regards,
Haosdent Huang
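
On the "standard deprecation warning keywords" idea from Yan's reply: a minimal sketch, in Python, of how an operator might watch a log for such warnings. The thread does not name a keyword, so the marker DEPRECATED, the function name find_deprecation_warnings, and the sample log lines are all assumptions made up for illustration; none of this reflects actual Mesos log output.

```python
import re

# Hypothetical marker: the thread proposes standard keywords but names none.
DEPRECATION_MARKER = re.compile(r"\bDEPRECATED\b")


def find_deprecation_warnings(log_lines):
    """Return (line_number, text) pairs for lines that carry the marker."""
    return [
        (number, line.rstrip("\n"))
        for number, line in enumerate(log_lines, start=1)
        if DEPRECATION_MARKER.search(line)
    ]


# Invented sample lines, only to exercise the function.
sample_log = [
    "INFO starting up\n",
    "WARNING DEPRECATED: flag --example_old_flag is deprecated\n",
    "INFO ready\n",
]

for number, text in find_deprecation_warnings(sample_log):
    print(f"line {number}: {text}")
```

A single agreed-upon marker would keep this kind of check trivial; if warnings were worded differently per feature, operators would need one pattern per feature instead.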