Excerpts from Matthew Treinish's message of 2016-06-16 13:56:31 -0400:
> On Thu, Jun 16, 2016 at 12:59:41PM -0400, Doug Hellmann wrote:
> > Excerpts from Matthew Treinish's message of 2016-06-15 19:27:13 -0400:
> > > On Wed, Jun 15, 2016 at 09:10:30AM -0400, Doug Hellmann wrote:
> > > > Excerpts from Chris Hoge's message of 2016-06-14 16:37:06 -0700:
> > > > > Top posting one note and direct comments inline, I’m proposing
> > > > > this as a member of the DefCore working group, but this
> > > > > proposal itself has not been accepted as the forward course of
> > > > > action by the working group. These are my own views as the
> > > > > administrator of the program and not that of the working group
> > > > > itself, which may independently reject the idea outside of the
> > > > > response from the upstream devs.
> > > > >
> > > > > I posted a link to this thread to the DefCore mailing list to make
> > > > > that working group aware of the outstanding issues.
> > > > >
> > > > > > On Jun 14, 2016, at 3:50 PM, Matthew Treinish <[email protected]> wrote:
> > > > > >
> > > > > > On Tue, Jun 14, 2016 at 05:42:16PM -0400, Doug Hellmann wrote:
> > > > > >> Excerpts from Matthew Treinish's message of 2016-06-14 15:12:45 -0400:
> > > > > >>> On Tue, Jun 14, 2016 at 02:41:10PM -0400, Doug Hellmann wrote:
> > > > > >>>> Excerpts from Matthew Treinish's message of 2016-06-14 14:21:27 -0400:
> > > > > >>>>> On Tue, Jun 14, 2016 at 10:57:05AM -0700, Chris Hoge wrote:
> > > > > >>>>>> Last year, in response to Nova micro-versioning and extension
> > > > > >>>>>> updates[1], the QA team added strict API schema checking to
> > > > > >>>>>> Tempest to ensure that no additional properties were added to
> > > > > >>>>>> Nova API responses[2][3].
> > > > > >>>>>> In the last year, at least three vendors participating in the
> > > > > >>>>>> OpenStack Powered Trademark program have been impacted by this
> > > > > >>>>>> change, two of which reported this to the DefCore Working Group
> > > > > >>>>>> mailing list earlier this year[4].
> > > > > >>>>>>
> > > > > >>>>>> The DefCore Working Group determines guidelines for the
> > > > > >>>>>> OpenStack Powered program, which includes capabilities with
> > > > > >>>>>> associated functional tests from Tempest that must be passed,
> > > > > >>>>>> and designated sections with associated upstream code [5][6].
> > > > > >>>>>> In determining these guidelines, the working group attempts to
> > > > > >>>>>> balance the future direction of development with lagging
> > > > > >>>>>> indicators of deployments and user adoption.
> > > > > >>>>>>
> > > > > >>>>>> After a tremendous amount of consideration, I believe that the
> > > > > >>>>>> DefCore Working Group needs to implement a temporary waiver for
> > > > > >>>>>> the strict API checking requirements that were introduced last
> > > > > >>>>>> year, to give downstream deployers more time to catch up with
> > > > > >>>>>> the strict micro-versioning requirements determined by the
> > > > > >>>>>> Nova/Compute team and enforced by the Tempest/QA team.
> > > > > >>>>>
> > > > > >>>>> I'm very much opposed to this being done. If we're actually
> > > > > >>>>> concerned with interoperability and verifying that things behave
> > > > > >>>>> in the same manner between multiple clouds then doing this would
> > > > > >>>>> be a big step backwards.
> > > > > >>>>> The fundamental disconnect here is that the vendors who have
> > > > > >>>>> implemented out of band extensions or were taking advantage of
> > > > > >>>>> previously available places to inject extra attributes believe
> > > > > >>>>> that doing so means they're interoperable, which is quite far
> > > > > >>>>> from reality. **The API is not a place for vendor
> > > > > >>>>> differentiation.**
> > > > > >>>>
> > > > > >>>> This is a temporary measure to address the fact that a large
> > > > > >>>> number of existing tests changed their behavior, rather than
> > > > > >>>> having new tests added to enforce this new requirement. The
> > > > > >>>> result is deployments that previously passed these tests may no
> > > > > >>>> longer pass, and in fact we have several cases where that's true
> > > > > >>>> with deployers who are trying to maintain their own standard of
> > > > > >>>> backwards-compatibility for their end users.
> > > > > >>>
> > > > > >>> That's not what happened though. The API hasn't changed and the
> > > > > >>> tests haven't really changed either. We made our enforcement on
> > > > > >>> Nova's APIs a bit stricter to ensure nothing unexpected appeared.
> > > > > >>> For the most part these tests work on any version of OpenStack.
> > > > > >>> (we only test it in the gate on supported stable releases, but I
> > > > > >>> don't expect things to have drastically shifted on older
> > > > > >>> releases) It also doesn't matter which version of the API you
> > > > > >>> run, v2.0 or v2.1. Literally, the only case it ever fails is when
> > > > > >>> you run something extra, not from the community, either as an
> > > > > >>> extension (which themselves are going away [1]) or another
> > > > > >>> service that wraps nova or imitates nova.
> > > > > >>> I'm personally not comfortable saying those extras are ever part
> > > > > >>> of the OpenStack APIs.
> > > > > >>>
> > > > > >>>> We have basically three options.
> > > > > >>>>
> > > > > >>>> 1. Tell deployers who are trying to do the right thing for their
> > > > > >>>>    immediate users that they can't use the trademark.
> > > > > >>>>
> > > > > >>>> 2. Flag the related tests or remove them from the DefCore
> > > > > >>>>    enforcement suite entirely.
> > > > > >>>>
> > > > > >>>> 3. Be flexible about giving consumers of Tempest time to meet the
> > > > > >>>>    new requirement by providing a way to disable the checks.
> > > > > >>>>
> > > > > >>>> Option 1 goes against our own backwards compatibility policies.
> > > > > >>>
> > > > > >>> I don't think backwards compatibility policies really apply to
> > > > > >>> what we define as the set of tests that as a community we are
> > > > > >>> saying a vendor has to pass to say they're OpenStack. From my
> > > > > >>> perspective as a community we either take a hard stance on this
> > > > > >>> and say to be considered an interoperable cloud (and to get the
> > > > > >>> trademark) you have to actually have an interoperable product. We
> > > > > >>> slowly ratchet up the requirements every 6 months, there isn't
> > > > > >>> any implied backwards compatibility in doing that. You passed in
> > > > > >>> the past but not in the newer stricter guidelines.
> > > > > >>>
> > > > > >>> Also, even if I did think it applied, we're not talking about a
> > > > > >>> change which would fall into breaking that.
> > > > > >>> The change was introduced a year and a half ago during kilo and
> > > > > >>> landed a year ago during liberty:
> > > > > >>>
> > > > > >>> https://review.openstack.org/#/c/156130/
> > > > > >>>
> > > > > >>> That's way longer than our normal deprecation period of 3 months
> > > > > >>> and a release boundary.
> > > > > >>>
> > > > > >>>>
> > > > > >>>> Option 2 gives us no winners and actually reduces the
> > > > > >>>> interoperability guarantees we already have in place.
> > > > > >>>>
> > > > > >>>> Option 3 applies our usual community standard of slowly rolling
> > > > > >>>> forward while maintaining compatibility as broadly as possible.
> > > > > >>>
> > > > > >>> Except in this case there isn't actually any compatibility being
> > > > > >>> maintained. We're saying that we can't make the requirements for
> > > > > >>> interoperability testing stricter until all the vendors who were
> > > > > >>> passing in the past are able to pass the stricter version.
> > > > > >>>
> > > > > >>>>
> > > > > >>>> No one is suggesting that a permanent, or even open-ended,
> > > > > >>>> exception be granted.
> > > > > >>>
> > > > > >>> Sure, I agree a permanent or open-ended exception would be even
> > > > > >>> worse. But, I still think as a community we need to draw a hard
> > > > > >>> line in the sand here. Just because this measure is temporary
> > > > > >>> doesn't make it any more palatable.
> > > > > >>>
> > > > > >>> By doing this, even as a temporary measure, we're saying it's ok
> > > > > >>> to call things an OpenStack API when you add random gorp to the
> > > > > >>> responses. Which is something we've very clearly said as a
> > > > > >>> community is the exact opposite of the case, which the testing
> > > > > >>> reflects.
> > > > > >>> I still contend just because some vendors were running old
> > > > > >>> versions of tempest and old versions of openstack where their
> > > > > >>> incompatible API changes weren't caught doesn't mean they should
> > > > > >>> be given a pass now.
> > > > > >>
> > > > > >> Nobody is saying random gorp is OK, and I'm not sure "line in the
> > > > > >> sand" rhetoric is really constructive. The issue is not with the
> > > > > >> nature of the API policies, it's with the implementation of those
> > > > > >> policies and how they were rolled out.
> > > > > >>
> > > > > >> DefCore defines its rules using named tests in Tempest. If these
> > > > > >> new enforcement policies had been applied by adding new tests to
> > > > > >> Tempest, then DefCore could have added them using its processes
> > > > > >> over a period of time and we wouldn't have had any issues. That's
> > > > > >> not what happened. Instead, the behavior of a bunch of *existing*
> > > > > >> tests changed. As a result, deployments that have not changed fail
> > > > > >> tests that they used to pass, without any action being taken on the
> > > > > >> deployer's part. We've moved the goal posts on our users in a way
> > > > > >> that was not easily discoverable, because it couldn't be tracked
> > > > > >> through the (admittedly limited) process we have in place for doing
> > > > > >> that tracking.
> > > > > >>
> > > > > >> So, we want a way to get the test results back to their existing
> > > > > >> status, which will then let us roll adoption forward smoothly
> > > > > >> instead of lurching from "pass" to "fail" to "pass".
> > > > > >
> > > > > > It doesn't have to be a bright line pass or fail. My primary
> > > > > > concern here is that making this change is basically saying we're
> > > > > > going to let things "pass" when running out of tree stuff that's
> > > > > > adding arbitrary fields to the response.
> > > > > > This isn't really interoperable and isn't being honest with what
> > > > > > the vendor clouds are actually doing. It would hide the truth from
> > > > > > the people who rely on these results to determine interoperability.
> > > > > > The proposal as I read it (and maybe it's my misconception) was to
> > > > > > mask this and vendor clouds "pass" until they can fix it, which
> > > > > > essentially hides the issue. Especially given there are a lot of
> > > > > > clouds and products that don't have any issue here.
> > > > >
> > > > > The opposite is the intention of this proposal. It’s a compromise
> > > > > that admits that since the introduction of the OpenStack Powered
> > > > > program, and the release of this strict checking on additional
> > > > > properties, vendors that once passed now fail, and the incentives to
> > > > > force that change didn’t start being felt until they hit their
> > > > > product renewal cycle.
> > > > >
> > > > > It’s not trying to mask anything, to the contrary by bringing it up
> > > > > here and stating their public test results would indicate which APIs
> > > > > send additional properties back, it’s shining a light on the issue
> > > > > and publicly stating that it’s not an acceptable long-term solution.
> > > > >
> > > > > > But, if we add another possible state on the defcore side like
> > > > > > conditional pass, warning, yellow, etc. (the name doesn't matter)
> > > > > > which is used to indicate that things on product X could only pass
> > > > > > when strict validation was disabled (and be clear about where and
> > > > > > why) then my concerns would be alleviated. I just do not want this
> > > > > > to end up not being visible to end users trying to evaluate
> > > > > > interoperability of different clouds using the test results.
> > > > >
> > > > > The OpenStack Marketplace is where these comparisons would happen,
> > > > > and the APIs with additional response data would be stated.
> > > > >
> > > > > >>
> > > > > >> We should, separately, address the process issues and the
> > > > > >> limitations this situation has exposed. That may mean changing the
> > > > > >> way DefCore defines its policies, or tracks things, or uses
> > > > > >> Tempest. For example, in the future, we may want to tie versions of
> > > > > >> Tempest to versions of the trademark more closely, so that it's
> > > > > >> possible for someone running the Mitaka version of OpenStack to
> > > > > >> continue to use the Mitaka version of Tempest and not have to
> > > > > >> upgrade Tempest in order to retain their trademark (maybe that's
> > > > > >> how it already works?).
> > > > > >
> > > > > > Tempest master supports all currently supported stable branches. So
> > > > > > right now any commit to master is tested against a master cloud, a
> > > > > > mitaka cloud, and a liberty cloud in the gate. We tag/push a release
> > > > > > whenever we add or drop support for a release, the most recent being
> > > > > > dropping kilo. [1][2] That being said the openstack apis **should**
> > > > > > be backwards compatible so ideally master tempest would work fine on
> > > > > > older clouds. (although this might not be reality) The primary
> > > > > > wrinkle here is the tests which would depend on feature flags to
> > > > > > indicate a feature's availability on newer versions. We eventually
> > > > > > remove flags after all supported releases have a given feature. But,
> > > > > > this can be worked around with test selection. (ie don't even try to
> > > > > > run tests that require a feature juno didn’t have)
> > > > >
> > > > > The current active guidelines cover icehouse through mitaka.
> > > > > The release of 2016.08 will change that to cover juno through mitaka
> > > > > (with newton as an add-on to 2016.08 when it’s released). There’s
> > > > > overlap between the guidelines, so 2016.01 covers juno through mitaka
> > > > > while 2016.08 will cover kilo through newton. Essentially two years
> > > > > of releases.
> > > > >
> > > > > >> We may also need to consider that test implementation details may
> > > > > >> change, and have a review process within DefCore to help expose
> > > > > >> those changes to make them clearer to deployers.
> > > > > >>
> > > > > >> Fixing the process issue may also mean changing the way we
> > > > > >> implement things in Tempest. In this case, adding a flag helps move
> > > > > >> ahead more smoothly. Perhaps we adopt that as a general policy in
> > > > > >> the future when we make underlying behavioral changes like this to
> > > > > >> existing tests. Perhaps instead we have a policy that we do not
> > > > > >> change the behavior of existing tests in such significant ways, at
> > > > > >> least if they're tagged as being used by DefCore. I don't know --
> > > > > >> those are things we need to discuss.
> > > > > >
> > > > > > Sure I agree, this thread raises larger issues which need to be
> > > > > > figured out. But, that is probably an independent discussion.
> > > > >
> > > > > I’m beginning to wonder if we need to make DefCore use release
> > > > > branches then back-port bug-fixes and relevant feature additions
> > > > > as necessary.
> > > >
> > > > We should definitely have that conversation, to understand what
> > > > effect it would have both on Tempest and on DefCore.
> > > >
> > >
> > > While from a quick glance this would seem like it would solve some of the
> > > problems when you start to dig into it you'll see that it actually
> > > wouldn't, and would just end up causing more issues in the long run.
> > > Branchless tempest was originally started back at the icehouse release
> > > and was implemented to actually enforce the API is the same across
> > > release boundaries. We had hit many
> >
> > The guarantees we're trying to make in our CI system and the needs
> > DefCore has are slightly different in this regard. It sounds like
> > they're still needing to test against versions that we're no longer
> > supporting, while also avoiding changing the rules on those older
> > clouds.
>
> Right, the crux of the problem here is defcore is trying to support
> something we stopped supporting in the community. However, the actual thing
> being checked in both use cases is actually the same; the API is the same
> regardless of the cloud run against. (which includes different versions as
> well as different deployment choices) It's just a conflict between our
> upstream support windows and what defcore says they support.
>
> >
> > I don't think it's appropriate to create stable/$series branches
> > in the Tempest repository, for all of the reasons you stated in
> > your email. It might be appropriate to create defcore/$version
> > branches, if we think we need to support backporting changes for
> > some reason. If not, simply creating defcore-$version tags would
> > give them a way to get a consistent version of Tempest that worked
> > with older versions of OpenStack.
>
> This actually doesn't solve the problem, which is what my second paragraph
> addressed (which got lost in the snip) and is where my issue with doing
> branching or custom tagging lies. When we tag a release to mark a stable
> branch's EOL there isn't any infrastructure to run tests against that
> branch at all anymore. It's gone, the stable branches of the projects are
> deleted, we remove the devstack branch, the g-r branch, etc. all the
> workarounds we had to put in place to keep things working over the stable
> support window go away.
> That's something we're never going to ever maintain after a branch's EOL.
> The only point to doing a separate branch would be to support running
> against an EOL branch, but you couldn't actually test that, you'd just be
> merging "backports" blindly. That's *not* something we do in openstack. For
> all the releases where we have support, master tempest as well as past tags
> support running against those clouds.
>
> There also isn't a reason to add additional tags, because we already have
> the support milestones tagged. What defcore should be doing is specifying a
> version range (well really just a min version) to match up with what they
> say is ok to be running.
>
> So if they want the LCD for kilo, liberty, and mitaka it would be:
>
> http://git.openstack.org/cgit/openstack/tempest/tag/?h=12.0.0
>
> for juno, kilo, and liberty it would be:
>
> http://git.openstack.org/cgit/openstack/tempest/tag/?h=8
>
> But, as I said in an earlier email the API shouldn't really be changing
> under this model (and even if it did things would not diverge very quickly)
> So:
>
> http://git.openstack.org/cgit/openstack/tempest/tag/?h=11.0.0
>
> will likely work against juno, kilo, liberty, and mitaka.[1] The only thing
> that would potentially be missing are feature flags in the tempest config
> to skip tests for features that didn't exist in juno.[2] However, we just
> can't test it against juno because that branch was EOL when 11.0.0 was
> pushed and the infrastructure for running against juno was gone.
>
> The reverse also should be true, and old versions of tempest should work
> fine against newer clouds, we just can't and don't test that. What we
> outline and I try to make very clear in the release notes is that when we
> say supports a version that means testing against it in the gate. If the
> API is truly a stable interface then it should work against any cloud,
> aside from the new features thing I mentioned before.
> (which by the way is why microversions are awesome, because it solves that
> problem)
>
> [1] It's also worth noting that the strict API validation which prompted
> this thread was included in all of these releases. It was verified working
> on kilo, juno, and **icehouse** before it could land:
>
> https://review.openstack.org/#/c/156130/
>
> [2] But, that wouldn't actually matter for the defcore use case because
> they specify running a subset of tests that by definition can't include
> those. (otherwise they wouldn't actually support juno)
>
> >
> > There shouldn't ever be a need to run those older versions of Tempest
> > with newer clouds, and we should ensure there is a policy that
> > validation must happen using a version of Tempest no older than the
> > version of OpenStack to ensure that as we move ahead with new
> > capabilities, compatibility checks, etc. new deployments are validated
> > properly.
>
> As someone running defcore on their product trying to get the certification
> this is probably true. So they should be setting a min version for passing
> the certification. Which they do:
>
> https://git.openstack.org/cgit/openstack/defcore/tree/2016.01.json#n111
>
> It's just shown as the sha1 not tempest 4:
>
> http://git.openstack.org/cgit/openstack/tempest/tag/?h=4
>
> But for developing tempest (even for a hypothetical defcore branch of
> tempest) it is not. You need to be able to use old clients with new
> versions of the projects, otherwise you've failed in your goal of
> maintaining API stability and interoperability. The code should be verified
> against all the versions you're supporting.
>
>
> -Matt Treinish

I think all of that is saying something like what I was proposing,
except that the tags they need already exist. Is that right?

I don't think DefCore actually needs to change old versions of Tempest,
but maybe Chris or Mark can verify that?

Doug
