On Thu, Jul 3, 2014 at 6:12 AM, Salvatore Orlando <sorla...@nicira.com> wrote:
> Apologies for quoting again the top post of the thread.
>
> Comments inline (mostly thinking aloud)
> Salvatore
>
> On 30 June 2014 22:22, Jay Pipes <jaypi...@gmail.com> wrote:
>>
>> Hi Stackers,
>>
>> Some recent ML threads [1] and a hot IRC meeting today [2] brought up
>> some legitimate questions around how a newly-proposed Stackalytics
>> report page for Neutron External CI systems [3] represented the
>> results of an external CI system as "successful" or not.
>>
>> First, I want to say that Ilya and all those involved in the
>> Stackalytics program simply want to provide the most accurate
>> information to developers in a format that is easily consumed. While
>> there need to be some changes in how data is shown (and the wording of
>> things like "Tests Succeeded"), I hope that the community knows there
>> isn't any ill intent on the part of Mirantis or anyone who works on
>> Stackalytics. OK, so let's keep the conversation civil -- we're all
>> working towards the same goals of transparency and accuracy. :)
>>
>> Alright, now, Anita and Kurt Taylor were asking a very poignant
>> question:
>>
>> "But what does CI tested really mean? just running tests? or tested to
>> pass some level of requirements?"
>>
>> In this nascent world of external CI systems, we have a set of issues
>> that we need to resolve:
>>
>> 1) All of the CI systems are different.
>>
>> Some run Bash scripts. Some run Jenkins slaves and devstack-gate
>> scripts. Others run custom Python code that spawns VMs and publishes
>> logs to some public domain.
>>
>> As a community, we need to decide whether it is worth putting in the
>> effort to create a single, unified, installable and runnable CI
>> system, so that we can legitimately say "all of the external systems
>> are identical, with the exception of the driver code for vendor X
>> being substituted in the Neutron codebase."
>
> I think such a system already exists, and it's documented here:
> http://ci.openstack.org/
> Still, understanding it is quite a learning curve, and running it is
> not exactly straightforward. But I guess that's pretty much
> understandable given the complexity of the system, isn't it?
>
>> If the goal of the external CI systems is to produce reliable,
>> consistent results, I feel the answer to the above is "yes", but I'm
>> interested to hear what others think. Frankly, in the world of
>> benchmarks, it would be unthinkable to say "go ahead and everyone run
>> your own benchmark suite", because you would get wildly different
>> results. A similar problem has emerged here.
>
> I don't think the particular infrastructure, which might range from an
> openstack-ci clone to a 100-line bash script, has much impact on the
> reliability of the quality assessment of a particular driver or
> plugin. That is determined, in my opinion, by the quantity and nature
> of the tests one runs against a specific driver. In Neutron, for
> instance, there is a wide range of choices, from a few test cases in
> tempest.api.network to the full smoketest job. As long as there is no
> minimal standard here, it will be difficult to assess the quality of
> the evaluation from a CI system, unless we explicitly take coverage
> into account in the evaluation.
>
> On the other hand, different CI infrastructures will differ in the
> percentage of patches tested and the percentage of infrastructure
> failures. I think it might not be a terrible idea to use these
> parameters to evaluate how good a CI is from an infra standpoint.
> However, there are still open questions. For instance, a CI might have
> a low patch % score because it only needs to test patches affecting a
> given driver.
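To make those two parameters concrete: something like the sketch below
could derive them from the review comments a CI account leaves in
Gerrit. The data shapes and names here are invented for illustration and
the code is untested, but it shows how little is needed once the review
data is in hand:

    # Hypothetical input shape: each patch carries the CI runs it
    # received, e.g. {"runs": [{"ci": "vendor-x-ci",
    #                           "infra_failure": False}, ...]}
    def ci_infra_scores(patches, ci_name):
        """Return (% of patches this CI reported on, % of its runs
        that died for infrastructure reasons rather than test
        failures)."""
        reported = runs = infra_failures = 0
        for patch in patches:
            ci_runs = [r for r in patch["runs"] if r["ci"] == ci_name]
            if ci_runs:
                reported += 1
            for run in ci_runs:
                runs += 1
                if run.get("infra_failure"):
                    infra_failures += 1
        pct_reported = 100.0 * reported / len(patches) if patches else 0.0
        pct_infra = 100.0 * infra_failures / runs if runs else 0.0
        return pct_reported, pct_infra

And as you note, the first number only means something relative to the
set of patches the CI is supposed to test, so a driver-specific CI
would have to be scored against driver-affecting patches only.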
>
>> 2) There is no mediation or verification that the external CI system
>> is actually testing anything at all
>>
>> As a community, we need to decide whether the current system of
>> self-policing should continue. If it should, then language on reports
>> like [3] should be very clear that any numbers derived from such
>> systems should be taken with a grain of salt. Use of the word
>> "Success" should be avoided, as it has connotations (in English, at
>> least) that the result has been verified, which is simply not the
>> case as long as no verification or mediation occurs for any external
>> CI system.
>
>> 3) There is no clear indication of what tests are being run, and
>> therefore there is no clear indication of what "success" is
>>
>> I think we can all agree that a test has three possible outcomes:
>> pass, fail, and skip. The results of a test suite run are therefore
>> nothing more than the aggregation of which tests passed, which
>> failed, and which were skipped.
>>
>> As a community, we must document, for each project, the expected set
>> of tests that must be run for each patch merged into the project's
>> source tree. This documentation should be discoverable so that
>> reports like [3] can be crystal-clear on what the data shown actually
>> means. The report is simply displaying the data it receives from
>> Gerrit. The community needs to be proactive in saying "this is what
>> is expected to be tested." This alone would allow the report to give
>> information such as "External CI system ABC performed the expected
>> tests. X tests passed. Y tests failed. Z tests were skipped."
>> Likewise, it would also make it possible for the report to give
>> information such as "External CI system DEF did not perform the
>> expected tests.", which is excellent information in and of itself.
>
> Agreed. In Neutron we have enforced CIs but not yet agreed on the
> minimum set of tests we expect them to run. I reckon this will be
> fixed soon.

This is actually documented here [1] under the "What Tests to Run"
section. Perhaps I haven't done enough work to showcase this, but you
can clearly see there what tests are expected of third-party Neutron CI
systems.

[1] https://wiki.openstack.org/wiki/NeutronThirdPartyTesting
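Once that expected set is written down somewhere machine-readable,
checking a run against it is nearly mechanical. A rough sketch of the
summary Jay describes (data shapes invented, untested):

    # Hypothetical input: expected is the documented test set for the
    # project; results maps test name -> "pass" | "fail" | "skip".
    def summarize_run(expected, results):
        counts = {"pass": 0, "fail": 0, "skip": 0}
        for outcome in results.values():
            counts[outcome] += 1
        missing = sorted(set(expected) - set(results))
        return counts, missing

An empty "missing" list supports the message "performed the expected
tests: X passed, Y failed, Z skipped"; a non-empty one supports "did
not perform the expected tests", which, as Jay says, is excellent
information in and of itself.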
> I'll try to look at what "SUCCESS" is from a naive standpoint: a CI
> says "SUCCESS" if the test suite it ran passed; then one should have
> means to understand whether a CI might blatantly lie or tell "half
> truths". For instance, saying it passes tempest.api.network while
> tempest.scenario.test_network_basic_ops has not been executed is a
> half truth, in my opinion.
> Stackalytics can help here, I think. One could create "CI classes"
> according to how close they are to the level of the upstream gate, and
> then parse the posted results to classify CIs. Now, before cursing me,
> I totally understand that this won't be easy at all to implement!
> Furthermore, I don't know how this should be reflected in Gerrit.
>
>> ===
>>
>> In thinking about the likely answers to the above questions, I
>> believe it would be prudent to change the Stackalytics report in
>> question [3] in the following ways:
>>
>> a. Change the "Success %" column header to "% Reported +1 Votes"
>> b. Change the phrase "Green cell - tests ran successfully, red cell -
>> tests failed" to "Green cell - System voted +1, red cell - System
>> voted -1"
>
> That makes sense to me.
>
>> and then, when we have more and better data (for example, # tests
>> passed, failed, skipped, etc), we can provide more detailed
>> information than just "reported +1" or not.
>
> I think it should not be too hard to start adding minimal measures
> such as "% of voted patches".
>
>> Thoughts?
>>
>> Best,
>> -jay
>>
>> [1] http://lists.openstack.org/pipermail/openstack-dev/2014-June/038933.html
>> [2] http://eavesdrop.openstack.org/meetings/third_party/2014/third_party.2014-06-30-18.01.log.html
>> [3] http://stackalytics.com/report/ci/neutron/7
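For what it's worth, the renamed column in (a) is also trivial to
compute honestly from the votes themselves. Another sketch, again with
invented shapes:

    # Hypothetical input: votes is the list of verify votes (+1 / -1)
    # a single CI account has left on the project's patches.
    def pct_reported_plus_one(votes):
        if not votes:
            return 0.0
        return 100.0 * sum(1 for v in votes if v == 1) / len(votes)

That number says nothing about which tests ran or passed, which is
precisely why "% Reported +1 Votes" is a more honest header than
"Success %".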