Hi Jay,

A couple of points.
I agree that we need to define what "success" means. I believe the metrics that should be used are "Voted +1" and "Skipped". Apart from a few valid cases, "Voted -1" is really mostly a metric of bad CI health: most -1 votes are due to environment issues, configuration problems, etc. In my case, the -1 votes are cast manually, since I want to avoid creating extra work for the developers.

What are some possible solutions? On the Jenkins side, I think we could develop a script that parses the result HTML file. Jenkins would then vote (+1, 0, -1) on behalf of the third-party CI (a rough sketch of such a script is included at the bottom of this mail, below the quoted message).

- It would prevent abusive +1 votes.
- If the result HTML is empty, that would indicate the CI's health is bad.
- If all the results are failing, that would also indicate the CI's health is bad.

Franck

On Mon, Jun 30, 2014 at 1:22 PM, Jay Pipes <jaypi...@gmail.com> wrote:
> Hi Stackers,
>
> Some recent ML threads [1] and a hot IRC meeting today [2] brought up some legitimate questions around how a newly-proposed Stackalytics report page for Neutron External CI systems [3] represented the results of an external CI system as "successful" or not.
>
> First, I want to say that Ilya and all those involved in the Stackalytics program simply want to provide the most accurate information to developers in a format that is easily consumed. While there need to be some changes in how data is shown (and the wording of things like "Tests Succeeded"), I hope that the community knows there isn't any ill intent on the part of Mirantis or anyone who works on Stackalytics. OK, so let's keep the conversation civil -- we're all working towards the same goals of transparency and accuracy. :)
>
> Alright, now, Anita and Kurt Taylor were asking a very poignant question:
>
> "But what does CI tested really mean? just running tests? or tested to pass some level of requirements?"
>
> In this nascent world of external CI systems, we have a set of issues that we need to resolve:
>
> 1) All of the CI systems are different.
>
> Some run Bash scripts. Some run Jenkins slaves and devstack-gate scripts. Others run custom Python code that spawns VMs and publishes logs to some public domain.
>
> As a community, we need to decide whether it is worth putting in the effort to create a single, unified, installable and runnable CI system, so that we can legitimately say "all of the external systems are identical, with the exception of the driver code for vendor X being substituted in the Neutron codebase."
>
> If the goal of the external CI systems is to produce reliable, consistent results, I feel the answer to the above is "yes", but I'm interested to hear what others think. Frankly, in the world of benchmarks, it would be unthinkable to say "go ahead and everyone run your own benchmark suite", because you would get wildly different results. A similar problem has emerged here.
>
> 2) There is no mediation or verification that the external CI system is actually testing anything at all
>
> As a community, we need to decide whether the current system of self-policing should continue. If it should, then language on reports like [3] should be very clear that any numbers derived from such systems should be taken with a grain of salt. Use of the word "Success" should be avoided, as it has connotations (in English, at least) that the result has been verified, which is simply not the case as long as no verification or mediation occurs for any external CI system.
> 3) There is no clear indication of what tests are being run, and therefore there is no clear indication of what "success" is
>
> I think we can all agree that a test has three possible outcomes: pass, fail, and skip. The results of a test suite run are therefore nothing more than the aggregation of which tests passed, which failed, and which were skipped.
>
> As a community, we must document, for each project, the expected set of tests that must be run for each patch merged into the project's source tree. This documentation should be discoverable so that reports like [3] can be crystal-clear on what the data shown actually means. The report is simply displaying the data it receives from Gerrit. The community needs to be proactive in saying "this is what is expected to be tested." This alone would allow the report to give information such as "External CI system ABC performed the expected tests. X tests passed. Y tests failed. Z tests were skipped." Likewise, it would also make it possible for the report to give information such as "External CI system DEF did not perform the expected tests.", which is excellent information in and of itself.
>
> ===
>
> In thinking about the likely answers to the above questions, I believe it would be prudent to change the Stackalytics report in question [3] in the following ways:
>
> a. Change the "Success %" column header to "% Reported +1 Votes"
> b. Change the phrase "Green cell - tests ran successfully, red cell - tests failed" to "Green cell - System voted +1, red cell - System voted -1"
>
> and then, when we have more and better data (for example, # tests passed, failed, skipped, etc), we can provide more detailed information than just "reported +1" or not.
>
> Thoughts?
>
> Best,
> -jay
>
> [1] http://lists.openstack.org/pipermail/openstack-dev/2014-June/038933.html
> [2] http://eavesdrop.openstack.org/meetings/third_party/2014/third_party.2014-06-30-18.01.log.html
> [3] http://stackalytics.com/report/ci/neutron/7
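P.S. To make the Jenkins idea above concrete, here is a rough sketch of the kind of parsing/voting script I have in mind. The result-file layout, the PASS/FAIL/SKIP markers and the helper names are only assumptions for illustration; a real script would have to match whatever format the third-party CI actually publishes.

    # Rough sketch only: the result-file format, markers and vote rules
    # below are assumptions for illustration, not any real CI's output.
    import re
    import sys


    def read_results(path):
        """Extract (test_name, outcome) pairs from a very simple results HTML.

        Assumes each result row looks like:
            <td>test_name</td><td>PASS|FAIL|SKIP</td>
        A real report would need a proper parser (e.g. BeautifulSoup).
        """
        with open(path) as f:
            html = f.read()
        return re.findall(r'<td>([^<]+)</td>\s*<td>(PASS|FAIL|SKIP)</td>', html)


    def decide_vote(results):
        """Return (vote, message) that Jenkins would post on behalf of the CI."""
        if not results:
            # Empty report: says nothing about the patch, only that the CI
            # itself is unhealthy, so abstain rather than vote +1 or -1.
            return 0, 'no results published -- CI health suspect'
        outcomes = [outcome for _, outcome in results]
        if all(o == 'FAIL' for o in outcomes):
            # Everything failing usually means an environment or configuration
            # problem, not a bad patch, so again abstain and flag CI health.
            return 0, 'all tests failed -- CI health suspect'
        if 'FAIL' in outcomes:
            return -1, '%d of %d tests failed' % (outcomes.count('FAIL'),
                                                  len(outcomes))
        return 1, '%d passed, %d skipped' % (outcomes.count('PASS'),
                                             outcomes.count('SKIP'))


    if __name__ == '__main__':
        vote, message = decide_vote(read_results(sys.argv[1]))
        # Jenkins would then push this vote to Gerrit on behalf of the
        # third-party CI, for example via the "gerrit review" SSH command.
        print('%+d %s' % (vote, message))

The important part is the two 0-vote branches: an empty or all-failing report says more about the CI's own health than about the patch under review, so the script abstains instead of blaming the change.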
_______________________________________________
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev