On 07/03/2014 02:33 PM, Kevin Benton wrote:
> Maybe we can require periodic checks against the head of the master
> branch (which should always pass) and build statistics based on the results
> of that.
I like this suggestion. I really like this suggestion.

Hmmmm, what to do with a good suggestion? I wonder if we could capture it in an infra-spec and work on it from there. Would you feel comfortable offering a draft as an infra-spec, and then perhaps we can discuss the design through the spec? What do you think?

Thanks Kevin,
Anita.
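(Purely as an illustration of what statistics built on such periodic checks might look like, here is a minimal Python sketch. It assumes each CI simply logs one result per run against the tip of master; the log format and function name are made up for the example and do not refer to an existing tool.)

    # Illustrative sketch only: assumes each CI records one result per
    # periodic run against the tip of master, e.g. as lines of
    # "<timestamp> SUCCESS" or "<timestamp> FAILURE" in a plain log file.
    # Since master is expected to always pass, any FAILURE here counts
    # against the CI system rather than against the code under test.

    def master_head_failure_rate(log_path):
        runs = 0
        failures = 0
        with open(log_path) as log:
            for line in log:
                parts = line.split()
                if len(parts) != 2:
                    continue  # skip malformed lines
                runs += 1
                if parts[1] == "FAILURE":
                    failures += 1
        return failures / float(runs) if runs else None

    # e.g. a result of 0.05 would mean the CI reported a bogus failure on
    # 5% of its periodic runs against a known-good master branch.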
> Otherwise it seems like we have to take a CI system's word for it
> that a particular patch indeed broke that system.
>
> --
> Kevin Benton
>
>
> On Thu, Jul 3, 2014 at 11:07 AM, Anita Kuno <ante...@anteaya.info> wrote:
>
>> On 07/03/2014 01:27 PM, Kevin Benton wrote:
>>>> This allows the viewer to see categories of reviews based upon their divergence from OpenStack's Jenkins results. I think evaluating divergence from Jenkins might be a metric worth considering.
>>>
>>> I think the only thing this really reflects, though, is how much the third-party CI system is mirroring Jenkins.
>>> A system that frequently diverges may be functioning perfectly fine and just has a vastly different code path that it is integration testing, so it is legitimately detecting failures the OpenStack CI cannot.
>> Great.
>>
>> How do we measure the degree to which it is legitimately detecting failures?
>>
>> Thanks Kevin,
>> Anita.
>>>
>>> --
>>> Kevin Benton
>>>
>>>
>>> On Thu, Jul 3, 2014 at 6:49 AM, Anita Kuno <ante...@anteaya.info> wrote:
>>>
>>>> On 07/03/2014 07:12 AM, Salvatore Orlando wrote:
>>>>> Apologies for quoting again the top post of the thread.
>>>>>
>>>>> Comments inline (mostly thinking aloud)
>>>>> Salvatore
>>>>>
>>>>>
>>>>> On 30 June 2014 22:22, Jay Pipes <jaypi...@gmail.com> wrote:
>>>>>
>>>>>> Hi Stackers,
>>>>>>
>>>>>> Some recent ML threads [1] and a hot IRC meeting today [2] brought up some legitimate questions around how a newly-proposed Stackalytics report page for Neutron External CI systems [3] represented the results of an external CI system as "successful" or not.
>>>>>>
>>>>>> First, I want to say that Ilya and all those involved in the Stackalytics program simply want to provide the most accurate information to developers in a format that is easily consumed. While there need to be some changes in how data is shown (and the wording of things like "Tests Succeeded"), I hope that the community knows there isn't any ill intent on the part of Mirantis or anyone who works on Stackalytics. OK, so let's keep the conversation civil -- we're all working towards the same goals of transparency and accuracy. :)
>>>>>>
>>>>>> Alright, now, Anita and Kurt Taylor were asking a very poignant question:
>>>>>>
>>>>>> "But what does CI tested really mean? Just running tests? Or tested to pass some level of requirements?"
>>>>>>
>>>>>> In this nascent world of external CI systems, we have a set of issues that we need to resolve:
>>>>>>
>>>>>> 1) All of the CI systems are different.
>>>>>>
>>>>>> Some run Bash scripts. Some run Jenkins slaves and devstack-gate scripts. Others run custom Python code that spawns VMs and publishes logs to some public domain.
>>>>>>
>>>>>> As a community, we need to decide whether it is worth putting in the effort to create a single, unified, installable and runnable CI system, so that we can legitimately say "all of the external systems are identical, with the exception of the driver code for vendor X being substituted in the Neutron codebase."
>>>>>>
>>>>>
>>>>> I think such a system already exists, and it's documented here:
>>>>> http://ci.openstack.org/
>>>>> Still, understanding it is quite a learning curve, and running it is not exactly straightforward. But I guess that's pretty much understandable given the complexity of the system, isn't it?
>>>>>
>>>>>
>>>>>> If the goal of the external CI systems is to produce reliable, consistent results, I feel the answer to the above is "yes", but I'm interested to hear what others think. Frankly, in the world of benchmarks, it would be unthinkable to say "go ahead and everyone run your own benchmark suite", because you would get wildly different results. A similar problem has emerged here.
>>>>>
>>>>> I don't think the particular infrastructure, which might range from an openstack-ci clone to a 100-line bash script, would have an impact on the "reliability" of the quality assessment regarding a particular driver or plugin. This is determined, in my opinion, by the quantity and nature of the tests one runs on a specific driver. In Neutron, for instance, there is a wide range of choices - from a few test cases in tempest.api.network to the full smoketest job. As long as there is no minimal standard here, it would be difficult to assess the quality of the evaluation from a CI system, unless we explicitly take coverage into account in the evaluation.
>>>>>
>>>>> On the other hand, different CI infrastructures will have different levels in terms of % of patches tested and % of infrastructure failures. I think it might not be a terrible idea to use these parameters to evaluate how good a CI is from an infra standpoint. However, there are still open questions. For instance, a CI might have a low patch % score because it only needs to test patches affecting a given driver.
>>>>>
>>>>>> 2) There is no mediation or verification that the external CI system is actually testing anything at all.
>>>>>>
>>>>>> As a community, we need to decide whether the current system of self-policing should continue. If it should, then language on reports like [3] should be very clear that any numbers derived from such systems should be taken with a grain of salt. Use of the word "Success" should be avoided, as it has connotations (in English, at least) that the result has been verified, which is simply not the case as long as no verification or mediation occurs for any external CI system.
>>>>>
>>>>>> 3) There is no clear indication of what tests are being run, and therefore there is no clear indication of what "success" is.
>>>>>>
>>>>>> I think we can all agree that a test has three possible outcomes: pass, fail, and skip. The results of a test suite run are therefore nothing more than the aggregation of which tests passed, which failed, and which were skipped.
>>>>>>
>>>>>> As a community, we must document, for each project, the expected set of tests that must be run for each patch merged into the project's source tree. This documentation should be discoverable so that reports like [3] can be crystal-clear on what the data shown actually means. The report is simply displaying the data it receives from Gerrit. The community needs to be proactive in saying "this is what is expected to be tested." This alone would allow the report to give information such as "External CI system ABC performed the expected tests. X tests passed. Y tests failed. Z tests were skipped." Likewise, it would also make it possible for the report to give information such as "External CI system DEF did not perform the expected tests.", which is excellent information in and of itself.
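(As a rough illustration of the kind of report described just above, the sketch below compares a single CI run against a documented expected test list and produces exactly that sort of summary line. The function and argument names are hypothetical, not part of any existing reporting tool.)

    # Illustrative sketch only: "expected_tests" stands in for the
    # documented per-project list discussed above, and "results" is a
    # mapping of test name -> "pass" / "fail" / "skip" as reported by a
    # single CI run. Both names are hypothetical.

    def summarize_run(ci_name, expected_tests, results):
        missing = [t for t in expected_tests if t not in results]
        if missing:
            return ("External CI system %s did not perform the expected "
                    "tests (%d missing)." % (ci_name, len(missing)))
        passed = sum(1 for t in expected_tests if results[t] == "pass")
        failed = sum(1 for t in expected_tests if results[t] == "fail")
        skipped = sum(1 for t in expected_tests if results[t] == "skip")
        return ("External CI system %s performed the expected tests. "
                "%d tests passed. %d tests failed. %d tests were skipped."
                % (ci_name, passed, failed, skipped))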
>>>>>>
>>>>> Agreed. In Neutron we have enforced CIs but not yet agreed on what's the minimum set of tests we expect them to run. I reckon this will be fixed soon.
>>>>>
>>>>> I'll try to look at what "SUCCESS" is from a naive standpoint: a CI says "SUCCESS" if the test suite it ran passed; then one should have means to understand whether a CI might blatantly lie or tell "half truths". For instance, saying it passes tempest.api.network while tempest.scenario.test_network_basic_ops has not been executed is a half truth, in my opinion.
>>>>> Stackalytics can help here, I think. One could create "CI classes" according to how close they are to the level of the upstream gate, and then parse the posted results to classify CIs. Now, before cursing me, I totally understand that this won't be easy at all to implement! Furthermore, I don't know how this should be reflected in gerrit.
>>>>>
>>>>>> ===
>>>>>>
>>>>>> In thinking about the likely answers to the above questions, I believe it would be prudent to change the Stackalytics report in question [3] in the following ways:
>>>>>>
>>>>>> a. Change the "Success %" column header to "% Reported +1 Votes"
>>>>>> b. Change the phrase "Green cell - tests ran successfully, red cell - tests failed" to "Green cell - System voted +1, red cell - System voted -1"
>>>>>
>>>>> That makes sense to me.
>>>>>
>>>>>> and then, when we have more and better data (for example, # tests passed, failed, skipped, etc.), we can provide more detailed information than just "reported +1" or not.
>>>>>
>>>>> I think it should not be too hard to start adding minimal measures such as "% of voted patches".
>>>>>
>>>>>> Thoughts?
>>>>>>
>>>>>> Best,
>>>>>> -jay
>>>>>>
>>>>>> [1] http://lists.openstack.org/pipermail/openstack-dev/2014-June/038933.html
>>>>>> [2] http://eavesdrop.openstack.org/meetings/third_party/2014/third_party.2014-06-30-18.01.log.html
>>>>>> [3] http://stackalytics.com/report/ci/neutron/7
>>>>
>>>> Thanks for sharing your thoughts, Salvatore.
>>>>
>>>> Some additional things to look at:
>>>>
>>>> Sean Dague has created a tool in stackforge, gerrit-dash-creator:
>>>> http://git.openstack.org/cgit/stackforge/gerrit-dash-creator/tree/README.rst
>>>> which has the ability to make interesting queries on gerrit results.
>>>> One such example can be found here: http://paste.openstack.org/show/85416/
>>>> (Note: when this url was created there was a bug in the syntax, and the url works in Chrome but not Firefox. Sean tells me the Firefox bug has been addressed - though this url hasn't been altered with the new syntax yet.)
>>>>
>>>> This allows the viewer to see categories of reviews based upon their divergence from OpenStack's Jenkins results. I think evaluating divergence from Jenkins might be a metric worth considering.
>>>>
>>>> Also, a GUI representation worth looking at is Mikal Still's GUI for Neutron CI health:
>>>> http://www.rcbops.com/gerrit/reports/neutron-cireport.html
>>>> and Nova CI health:
>>>> http://www.rcbops.com/gerrit/reports/nova-cireport.html
>>>>
>>>> I don't know the details of how the graphs are calculated in these pages, but being able to view passed/failed/missed and compare them to Jenkins is an interesting approach, and I feel it has some merit.
>>>>
>>>> Thanks, I think we are getting some good information out in this thread and look forward to hearing more thoughts.
>>>>
>>>> Thank you,
>>>> Anita.
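(To make the divergence measurement mentioned above a bit more concrete, here is a rough sketch. It assumes the +1/-1 votes of each CI account have already been collected from Gerrit into simple mappings; all names are illustrative only and do not describe how the rcbops reports actually compute their graphs.)

    # Illustrative sketch only: "ci_votes" and "jenkins_votes" are assumed
    # to be mappings of review id -> vote (+1 or -1) for a given account,
    # e.g. collected from Gerrit comments; the names are hypothetical.

    def divergence_from_jenkins(ci_votes, jenkins_votes):
        common = set(ci_votes) & set(jenkins_votes)
        if not common:
            return None
        disagreements = sum(1 for r in common if ci_votes[r] != jenkins_votes[r])
        return disagreements / float(len(common))

As noted earlier in the thread, a high divergence by itself does not mean a CI is broken; it may simply be exercising code paths the upstream gate does not.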