Agreed, CI systems are not reliable. Most of failures are related to mis-configuration or devstack problems, not driver problems itself. What happen then, is that people just don't care if there's a red FAILURE in the CIs results. A 4) option would be to rate CIs according to their trustfulness (may be a composition counter of uptime and false negatives), then the developers would pay more attention if they broke a reliable CI. Also, this rating could be used as parameter to decide if a CI should vote or not.
On Thu, Feb 26, 2015 at 5:24 PM, John Griffith <[email protected]> wrote: > > > On Thu, Feb 26, 2015 at 1:10 PM, Walter A. Boring IV <[email protected] > > wrote: > >> Hey folks, >> Today we found out that a patch[1] that was made against our lefthand >> driver caused the driver to fail. The 3rd party CI that we have setup >> to test our driver (hp_lefthand_rest_proxy.py) caught the CI failure and >> reported it correctly[2]. The patch that broke the driver was reviewed and >> approved without a mention of 3rd Party CI failures. >> >> This is a perfect example of 3rd Party CI working perfectly and catching >> a failure, and being completely ignored by everyone >> involved in the review process for the patch >> >> I know that 3rd party CI isn't perfect, and has been ripe with false >> failures, which is one of the reasons why they aren't voting today. >> But, that being said, if patch submitters aren't even looking at the >> failures for CI when they are touching drivers that they don't maintain, >> and reviewers >> aren't looking at the CI failures, then why are we even doing 3rd party >> CI? >> >> Our team is partially responsible for not seeing the failure as well. We >> should be watching the CI failures closely, but we are doing the best we >> can. There are enough patches for Cinder ongoing at any one time, that >> we simply can't watch every single one of them for failures. We did >> eventually >> see that every single patchset in gerrit was now failing against our >> driver, and this is how we caught it. Yes, it was a bit after the fact, >> but we did notice >> it and now have a new patch up that fixes it. So, in that regard 3rd >> party CI did eventually vet out a problem that our team caught. >> >> How can we prevent this in the future? >> 1) Make 3rd party CI voting. I don't believe this is ready yet. >> > > Agreed, look at the history on that driver (and many others) and you'll > see we are in no way ready for that. > > > 2) Authors and reviews need to look at 3rd party CI failures when a patch >> touches a driver. If a failure is seen, contact the CI maintainer and work >> with them and >> see if the failure is related to the patch, if it's not obvious. In this >> case, the failure was obvious. The import changed, and now the package >> can't find the module. >> > > If things were more "stable" yeah, I might, but the reality is as I've > pointed out we have a serious signal to noise ratio problem IMO > > > >> 3) CI maintainers watch every single patchset and report -1's on >> reviews? (ouch) >> 4) ? >> > > Option 4 in my opinion is exactly what I've been doing since this > started. I receive a notification for any change that my CI setup fails > on, and then it's up to me to go and verify if something is truly broken or > if it's my system that's messed up. It's not perfect and it's not really a > true CI but it is a continuous test system which for me is what's most > important here. I completely understand that's not the case for other > things. > > This is where the churn of typo fixes, hacking changes etc can bight us. > Sure, while reviewing that probably could've/should've been caught. The > problem is for me at least if we're going to do this sorts of semantic > changes that touch so many files I'm likely to be half asleep before I get > to the "cinder/volume" section. Doesn't make it right, but kinda how it is. > > Anyway, yeah it sucks, but I'd argue this worked out GREAT. A change was > made that broke things in LHN driver, but historically nobody would've > known until a customer tried to use it. In this case, the problem was > found in less than a day and fixed. That's a pretty huge win in my opinion! > > Thanks, > John > > >> >> >> >> Here is the patch that broke the lefthand driver[1] >> Here is the reported failure in the c-vol log for the patch by our 3rd >> party CI system[2] >> Here is my new patch that fixes the lefthand driver again.[3] >> >> [1] https://review.openstack.org/#/c/145780/ >> [2] http://15.126.198.151/80/145780/15/check/lefthand- >> iscsi-driver-master-client-pip-dsvm/3927e3d/logs/screen- >> c-vol.txt.gz?level=ERROR >> [3] https://review.openstack.org/#/c/159586/ >> >> $0.02 >> Walt >> >> ____________________________________________________________ >> ______________ >> OpenStack Development Mailing List (not for usage questions) >> Unsubscribe: [email protected]?subject: >> unsubscribe >> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev >> > > > __________________________________________________________________________ > OpenStack Development Mailing List (not for usage questions) > Unsubscribe: [email protected]?subject:unsubscribe > http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev > >
__________________________________________________________________________ OpenStack Development Mailing List (not for usage questions) Unsubscribe: [email protected]?subject:unsubscribe http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
