Re: [openstack-dev] [nova] The same SRIOV / NFV CI failures missed a regression, why?
> -----Original Message-----
> From: Jay Pipes [mailto:jaypi...@gmail.com]
> Sent: Friday, March 25, 2016 10:20 PM
> To: openstack-dev@lists.openstack.org
> Subject: Re: [openstack-dev] [nova] The same SRIOV / NFV CI failures missed a regression, why?
>
> On 03/24/2016 09:35 AM, Matt Riedemann wrote:
> > We have another mitaka-rc-potential bug [1] due to a regression when
> > detaching SR-IOV interfaces in the libvirt driver.
> >
> > There were two NFV CIs that ran on the original change [2].
> >
> > Both failed with the same devstack setup error [3][4].
> >
> > So it sucks that we have a regression, it sucks that no one watched
> > for those CI results before approving the change, and it really sucks
> > in this case since it was specifically reported from mellanox for
> > sriov which failed in [4]. But it happens.
> >
> > What I'd like to know is, have the CI problems been fixed? There is a
> > change up to fix the regression [5] and this time the Mellanox CI
> > check is passing [6]. The Intel NFV CI hasn't reported, but with the
> > mellanox one also testing the suspend scenario, it's probably good
> > enough.
>
> From the commit message of the original patch that introduced the
> regression:
>
> "This fix was tested on a real environment containing the above type of
> VMs. test_driver.test_detach_sriov_ports was slightly modified so that
> the VIF from which data is sent to _detach_pci_devices will contain the
> correct SRIOV values (pci_slot, vlan and hw_veb VIF type)"
>
> I'm not sure if the above statement could ever have been true
> considering the AttributeError that occurred in the bug...
>
> In any case, I think that it's pretty clear that the CI systems for NFV
> and PCI have been less than reliable at functionally testing the PCI and
> NFV-specific functionality in Nova.
>
> This isn't trying to put down the people that work on those systems -- I
> know first hand that it can be difficult to build and maintain CI
> systems that report in to upstream, and I appreciate the effort that
> goes into this.
>
> But, going forward, I think we need to do something as a concerned
> community.
>
> How about this for a proposal?
>
> 1) We establish a joint lab environment that contains heterogeneous
> hardware to which all interested hardware vendors must provide hardware.
>
> 2) The OpenStack Foundation and the hardware vendors each foot some
> portion of the bill to hire 2 or more systems administrators to maintain
> this lab environment.
>
> 3) The upstream Infrastructure team works with the hired system
> administrators to create a single CI system that can spawn functional
> test jobs on the lab hardware and report results back to upstream Gerrit
>
> Given the will to do this, I think the benefits of more trusted testing
> results for the PCI and SR-IOV/NFV areas would more than make up for the
> cost.

+1 I like this proposal. We can help by providing Mellanox hardware and sharing our CI knowledge.

>
> Best,
> -jay

__________________________________________________________________________
OpenStack Development Mailing List (not for usage questions)
Unsubscribe: openstack-dev-requ...@lists.openstack.org?subject:unsubscribe
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
Re: [openstack-dev] [nova] The same SRIOV / NFV CI failures missed a regression, why?
> -----Original Message-----
> From: Matt Riedemann [mailto:mrie...@linux.vnet.ibm.com]
> Sent: Thursday, March 24, 2016 3:35 PM
> To: OpenStack Development Mailing List (not for usage questions) d...@lists.openstack.org>
> Subject: [openstack-dev] [nova] The same SRIOV / NFV CI failures missed a regression, why?
>
> We have another mitaka-rc-potential bug [1] due to a regression when
> detaching SR-IOV interfaces in the libvirt driver.
>
> There were two NFV CIs that ran on the original change [2].
>
> Both failed with the same devstack setup error [3][4].
>
> So it sucks that we have a regression, it sucks that no one watched for
> those CI results before approving the change, and it really sucks in
> this case since it was specifically reported from mellanox for sriov
> which failed in [4]. But it happens.
>
> What I'd like to know is, have the CI problems been fixed? There is a
> change up to fix the regression [5] and this time the Mellanox CI check
> is passing [6]. The Intel NFV CI hasn't reported, but with the mellanox
> one also testing the suspend scenario, it's probably good enough.

Patch set 6 of patch [2] passed in the Mellanox CI; see
http://144.76.193.39/ci-artifacts/262341/6/Nova-ML2-Sriov/testr_results.html.gz

I am not sure why patch set 7 failed. At first I thought it was because PS 6 installed oslo.utils==3.4.0 and PS 7 installed oslo.utils==3.5.0, but I could not find a difference between the two that could be related to this error:

2016-02-16 05:05:42.164 7182 ERROR nova File "/usr/local/lib/python2.7/dist-packages/oslo_utils/importutils.py", line 30, in import_class
2016-02-16 05:05:42.164 7182 ERROR nova __import__(mod_str)
2016-02-16 05:05:42.164 7182 ERROR nova ValueError: Empty module name

Putting that aside, the Mellanox CI in nova is currently running and passing, so SR-IOV for Ethernet is not broken. The fix in [5] is for our SR-IOV InfiniBand solution. At the moment we only test the SR-IOV InfiniBand solution in neutron, because we don't have enough physical servers to run the CI for both nova and neutron.

>
> [1] https://bugs.launchpad.net/nova/+bug/1560860
> [2] https://review.openstack.org/#/c/262341/
> [3] http://intel-openstack-ci-logs.ovh/compute-ci/refs/changes/41/262341/7/compute-nfv-flavors/20160215_232057/screen/n-sch.log.gz
> [4] http://144.76.193.39/ci-artifacts/262341/7/Nova-ML2-Sriov/logs/n-sch.log.gz
> [5] https://review.openstack.org/#/c/296305/
> [6] http://144.76.193.39/ci-artifacts/296305/1/Nova-ML2-Sriov/testr_results.html.gz
>
> --
>
> Thanks,
>
> Matt Riedemann
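
The "ValueError: Empty module name" in the n-sch log quoted above is what Python raises when it is asked to import a module whose name is the empty string. Below is a minimal sketch of how that can happen, assuming the loader splits a dotted class path on its last dot, as the import_class frame in the traceback suggests. The import_class and try_import functions here are simplified stand-ins written for illustration, not the oslo.utils source, and the thread does not say which nova setting held the bad value.

```python
import sys


def import_class(import_str):
    """Hypothetical stand-in for a dotted-path class loader (illustration only)."""
    # Split "package.module.ClassName" on the last dot.
    mod_str, _sep, class_str = import_str.rpartition('.')
    # If import_str has no dot at all, mod_str is '' and __import__ rejects it.
    __import__(mod_str)
    return getattr(sys.modules[mod_str], class_str)


def try_import(import_str):
    """Return the loaded class, or a short error string if loading fails."""
    try:
        return import_class(import_str)
    except (ValueError, ImportError, AttributeError) as exc:
        return "%s: %s" % (type(exc).__name__, exc)


if __name__ == "__main__":
    # A full dotted path loads normally.
    print(try_import("collections.OrderedDict"))
    # A bare name (no module part) or an empty string reproduces the error.
    print(try_import("OrderedDict"))
    print(try_import(""))
```

If something like this is what happened in the failed runs, a class-path setting given as a bare name, or left empty, would be enough to trigger it. That is only a guess from the traceback and would need to be checked against the devstack configuration used by the CI.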
Re: [openstack-dev] [nova] The same SRIOV / NFV CI failures missed a regression, why?
On 03/24/2016 09:35 AM, Matt Riedemann wrote:
> We have another mitaka-rc-potential bug [1] due to a regression when
> detaching SR-IOV interfaces in the libvirt driver.
>
> There were two NFV CIs that ran on the original change [2].
>
> Both failed with the same devstack setup error [3][4].
>
> So it sucks that we have a regression, it sucks that no one watched for
> those CI results before approving the change, and it really sucks in
> this case since it was specifically reported from mellanox for sriov
> which failed in [4]. But it happens.
>
> What I'd like to know is, have the CI problems been fixed? There is a
> change up to fix the regression [5] and this time the Mellanox CI check
> is passing [6]. The Intel NFV CI hasn't reported, but with the mellanox
> one also testing the suspend scenario, it's probably good enough.

From the commit message of the original patch that introduced the regression:

"This fix was tested on a real environment containing the above type of VMs. test_driver.test_detach_sriov_ports was slightly modified so that the VIF from which data is sent to _detach_pci_devices will contain the correct SRIOV values (pci_slot, vlan and hw_veb VIF type)"

I'm not sure if the above statement could ever have been true considering the AttributeError that occurred in the bug...

In any case, I think that it's pretty clear that the CI systems for NFV and PCI have been less than reliable at functionally testing the PCI and NFV-specific functionality in Nova.

This isn't trying to put down the people that work on those systems -- I know first hand that it can be difficult to build and maintain CI systems that report in to upstream, and I appreciate the effort that goes into this.

But, going forward, I think we need to do something as a concerned community.

How about this for a proposal?

1) We establish a joint lab environment that contains heterogeneous hardware to which all interested hardware vendors must provide hardware.

2) The OpenStack Foundation and the hardware vendors each foot some portion of the bill to hire 2 or more systems administrators to maintain this lab environment.

3) The upstream Infrastructure team works with the hired system administrators to create a single CI system that can spawn functional test jobs on the lab hardware and report results back to upstream Gerrit

Given the will to do this, I think the benefits of more trusted testing results for the PCI and SR-IOV/NFV areas would more than make up for the cost.

Best,
-jay
[openstack-dev] [nova] The same SRIOV / NFV CI failures missed a regression, why?
We have another mitaka-rc-potential bug [1] due to a regression when detaching SR-IOV interfaces in the libvirt driver.

There were two NFV CIs that ran on the original change [2].

Both failed with the same devstack setup error [3][4].

So it sucks that we have a regression, it sucks that no one watched for those CI results before approving the change, and it really sucks in this case since it was specifically reported from mellanox for sriov which failed in [4]. But it happens.

What I'd like to know is, have the CI problems been fixed? There is a change up to fix the regression [5] and this time the Mellanox CI check is passing [6]. The Intel NFV CI hasn't reported, but with the mellanox one also testing the suspend scenario, it's probably good enough.

[1] https://bugs.launchpad.net/nova/+bug/1560860
[2] https://review.openstack.org/#/c/262341/
[3] http://intel-openstack-ci-logs.ovh/compute-ci/refs/changes/41/262341/7/compute-nfv-flavors/20160215_232057/screen/n-sch.log.gz
[4] http://144.76.193.39/ci-artifacts/262341/7/Nova-ML2-Sriov/logs/n-sch.log.gz
[5] https://review.openstack.org/#/c/296305/
[6] http://144.76.193.39/ci-artifacts/296305/1/Nova-ML2-Sriov/testr_results.html.gz

--

Thanks,

Matt Riedemann