On Thu, Jun 26, 2014 at 4:12 AM, Daniel P. Berrange <berra...@redhat.com>
wrote:

> On Thu, Jun 26, 2014 at 07:00:32AM -0400, Sean Dague wrote:
> > While the Trusty transition was mostly uneventful, it has exposed a
> > particular issue in libvirt, which is generating ~ 25% failure rate now
> > on most tempest jobs.
> >
> > As can be seen here -
> >
> > https://github.com/openstack/nova/blob/master/nova/virt/libvirt/driver.py#L294-L297
> >
> >
> > ... the libvirt live_snapshot code is something that our test pipeline
> > has never tested before, because it wasn't a new enough libvirt for us
> > to take that path.
> >
> > Right now it's exploding, a lot -
> > https://bugs.launchpad.net/nova/+bug/1334398
> >
> > Snapshotting gets used in Tempest to create images for testing, so image
> > setup tests are doing a decent number of snapshots. If I had to take a
> > completely *wild guess*, it's that libvirt can't do 2 live_snapshots at
> > the same time. It's probably something that most people haven't hit. The
> > wild guess is based on other libvirt issues we've hit that other people
> > haven't, and they are basically always a parallel ops triggered problem.
> >
> > My 'stop the bleeding' suggested fix is this -
> > https://review.openstack.org/#/c/102643/ which just effectively disables
> > this code path for now. Then we can get some libvirt experts engaged to
> > help figure out the right long term fix.
>
> Yes, this is a sensible pragmatic workaround for the short term until
> we diagnose the root cause & fix it.
>
> > I think there are a couple:
> >
> > 1) see if newer libvirt fixes this (1.2.5 just came out), and if so
> > mandate at some known working version. This would actually take a bunch
> > of work to be able to test a non packaged libvirt in our pipeline. We'd
> > need volunteers for that.
> >
> > 2) lock snapshot operations in nova-compute, so that we can only do 1 at
> > a time. Hopefully it's just 2 snapshot operations that is the issue, not
> > any other libvirt op during a snapshot, so serializing snapshot ops in
> > n-compute could put the kid gloves on libvirt and make it not break
> > here. This also needs some volunteers as we're going to be playing a
> > game of progressive serialization until we get to a point where it looks
> > like the failures go away.
> >
> > 3) Roll back to precise. I put this idea here for completeness, but I
> > think it's a terrible choice. This is one isolated, previously untested
> > (by us), code path. We can't stay on libvirt 0.9.6 forever, so we actually
> > need to fix this for real (be it in nova's use of libvirt, or libvirt
> > itself).
>
> Yep, since we *never* tested this code path in the gate before, rolling
> back to precise would not even really be a fix for the problem. It would
> merely mean we're not testing the code path again, which is really akin
> to sticking our head in the sand.
>
> > But for right now, we should stop the bleeding, so that nova/libvirt
> > isn't blocking everyone else from merging code.
>
> Agreed, we should merge the hack and treat the bug as release blocker
> to be resolved prior to Juno GA.
>


How can we prevent libvirt issues like this from landing in trunk in the
first place? If we don't figure out a way to prevent that, I fear we will
keep repeating this same pattern of failure.


>
> Regards,
> Daniel
> --
> |: http://berrange.com      -o-    http://www.flickr.com/photos/dberrange/ :|
> |: http://libvirt.org              -o-             http://virt-manager.org :|
> |: http://autobuild.org       -o-         http://search.cpan.org/~danberr/ :|
> |: http://entangle-photo.org       -o-       http://live.gnome.org/gtk-vnc :|
>
> _______________________________________________
> OpenStack-dev mailing list
> OpenStack-dev@lists.openstack.org
> http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
>
