Re: [openstack-dev] [nova] top gate bug is libvirt snapshot

2014-07-15 Thread Alex Xu
Question about swap volume: swap volume's implementation is very similar
to live snapshot's.
Both are implemented via blockRebase, but swap volume doesn't check any
libvirt or qemu version.
Should we add a version check for swap_volume now? That would mean
swap_volume gets disabled as well.
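For illustration, a minimal sketch of the kind of version gate that guards the live_snapshot path, applied to swap_volume. The constant and function names below are placeholders for illustration, not nova's actual code:

```python
# Hypothetical sketch of a version gate like the one guarding nova's
# live_snapshot path; the minimum tuple below is a placeholder, not
# nova's real constant for swap_volume.
MIN_LIBVIRT_BLOCK_REBASE_VERSION = (0, 9, 10)  # assumed minimum

def has_min_version(conn_version, min_version=MIN_LIBVIRT_BLOCK_REBASE_VERSION):
    """Return True if the connected libvirt is new enough for blockRebase."""
    return tuple(conn_version) >= tuple(min_version)

def swap_volume_enabled(libvirt_version):
    # If the check fails, the driver would fall back or raise, effectively
    # disabling swap_volume the same way live_snapshot was disabled.
    return has_min_version(libvirt_version)
```

With a gate like this, a host on libvirt 0.9.6 would refuse the operation while 1.2.2 would allow it.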


On 2014-06-26 19:00, Sean Dague wrote:

While the Trusty transition was mostly uneventful, it has exposed a
particular issue in libvirt, which is generating ~ 25% failure rate now
on most tempest jobs.

As can be seen here -
https://github.com/openstack/nova/blob/master/nova/virt/libvirt/driver.py#L294-L297


... the libvirt live_snapshot code is something that our test pipeline
has never tested before, because it wasn't a new enough libvirt for us
to take that path.

Right now it's exploding, a lot -
https://bugs.launchpad.net/nova/+bug/1334398

Snapshotting gets used in Tempest to create images for testing, so image
setup tests are doing a decent number of snapshots. If I had to take a
completely *wild guess*, it's that libvirt can't do 2 live_snapshots at
the same time. It's probably something that most people haven't hit. The
wild guess is based on other libvirt issues we've hit that other people
haven't, and they are basically always a parallel ops triggered problem.

My 'stop the bleeding' suggested fix is this -
https://review.openstack.org/#/c/102643/ which just effectively disables
this code path for now. Then we can get some libvirt experts engaged to
help figure out the right long term fix.

I think there are a couple:

1) see if newer libvirt fixes this (1.2.5 just came out), and if so
mandate at some known working version. This would actually take a bunch
of work to be able to test a non packaged libvirt in our pipeline. We'd
need volunteers for that.

2) lock snapshot operations in nova-compute, so that we can only do 1 at
a time. Hopefully it's just 2 snapshot operations that is the issue, not
any other libvirt op during a snapshot, so serializing snapshot ops in
n-compute could put the kid gloves on libvirt and make it not break
here. This also needs some volunteers as we're going to be playing a
game of progressive serialization until we get to a point where it looks
like the failures go away.
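Option 2 could be sketched roughly like this; a standalone approximation with illustrative names, since nova itself would use its synchronized/lockutils helpers rather than a bare threading.Lock:

```python
import threading

# Rough sketch of option 2: serialize snapshot operations within one
# nova-compute process so that at most one libvirt snapshot runs at a
# time. Names here are illustrative; nova would use its own lock helpers.
_snapshot_lock = threading.Lock()

def serialized_snapshot(do_snapshot, *args, **kwargs):
    """Run do_snapshot with all snapshot operations mutually excluded."""
    with _snapshot_lock:
        return do_snapshot(*args, **kwargs)
```

"Progressive serialization" would then mean widening what runs under this lock (first snapshot-vs-snapshot, then possibly snapshot-vs-other libvirt ops) until the failures disappear.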

3) Roll back to precise. I put this idea here for completeness, but I
think it's a terrible choice. This is one isolated, previously untested
(by us), code path. We can't stay on libvirt 0.9.6 forever, so actually
need to fix this for real (be it in nova's use of libvirt, or libvirt
itself).

There might be other options as well, ideas welcomed.

But for right now, we should stop the bleeding, so that nova/libvirt
isn't blocking everyone else from merging code.

-Sean



___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev




Re: [openstack-dev] [nova] top gate bug is libvirt snapshot

2014-07-10 Thread Daniel P. Berrange
On Wed, Jul 09, 2014 at 06:23:27PM -0400, Sean Dague wrote:
 The libvirt logs needed are huge, so we can't run them all the time. And
 realistically, I don't think they provided us the info we needed. There
 has been at least one fail on Dan's log hack patch for this scenario
 today, so maybe it will be in there.

I did finally get lucky and hit the failure, and the libvirtd.log has
provided the info to narrow down the problem in QEMU I believe. I'm
going to be talking with QEMU developers about it based on this info
now.

FYI, the logs are approximately 3 MB compressed for a full tempest
run. If turned on, this would be either the 3rd or 4th largest log
file we'd be collecting, adding 8-10% to the total size of all logs.

Currently I had to do a crude hack to enable it

diff --git a/etc/nova/rootwrap.d/compute.filters b/etc/nova/rootwrap.d/compute.filters
index b79851b..7e4469a 100644
--- a/etc/nova/rootwrap.d/compute.filters
+++ b/etc/nova/rootwrap.d/compute.filters
@@ -226,3 +226,6 @@ cp: CommandFilter, cp, root
 # nova/virt/xenapi/vm_utils.py:
 sync: CommandFilter, sync, root
 
+apt-get: CommandFilter, apt-get, root
+service: CommandFilter, service, root
+augtool: CommandFilter, augtool, root
diff --git a/nova/virt/libvirt/driver.py b/nova/virt/libvirt/driver.py
index 99edf12..93e60af 100644
--- a/nova/virt/libvirt/driver.py
+++ b/nova/virt/libvirt/driver.py
@@ -28,6 +28,7 @@ Supports KVM, LXC, QEMU, UML, and XEN.
@@ -611,6 +619,16 @@ class LibvirtDriver(driver.ComputeDriver):
                     {'type': CONF.libvirt.virt_type, 'arch': arch})
 
     def init_host(self, host):
+        utils.execute('apt-get', '-y', 'install', 'augeas-tools',
+                      run_as_root=True)
+        utils.execute('augtool',
+                      process_input='set /files/etc/libvirt/libvirtd.conf/log_filters '
+                      '"1:libvirt.c 1:qemu 1:conf 1:security 3:object 3:event 3:json 3:file 1:util"\n'
+                      'set /files/etc/libvirt/libvirtd.conf/log_outputs '
+                      '"1:file:/var/log/libvirt/libvirtd.log"\n'
+                      'save\n',
+                      run_as_root=True)
+        utils.execute('service', 'libvirt-bin', 'restart',
+                      run_as_root=True)
+        time.sleep(10)



If we genuinely can't enable it all the time, then I think we really need
to figure out a way to let us turn it on selectively per review, in a bit
of an easier manner. devstack lets you set DEBUG_LIBVIRT environment
variable to turn this on, but there's no way for people to get that env
var set in the gate runs - AFAICT infra team would have to toggle that
globally each time it was needed which isn't really practical.

Regards,
Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|



Re: [openstack-dev] [nova] top gate bug is libvirt snapshot

2014-07-10 Thread Sean Dague
On 07/10/2014 05:03 AM, Daniel P. Berrange wrote:
 On Wed, Jul 09, 2014 at 06:23:27PM -0400, Sean Dague wrote:
 The libvirt logs needed are huge, so we can't run them all the time. And
 realistically, I don't think they provided us the info we needed. There
 has been at least one fail on Dan's log hack patch for this scenario
 today, so maybe it will be in there.
 
 I did finally get lucky and hit the failure, and the libvirtd.log has
 provided the info to narrow down the problem in QEMU I believe. I'm
 going to be talking with QEMU developers about it based on this info
 now.
 
 FYI, the logs are approximately 3 MB compressed for a full tempest
 run. If turned on, this would be either the 3rd or 4th largest log
 file we'd be collecting, adding 8-10% to the total size of all logs.

It's larger than anything other than the ceilometer logs, which are
their own issue. Remember that we are doing 20 - 30k runs a week. So 3 MB
× 20k runs = 60 GB / week. We're currently trying to keep 6 months of logs,
so × 26 weeks ≈ 1.5 TB of libvirt logs. We're currently limited by having a
max of 14 x 1 TB volumes on our log server in Rax.
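The arithmetic above, spelled out (assuming 3 MB per run, the 20k low end of runs/week, and 26 weeks of retention):

```python
# Back-of-the-envelope check of the log-volume figures above: 3 MB per
# run at the low end of 20-30k runs/week, retained for ~26 weeks.
mb_per_run = 3
runs_per_week = 20_000
weeks_retained = 26

gb_per_week = mb_per_run * runs_per_week / 1000    # 60 GB per week
tb_retained = gb_per_week * weeks_retained / 1000  # ~1.56 TB total
```

At the 30k/week high end the total would be closer to 2.3 TB, which is why the 14 × 1 TB volume cap matters.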

We're hoping to fix that by using swift for log storage; if we get
that in place, we could probably do that on every run.

Is it possible to make libvirt log to 2 log files? One that is the
normal light log, and an enhanced error log? Then we could maybe make a
decision at cleanup time about whether we need the error log saved or not.
Like if things failed we'd keep it. This all starts to get more
complicated, but might be worth exploring.
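If libvirt's support for multiple log outputs were used for this, a libvirtd.conf along these lines might work; a sketch only, with illustrative filter and level values, assuming the log collector later decides which file to keep:

```ini
# Hypothetical libvirtd.conf sketch: a light always-on log plus a verbose
# debug log in a separate file. log_outputs accepts several
# space-separated outputs, each with its own minimum priority level
# (1=debug ... 4=error), so the two files can differ in verbosity.
log_filters="1:qemu 1:libvirt 3:object 3:json 3:event 1:util"
log_outputs="3:file:/var/log/libvirt/libvirtd.log 1:file:/var/log/libvirt/libvirtd-debug.log"
```

The first output would stay small enough to collect always; the debug file would only be uploaded on failure.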

 
 Currently I had to do a crude hack to enable it
 
 diff --git a/etc/nova/rootwrap.d/compute.filters b/etc/nova/rootwrap.d/compute.filters
 index b79851b..7e4469a 100644
 --- a/etc/nova/rootwrap.d/compute.filters
 +++ b/etc/nova/rootwrap.d/compute.filters
 @@ -226,3 +226,6 @@ cp: CommandFilter, cp, root
  # nova/virt/xenapi/vm_utils.py:
  sync: CommandFilter, sync, root
  
 +apt-get: CommandFilter, apt-get, root
 +service: CommandFilter, service, root
 +augtool: CommandFilter, augtool, root
 diff --git a/nova/virt/libvirt/driver.py b/nova/virt/libvirt/driver.py
 index 99edf12..93e60af 100644
 --- a/nova/virt/libvirt/driver.py
 +++ b/nova/virt/libvirt/driver.py
 @@ -28,6 +28,7 @@ Supports KVM, LXC, QEMU, UML, and XEN.
 @@ -611,6 +619,16 @@ class LibvirtDriver(driver.ComputeDriver):
                      {'type': CONF.libvirt.virt_type, 'arch': arch})
  
      def init_host(self, host):
 +        utils.execute('apt-get', '-y', 'install', 'augeas-tools',
 +                      run_as_root=True)
 +        utils.execute('augtool',
 +                      process_input='set /files/etc/libvirt/libvirtd.conf/log_filters '
 +                      '"1:libvirt.c 1:qemu 1:conf 1:security 3:object 3:event 3:json 3:file 1:util"\n'
 +                      'set /files/etc/libvirt/libvirtd.conf/log_outputs '
 +                      '"1:file:/var/log/libvirt/libvirtd.log"\n'
 +                      'save\n',
 +                      run_as_root=True)
 +        utils.execute('service', 'libvirt-bin', 'restart',
 +                      run_as_root=True)
 +        time.sleep(10)
 
 
 
 If we genuinely can't enable it all the time, then I think we really need
 to figure out a way to let us turn it on selectively per review, in a bit
 of an easier manner. devstack lets you set DEBUG_LIBVIRT environment
 variable to turn this on, but there's no way for people to get that env
 var set in the gate runs - AFAICT infra team would have to toggle that
 globally each time it was needed which isn't really practical.
 
 Regards,
 Daniel
 


-- 
Sean Dague
http://dague.net





Re: [openstack-dev] [nova] top gate bug is libvirt snapshot

2014-07-09 Thread Daniel P. Berrange
On Tue, Jul 08, 2014 at 02:50:40PM -0700, Joe Gordon wrote:
   But for right now, we should stop the bleeding, so that nova/libvirt
   isn't blocking everyone else from merging code.
 
  Agreed, we should merge the hack and treat the bug as release blocker
  to be resolved prior to Juno GA.
 
 
 
 How can we prevent libvirt issues like this from landing in trunk in the
 first place? If we don't figure out a way to prevent this from landing in the
 first place, I fear we will keep repeating this same pattern of failure.

Realistically I don't think there was much/any chance of avoiding this
problem. Despite many days of work trying to reproduce it by multiple
people, no one has managed even 1 single failure outside of the gate.
Even inside the gate it is hard to reproduce. I still have absolutely
no clue what is failing after days of investigation & debugging with
all the tricks I can think of, because as I say, it works perfectly
every time I try it, except in the gate where it is impossible to
debug it.

Regards,
Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|



Re: [openstack-dev] [nova] top gate bug is libvirt snapshot

2014-07-09 Thread Daniel P. Berrange
On Wed, Jul 09, 2014 at 08:58:02AM +1000, Michael Still wrote:
 On Wed, Jul 9, 2014 at 8:21 AM, Sean Dague s...@dague.net wrote:
 
  This is also why I find it unlikely to be a qemu bug, because that's not
  shared state between guests. If qemu just randomly wedges itself, that
  would be detectable much easier outside of the gate. And there have been
  attempts by danpb to sniff that out, and they haven't worked.
 
 Do you think it would help if we added logging of what eventlet
 threads are running at the time of a failure like this? I can see that
 it might be a bit noisy, but it might also help nail down what this
 is an interaction between.

I don't think so. What I really need is more verbose libvirtd daemon
logs from a time when it fails. I've done a gross hack with a review
I have posted [1] which munges rootwrap to allow me to reconfigure
libvirtd and capture logs. Unfortunately I've been unable to get it
to fail on the snapshot bug since then - it is always hitting other
bugs so far :-(

Regards,
Daniel

[1] https://review.openstack.org/#/c/103066/
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|



Re: [openstack-dev] [nova] top gate bug is libvirt snapshot

2014-07-09 Thread Kashyap Chamarthy
On Tue, Jul 08, 2014 at 06:21:31PM -0400, Sean Dague wrote:
 On 07/08/2014 06:12 PM, Joe Gordon wrote:
  
  
  
  On Tue, Jul 8, 2014 at 2:56 PM, Michael Still mi...@stillhq.com wrote:
  
   The associated bug says this is probably a qemu bug, so I think we
   should rephrase that to "we need to start thinking about how to make
   sure upstream changes don't break nova".
  
  
  Good point.
   
  
  Would running devstack-tempest on the latest upstream release of ? help.
  Not as a voting job but as a periodic (third party?) job, that we can
  hopefully identify these issues early on. I think the big question here
  is who would volunteer to help run a job like this.

Although, I'm familiar with Gate and infra in depth, I can help
volunteer debug such issues (as I try to test libvirt/QEMU upstreams and
from git quite frequently).

 The running of the job really isn't the issue.
 
 It's the debugging of the jobs when they go wrong. Creating a new test
 job and getting it lit is really < 10% of the work; sifting through the
 fails and getting to the bottom of things is the hard and time consuming
 part.

Very true. For instance -- the live snapshot issue[1], I wish we could
get to the logical end of it (without letting it languish) and enable it
back in Nova soon. But, as of now, we're not able to pinpoint the
root cause, and it's not reproducible any more per Dan Berrange's
detailed analysis after a week of tests outside the Gate, or tests
with some debugging enabled[2] when there's a light load on the Gate --
in both cases, he didn't hit the issue after multiple test runs.

Dan raised on #openstack-nova whether there might be some weird I/O issue
in HP cloud that's leading to these timeouts, but Sean said a timeout would
be an issue only if this (the test in question) took > 2 minutes some
times and succeeded.

FWIW, in my local tests of the exact Nova invocation of the libvirt
blockRebase API to do parallel blockcopy operations followed by an
explicit abort (to gracefully end the block operation), I couldn't
reproduce it on multiple runs either.

 
  [1] https://bugs.launchpad.net/nova/+bug/1334398 -- libvirt
  live_snapshot periodically explodes on libvirt 1.2.2 in the gate
  [2] https://review.openstack.org/#/c/103066/
 
 
 The other option is to remove more concurrency from nova-compute. It's
 pretty clear that this problem only seems to happen when the
 snapshotting is going on at the same time guests are being created or
 destroyed (possibly also a second snapshot going on).
 
 This is also why I find it unlikely to be a qemu bug, because that's not
 shared state between guests. If qemu just randomly wedges itself, that
 would be detectable much easier outside of the gate. And there have been
 attempts by danpb to sniff that out, and they haven't worked.
 
   -Sean
 


-- 
/kashyap



Re: [openstack-dev] [nova] top gate bug is libvirt snapshot

2014-07-09 Thread Sean Dague
On 07/09/2014 03:58 AM, Daniel P. Berrange wrote:
 On Tue, Jul 08, 2014 at 02:50:40PM -0700, Joe Gordon wrote:
 But for right now, we should stop the bleeding, so that nova/libvirt
 isn't blocking everyone else from merging code.

 Agreed, we should merge the hack and treat the bug as release blocker
 to be resolved prior to Juno GA.



 How can we prevent libvirt issues like this from landing in trunk in the
 first place? If we don't figure out a way to prevent this from landing in the
 first place, I fear we will keep repeating this same pattern of failure.

Right, this is where math is against us. If a race shows up 1% of the
time, you need 66 runs to have a 50% chance of seeing it. I still haven't
calibrated the bugs to an absolute scale, but I think based on what I
remember this livesnapshot bug was probably a 3-4% bug (per Tempest
run). So you'd need 50 Tempest runs to have an 80% chance to see it show up again.

(Absolute calibration of the bugs is on my todo list for Elastic
Recheck, maybe it's time to put that in front of fixing the bugs)
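The race math here follows from n = ln(1 - c) / ln(1 - p); a quick check of my own, giving numbers in the same ballpark as the figures above:

```python
import math

# Runs needed to hit a bug that fires with probability p per run,
# at least once, with confidence c: solve 1 - (1 - p)**n >= c for n.
def runs_needed(p, c):
    return math.ceil(math.log(1 - c) / math.log(1 - p))
```

runs_needed(0.01, 0.5) gives 69 and runs_needed(0.035, 0.8) gives 46, close to the 66 and 50 quoted above.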

 Realistically I don't think there was much/any chance of avoiding this
 problem. Despite many days of work trying to reproduce it by multiple
 people, no one has managed even 1 single failure outside of the gate.
 Even inside the gate it is hard to reproduce. I still have absolutely
 no clue what is failing after days of investigation & debugging with
 all the tricks I can think of, because as I say, it works perfectly
 every time I try it, except in the gate where it is impossible to
 debug it.

Out of curiosity, is your reproduce using eventlet? My expectation is
that eventlet's concurrency actually exacerbates this because when the
snapshot starts we're now doing IO, and that means it's exactly the time
that other compute work will be triggered.

-Sean

-- 
Sean Dague
http://dague.net





Re: [openstack-dev] [nova] top gate bug is libvirt snapshot

2014-07-09 Thread Daniel P. Berrange
On Wed, Jul 09, 2014 at 08:34:06AM -0400, Sean Dague wrote:
 On 07/09/2014 03:58 AM, Daniel P. Berrange wrote:
  On Tue, Jul 08, 2014 at 02:50:40PM -0700, Joe Gordon wrote:
  But for right now, we should stop the bleeding, so that nova/libvirt
  isn't blocking everyone else from merging code.
 
  Agreed, we should merge the hack and treat the bug as release blocker
  to be resolved prior to Juno GA.
 
 
 
  How can we prevent libvirt issues like this from landing in trunk in the
  first place? If we don't figure out a way to prevent this from landing in the
  first place, I fear we will keep repeating this same pattern of failure.
 
 Right, this is where math is against us. If a race shows up 1% of the
 time, you need 66 runs to have a 50% chance of seeing it. I still haven't
 calibrated the bugs to an absolute scale, but I think based on what I
 remember this livesnapshot bug was probably a 3-4% bug (per Tempest
 run). So you'd need 50 Tempest runs to have an 80% chance to see it show up again.
 
 (Absolute calibration of the bugs is on my todo list for Elastic
 Recheck, maybe it's time to put that in front of fixing the bugs)
 
  Realistically I don't think there was much/any chance of avoiding this
  problem. Despite many days of work trying to reproduce it by multiple
  people, no one has managed even 1 single failure outside of the gate.
  Even inside the gate it is hard to reproduce. I still have absolutely
  no clue what is failing after days of investigation & debugging with
  all the tricks I can think of, because as I say, it works perfectly
  every time I try it, except in the gate where it is impossible to
  debug it.
 
 Out of curiosity, is your reproduce using eventlet? My expectation is
 that eventlet's concurrency actually exacerbates this because when the
 snapshot starts we're now doing IO, and that means it's exactly the time
 that other compute work will be triggered.

I've tried both running the tempest suite itself, and also running
a dedicated stress test written against libvirt snapshot APIs in C.

Regards,
Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|



Re: [openstack-dev] [nova] top gate bug is libvirt snapshot

2014-07-09 Thread Kashyap Chamarthy
On Wed, Jul 09, 2014 at 05:47:47PM +0530, Kashyap Chamarthy wrote:
 On Tue, Jul 08, 2014 at 06:21:31PM -0400, Sean Dague wrote:
  On 07/08/2014 06:12 PM, Joe Gordon wrote:
   
   
   
   On Tue, Jul 8, 2014 at 2:56 PM, Michael Still mi...@stillhq.com wrote:
   
    The associated bug says this is probably a qemu bug, so I think we
    should rephrase that to "we need to start thinking about how to make
    sure upstream changes don't break nova".
   
   
   Good point.

   
   Would running devstack-tempest on the latest upstream release of ? help.
   Not as a voting job but as a periodic (third party?) job, that we can
   hopefully identify these issues early on. I think the big question here
   is who would volunteer to help run a job like this.
 
 Although, I'm familiar 

Oops, typo: *Not familiar :-)

 with Gate and infra in depth, I can help
 volunteer debug such issues (as I try to test libvirt/QEMU upstreams and
 from git quite frequently).
 

-- 
/kashyap



Re: [openstack-dev] [nova] top gate bug is libvirt snapshot

2014-07-09 Thread Michael Still
Can we get our gate images tweaked to have more verbose libvirt
logging on in general? There's been a few times in the last year or so
when we've really needed it.

Michael

On Wed, Jul 9, 2014 at 6:01 PM, Daniel P. Berrange berra...@redhat.com wrote:
 On Wed, Jul 09, 2014 at 08:58:02AM +1000, Michael Still wrote:
 On Wed, Jul 9, 2014 at 8:21 AM, Sean Dague s...@dague.net wrote:

  This is also why I find it unlikely to be a qemu bug, because that's not
  shared state between guests. If qemu just randomly wedges itself, that
  would be detectable much easier outside of the gate. And there have been
  attempts by danpb to sniff that out, and they haven't worked.

 Do you think it would help if we added logging of what eventlet
 threads are running at the time of a failure like this? I can see that
 it might be a bit noisy, but it might also help nail down what this
 is an interaction between.

 I don't think so. What I really need is more verbose libvirtd daemon
 logs from a time when it fails. I've done a gross hack with a review
 I have posted [1] which munges rootwrap to allow me to reconfigure
 libvirtd and capture logs. Unfortunately I've been unable to get it
 to fail on the snapshot bug since then - it is always hitting other
 bugs so far :-(

 Regards,
 Daniel

 [1] https://review.openstack.org/#/c/103066/
 --
 |: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
 |: http://libvirt.org  -o- http://virt-manager.org :|
 |: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
 |: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|

 ___
 OpenStack-dev mailing list
 OpenStack-dev@lists.openstack.org
 http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev



-- 
Rackspace Australia



Re: [openstack-dev] [nova] top gate bug is libvirt snapshot

2014-07-09 Thread Sean Dague
The libvirt logs needed are huge, so we can't run them all the time. And
realistically, I don't think they provided us the info we needed. There
has been at least one fail on Dan's log hack patch for this scenario
today, so maybe it will be in there.

On 07/09/2014 05:44 PM, Michael Still wrote:
 Can we get our gate images tweaked to have more verbose libvirt
 logging on in general? There's been a few times in the last year or so
 when we've really needed it.
 
 Michael
 
 On Wed, Jul 9, 2014 at 6:01 PM, Daniel P. Berrange berra...@redhat.com 
 wrote:
 On Wed, Jul 09, 2014 at 08:58:02AM +1000, Michael Still wrote:
 On Wed, Jul 9, 2014 at 8:21 AM, Sean Dague s...@dague.net wrote:

 This is also why I find it unlikely to be a qemu bug, because that's not
 shared state between guests. If qemu just randomly wedges itself, that
 would be detectable much easier outside of the gate. And there have been
 attempts by danpb to sniff that out, and they haven't worked.

 Do you think it would help if we added logging of what eventlet
 threads are running at the time of a failure like this? I can see that
 it might be a bit noisy, but it might also help nail down what this
 is an interaction between.

 I don't think so. What I really need is more verbose libvirtd daemon
 logs from a time when it fails. I've done a gross hack with a review
 I have posted [1] which munges rootwrap to allow me to reconfigure
 libvirtd and capture logs. Unfortunately I've been unable to get it
 to fail on the snapshot bug since then - it is always hitting other
 bugs so far :-(

 Regards,
 Daniel

 [1] https://review.openstack.org/#/c/103066/
 --
 |: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
 |: http://libvirt.org  -o- http://virt-manager.org :|
 |: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
 |: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|

 ___
 OpenStack-dev mailing list
 OpenStack-dev@lists.openstack.org
 http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev
 
 
 


-- 
Sean Dague
http://dague.net





Re: [openstack-dev] [nova] top gate bug is libvirt snapshot

2014-07-08 Thread Joe Gordon
On Thu, Jun 26, 2014 at 4:12 AM, Daniel P. Berrange berra...@redhat.com
wrote:

 On Thu, Jun 26, 2014 at 07:00:32AM -0400, Sean Dague wrote:
  While the Trusty transition was mostly uneventful, it has exposed a
  particular issue in libvirt, which is generating ~ 25% failure rate now
  on most tempest jobs.
 
  As can be seen here -
 
 https://github.com/openstack/nova/blob/master/nova/virt/libvirt/driver.py#L294-L297
 
 
  ... the libvirt live_snapshot code is something that our test pipeline
  has never tested before, because it wasn't a new enough libvirt for us
  to take that path.
 
  Right now it's exploding, a lot -
  https://bugs.launchpad.net/nova/+bug/1334398
 
  Snapshotting gets used in Tempest to create images for testing, so image
  setup tests are doing a decent number of snapshots. If I had to take a
  completely *wild guess*, it's that libvirt can't do 2 live_snapshots at
  the same time. It's probably something that most people haven't hit. The
  wild guess is based on other libvirt issues we've hit that other people
  haven't, and they are basically always a parallel ops triggered problem.
 
  My 'stop the bleeding' suggested fix is this -
  https://review.openstack.org/#/c/102643/ which just effectively disables
  this code path for now. Then we can get some libvirt experts engaged to
  help figure out the right long term fix.

 Yes, this is a sensible pragmatic workaround for the short term until
 we diagnose the root cause & fix it.

  I think there are a couple:
 
  1) see if newer libvirt fixes this (1.2.5 just came out), and if so
  mandate at some known working version. This would actually take a bunch
  of work to be able to test a non packaged libvirt in our pipeline. We'd
  need volunteers for that.
 
  2) lock snapshot operations in nova-compute, so that we can only do 1 at
  a time. Hopefully it's just 2 snapshot operations that is the issue, not
  any other libvirt op during a snapshot, so serializing snapshot ops in
  n-compute could put the kid gloves on libvirt and make it not break
  here. This also needs some volunteers as we're going to be playing a
  game of progressive serialization until we get to a point where it looks
  like the failures go away.
 
  3) Roll back to precise. I put this idea here for completeness, but I
  think it's a terrible choice. This is one isolated, previously untested
  (by us), code path. We can't stay on libvirt 0.9.6 forever, so actually
  need to fix this for real (be it in nova's use of libvirt, or libvirt
  itself).

 Yep, since we *never* tested this code path in the gate before, rolling
 back to precise would not even really be a fix for the problem. It would
 merely mean we're not testing the code path again, which is really akin
 to sticking our head in the sand.

  But for right now, we should stop the bleeding, so that nova/libvirt
  isn't blocking everyone else from merging code.

 Agreed, we should merge the hack and treat the bug as release blocker
 to be resolved prior to Juno GA.



How can we prevent libvirt issues like this from landing in trunk in the
first place? If we don't figure out a way to prevent this from landing in the
first place, I fear we will keep repeating this same pattern of failure.



 Regards,
 Daniel
 --
 |: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/
 :|
 |: http://libvirt.org  -o- http://virt-manager.org
 :|
 |: http://autobuild.org   -o- http://search.cpan.org/~danberr/
 :|
 |: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc
 :|



Re: [openstack-dev] [nova] top gate bug is libvirt snapshot

2014-07-08 Thread Michael Still
The associated bug says this is probably a qemu bug, so I think we
should rephrase that to "we need to start thinking about how to make
sure upstream changes don't break nova".

Michael

On Wed, Jul 9, 2014 at 7:50 AM, Joe Gordon joe.gord...@gmail.com wrote:

 On Thu, Jun 26, 2014 at 4:12 AM, Daniel P. Berrange berra...@redhat.com
 wrote:

 On Thu, Jun 26, 2014 at 07:00:32AM -0400, Sean Dague wrote:
  While the Trusty transition was mostly uneventful, it has exposed a
  particular issue in libvirt, which is generating ~ 25% failure rate now
  on most tempest jobs.
 
  As can be seen here -
 
  https://github.com/openstack/nova/blob/master/nova/virt/libvirt/driver.py#L294-L297
 
 
  ... the libvirt live_snapshot code is something that our test pipeline
  has never tested before, because it wasn't a new enough libvirt for us
  to take that path.
 
  Right now it's exploding, a lot -
  https://bugs.launchpad.net/nova/+bug/1334398
 
  Snapshotting gets used in Tempest to create images for testing, so image
  setup tests are doing a decent number of snapshots. If I had to take a
  completely *wild guess*, it's that libvirt can't do 2 live_snapshots at
  the same time. It's probably something that most people haven't hit. The
  wild guess is based on other libvirt issues we've hit that other people
  haven't, and they are basically always a parallel ops triggered problem.
 
  My 'stop the bleeding' suggested fix is this -
  https://review.openstack.org/#/c/102643/ which just effectively disables
  this code path for now. Then we can get some libvirt experts engaged to
  help figure out the right long term fix.

 Yes, this is a sensible pragmatic workaround for the short term until
  we diagnose the root cause & fix it.

  I think there are a couple:
 
  1) see if newer libvirt fixes this (1.2.5 just came out), and if so
  mandate at some known working version. This would actually take a bunch
  of work to be able to test a non packaged libvirt in our pipeline. We'd
  need volunteers for that.
 
  2) lock snapshot operations in nova-compute, so that we can only do 1 at
  a time. Hopefully it's just 2 snapshot operations that is the issue, not
  any other libvirt op during a snapshot, so serializing snapshot ops in
  n-compute could put the kid gloves on libvirt and make it not break
  here. This also needs some volunteers as we're going to be playing a
  game of progressive serialization until we get to a point where it looks
  like the failures go away.
 
  3) Roll back to precise. I put this idea here for completeness, but I
  think it's a terrible choice. This is one isolated, previously untested
  (by us), code path. We can't stay on libvirt 0.9.6 forever, so actually
  need to fix this for real (be it in nova's use of libvirt, or libvirt
  itself).

 Yep, since we *never* tested this code path in the gate before, rolling
 back to precise would not even really be a fix for the problem. It would
 merely mean we're not testing the code path again, which is really akin
 to sticking our head in the sand.

  But for right now, we should stop the bleeding, so that nova/libvirt
  isn't blocking everyone else from merging code.

 Agreed, we should merge the hack and treat the bug as a release blocker
 to be resolved prior to Juno GA.



 How can we prevent libvirt issues like this from landing in trunk in the
 first place? If we don't figure out a way to prevent this from landing in
 the first place, I fear we will keep repeating this same pattern of failure.



 Regards,
 Daniel
 --
 |: http://berrange.com  -o-  http://www.flickr.com/photos/dberrange/ :|
 |: http://libvirt.org  -o-  http://virt-manager.org :|
 |: http://autobuild.org  -o-  http://search.cpan.org/~danberr/ :|
 |: http://entangle-photo.org  -o-  http://live.gnome.org/gtk-vnc :|





-- 
Rackspace Australia

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


Re: [openstack-dev] [nova] top gate bug is libvirt snapshot

2014-07-08 Thread Boris Pavlovic
Joe,

What about running benchmarks (with a small load) for all major functions
(like snapshotting, booting/deleting, ...) on every patch in nova? It could
catch a lot of related issues.


Best regards,
Boris Pavlovic


On Wed, Jul 9, 2014 at 1:56 AM, Michael Still mi...@stillhq.com wrote:

 The associated bug says this is probably a qemu bug, so I think we
 should rephrase that to we need to start thinking about how to make
 sure upstream changes don't break nova.

 Michael




Re: [openstack-dev] [nova] top gate bug is libvirt snapshot

2014-07-08 Thread Joe Gordon
On Tue, Jul 8, 2014 at 2:56 PM, Michael Still mi...@stillhq.com wrote:

 The associated bug says this is probably a qemu bug, so I think we
 should rephrase that to we need to start thinking about how to make
 sure upstream changes don't break nova.


Good point.


Would running devstack-tempest on the latest upstream release of ? help?
Not as a voting job, but as a periodic (third party?) job, so that we can
hopefully identify these issues early on. I think the big question here is
who would volunteer to help run a job like this.





Re: [openstack-dev] [nova] top gate bug is libvirt snapshot

2014-07-08 Thread Michael Still
On Wed, Jul 9, 2014 at 8:21 AM, Sean Dague s...@dague.net wrote:

 This is also why I find it unlikely to be a qemu bug, because that's not
 shared state between guests. If qemu just randomly wedges itself, that
 would be detectable much easier outside of the gate. And there have been
 attempts by danpb to sniff that out, and they haven't worked.

Do you think it would help if we added logging of what eventlet
threads are running at the time of a failure like this? I can see that
it might be a bit noisy, but it might also help nail down what this
is an interaction between.
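For regular OS threads, a dump along these lines needs nothing but the
standard library; a sketch (the helper name is made up, and for eventlet
green threads the frames would instead come from walking greenlet objects
found via gc.get_objects() and reading each one's gr_frame):

```python
import sys
import threading
import traceback


def dump_thread_stacks():
    """Return a formatted dump of every running thread's stack.

    A hypothetical helper: nova-compute could log this output when a
    snapshot operation fails, to show what else was in flight.
    """
    names = {t.ident: t.name for t in threading.enumerate()}
    lines = []
    for ident, frame in sys._current_frames().items():
        lines.append("Thread %s (%s):" % (ident, names.get(ident, "unknown")))
        lines.extend(l.rstrip() for l in traceback.format_stack(frame))
    return "\n".join(lines)


if __name__ == "__main__":
    print(dump_thread_stacks())
```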

Michael

-- 
Rackspace Australia

___
OpenStack-dev mailing list
OpenStack-dev@lists.openstack.org
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-dev


[openstack-dev] [nova] top gate bug is libvirt snapshot

2014-06-26 Thread Sean Dague
While the Trusty transition was mostly uneventful, it has exposed a
particular issue in libvirt, which is generating ~ 25% failure rate now
on most tempest jobs.

As can be seen here -
https://github.com/openstack/nova/blob/master/nova/virt/libvirt/driver.py#L294-L297


... the libvirt live_snapshot code is something that our test pipeline
has never tested before, because it wasn't a new enough libvirt for us
to take that path.

Right now it's exploding, a lot -
https://bugs.launchpad.net/nova/+bug/1334398

Snapshotting gets used in Tempest to create images for testing, so image
setup tests are doing a decent number of snapshots. If I had to take a
completely *wild guess*, it's that libvirt can't do 2 live_snapshots at
the same time. It's probably something that most people haven't hit. The
wild guess is based on other libvirt issues we've hit that other people
haven't, and they are basically always a parallel ops triggered problem.

My 'stop the bleeding' suggested fix is this -
https://review.openstack.org/#/c/102643/ which just effectively disables
this code path for now. Then we can get some libvirt experts engaged to
help figure out the right long term fix.

I think there are a couple:

1) see if newer libvirt fixes this (1.2.5 just came out), and if so
mandate at some known working version. This would actually take a bunch
of work to be able to test a non packaged libvirt in our pipeline. We'd
need volunteers for that.
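Mandating a known working version would presumably mirror the
minimum-version gates nova already applies to other libvirt features; a
minimal sketch, with the constant name and the 1.2.5 cutoff as illustrative
assumptions only:

```python
# Hypothetical gate: only take the live_snapshot path when the running
# libvirt is at least an assumed known-good version.
MIN_LIBVIRT_LIVESNAPSHOT_VERSION = (1, 2, 5)


def parse_version(version_string):
    """Turn a version string like '1.2.5' into a comparable tuple."""
    return tuple(int(part) for part in version_string.split("."))


def live_snapshot_supported(libvirt_version_string):
    """True when libvirt is new enough to take the live_snapshot path."""
    return (parse_version(libvirt_version_string)
            >= MIN_LIBVIRT_LIVESNAPSHOT_VERSION)
```

With a gate like this, hosts below the cutoff would silently fall back to
the cold-snapshot path rather than hitting the broken code.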

2) lock snapshot operations in nova-compute, so that we can only do 1 at
a time. Hopefully it's just 2 snapshot operations that is the issue, not
any other libvirt op during a snapshot, so serializing snapshot ops in
n-compute could put the kid gloves on libvirt and make it not break
here. This also needs some volunteers as we're going to be playing a
game of progressive serialization until we get to a point where it looks
like the failures go away.

3) Roll back to precise. I put this idea here for completeness, but I
think it's a terrible choice. This is one isolated, previously untested
(by us), code path. We can't stay on libvirt 0.9.6 forever, so actually
need to fix this for real (be it in nova's use of libvirt, or libvirt
itself).

There might be other options as well, ideas welcomed.

But for right now, we should stop the bleeding, so that nova/libvirt
isn't blocking everyone else from merging code.

-Sean

-- 
Sean Dague
http://dague.net





Re: [openstack-dev] [nova] top gate bug is libvirt snapshot

2014-06-26 Thread Daniel P. Berrange
On Thu, Jun 26, 2014 at 07:00:32AM -0400, Sean Dague wrote:
 While the Trusty transition was mostly uneventful, it has exposed a
 particular issue in libvirt, which is generating ~ 25% failure rate now
 on most tempest jobs.
 
 As can be seen here -
 https://github.com/openstack/nova/blob/master/nova/virt/libvirt/driver.py#L294-L297
 
 
 ... the libvirt live_snapshot code is something that our test pipeline
 has never tested before, because it wasn't a new enough libvirt for us
 to take that path.
 
 Right now it's exploding, a lot -
 https://bugs.launchpad.net/nova/+bug/1334398
 
 Snapshotting gets used in Tempest to create images for testing, so image
 setup tests are doing a decent number of snapshots. If I had to take a
 completely *wild guess*, it's that libvirt can't do 2 live_snapshots at
 the same time. It's probably something that most people haven't hit. The
 wild guess is based on other libvirt issues we've hit that other people
 haven't, and they are basically always a parallel ops triggered problem.
 
 My 'stop the bleeding' suggested fix is this -
 https://review.openstack.org/#/c/102643/ which just effectively disables
 this code path for now. Then we can get some libvirt experts engaged to
 help figure out the right long term fix.

Yes, this is a sensible pragmatic workaround for the short term until
we diagnose the root cause & fix it.

 I think there are a couple:
 
 1) see if newer libvirt fixes this (1.2.5 just came out), and if so
 mandate at some known working version. This would actually take a bunch
 of work to be able to test a non packaged libvirt in our pipeline. We'd
 need volunteers for that.
 
 2) lock snapshot operations in nova-compute, so that we can only do 1 at
 a time. Hopefully it's just 2 snapshot operations that is the issue, not
 any other libvirt op during a snapshot, so serializing snapshot ops in
 n-compute could put the kid gloves on libvirt and make it not break
 here. This also needs some volunteers as we're going to be playing a
 game of progressive serialization until we get to a point where it looks
 like the failures go away.
 
 3) Roll back to precise. I put this idea here for completeness, but I
 think it's a terrible choice. This is one isolated, previously untested
 (by us), code path. We can't stay on libvirt 0.9.6 forever, so actually
 need to fix this for real (be it in nova's use of libvirt, or libvirt
 itself).

Yep, since we *never* tested this code path in the gate before, rolling
back to precise would not even really be a fix for the problem. It would
merely mean we're not testing the code path again, which is really akin
to sticking our head in the sand.

 But for right now, we should stop the bleeding, so that nova/libvirt
 isn't blocking everyone else from merging code.

Agreed, we should merge the hack and treat the bug as a release blocker
to be resolved prior to Juno GA.

Regards,
Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|
