Thanks for confirming I'm not fully insane. We only have one cluster left to 
upgrade now (naturally the oldest, biggest and most dangerous one). Hopefully 
it doesn't repeat there, but if it does, you've given me a few more things to 
look at.

From: [email protected] 
Subject: Re: [Openstack-operators] libvirt freezing when loading Nova instance 
nwfilters

We ran into the "virsh nwfilter-list hanging indefinitely" thing back in early 
January. I spent hours on it and almost went insane trying to figure it out. We 
weren't upgrading nodes, though; it just sort of happened.

I have no idea if the following was the correct way of handling this, but this 
ultimately got nova-compute back up and running:

I ran:

$ ss -ax

on the hypervisor and saw that some monitor sockets had a non-zero Recv-Q. 
On the processes related to those sockets, I ran:

$ strace -p <pid>

and saw no activity. By contrast, strace on the processes whose sockets had a 
zero Recv-Q showed activity. By then, I figured my only options were a full 
hypervisor reboot or killing the instances with no activity. Since those 
instances would be killed by a full reboot anyway, I did a "virsh destroy" on 
them. Once they were destroyed, nova-compute was able to start cleanly.
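
For anyone repeating the Recv-Q check above, here is a minimal sketch of a 
filter that keeps only the rows of ss output with a non-zero Recv-Q. The 
function name nonzero_recvq is made up for illustration, and the sketch assumes 
the header row of ss names the column "Recv-Q" and that every column before it 
is a single token, which may not hold for every iproute2 version.

```shell
# Hypothetical helper: print rows of `ss` output whose Recv-Q is non-zero.
# Reads ss output on stdin and locates the column from the header row.
nonzero_recvq() {
    awk '
        NR == 1 {
            # Find which whitespace-separated field is labelled Recv-Q.
            for (i = 1; i <= NF; i++) if ($i == "Recv-Q") col = i
            next
        }
        # Keep rows where that field is a number greater than zero.
        col && $col ~ /^[0-9]+$/ && $col + 0 > 0
    '
}
```

Something like "ss -axp | nonzero_recvq | grep monitor" should then narrow the 
list to the monitor sockets; the -p flag asks ss to show the owning process, 
which gives the pid to hand to strace.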

We had this happen on 3 hypervisors. Each one had between 1 and 3 of these 
types of instances, so not a lot at all. Once they were destroyed, nova-compute 
began working again on all 3.

We later had a user report that he noticed some problems with his instance (not 
one of the ones destroyed) and thought it might have to do with the leap 
second. No idea if that's true, but the timing kind of works out.

Hope that helps,
Joe


On Wed, Feb 22, 2017 at 8:33 AM, Edmund Rhudy (BLOOMBERG/ 120 PARK) 
<[email protected]> wrote:

I recently witnessed a strange issue with libvirt when upgrading one of our 
clusters from Kilo to Liberty. I'm not really looking for a specific diagnosis 
here because of the large number of confounding factors and the relative ease 
of remediating it, but I'm interested to hear if anyone else has witnessed this 
particular problem.

Background is we had a number of Kilo-based clusters, all running Ubuntu 
14.04.4 with OpenStack installed from the Ubuntu cloud archive. The upgrade 
process to Liberty involved upgrading the OpenStack components and their 
dependencies (including libvirt), then afterward upgrading all remaining 
packages via dist-upgrade (and staging a kernel upgrade from 3.13 to 4.4, to 
take effect on the next reboot). 7 clusters had all been upgraded successfully 
using this strategy.

One cluster, however, decided to get a bit weird. After the upgrade, 4 
hypervisors showed that nova-compute was refusing to come up properly and was 
showing as enabled/down in nova service-list. Upon further investigation, 
nova-compute was starting up but was getting jammed on loading nwfilters. When 
I ran "virsh nwfilter-list", the command stalled indefinitely. Killing 
nova-compute and restarting libvirt-bin service allowed the command to work 
again, but it did not list any of the nova-instance-instance-* nwfilters. Once 
nova-compute was started, it tried to start loading the instance-specific 
filters and libvirt would wedge. I spent a while tinkering with the affected 
systems but could not find any way of correcting the issue other than rebooting 
the hypervisor, after which everything was fine.
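
A stalled "virsh nwfilter-list" like the one described above can be spotted 
without tying up a terminal. The following is only a sketch, assuming GNU 
coreutils timeout is installed (it exits with status 124 when the time limit 
is hit); the function name check_hang and the 10-second limit in the usage 
note are illustrative choices, not anything taken from the upgrade procedure.

```shell
# Hypothetical helper: run a command with a time limit and report the result.
# Usage: check_hang SECONDS COMMAND [ARGS...]
check_hang() {
    limit=$1
    shift
    # Discard output; only the exit status matters here.
    timeout "$limit" "$@" >/dev/null 2>&1
    status=$?
    if [ "$status" -eq 124 ]; then
        echo "hung"                 # timeout killed the command
    elif [ "$status" -eq 0 ]; then
        echo "ok"
    else
        echo "failed ($status)"     # command exited on its own with an error
    fi
}
```

For example, "check_hang 10 virsh nwfilter-list" would print "hung" on a 
hypervisor where libvirt has wedged and "ok" where the command returns normally.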

Has anyone ever seen anything like this? libvirt was upgraded from 1.2.12 to 
1.2.16. Hundreds of hypervisors had already received this exact same upgrade 
without showing this problem, and I have no idea how I could reproduce it. I'm 
interested to hear if anyone else has ever run into this and if they figured 
out what the root cause was, though I've already braced myself for tumbleweeds.
_______________________________________________
OpenStack-operators mailing list
[email protected]
http://lists.openstack.org/cgi-bin/mailman/listinfo/openstack-operators

