[Bug 1746630] Re: virsh api is stuck when vm is down with NFS borken

2018-02-02 Thread Seyeong Kim
hello paelzer

thanks a lot

PPA link is

https://launchpad.net/~xtrusia/+archive/ubuntu/sf161119

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1746630

Title:
  virsh api is stuck when vm is down with NFS borken

To manage notifications about this bug go to:
https://bugs.launchpad.net/cloud-archive/+bug/1746630/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1746630] Re: virsh api is stuck when vm is down with NFS borken

2018-02-02 Thread ChristianEhrhardt
Hi Seyeong,
thanks for the analysis and backport.

The patches look good to me:
- two structural changes with no net effect (to let the backport apply)
- there could have been a backport without those, but I agree that this looks
  clearer
- the actual fix seems ok; skipping if no data is available sounds right
- some cleanups in the patch headers are required, but I can do that on upload
  for you
- these changes have been upstream for a long time, and 4.0 still implements
  it the way this change does

Thanks a lot, two things:
1. do you have a PPA with that already that I should run checks against
   (otherwise I'll open one up when really prepping the SRU)?
2. there is a security update in flight that we have to wait for, so I'm
   postponing this fix until that is complete.


[Bug 1746630] Re: virsh api is stuck when vm is down with NFS borken

2018-02-01 Thread ChristianEhrhardt
** Also affects: libvirt (Ubuntu Xenial)
   Importance: Undecided
   Status: New

** Changed in: libvirt (Ubuntu)
   Status: New => Fix Released

** Changed in: libvirt (Ubuntu Xenial)
   Status: New => Triaged


[Bug 1746630] Re: virsh api is stuck when vm is down with NFS borken

2018-01-31 Thread Ubuntu Foundations Team Bug Bot
The attachment "lp1746630_xenial.debdiff" seems to be a debdiff.  The
ubuntu-sponsors team has been subscribed to the bug report so that they
can review and hopefully sponsor the debdiff.  If the attachment isn't a
patch, please remove the "patch" flag from the attachment, remove the
"patch" tag, and if you are member of the ~ubuntu-sponsors, unsubscribe
the team.

[This is an automated message performed by a Launchpad user owned by
~brian-murray, for any issue please contact him.]

** Tags added: patch


[Bug 1746630] Re: virsh api is stuck when vm is down with NFS borken

2018-01-31 Thread Bug Watch Updater
Launchpad has imported 15 comments from the remote bug at
https://bugzilla.redhat.com/show_bug.cgi?id=1337073.

If you reply to an imported comment from within Launchpad, your comment
will be sent to the remote bug automatically. Read more about
Launchpad's inter-bugtracker facilities at
https://help.launchpad.net/InterBugTracking.


On 2016-05-18T08:46:03+00:00 Francesco wrote:

Description of problem:
Short summary:
if a QEMU/KVM VM hangs because of unresponsive storage (NFS server unreachable),
after a random amount of time virDomainGetControlInfo() stops responding.

Packages:
qemu-kvm-tools-rhev-2.3.0-31.el7_2.14.x86_64
ipxe-roms-qemu-20130517-7.gitc4bce43.el7.noarch
qemu-kvm-rhev-2.3.0-31.el7_2.14.x86_64
libvirt-daemon-driver-qemu-1.3.4-1.el7.x86_64
qemu-img-rhev-2.3.0-31.el7_2.14.x86_64
qemu-kvm-common-rhev-2.3.0-31.el7_2.14.x86_64

libvirt-daemon-driver-storage-1.3.4-1.el7.x86_64
libvirt-daemon-driver-interface-1.3.4-1.el7.x86_64
libvirt-debuginfo-1.3.4-1.el7.x86_64
libvirt-daemon-kvm-1.3.4-1.el7.x86_64
libvirt-daemon-config-nwfilter-1.3.4-1.el7.x86_64
libvirt-daemon-config-network-1.3.4-1.el7.x86_64
libvirt-client-1.3.4-1.el7.x86_64
libvirt-daemon-driver-lxc-1.3.4-1.el7.x86_64
libvirt-lock-sanlock-1.3.4-1.el7.x86_64
libvirt-daemon-1.3.4-1.el7.x86_64
libvirt-daemon-driver-qemu-1.3.4-1.el7.x86_64
libvirt-devel-1.3.4-1.el7.x86_64
libvirt-daemon-driver-secret-1.3.4-1.el7.x86_64
libvirt-daemon-lxc-1.3.4-1.el7.x86_64
libvirt-nss-1.3.4-1.el7.x86_64
libvirt-1.3.4-1.el7.x86_64
libvirt-daemon-driver-nodedev-1.3.4-1.el7.x86_64
libvirt-python-1.2.17-2.el7.x86_64
libvirt-daemon-driver-network-1.3.4-1.el7.x86_64
libvirt-login-shell-1.3.4-1.el7.x86_64
libvirt-daemon-driver-nwfilter-1.3.4-1.el7.x86_64
libvirt-docs-1.3.4-1.el7.x86_64

libvirt recompiled from git, qemu from RHEL


Context:
Vdsm is the node management system of oVirt (http://www.ovirt.org) and uses
libvirt to run and monitor VMs. We use QEMU/KVM VMs over shared storage.
Among the calls Vdsm periodically runs to monitor the VM state:

virConnectGetAllDomainStats
virDomainListGetStats
virDomainGetBlockIoTune
virDomainBlockJobInfo
virDomainGetBlockInfo
virDomainGetVcpus

We know from experience that storage may become unresponsive/unreachable, so
QEMU monitor calls can hang, causing the libvirt calls to hang in turn.

Vdsm does the monitoring using a thread pool. Should one of the worker
threads become unresponsive, it is replaced. To avoid stalling libvirt
and leaking threads indefinitely, Vdsm has one additional protection
layer: it inspects the libvirt state before making calls that go down to
QEMU, using code like

    def isDomainReadyForCommands(self):
        try:
            state, details, stateTime = self._dom.controlInfo()
        except virdomain.NotConnectedError:
            # this method may be called asynchronously by periodic
            # operations. Thus, we must use a try/except block
            # to avoid racy checks.
            return False
        except libvirt.libvirtError as e:
            if e.get_error_code() == libvirt.VIR_ERR_NO_DOMAIN:
                return False
            else:
                raise
        else:
            return state == libvirt.VIR_DOMAIN_CONTROL_OK


Vdsm actually issues the potentially hanging call if and only if the call above
returns True (hence the virDomainGetControlInfo() state is VIR_DOMAIN_CONTROL_OK).
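
A minimal sketch of how such a guard might wrap a monitoring call. FakeDomain,
is_ready_for_commands, sample_vcpus and the CONTROL_* integers are hypothetical
stand-ins for illustration, not Vdsm's or libvirt's actual code; a real caller
would use a domain object from the libvirt Python bindings and the
libvirt.VIR_DOMAIN_CONTROL_* constants.

```python
# Stand-ins for libvirt.VIR_DOMAIN_CONTROL_* (assumed values, illustration only).
CONTROL_OK = 0
CONTROL_OCCUPIED = 2


class FakeDomain:
    """Hypothetical stand-in for a libvirt domain object."""

    def __init__(self, control_state):
        self._control_state = control_state
        self.monitor_calls = 0

    def controlInfo(self):
        # Same (state, details, stateTime) triple as in the snippet above.
        return self._control_state, 0, 0

    def vcpus(self):
        # On a real domain this goes down to the QEMU monitor and may block.
        self.monitor_calls += 1
        return ["vcpu0", "vcpu1"]


def is_ready_for_commands(dom):
    state, _details, _state_time = dom.controlInfo()
    return state == CONTROL_OK


def sample_vcpus(dom):
    """Issue the potentially blocking call only when the guard passes."""
    if not is_ready_for_commands(dom):
        return None  # skip this sampling cycle
    return dom.vcpus()


healthy = FakeDomain(CONTROL_OK)
stuck = FakeDomain(CONTROL_OCCUPIED)
print(sample_vcpus(healthy))
print(sample_vcpus(stuck))
```

The guard only protects the caller as long as the control-info query itself
returns promptly, which is exactly what this report says stops happening.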

When the NFS server is unreachable, the protection layer in Vdsm triggers, and
Vdsm avoids sending libvirt calls. After a while, however, we see
virDomainGetControlInfo() calls no longer responding, like this
(full log attached):

2016-05-18 06:01:45.920+: 3069: debug : virThreadJobSet:96 : Thread 3069 (virNetServerHandleJob) is now running job remoteDispatchDomainGetVcpus
2016-05-18 06:01:45.920+: 3069: info : virObjectNew:202 : OBJECT_NEW: obj=0x7f5a70004070 classname=virDomain
2016-05-18 06:01:45.920+: 3069: info : virObjectRef:296 : OBJECT_REF: obj=0x7f5a5c000ec0
2016-05-18 06:01:45.920+: 3069: debug : virDomainGetVcpus:7733 : dom=0x7f5a70004070, (VM: name=a1, uuid=048f8624-03fc-4729-8f4d-12cb4387f018), info=0x7f5a70002140, maxinfo=2, cpumaps=0x7f5a70002200, maplen=1
2016-05-18 06:01:45.920+: 3069: info : virObjectRef:296 : OBJECT_REF: obj=0x7f5a64009bf0
2016-05-18 06:01:45.920+: 3069: info : virObjectRef:296 : OBJECT_REF: obj=0x7f5a930f6f00
2016-05-18 06:01:45.920+: 3069: debug : virAccessManagerCheckDomain:234 : manager=0x7f5a930f6f00(name=stack) driver=QEMU domain=0x7f5a64012c40 perm=1
2016-05-18 06:01:45.920+: 3069: debug : virAccessManagerCheckDomain:234 : manager=0x7f5a930ebdf0(name=none) driver=QEMU domain=0x7f5a64012c40 perm=1
2016-05-18 06:01:45.920+: 3069: info : virObjectUnref:259 : OBJECT_UNREF: obj=0x7f5a930f6f00
2016-05-18 06:01:45.920+: 3069: debug : qemuGetProcessInfo:1486 : Got status for 3500/3505 user=1507 sys=209 cpu=1 rss=531128
2016-05-18 06:01:45.920+: 3069: debug : qem

[Bug 1746630] Re: virsh api is stuck when vm is down with NFS borken

2018-01-31 Thread Seyeong Kim
** Bug watch added: Red Hat Bugzilla #1337073
   https://bugzilla.redhat.com/show_bug.cgi?id=1337073

** Also affects: libvirt via
   https://bugzilla.redhat.com/show_bug.cgi?id=1337073
   Importance: Unknown
   Status: Unknown

** Also affects: cloud-archive
   Importance: Undecided
   Status: New

** Description changed:

  [Impact]
  
  virsh commands hang if there is a broken VM on broken NFS
  
- This affects Xenial
+ This affects Xenial and UCA-Mitaka
  
  [Test Case]
  
  1. deploy VM with NFS storage ( running )
  2. block NFS via iptables
  - iptables -A OUTPUT -d NFS_SERVER_IP -p tcp --dport 2049 -j DROP ( on host machine )
  3. virsh blkdeviotune generic hda => hang
  4. virsh domstats => hang
  5. virsh list => hang
  
  [Regression]
  After the patch, domstats and list respond after a short timeout.
  libvirt-bin needs to be restarted, so hosts with many VMs will be affected
  for a short time while it restarts.
  
  [Others]
  
  This bug is covered by the Red Hat bug report[1] and mailing list thread[2],
  and fixed by git commits[3][4][5]
  
  and the fix was merged upstream in 1.3.5
  
  
https://libvirt.org/git/?p=libvirt.git;a=blobdiff;f=docs/news.html.in;h=1ad8337f5f8443b5ac76450dc3370f95c51503fd;hp=d035f6833fb5eaaced8f5a7010872f3e61b6955b;hb=732bc70dcc3e2d1fe0baa640712efb99e273;hpb=d57e73d06fe5901ac4ab9c025b3531251292b509
  
- 
  [1] https://bugzilla.redhat.com/show_bug.cgi?id=1337073
  [2] https://www.redhat.com/archives/libvir-list/2016-May/msg01353.html
  [3] https://libvirt.org/git/?p=libvirt.git;a=commit;h=5d2b0e6f12b4e57d75ed1047ab1c36443b7a54b3
  [4] https://libvirt.org/git/?p=libvirt.git;a=commit;h=3aa5d51a9530a8737ca584b393c29297dd9bbc37
  [5] https://libvirt.org/git/?p=libvirt.git;a=commit;h=71d2c172edb997bae1e883b2e1bafa97d9f953a1
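
Steps 3 to 5 of the test case can be checked without blocking the shell by
running each virsh command under a timeout. A minimal sketch, assuming Python 3
and that virsh is on the PATH; run_status and the 10-second limit are
illustrative choices, not part of the bug report or the fix. The domain name
"generic" and the device "hda" are taken from the test case.

```python
import subprocess


def run_status(cmd, timeout_s=10.0):
    """Run cmd and classify the outcome.

    Returns "ok" if the command exits within timeout_s seconds (whatever its
    exit code), "hang" if it had to be killed after the timeout, and
    "missing" if the executable cannot be found.
    """
    try:
        subprocess.run(
            cmd,
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
        )
    except subprocess.TimeoutExpired:
        return "hang"
    except FileNotFoundError:
        return "missing"
    return "ok"


if __name__ == "__main__":
    commands = (
        ["virsh", "blkdeviotune", "generic", "hda"],
        ["virsh", "domstats"],
        ["virsh", "list"],
    )
    for cmd in commands:
        print(" ".join(cmd), "=>", run_status(cmd))
```

An "ok" result only means the command returned in time; it does not say that
the command succeeded.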

** Patch added: "lp1746630_mitaka.debdiff"
   
https://bugs.launchpad.net/cloud-archive/+bug/1746630/+attachment/5046659/+files/lp1746630_mitaka.debdiff


[Bug 1746630] Re: virsh api is stuck when vm is down with NFS borken

2018-01-31 Thread Seyeong Kim
** Description changed:

  [Impact]
  
  virsh commands hang if there is a broken VM on broken NFS
  
  This affects Xenial
  
  [Test Case]
  
  1. deploy VM with NFS storage ( running )
  2. block NFS via iptables
  - iptables -A OUTPUT -d NFS_SERVER_IP -p tcp --dport 2049 -j DROP ( on host machine )
  3. virsh blkdeviotune generic hda => hang
  4. virsh domstats => hang
  5. virsh list => hang
  
  [Regression]
  After the patch, domstats and list respond after a short timeout.
  libvirt-bin needs to be restarted, so hosts with many VMs will be affected
  for a short time while it restarts.
  
  [Others]
  
  This bug is covered by the Red Hat bug report[1] and mailing list thread[2],
  and fixed by git commits[3][4][5]
  
+ and the fix was merged upstream in 1.3.5
+ 
+ 
https://libvirt.org/git/?p=libvirt.git;a=blobdiff;f=docs/news.html.in;h=1ad8337f5f8443b5ac76450dc3370f95c51503fd;hp=d035f6833fb5eaaced8f5a7010872f3e61b6955b;hb=732bc70dcc3e2d1fe0baa640712efb99e273;hpb=d57e73d06fe5901ac4ab9c025b3531251292b509
+ 
+ 
  [1] https://bugzilla.redhat.com/show_bug.cgi?id=1337073
  [2] https://www.redhat.com/archives/libvir-list/2016-May/msg01353.html
  [3] https://libvirt.org/git/?p=libvirt.git;a=commit;h=5d2b0e6f12b4e57d75ed1047ab1c36443b7a54b3
  [4] https://libvirt.org/git/?p=libvirt.git;a=commit;h=3aa5d51a9530a8737ca584b393c29297dd9bbc37
  [5] https://libvirt.org/git/?p=libvirt.git;a=commit;h=71d2c172edb997bae1e883b2e1bafa97d9f953a1

** Patch added: "lp1746630_xenial.debdiff"
   
https://bugs.launchpad.net/ubuntu/+source/libvirt/+bug/1746630/+attachment/5046651/+files/lp1746630_xenial.debdiff

** Tags added: sts-sru-needed
