Re: [libvirt] udevadm settle can take too long
Osier Yang wrote:
> On 2012-04-24 03:47, Guido Günther wrote:
>> Hi,
>> On Sun, Apr 22, 2012 at 02:41:54PM -0400, Jim Paris wrote:
>>> Hi,
>>>
>>> http://bugs.debian.org/663931 is a bug I'm hitting, where virt-manager times out on the initial connection to libvirt.
>>
>> I reassigned the bug back to libvirt. I still wonder what triggers this though for some users but not for others?
>> Cheers,
>>  -- Guido
>>
>>> The basic problem is that, while checking storage volumes, virt-manager causes libvirt to call udevadm settle. There's an interaction where libvirt's earlier use of network namespaces (to probe LXC features) had caused some uevents to be sent that get filtered out before they reach udev. This confuses udevadm settle a bit, and so it sits there waiting for a 2-3 minute built-in timeout before returning. Eventually libvirtd prints:
>>>
>>>   2012-04-22 18:22:18.678+0000: 30503: warning : virKeepAliveTimer:182 : No response from client 0x7feec4003630 after 5 keepalive messages in 30 seconds
>>>
>>> and virt-manager prints:
>>>
>>>   2012-04-22 18:22:18.931+0000: 30647: warning : virKeepAliveSend:128 : Failed to send keepalive response to client 0x25004e0
>>>
>>> and the connection gets dropped.
>>>
>>> One workaround could be to specify a shorter timeout when doing the settle. The patch appended below allows virt-manager to work, although the connection still has to wait for the 10 second timeout before it succeeds. I don't know what a better solution would be, though. It seems the udevadm behavior might not be considered a bug from the udev/kernel point of view:
>>>
>>>   https://lkml.org/lkml/2012/4/22/60
>>>
>>> I'm using Linux 3.2.14 with libvirt 0.9.11. You can trigger the udevadm issue using a program I posted at the Debian bug report link above.
>>>
>>> -jim
>>>
>>> From 17e5b9ebab76acb0d711e8bc308023372fbc4180 Mon Sep 17 00:00:00 2001
>>> From: Jim Paris <j...@jtan.com>
>>> Date: Sun, 22 Apr 2012 14:35:47 -0400
>>> Subject: [PATCH] shorten udevadmin settle timeout
>>>
>>> Otherwise, udevadmin settle can take so long that connections from e.g. virt-manager will get closed.
>>> ---
>>>  src/util/util.c |    4 ++--
>>>  1 files changed, 2 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/src/util/util.c b/src/util/util.c
>>> index 6e041d6..dfe458e 100644
>>> --- a/src/util/util.c
>>> +++ b/src/util/util.c
>>> @@ -2593,9 +2593,9 @@ virFileFindMountPoint(const char *type ATTRIBUTE_UNUSED)
>>>  void virFileWaitForDevices(void)
>>>  {
>>>  # ifdef UDEVADM
>>> -    const char *const settleprog[] = { UDEVADM, "settle", NULL };
>>> +    const char *const settleprog[] = { UDEVADM, "settle", "--timeout", "10", NULL };
>
> Though I don't have a good idea to fix it either, I guess this change could cause lvremove to fail again for the udev race. See BZs:
>
> https://bugzilla.redhat.com/show_bug.cgi?id=702260
> https://bugzilla.redhat.com/show_bug.cgi?id=570359

It seems that those bugs were caused by something like

  1. open(lv, O_RDWR)
  2. close(lv)
  3. system("lvremove ...")

where udev would fire off a command between 2 and 3 that caused 3 to fail. Adding udevadm settle as step 2.5 is a good way to wait for that command to finish, but:

- it doesn't necessarily fix the issue; something could easily re-open the device between 2.5 and 3 and cause the same failure.

- the race condition sounds like it was a short window, and sometimes the original sequence would still work even without the settle. That would suggest to me that a timeout of 10s is still plenty long.

A few thoughts:

- For lvremove: can we try a short timeout (3 seconds), then if the lvremove still fails, try again with the default udevadm timeout (120 seconds)?

- Even in that case, we need to fix libvirtd to not kill the connection after 30 seconds when it's libvirtd's fault that the connection is blocked for so long anyway.

- When connecting with virt-manager, is the udevadm settle really necessary? We're not calling lvremove.

Thanks,
-jim

--
libvir-list mailing list
libvir-list@redhat.com
https://www.redhat.com/mailman/listinfo/libvir-list
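For illustration only, the "short timeout first, then the default timeout" idea for lvremove could be sketched along the following lines. This is a rough sketch, not libvirt code: `act_with_staged_settle`, the callback types, and the `demo_*` stubs are all hypothetical names made up for this example, and the stubs merely simulate a removal that only succeeds after a long settle. The 3 and 120 second values are the ones suggested in the message above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback types; the names are illustrative, not libvirt API. */
typedef void (*settle_fn)(unsigned int timeout_secs, void *opaque);
typedef int (*action_fn)(void *opaque);   /* returns 0 on success */

/* Settle with each timeout in turn and retry the action after each settle.
 * Returns 0 as soon as the action succeeds, otherwise the last failure code
 * (or -1 if no timeouts were given). */
static int act_with_staged_settle(settle_fn settle, action_fn action,
                                  void *opaque,
                                  const unsigned int *timeouts, size_t n)
{
    int rc = -1;
    size_t i;

    assert(settle != NULL && action != NULL && timeouts != NULL);
    for (i = 0; i < n; i++) {
        settle(timeouts[i], opaque);
        rc = action(opaque);
        if (rc == 0)
            break;
    }
    return rc;
}

/* Demo stubs: pretend the removal fails until a long settle has happened. */
struct demo_state {
    unsigned int last_timeout;   /* timeout passed to the latest settle */
    unsigned int attempts;       /* number of removal attempts so far */
};

static void demo_settle(unsigned int timeout_secs, void *opaque)
{
    struct demo_state *st = opaque;
    st->last_timeout = timeout_secs;
}

static int demo_remove(void *opaque)
{
    struct demo_state *st = opaque;
    st->attempts++;
    return st->last_timeout >= 120 ? 0 : -1;
}
```

In a real caller the settle callback would presumably run udevadm settle with the given timeout and the action callback would run lvremove; passing `{ 3, 120 }` as the timeout list keeps the common case fast while still falling back to the long wait when the first attempt fails.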
Re: [libvirt] udevadm settle can take too long
[ CC to Cole ]

> Osier Yang wrote:
>> On 2012-04-24 03:47, Guido Günther wrote:
>>> Hi,
>>> On Sun, Apr 22, 2012 at 02:41:54PM -0400, Jim Paris wrote:
>>>> Hi,
>>>>
>>>> http://bugs.debian.org/663931 is a bug I'm hitting, where virt-manager times out on the initial connection to libvirt.
>>>
>>> I reassigned the bug back to libvirt. I still wonder what triggers this though for some users but not for others?
>>> Cheers,
>>>  -- Guido
>>>
>>>> The basic problem is that, while checking storage volumes, virt-manager causes libvirt to call udevadm settle. There's an interaction where libvirt's earlier use of network namespaces (to probe LXC features) had caused some uevents to be sent that get filtered out before they reach udev. This confuses udevadm settle a bit, and so it sits there waiting for a 2-3 minute built-in timeout before returning. Eventually libvirtd prints:
>>>>
>>>>   2012-04-22 18:22:18.678+0000: 30503: warning : virKeepAliveTimer:182 : No response from client 0x7feec4003630 after 5 keepalive messages in 30 seconds
>>>>
>>>> and virt-manager prints:
>>>>
>>>>   2012-04-22 18:22:18.931+0000: 30647: warning : virKeepAliveSend:128 : Failed to send keepalive response to client 0x25004e0
>>>>
>>>> and the connection gets dropped.
>>>>
>>>> One workaround could be to specify a shorter timeout when doing the settle. The patch appended below allows virt-manager to work, although the connection still has to wait for the 10 second timeout before it succeeds. I don't know what a better solution would be, though. It seems the udevadm behavior might not be considered a bug from the udev/kernel point of view:
>>>>
>>>>   https://lkml.org/lkml/2012/4/22/60
>>>>
>>>> I'm using Linux 3.2.14 with libvirt 0.9.11. You can trigger the udevadm issue using a program I posted at the Debian bug report link above.
>>>>
>>>> -jim
>>>>
>>>> From 17e5b9ebab76acb0d711e8bc308023372fbc4180 Mon Sep 17 00:00:00 2001
>>>> From: Jim Paris <j...@jtan.com>
>>>> Date: Sun, 22 Apr 2012 14:35:47 -0400
>>>> Subject: [PATCH] shorten udevadmin settle timeout
>>>>
>>>> Otherwise, udevadmin settle can take so long that connections from e.g. virt-manager will get closed.
>>>> ---
>>>>  src/util/util.c |    4 ++--
>>>>  1 files changed, 2 insertions(+), 2 deletions(-)
>>>>
>>>> diff --git a/src/util/util.c b/src/util/util.c
>>>> index 6e041d6..dfe458e 100644
>>>> --- a/src/util/util.c
>>>> +++ b/src/util/util.c
>>>> @@ -2593,9 +2593,9 @@ virFileFindMountPoint(const char *type ATTRIBUTE_UNUSED)
>>>>  void virFileWaitForDevices(void)
>>>>  {
>>>>  # ifdef UDEVADM
>>>> -    const char *const settleprog[] = { UDEVADM, "settle", NULL };
>>>> +    const char *const settleprog[] = { UDEVADM, "settle", "--timeout", "10", NULL };
>>
>> Though I don't have a good idea to fix it either, I guess this change could cause lvremove to fail again for the udev race. See BZs:
>>
>> https://bugzilla.redhat.com/show_bug.cgi?id=702260
>> https://bugzilla.redhat.com/show_bug.cgi?id=570359
>
> It seems that those bugs were caused by something like
>
>   1. open(lv, O_RDWR)
>   2. close(lv)
>   3. system("lvremove ...")
>
> where udev would fire off a command between 2 and 3 that caused 3 to fail. Adding udevadm settle as step 2.5 is a good way to wait for that command to finish, but:
>
> - it doesn't necessarily fix the issue; something could easily re-open the device between 2.5 and 3 and cause the same failure.

Right.

> - the race condition sounds like it was a short window, and sometimes the original sequence would still work even without the settle. That would suggest to me that a timeout of 10s is still plenty long.
>
> A few thoughts:
>
> - For lvremove: can we try a short timeout (3 seconds), then if the lvremove still fails, try again with the default udevadm timeout (120 seconds)?
>
> - Even in that case, we need to fix libvirtd to not kill the connection after 30 seconds when it's libvirtd's fault that the connection is blocked for so long anyway.

Perhaps we need a timeout property for the client connection, rather than hardcoding it to 30s.

> - When connecting with virt-manager, is the udevadm settle really necessary? We're not calling lvremove.

virt-manager's hang is most likely caused by the pool refresh, which uses udevadm settle to wait for new devices to show up. So it isn't related to lvremove. Besides logical storage, the disk, scsi, and mpath pool types use udevadm settle too, and so does the node device driver.

Generally, the pool refresh is invoked when libvirtd starts, and of course it can also be invoked explicitly. :-) I.e. virt-manager can't hang if it doesn't intend to refresh the pool. And thus I guess the situation will be much worse if pools of type disk, logical, scsi, and mpath all exist together.

I'm wondering whether virt-manager tries to refresh the pools when it starts, or only when the user explicitly requests to check storage (e.g. by clicking some button). It should be improved if it's the first case IMHO (letting the user get the connection first, and refreshing the pool only when necessary, could be better).

I agree that introducing a timeout argument for udevadm settle would be better, but hardcoding a timeout in virFileWaitForDevices is not good; as we can see, it's used many
Re: [libvirt] udevadm settle can take too long
On 2012-04-24 03:47, Guido Günther wrote:
> Hi,
> On Sun, Apr 22, 2012 at 02:41:54PM -0400, Jim Paris wrote:
>> Hi,
>>
>> http://bugs.debian.org/663931 is a bug I'm hitting, where virt-manager times out on the initial connection to libvirt.
>
> I reassigned the bug back to libvirt. I still wonder what triggers this though for some users but not for others?
> Cheers,
>  -- Guido
>
>> The basic problem is that, while checking storage volumes, virt-manager causes libvirt to call udevadm settle. There's an interaction where libvirt's earlier use of network namespaces (to probe LXC features) had caused some uevents to be sent that get filtered out before they reach udev. This confuses udevadm settle a bit, and so it sits there waiting for a 2-3 minute built-in timeout before returning. Eventually libvirtd prints:
>>
>>   2012-04-22 18:22:18.678+0000: 30503: warning : virKeepAliveTimer:182 : No response from client 0x7feec4003630 after 5 keepalive messages in 30 seconds
>>
>> and virt-manager prints:
>>
>>   2012-04-22 18:22:18.931+0000: 30647: warning : virKeepAliveSend:128 : Failed to send keepalive response to client 0x25004e0
>>
>> and the connection gets dropped.
>>
>> One workaround could be to specify a shorter timeout when doing the settle. The patch appended below allows virt-manager to work, although the connection still has to wait for the 10 second timeout before it succeeds. I don't know what a better solution would be, though. It seems the udevadm behavior might not be considered a bug from the udev/kernel point of view:
>>
>>   https://lkml.org/lkml/2012/4/22/60
>>
>> I'm using Linux 3.2.14 with libvirt 0.9.11. You can trigger the udevadm issue using a program I posted at the Debian bug report link above.
>>
>> -jim
>>
>> From 17e5b9ebab76acb0d711e8bc308023372fbc4180 Mon Sep 17 00:00:00 2001
>> From: Jim Paris <j...@jtan.com>
>> Date: Sun, 22 Apr 2012 14:35:47 -0400
>> Subject: [PATCH] shorten udevadmin settle timeout
>>
>> Otherwise, udevadmin settle can take so long that connections from e.g. virt-manager will get closed.
>> ---
>>  src/util/util.c |    4 ++--
>>  1 files changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/src/util/util.c b/src/util/util.c
>> index 6e041d6..dfe458e 100644
>> --- a/src/util/util.c
>> +++ b/src/util/util.c
>> @@ -2593,9 +2593,9 @@ virFileFindMountPoint(const char *type ATTRIBUTE_UNUSED)
>>  void virFileWaitForDevices(void)
>>  {
>>  # ifdef UDEVADM
>> -    const char *const settleprog[] = { UDEVADM, "settle", NULL };
>> +    const char *const settleprog[] = { UDEVADM, "settle", "--timeout", "10", NULL };

Though I don't have a good idea to fix it either, I guess this change could cause lvremove to fail again for the udev race. See BZs:

https://bugzilla.redhat.com/show_bug.cgi?id=702260
https://bugzilla.redhat.com/show_bug.cgi?id=570359

Regards,
Osier
Re: [libvirt] udevadm settle can take too long
Hi,
On Sun, Apr 22, 2012 at 02:41:54PM -0400, Jim Paris wrote:
> Hi,
>
> http://bugs.debian.org/663931 is a bug I'm hitting, where virt-manager times out on the initial connection to libvirt.

I reassigned the bug back to libvirt. I still wonder what triggers this though for some users but not for others?
Cheers,
 -- Guido

> The basic problem is that, while checking storage volumes, virt-manager causes libvirt to call udevadm settle. There's an interaction where libvirt's earlier use of network namespaces (to probe LXC features) had caused some uevents to be sent that get filtered out before they reach udev. This confuses udevadm settle a bit, and so it sits there waiting for a 2-3 minute built-in timeout before returning. Eventually libvirtd prints:
>
>   2012-04-22 18:22:18.678+0000: 30503: warning : virKeepAliveTimer:182 : No response from client 0x7feec4003630 after 5 keepalive messages in 30 seconds
>
> and virt-manager prints:
>
>   2012-04-22 18:22:18.931+0000: 30647: warning : virKeepAliveSend:128 : Failed to send keepalive response to client 0x25004e0
>
> and the connection gets dropped.
>
> One workaround could be to specify a shorter timeout when doing the settle. The patch appended below allows virt-manager to work, although the connection still has to wait for the 10 second timeout before it succeeds. I don't know what a better solution would be, though. It seems the udevadm behavior might not be considered a bug from the udev/kernel point of view:
>
>   https://lkml.org/lkml/2012/4/22/60
>
> I'm using Linux 3.2.14 with libvirt 0.9.11. You can trigger the udevadm issue using a program I posted at the Debian bug report link above.
>
> -jim
>
> From 17e5b9ebab76acb0d711e8bc308023372fbc4180 Mon Sep 17 00:00:00 2001
> From: Jim Paris <j...@jtan.com>
> Date: Sun, 22 Apr 2012 14:35:47 -0400
> Subject: [PATCH] shorten udevadmin settle timeout
>
> Otherwise, udevadmin settle can take so long that connections from e.g. virt-manager will get closed.
> ---
>  src/util/util.c |    4 ++--
>  1 files changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/src/util/util.c b/src/util/util.c
> index 6e041d6..dfe458e 100644
> --- a/src/util/util.c
> +++ b/src/util/util.c
> @@ -2593,9 +2593,9 @@ virFileFindMountPoint(const char *type ATTRIBUTE_UNUSED)
>  void virFileWaitForDevices(void)
>  {
>  # ifdef UDEVADM
> -    const char *const settleprog[] = { UDEVADM, "settle", NULL };
> +    const char *const settleprog[] = { UDEVADM, "settle", "--timeout", "10", NULL };
>  # else
> -    const char *const settleprog[] = { UDEVSETTLE, NULL };
> +    const char *const settleprog[] = { UDEVSETTLE, "--timeout", "10", NULL };
>  # endif
>      int exitstatus;
> --
> 1.7.7
Re: [libvirt] udevadm settle can take too long
Guido Günther wrote:
> Hi,
> On Sun, Apr 22, 2012 at 02:41:54PM -0400, Jim Paris wrote:
>> Hi,
>>
>> http://bugs.debian.org/663931 is a bug I'm hitting, where virt-manager times out on the initial connection to libvirt.
>
> I reassigned the bug back to libvirt. I still wonder what triggers this though for some users but not for others?
> Cheers,
>  -- Guido

On all of my machines, virt-manager hangs if udevadm settle hangs. You can use the program I posted at that bug report to trigger the udevadm problem (it can be undone by restarting udev).

Libvirtd only triggers the udevadm problem at startup, through its use of network namespaces while probing lxc. If anything else generates uevents after that point, then the udevadm problem usually goes away. For example, any module loads, hardware events (ejecting a CD, closing a laptop lid, etc), or bringing up or down network interfaces (which libvirt would typically do by itself when starting a new domain).

So most users might just avoid it through luck. But if you manually restart libvirtd right before trying virt-manager, you'll probably see it too.

Thanks,
-jim

>> The basic problem is that, while checking storage volumes, virt-manager causes libvirt to call udevadm settle. There's an interaction where libvirt's earlier use of network namespaces (to probe LXC features) had caused some uevents to be sent that get filtered out before they reach udev. This confuses udevadm settle a bit, and so it sits there waiting for a 2-3 minute built-in timeout before returning. Eventually libvirtd prints:
>>
>>   2012-04-22 18:22:18.678+0000: 30503: warning : virKeepAliveTimer:182 : No response from client 0x7feec4003630 after 5 keepalive messages in 30 seconds
>>
>> and virt-manager prints:
>>
>>   2012-04-22 18:22:18.931+0000: 30647: warning : virKeepAliveSend:128 : Failed to send keepalive response to client 0x25004e0
>>
>> and the connection gets dropped.
>>
>> One workaround could be to specify a shorter timeout when doing the settle. The patch appended below allows virt-manager to work, although the connection still has to wait for the 10 second timeout before it succeeds. I don't know what a better solution would be, though. It seems the udevadm behavior might not be considered a bug from the udev/kernel point of view:
>>
>>   https://lkml.org/lkml/2012/4/22/60
>>
>> I'm using Linux 3.2.14 with libvirt 0.9.11. You can trigger the udevadm issue using a program I posted at the Debian bug report link above.
>>
>> -jim
>>
>> From 17e5b9ebab76acb0d711e8bc308023372fbc4180 Mon Sep 17 00:00:00 2001
>> From: Jim Paris <j...@jtan.com>
>> Date: Sun, 22 Apr 2012 14:35:47 -0400
>> Subject: [PATCH] shorten udevadmin settle timeout
>>
>> Otherwise, udevadmin settle can take so long that connections from e.g. virt-manager will get closed.
>> ---
>>  src/util/util.c |    4 ++--
>>  1 files changed, 2 insertions(+), 2 deletions(-)
>>
>> diff --git a/src/util/util.c b/src/util/util.c
>> index 6e041d6..dfe458e 100644
>> --- a/src/util/util.c
>> +++ b/src/util/util.c
>> @@ -2593,9 +2593,9 @@ virFileFindMountPoint(const char *type ATTRIBUTE_UNUSED)
>>  void virFileWaitForDevices(void)
>>  {
>>  # ifdef UDEVADM
>> -    const char *const settleprog[] = { UDEVADM, "settle", NULL };
>> +    const char *const settleprog[] = { UDEVADM, "settle", "--timeout", "10", NULL };
>>  # else
>> -    const char *const settleprog[] = { UDEVSETTLE, NULL };
>> +    const char *const settleprog[] = { UDEVSETTLE, "--timeout", "10", NULL };
>>  # endif
>>      int exitstatus;
>> --
>> 1.7.7
[libvirt] udevadm settle can take too long
Hi,

http://bugs.debian.org/663931 is a bug I'm hitting, where virt-manager times out on the initial connection to libvirt.

The basic problem is that, while checking storage volumes, virt-manager causes libvirt to call udevadm settle. There's an interaction where libvirt's earlier use of network namespaces (to probe LXC features) had caused some uevents to be sent that get filtered out before they reach udev. This confuses udevadm settle a bit, and so it sits there waiting for a 2-3 minute built-in timeout before returning. Eventually libvirtd prints:

  2012-04-22 18:22:18.678+0000: 30503: warning : virKeepAliveTimer:182 : No response from client 0x7feec4003630 after 5 keepalive messages in 30 seconds

and virt-manager prints:

  2012-04-22 18:22:18.931+0000: 30647: warning : virKeepAliveSend:128 : Failed to send keepalive response to client 0x25004e0

and the connection gets dropped.

One workaround could be to specify a shorter timeout when doing the settle. The patch appended below allows virt-manager to work, although the connection still has to wait for the 10 second timeout before it succeeds. I don't know what a better solution would be, though. It seems the udevadm behavior might not be considered a bug from the udev/kernel point of view:

  https://lkml.org/lkml/2012/4/22/60

I'm using Linux 3.2.14 with libvirt 0.9.11. You can trigger the udevadm issue using a program I posted at the Debian bug report link above.

-jim

From 17e5b9ebab76acb0d711e8bc308023372fbc4180 Mon Sep 17 00:00:00 2001
From: Jim Paris <j...@jtan.com>
Date: Sun, 22 Apr 2012 14:35:47 -0400
Subject: [PATCH] shorten udevadmin settle timeout

Otherwise, udevadmin settle can take so long that connections from e.g. virt-manager will get closed.
---
 src/util/util.c |    4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/src/util/util.c b/src/util/util.c
index 6e041d6..dfe458e 100644
--- a/src/util/util.c
+++ b/src/util/util.c
@@ -2593,9 +2593,9 @@ virFileFindMountPoint(const char *type ATTRIBUTE_UNUSED)
 void virFileWaitForDevices(void)
 {
 # ifdef UDEVADM
-    const char *const settleprog[] = { UDEVADM, "settle", NULL };
+    const char *const settleprog[] = { UDEVADM, "settle", "--timeout", "10", NULL };
 # else
-    const char *const settleprog[] = { UDEVSETTLE, NULL };
+    const char *const settleprog[] = { UDEVSETTLE, "--timeout", "10", NULL };
 # endif
     int exitstatus;
--
1.7.7
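As a standalone illustration of what the patch above changes, the sketch below builds a settle command line with a caller-chosen timeout and runs it with fork/execvp, outside of libvirt. It is only a sketch under assumptions: `build_settle_argv` and `run_and_wait` are hypothetical helpers invented for this example (libvirt has its own command-running code), and the `--timeout=<seconds>` spelling is assumed to be accepted by the installed udevadm in the same way as the two-argument form used in the patch.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper (not libvirt API): fill argv with
 * "<udevadm> settle --timeout=<secs>". timeout_buf must outlive argv.
 * Returns 0 on success, -1 if timeout_buf is too small. */
static int build_settle_argv(const char *udevadm, unsigned int secs,
                             char *timeout_buf, size_t buflen,
                             const char *argv[4])
{
    int n;

    assert(udevadm != NULL && timeout_buf != NULL && argv != NULL);
    n = snprintf(timeout_buf, buflen, "--timeout=%u", secs);
    if (n < 0 || (size_t)n >= buflen)
        return -1;
    argv[0] = udevadm;
    argv[1] = "settle";
    argv[2] = timeout_buf;
    argv[3] = NULL;
    return 0;
}

/* Run argv[0] with the given arguments and wait for it to finish.
 * Returns the child's exit status (127 if exec failed), or -1 on
 * fork/wait failure or if the child was killed by a signal. */
static int run_and_wait(const char *const argv[])
{
    int status;
    pid_t pid = fork();

    if (pid < 0)
        return -1;
    if (pid == 0) {
        execvp(argv[0], (char *const *)argv);
        _exit(127);             /* only reached if exec failed */
    }
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR)
            return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

A caller would build the vector with, say, a 10 second timeout and pass it to `run_and_wait`. Taking the timeout as a parameter rather than a constant means different callers can choose different waits.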