Re: [libvirt] [PATCH] vl: allow cont from panicked state
On 08/21/13 17:32, Paolo Bonzini wrote: To support 1.5, libvirt should simply be ready to react to unanticipated GUEST_PANICKED events. reboot-on-panic will simply be broken for 1.5 and Linux 3.10+ guests. :( I'm probably misunderstanding the discussion, but it might be possible to disable pvpanic even in 1.5 from the host side, with the following hack: -global pvpanic.ioport=0 In qemu, this will either configure a working pvpanic device on ioport 0, or the pvpanic device will be genuinely broken. At least it doesn't (obviously) break other stuff (in v1.5.2): (qemu) info mtree I/O - (prio 0, RW): io - (prio 0, RW): pvpanic -0007 (prio 0, RW): dma-chan (qemu) info qtree bus: main-system-bus dev: i440FX-pcihost, id bus: pci.0 dev: PIIX3, id bus: isa.0 dev: pvpanic, id ioport = 0 Either way, the etc/pvpanic-port fw_cfg file will contain 0, and SeaBIOS will interpret it as no pvpanic device. It will report the same to the Linux guest too (_STA will return 0 for QEMU0001; pvpanic_add() -- -ENODEV). Thus, no pvpanic notifier should be registered, and reboot-on-panic should be reachable in the guest. A horrible hack, certainly. Laszlo -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [PATCH] vl: allow cont from panicked state
Il 22/08/2013 10:38, Laszlo Ersek ha scritto: To support 1.5, libvirt should simply be ready to react to unanticipated GUEST_PANICKED events. reboot-on-panic will simply be broken for 1.5 and Linux 3.10+ guests. :( I'm probably misunderstanding the discussion, but it might be possible to disable pvpanic even in 1.5 from the host side, with the following hack: -global pvpanic.ioport=0 In qemu, this will either configure a working pvpanic device on ioport 0, or the pvpanic device will be genuinely broken. At least it doesn't (obviously) break other stuff (in v1.5.2): (qemu) info mtree I/O - (prio 0, RW): io - (prio 0, RW): pvpanic -0007 (prio 0, RW): dma-chan No, you're not misunderstanding the discussion. Depending on the priorities of the pvpanic and legacy-DMA regions, it would break DMA channel 0. Channel 0 is (was) used for DRAM refresh, so it should not have any visible effect. However, it may not be entirely disabling pvpanic, just making it mostly invisible. Paolo -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [PATCH] vl: allow cont from panicked state
On Thu, Aug 22, 2013 at 11:19:44AM +0200, Paolo Bonzini wrote: Il 22/08/2013 10:38, Laszlo Ersek ha scritto: To support 1.5, libvirt should simply be ready to react to unanticipated GUEST_PANICKED events. reboot-on-panic will simply be broken for 1.5 and Linux 3.10+ guests. :( I'm probably misunderstanding the discussion, but it might be possible to disable pvpanic even in 1.5 from the host side, with the following hack: -global pvpanic.ioport=0 In qemu, this will either configure a working pvpanic device on ioport 0, or the pvpanic device will be genuinely broken. At least it doesn't (obviously) break other stuff (in v1.5.2): (qemu) info mtree I/O - (prio 0, RW): io - (prio 0, RW): pvpanic -0007 (prio 0, RW): dma-chan No, you're not misunderstanding the discussion. Depending on the priorities of the pvpanic and legacy-DMA regions, it would break DMA channel 0. Channel 0 is (was) used for DRAM refresh, so it should not have any visible effect. However, it may not be entirely disabling pvpanic, just making it mostly invisible. Paolo Ugh. And now that Paolo pointed out that nothing terrible happens even when migrating from host with pvpanic enabled to host with pvpanic disabled, I'm inclined to think we should just disable pvpanic in 1.5.X. Thoughts? -- MST -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [PATCH] vl: allow cont from panicked state
Il 22/08/2013 11:37, Michael S. Tsirkin ha scritto: No, you're not misunderstanding the discussion. Depending on the priorities of the pvpanic and legacy-DMA regions, it would break DMA channel 0. Channel 0 is (was) used for DRAM refresh, so it should not have any visible effect. However, it may not be entirely disabling pvpanic, just making it mostly invisible. Paolo Ugh. And now that Paolo pointed out that nothing terrible happens even when migrating from host with pvpanic enabled to host with pvpanic disabled, I'm inclined to think we should just disable pvpanic in 1.5.X. Thoughts? You'd obviously have no objection from me. Paolo -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [PATCH] vl: allow cont from panicked state
On 08/22/13 11:19, Paolo Bonzini wrote: Il 22/08/2013 10:38, Laszlo Ersek ha scritto: To support 1.5, libvirt should simply be ready to react to unanticipated GUEST_PANICKED events. reboot-on-panic will simply be broken for 1.5 and Linux 3.10+ guests. :( I'm probably misunderstanding the discussion, but it might be possible to disable pvpanic even in 1.5 from the host side, with the following hack: -global pvpanic.ioport=0 In qemu, this will either configure a working pvpanic device on ioport 0, or the pvpanic device will be genuinely broken. At least it doesn't (obviously) break other stuff (in v1.5.2): (qemu) info mtree I/O - (prio 0, RW): io - (prio 0, RW): pvpanic -0007 (prio 0, RW): dma-chan No, you're not misunderstanding the discussion. Depending on the priorities of the pvpanic and legacy-DMA regions, it would break DMA channel 0. academic I think before priority comes into the picture, the access size would matter first, no? (I think I'm recalling this from the 0xCF9 reset control register, which falls into the [0xCF8..0xCFA] range.) Unless ioport 0 is accessed with width 1 for dma-chan purposes, I think such an access would be unique to pvpanic, and always dispatched to pvpanic. Channel 0 is (was) used for DRAM refresh, so it should not have any visible effect. However, it may not be entirely disabling pvpanic, just making it mostly invisible. That's good enough for the guest to reach kexec :) /academic Laszlo -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [PATCH] vl: allow cont from panicked state
On 08/22/13 12:34, Laszlo Ersek wrote: (I think I'm recalling this from the 0xCF9 reset control register, which falls into the [0xCF8..0xCFA] range.) [0xCF8..0xCFB], sigh -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [PATCH] vl: allow cont from panicked state
Il 22/08/2013 12:34, Laszlo Ersek ha scritto: academic Actually it's fine to clarify these things! Hence the longish digression below. I think before priority comes into the picture, the access size would matter first, no? (I think I'm recalling this from the 0xCF9 reset control register, which falls into the [0xCF8..0xCFA] range.) The base address is what matters. A 2- or 4-byte access to x will always go to the region that includes address x, even if there are other regions between x and respectively x+1 or x+3. So an access to 0xCF8 will go to the PCI address register, while an access to 0xCF9 will always go to the reset control register. This happens in address_space_translate_internal: diff = int128_sub(section-mr-size, int128_make64(addr)); For a write to 0xCF8, addr would be 0 (it is relative to the base of the MemoryRegion). section-size would be 1 because the next section starts at 0xCF9. However, section-mr-size would be 4 as specified e.g. in i440fx_pcihost_initfn: memory_region_init_io(s-conf_mem, obj, pci_host_conf_le_ops, s, pci-conf-idx, 4); Using section-size would be wrong---it would attempt a 1-byte write to 0xCF8, another 1-byte write to 0xCF9, and a 2-byte write to 0xCFA. section-mr-size instead does a single write to 0xCF8, the same as on real hardware. BTW, the behavior changed slightly in QEMU 1.6 for 8-byte accesses. All such accesses were split to two 4-byte accesses before; now the maximum size of a direct MMIO operation (the data bus size, effectively) is 64 bits, so a 64-bit write will always address a single MemoryRegion. For example, say you had the PCI address and data registers occupying two separate 4-byte MemoryRegions in 8 consecutive bytes of memory. In 1.5 you could write both of them with a single 64-bit write. In 1.6, this would only write four bytes to the first MemoryRegion. This matches hardware more closely, and is really unlikely to be a problem: a target with 32-bit data bus probably would not have 64-bit CPU registers to begin with. If it did, it would resemble the architecture of the 80386SX or 8088 processors. Unless ioport 0 is accessed with width 1 for dma-chan purposes, I think such an access would be unique to pvpanic, and always dispatched to pvpanic. It is: static const MemoryRegionOps channel_io_ops = { .read = read_chan, .write = write_chan, .endianness = DEVICE_NATIVE_ENDIAN, .impl = { .min_access_size = 1, .max_access_size = 1, }, }; Channel 0 is (was) used for DRAM refresh, so it should not have any visible effect. However, it may not be entirely disabling pvpanic, just making it mostly invisible. That's good enough for the guest to reach kexec :) Yes, I cannot deny that. :) Paolo -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [PATCH] vl: allow cont from panicked state
Il 21/08/2013 14:42, Laszlo Ersek ha scritto: (*) Hm I think I understand why. main_loop_should_exit(), when a reset was requested *and* runstate_needs_reset() evaluated to true, used to set the runstate to PAUSED -- I guess temporarily. Yes, this is the code that does the PANICKED - PAUSED transition: if (runstate_needs_reset()) { runstate_set(RUN_STATE_PAUSED); } This is to move the system out of a runstate that needs_reset(), and make the subsequent cont work instead of hitting this: if (runstate_needs_reset()) { error_set(errp, QERR_RESET_REQUIRED); return; } Paolo Since PANICKED was included in runstate_needs_reset(), this generic code could request a transition from PANICKED to PAUSED (**). As PANICKED is being removed from runstate_needs_reset(), the PANICKED-PAUSED transition is not required any longer. (**) I don't know why the generic code moves to PAUSED temporarily (from INTERNAL_ERROR and SHUTDOWN), but I'll just accept that as status quo. -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [PATCH] vl: allow cont from panicked state
On Wed, Aug 21, 2013 at 02:01:17PM +0200, Paolo Bonzini wrote: After reporting the GUEST_PANICKED monitor event, QEMU stops the VM. The reason for this is that events are edge-triggered, and can be lost if management dies at the wrong time. Stopping a panicked VM lets management know of a panic even if it has crashed; management can learn about the panic when it restarts and queries running QEMU processes. The downside is of course that the VM will be paused while management is not running, but that is acceptable if it only happens with explicit -device pvpanic. Upon learning of a panic, management (if configured to do so) can pick a variety of behaviors: leave the VM paused, reset it, destroy it. In addition to all of these behaviors, it is possible dumping the VM core from the host. However, right now, the panicked state is irreversible, and can only be exited by resetting the machine. This means that any policy decision is entirely in the hands of the host. In particular there is no way to use the reboot on panic option together with pvpanic. This patch makes the panicked state reversible (and removes various workarounds that were there because of the state being irreversible). With this change, management has a wider set of possible policies: it can just log the crash and leave policy to the guest, it can leave the VM paused. In particular, the log the crash and continue is implemented simply by sending a cont as soon as management learns about the panic. Management could also implement the irreversible paused state itself. And again, all such actions can be coupled with dumping the VM core. Unfortunately we cannot change the behavior of 1.6.0. Thus, even if it uses -device pvpanic, management should check for cont failures. If cont fails, management can then log that the VM remained paused and urge the administrator to update QEMU. I suggest that this patch be included in an 1.6.1 release as soon as possible, and perhaps in the 1.5 branch too. Cc: qemu-sta...@nongnu.org Signed-off-by: Paolo Bonzini pbonz...@redhat.com OK this does sound reasonable, but it looks like current behaviour was intentional, so I wonder why was it put in place. Any idea? --- gdbstub.c | 3 --- vl.c | 6 ++ 2 files changed, 2 insertions(+), 7 deletions(-) diff --git a/gdbstub.c b/gdbstub.c index 35ca7c2..747e67d 100644 --- a/gdbstub.c +++ b/gdbstub.c @@ -372,9 +372,6 @@ static inline void gdb_continue(GDBState *s) #ifdef CONFIG_USER_ONLY s-running_state = 1; #else -if (runstate_check(RUN_STATE_GUEST_PANICKED)) { -runstate_set(RUN_STATE_DEBUG); -} if (!runstate_needs_reset()) { vm_start(); } diff --git a/vl.c b/vl.c index 25b8f2f..818d99e 100644 --- a/vl.c +++ b/vl.c @@ -637,9 +637,8 @@ static const RunStateTransition runstate_transitions_def[] = { { RUN_STATE_WATCHDOG, RUN_STATE_RUNNING }, { RUN_STATE_WATCHDOG, RUN_STATE_FINISH_MIGRATE }, -{ RUN_STATE_GUEST_PANICKED, RUN_STATE_PAUSED }, +{ RUN_STATE_GUEST_PANICKED, RUN_STATE_RUNNING }, { RUN_STATE_GUEST_PANICKED, RUN_STATE_FINISH_MIGRATE }, -{ RUN_STATE_GUEST_PANICKED, RUN_STATE_DEBUG }, { RUN_STATE_MAX, RUN_STATE_MAX }, }; @@ -685,8 +684,7 @@ int runstate_is_running(void) bool runstate_needs_reset(void) { return runstate_check(RUN_STATE_INTERNAL_ERROR) || -runstate_check(RUN_STATE_SHUTDOWN) || -runstate_check(RUN_STATE_GUEST_PANICKED); +runstate_check(RUN_STATE_SHUTDOWN); } StatusInfo *qmp_query_status(Error **errp) -- 1.8.3.1 -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [PATCH] vl: allow cont from panicked state
On Wed, 21 Aug 2013 14:01:17 +0200 Paolo Bonzini pbonz...@redhat.com wrote: After reporting the GUEST_PANICKED monitor event, QEMU stops the VM. The reason for this is that events are edge-triggered, and can be lost if management dies at the wrong time. Stopping a panicked VM lets management know of a panic even if it has crashed; management can learn about the panic when it restarts and queries running QEMU processes. The downside is of course that the VM will be paused while management is not running, but that is acceptable if it only happens with explicit -device pvpanic. Upon learning of a panic, management (if configured to do so) can pick a variety of behaviors: leave the VM paused, reset it, destroy it. In addition to all of these behaviors, it is possible dumping the VM core from the host. However, right now, the panicked state is irreversible, and can only be exited by resetting the machine. This means that any policy decision is entirely in the hands of the host. In particular there is no way to use the reboot on panic option together with pvpanic. This patch makes the panicked state reversible (and removes various workarounds that were there because of the state being irreversible). With this change, management has a wider set of possible policies: it can just log the crash and leave policy to the guest, it can leave the VM paused. In particular, the log the crash and continue is implemented simply by sending a cont as soon as management learns about the panic. Management could also implement the irreversible paused state itself. And again, all such actions can be coupled with dumping the VM core. Unfortunately we cannot change the behavior of 1.6.0. Thus, even if it uses -device pvpanic, management should check for cont failures. If cont fails, management can then log that the VM remained paused and urge the administrator to update QEMU. I suggest that this patch be included in an 1.6.1 release as soon as possible, and perhaps in the 1.5 branch too. Looks good to me: Reviewed-by: Luiz Capitulino lcapitul...@redhat.com Cc: qemu-sta...@nongnu.org Signed-off-by: Paolo Bonzini pbonz...@redhat.com --- gdbstub.c | 3 --- vl.c | 6 ++ 2 files changed, 2 insertions(+), 7 deletions(-) diff --git a/gdbstub.c b/gdbstub.c index 35ca7c2..747e67d 100644 --- a/gdbstub.c +++ b/gdbstub.c @@ -372,9 +372,6 @@ static inline void gdb_continue(GDBState *s) #ifdef CONFIG_USER_ONLY s-running_state = 1; #else -if (runstate_check(RUN_STATE_GUEST_PANICKED)) { -runstate_set(RUN_STATE_DEBUG); -} if (!runstate_needs_reset()) { vm_start(); } diff --git a/vl.c b/vl.c index 25b8f2f..818d99e 100644 --- a/vl.c +++ b/vl.c @@ -637,9 +637,8 @@ static const RunStateTransition runstate_transitions_def[] = { { RUN_STATE_WATCHDOG, RUN_STATE_RUNNING }, { RUN_STATE_WATCHDOG, RUN_STATE_FINISH_MIGRATE }, -{ RUN_STATE_GUEST_PANICKED, RUN_STATE_PAUSED }, +{ RUN_STATE_GUEST_PANICKED, RUN_STATE_RUNNING }, { RUN_STATE_GUEST_PANICKED, RUN_STATE_FINISH_MIGRATE }, -{ RUN_STATE_GUEST_PANICKED, RUN_STATE_DEBUG }, { RUN_STATE_MAX, RUN_STATE_MAX }, }; @@ -685,8 +684,7 @@ int runstate_is_running(void) bool runstate_needs_reset(void) { return runstate_check(RUN_STATE_INTERNAL_ERROR) || -runstate_check(RUN_STATE_SHUTDOWN) || -runstate_check(RUN_STATE_GUEST_PANICKED); +runstate_check(RUN_STATE_SHUTDOWN); } StatusInfo *qmp_query_status(Error **errp) -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [PATCH] vl: allow cont from panicked state
On Wed, 21 Aug 2013 14:43:11 +0200 Paolo Bonzini pbonz...@redhat.com wrote: Il 21/08/2013 14:42, Laszlo Ersek ha scritto: (*) Hm I think I understand why. main_loop_should_exit(), when a reset was requested *and* runstate_needs_reset() evaluated to true, used to set the runstate to PAUSED -- I guess temporarily. Yes, this is the code that does the PANICKED - PAUSED transition: if (runstate_needs_reset()) { runstate_set(RUN_STATE_PAUSED); } This is to move the system out of a runstate that needs_reset(), and make the subsequent cont work instead of hitting this: if (runstate_needs_reset()) { error_set(errp, QERR_RESET_REQUIRED); return; } Yes. For those states issuing 'cont' won't put the guest to run again, so you're required to reset the guest first. I think the same reasoning went behind the PANICKED state, and for most cases it's going to be disastrous to put the guest to run again, but I can understand that this is up user/mngt to decide this, not QEMU. -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [PATCH] vl: allow cont from panicked state
On Wed, Aug 21, 2013 at 10:17:49AM -0400, Luiz Capitulino wrote: On Wed, 21 Aug 2013 14:43:11 +0200 Paolo Bonzini pbonz...@redhat.com wrote: Il 21/08/2013 14:42, Laszlo Ersek ha scritto: (*) Hm I think I understand why. main_loop_should_exit(), when a reset was requested *and* runstate_needs_reset() evaluated to true, used to set the runstate to PAUSED -- I guess temporarily. Yes, this is the code that does the PANICKED - PAUSED transition: if (runstate_needs_reset()) { runstate_set(RUN_STATE_PAUSED); } This is to move the system out of a runstate that needs_reset(), and make the subsequent cont work instead of hitting this: if (runstate_needs_reset()) { error_set(errp, QERR_RESET_REQUIRED); return; } Yes. For those states issuing 'cont' won't put the guest to run again, so you're required to reset the guest first. I think the same reasoning went behind the PANICKED state, and for most cases it's going to be disastrous to put the guest to run again, Why will it? It will most likely just call halt a bit later. but I can understand that this is up user/mngt to decide this, not QEMU. I don't have a problem with this patch as such, so Acked-by: Michael S. Tsirkin m...@redhat.com though I'm still not really sure why do we want to block guest immediately on panic. Why not let it call halt a bit later? -- MST -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [PATCH] vl: allow cont from panicked state
Il 21/08/2013 16:30, Michael S. Tsirkin ha scritto: I think the same reasoning went behind the PANICKED state, and for most cases it's going to be disastrous to put the guest to run again, Why will it? It will most likely just call halt a bit later. I agree. but I can understand that this is up user/mngt to decide this, not QEMU. I don't have a problem with this patch as such, so Acked-by: Michael S. Tsirkin m...@redhat.com though I'm still not really sure why do we want to block guest immediately on panic. Why not let it call halt a bit later? To make sure the panic is detected, and action taken, in the host even if management has crashed at the time. For example, even if you have reboot-on-panic active, management has time to take a core dump of the paused guest _before_ the reboot. Paolo -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [PATCH] vl: allow cont from panicked state
Il 21/08/2013 15:30, Michael S. Tsirkin ha scritto: On Wed, Aug 21, 2013 at 02:01:17PM +0200, Paolo Bonzini wrote: After reporting the GUEST_PANICKED monitor event, QEMU stops the VM. The reason for this is that events are edge-triggered, and can be lost if management dies at the wrong time. Stopping a panicked VM lets management know of a panic even if it has crashed; management can learn about the panic when it restarts and queries running QEMU processes. The downside is of course that the VM will be paused while management is not running, but that is acceptable if it only happens with explicit -device pvpanic. Upon learning of a panic, management (if configured to do so) can pick a variety of behaviors: leave the VM paused, reset it, destroy it. In addition to all of these behaviors, it is possible dumping the VM core from the host. However, right now, the panicked state is irreversible, and can only be exited by resetting the machine. This means that any policy decision is entirely in the hands of the host. In particular there is no way to use the reboot on panic option together with pvpanic. This patch makes the panicked state reversible (and removes various workarounds that were there because of the state being irreversible). With this change, management has a wider set of possible policies: it can just log the crash and leave policy to the guest, it can leave the VM paused. In particular, the log the crash and continue is implemented simply by sending a cont as soon as management learns about the panic. Management could also implement the irreversible paused state itself. And again, all such actions can be coupled with dumping the VM core. Unfortunately we cannot change the behavior of 1.6.0. Thus, even if it uses -device pvpanic, management should check for cont failures. If cont fails, management can then log that the VM remained paused and urge the administrator to update QEMU. I suggest that this patch be included in an 1.6.1 release as soon as possible, and perhaps in the 1.5 branch too. Cc: qemu-sta...@nongnu.org Signed-off-by: Paolo Bonzini pbonz...@redhat.com OK this does sound reasonable, but it looks like current behaviour was intentional Yes, it was intentional and it also sounded reasonable at the time. The gdbstub.c hack definitely should have raised a warning sign, though. I suspect it was done this way simply because Xen has a crashed state that behaves exactly like that, with policy determined exclusively by the host. And it has the same problems, in fact, even though I never heard anyone complain about it for some weird reason... With this patch, the same thing can be implemented at the libvirt level, and the GUEST_PANICKED runstate now matches the semantics of watchdogs too, so this solution is not only easier to use, but also more consistent with the rest of QEMU. Paolo , so I wonder why was it put in place. Any idea? --- gdbstub.c | 3 --- vl.c | 6 ++ 2 files changed, 2 insertions(+), 7 deletions(-) diff --git a/gdbstub.c b/gdbstub.c index 35ca7c2..747e67d 100644 --- a/gdbstub.c +++ b/gdbstub.c @@ -372,9 +372,6 @@ static inline void gdb_continue(GDBState *s) #ifdef CONFIG_USER_ONLY s-running_state = 1; #else -if (runstate_check(RUN_STATE_GUEST_PANICKED)) { -runstate_set(RUN_STATE_DEBUG); -} if (!runstate_needs_reset()) { vm_start(); } diff --git a/vl.c b/vl.c index 25b8f2f..818d99e 100644 --- a/vl.c +++ b/vl.c @@ -637,9 +637,8 @@ static const RunStateTransition runstate_transitions_def[] = { { RUN_STATE_WATCHDOG, RUN_STATE_RUNNING }, { RUN_STATE_WATCHDOG, RUN_STATE_FINISH_MIGRATE }, -{ RUN_STATE_GUEST_PANICKED, RUN_STATE_PAUSED }, +{ RUN_STATE_GUEST_PANICKED, RUN_STATE_RUNNING }, { RUN_STATE_GUEST_PANICKED, RUN_STATE_FINISH_MIGRATE }, -{ RUN_STATE_GUEST_PANICKED, RUN_STATE_DEBUG }, { RUN_STATE_MAX, RUN_STATE_MAX }, }; @@ -685,8 +684,7 @@ int runstate_is_running(void) bool runstate_needs_reset(void) { return runstate_check(RUN_STATE_INTERNAL_ERROR) || -runstate_check(RUN_STATE_SHUTDOWN) || -runstate_check(RUN_STATE_GUEST_PANICKED); +runstate_check(RUN_STATE_SHUTDOWN); } StatusInfo *qmp_query_status(Error **errp) -- 1.8.3.1 -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [PATCH] vl: allow cont from panicked state
On Wed, Aug 21, 2013 at 04:37:56PM +0200, Paolo Bonzini wrote: Il 21/08/2013 16:30, Michael S. Tsirkin ha scritto: I think the same reasoning went behind the PANICKED state, and for most cases it's going to be disastrous to put the guest to run again, Why will it? It will most likely just call halt a bit later. I agree. but I can understand that this is up user/mngt to decide this, not QEMU. I don't have a problem with this patch as such, so Acked-by: Michael S. Tsirkin m...@redhat.com though I'm still not really sure why do we want to block guest immediately on panic. Why not let it call halt a bit later? To make sure the panic is detected, and action taken, in the host even if management has crashed at the time. I'm not sure I get the reference to management crashing - we just need to maintain panicked state to make sure info is not lost ... For example, even if you have reboot-on-panic active, management has time to take a core dump of the paused guest _before_ the reboot. Paolo but this sounds like a good reason to support synchronous panic events. -- MST -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [PATCH] vl: allow cont from panicked state
Il 21/08/2013 16:58, Michael S. Tsirkin ha scritto: On Wed, Aug 21, 2013 at 04:37:56PM +0200, Paolo Bonzini wrote: Il 21/08/2013 16:30, Michael S. Tsirkin ha scritto: I think the same reasoning went behind the PANICKED state, and for most cases it's going to be disastrous to put the guest to run again, Why will it? It will most likely just call halt a bit later. I agree. but I can understand that this is up user/mngt to decide this, not QEMU. I don't have a problem with this patch as such, so Acked-by: Michael S. Tsirkin m...@redhat.com though I'm still not really sure why do we want to block guest immediately on panic. Why not let it call halt a bit later? To make sure the panic is detected, and action taken, in the host even if management has crashed at the time. I'm not sure I get the reference to management crashing - we just need to maintain panicked state to make sure info is not lost ... Yes, but that would be additional burden (a new info command to retrieve the exact time of panicking, or something like that). Since synchronous panic events are useful anyway, and crashed management is rare, it's simpler to go the synchronous way. Paolo For example, even if you have reboot-on-panic active, management has time to take a core dump of the paused guest _before_ the reboot. Paolo but this sounds like a good reason to support synchronous panic events. -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [PATCH] vl: allow cont from panicked state
On 08/21/2013 06:01 AM, Paolo Bonzini wrote: After reporting the GUEST_PANICKED monitor event, QEMU stops the VM. The reason for this is that events are edge-triggered, and can be lost if management dies at the wrong time. Stopping a panicked VM lets management know of a panic even if it has crashed; management can learn about the panic when it restarts and queries running QEMU processes. The downside is of course that the VM will be paused while management is not running, but that is acceptable if it only happens with explicit -device pvpanic. Agreed - the key point is that by having a command line option to opt in to panic handling, then libvirt can decide whether panics should pause or auto-resume based on its on_crash settings being mapped to appropriate command lines. Upon learning of a panic, management (if configured to do so) can pick a variety of behaviors: leave the VM paused, reset it, destroy it. In addition to all of these behaviors, it is possible dumping the VM core from the host. s/possible dumping/possible to dump/ and yes, libvirt wants to do just that, as one of its on_crash mappings, since it could do the same for Xen. However, right now, the panicked state is irreversible, and can only be exited by resetting the machine. This means that any policy decision is entirely in the hands of the host. In particular there is no way to use the reboot on panic option together with pvpanic. This patch makes the panicked state reversible (and removes various workarounds that were there because of the state being irreversible). With this change, management has a wider set of possible policies: it can just log the crash and leave policy to the guest, it can leave the VM paused. In particular, the log the crash and continue is implemented simply by sending a cont as soon as management learns about the panic. Management could also implement the irreversible paused state itself. And again, all such actions can be coupled with dumping the VM core. Yes, this makes sense. Unfortunately we cannot change the behavior of 1.6.0. Thus, even if it uses -device pvpanic, management should check for cont failures. If cont fails, management can then log that the VM remained paused and urge the administrator to update QEMU. Is that the best we can do? Is there any sort of QMP introspection that libvirt can do, where we can know UP FRONT what level of panic support is provided by the qemu binary and the machine type being run in that binary? I'm afraid we've created a complicated mess of what options work when, and I'm not looking forward to what it will take to encode all the correct workarounds into libvirt. Ideally, I'd like a one-shot question: is qemu new enough to sanely support reversible '-device pvpanic'? If so, honor on_crash settings, if not, reject attempts to use any on_crash setting other than the default that matches qemu 1.4 behavior - but I might be persuaded to also support qemu 1.5/1.6 behaviors if they are easy enough to detect and work with; there's also the question that the behavior is machine-type dependent (-M pc-1.5 behaves differently than -M pc-1.6). I suggest that this patch be included in an 1.6.1 release as soon as possible, and perhaps in the 1.5 branch too. Cc: qemu-sta...@nongnu.org Signed-off-by: Paolo Bonzini pbonz...@redhat.com --- gdbstub.c | 3 --- vl.c | 6 ++ 2 files changed, 2 insertions(+), 7 deletions(-) Reviewed-by: Eric Blake ebl...@redhat.com /me why oh why did we rush such a half-baked builtin design into qemu 1.5 again? +++ b/vl.c @@ -637,9 +637,8 @@ static const RunStateTransition runstate_transitions_def[] = { { RUN_STATE_WATCHDOG, RUN_STATE_RUNNING }, { RUN_STATE_WATCHDOG, RUN_STATE_FINISH_MIGRATE }, -{ RUN_STATE_GUEST_PANICKED, RUN_STATE_PAUSED }, +{ RUN_STATE_GUEST_PANICKED, RUN_STATE_RUNNING }, Is 'cont' the only viable way to escape PANICKED, or is it also reasonable to support 'stop' as a way to transition from PANICKED to PAUSED? That is, management may want to make the state reversible but still leave the guest paused, so this patch may be incomplete. -- Eric Blake eblake redhat com+1-919-301-3266 Libvirt virtualization library http://libvirt.org signature.asc Description: OpenPGP digital signature -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [PATCH] vl: allow cont from panicked state
Il 21/08/2013 17:23, Eric Blake ha scritto: Upon learning of a panic, management (if configured to do so) can pick a variety of behaviors: leave the VM paused, reset it, destroy it. In addition to all of these behaviors, it is possible dumping the VM core from the host. s/possible dumping/possible to dump/ and yes, libvirt wants to do just that, as one of its on_crash mappings, since it could do the same for Xen. However, right now, the panicked state is irreversible, and can only be exited by resetting the machine. This means that any policy decision is entirely in the hands of the host. In particular there is no way to use the reboot on panic option together with pvpanic. This patch makes the panicked state reversible (and removes various workarounds that were there because of the state being irreversible). With this change, management has a wider set of possible policies: it can just log the crash and leave policy to the guest, it can leave the VM paused. In particular, the log the crash and continue is implemented simply by sending a cont as soon as management learns about the panic. Management could also implement the irreversible paused state itself. And again, all such actions can be coupled with dumping the VM core. Yes, this makes sense. Unfortunately we cannot change the behavior of 1.6.0. Thus, even if it uses -device pvpanic, management should check for cont failures. If cont fails, management can then log that the VM remained paused and urge the administrator to update QEMU. Is that the best we can do? Is there any sort of QMP introspection that libvirt can do, where we can know UP FRONT what level of panic support is provided by the qemu binary and the machine type being run in that binary? No, this is not possible unfortunately. The only possibility that comes to mind would be to rename the pvpanic device, e.g. to isa-pvpanic, and forget about -device pvpanic on 1.6.x. A hack, I know. To support 1.5, libvirt should simply be ready to react to unanticipated GUEST_PANICKED events. reboot-on-panic will simply be broken for 1.5 and Linux 3.10+ guests. :( +++ b/vl.c @@ -637,9 +637,8 @@ static const RunStateTransition runstate_transitions_def[] = { { RUN_STATE_WATCHDOG, RUN_STATE_RUNNING }, { RUN_STATE_WATCHDOG, RUN_STATE_FINISH_MIGRATE }, -{ RUN_STATE_GUEST_PANICKED, RUN_STATE_PAUSED }, +{ RUN_STATE_GUEST_PANICKED, RUN_STATE_RUNNING }, Is 'cont' the only viable way to escape PANICKED, or is it also reasonable to support 'stop' as a way to transition from PANICKED to PAUSED? That is, management may want to make the state reversible but still leave the guest paused, so this patch may be incomplete. No, there is no way to move from PANICKED to PAUSED. Libvirt has its own statuses (PAUSED, CRASHED etc.) and substatuses. You don't really care about the QEMU state: both the PAUSED_PANICKED and CRASHED_PANICKED substatuses map to QEMU's GUEST_PANICKED state. Simply, libvirt will not allow a virsh resume for on_crashpreserve/on_crash, and will allow it for a hypothetical new on_crashpause/on_crash element. BTW, any chance coredump-destroy and coredump-restart can be preserved just for backwards compatibility, and a new coredump='yes/no' attribute introduced instead? Because coredump-pause and coredump-preserve would make just as much sense. -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list
Re: [libvirt] [PATCH] vl: allow cont from panicked state
On Wed, Aug 21, 2013 at 05:32:27PM +0200, Paolo Bonzini wrote: Il 21/08/2013 17:23, Eric Blake ha scritto: Upon learning of a panic, management (if configured to do so) can pick a variety of behaviors: leave the VM paused, reset it, destroy it. In addition to all of these behaviors, it is possible dumping the VM core from the host. s/possible dumping/possible to dump/ and yes, libvirt wants to do just that, as one of its on_crash mappings, since it could do the same for Xen. However, right now, the panicked state is irreversible, and can only be exited by resetting the machine. This means that any policy decision is entirely in the hands of the host. In particular there is no way to use the reboot on panic option together with pvpanic. This patch makes the panicked state reversible (and removes various workarounds that were there because of the state being irreversible). With this change, management has a wider set of possible policies: it can just log the crash and leave policy to the guest, it can leave the VM paused. In particular, the log the crash and continue is implemented simply by sending a cont as soon as management learns about the panic. Management could also implement the irreversible paused state itself. And again, all such actions can be coupled with dumping the VM core. Yes, this makes sense. Unfortunately we cannot change the behavior of 1.6.0. Thus, even if it uses -device pvpanic, management should check for cont failures. If cont fails, management can then log that the VM remained paused and urge the administrator to update QEMU. Is that the best we can do? Is there any sort of QMP introspection that libvirt can do, where we can know UP FRONT what level of panic support is provided by the qemu binary and the machine type being run in that binary? No, this is not possible unfortunately. The only possibility that comes to mind would be to rename the pvpanic device, e.g. to isa-pvpanic, and forget about -device pvpanic on 1.6.x. A hack, I know. To support 1.5, libvirt should simply be ready to react to unanticipated GUEST_PANICKED events. reboot-on-panic will simply be broken for 1.5 and Linux 3.10+ guests. :( Let's just fix the bugs in 1.6.X. I don't think libvirt needs to work around all qemu bugs. For 1.5.X it might be possible to backport -device pvpanic there. We need to make sure cross-version migration works. +++ b/vl.c @@ -637,9 +637,8 @@ static const RunStateTransition runstate_transitions_def[] = { { RUN_STATE_WATCHDOG, RUN_STATE_RUNNING }, { RUN_STATE_WATCHDOG, RUN_STATE_FINISH_MIGRATE }, -{ RUN_STATE_GUEST_PANICKED, RUN_STATE_PAUSED }, +{ RUN_STATE_GUEST_PANICKED, RUN_STATE_RUNNING }, Is 'cont' the only viable way to escape PANICKED, or is it also reasonable to support 'stop' as a way to transition from PANICKED to PAUSED? That is, management may want to make the state reversible but still leave the guest paused, so this patch may be incomplete. No, there is no way to move from PANICKED to PAUSED. Libvirt has its own statuses (PAUSED, CRASHED etc.) and substatuses. You don't really care about the QEMU state: both the PAUSED_PANICKED and CRASHED_PANICKED substatuses map to QEMU's GUEST_PANICKED state. Simply, libvirt will not allow a virsh resume for on_crashpreserve/on_crash, and will allow it for a hypothetical new on_crashpause/on_crash element. BTW, any chance coredump-destroy and coredump-restart can be preserved just for backwards compatibility, and a new coredump='yes/no' attribute introduced instead? Because coredump-pause and coredump-preserve would make just as much sense. -- libvir-list mailing list libvir-list@redhat.com https://www.redhat.com/mailman/listinfo/libvir-list