Re: QEMU/KVM snapshot restore bug
On 2/17/20 3:48 AM, dftxbs3e wrote:
> On 2/16/20 7:16 PM, Cédric Le Goater wrote:
>> I think this is fixed by commit f55750e4e4fb ("spapr/xive: Mask the EAS
>> when allocating an IRQ") which is not in QEMU 4.1.1. The same problem
>> should also occur with LE guests.
>>
>> Could you possibly regenerate the QEMU rpm with this patch ?
>>
>> Thanks,
>>
>> C.
>
> Hello!
>
> I applied the patch and reinstalled the RPM, then tried to restore the
> snapshot I created previously, and it threw the same error.
>
> Do I need to re-create the snapshot and/or restart the machine?

Yes. The problem is at the source.

> I have important workloads running, so that'll be possible only in a few
> days if needed.

OK.

Thanks,

C.
Re: QEMU/KVM snapshot restore bug
On 2/16/20 7:16 PM, Cédric Le Goater wrote:
> I think this is fixed by commit f55750e4e4fb ("spapr/xive: Mask the EAS
> when allocating an IRQ") which is not in QEMU 4.1.1. The same problem
> should also occur with LE guests.
>
> Could you possibly regenerate the QEMU rpm with this patch ?
>
> Thanks,
>
> C.

Hello!

I applied the patch and reinstalled the RPM, then tried to restore the
snapshot I created previously, and it threw the same error.

Do I need to re-create the snapshot and/or restart the machine? I have
important workloads running, so that'll be possible only in a few days if
needed.

Thanks
Re: QEMU/KVM snapshot restore bug
On 2/11/20 4:57 AM, dftxbs3e wrote:
> Hello,
>
> I took a snapshot of a ppc64 (big endian) VM from a ppc64 (little endian)
> host using `virsh snapshot-create-as --domain --name `
>
> Then I restarted my system and tried restoring the snapshot:
>
> # virsh snapshot-revert --domain --snapshotname
> error: internal error: process exited while connecting to monitor:
> 2020-02-11T03:18:08.110582Z qemu-system-ppc64: KVM_SET_DEVICE_ATTR failed:
> Group 3 attr 0x1309: Device or resource busy
> 2020-02-11T03:18:08.110605Z qemu-system-ppc64: error while loading state
> for instance 0x0 of device 'spapr'
> 2020-02-11T03:18:08.112843Z qemu-system-ppc64: Error -1 while loading VM
> state
>
> And dmesg shows each time the restore command is executed:
>
> [ 180.176606] WARNING: CPU: 16 PID: 5528 at
> arch/powerpc/kvm/book3s_xive.c:345 xive_try_pick_queue+0x40/0xb8 [kvm]
>
> [... full dmesg trace snipped; see the original report below ...]
>
> Let me know if I can provide more information

I think this is fixed by commit f55750e4e4fb ("spapr/xive: Mask the EAS
when allocating an IRQ") which is not in QEMU 4.1.1. The same problem
should also occur with LE guests.

Could you possibly regenerate the QEMU rpm with this patch ?

Thanks,

C.
Re: QEMU/KVM snapshot restore bug
Hello,

> A big endian guest doing XIVE ?!? I'm pretty sure we didn't do much
> testing, if any, on such a setup... What distro is used in the VM ?

A live Void Linux ISO:
https://repo.voidlinux-ppc.org/live/current/void-live-ppc64-20190901.iso

> This indicates that QEMU failed to configure the source targeting
> for the HW interrupt 0x1309, which is an MSI interrupt used by
> a PCI device plugged in the default PHB. Especially, -EBUSY means
>
>   -EBUSY: No CPU available to serve interrupt

Okay.

> This warning means that we have a vCPU without a configured event queue.
>
> Since kvmppc_xive_select_target() is trying all vCPUs before bailing out
> with -EBUSY, you might be seeing several WARNINGs (1 per vCPU) in dmesg,
> correct ?
>
> Anyway, this looks wrong since QEMU is supposed to have already
> configured the event queues at this point... Not sure what's happening
> here...

Indeed. VM core count + 1 such messages in dmesg.

> Yeah, QEMU command line, QEMU version, guest kernel version can help.
> Also, what kind of workload is running inside the guest ? Is this easy
> to reproduce ?
/usr/bin/qemu-system-ppc64 \
  -name guest=voidlinux-ppc64,debug-threads=on \
  -S \
  -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-13-voidlinux-ppc64/master-key.aes \
  -machine pseries-4.1,accel=kvm,usb=off,dump-guest-core=off \
  -m 8192 \
  -overcommit mem-lock=off \
  -smp 8,sockets=8,cores=1,threads=1 \
  -uuid 5dd7af48-f00d-43c1-86ed-df5e0f7b4f1c \
  -no-user-config -nodefaults \
  -chardev socket,id=charmonitor,fd=41,server,nowait \
  -mon chardev=charmonitor,id=monitor,mode=control \
  -rtc base=utc \
  -no-shutdown \
  -boot strict=on \
  -device qemu-xhci,p2=15,p3=15,id=usb,bus=pci.0,addr=0x2 \
  -device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x3 \
  -device virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x4 \
  -drive file=/var/lib/libvirt/images/voidlinux-ppc64.qcow2,format=qcow2,if=none,id=drive-virtio-disk0 \
  -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 \
  -drive file=/home/jdoe/Downloads/void-live-ppc64-20190901.iso,format=raw,if=none,id=drive-scsi0-0-0-0,readonly=on \
  -device scsi-cd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,device_id=drive-scsi0-0-0-0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=2 \
  -netdev tap,fd=43,id=hostnet0,vhost=on,vhostfd=44 \
  -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:ae:d7:62,bus=pci.0,addr=0x1 \
  -chardev pty,id=charserial0 \
  -device spapr-vty,chardev=charserial0,id=serial0,reg=0x3000 \
  -chardev socket,id=charchannel0,fd=45,server,nowait \
  -device virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.qemu.guest_agent.0 \
  -device usb-tablet,id=input0,bus=usb.0,port=1 \
  -device usb-kbd,id=input1,bus=usb.0,port=2 \
  -vnc 127.0.0.1:2 \
  -device VGA,id=video0,vgamem_mb=16,bus=pci.0,addr=0x8 \
  -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6 \
  -object rng-random,id=objrng0,filename=/dev/urandom \
  -device virtio-rng-pci,rng=objrng0,id=rng0,bus=pci.0,addr=0x7 \
  -loadvm guix-gentoo \
  -sandbox on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny \
  -msg timestamp=on

I am using virt-manager, which is why the command line is so long. And:

$ qemu-system-ppc64 --version
QEMU emulator version 4.1.1 (qemu-4.1.1-1.fc31)
Copyright (c) 2003-2019 Fabrice Bellard and the QEMU Project developers

Workload: at snapshot time the VM was idle; I was compiling software using
a Gentoo ppc64 big endian chroot inside the Void Linux ppc64 big endian
headless live system.

And yes, it is easy to reproduce: download that Void Linux ppc64 big endian
ISO, create a ppc64 VM with a disk using virt-manager, set the VM to
8192 MB of RAM and 8 cores (less RAM and fewer cores might work, untested),
and it should reproduce the issue. It seems that a 1-core, 512 MB VM
suffers from no issue with snapshotting.

Thanks!
Re: QEMU/KVM snapshot restore bug
On Tue, 11 Feb 2020 04:57:52 +0100 dftxbs3e wrote:
> Hello,
>
> I took a snapshot of a ppc64 (big endian) VM from a ppc64 (little endian)
> host using `virsh snapshot-create-as --domain --name `

A big endian guest doing XIVE ?!? I'm pretty sure we didn't do much
testing, if any, on such a setup... What distro is used in the VM ?

> Then I restarted my system and tried restoring the snapshot:
>
> # virsh snapshot-revert --domain --snapshotname
> error: internal error: process exited while connecting to monitor:
> 2020-02-11T03:18:08.110582Z qemu-system-ppc64: KVM_SET_DEVICE_ATTR failed:
> Group 3 attr 0x1309: Device or resource busy
> 2020-02-11T03:18:08.110605Z qemu-system-ppc64: error while loading state
> for instance 0x0 of device 'spapr'
> 2020-02-11T03:18:08.112843Z qemu-system-ppc64: Error -1 while loading VM
> state

This indicates that QEMU failed to configure the source targeting for the
HW interrupt 0x1309, which is an MSI interrupt used by a PCI device plugged
in the default PHB. Especially, -EBUSY means

  -EBUSY: No CPU available to serve interrupt

> And dmesg shows each time the restore command is executed:
>
> [ 180.176606] WARNING: CPU: 16 PID: 5528 at
> arch/powerpc/kvm/book3s_xive.c:345 xive_try_pick_queue+0x40/0xb8 [kvm]
>
> [... full dmesg trace snipped; see the original report below ...]

This warning means that we have a vCPU without a configured event queue.

Since kvmppc_xive_select_target() is trying all vCPUs before bailing out
with -EBUSY, you might be seeing several WARNINGs (1 per vCPU) in dmesg,
correct ?

Anyway, this looks wrong since QEMU is supposed to have already configured
the event queues at this point... Not sure what's happening here...

Yeah, QEMU command line, QEMU version, guest kernel version can help.
Also, what kind of workload is running inside the guest ? Is this easy to
reproduce ?
QEMU/KVM snapshot restore bug
Hello,

I took a snapshot of a ppc64 (big endian) VM from a ppc64 (little endian)
host using `virsh snapshot-create-as --domain --name `

Then I restarted my system and tried restoring the snapshot:

# virsh snapshot-revert --domain --snapshotname
error: internal error: process exited while connecting to monitor:
2020-02-11T03:18:08.110582Z qemu-system-ppc64: KVM_SET_DEVICE_ATTR failed: Group 3 attr 0x1309: Device or resource busy
2020-02-11T03:18:08.110605Z qemu-system-ppc64: error while loading state for instance 0x0 of device 'spapr'
2020-02-11T03:18:08.112843Z qemu-system-ppc64: Error -1 while loading VM state

And dmesg shows each time the restore command is executed:

[ 180.176606] WARNING: CPU: 16 PID: 5528 at arch/powerpc/kvm/book3s_xive.c:345 xive_try_pick_queue+0x40/0xb8 [kvm]
[ 180.176608] Modules linked in: vhost_net vhost tap kvm_hv kvm xt_CHECKSUM xt_MASQUERADE nf_nat_tftp nf_conntrack_tftp tun bridge 8021q garp mrp stp llc rfkill nf_conntrack_netbios_ns nf_conntrack_broadcast xt_CT ip6t_REJECT nf_reject_ipv6 ip6t_rpfilter ipt_REJECT nf_reject_ipv4 xt_conntrack ebtable_nat ebtable_broute ip6table_nat ip6table_mangle ip6table_raw ip6table_security iptable_nat nf_nat iptable_mangle iptable_raw iptable_security nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nfnetlink ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter sunrpc raid1 at24 regmap_i2c snd_hda_codec_hdmi snd_hda_intel snd_intel_dspcfg joydev snd_hda_codec snd_hda_core ofpart snd_hwdep crct10dif_vpmsum snd_seq ipmi_powernv powernv_flash ipmi_devintf snd_seq_device mtd ipmi_msghandler rtc_opal snd_pcm opal_prd i2c_opal snd_timer snd soundcore lz4 lz4_compress zram ip_tables xfs libcrc32c dm_crypt amdgpu ast drm_vram_helper mfd_core i2c_algo_bit gpu_sched drm_kms_helper mpt3sas
[ 180.176652] syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm vmx_crypto tg3 crc32c_vpmsum nvme raid_class scsi_transport_sas nvme_core drm_panel_orientation_quirks i2c_core fuse
[ 180.176663] CPU: 16 PID: 5528 Comm: qemu-system-ppc Not tainted 5.4.17-200.fc31.ppc64le #1
[ 180.176665] NIP: c0080a883c80 LR: c0080a886db8 CTR: c0080a88a9e0
[ 180.176667] REGS: c00767a17890 TRAP: 0700 Not tainted (5.4.17-200.fc31.ppc64le)
[ 180.176668] MSR: 90029033 CR: 48224248 XER: 2004
[ 180.176673] CFAR: c0080a886db4 IRQMASK: 0
GPR00: c0080a886db8 c00767a17b20 c0080a8aed00 c0002005468a4480
GPR04: 0001
GPR08: c0002007142b2400 c0002007142b2400 c0080a8910f0
GPR12: c0080a88a488 c007fffed000
GPR16: 000149524180 739bda78 739bda30 025c
GPR20: 0003 c0002006f13a
GPR24: 1359 c002f8c96c38 c002f8c8
GPR28: c0002006f13a c0002006f13a4038 c00767a17be4
[ 180.176688] NIP [c0080a883c80] xive_try_pick_queue+0x40/0xb8 [kvm]
[ 180.176693] LR [c0080a886db8] kvmppc_xive_select_target+0x100/0x210 [kvm]
[ 180.176694] Call Trace:
[ 180.176696] [c00767a17b20] [c00767a17b70] 0xc00767a17b70 (unreliable)
[ 180.176701] [c00767a17b70] [c0080a88b420] kvmppc_xive_native_set_attr+0xf98/0x1760 [kvm]
[ 180.176705] [c00767a17cc0] [c0080a86392c] kvm_device_ioctl+0xf4/0x180 [kvm]
[ 180.176710] [c00767a17d10] [c05380b0] do_vfs_ioctl+0xaa0/0xd90
[ 180.176712] [c00767a17dd0] [c0538464] sys_ioctl+0xc4/0x110
[ 180.176716] [c00767a17e20] [c000b9d0] system_call+0x5c/0x68
[ 180.176717] Instruction dump:
[ 180.176719] 794ad182 0b0a 2c29 41820080 89490010 2c0a 41820074 78883664
[ 180.176723] 7d094214 e9480070 7d470074 78e7d182 <0b07> 2c2a 41820054 81480078
[ 180.176727] ---[ end trace 056a6dd275e20684 ]---

Let me know if I can provide more information.

Thanks