Re: QEMU/KVM snapshot restore bug

2020-02-17 Thread Cédric Le Goater
On 2/17/20 3:48 AM, dftxbs3e wrote:
> On 2/16/20 7:16 PM, Cédric Le Goater wrote:
>>
>> I think this is fixed by commit f55750e4e4fb ("spapr/xive: Mask the EAS when
>> allocating an IRQ"), which is not in QEMU 4.1.1. The same problem should also
>> occur with LE guests.
>>
>> Could you possibly regenerate the QEMU RPM with this patch?
>>
>> Thanks,
>>
>> C.
> 
> Hello!
> 
> I applied the patch and reinstalled the RPM, then tried to restore the
> snapshot I created previously, and it threw the same error.
> 
> Do I need to re-create the snapshot and/or restart the machine? 

Yes. The problem is at the source: the snapshot was created with the buggy
QEMU, so the saved state itself is bad, and even a patched QEMU cannot
restore it.
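
A fresh snapshot taken once the guest runs under the patched QEMU should
restore fine. Roughly, as root (the domain and snapshot names are
placeholders):

  # take a new snapshot under the patched QEMU
  virsh snapshot-create-as --domain <domain> --name <new-snapshot>
  # reverting to it should no longer trip KVM_SET_DEVICE_ATTR
  virsh snapshot-revert --domain <domain> --snapshotname <new-snapshot>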

> I have
> important workloads running, so that'll be possible only in a few days if
> needed.
OK. 

Thanks,

C. 


Re: QEMU/KVM snapshot restore bug

2020-02-16 Thread dftxbs3e
On 2/16/20 7:16 PM, Cédric Le Goater wrote:
>
> I think this is fixed by commit f55750e4e4fb ("spapr/xive: Mask the EAS when
> allocating an IRQ"), which is not in QEMU 4.1.1. The same problem should also
> occur with LE guests.
>
> Could you possibly regenerate the QEMU RPM with this patch?
>
> Thanks,
>
> C.

Hello!

I applied the patch and reinstalled the RPM, then tried to restore the
snapshot I created previously, and it threw the same error.

Do I need to re-create the snapshot and/or restart the machine? I have
important workloads running, so that'll be possible only in a few days if
needed.

Thanks






Re: QEMU/KVM snapshot restore bug

2020-02-16 Thread Cédric Le Goater
On 2/11/20 4:57 AM, dftxbs3e wrote:
> Hello,
> 
> I took a snapshot of a ppc64 (big endian) VM from a ppc64 (little endian) 
> host using `virsh snapshot-create-as --domain  --name `
>
> Then I restarted my system and tried restoring the snapshot:
> 
> # virsh snapshot-revert --domain  --snapshotname 
> error: internal error: process exited while connecting to monitor: 
> 2020-02-11T03:18:08.110582Z qemu-system-ppc64: KVM_SET_DEVICE_ATTR failed: 
> Group 3 attr 0x1309: Device or resource busy
> 2020-02-11T03:18:08.110605Z qemu-system-ppc64: error while loading state for 
> instance 0x0 of device 'spapr'
> 2020-02-11T03:18:08.112843Z qemu-system-ppc64: Error -1 while loading VM state
> 
> And dmesg shows each time the restore command is executed:
> 
> [  180.176606] WARNING: CPU: 16 PID: 5528 at 
> arch/powerpc/kvm/book3s_xive.c:345 xive_try_pick_queue+0x40/0xb8 [kvm]
> [  180.176608] Modules linked in: vhost_net vhost tap kvm_hv kvm xt_CHECKSUM 
> xt_MASQUERADE nf_nat_tftp nf_conntrack_tftp tun bridge 8021q garp mrp stp llc 
> rfkill nf_conntrack_netbios_ns nf_conntrack_broadcast xt_CT ip6t_REJECT 
> nf_reject_ipv6 ip6t_rpfilter ipt_REJECT nf_reject_ipv4 xt_conntrack 
> ebtable_nat ebtable_broute ip6table_nat ip6table_mangle ip6table_raw 
> ip6table_security iptable_nat nf_nat iptable_mangle iptable_raw 
> iptable_security nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nfnetlink 
> ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter sunrpc 
> raid1 at24 regmap_i2c snd_hda_codec_hdmi snd_hda_intel snd_intel_dspcfg 
> joydev snd_hda_codec snd_hda_core ofpart snd_hwdep crct10dif_vpmsum snd_seq 
> ipmi_powernv powernv_flash ipmi_devintf snd_seq_device mtd ipmi_msghandler 
> rtc_opal snd_pcm opal_prd i2c_opal snd_timer snd soundcore lz4 lz4_compress 
> zram ip_tables xfs libcrc32c dm_crypt amdgpu ast drm_vram_helper mfd_core 
> i2c_algo_bit gpu_sched drm_kms_helper mpt3sas
> [  180.176652]  syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm 
> vmx_crypto tg3 crc32c_vpmsum nvme raid_class scsi_transport_sas nvme_core 
> drm_panel_orientation_quirks i2c_core fuse
> [  180.176663] CPU: 16 PID: 5528 Comm: qemu-system-ppc Not tainted 5.4.17-200.fc31.ppc64le #1
> [  180.176665] NIP:  c0080a883c80 LR: c0080a886db8 CTR: c0080a88a9e0
> [  180.176667] REGS: c00767a17890 TRAP: 0700   Not tainted  (5.4.17-200.fc31.ppc64le)
> [  180.176668] MSR:  90029033   CR: 48224248  XER: 2004
> [  180.176673] CFAR: c0080a886db4 IRQMASK: 0
>    GPR00: c0080a886db8 c00767a17b20 c0080a8aed00 c0002005468a4480
>    GPR04:    0001
>    GPR08: c0002007142b2400 c0002007142b2400  c0080a8910f0
>    GPR12: c0080a88a488 c007fffed000
>    GPR16: 000149524180 739bda78 739bda30 025c
>    GPR20:  0003 c0002006f13a
>    GPR24: 1359  c002f8c96c38 c002f8c8
>    GPR28:  c0002006f13a c0002006f13a4038 c00767a17be4
> [  180.176688] NIP [c0080a883c80] xive_try_pick_queue+0x40/0xb8 [kvm]
> [  180.176693] LR [c0080a886db8] kvmppc_xive_select_target+0x100/0x210 [kvm]
> [  180.176694] Call Trace:
> [  180.176696] [c00767a17b20] [c00767a17b70] 0xc00767a17b70 (unreliable)
> [  180.176701] [c00767a17b70] [c0080a88b420] kvmppc_xive_native_set_attr+0xf98/0x1760 [kvm]
> [  180.176705] [c00767a17cc0] [c0080a86392c] kvm_device_ioctl+0xf4/0x180 [kvm]
> [  180.176710] [c00767a17d10] [c05380b0] do_vfs_ioctl+0xaa0/0xd90
> [  180.176712] [c00767a17dd0] [c0538464] sys_ioctl+0xc4/0x110
> [  180.176716] [c00767a17e20] [c000b9d0] system_call+0x5c/0x68
> [  180.176717] Instruction dump:
> [  180.176719] 794ad182 0b0a 2c29 41820080 89490010 2c0a 41820074 78883664
> [  180.176723] 7d094214 e9480070 7d470074 78e7d182 <0b07> 2c2a 41820054 81480078
> [  180.176727] ---[ end trace 056a6dd275e20684 ]---
> 
> Let me know if I can provide more information.

I think this is fixed by commit f55750e4e4fb ("spapr/xive: Mask the EAS when
allocating an IRQ"), which is not in QEMU 4.1.1. The same problem should also
occur with LE guests.

Could you possibly regenerate the QEMU RPM with this patch?
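
In case it is useful, here is one way to respin the package (a sketch; the
clone URL, the dist-git branch and the way patches are declared in qemu.spec
are assumptions on my side):

  # export the fix as a standalone patch from the QEMU tree
  git clone https://git.qemu.org/git/qemu.git qemu-src
  git -C qemu-src format-patch -1 f55750e4e4fb
  # fetch the Fedora dist-git for qemu and switch to the f31 branch
  fedpkg clone -a qemu qemu-dist && cd qemu-dist
  fedpkg switch-branch f31
  # declare the patch with a PatchNNNN: line in qemu.spec, then rebuild
  fedpkg local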

Thanks,

C.


Re: QEMU/KVM snapshot restore bug

2020-02-12 Thread dftxbs3e
Hello,
> A big endian guest doing XIVE?!? I'm pretty sure we didn't do much testing,
> if any, on such a setup... What distro is used in the VM?
A live Void Linux ISO:
https://repo.voidlinux-ppc.org/live/current/void-live-ppc64-20190901.iso
> This indicates that QEMU failed to configure the source targeting
> for the HW interrupt 0x1309, which is an MSI interrupt used by
> a PCI device plugged into the default PHB. In particular, -EBUSY means:
>
> -EBUSY:  No CPU available to serve interrupt
>
Okay.
> This warning means that we have a vCPU without a configured event queue.
>
> Since kvmppc_xive_select_target() tries all vCPUs before bailing out
> with -EBUSY, you might be seeing several WARNINGs (one per vCPU) in dmesg,
> correct?
>
> Anyway, this looks wrong since QEMU is supposed to have already configured
> the event queues at this point... Not sure what's happening here...
>
Indeed, there are VM core count + 1 such messages in dmesg.
> Yeah, QEMU command line, QEMU version, guest kernel version can help. Also,
> what kind of workload is running inside the guest? Is this easy to reproduce?

/usr/bin/qemu-system-ppc64 -name guest=voidlinux-ppc64,debug-threads=on
-S -object
secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-13-voidlinux-ppc64/master-key.aes
-machine pseries-4.1,accel=kvm,usb=off,dump-guest-core=off -m 8192
-overcommit mem-lock=off -smp 8,sockets=8,cores=1,threads=1 -uuid
5dd7af48-f00d-43c1-86ed-df5e0f7b4f1c -no-user-config -nodefaults
-chardev socket,id=charmonitor,fd=41,server,nowait -mon
chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown
-boot strict=on -device qemu-xhci,p2=15,p3=15,id=usb,bus=pci.0,addr=0x2
-device virtio-scsi-pci,id=scsi0,bus=pci.0,addr=0x3 -device
virtio-serial-pci,id=virtio-serial0,bus=pci.0,addr=0x4 -drive
file=/var/lib/libvirt/images/voidlinux-ppc64.qcow2,format=qcow2,if=none,id=drive-virtio-disk0
-device
virtio-blk-pci,scsi=off,bus=pci.0,addr=0x5,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1
-drive
file=/home/jdoe/Downloads/void-live-ppc64-20190901.iso,format=raw,if=none,id=drive-scsi0-0-0-0,readonly=on
-device
scsi-cd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,device_id=drive-scsi0-0-0-0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0,bootindex=2
-netdev tap,fd=43,id=hostnet0,vhost=on,vhostfd=44 -device
virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:ae:d7:62,bus=pci.0,addr=0x1
-chardev pty,id=charserial0 -device
spapr-vty,chardev=charserial0,id=serial0,reg=0x3000 -chardev
socket,id=charchannel0,fd=45,server,nowait -device
virtserialport,bus=virtio-serial0.0,nr=1,chardev=charchannel0,id=channel0,name=org.qemu.guest_agent.0
-device usb-tablet,id=input0,bus=usb.0,port=1 -device
usb-kbd,id=input1,bus=usb.0,port=2 -vnc 127.0.0.1:2 -device
VGA,id=video0,vgamem_mb=16,bus=pci.0,addr=0x8 -device
virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x6 -object
rng-random,id=objrng0,filename=/dev/urandom -device
virtio-rng-pci,rng=objrng0,id=rng0,bus=pci.0,addr=0x7 -loadvm
guix-gentoo -sandbox
on,obsolete=deny,elevateprivileges=deny,spawn=deny,resourcecontrol=deny
-msg timestamp=on

I am using virt-manager, which is why the command line is so long.

And:

$ qemu-system-ppc64 --version
QEMU emulator version 4.1.1 (qemu-4.1.1-1.fc31)
Copyright (c) 2003-2019 Fabrice Bellard and the QEMU Project developers

Workload at snapshot time: the VM was idle; I was compiling software
using a Gentoo ppc64 big endian chroot inside the Void Linux ppc64 big
endian headless live system.

And yes, it is easy to reproduce: download that Void Linux ppc64 big
endian ISO, create a ppc64 VM with a disk using virt-manager, set the
VM to 8192 MB of RAM and 8 cores (less RAM and fewer cores might also
work, untested), and it should reproduce the issue. A 1-core, 512 MB
VM seems to suffer from no issue with snapshotting.
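
For completeness, my virt-manager setup should be roughly equivalent to
something like this with virt-install (a sketch; the VM name, disk size
and graphics settings are arbitrary, and I have only actually reproduced
it by rebooting the host rather than just stopping the domain):

  # create the BE guest on the ppc64le host and boot the live ISO
  virt-install --name voidlinux-ppc64 --arch ppc64 --machine pseries \
      --memory 8192 --vcpus 8 \
      --cdrom void-live-ppc64-20190901.iso \
      --disk size=20 --graphics vnc
  # once the guest is up, snapshot it, stop the domain, then revert
  virsh snapshot-create-as --domain voidlinux-ppc64 --name snap1
  virsh destroy voidlinux-ppc64
  virsh snapshot-revert --domain voidlinux-ppc64 --snapshotname snap1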

Thanks!






Re: QEMU/KVM snapshot restore bug

2020-02-12 Thread Greg Kurz
On Tue, 11 Feb 2020 04:57:52 +0100
dftxbs3e  wrote:

> Hello,
> 
> I took a snapshot of a ppc64 (big endian) VM from a ppc64 (little endian) 
> host using `virsh snapshot-create-as --domain  --name `
> 

A big endian guest doing XIVE?!? I'm pretty sure we didn't do much testing, if
any, on such a setup... What distro is used in the VM?

> Then I restarted my system and tried restoring the snapshot:
> 
> # virsh snapshot-revert --domain  --snapshotname 
> error: internal error: process exited while connecting to monitor: 
> 2020-02-11T03:18:08.110582Z qemu-system-ppc64: KVM_SET_DEVICE_ATTR failed: 
> Group 3 attr 0x1309: Device or resource busy
> 2020-02-11T03:18:08.110605Z qemu-system-ppc64: error while loading state for 
> instance 0x0 of device 'spapr'
> 2020-02-11T03:18:08.112843Z qemu-system-ppc64: Error -1 while loading VM state
> 

This indicates that QEMU failed to configure the source targeting
for the HW interrupt 0x1309, which is an MSI interrupt used by
a PCI device plugged into the default PHB. In particular, -EBUSY means:

-EBUSY:  No CPU available to serve interrupt

> And dmesg shows each time the restore command is executed:
> 
> [  180.176606] WARNING: CPU: 16 PID: 5528 at 
> arch/powerpc/kvm/book3s_xive.c:345 xive_try_pick_queue+0x40/0xb8 [kvm]

This warning means that we have a vCPU without a configured event queue.

Since kvmppc_xive_select_target() tries all vCPUs before bailing out
with -EBUSY, you might be seeing several WARNINGs (one per vCPU) in dmesg,
correct?
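
A quick way to count them, matching only the WARNING header lines so that
the NIP/LR lines do not inflate the number:

  dmesg | grep -c 'WARNING.*xive_try_pick_queue'

If that prints roughly one hit per vCPU, it would confirm that every vCPU
gets rejected by xive_try_pick_queue().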

Anyway, this looks wrong since QEMU is supposed to have already configured
the event queues at this point... Not sure what's happening here...
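
If you still have the source VM around, dumping the interrupt controller
state before taking the snapshot could help us compare; I believe the HMP
"info pic" command shows the XIVE configuration on spapr (assumption on my
part that 4.1 exposes it there):

  virsh qemu-monitor-command <domain> --hmp 'info pic'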

> [  180.176608] Modules linked in: vhost_net vhost tap kvm_hv kvm xt_CHECKSUM 
> xt_MASQUERADE nf_nat_tftp nf_conntrack_tftp tun bridge 8021q garp mrp stp llc 
> rfkill nf_conntrack_netbios_ns nf_conntrack_broadcast xt_CT ip6t_REJECT 
> nf_reject_ipv6 ip6t_rpfilter ipt_REJECT nf_reject_ipv4 xt_conntrack 
> ebtable_nat ebtable_broute ip6table_nat ip6table_mangle ip6table_raw 
> ip6table_security iptable_nat nf_nat iptable_mangle iptable_raw 
> iptable_security nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 ip_set nfnetlink 
> ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter sunrpc 
> raid1 at24 regmap_i2c snd_hda_codec_hdmi snd_hda_intel snd_intel_dspcfg 
> joydev snd_hda_codec snd_hda_core ofpart snd_hwdep crct10dif_vpmsum snd_seq 
> ipmi_powernv powernv_flash ipmi_devintf snd_seq_device mtd ipmi_msghandler 
> rtc_opal snd_pcm opal_prd i2c_opal snd_timer snd soundcore lz4 lz4_compress 
> zram ip_tables xfs libcrc32c dm_crypt amdgpu ast drm_vram_helper mfd_core 
> i2c_algo_bit gpu_sched drm_kms_helper mpt3sas
> [  180.176652]  syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm 
> vmx_crypto tg3 crc32c_vpmsum nvme raid_class scsi_transport_sas nvme_core 
> drm_panel_orientation_quirks i2c_core fuse
> [  180.176663] CPU: 16 PID: 5528 Comm: qemu-system-ppc Not tainted 5.4.17-200.fc31.ppc64le #1
> [  180.176665] NIP:  c0080a883c80 LR: c0080a886db8 CTR: c0080a88a9e0
> [  180.176667] REGS: c00767a17890 TRAP: 0700   Not tainted  (5.4.17-200.fc31.ppc64le)
> [  180.176668] MSR:  90029033   CR: 48224248  XER: 2004
> [  180.176673] CFAR: c0080a886db4 IRQMASK: 0
>    GPR00: c0080a886db8 c00767a17b20 c0080a8aed00 c0002005468a4480
>    GPR04:    0001
>    GPR08: c0002007142b2400 c0002007142b2400  c0080a8910f0
>    GPR12: c0080a88a488 c007fffed000
>    GPR16: 000149524180 739bda78 739bda30 025c
>    GPR20:  0003 c0002006f13a
>    GPR24: 1359  c002f8c96c38 c002f8c8
>    GPR28:  c0002006f13a c0002006f13a4038 c00767a17be4
> [  180.176688] NIP [c0080a883c80] xive_try_pick_queue+0x40/0xb8 [kvm]
> [  180.176693] LR [c0080a886db8] kvmppc_xive_select_target+0x100/0x210 [kvm]
> [  180.176694] Call Trace:
> [  180.176696] [c00767a17b20] [c00767a17b70] 0xc00767a17b70 (unreliable)
> [  180.176701] [c00767a17b70] [c0080a88b420] kvmppc_xive_native_set_attr+0xf98/0x1760 [kvm]
> [  180.176705] [c00767a17cc0] [c0080a86392c] kvm_device_ioctl+0xf4/0x180 [kvm]
> [  180.176710] [c00767a17d10] [c05380b0] do_vfs_ioctl+0xaa0/0xd90
> [  180.176712] [c00767a17dd0] [c0538464] sys_ioctl+0xc4/0x110
> [  180.176716] [c00767a17e20] [c000b9d0] system_call+0x5c/0x68
> [  180.176717] Instruction dump:
> [  180.176719] 794ad182 0b0a 2c29 41820080 89490010 2c0a 41820074 78883664
> [  180.176723] 7d094214 e9480070 7d470074 78e7d182 <0b07> 2c2a 41820054 81480078
> [  180.176727] ---[ end trace 056a6dd275e20684 ]---
>
> Let me know if I can provide more information.

Yeah, QEMU command line, QEMU version, guest kernel version can help. Also,
what kind of workload is running inside the guest? Is this easy to reproduce?

QEMU/KVM snapshot restore bug

2020-02-10 Thread dftxbs3e
Hello,

I took a snapshot of a ppc64 (big endian) VM from a ppc64 (little endian) host 
using `virsh snapshot-create-as --domain  --name `

Then I restarted my system and tried restoring the snapshot:

# virsh snapshot-revert --domain  --snapshotname 
error: internal error: process exited while connecting to monitor: 
2020-02-11T03:18:08.110582Z qemu-system-ppc64: KVM_SET_DEVICE_ATTR failed: 
Group 3 attr 0x1309: Device or resource busy
2020-02-11T03:18:08.110605Z qemu-system-ppc64: error while loading state for 
instance 0x0 of device 'spapr'
2020-02-11T03:18:08.112843Z qemu-system-ppc64: Error -1 while loading VM state

And dmesg shows each time the restore command is executed:

[  180.176606] WARNING: CPU: 16 PID: 5528 at arch/powerpc/kvm/book3s_xive.c:345 
xive_try_pick_queue+0x40/0xb8 [kvm]
[  180.176608] Modules linked in: vhost_net vhost tap kvm_hv kvm xt_CHECKSUM 
xt_MASQUERADE nf_nat_tftp nf_conntrack_tftp tun bridge 8021q garp mrp stp llc 
rfkill nf_conntrack_netbios_ns nf_conntrack_broadcast xt_CT ip6t_REJECT 
nf_reject_ipv6 ip6t_rpfilter ipt_REJECT nf_reject_ipv4 xt_conntrack ebtable_nat 
ebtable_broute ip6table_nat ip6table_mangle ip6table_raw ip6table_security 
iptable_nat nf_nat iptable_mangle iptable_raw iptable_security nf_conntrack 
nf_defrag_ipv6 nf_defrag_ipv4 ip_set nfnetlink ebtable_filter ebtables 
ip6table_filter ip6_tables iptable_filter sunrpc raid1 at24 regmap_i2c 
snd_hda_codec_hdmi snd_hda_intel snd_intel_dspcfg joydev snd_hda_codec 
snd_hda_core ofpart snd_hwdep crct10dif_vpmsum snd_seq ipmi_powernv 
powernv_flash ipmi_devintf snd_seq_device mtd ipmi_msghandler rtc_opal snd_pcm 
opal_prd i2c_opal snd_timer snd soundcore lz4 lz4_compress zram ip_tables xfs 
libcrc32c dm_crypt amdgpu ast drm_vram_helper mfd_core i2c_algo_bit gpu_sched 
drm_kms_helper mpt3sas
[  180.176652]  syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm 
vmx_crypto tg3 crc32c_vpmsum nvme raid_class scsi_transport_sas nvme_core 
drm_panel_orientation_quirks i2c_core fuse
[  180.176663] CPU: 16 PID: 5528 Comm: qemu-system-ppc Not tainted 5.4.17-200.fc31.ppc64le #1
[  180.176665] NIP:  c0080a883c80 LR: c0080a886db8 CTR: c0080a88a9e0
[  180.176667] REGS: c00767a17890 TRAP: 0700   Not tainted  (5.4.17-200.fc31.ppc64le)
[  180.176668] MSR:  90029033   CR: 48224248  XER: 2004
[  180.176673] CFAR: c0080a886db4 IRQMASK: 0
   GPR00: c0080a886db8 c00767a17b20 c0080a8aed00 c0002005468a4480
   GPR04:    0001
   GPR08: c0002007142b2400 c0002007142b2400  c0080a8910f0
   GPR12: c0080a88a488 c007fffed000
   GPR16: 000149524180 739bda78 739bda30 025c
   GPR20:  0003 c0002006f13a
   GPR24: 1359  c002f8c96c38 c002f8c8
   GPR28:  c0002006f13a c0002006f13a4038 c00767a17be4
[  180.176688] NIP [c0080a883c80] xive_try_pick_queue+0x40/0xb8 [kvm]
[  180.176693] LR [c0080a886db8] kvmppc_xive_select_target+0x100/0x210 [kvm]
[  180.176694] Call Trace:
[  180.176696] [c00767a17b20] [c00767a17b70] 0xc00767a17b70 (unreliable)
[  180.176701] [c00767a17b70] [c0080a88b420] kvmppc_xive_native_set_attr+0xf98/0x1760 [kvm]
[  180.176705] [c00767a17cc0] [c0080a86392c] kvm_device_ioctl+0xf4/0x180 [kvm]
[  180.176710] [c00767a17d10] [c05380b0] do_vfs_ioctl+0xaa0/0xd90
[  180.176712] [c00767a17dd0] [c0538464] sys_ioctl+0xc4/0x110
[  180.176716] [c00767a17e20] [c000b9d0] system_call+0x5c/0x68
[  180.176717] Instruction dump:
[  180.176719] 794ad182 0b0a 2c29 41820080 89490010 2c0a 41820074 78883664
[  180.176723] 7d094214 e9480070 7d470074 78e7d182 <0b07> 2c2a 41820054 81480078
[  180.176727] ---[ end trace 056a6dd275e20684 ]---

Let me know if I can provide more information.

Thanks


