Public bug reported:

---Problem Description---
I am trying to hotplug-attach a Mellanox CX3 card to a guest, but the attach 
fails:
virsh attach-device powerio-le12-ubuntu-17.04 ./add_cx3.xml
error: Failed to attach device from ./add_cx3.xml
error: internal error: unable to execute QEMU command 'device_add': vfio error: 
0044:01:00.0: failed to setup container for group 6: RAM memory listener 
initialization failed for container


From the QEMU log file I see this:
2017-02-14T22:55:40.721108Z qemu-system-ppc64: backend does not support BE vnet 
headers; falling back on userspace virtio

This is with kernel 4.9.0-15-generic and qemu level:
dpkg --list| grep qemu
ii  ipxe-qemu                 1.0.0+git-20150424.a25a16d-1ubuntu2  all      PXE boot firmware - ROM images for qemu
ii  qemu                      1:2.8+dfsg-2ubuntu1                  ppc64el  fast processor emulator
ii  qemu-block-extra:ppc64el  1:2.8+dfsg-2ubuntu1                  ppc64el  extra block backend modules for qemu-system and qemu-utils
ii  qemu-kvm                  1:2.8+dfsg-2ubuntu1                  ppc64el  QEMU Full virtualization
ii  qemu-slof                 20161019+dfsg-1                      all      Slimline Open Firmware -- QEMU PowerPC version
ii  qemu-system               1:2.8+dfsg-2ubuntu1                  ppc64el  QEMU full system emulation binaries
ii  qemu-system-arm           1:2.8+dfsg-2ubuntu1                  ppc64el  QEMU full system emulation binaries (arm)
ii  qemu-system-common        1:2.8+dfsg-2ubuntu1                  ppc64el  QEMU full system emulation binaries (common files)
ii  qemu-system-mips          1:2.8+dfsg-2ubuntu1                  ppc64el  QEMU full system emulation binaries (mips)
ii  qemu-system-misc          1:2.8+dfsg-2ubuntu1                  ppc64el  QEMU full system emulation binaries (miscellaneous)
ii  qemu-system-ppc           1:2.8+dfsg-2ubuntu1                  ppc64el  QEMU full system emulation binaries (ppc)
ii  qemu-system-sparc         1:2.8+dfsg-2ubuntu1                  ppc64el  QEMU full system emulation binaries (sparc)
ii  qemu-system-x86           1:2.8+dfsg-2ubuntu1                  ppc64el  QEMU full system emulation binaries (x86)
ii  qemu-user                 1:2.8+dfsg-2ubuntu1                  ppc64el  QEMU user mode emulation binaries
ii  qemu-user-binfmt          1:2.8+dfsg-2ubuntu1                  ppc64el  QEMU user mode binfmt registration for qemu-user
ii  qemu-utils                1:2.8+dfsg-2ubuntu1                  ppc64el  QEMU utilities

 
---uname output---
4.9.0-15-generic #16-Ubuntu SMP Fri Jan 20 15:28:49 UTC 2017 ppc64le ppc64le 
ppc64le GNU/Linux
 
Machine Type = P8 
 
---Steps to Reproduce---
 Bring up a guest, then try to attach the device like this:
 virsh attach-device powerio-le12-ubuntu-17.04 ./add_cx3.xml --live
error: Failed to attach device from ./add_cx3.xml
error: internal error: unable to execute QEMU command 'device_add': vfio error: 
0044:01:00.0: failed to setup container for group 6: RAM memory listener 
initialization failed for container

When I retried the steps for add_cx3.xml on the same machine I noticed
the following in the host logs:

[ 1374.276210] KVM guest htab at c000001e56000000 (order 26), LPID 1
[ 1383.824281] hrtimer: interrupt took 923 ns
[ 1447.479194] audit_printk_skb: 15 callbacks suppressed
[ 1447.479198] audit: type=1400 audit(1487194729.006:17): apparmor="DENIED" 
operation="setrlimit" profile="/usr/sbin/libvirtd" pid=6853 comm="libvirtd" 
rlimit=memlock value=8694792192
[ 1447.481927] pci 0044:01     : [PE# 002] Disabling 64-bit DMA bypass
[ 1447.481935] pci 0044:01     : [PE# 002] Removing DMA window #0
[ 1447.481978] pci 0044:01     : [PE# 002] Removing DMA window #0
[ 1447.481980] pci 0044:01     : [PE# 002] Removing DMA window #1
[ 1447.485667] pci 0044:01     : [PE# 002] Setting up window#0 0..7fffffff 
pg=1000
[ 1447.485670] pci 0044:01     : [PE# 002] Enabling 64-bit DMA bypass
[ 1517.030701] audit: type=1400 audit(1487194798.559:18): apparmor="DENIED" 
operation="setrlimit" profile="/usr/sbin/libvirtd" pid=6853 comm="libvirtd" 
rlimit=memlock value=8694792192
[ 1517.033286] pci 0044:01     : [PE# 002] Disabling 64-bit DMA bypass
[ 1517.033290] pci 0044:01     : [PE# 002] Removing DMA window #0
[ 1517.033322] pci 0044:01     : [PE# 002] Removing DMA window #0
[ 1517.033325] pci 0044:01     : [PE# 002] Removing DMA window #1
[ 1517.036971] pci 0044:01     : [PE# 002] Setting up window#0 0..7fffffff 
pg=1000
[ 1517.036974] pci 0044:01     : [PE# 002] Enabling 64-bit DMA bypass
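
To see exactly what AppArmor refused in the audit lines above, the key=value
fields can be pulled out of one of them. This is only a sketch: the audit line
is copied from the log (re-joined onto one line), and parse_audit_fields is a
hypothetical helper name, not part of any AppArmor or libvirt tooling.

```python
import re

# Audit line copied from the host log above, re-joined onto one line.
AUDIT_LINE = (
    'audit: type=1400 audit(1487194729.006:17): apparmor="DENIED" '
    'operation="setrlimit" profile="/usr/sbin/libvirtd" pid=6853 '
    'comm="libvirtd" rlimit=memlock value=8694792192'
)


def parse_audit_fields(line):
    """Return the key=value pairs of an audit line as a dict of strings."""
    pairs = re.findall(r'(\w+)=("[^"]*"|\S+)', line)
    return {key: value.strip('"') for key, value in pairs}


fields = parse_audit_fields(AUDIT_LINE)

if fields["apparmor"] == "DENIED" and fields["operation"] == "setrlimit":
    requested_gib = int(fields["value"]) / 2**30
    print(
        f'{fields["comm"]} (profile {fields["profile"]}) was denied raising '
        f'{fields["rlimit"]} to {requested_gib:.2f} GiB'
    )
```

The denied operation is libvirtd trying to raise the memlock rlimit to a bit
more than the 8 GiB of guest memory (-m 8192).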

I'm not sure if the apparmor issues are affecting functionality or not.
That may be worth looking into in a separate bug, or it may be a dupe of
https://bugzilla.linux.ibm.com/show_bug.cgi?id=146192

As noted there I did the following to work around it:

sudo aa-complain /usr/sbin/libvirtd
sudo aa-complain 
/etc/apparmor.d/libvirt/libvirt-????????-????-????-????-????????????

I still got the VFIO memory listener error, however. If I install QEMU
2.7.0 I no longer see the VFIO error and things seem to succeed from a
host perspective:

root@powerio-le11:/etc/libvirt/qemu# virsh attach-device 
powerio-le12-ubuntu-17.04 ./add_cx3.xml --live
Device attached successfully

root@powerio-le11:/etc/libvirt/qemu# dmesg | tail -6
[ 3880.813971] KVM guest htab at c000001e56000000 (order 26), LPID 1
[ 3917.656384] audit: type=1400 audit(1487197199.210:26): apparmor="ALLOWED" 
operation="setrlimit" profile="/usr/sbin/libvirtd" pid=6853 comm="libvirtd" 
rlimit=memlock value=8694792192
[ 3917.659276] pci 0044:01     : [PE# 002] Disabling 64-bit DMA bypass
[ 3917.659284] pci 0044:01     : [PE# 002] Removing DMA window #0
[ 3917.688803] vfio-pci 0044:01:00.0: enabling device (0400 -> 0402)
[ 3917.800106] vfio_ecap_init: 0044:01:00.0 hiding ecap 0x19@0x18c

In the guest things look okay initially:

[   28.797667] RTAS: event: 1, Type: Unknown, Severity: 1
[   29.062821] pci 0000:00:05.0: [15b3:1007] type 00 class 0x020000
[   29.063118] pci 0000:00:05.0: reg 0x10: [mem 0x100a0000000-0x100a00fffff 
64bit]
[   29.063341] pci 0000:00:05.0: reg 0x18: [mem 0x2c0200000000-0x2c0201ffffff 
64bit pref]
[   29.063701] pci 0000:00:05.0: reg 0x30: [mem 0x00000000-0x000fffff pref]
[   29.065237] iommu: Adding device 0000:00:05.0 to group 0
[   29.065332] pci 0000:00:05.0: BAR 2: assigned [mem 
0x10122000000-0x10123ffffff 64bit pref]
[   29.065675] pci 0000:00:05.0: BAR 0: assigned [mem 
0x10121800000-0x101218fffff 64bit]
[   29.066010] pci 0000:00:05.0: BAR 6: assigned [mem 
0x100a0000000-0x100a00fffff pref]
[   29.066105] mlx4_core: Mellanox ConnectX core driver v4.0-1.0.1 (29 Jan 2017)
[   29.066127] mlx4_core: Initializing 0000:00:05.0
[   29.066210] mlx4_core 0000:00:05.0: enabling device (0000 -> 0002)
[   29.076273] mlx4_core 0000:00:05.0: Using 64-bit direct DMA at offset 
800000000000000


but eventually I see the following error:


[   89.925954] mlx4_core 0000:00:05.0: device is going to be reset
[   99.923755] mlx4_core 0000:00:05.0: Failed to obtain HW semaphore, aborting
[   99.924052] mlx4_core 0000:00:05.0: Fail to reset HCA
[   99.924305] kernel BUG at 
/var/lib/dkms/mlnx-ofed-kernel/4.0/build/drivers/net/ethernet/mellanox/mlx4/catas.c:193!
[   99.924643] Oops: Exception in kernel mode, sig: 5 [#1]
[   99.924811] SMP NR_CPUS=2048 [   99.924889] NUMA 
[   99.924968] pSeries
[   99.925048] Modules linked in: rdma_ucm(OE) ib_ucm(OE) ib_ipoib(OE) 
ib_uverbs(OE) ib_umad(OE) mlx5_ib(OE) mlx5_core(OE) mlx4_ib(OE) mlx4_en(OE) 
mlx4_core(OE) devlink vmx_crypto ib_iser rdma_cm(OE) iw_cm(OE) ib_cm(OE) 
ib_core(OE) mlx_compat(OE) configfs iscsi_tcp libiscsi_tcp libiscsi 
scsi_transport_iscsi knem(OE) ip_tables x_tables autofs4 btrfs raid10 raid456 
async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq 
libcrc32c raid1 raid0 multipath linear ibmvscsi crc32c_vpmsum virtio_net 
virtio_blk
[   99.927029] CPU: 10 PID: 4600 Comm: drmgr Tainted: G           OE   
4.9.0-12-generic #13-Ubuntu
[   99.927316] task: c0000001dfc27e00 task.stack: c0000001dd630000
[   99.927515] NIP: d000000003c62794 LR: d000000003c6277c CTR: c0000000006c4a80
[   99.927752] REGS: c0000001dd6332a0 TRAP: 0700   Tainted: G           OE    
(4.9.0-12-generic)
[   99.928029] MSR: 800000010282b033 <SF,VEC,VSX,EE,FP,ME,IR,DR,RI,LE,TM[E]>[   
99.928645]   CR: 48022222  XER: 20000000
[   99.928764] CFAR: c000000000710c28 SOFTE: 1 
GPR00: d000000003c6277c c0000001dd633520 d000000003cbca7c 0000000000000029 
GPR04: 0000000000000001 000000000000014d 6573657420484341 0d0a6c20746f2072 
GPR08: 0000000000000007 0000000000000001 c0000001e2a94300 0000000000000006 
GPR12: 0000000000002200 c00000000fb85a00 c0000001dd9a0060 0000000000000000 
GPR16: 00000000024080c0 00000000024202c2 0000000000000000 d000000003cb7418 
GPR20: 0000000000000000 d0000800802c0680 c0000001dd9a04e8 c0000001dd9a0518 
GPR24: 000000000000ea60 0000000000000000 c000000001443a00 c0000001dd6336f0 
GPR28: 0000000000000000 0000000000000004 c0000001e2a94360 c0000001dd9a0060 
NIP [d000000003c62794] mlx4_enter_error_state.part.0+0x35c/0x460 [mlx4_core]
[   99.931952] LR [d000000003c6277c] mlx4_enter_error_state.part.0+0x344/0x460 
[mlx4_core]
[   99.932190] Call Trace:
[   99.932278] [c0000001dd633520] [d000000003c6277c] 
mlx4_enter_error_state.part.0+0x344/0x460 [mlx4_core] (unreliable)
[   99.932647] [c0000001dd6335b0] [d000000003c66df8] __mlx4_cmd+0x720/0x970 
[mlx4_core]
[   99.932946] [c0000001dd633680] [d000000003c73d88] mlx4_QUERY_FW+0x90/0x420 
[mlx4_core]
[   99.933238] [c0000001dd633730] [d000000003c7fd28] mlx4_load_one+0x440/0x1ac0 
[mlx4_core]
[   99.933520] [c0000001dd633850] [d000000003c81a40] mlx4_init_one+0x698/0x7c0 
[mlx4_core]
[   99.933922] [c0000001dd633960] [c00000000063049c] local_pci_probe+0x6c/0x140
[   99.934171] [c0000001dd6339f0] [c0000000006312e8] 
pci_device_probe+0x178/0x200
[   99.934430] [c0000001dd633a50] [c000000000716970] 
driver_probe_device+0x240/0x540
[   99.934657] [c0000001dd633ae0] [c00000000071344c] bus_for_each_drv+0x8c/0xf0
[   99.934848] [c0000001dd633b30] [c0000000007164f0] __device_attach+0x140/0x210
[   99.935057] [c0000001dd633bc0] [c000000000621d38] 
pci_bus_add_device+0x78/0x100
[   99.935270] [c0000001dd633c30] [c000000000621e20] 
pci_bus_add_devices+0x60/0xe0
[   99.935488] [c0000001dd633c70] [c000000000625b44] pci_rescan_bus+0x44/0x70
[   99.935666] [c0000001dd633ca0] [c000000000631ee4] bus_rescan_store+0x84/0xb0
[   99.935840] [c0000001dd633ce0] [c000000000712fb4] bus_attr_store+0x44/0x70
[   99.936039] [c0000001dd633d00] [c0000000003d52b8] sysfs_kf_write+0x68/0xa0
[   99.936210] [c0000001dd633d20] [c0000000003d417c] 
kernfs_fop_write+0x17c/0x250
[   99.936407] [c0000001dd633d70] [c00000000031924c] __vfs_write+0x3c/0x70
[   99.936583] [c0000001dd633d90] [c00000000031a4b4] vfs_write+0xd4/0x240
[   99.936760] [c0000001dd633de0] [c00000000031c018] SyS_write+0x68/0x110
[   99.936934] [c0000001dd633e30] [c00000000000bd84] system_call+0x38/0xe0
[   99.937102] Instruction dump:
[   99.937188] e93f0000 3d020000 e8888078 e8690000 386300a0 4803f8f1 e8410018 
e95f0000 
[   99.937472] e92a0000 81290098 2f890001 409efea0 <0fe00000> 60000000 60420000 
e93f0000 
[   99.937726] ---[ end trace 66826e43e8c8b7ba ]---
[   99.937832]

It's not clear to me if this new guest issue is specific to QEMU 2.7, or
something that would also be present on 2.8 if not for the VFIO issue
originally noted in this bug. First step I think will be to root-cause
the VFIO issue, fix it, and see if the guest issue remains afterward. If
it does we can track that as a separate bug (or perhaps we have already
seen this somewhere? It seems vaguely familiar).

I need to hop off the machine for today, but can look at it more tomorrow.

(In reply to comment #10)

> [ 1517.030701] audit: type=1400 audit(1487194798.559:18): apparmor="DENIED"
> operation="setrlimit" profile="/usr/sbin/libvirtd" pid=6853 comm="libvirtd"
> rlimit=memlock value=8694792192

> I'm not sure if the apparmor issues are affecting functionality or not. That
> may be worth looking into a separate bug, or a dupe of
> https://bugzilla.linux.ibm.com/show_bug.cgi?id=146192
> 
Let me check the Ubuntu 16.10 system again. I updated /etc/libvirt/qemu.conf 
on Ubuntu 17.04 with the same steps I used on 16.10, but I still see the 
denials. I am not sure whether I did something else on 16.10.

> 
> It's not clear to me if this new guest issue is specific to QEMU 2.7, or
> something that would also be present on 2.8 if not for the VFIO issue
> originally noted in this bug. First step I think will be to root-cause the
> VFIO issue, fix it, and see if the guest issue remains afterward. If it does
> we can track that as a separate bug (or perhaps we already seen this
> somewhere? seems vaguely familiar).
> 
> Need to hop of machine for today, but can look at it more tomorrow.
I also see this guest issue with Ubuntu 16.10 KVM. The commands are timing 
out, as if the DMAs are not getting to the HW. I see this with every Mellanox 
card I have tried. I can open a separate bug specific to 16.10 if you want.

== Comment: #15 - MICHAEL D. ROTH <[email protected]> - 2017-02-22 13:22:53 ==
I tried a bisect between 2.7.0 and 2.8.0/hostos to find the origin of these 
errors:

root@powerio-le11:/etc/libvirt/qemu# virsh attach-device 
powerio-le12-ubuntu-17.04 ./add_cx3.xml --live
error: Failed to attach device from ./add_cx3.xml
error: internal error: unable to execute QEMU command 'device_add': Device 
initialization failed

The commit that caused the "breakage" was:

root@powerio-le11:~/mdroth/qemu.git# git bisect good
01905f58f166646619c35a2ebfc3ca3ed4cad62d is the first bad commit
commit 01905f58f166646619c35a2ebfc3ca3ed4cad62d
Author: Eric Auger <[email protected]>
Date:   Mon Oct 17 10:57:59 2016 -0600

    vfio: Pass an Error object to vfio_connect_container


However, all that commit does is turn vfio init errors into fatal errors that 
are passed on to libvirt, as opposed to just logging them in the background and 
continuing execution. If I go back to 2.7.0 and re-test, I find that while 
libvirt reports the attach as successful, the log file still shows:

LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin 
QEMU_AUDIO_DRV=none /usr/bin/kvm -name 
guest=powerio-le12-ubuntu-17.04,debug-threads=on -S -object 
secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-17-powerio-le12-ubuntu-/master-key.aes
 -machine pseries-2.7,accel=kvm,usb=off,dump-guest-core=off -m 8192 -realtime 
mlock=off -smp 16,sockets=1,cores=2,threads=8 -uuid 
bd3248c2-5686-4e18-b86e-799292bf4ad3 -display none -no-user-config -nodefaults 
-chardev 
socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-17-powerio-le12-ubuntu-/monitor.sock,server,nowait
 -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown 
-boot strict=on -device pci-ohci,id=usb,bus=pci.0,addr=0x2 -device 
spapr-vscsi,id=scsi0,reg=0x2000 -drive 
file=/var/lib/libvirt/images/powerio-le12-ubuntu-17.04.qcow2,format=qcow2,if=none,id=drive-virtio-disk0
 -device 
virtio-blk-pci,scsi=off,bus=pci.0,addr=0x3,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1
 -drive if=none,id=drive-scsi0-0-0-0,readonly=on -device 
scsi-cd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0
 -netdev tap,fd=25,id=hostnet0,vhost=on,vhostfd=27 -device 
virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:eb:a9:da,bus=pci.0,addr=0x1 
-chardev pty,id=charserial0 -device 
spapr-vty,chardev=charserial0,reg=0x30000000 -device 
virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x4 -msg timestamp=on
Domain id=17 is tainted: high-privileges
char device redirected to /dev/pts/5 (label charserial0)
vfio: RAM memory listener initialization failed for container

So this issue seems to have existed since before 2.7.0, assuming it is
stemming from QEMU and not related to kernel. Will look into it more.

== Comment: #16 - MICHAEL D. ROTH <[email protected]> - 2017-02-22 18:02:36 ==
I think this is some sort of permissions/rlimit issue after all.

If I invoke QEMU directly without libvirt, then do the attach from the
QEMU monitor, I see the device added successfully with no error, and I
also don't see the subsequent crashes within the guest relating to
mlx4_QUERY_FW:

root@powerio-le11:~/mdroth/qemu-build# ppc64-softmmu/qemu-system-ppc64 \
 -object secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-2-powerio-le12-ubuntu-/master-key.aes \
 -machine pseries-2.7,accel=kvm,usb=off,dump-guest-core=off -m 8192 \
 -realtime mlock=off -smp 16,sockets=1,cores=2,threads=8 \
 -uuid bd3248c2-5686-4e18-b86e-799292bf4ad3 -display none -no-user-config \
 -nodefaults -rtc base=utc -no-shutdown -boot strict=on \
 -device pci-ohci,id=usb,bus=pci.0,addr=0x2 \
 -device spapr-vscsi,id=scsi0,reg=0x2000 \
 -drive file=/var/lib/libvirt/images/powerio-le12-ubuntu-17.04.qcow2,format=qcow2,if=none,id=drive-virtio-disk0 \
 -device virtio-blk-pci,scsi=off,bus=pci.0,addr=0x3,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 \
 -drive if=none,id=drive-scsi0-0-0-0,readonly=on \
 -device scsi-cd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0 \
 -netdev tap,id=hostnet0,vhost=on,script=/etc/qemu-ifup \
 -device virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:eb:a9:da,bus=pci.0,addr=0x1 \
 -device spapr-vty,chardev=charserial0,reg=0x30000000 \
 -device virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x4 -msg timestamp=on \
 -vga none -nographic -chardev stdio,mux=on,id=charserial0 \
 -monitor chardev:charserial0

root@powerio-le11:~/mdroth# ./vfio-bind 0044:01:00.0 
unbinding 0044:01:00.0 via /sys/bus/pci/devices/0044:01:00.0/driver/unbind
binding 0044:01:00.0
echo 0x15b3 0x1007 >/sys/bus/pci/drivers/vfio-pci/new_id

(qemu) device_add vfio-pci,host=0044:01:00.0,id=hp0

root@powerio-le12:~# dmesg | tail -36
[  236.294903] RTAS: event: 1, Type: Unknown, Severity: 1
[  236.574958] pci 0000:00:00.0: [15b3:1007] type 00 class 0x020000
[  236.575630] pci 0000:00:00.0: reg 0x10: [mem 0x00000000-0x000fffff 64bit]
[  236.575986] pci 0000:00:00.0: reg 0x18: [mem 0x00000000-0x01ffffff 64bit 
pref]
[  236.576592] pci 0000:00:00.0: reg 0x30: [mem 0x00000000-0x000fffff pref]
[  236.578890] iommu: Adding device 0000:00:00.0 to group 0
[  236.578985] pci 0000:00:00.0: BAR 2: assigned [mem 
0x10122000000-0x10123ffffff 64bit pref]
[  236.580466] pci 0000:00:00.0: BAR 0: assigned [mem 
0x10121800000-0x101218fffff 64bit]
[  236.580921] pci 0000:00:00.0: BAR 6: assigned [mem 
0x100a0000000-0x100a00fffff pref]
[  236.581011] mlx4_core: Mellanox ConnectX core driver v4.0-1.0.1 (29 Jan 2017)
[  236.581162] mlx4_core: Initializing 0000:00:00.0
[  236.581272] mlx4_core 0000:00:00.0: enabling device (0000 -> 0002)
[  236.583876] mlx4_core 0000:00:00.0: Using 64-bit direct DMA at offset 
800000000000000
[  242.122882] mlx4_core: device is working in RoCE mode: Roce V1
[  242.122884] mlx4_core: UD QP Gid type is: V1
[  243.652901] mlx4_core 0000:00:00.0: PCIe link speed is 8.0GT/s, device 
supports 8.0GT/s
[  243.652904] mlx4_core 0000:00:00.0: PCIe link width is x8, device supports x8
[  243.877392] mlx4_en: Mellanox ConnectX HCA Ethernet driver v4.0-1.0.1 (29 
Jan 2017)
[  243.877592] mlx4_en 0000:00:00.0: Activating port:1
[  243.904087] mlx4_en: 0000:00:00.0: Port 1: Using 128 TX rings
[  243.904090] mlx4_en: 0000:00:00.0: Port 1: Using 8 RX rings
[  243.904093] mlx4_en: 0000:00:00.0: Port 1:   frag:0 - size:1522 prefix:0 
stride:1536
[  243.904770] mlx4_en: 0000:00:00.0: Port 1: Initializing port
[  243.905354] mlx4_en 0000:00:00.0: registered PHC clock
[  243.906985] mlx4_en 0000:00:00.0: Activating port:2
[  243.917716] mlx4_core 0000:00:00.0 enp0s0: renamed from eth0
[  243.919899] mlx4_en: 0000:00:00.0: Port 2: Using 128 TX rings
[  243.919901] mlx4_en: 0000:00:00.0: Port 2: Using 8 RX rings
[  243.919903] mlx4_en: 0000:00:00.0: Port 2:   frag:0 - size:1522 prefix:0 
stride:1536
[  243.920694] mlx4_en: 0000:00:00.0: Port 2: Initializing port
[  243.941713] <mlx4_ib> mlx4_ib_add: mlx4_ib: Mellanox ConnectX InfiniBand 
driver v4.0-1.0.1 (29 Jan 2017)
[  244.039494] <mlx4_ib> mlx4_ib_add: counter index 2 for port 1 allocated 1
[  244.039520] <mlx4_ib> mlx4_ib_add: counter index 3 for port 2 allocated 1
[  244.098796] mlx4_core 0000:00:00.0 enp0s0d1: renamed from eth0
[  245.266775] mlx4_en: enp0s0: Link Up
[  245.266891] mlx4_en: enp0s0d1: Link Up

Everything appears to be functioning. Also worth noting, the host
doesn't report any apparmor messages:

[ 3683.945997] KVM guest htab at c000001e5a000000 (order 26), LPID 2
[ 3878.433033] br0: port 2(vnet0) entered disabled state
[ 3878.436993] device vnet0 left promiscuous mode
[ 3878.436995] br0: port 2(vnet0) entered disabled state
[ 3927.505181] pci 0044:01     : [PE# 02] Disabling 64-bit DMA bypass
[ 3927.505188] pci 0044:01     : [PE# 02] Removing DMA window #0
[ 3928.018862] pci 0044:01     : [PE# 02] Setting up window#0 0..3fffffff 
pg=1000
[ 3928.024266] pci 0044:01     : [PE# 02] Setting up window#1 
800000000000000..8000001ffffffff pg=10000
[ 3928.403651] vfio-pci 0044:01:00.0: enabling device (0400 -> 0402)
[ 3928.514975] vfio_ecap_init: 0044:01:00.0 hiding ecap 0x19@0x18c

If I try to hotplug the device via libvirt, I see the vfio listener
registration failure originally noted. If I enable traces in QEMU, I
see where that listener failure is stemming from:

LC_ALL=C PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin 
QEMU_AUDIO_DRV=none /usr/bin/kvm -name 
guest=powerio-le12-ubuntu-17.04,debug-threads=on -S -object 
secret,id=masterKey0,format=raw,file=/var/lib/libvirt/qemu/domain-2-powerio-le12-ubuntu-/master-key.aes
 -machine pseries-2.7,accel=kvm,usb=off,dump-guest-core=off -m 8192 -realtime 
mlock=off -smp 16,sockets=1,cores=2,threads=8 -uuid 
bd3248c2-5686-4e18-b86e-799292bf4ad3 -display none -no-user-config -nodefaults 
-chardev 
socket,id=charmonitor,path=/var/lib/libvirt/qemu/domain-2-powerio-le12-ubuntu-/monitor.sock,server,nowait
 -mon chardev=charmonitor,id=monitor,mode=control -rtc base=utc -no-shutdown 
-boot strict=on -device pci-ohci,id=usb,bus=pci.0,addr=0x2 -device 
spapr-vscsi,id=scsi0,reg=0x2000 -drive 
file=/var/lib/libvirt/images/powerio-le12-ubuntu-17.04.qcow2,format=qcow2,if=none,id=drive-virtio-disk0
 -device 
virtio-blk-pci,scsi=off,bus=pci.0,addr=0x3,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1
 -drive if=none,id=drive-scsi0-0-0-0,readonly=on -device 
scsi-cd,bus=scsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0-0-0-0,id=scsi0-0-0-0
 -netdev tap,fd=25,id=hostnet0,vhost=on,vhostfd=27 -device 
virtio-net-pci,netdev=hostnet0,id=net0,mac=52:54:00:eb:a9:da,bus=pci.0,addr=0x1 
-chardev pty,id=charserial0 -device 
spapr-vty,chardev=charserial0,reg=0x30000000 -device 
virtio-balloon-pci,id=balloon0,bus=pci.0,addr=0x4 -msg timestamp=on
Domain id=2 is tainted: high-privileges
2017-02-22T23:01:35.080908Z qemu-system-ppc64: -chardev pty,id=charserial0: 
char device redirected to /dev/pts/6 (label charserial0)
[email protected]:vfio_realize  (0044:01:00.0) group 6
[email protected]:vfio_prereg_register va=3ffd2bff0000 size=200000000 
ret=-12
[email protected]:vfio_prereg_listener_region_add_skip 10080000020 - 
1008000003f
[email protected]:vfio_prereg_listener_region_add_skip 10080000040 - 
1008000007f
[email protected]:vfio_prereg_listener_region_add_skip 10080000080 - 
1008000009f
[email protected]:vfio_prereg_listener_region_add_skip 100e0000000 - 
100e000001f
[email protected]:vfio_prereg_listener_region_add_skip 100e0000800 - 
100e0000807
[email protected]:vfio_prereg_listener_region_add_skip 100e0001000 - 
100e00010ff
[email protected]:vfio_prereg_listener_region_add_skip 100e0002000 - 
100e000202f
[email protected]:vfio_prereg_listener_region_add_skip 100e0002800 - 
100e0002807
[email protected]:vfio_prereg_listener_region_add_skip 10120000000 - 
10120000fff
[email protected]:vfio_prereg_listener_region_add_skip 10120001000 - 
10120001fff
[email protected]:vfio_prereg_listener_region_add_skip 10120002000 - 
10120002fff
[email protected]:vfio_prereg_listener_region_add_skip 10120003000 - 
10120402fff
[email protected]:vfio_prereg_listener_region_add_skip 10120800000 - 
10120800fff
[email protected]:vfio_prereg_listener_region_add_skip 10120801000 - 
10120801fff
[email protected]:vfio_prereg_listener_region_add_skip 10120802000 - 
10120802fff
[email protected]:vfio_prereg_listener_region_add_skip 10120803000 - 
10120c02fff
[email protected]:vfio_prereg_listener_region_add_skip 10121000000 - 
10121000fff
[email protected]:vfio_prereg_listener_region_add_skip 10121001000 - 
10121001fff
[email protected]:vfio_prereg_listener_region_add_skip 10121002000 - 
10121002fff
[email protected]:vfio_prereg_listener_region_add_skip 10121003000 - 
10121402fff

vfio_prereg_register's ret=-12 is the errno value set by:

    ret = ioctl(container->fd, VFIO_IOMMU_SPAPR_REGISTER_MEMORY, &reg);

which indicates that VFIO_IOMMU_SPAPR_REGISTER_MEMORY is failing with
"Cannot allocate memory". In the host, I see an apparmor message:

[ 1607.260426] KVM guest htab at c000001e56000000 (order 26), LPID 1
[ 1745.761165] audit: type=1400 audit(1487804633.611:18): apparmor="ALLOWED" 
operation="setrlimit" profile="/usr/sbin/libvirtd" pid=5329 comm="libvirtd" 
rlimit=memlock value=8694792192
[ 1745.763764] pci 0044:01     : [PE# 02] Disabling 64-bit DMA bypass
[ 1745.763771] pci 0044:01     : [PE# 02] Removing DMA window #0
[ 1745.763864] pci 0044:01     : [PE# 02] Removing DMA window #0
[ 1745.763867] pci 0044:01     : [PE# 02] Removing DMA window #1
[ 1745.767676] pci 0044:01     : [PE# 02] Setting up window#0 0..7fffffff 
pg=1000
[ 1745.767679] pci 0044:01     : [PE# 02] Enabling 64-bit DMA bypass
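
The numbers in the vfio_prereg_register trace line and in the setrlimit audit
line can be put side by side. The sketch below only decodes values copied from
the logs above. It assumes the trace prints size= as hexadecimal bytes, which
is consistent with the guest being started with -m 8192.

```python
import errno
import os

# Values copied from the trace and audit lines above.
ret = -12                        # ret= reported by vfio_prereg_register
size = int("200000000", 16)      # size= from the same line (assumed hex bytes)
requested_memlock = 8694792192   # value= from the setrlimit audit line

errno_name = errno.errorcode[-ret]                     # symbolic errno name
size_gib = size / 2**30                                # region to pre-register
headroom_mib = (requested_memlock - size) / 2**20      # extra room libvirtd asked for

print(f"ret={ret}: {errno_name} ({os.strerror(-ret)})")
print(f"region to pre-register: {size_gib:g} GiB")
print(f"memlock limit requested by libvirtd is {headroom_mib:g} MiB above that")
```

So the region being registered is the whole of guest RAM, and the memlock limit
libvirtd tried to set is sized to cover it with some headroom. If that limit is
not actually raised for the QEMU process, a failure with this errno is what one
would expect to see.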

Originally these were "DENIED" errors, but in comment #10 I noted I'd
worked around that via:

sudo aa-complain /usr/sbin/libvirtd
sudo aa-complain 
/etc/apparmor.d/libvirt/libvirt-????????-????-????-????-????????????

as noted in https://bugzilla.linux.ibm.com/show_bug.cgi?id=146192

But either that workaround is insufficient, or there's some other issue
relating to libvirt privilege levels, given that QEMU doesn't have any
issues when run directly as root.


Can you try again now? I was using the system over the weekend and the card 
was dead, and the guest was also doing PCI passthrough of the card. I have 
taken the card out of the guest XML and I can recreate the problem again:
virsh attach-device powerio-le12-ubuntu-17.04 ./add_hydepark.xml --live
error: Failed to attach device from ./add_hydepark.xml
error: internal error: unable to execute QEMU command 'device_add': vfio error: 
0040:01:00.0: failed to setup container for group 5: RAM memory listener 
initialization failed for container

This is caused by the memlock hard limit that libvirt sets. Upstream
2.5.0 doesn't have the problem.

Libvirt starts with a certain value for max memlock and adjusts it during 
the hotplug. For my guest, which has
<memory unit='KiB'>16777216</memory>
upstream 2.5.0 correctly adjusts the limit on hotplug to
Max locked memory         17368612864          17368612864          bytes
whereas the Ubuntu libvirt does not.

This can be worked around by hard-coding the limits with the following 
element in the XML of the guest powerio-le14-ubuntu-17.04:
  <memtune>
    <hard_limit unit='KiB'>16961536</hard_limit>
    <soft_limit unit='KiB'>16961536</soft_limit>
  </memtune>
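
As a sanity check on the numbers above, the sketch below converts the memtune
value to bytes and compares it with the "Max locked memory" row quoted earlier.
It assumes that row has the layout of /proc/<pid>/limits (name, soft limit,
hard limit, units); parse_memlock is a hypothetical helper written for this
note.

```python
KIB = 1024

# Values copied from the comment above.
guest_memory_kib = 16777216   # <memory unit='KiB'>
hard_limit_kib = 16961536     # <hard_limit unit='KiB'> used in the workaround
limits_line = (
    "Max locked memory         17368612864          17368612864          bytes"
)


def parse_memlock(line):
    """Return (soft, hard) in bytes from a 'Max locked memory' row.

    A value of 'unlimited' is returned as None.
    """
    fields = line.split()
    if fields[:3] != ["Max", "locked", "memory"]:
        raise ValueError("not a 'Max locked memory' row")

    def to_bytes(value):
        return None if value == "unlimited" else int(value)

    return to_bytes(fields[3]), to_bytes(fields[4])


soft, hard = parse_memlock(limits_line)
matches_memtune = hard == hard_limit_kib * KIB
overhead_mib = (hard_limit_kib - guest_memory_kib) / 1024

print(f"hard limit matches memtune value: {matches_memtune}")
print(f"limit exceeds guest memory by {overhead_mib:g} MiB")
```

In other words, the hard-coded memtune value is the same limit that upstream
sets on hotplug, expressed in KiB instead of bytes.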

I am trying to figure out which patch might be missing from the Ubuntu libvirt.

I went through the code and figured the required patches are all there.
The package apparmor-profiles was missing and I installed that.

I had to add #include <abstractions/libvirt-qemu> to
/etc/apparmor.d/usr.bin.libvirt and add /dev/vfio/vfio rw, to
/etc/apparmor.d/abstractions/libvirt-qemu to get the hotplug
working.

I did the above three things together to get it working, and I am not
sure which of them actually fixed it (most likely including libvirt-qemu).
AppArmor keeps the profiles in a cache, and reinstalling
libvirt-daemon-system (which provides /etc/apparmor.d/usr.bin.libvirt)
didn't reinstall the file(!!).

AppArmor seems to keep the profiles in a cache somewhere, and
reloading does not help. Everything is working fine now, which makes it
hard to say exactly which of the two edits fixed it, or whether
installing apparmor-profiles did the trick.

Carol, let me know if you are planning to re-image sometime so we can
see exactly which of the 3 helps get rid of the problem.

Would it be sufficient to just document this issue?

For now maybe we can document the steps.

All of the steps are unavoidable except step 3 (add /dev/vfio/vfio rw to
abstractions/libvirt-qemu). Step 3 can be avoided if we change the
default libvirt-qemu file shipped by the distro.

** Affects: qemu (Ubuntu)
     Importance: Undecided
     Assignee: Taco Screen team (taco-screen-team)
         Status: New


** Tags: architecture-ppc64le bugnameltc-151486 severity-high 
targetmilestone-inin1704

** Tags added: architecture-ppc64le bugnameltc-151486 severity-high
targetmilestone-inin1704

** Changed in: ubuntu
     Assignee: (unassigned) => Taco Screen team (taco-screen-team)

** Package changed: ubuntu => qemu (Ubuntu)

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1678322

Title:
  Ubuntu 17.04 KVM: Can not do hotplug attach
