Public bug reported:
---Problem Description---
When SRIOV is enabled on a Power system running kernel 4.10.0-26-generic,
the following stack trace appears:
[ 2084.079575] ------------[ cut here ]------------
[ 2084.079583] WARNING: CPU: 120 PID: 734 at /build/linux-TAhFXm/linux-4.10.0/arch/powerpc/platforms/powernv/npu-dma.c:78 pnv_pci_get_npu_dev+0x40/0xb0
[ 2084.079584] Modules linked in: mst_pciconf(OE) mst_pci(OE) xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 xt_tcpudp kvm_hv kvm_pr kvm ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter rdma_ucm(OE) ib_ucm(OE) ib_ipoib(OE) ib_uverbs(OE) ib_umad(OE) mlx5_ib(OE) mlx4_ib(OE) binfmt_misc bridge stp llc ipmi_powernv ipmi_devintf ipmi_msghandler powernv_rng powernv_op_panel uio_pdrv_genirq leds_powernv uio ibmpowernv vmx_crypto sunrpc ib_iser(OE) rdma_cm(OE) iw_cm(OE) ib_cm(OE) ib_core(OE) configfs iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi knem(OE) ip_tables x_tables autofs4 btrfs raid10 raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx
[ 2084.079640] xor raid6_pq libcrc32c raid1 raid0 multipath linear mlx4_en(OE) ses enclosure scsi_transport_sas crc32c_vpmsum tg3 mlx5_core(OE) mlx4_core(OE) ipr devlink mlx_compat(OE)
[ 2084.079658] CPU: 120 PID: 734 Comm: kworker/120:0 Tainted: G W OE 4.10.0-26-generic #30-Ubuntu
[ 2084.079663] Workqueue: events work_for_cpu_fn
[ 2084.079665] task: c000000fee60dc00 task.stack: c000000fee534000
[ 2084.079666] NIP: c00000000009c210 LR: c00000000009d404 CTR: 0000000000000000
[ 2084.079668] REGS: c000000fee537700 TRAP: 0700 Tainted: G W OE (4.10.0-26-generic)
[ 2084.079669] MSR: 900000000282b033 <SF,HV,VEC,VSX,EE,FP,ME,IR,DR,RI,LE>
[ 2084.079677] CR: 42004428 XER: 20000000
[ 2084.079678] CFAR: c00000000009d400 SOFTE: 1
GPR00: c00000000009d404 c000000fee537980 c00000000145d100 0000000000000000
GPR04: 0000000000000000 0000000000000aa6 c000001fff700000 0000000000049188
GPR08: 0000000000000007 0000000000000001 0000000000000001 0000000000000000
GPR12: 0000000000002200 c00000000fbc3800 c00000000010ef48 c000000ff70ec540
GPR16: c000000ffa622c58 c000000ffa622a10 c000000ffa6229a0 0000000000000001
GPR20: 0000000000000000 c000000001318de8 c000000000d700e8 0000000000000001
GPR24: c000000000d6f070 c000000000d6f050 c000000003d02000 c000000003d02098
GPR28: c000000e92680060 0800001fffffffff ffffffffffffffff 0000000000000000
[ 2084.079702] NIP [c00000000009c210] pnv_pci_get_npu_dev+0x40/0xb0
[ 2084.079704] LR [c00000000009d404] pnv_npu_try_dma_set_bypass+0x144/0x250
[ 2084.079705] Call Trace:
[ 2084.079708] [c000000fee5379b0] [c00000000009d404] pnv_npu_try_dma_set_bypass+0x144/0x250
[ 2084.079710] [c000000fee537a80] [c000000000096c74] pnv_pci_ioda_dma_set_mask+0xa4/0x150
[ 2084.079714] [c000000fee537b00] [c0000000000291a0] dma_set_mask+0x40/0xc0
[ 2084.079728] [c000000fee537b20] [d0000000143531e4] init_one+0x33c/0x6a0 [mlx5_core]
[ 2084.079732] [c000000fee537bd0] [c00000000066ba9c] local_pci_probe+0x6c/0x140
[ 2084.079734] [c000000fee537c60] [c0000000001016b8] work_for_cpu_fn+0x38/0x60
[ 2084.079737] [c000000fee537c90] [c0000000001061a0] process_one_work+0x2b0/0x5a0
[ 2084.079740] [c000000fee537d20] [c000000000106780] worker_thread+0x2f0/0x650
[ 2084.079742] [c000000fee537dc0] [c00000000010f0a4] kthread+0x164/0x1b0
[ 2084.079746] [c000000fee537e30] [c00000000000b4e8] ret_from_kernel_thread+0x5c/0x74
[ 2084.079747] Instruction dump:
[ 2084.079748] 7c0802a6 fbe1fff8 f8010010 f821ffd1 7c690074 7929d182 0b090000 2fa30000
[ 2084.079753] 419e0060 e8630330 7c690074 7929d182 <0b090000> 2fa30000 419e0048 7c852378
[ 2084.079759] ---[ end trace 7bf01a937efd69d8 ]---
This issue was introduced by the following commit:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=4c3b89effc281704d5395282c800c45e453235f6
(Subject: powerpc/powernv: Add sanity checks to pnv_pci_get_{gpu|npu}_dev)
The fix is to apply this commit:
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=377aa6b0efbaa29cfeecd8b9244641217f9544ca
(Subject: powerpc/npu-dma: Remove spurious WARN_ON when a PCI device has no
of_node)
Requesting inclusion of the fix in 17.04 and probably 16.04.3.
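To check whether a given kernel git tree already carries the fix, something
like the following sketch may help. The helper name tree_mentions_commit is
hypothetical. It assumes that backports record the upstream commit ID in their
commit message (a common convention, not something this report confirms) and
falls back to a plain ancestry check for trees that contain the upstream
commit directly.

```shell
#!/bin/sh
# Hypothetical helper, not part of any kernel or Ubuntu tooling.
# $1 = path to a kernel git tree, $2 = upstream commit ID to look for.
# Succeeds if some commit message mentions the ID (assumed backport
# convention) or if the ID itself is an ancestor of HEAD.
tree_mentions_commit() {
    [ -n "$(git -C "$1" log --oneline --grep="$2" -n 1 2>/dev/null)" ] ||
        git -C "$1" merge-base --is-ancestor "$2" HEAD 2>/dev/null
}
```

Example, with an illustrative tree path:
tree_mentions_commit ~/src/linux 377aa6b0efbaa29cfeecd8b9244641217f9544ca && echo "fix present"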
---uname output---
4.10.0-26-generic #30-Ubuntu SMP Tue Jun 27 09:29:34 UTC 2017 ppc64le ppc64le
ppc64le GNU/Linux
---Additional Hardware Info---
Requires a Mellanox card that supports SRIOV.
Machine Type = P8
---Steps to Reproduce---
Enable SRIOV on a Power system with a Mellanox CX4 or CX5 card, for example:
echo 1 > /sys/class/infiniband/mlx5_0/device/sriov_numvfs
Stack trace output (the warning between these two mlx5_core messages is
identical to the trace in the problem description above):
[ 2084.079567] mlx5_core 0004:01:04.0: Using 64-bit DMA iommu bypass
[ 2084.079575] ------------[ cut here ]------------
[...]
[ 2084.079759] ---[ end trace 7bf01a937efd69d8 ]---
[ 2084.080096] mlx5_core 0004:01:04.0: firmware version: 12.20.1010
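The reproduction step and the check for the warning could be scripted roughly
as follows. This is a sketch only: the function names enable_vfs and
count_npu_warnings are made up for illustration, and the default sysfs path is
simply the one used in the step above. The counter looks for a
"cut here" ... "end trace" block that contains both a WARNING line and a NIP
line pointing at pnv_pci_get_npu_dev, as in the trace in this report.

```shell
#!/bin/sh
# Illustrative helpers; the names are hypothetical.

# Write the requested number of VFs to sriov_numvfs.
# $1 = number of VFs, $2 = sysfs device directory (defaults to the path
# from the reproduction step in this report).
enable_vfs() {
    dev=${2:-/sys/class/infiniband/mlx5_0/device}
    echo "$1" > "$dev/sriov_numvfs"
}

# Read kernel log text on stdin and print how many warning blocks have
# their NIP in pnv_pci_get_npu_dev.
count_npu_warnings() {
    awk '
        /\[ cut here \]/       { in_trace = 1; warn = 0; hit = 0 }
        in_trace && /WARNING:/ { warn = 1 }
        in_trace && /NIP \[[0-9a-f]+\] pnv_pci_get_npu_dev/ { hit = 1 }
        /\[ end trace /        { if (in_trace && warn && hit) n++; in_trace = 0 }
        END                    { print n + 0 }
    '
}
```

Run as root on the affected machine, for example:
enable_vfs 1 && dmesg | count_npu_warnings
A non-zero count would indicate that the warning described here was hit.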
** Affects: linux (Ubuntu)
Importance: Undecided
Assignee: Ubuntu on IBM Power Systems Bug Triage (ubuntu-power-triage)
Status: New
** Tags: architecture-ppc64le bugnameltc-156405 severity-high
targetmilestone-inin1704
** Tags added: architecture-ppc64le bugnameltc-156405 severity-high
targetmilestone-inin1704
** Changed in: ubuntu
Assignee: (unassigned) => Ubuntu on IBM Power Systems Bug Triage
(ubuntu-power-triage)
** Package changed: ubuntu => linux (Ubuntu)
https://bugs.launchpad.net/bugs/1702768
Title:
Ubuntu 17.04 KVM: stack trace generated when enabling SRIOV in power