Public bug reported:

---Problem Description---
WARNING: at /build/linux-aWXT0l/linux-4.4.0/drivers/pci/pci.c:1595 [travis3EN]
 
---uname output---
Linux ltciofvtr-s822l2-lp3 4.4.0-4-generic #19-Ubuntu SMP Fri Feb 5 17:36:21 
UTC 2016 ppc64le ppc64le ppc64le GNU/Linux
 
Machine Type = s822l 
 
---Steps to Reproduce---
Triggering EEH causes warning messages in syslog.
Note: it's just the warning messages; the card recovers after EEH.

1. From the peer, generate some network load:
linux-xqxs:~ # ping -f 22.22.22.22

2. From the pKVM host, inject an EEH error for the travis3EN card:
[root@ltciofvtr-s822l2-lp1 ~]# echo 0x8000000000000000 > /sys/kernel/debug/powerpc/PCI0003/err_injct_inboundA; sleep 1; echo 0x0 > /sys/kernel/debug/powerpc/PCI0003/err_injct_inboundA

3. In the client's syslog you can see the warning message "WARNING: at
/build/linux-aWXT0l/linux-4.4.0/drivers/pci/pci.c:1595":

[  940.382507] EEH: Frozen PHB#0-PE#1 detected
[  940.382594] EEH: PE location: N/A, PHB location: N/A
[  940.382828] mlx4_core 0000:00:04.0: mlx4_pci_err_detected was called
[  940.382891] mlx4_core 0000:00:04.0: device is going to be reset
[  940.382953] mlx4_core 0000:00:04.0: device was reset successfully
[  940.383014] mlx4_en 0000:00:04.0: Internal error detected, restarting device
Feb 19 02:18:23 ltciofvtr-s822l2-lp3 kernel: [  940.382507] EEH: Frozen 
PHB#0-PE#1 detected
Feb 19 02:18:23 ltciofvtr-s822l2-lp3 kernel: [  940.382594] EEH: PE location: 
N/A, PHB location: N/A
Feb 19 02:18:23 ltciofvtr-s822l2-lp3 kernel: [  940.382647] CPU: 1 PID: 176 
Comm: kworker/u16:2 Not tainted 4.4.0-4-generic #19-Ubuntu
Feb 19 02:18:23 ltciofvtr-s822l2-lp3 kernel: [  940.382671] Workqueue: mlx4_en 
mlx4_en_do_get_stats [mlx4_en]
Feb 19 02:18:23 ltciofvtr-s822l2-lp3 kernel: [  940.382673] Call Trace:
Feb 19 02:18:23 ltciofvtr-s822l2-lp3 kernel: [  940.382714] [c00000000487b7c0] 
[c000000000ad8aa0] dump_stack+0x90/0xbc (unreliable)
Feb 19 02:18:23 ltciofvtr-s822l2-lp3 kernel: [  940.382725] [c00000000487b7f0] 
[c0000000000378f4] eeh_dev_check_failure+0x534/0x580
Feb 19 02:18:23 ltciofvtr-s822l2-lp3 kernel: [  940.382728] [c00000000487b890] 
[c0000000000379c4] eeh_check_failure+0x84/0xd0
Feb 19 02:18:23 ltciofvtr-s822l2-lp3 kernel: [  940.382743] [c00000000487b8d0] 
[d000000002112fc0] cmd_pending+0xb0/0xe0 [mlx4_core]
Feb 19 02:18:23 ltciofvtr-s822l2-lp3 kernel: [  940.382749] [c00000000487b900] 
[d0000000021130b0] mlx4_cmd_post+0xc0/0x250 [mlx4_core]
Feb 19 02:18:23 ltciofvtr-s822l2-lp3 kernel: [  940.382756] [c00000000487b9b0] 
[d00000000211592c] __mlx4_cmd+0x1dc/0x9b0 [mlx4_core]
Feb 19 02:18:23 ltciofvtr-s822l2-lp3 kernel: [  940.382766] [c00000000487ba70] 
[d0000000024eb030] mlx4_en_DUMP_ETH_STATS+0xc0/0x830 [mlx4_en]
Feb 19 02:18:23 ltciofvtr-s822l2-lp3 kernel: [  940.382770] [c00000000487bb70] 
[d0000000024ef150] mlx4_en_do_get_stats+0x160/0x340 [mlx4_en]
Feb 19 02:18:23 ltciofvtr-s822l2-lp3 kernel: [  940.382780] [c00000000487bc50] 
[c0000000000dc920] process_one_work+0x1e0/0x560
Feb 19 02:18:23 ltciofvtr-s822l2-lp3 kernel: [  940.382783] [c00000000487bce0] 
[c0000000000dce34] worker_thread+0x194/0x680
Feb 19 02:18:23 ltciofvtr-s822l2-lp3 kernel: [  940.382785] [c00000000487bd80] 
[c0000000000e58d0] kthread+0x110/0x130
Feb 19 02:18:23 ltciofvtr-s822l2-lp3 kernel: [  940.382788] [c00000000487be30] 
[c000000000009538] ret_from_kernel_thread+0x5c/0xa4
Feb 19 02:18:23 ltciofvtr-s822l2-lp3 kernel: [  940.382814] mlx4_core 
0000:00:04.0: Could not post command 0x49: ret=-5, in_param=0x0, in_mod=0x1, 
op_mod=0x0
Feb 19 02:18:23 ltciofvtr-s822l2-lp3 kernel: [  940.382821] EEH: Detected PCI 
bus error on PHB#0-PE#1
Feb 19 02:18:23 ltciofvtr-s822l2-lp3 kernel: [  940.382823] EEH: This PCI 
device has failed 1 times in the last hour
Feb 19 02:18:23 ltciofvtr-s822l2-lp3 kernel: [  940.382824] EEH: Notify device 
drivers to shutdown
Feb 19 02:18:23 ltciofvtr-s822l2-lp3 kernel: [  940.382828] mlx4_core 
0000:00:04.0: mlx4_pci_err_detected was called
Feb 19 02:18:23 ltciofvtr-s822l2-lp3 kernel: [  940.382891] mlx4_core 
0000:00:04.0: device is going to be reset
Feb 19 02:18:23 ltciofvtr-s822l2-lp3 kernel: [  940.382953] mlx4_core 
0000:00:04.0: device was reset successfully
Feb 19 02:18:23 ltciofvtr-s822l2-lp3 kernel: [  940.383014] mlx4_en 
0000:00:04.0: Internal error detected, restarting device
Feb 19 02:18:23 ltciofvtr-s822l2-lp3 kernel: [  940.383320] mlx4_en: enp0s4: 
Close port called
Feb 19 02:18:23 ltciofvtr-s822l2-lp3 systemd[1]: Starting Cleanup of Temporary 
Directories...
Feb 19 02:18:23 ltciofvtr-s822l2-lp3 systemd-tmpfiles[2473]: 
[/usr/lib/tmpfiles.d/var.conf:14] Duplicate line for path "/var/log", ignoring.
Feb 19 02:18:23 ltciofvtr-s822l2-lp3 systemd[1]: Started Cleanup of Temporary 
Directories.
Feb 19 02:18:23 ltciofvtr-s822l2-lp3 kernel: [  940.801690] mlx4_en 
0000:00:04.0: removed PHC
Feb 19 02:18:23 ltciofvtr-s822l2-lp3 kernel: [  941.079593] EEH: Collect 
temporary log
Feb 19 02:18:23 ltciofvtr-s822l2-lp3 kernel: [  941.079631] eeh_pci_enable: 
Unexpected state change 2 on PHB#0-PE#1, err=-3
Feb 19 02:18:23 ltciofvtr-s822l2-lp3 kernel: [  941.081329] EEH: of 
node=0000:00:04:0
Feb 19 02:18:23 ltciofvtr-s822l2-lp3 kernel: [  941.081348] EEH: PCI 
device/vendor: 100315b3
Feb 19 02:18:23 ltciofvtr-s822l2-lp3 kernel: [  941.081582] EEH: PCI cmd/status 
register: 00100142
Feb 19 02:18:23 ltciofvtr-s822l2-lp3 kernel: [  941.081584] EEH: PCI-E 
capabilities and status follow:
Feb 19 02:18:24 ltciofvtr-s822l2-lp3 kernel: [  941.081725] EEH: PCI-E 00: 
0002c010 11d08e02 0020202e 0843f483 
Feb 19 02:18:24 ltciofvtr-s822l2-lp3 kernel: [  941.081849] EEH: PCI-E 10: 
10830000 00000000 00000000 00000000 
Feb 19 02:18:24 ltciofvtr-s822l2-lp3 kernel: [  941.081851] EEH: PCI-E 20: 
00000000 
Feb 19 02:18:24 ltciofvtr-s822l2-lp3 kernel: [  941.081886] EEH: Reset without 
hotplug activity
Feb 19 02:18:24 ltciofvtr-s822l2-lp3 kernel: [  941.081935] mlx4_core 
0000:00:04.0: mlx4_remove_one: interface is down
Feb 19 02:18:24 ltciofvtr-s822l2-lp3 kernel: [  941.082003] mlx4_core 
0000:00:04.0: disabling already-disabled device
Feb 19 02:18:24 ltciofvtr-s822l2-lp3 kernel: [  941.082046] ------------[ cut 
here ]------------
Feb 19 02:18:24 ltciofvtr-s822l2-lp3 kernel: [  941.082049] WARNING: at 
/build/linux-aWXT0l/linux-4.4.0/drivers/pci/pci.c:1595
Feb 19 02:18:24 ltciofvtr-s822l2-lp3 kernel: [  941.082051] Modules linked in: 
ib_ipoib mlx5_ib mlx5_core rdma_ucm rdma_cm iw_cm ib_umad ib_ucm ib_cm ib_sa 
ib_mad ib_uverbs ib_core ib_addr pseries_rng rtc_generic nfsd auth_rpcgss 
nfs_acl lockd grace sunrpc autofs4 mlx4_en vxlan ip6_udp_tunnel udp_tunnel 
ibmvscsi mlx4_core
Feb 19 02:18:24 ltciofvtr-s822l2-lp3 kernel: [  941.082257] CPU: 1 PID: 49 
Comm: eehd Not tainted 4.4.0-4-generic #19-Ubuntu
Feb 19 02:18:24 ltciofvtr-s822l2-lp3 kernel: [  941.082260] task: 
c0000003f91e9370 ti: c0000003f9060000 task.ti: c0000003f9060000
Feb 19 02:18:24 ltciofvtr-s822l2-lp3 kernel: [  941.082263] NIP: 
c0000000005cdf0c LR: c0000000005cdf08 CTR: c00000000057ae00
Feb 19 02:18:24 ltciofvtr-s822l2-lp3 kernel: [  941.082265] REGS: 
c0000003f9063560 TRAP: 0700   Not tainted  (4.4.0-4-generic)
Feb 19 02:18:24 ltciofvtr-s822l2-lp3 kernel: [  941.082267] MSR: 
8000000100029033 <SF,EE,ME,IR,DR,RI,LE>  CR: 28002422  XER: 20000000
Feb 19 02:18:24 ltciofvtr-s822l2-lp3 kernel: [  941.082275] CFAR: 
c000000000ad578c SOFTE: 1 
Feb 19 02:18:24 ltciofvtr-s822l2-lp3 kernel: [  941.082275] GPR00: 
c0000000005cdf08 c0000003f90637e0 c000000001593900 0000000000000039 
Feb 19 02:18:24 ltciofvtr-s822l2-lp3 kernel: [  941.082275] GPR04: 
0000000000000001 0000000000000000 0000000000000048 0000000000000175 
Feb 19 02:18:24 ltciofvtr-s822l2-lp3 kernel: [  941.082275] GPR08: 
c000000001733900 0000000000000000 0000000000000000 0000000000000005 
Feb 19 02:18:24 ltciofvtr-s822l2-lp3 kernel: [  941.082275] GPR12: 
0000000028002428 c00000000fb40980 c0000000000e57c8 c0000003fe165980 
Feb 19 02:18:24 ltciofvtr-s822l2-lp3 kernel: [  941.082275] GPR16: 
0000000000000000 0000000000000000 0000000000000000 0000000000000000 
Feb 19 02:18:24 ltciofvtr-s822l2-lp3 kernel: [  941.082275] GPR20: 
0000000000000000 0000000000000000 0000000000000000 c000000000d1f500 
Feb 19 02:18:24 ltciofvtr-s822l2-lp3 kernel: [  941.082275] GPR24: 
c000000000d1f4d8 0000000000000100 c0000003fe058580 0000000000000000 
Feb 19 02:18:24 ltciofvtr-s822l2-lp3 kernel: [  941.082275] GPR28: 
c0000003fe144000 c000000004fd0300 c0000003fe144758 c0000003fe144000 
Feb 19 02:18:24 ltciofvtr-s822l2-lp3 kernel: [  941.082328] NIP 
[c0000000005cdf0c] pci_disable_device+0x11c/0x140
Feb 19 02:18:24 ltciofvtr-s822l2-lp3 kernel: [  941.082332] LR 
[c0000000005cdf08] pci_disable_device+0x118/0x140
Feb 19 02:18:24 ltciofvtr-s822l2-lp3 kernel: [  941.082333] Call Trace:
Feb 19 02:18:24 ltciofvtr-s822l2-lp3 kernel: [  941.082337] [c0000003f90637e0] 
[c0000000005cdf08] pci_disable_device+0x118/0x140 (unreliable)
Feb 19 02:18:24 ltciofvtr-s822l2-lp3 kernel: [  941.082350] [c0000003f9063850] 
[d00000000212b0d4] mlx4_remove_one+0xc4/0x250 [mlx4_core]
Feb 19 02:18:24 ltciofvtr-s822l2-lp3 kernel: [  941.082353] [c0000003f90638e0] 
[c0000000005d2fc0] pci_device_remove+0x70/0x110
Feb 19 02:18:24 ltciofvtr-s822l2-lp3 kernel: [  941.082358] [c0000003f9063920] 
[c0000000006be740] __device_release_driver+0xc0/0x190
Feb 19 02:18:24 ltciofvtr-s822l2-lp3 kernel: [  941.082362] [c0000003f9063950] 
[c0000000006be850] device_release_driver+0x40/0x70
Feb 19 02:18:24 ltciofvtr-s822l2-lp3 kernel: [  941.082365] [c0000003f9063980] 
[c0000000005c7e30] pci_stop_bus_device+0xf0/0x110
Feb 19 02:18:24 ltciofvtr-s822l2-lp3 kernel: [  941.082368] [c0000003f90639c0] 
[c0000000005c7fbc] pci_stop_and_remove_bus_device+0x2c/0x50
Feb 19 02:18:24 ltciofvtr-s822l2-lp3 kernel: [  941.082372] [c0000003f90639f0] 
[c00000000003c100] eeh_rmv_device+0x140/0x1a0
Feb 19 02:18:24 ltciofvtr-s822l2-lp3 kernel: [  941.082375] [c0000003f9063a70] 
[c00000000003a294] eeh_pe_dev_traverse+0x94/0x160
Feb 19 02:18:24 ltciofvtr-s822l2-lp3 kernel: [  941.082380] [c0000003f9063b00] 
[c000000000ad39d0] eeh_reset_device+0xbc/0x218
Feb 19 02:18:24 ltciofvtr-s822l2-lp3 kernel: [  941.082383] [c0000003f9063ba0] 
[c00000000003c454] eeh_handle_normal_event+0x2f4/0x430
Feb 19 02:18:24 ltciofvtr-s822l2-lp3 kernel: [  941.082386] [c0000003f9063c20] 
[c00000000003c764] eeh_handle_event+0x54/0x360
Feb 19 02:18:24 ltciofvtr-s822l2-lp3 kernel: [  941.082389] [c0000003f9063cd0] 
[c00000000003cb8c] eeh_event_handler+0x11c/0x1e0
Feb 19 02:18:24 ltciofvtr-s822l2-lp3 kernel: [  941.082393] [c0000003f9063d80] 
[c0000000000e58d0] kthread+0x110/0x130
Feb 19 02:18:24 ltciofvtr-s822l2-lp3 kernel: [  941.082397] [c0000003f9063e30] 
[c000000000009538] ret_from_kernel_thread+0x5c/0xa4
Feb 19 02:18:24 ltciofvtr-s822l2-lp3 kernel: [  941.082399] Instruction dump:
Feb 19 02:18:24 ltciofvtr-s822l2-lp3 kernel: [  941.082401] 409eff64 387f0098 
480eab45 60000000 e8bf00e8 2fa50000 7c641b78 419e0028 
Feb 19 02:18:24 ltciofvtr-s822l2-lp3 kernel: [  941.082407] 3c62ff7f 38633aa8 
48507821 60000000 <0fe00000> 39200001 3d42fff8 992a1acb 
Feb 19 02:18:24 ltciofvtr-s822l2-lp3 kernel: [  941.082413] ---[ end trace 
1cce98b956e06602 ]---
Feb 19 02:18:24 ltciofvtr-s822l2-lp3 kernel: [  941.082431] iommu: Removing 
device 0000:00:04.0 from group 0
Feb 19 02:18:28 ltciofvtr-s822l2-lp3 kernel: [  945.197931] EEH: Sleep 5s ahead 
of partial hotplug
Feb 19 02:18:33 ltciofvtr-s822l2-lp3 kernel: [  950.204919] iommu: Adding 
device 0000:00:04.0 to group 0
Feb 19 02:18:33 ltciofvtr-s822l2-lp3 kernel: [  950.205129] mlx4_core: 
Initializing 0000:00:04.0
Feb 19 02:18:33 ltciofvtr-s822l2-lp3 kernel: [  950.207395] mlx4_core 
0000:00:04.0: Using 64-bit direct DMA at offset 800000000000000
Feb 19 02:18:38 ltciofvtr-s822l2-lp3 kernel: [  955.254212] mlx4_core 
0000:00:04.0: PCIe link speed is 8.0GT/s, device supports 8.0GT/s
Feb 19 02:18:38 ltciofvtr-s822l2-lp3 kernel: [  955.254215] mlx4_core 
0000:00:04.0: PCIe link width is x8, device supports x8
[  955.356803] mlx4_en: 0000:00:04.0: Port 1:   frag:0 - size:1522 prefix:0 
stride:1536
Feb 19 02:18:38 ltciofvtr-s822l2-lp3 kernel: [  955.353773] mlx4_en 
0000:00:04.0: Activating port:1
Feb 19 02:18:38 ltciofvtr-s822l2-lp3 kernel: [  955.356795] mlx4_en: 
0000:00:04.0: Port 1: Using 64 TX rings
Feb 19 02:18:38 ltciofvtr-s822l2-lp3 kernel: [  955.356800] mlx4_en: 
0000:00:04.0: Port 1: Using 8 RX rings
Feb 19 02:18:38 ltciofvtr-s822l2-lp3 kernel: [  955.356803] mlx4_en: 
0000:00:04.0: Port 1:   frag:0 - size:1522 prefix:0 stride:1536
Feb 19 02:18:38 ltciofvtr-s822l2-lp3 kernel: [  955.359817] mlx4_en: 
0000:00:04.0: Port 1: Initializing port
Feb 19 02:18:38 ltciofvtr-s822l2-lp3 kernel: [  955.360278] mlx4_en 
0000:00:04.0: registered PHC clock
Feb 19 02:18:38 ltciofvtr-s822l2-lp3 kernel: [  955.361113] mlx4_en 
0000:00:04.0: Activating port:2
[  955.365352] mlx4_en: 0000:00:04.0: Port 2:   frag:0 - size:1522 prefix:0 
stride:1536
Feb 19 02:18:38 ltciofvtr-s822l2-lp3 kernel: [  955.363940] mlx4_core 
0000:00:04.0 enp0s4: renamed from eth0
Feb 19 02:18:38 ltciofvtr-s822l2-lp3 kernel: [  955.365347] mlx4_en: 
0000:00:04.0: Port 2: Using 64 TX rings
Feb 19 02:18:38 ltciofvtr-s822l2-lp3 kernel: [  955.365350] mlx4_en: 
0000:00:04.0: Port 2: Using 8 RX rings
Feb 19 02:18:38 ltciofvtr-s822l2-lp3 kernel: [  955.365352] mlx4_en: 
0000:00:04.0: Port 2:   frag:0 - size:1522 prefix:0 stride:1536
Feb 19 02:18:38 ltciofvtr-s822l2-lp3 kernel: [  955.380726] mlx4_en: 
0000:00:04.0: Port 2: Initializing port
Feb 19 02:18:38 ltciofvtr-s822l2-lp3 kernel: [  955.386733] EEH: Notify device 
drivers the completion of reset
Feb 19 02:18:38 ltciofvtr-s822l2-lp3 kernel: [  955.386737] EEH: Notify device 
driver to resume
Feb 19 02:18:38 ltciofvtr-s822l2-lp3 kernel: [  955.408991] <mlx4_ib> 
mlx4_ib_add: mlx4_ib: Mellanox ConnectX InfiniBand driver v2.2-1 (Feb 2014)
Feb 19 02:18:38 ltciofvtr-s822l2-lp3 kernel: [  955.410735] mlx4_core 
0000:00:04.0 enp0s4d1: renamed from eth0
Feb 19 02:18:38 ltciofvtr-s822l2-lp3 kernel: [  955.411687] <mlx4_ib> 
mlx4_ib_add: counter index 2 for port 1 allocated 1
Feb 19 02:18:38 ltciofvtr-s822l2-lp3 kernel: [  955.411690] <mlx4_ib> 
mlx4_ib_add: counter index 3 for port 2 allocated 1
Feb 19 02:18:40 ltciofvtr-s822l2-lp3 kernel: [  957.608097] mlx4_en: enp0s4d1: 
Link Up
Feb 19 02:18:40 ltciofvtr-s822l2-lp3 kernel: [  957.662997] mlx4_en: enp0s4: 
Link Up


pKVM syslog:
Feb 19 18:16:47 ltciofvtr-s822l2-lp1 kernel: vfio-pci 0003:0b:00.0: enabling device (0140 -> 0142)
Feb 19 18:20:01 ltciofvtr-s822l2-lp1 systemd: Starting Session 1302 of user root.
Feb 19 18:20:01 ltciofvtr-s822l2-lp1 systemd: Started Session 1302 of user root.
Feb 19 18:20:01 ltciofvtr-s822l2-lp1 systemd: Failed to reset devices.list on /machine.slice: Invalid argument
 

The patches are finally upstream:
https://git.kernel.org/cgit/linux/kernel/git/davem/net-next.git/patch/drivers/net/ethernet/mellanox/mlx4?id=c12833acff62cff83a8b728253e7ebbc1264d75e
From c12833acff62cff83a8b728253e7ebbc1264d75e Mon Sep 17 00:00:00 2001
From: Daniel Jurgens <[email protected]>
Date: Wed, 20 Apr 2016 16:01:15 +0300
Subject: net/mlx4_core: Implement pci_resume callback

https://git.kernel.org/cgit/linux/kernel/git/davem/net-next.git/patch/drivers/net/ethernet/mellanox/mlx4?id=4bfd2e6e53435a214888fd35e230157a38ffc6a0
From 4bfd2e6e53435a214888fd35e230157a38ffc6a0 Mon Sep 17 00:00:00 2001
From: Daniel Jurgens <[email protected]>
Date: Wed, 20 Apr 2016 16:01:16 +0300
Subject: net/mlx4_core: Avoid repeated calls to pci enable/disable
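
To illustrate why repeated disable calls matter: the message "disabling already-disabled device" in the log comes from pci_disable_device() being called when the device's enable count is already zero. The sketch below is a userspace toy model, not the code from the patch above. All names (model_pci_dev, model_driver, guarded_enable, guarded_disable, replay_eeh_sequence) are hypothetical, and it assumes that an earlier disable had already happened in the error-handling path before mlx4_remove_one ran; the log itself only shows the second call. It shows one way a driver could track its own enabled/disabled state so that the second call is skipped.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a PCI device: only the enable reference count. */
struct model_pci_dev {
    int enable_cnt;
    int warnings;   /* times the "already-disabled" check fired */
};

/* Hypothetical per-driver state, remembering what the driver last did. */
enum model_pci_status { MODEL_PCI_DISABLED, MODEL_PCI_ENABLED };

struct model_driver {
    struct model_pci_dev *pdev;
    enum model_pci_status pci_status;
};

static void model_pci_enable_device(struct model_pci_dev *pdev)
{
    pdev->enable_cnt++;
}

/* Simplified model of the check that produces the warning: disabling
 * a device whose count is already <= 0 is flagged. The real kernel
 * uses an atomic counter and warns only once per call site. */
static void model_pci_disable_device(struct model_pci_dev *pdev)
{
    if (pdev->enable_cnt <= 0)
        pdev->warnings++;
    pdev->enable_cnt--;
}

/* Enable only if this driver has not already enabled the device. */
static void guarded_enable(struct model_driver *drv)
{
    assert(drv && drv->pdev);
    if (drv->pci_status == MODEL_PCI_DISABLED) {
        model_pci_enable_device(drv->pdev);
        drv->pci_status = MODEL_PCI_ENABLED;
    }
}

/* Disable only if this driver currently holds the device enabled. */
static void guarded_disable(struct model_driver *drv)
{
    assert(drv && drv->pdev);
    if (drv->pci_status == MODEL_PCI_ENABLED) {
        model_pci_disable_device(drv->pdev);
        drv->pci_status = MODEL_PCI_DISABLED;
    }
}

/* Replays the assumed sequence: probe enables, the error path disables,
 * then remove disables again. Returns how many warnings were raised. */
int replay_eeh_sequence(bool guarded)
{
    struct model_pci_dev pdev = { 0, 0 };
    struct model_driver drv = { &pdev, MODEL_PCI_DISABLED };

    if (guarded) {
        guarded_enable(&drv);            /* probe */
        guarded_disable(&drv);           /* error detected */
        guarded_disable(&drv);           /* remove: skipped */
    } else {
        model_pci_enable_device(&pdev);  /* probe */
        model_pci_disable_device(&pdev); /* error detected */
        model_pci_disable_device(&pdev); /* remove: warns */
    }
    return pdev.warnings;
}
```

Calling replay_eeh_sequence(false) models the behaviour reported in this bug (the second disable trips the check), while replay_eeh_sequence(true) models a driver that remembers its own state and skips the redundant call. A real driver would also need locking around the state field, which this sketch omits.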

** Affects: ubuntu
     Importance: Undecided
     Assignee: Taco Screen team (taco-screen-team)
         Status: New


** Tags: architecture-ppc64le bugnameltc-137553 severity-low 
targetmilestone-inin16041

https://bugs.launchpad.net/bugs/1574697
