Hi Michael,
I have some user reports of a Broadcom 57412 card intermittently
hanging and dropping the link.
The problem has been observed on a Dell server with an Ubuntu 4.13
kernel (bnxt_en version 1.7.0) and with an Ubuntu 4.15 kernel (bnxt_en
version 1.8.0). The hang seems to occur while mounting an XFS volume
over iSCSI, although running blkid on the partition succeeds, and a
different volume seems to be involved each time.
I've included an excerpt from the kernel log below.
It seems that other people have reported this with a RHEL7 kernel. I
don't know which driver version that kernel would be running or what
workload they were running, but the warnings and messages look the same:
https://www.dell.com/community/PowerEdge-Hardware-General/Critical-network-bnxt-en-module-crashes-on-14G-servers/td-p/6031769/highlight/true
The forum poster reported that disconnecting the card from power for 5
minutes was sufficient to get things working again, and I have asked our
user to test that.
Is this a known issue?
Regards,
Daniel
[ 2662.526151] scsi host14: iSCSI Initiator over TCP/IP
[ 2662.538110] scsi host15: iSCSI Initiator over TCP/IP
[ 2662.547350] scsi host16: iSCSI Initiator over TCP/IP
[ 2662.554660] scsi host17: iSCSI Initiator over TCP/IP
[ 2662.813860] scsi 15:0:0:1: Direct-Access PURE FlashArray PQ: 0 ANSI: 6
[ 2662.813888] scsi 14:0:0:1: Direct-Access PURE FlashArray PQ: 0 ANSI: 6
[ 2662.813972] scsi 16:0:0:1: Direct-Access PURE FlashArray PQ: 0 ANSI: 6
[ 2662.814322] sd 15:0:0:1: Attached scsi generic sg1 type 0
[ 2662.814553] sd 14:0:0:1: Attached scsi generic sg2 type 0
[ 2662.814554] scsi 17:0:0:1: Direct-Access PURE FlashArray PQ: 0 ANSI: 6
[ 2662.814612] sd 16:0:0:1: Attached scsi generic sg3 type 0
[ 2662.815081] sd 15:0:0:1: [sdb] 10737418240 512-byte logical blocks: (5.50 TB/5.00 TiB)
[ 2662.815195] sd 15:0:0:1: [sdb] Write Protect is off
[ 2662.815197] sd 15:0:0:1: [sdb] Mode Sense: 43 00 00 08
[ 2662.815229] sd 14:0:0:1: [sdc] 10737418240 512-byte logical blocks: (5.50 TB/5.00 TiB)
[ 2662.815292] sd 17:0:0:1: Attached scsi generic sg4 type 0
[ 2662.815342] sd 14:0:0:1: [sdc] Write Protect is off
[ 2662.815343] sd 14:0:0:1: [sdc] Mode Sense: 43 00 00 08
[ 2662.815419] sd 15:0:0:1: [sdb] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
[ 2662.815447] sd 16:0:0:1: [sdd] 10737418240 512-byte logical blocks: (5.50 TB/5.00 TiB)
[ 2662.815544] sd 14:0:0:1: [sdc] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
[ 2662.815614] sd 16:0:0:1: [sdd] Write Protect is off
[ 2662.815615] sd 16:0:0:1: [sdd] Mode Sense: 43 00 00 08
[ 2662.815882] sd 16:0:0:1: [sdd] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
[ 2662.816188] sd 17:0:0:1: [sde] 10737418240 512-byte logical blocks: (5.50 TB/5.00 TiB)
[ 2662.816298] sd 17:0:0:1: [sde] Write Protect is off
[ 2662.816300] sd 17:0:0:1: [sde] Mode Sense: 43 00 00 08
[ 2662.816502] sd 17:0:0:1: [sde] Write cache: disabled, read cache: enabled, doesn't support DPO or FUA
[ 2662.820080] sd 15:0:0:1: [sdb] Attached SCSI disk
[ 2662.820594] sd 14:0:0:1: [sdc] Attached SCSI disk
[ 2662.820995] sd 17:0:0:1: [sde] Attached SCSI disk
[ 2662.821176] sd 16:0:0:1: [sdd] Attached SCSI disk
[ 2662.913642] device-mapper: multipath round-robin: version 1.2.0 loaded
[ 2663.954001] XFS (dm-2): Mounting V5 Filesystem
[ 2673.186083] connection3:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4295558209, last ping 4295559460, now 4295560768
[ 2673.186135] connection3:0: detected conn error (1022)
[ 2673.186137] connection2:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4295558209, last ping 4295559460, now 4295560768
[ 2673.186168] connection2:0: detected conn error (1022)
[ 2673.186170] connection1:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4295558209, last ping 4295559460, now 4295560768
[ 2673.186211] connection1:0: detected conn error (1022)
[ 2674.209870] connection4:0: ping timeout of 5 secs expired, recv timeout 5, last rx 4295558463, last ping 4295559720, now 4295561024
[ 2674.209924] connection4:0: detected conn error (1022)
[ 2678.560630] session1: session recovery timed out after 5 secs
[ 2678.560641] session2: session recovery timed out after 5 secs
[ 2678.560647] session3: session recovery timed out after 5 secs
[ 2678.951453] device-mapper: multipath: Failing path 8:32.
[ 2678.951509] device-mapper: multipath: Failing path 8:48.
[ 2678.951548] device-mapper: multipath: Failing path 8:16.
[ 2679.584302] session4: session recovery timed out after 5 secs
[ 2679.584313] sd 17:0:0:1: rejecting I/O to offline device
[ 2679.584356] sd 17:0:0:1: [sde] killing request
[ 2679.584362] sd 17:0:0:1: rejecting I/O to offline device
[ 2679.584392] sd 17:0:0:1: [sde] killing request
[ 2679.584401] sd 17:0:0:1: [sde] FAILED Result: hostbyte=DID_NO_CONNECT
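In case it helps with triage, here is a minimal Python sketch for working
out how long each iSCSI connection had been silent when its ping timeout
fired. It assumes the "last rx", "last ping" and "now" values in the
messages above are jiffies and that the kernel was built with
CONFIG_HZ=250; neither has been confirmed on the affected machine. The
function name ping_timeout_ages is made up for illustration.

```python
import re

# Assumption: CONFIG_HZ=250. Check the running kernel's config before
# trusting the converted values.
HZ = 250

# Matches the iSCSI "ping timeout" message (with the wrapped line rejoined).
PING_RE = re.compile(
    r"\[\s*(?P<ts>\d+\.\d+)\]\s+(?P<conn>connection\d+:\d+): ping timeout"
    r".*?last rx (?P<rx>\d+), last ping (?P<ping>\d+), now (?P<now>\d+)"
)


def ping_timeout_ages(line, hz=HZ):
    """Return seconds since last receive and last ping, or None if no match."""
    m = PING_RE.search(line)
    if m is None:
        return None
    now = int(m["now"])
    return {
        "conn": m["conn"],
        "timestamp": float(m["ts"]),
        # Jiffies deltas converted to seconds under the HZ assumption.
        "secs_since_rx": (now - int(m["rx"])) / hz,
        "secs_since_ping": (now - int(m["ping"])) / hz,
    }


sample = (
    "[ 2673.186083] connection3:0: ping timeout of 5 secs expired, "
    "recv timeout 5, last rx 4295558209, last ping 4295559460, "
    "now 4295560768"
)
info = ping_timeout_ages(sample)
print(info)
```

If the HZ assumption holds, connection3:0 last received data roughly 10.2
seconds before the timeout was logged, which would place the last receive
about a second before the XFS mount message.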