[Bug 1853638] Re: BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet device seems to be dropping data

2020-05-11 Thread Nivedita Singhvi
The issue we have reported is easily avoided by specifying
the primary port to be the active interface of the bond. 

On netplan-using systems:

Add the directive "primary: $interface" (e.g. "primary: p94s0f0")
to the "parameters:" section of the netplan config file.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1853638

Title:
  BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet device seems to be
  dropping data

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1853638/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1853638] Re: BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet device seems to be dropping data

2020-05-11 Thread Nivedita Singhvi
Hello, diarmuid,

Re: original issue report, were you able to resolve your issue?

Please let us know.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1853638

Title:
  BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet device seems to be
  dropping data

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1853638/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1853638] Re: BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet device seems to be dropping data

2020-05-11 Thread Nivedita Singhvi
We are closing this LP bug for now as we aren't able to reproduce 
in-house, and we cannot get access to a live testing repro env 
at this time.  

Here is what we know:

- There seems to be different performance for some tests when
  the NIC is configured with active-backup bonding mode, between
  the case when the active interface is the primary port, and 
  when the active interface is the secondary port. 
  i.e.:

Primary port: enp94s0f0 // when this is the active, works fine
Secondary port: enp94s0f1d1 // when this is the active, more drops

- Switch info: 2 x Fortigate 1024D switches, each machine is connected 
  to both

- NIC info:  root@u072:~# lspci | grep BCM57416
01:00.0 Ethernet controller: Broadcom Inc. and subsidiaries BCM57416 
NetXtreme-E Dual-Media 10G RDMA Ethernet Controller (rev 01)

# ethtool -i enp1s0f0np0
driver: bnxt_en
version: 1.10.0
firmware-version: 214.0.253.1/pkg 21.40.25.31

- Our attempt at a reproducer (initially reported in production env via 
graphical
monitoring):

mtr --no-dns --report --report-cycles 60 --udp -s 1428 $DEST
good system = ~ 0% drops
bad systems = ~ 8% drops

We are not getting NIC stats drops, nor UDP kernel drops, so it's
not clear where the packet is being dropped, whether it's being
dropped silently somewhere (?), or if that's a red herring and
a mtr test issue, and what's seen in production is something else.

If someone can reproduce this, or something similar, or if we manage
to, we will re-open this bug or file a new one.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1853638

Title:
  BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet device seems to be
  dropping data

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1853638/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1853638] Re: BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet device seems to be dropping data

2020-02-14 Thread Edwin Peer
Regarding your question about LLDP and IPv6...

The default Ubuntu 18.04.3 configuration has an IPv6 enabled kernel, but
the interface only has the default link local address configured. I've
seen it do router solicitation on link state changes and periodically
thereafter. I think I recall seeing somewhere above that you were going
to try without IPv6. Have you seen different behavior with IPv6
disabled?

I don't expect to see LLDP because I have the two NICs wired directly to
each other, back to back. I'm not running any LLDP daemon on the host.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1853638

Title:
  BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet device seems to be
  dropping data

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1853638/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1853638] Re: BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet device seems to be dropping data

2020-02-14 Thread Edwin Peer
The tpa_aborts shouldn't be a concern. They merely indicate that a TCP
flow could not be aggregated. That could have a performance impact, of
course, but that should manifest as counted drops somewhere if this were
the case.

Importantly, the tpa_aborts only apply to TCP traffic, but you see the
problem for ICMP and UDP too.

Note, the tpa_aborts also appear to be evident on the primary as active
interface while things are working as expected. A difference in
magnitude tpa_aborts from one test run to another may be a clue about
something else that's happening though, but I'm not sure that we are
comparing apples to apples with respect the ethtool -S dumps posted thus
far (when were they captured relative to the test runs, which interface
was active at the time, etc?).

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1853638

Title:
  BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet device seems to be
  dropping data

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1853638/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1853638] Re: BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet device seems to be dropping data

2020-02-14 Thread Nivedita Singhvi
Edwin,

Do you happen to notice any IPv6 or LLDP or other link-local traffic
on the interfaces? (including backup interface). 

The MTR loss  % is purely a capture of their packets xmitted 
and responses received, so for that UDP MTR test, this is saying
that UDP packets were lost, somewhere. 

The NIC does not have any drops showing via ethtool -S 
stats but I'm hunting down which are the right pair of before/afters.

Other than the tpa_abort counts, there were no errors that I saw.
I can't tell what the tpa_abort means for the frame - is it purely 
a failure only to coalesce, or does it end up dropping packets at
some point in that functionality? I'm assuming not, as whatever the
reason, those would be counted as drops, I hope, and printed in 
the interface stats. 

I'll attach all the stats here once I get them sorted out, I thought
I had a clean diff of before and after from the tester, but after
looking through, I don't think the file I have is from before/after
the mtr test, as there was negligible UDP traffic. I'll try and
get clarification from the reporter. 

Note that when the provision of primary= is used to configure 
which interface is primary, and when the primary port is used
as the active interface for the bond, no problems are seen (and
that works deterministically to set the correct active interface).

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1853638

Title:
  BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet device seems to be
  dropping data

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1853638/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1853638] Re: BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet device seems to be dropping data

2020-02-13 Thread Edwin Peer
I have tried, unsuccessfully, to reproduce this issue internally.
Details of my setup below.

1) I have a pair of Dell R210 servers racked (u072 and u073 below), each
with a BCM57416 installed:

root@u072:~# lspci | grep BCM57416
01:00.0 Ethernet controller: Broadcom Inc. and subsidiaries BCM57416 
NetXtreme-E Dual-Media 10G RDMA Ethernet Controller (rev 01)
01:00.1 Ethernet controller: Broadcom Inc. and subsidiaries BCM57416 
NetXtreme-E Dual-Media 10G RDMA Ethernet Controller (rev 01)

2) I've matched the firmware version to one that Nivedita reported in a
bad system:

root@u072:~# ethtool -i enp1s0f0np0
driver: bnxt_en
version: 1.10.0
firmware-version: 214.0.253.1/pkg 21.40.25.31
expansion-rom-version: 
bus-info: :01:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: no
supports-priv-flags: no

3) Matched Ubuntu release and kernel version:

root@u072:~# lsb_release -dr
Description:Ubuntu 18.04.3 LTS
Release:18.04

root@u072:~# uname -a
Linux u072 5.0.0-37-generic #40~18.04.1-Ubuntu SMP Thu Nov 14 12:06:39 UTC 2019 
x86_64 x86_64 x86_64 GNU/Linux

4) Configured the interface into an active-backup bond:

root@u072:~# cat /proc/net/bonding/bond0
Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)

Bonding Mode: fault-tolerance (active-backup)
Primary Slave: None
Currently Active Slave: enp1s0f1np1
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0

Slave Interface: enp1s0f1np1
MII Status: up
Speed: 1 Mbps
Duplex: full
Link Failure Count: 1
Permanent HW addr: 00:0a:f7:a7:10:61
Slave queue ID: 0

Slave Interface: enp1s0f0np0
MII Status: up
Speed: 1 Mbps
Duplex: full
Link Failure Count: 1
Permanent HW addr: 00:0a:f7:a7:10:60
Slave queue ID: 0

5) Run the provided mtr and netperf test cases with the 1st port
selected as active:

root@u072:~# ip l set enp1s0f1np1 down
root@u072:~# ip l set enp1s0f1np1 up
root@u072:~# cat /proc/net/bonding/bond0 | grep Active
Currently Active Slave: enp1s0f0np0

a) initiated on u072:

root@u072:~# mtr --no-dns --report --report-cycles 60 192.168.1.2
Start: 2020-02-13T20:48:01+
HOST: u072Loss%   Snt   Last   Avg  Best  Wrst StDev
  1.|-- 192.168.1.20.0%600.2   0.2   0.2   0.2   0.0

root@u072:~# netperf -t TCP_RR -H 192.168.1.2 -- -r 1,1
MIGRATED TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 
192.168.1.2 () port 0 AF_INET : demo : first burst 0
Local /Remote
Socket Size   Request  Resp.   Elapsed  Trans.
Send   Recv   Size SizeTime Rate 
bytes  Bytes  bytesbytes   secs.per sec   

16384  131072 11   10.0029040.91   
16384  87380

root@u072:~# netperf -t TCP_RR -H 192.168.1.2 -- -r 64,64
MIGRATED TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 
192.168.1.2 () port 0 AF_INET : demo : first burst 0
Local /Remote
Socket Size   Request  Resp.   Elapsed  Trans.
Send   Recv   Size SizeTime Rate 
bytes  Bytes  bytesbytes   secs.per sec   

16384  131072 64   64  10.0028633.36   
16384  87380 

root@u072:~# netperf -t TCP_RR -H 192.168.1.2 -- -r 128,8192
MIGRATED TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 
192.168.1.2 () port 0 AF_INET : demo : first burst 0
Local /Remote
Socket Size   Request  Resp.   Elapsed  Trans.
Send   Recv   Size SizeTime Rate 
bytes  Bytes  bytesbytes   secs.per sec   

16384  131072 128  819210.0017469.30   
16384  87380

b) initiated on u073:

root@u073:~# mtr --no-dns --report --report-cycles 60 192.168.1.1
Start: 2020-02-13T20:53:37+
HOST: u073Loss%   Snt   Last   Avg  Best  Wrst StDev
  1.|-- 192.168.1.10.0%600.1   0.1   0.1   0.2   0.0

root@u073:~# netperf -t TCP_RR -H 192.168.1.1 -- -r 1,1
MIGRATED TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 
192.168.1.1 () port 0 AF_INET : demo : first burst 0
Local /Remote
Socket Size   Request  Resp.   Elapsed  Trans.
Send   Recv   Size SizeTime Rate 
bytes  Bytes  bytesbytes   secs.per sec   

16384  87380  11   10.0028514.93   
16384  131072

root@u073:~# netperf -t TCP_RR -H 192.168.1.1 -- -r 64,64
MIGRATED TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 
192.168.1.1 () port 0 AF_INET : demo : first burst 0
Local /Remote
Socket Size   Request  Resp.   Elapsed  Trans.
Send   Recv   Size SizeTime Rate 
bytes  Bytes  bytesbytes   secs.per sec   

16384  87380  64   64  10.0027405.88   
16384  131072

root@u073:~# netperf -t TCP_RR -H 192.168.1.1 -- -r 128,8192
MIGRATED TCP REQUEST/RESPONSE TEST from 0.0.0.0 (0.0.0.0) port 0 AF_INET to 
192.168.1.1 () port 0 AF_INET : demo : first burst 0
Local /Remote
Socket Size   Request  Resp.   Elapsed  Trans.
Send   Recv   Size SizeTime Rate 

[Bug 1853638] Re: BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet device seems to be dropping data

2020-02-12 Thread Nivedita Singhvi
Additional observations.

MAAS is being used to deploy the system and configure
the bond interface and settings.

MAAS allows you to specify which is the primary interface, with
the other being the backup, for the active-backup bonding mode.
However, it does not appear to be working -it's not passing along 
a primary primitive, for instance, in the netplan yaml or otherwise 
resulting in this being honored (still need to confirm).

MAAS allows you to enter a mac address for the bond interface,
but if not supplied, by default it will use the mac address of
the "primary" interface, as configured.

MAAS then populates the /etc/netplan/50-cloud-init.yaml, including
a macaddr= line with the default.

netplan then passes that along to systemd-networkd.

The bonding kernel, however, will use as the active interface
whichever interface is first attached to the bond (i.e., which
completes getting attached to the bond interface first) in the
absence of a primary= directive.

The bonding kernel will, however, use the mac addr supplied
as an override.

So let's say the active interface was configured in MAAS to be
f0, and it's mac is used to be the mac address of the bond,
but f1 (the second port of the NIC) actually gets attached
first to the bond and is used as the active interface by the
bond.

We have a situation where f0 = backup, f1 = active, and bond0
is using the mac of f0. While this should work, there is a
potential for problems depending on the circumstances.

It's likely this has nothing to do with our current issue, but
here for completeness. Will see if we can test/confirm.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1853638

Title:
  BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet device seems to be
  dropping data

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1853638/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1853638] Re: BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet device seems to be dropping data

2020-02-11 Thread Nivedita Singhvi
Edwin, let me know if you can get in touch with me via the contact email 
on my Launchpad page. Thanks for all the help!

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1853638

Title:
  BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet device seems to be
  dropping data

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1853638/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1853638] Re: BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet device seems to be dropping data

2020-02-11 Thread Nivedita Singhvi
** Attachment added: "ethtool -S for inactive interface enp94s0f0"
   
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1853638/+attachment/5327556/+files/ethtool-S-enp94s0f0

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1853638

Title:
  BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet device seems to be
  dropping data

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1853638/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1853638] Re: BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet device seems to be dropping data

2020-02-11 Thread Nivedita Singhvi
ethtool-enp94s0f0
--
Settings for enp94s0f0:
Supported ports: [ FIBRE ]
Supported link modes:   1baseT/Full 
Supported pause frame use: Symmetric Receive-only
Supports auto-negotiation: Yes
Supported FEC modes: Not reported
Advertised link modes:  Not reported
Advertised pause frame use: No
Advertised auto-negotiation: No
Advertised FEC modes: Not reported
Speed: 1Mb/s
Duplex: Full
Port: FIBRE
PHYAD: 1
Transceiver: internal
Auto-negotiation: off
Supports Wake-on: g
Wake-on: d
Current message level: 0x (0)
   
Link detected: yes

ethtool-i-enp94s0f0
--
driver: bnxt_en
version: 1.10.0
firmware-version: 214.0.253.1/pkg 21.40.25.31
expansion-rom-version: 
bus-info: :5e:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: no
supports-priv-flags: no


ethtool-c-enp94s0f0
-
Coalesce parameters for enp94s0f0:
Adaptive RX: off  TX: off
stats-block-usecs: 100
sample-interval: 0
pkt-rate-low: 0
pkt-rate-high: 0

rx-usecs: 10
rx-frames: 15
rx-usecs-irq: 1
rx-frames-irq: 1

tx-usecs: 28
tx-frames: 30
tx-usecs-irq: 2
tx-frames-irq: 2

rx-usecs-low: 0
rx-frame-low: 0
tx-usecs-low: 0
tx-frame-low: 0

rx-usecs-high: 0
rx-frame-high: 0
tx-usecs-high: 0
tx-frame-high: 0

ethtool-g-enp94s0f0

Ring parameters for enp94s0f0:
Pre-set maximums:
RX: 2047
RX Mini:0
RX Jumbo:   8191
TX: 2047
Current hardware settings:
RX: 511
RX Mini:0
RX Jumbo:   2044
TX: 511

ethtool-k-enp94s0f0
-
Features for enp94s0f0:
rx-checksumming: on
tx-checksumming: on
tx-checksum-ipv4: on
tx-checksum-ip-generic: off [fixed]
tx-checksum-ipv6: on
tx-checksum-fcoe-crc: off [fixed]
tx-checksum-sctp: off [fixed]
scatter-gather: on
tx-scatter-gather: on
tx-scatter-gather-fraglist: off [fixed]
tcp-segmentation-offload: on
tx-tcp-segmentation: on
tx-tcp-ecn-segmentation: off [fixed]
tx-tcp-mangleid-segmentation: off
tx-tcp6-segmentation: on
udp-fragmentation-offload: off
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off
rx-vlan-offload: on
tx-vlan-offload: on
ntuple-filters: on
receive-hashing: on
highdma: on [fixed]
rx-vlan-filter: off [fixed]
vlan-challenged: off [fixed]
tx-lockless: off [fixed]
netns-local: off [fixed]
tx-gso-robust: off [fixed]
tx-fcoe-segmentation: off [fixed]
tx-gre-segmentation: on
tx-gre-csum-segmentation: on
tx-ipxip4-segmentation: on
tx-ipxip6-segmentation: off [fixed]
tx-udp_tnl-segmentation: on
tx-udp_tnl-csum-segmentation: on
tx-gso-partial: on
tx-sctp-segmentation: off [fixed]
tx-esp-segmentation: off [fixed]
tx-udp-segmentation: off [fixed]
fcoe-mtu: off [fixed]
tx-nocache-copy: off
loopback: off [fixed]
rx-fcs: off [fixed]
rx-all: off [fixed]
tx-vlan-stag-hw-insert: on
rx-vlan-stag-hw-parse: on
rx-vlan-stag-filter: off [fixed]
l2-fwd-offload: off [fixed]
hw-tc-offload: on
esp-hw-offload: off [fixed]
esp-tx-csum-hw-offload: off [fixed]
rx-udp_tunnel-port-offload: on
tls-hw-tx-offload: off [fixed]
tls-hw-rx-offload: off [fixed]
rx-gro-hw: on
tls-hw-record: off [fixed]

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1853638

Title:
  BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet device seems to be
  dropping data

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1853638/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1853638] Re: BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet device seems to be dropping data

2020-02-11 Thread Nivedita Singhvi
"Bad" System/NIC:

NIC: BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet Controller
System: Dell
Kernel: 5.3.0-28-generic #30~18.04.1-Ubuntu

(Note, this issue has been seen on prior kernels as well, upgraded
 to latest to see if various problems were resolved)


Attaching stats/config files from nics from this system (seeing issue).

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1853638

Title:
  BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet device seems to be
  dropping data

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1853638/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1853638] Re: BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet device seems to be dropping data

2020-02-11 Thread Nivedita Singhvi
Good System/Good NIC (all configurations work) Comparison

NIC: NetXtreme II BCM57000 10 Gigabit Ethernet QLogic 57000
System: Dell
Kernel: 5.0.0-25-generic #26~18.04.1-Ubuntu


/proc/net/bonding/bond0
---
Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)

Bonding Mode: fault-tolerance (active-backup)
Primary Slave: None
Currently Active Slave: enp5s0f1
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0

Slave Interface: enp5s0f1
MII Status: up
Speed: 1 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 00:00:00:00:73:e2
Slave queue ID: 0

Slave Interface: enp5s0f0
MII Status: up
Speed: 1 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 00:00:00:00:73:e0
Slave queue ID: 0


/etc/netplan/50-cloud-init.yaml

network:
bonds:
bond0:
addresses:
- 00.00.235.182/25
gateway4: 00.00.235.129
interfaces:
- enp5s0f0
- enp5s0f1
macaddress: 00:00:00:00:73:e0
mtu: 9000
nameservers:
addresses:
- 00.00.235.172
- 00.00.235.171
search:
- maas
parameters:
down-delay: 0
gratuitious-arp: 1
mii-monitor-interval: 100
mode: active-backup
transmit-hash-policy: layer2
up-delay: 0
ethernets:
...(snip)..
enp5s0f0:
match:
macaddress: 00:00:00:00:73:e0
mtu: 9000
set-name: enp5s0f0
enp5s0f1:
match:
macaddress: 00:00:00:00:73:e2
mtu: 9000
set-name: enp5s0f1
version: 2

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1853638

Title:
  BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet device seems to be
  dropping data

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1853638/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1853638] Re: BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet device seems to be dropping data

2020-02-11 Thread Nivedita Singhvi
"Bad" Configuration for active-backup mode:



$ cat /proc/net/bonding/bond0
Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)
  
Bonding Mode: fault-tolerance (active-backup)
Primary Slave: None
Currently Active Slave: enp94s0f1d1
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0
Peer Notification Delay (ms): 0

Slave Interface: enp94s0f1d1
MII Status: up
Speed: 1 Mbps
Duplex: full
Link Failure Count: 2
Permanent HW addr: 4c:d9:8f:48:08:da
Slave queue ID: 0

Slave Interface: enp94s0f0
MII Status: up
Speed: 1 Mbps
Duplex: full
Link Failure Count: 2
Permanent HW addr: 4c:d9:8f:48:08:d9
Slave queue ID: 0

---
$ cat uname-rv 
5.3.0-28-generic #30~18.04.1-Ubuntu SMP Fri Jan 17 06:14:09 UTC 2020

---
Scrubbed /etc/netplan/50-cloud-init.yaml:
network:
bonds:
bond0:
addresses:
- 0.0.235.177/25
gateway4: 0.0.235.129
interfaces:
- enp94s0f0
- enp94s0f1d1
macaddress: 00:00:00:48:08:00
mtu: 9000
nameservers:
addresses:
- 0.0.235.171
- 0.0.235.172
search:
- maas
parameters:
down-delay: 0
gratuitious-arp: 1
mii-monitor-interval: 100
mode: active-backup
transmit-hash-policy: layer2
up-delay: 0
ethernets:
eno1:
match:
macaddress: 00:00:00:76:6e:ca
mtu: 1500
set-name: eno1
eno2:
match:
macaddress: 00:00:00:76:6e:cb
mtu: 1500
set-name: eno2
enp94s0f0:
match:
macaddress: 00:00:00:48:08:00
mtu: 9000
set-name: enp94s0f0
enp94s0f1d1:
match:
macaddress: 00:00:00:48:08:da
mtu: 9000
set-name: enp94s0f1d1
version: 2

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1853638

Title:
  BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet device seems to be
  dropping data

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1853638/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1853638] Re: BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet device seems to be dropping data

2020-02-11 Thread Edwin Peer
Hi Nivedita,

I have been away on PTO the last week and am picking this up again now.
Please could you post the full bonding configuration?

Regards,
Edwin Peer

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1853638

Title:
  BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet device seems to be
  dropping data

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1853638/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1853638] Re: BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet device seems to be dropping data

2020-02-10 Thread Nivedita Singhvi
We have narrowed it down to a flaw in a specific configuration setting
on this NIC, so we're comparing the good and bad configurations now.

Primary port: enp94s0f0
Secondary port: enp94s0f1d1


A] Good config for fault-tolerance (active-backup) bonding mode:
--
Primary port = active interface; Secondary port = backup

B] Bad config for fault-tolerance (active-backup) bonding mode:
--
Primary port = backup interface; Secondary port = active


We are consistently able to reproduce a drop rate difference
with UDP pkts, for the above good/bad cases:


Good Case UDP MTR Test Result
-
mtr --no-dns --report --report-cycles 60 --udp -s 1428 $DEST
Start: 2020-02-10T10:14:01+
HOST: hostname Loss%   Snt   Last   Avg  Best  Wrst StDev
  1.|-- nn.nn.nnn.nnn  0.0%600.3   0.2   0.2   0.3   0.0


Bad Case UDP MTR Test Result
---
mtr --no-dns --report --report-cycles 60 --udp -s 1428 $DEST
Start: 2020-02-10T14:10:52+
HOST: hostname Loss%   Snt   Last   Avg  Best  Wrst StDev
  1.|-- nn.nn.nnn.nnn  8.3%600.3   0.3   0.2   0.4   0.0

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1853638

Title:
  BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet device seems to be
  dropping data

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1853638/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1853638] Re: BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet device seems to be dropping data

2020-02-10 Thread Nivedita Singhvi
The second port on the NIC definitely works as the active
interface in an active-backup bonding configuration on the
other NICs.

At the moment, it's only this particular NIC that is seeing
this problem that we know of.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1853638

Title:
  BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet device seems to be
  dropping data

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1853638/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1853638] Re: BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet device seems to be dropping data

2020-02-09 Thread Nivedita Singhvi
Hello Edwin,

Here is more information on the issue we are seeing wrt dropped 
packets and other connectivity issues with this NIC.

The problem is *only* seen when the second port on the NIC is 
chosen as the active interface of a active-backup configuration. 

So on the "bad" system with the interfaces:

enp94s0f0 -> when chosen as active, all OK
enp94s0f1d1 -> when chosen as active, not OK

I'll see if the reporters can confirm that on the "good" systems,
there was no problem when the second interface is active.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1853638

Title:
  BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet device seems to be
  dropping data

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1853638/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1853638] Re: BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet device seems to be dropping data

2020-02-04 Thread Nivedita Singhvi
Hey Edwin, sorry, I didn't see your last question.

I'll try and confirm but I've seen loss in both 
directions but it's not clear whether that's significant
enough or not yet. 

e.g., TCP traffic is retransmitted, so it could be segments
lost while outgoing or acks lost incoming. 

4407 retransmitted TCP segments 
130 TCP timeouts

in stats collected about 5 mins apart - which isn't 
sufficient a sample size, we're trying to get a new 
collection of stats, logs using the netperf TCP_RR test.

In our case, note, we're more concerned (and have more solid
data) of latency issues than dropped packets (which I expect
some of with heavy network testing). 

For example, netperf TCP_RR latency is about 70-78% of the older
systems for 1,1 request/response byte sizes as well as 64/64, 
100/200, 128/8192 sizes.

I'll update here as soon as we have more data from the production
environment.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1853638

Title:
  BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet device seems to be
  dropping data

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1853638/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1853638] Re: BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet device seems to be dropping data

2020-01-31 Thread Edwin Peer
I don't think bnxt_en exposes the disable_tpa parameter. Be that as it
may, I think the tpa_aborts may be a red herring. TPA aggregates TCP
flows and you are seeing the issue with ICMP.

In which direction(s) of traffic flow do you see the losses?

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1853638

Title:
  BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet device seems to be
  dropping data

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1853638/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1853638] Re: BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet device seems to be dropping data

2020-01-31 Thread Nivedita Singhvi
> NICs between systems? Are OS / kernel and driver
> versions the same on both systems? 

Yes, identical distro release, kernel, and most of the software
stack (I have not obtained and examined the full sw stack). 

Configuration of networking settings is also the same.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1853638

Title:
  BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet device seems to be
  dropping data

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1853638/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1853638] Re: BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet device seems to be dropping data

2020-01-31 Thread Nivedita Singhvi
> There are more than one variable at play here. 
> Does the problem follow the NIC if you swap the 
> NICs between systems? Are OS / kernel and driver 
> versions the same on both systems?

Unfortunately, I've not been able to get them to try
permutations or switches, as yet, as this is still a
production system/environment. 

I'll try and obtain more information about it.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1853638

Title:
  BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet device seems to be
  dropping data

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1853638/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1853638] Re: BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet device seems to be dropping data

2020-01-31 Thread Nivedita Singhvi
Thanks very much for helping on this, Edwin! Please let me
know if there's anything specific you need. 

I'm asking them to disable any IPv6, LLDP traffic in their environment,
and retest and collect information again.

Also, I'd like to disable tpa, would this be at all useful:

modprobe bnx disable_tpa=1

??

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1853638

Title:
  BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet device seems to be
  dropping data

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1853638/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1853638] Re: BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet device seems to be dropping data

2020-01-31 Thread Nivedita Singhvi
> The mtr packet loss is an interesting result. What mtr options did you
use? Is this a UDP or ICMP test?


The mtr command was:

mtr --no-dns --report --report-cycles 60 $IP_ADDR

so ICMP was going out.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1853638

Title:
  BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet device seems to be
  dropping data

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1853638/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1853638] Re: BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet device seems to be dropping data

2020-01-30 Thread Edwin Peer
> 3. mtr ping test
> ---
> GoodSystem..0.0% Loss; 0.2 Avg; 0.1 Best, 0.9 Worst, 0.1 StdDev
> BadSystem2...11.7% Loss; 0.1 Avg; 0.1 Best, 0.2 Worst, 0.0 StdDev

The mtr packet loss is an interesting result. What mtr options did you
use? Is this a UDP or ICMP test?

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1853638

Title:
  BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet device seems to be
  dropping data

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1853638/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1853638] Re: BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet device seems to be dropping data

2020-01-30 Thread Edwin Peer
With respect to one of these situations, this is the following system:

> Dell PowerEdge R440/0XP8V5, BIOS 2.2.11 06/14/2019
> 
> Note that a similar system does not have any issues:
> 
> Dell Inc. PowerEdge R430/0CN7X8, BIOS 2.3.4 11/08/2016
> 
> So the NIC in the "bad" environment is:
> 
> BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet Controller (rev 01)
> Product Name: Broadcom Adv. Dual 10G SFP+ Ethernet
> 
> The NIC in the "good" environment is:
> 
> Broadcom Inc. and subsidiaries NetXtreme II BCM57810
> 10 Gigabit Ethernet [14e4:1006]
> Product Name: QLogic 57810 10 Gigabit Ethernet

There are more than one variable at play here. Does the problem follow
the NIC if you swap the NICs between systems? Are OS / kernel and driver
versions the same on both systems?

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1853638

Title:
  BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet device seems to be
  dropping data

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1853638/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1853638] Re: BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet device seems to be dropping data

2020-01-30 Thread diarmuid
Here is the Ettus benchmark tool
https://kb.ettus.com/Verifying_the_Operation_of_the_USRP_Using_UHD_and_GNU_Radio

You would need an Ettus device to run those tests.

I cant test the affected node now as it is in production unfortunately.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1853638

Title:
  BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet device seems to be
  dropping data

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1853638/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1853638] Re: BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet device seems to be dropping data

2020-01-29 Thread Nivedita Singhvi
** Attachment added: "active interface ethtool-S"
   
https://bugs.launchpad.net/ubuntu/+source/network-manager/+bug/1853638/+attachment/5324070/+files/ethtool-S-enp94s0f0

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1853638

Title:
  BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet device seems to be
  dropping data

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1853638/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1853638] Re: BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet device seems to be dropping data

2020-01-29 Thread Nivedita Singhvi
** Attachment added: "backup interface ethtool-S"
   
https://bugs.launchpad.net/ubuntu/+source/network-manager/+bug/1853638/+attachment/5324071/+files/ethtool-S-enp94s0f1d1

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1853638

Title:
  BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet device seems to be
  dropping data

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1853638/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1853638] Re: BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet device seems to be dropping data

2020-01-29 Thread Nivedita Singhvi
Note that iperf was identical whereas netperf and mtr showed
up differences (so it's possibly sporadic as well, not continuous)


1. iperf tcp test
--
GoodSystem.9.84 Gbits/sec
BadSystem18.37 Gbits/sec
BadSystem2...9.85 Gbits/sec


2. iperf udp test
--
GoodSystem.1.05 Mbits/sec
BadSystem2...1.05 Mbits/sec


3. mtr ping test
---
GoodSystem..0.0% Loss; 0.2 Avg; 0.1 Best, 0.9 Worst, 0.1 StdDev
BadSystem2...11.7% Loss; 0.1 Avg; 0.1 Best, 0.2 Worst, 0.0 StdDev


4. netperf tcp_rr 1/1 bytes

GoodSystem..17921.83 t/sec
BadSystem1.13912.45 t/sec
BadSystem2


5. netperf tcp_rr 64/64 bytes

GoodSystem..16987.48 t/sec
BadSystem1.13355.93 t/sec
BadSystem2


6. netperf tcp_rr 128/8192 bytes
---
GoodSystem..2396.45 t/sec
BadSystem1.1678.54 t/sec
BadSystem2

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1853638

Title:
  BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet device seems to be
  dropping data

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1853638/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1853638] Re: BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet device seems to be dropping data

2020-01-29 Thread Nivedita Singhvi
Hello, Edwin,

We have two separate users/customers filing reports, and I can answer for
one of them. I'll ask the original poster separately as well to reply.

With respect to one of these situations, this is the following system:

Dell PowerEdge R440/0XP8V5, BIOS 2.2.11 06/14/2019

Note that a similar system does not have any issues:

Dell Inc. PowerEdge R430/0CN7X8, BIOS 2.3.4 11/08/2016

So the NIC in the "bad" environment is:

BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet Controller (rev 01)
Product Name: Broadcom Adv. Dual 10G SFP+ Ethernet

The NIC in the "good" environment is:

Broadcom Inc. and subsidiaries NetXtreme II BCM57810
10 Gigabit Ethernet [14e4:1006]
Product Name: QLogic 57810 10 Gigabit Ethernet

I'll have to scrub some files and see what I can attach,
apologies, I'll have it here by tmrw. 

Unfortunately, we don't have an easy reproducer.
A single iperf and netperf test (both UDP and TCP) showed identical
results from both "good" and "bad" environments.

What we have is an identical kernel, network configuration and
stack with the "bad" system showing double, triple the latency
to the systems from a remote server. I'll have more information
for you shortly here regarding the exact k8 cmd.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1853638

Title:
  BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet device seems to be
  dropping data

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1853638/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1853638] Re: BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet device seems to be dropping data

2020-01-28 Thread Edwin Peer
I am an engineer at Broadcom and have been assigned to investigate this
issue. To that end, I have a few clarifying questions:

1a) What is the benchmark tool you are using and could you provide a
link to where I can get it?

 b) What kind of network traffic is it sending?

2a) In what units are the data rate parameters "--rx_rate 10e6 --tx_rate
10e6" specified?

 b) What data rate are you attempting to send? The report notes that the
platform can't be the issue at 1G, but are you attempting to utilize
10G?

3) Perhaps stating the obvious here, but has anybody looked into the
warning?

"[WARNING] [UHD] Unable to set the thread priority. Performance may be
negatively affected."

which is probably related to this error:

"EnvironmentError: OSError: error in pthread_setschedparam"

4 a) I am personally unfamiliar with the USRP x310, could you provide
some more information about it? Googling for it seems to indicate it is
some kind of software defined radio platform?

  b) Is there a way to get access to one to reproduce and diagnose this
issue?

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1853638

Title:
  BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet device seems to be
  dropping data

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1853638/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1853638] Re: BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet device seems to be dropping data

2020-01-28 Thread Edwin Peer
Could you also please dump the ethtool statistics for the NIC?

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1853638

Title:
  BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet device seems to be
  dropping data

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1853638/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1853638] Re: BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet device seems to be dropping data

2020-01-17 Thread Nivedita Singhvi
(active interface)

> cat ethtool-S-enp94s0f1d1 | grep abort
 [0]: tpa_aborts: 19775497
 [1]: tpa_aborts: 26758635
 [2]: tpa_aborts: 12008147
 [3]: tpa_aborts: 15829167
 [4]: tpa_aborts: 25099500
 [5]: tpa_aborts: 3292554
 [6]: tpa_aborts: 2863692
 [7]: tpa_aborts: 20224692


(backup interface)
> cat ethtool-S-enp94s0f0 | grep abort
 [0]: tpa_aborts: 3158584
 [1]: tpa_aborts: 1670319
 [2]: tpa_aborts: 1749371
 [3]: tpa_aborts: 1454301
 [4]: tpa_aborts: 123020
 [5]: tpa_aborts: 1403509
 [6]: tpa_aborts: 1298383
 [7]: tpa_aborts: 1858753

Netted out from previous capture, there were

*f0 = 2014 tpa_aborts
*d1 = 1118473 tpa_aborts


** Changed in: linux (Ubuntu)
   Status: Incomplete => Confirmed

** Changed in: linux (Ubuntu)
   Importance: Undecided => Critical

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1853638

Title:
  BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet device seems to be
  dropping data

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1853638/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1853638] Re: BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet device seems to be dropping data

2020-01-17 Thread Nivedita Singhvi
We suspect this is a device (hw/fw) issue, however, not NetworkManager 
or kernel (driver bnxt_en). I've added the kernel for the driver impact 
(just in case, for now). This is really to eliminate all other causes
and confirm whether it's the device at root cause). 

NIC

Product Name: Broadcom Adv. Dual 10G SFP+ Ethernet
5e:00.1 Ethernet controller: Broadcom Inc. and subsidiaries 
BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet Controller (rev 01)

NIC Driver/FW
---
driver: bnxt_en
version: 1.10.0
firmware-version: 214.0.253.1/pkg 21.40.25.31
expansion-rom-version:
bus-info: :5e:00.1
supports-statistics: yes

Kernel
-
5.0.0-37-generic #40~18.04.1-Ubuntu SMP Thu Nov 14 12:06:39 UTC 2019

(appears to be an issue on all kernel versions)

Environment Configuration
-
active-backup bonding mode

(having the active backup up *might* potentially be the problem, 
 but it might just be the device itself).


The exact same distro, kernel, applications and configuration
works fine with a different NIC (Broadcom 10g bnx2x).

There were quite a few total tpa_abort stats counts (1118473)
during the duration of a 2 minute iperf test. 

Hoping to get more information from other users seeing the
same issue.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1853638

Title:
  BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet device seems to be
  dropping data

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1853638/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs

[Bug 1853638] Re: BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet device seems to be dropping data

2020-01-17 Thread Nivedita Singhvi
I have reports of the same device appearing to drop packets and incur
greater number of retransmissions under certain circumstances which
we're still trying to nail down.

I'm using this bug for now until proven to be a different problem.

This is causing issues in a production environment.


** Changed in: network-manager (Ubuntu)
   Status: New => Confirmed

** Changed in: network-manager (Ubuntu)
   Importance: Undecided => Critical

** Tags added: sts

** Also affects: linux (Ubuntu)
   Importance: Undecided
   Status: New

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/1853638

Title:
  BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet device seems to be
  dropping data

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1853638/+subscriptions

-- 
ubuntu-bugs mailing list
ubuntu-bugs@lists.ubuntu.com
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs