[Kernel-packages] [Bug 1794232] Re: Geneve tunnels don't work when ipv6 is disabled
Verified on Xenial

** Tags removed: verification-needed-xenial
** Tags added: verification-done-xenial

--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1794232

Title:
  Geneve tunnels don't work when ipv6 is disabled

Status in linux package in Ubuntu: Fix Released
Status in linux source package in Xenial: Fix Committed
Status in linux source package in Bionic: Fix Released
Status in linux source package in Cosmic: Fix Released
Status in linux source package in Disco: Fix Released

Bug description:

  SRU Justification

  Impact: Cannot create geneve tunnels if ipv6 is disabled dynamically.

  Fix: Fixed by upstream commit in v5.0:
    Commit: cf1c9ccba7308e48a68fa77f476287d9d614e4c7
    "geneve: correctly handle ipv6.disable module parameter"
  Hence available in Disco and later; required in X, B, C.

  Testcase:
  1. Boot with "ipv6.disable=1"
  2. Then try to create a geneve tunnel using:
     # ovs-vsctl add-br br1
     # ovs-vsctl add-port br1 geneve1 -- set interface geneve1 type=geneve options:remote_ip=192.168.x.z  // ip of the other host

  Regression Potential: Low; this affects only geneve tunnels when ipv6
  is dynamically disabled, and in that case they currently don't work
  at all.

  Other Info:
  * The mainline commit message references a fix for non-metadata
    tunnels (that infrastructure is not yet in our tree prior to
    Disco), so that part is not being included at this time under this
    case. At this time, all geneve tunnels created as above are
    metadata-enabled.

  ---

  [Impact]

  When attempting to create a geneve tunnel on Ubuntu 16.04 Xenial, in
  an OS environment with Open vSwitch, where ipv6 has been disabled,
  the create fails with the error:

  "ovs-vsctl: Error detected while setting up 'geneve0': could not add
  network device geneve0 to ofproto (Address family not supported by
  protocol)."

  [Fix]

  There is an upstream commit for this in v5.0 mainline (and in Disco
  and later Ubuntu kernels):

  "geneve: correctly handle ipv6.disable module parameter"
  Commit: cf1c9ccba7308e48a68fa77f476287d9d614e4c7

  This fix is needed on all our series prior to Disco and the v5.0
  kernel: X, C, B. It is identical to the fix we implemented and
  tested internally, but had not yet pushed upstream.

  [Test Case]

  (Best to do this on a kvm guest VM so as not to interfere with your
  system's networking)

  1. On any Ubuntu Xenial kernel, disable ipv6. This example is shown
     with the 4.15.0-23-generic kernel (which differs slightly from
     4.4.x in symptoms):
     - Edit /etc/default/grub to add the line:
       GRUB_CMDLINE_LINUX="ipv6.disable=1"
     - # update-grub
     - Reboot

  2. Install OVS
     # apt install openvswitch-switch

  3. Create a Geneve tunnel
     # ovs-vsctl add-br br1
     # ovs-vsctl add-port br1 geneve1 -- set interface geneve1 type=geneve options:remote_ip=192.168.x.z
     (where remote_ip is the IP of the other host)

  You will see the following error message:

  "ovs-vsctl: Error detected while setting up 'geneve1'. See
  ovs-vswitchd log for details."

  From /var/log/openvswitch/ovs-vswitchd.log you will see:

  "2018-07-02T16:48:13.295Z|00026|dpif|WARN|system@ovs-system: failed
  to add geneve1 as port: Address family not supported by protocol"

  You will notice from the "ifconfig" output that the device
  genev_sys_6081 is not created.

  If you do not disable IPv6 (remove ipv6.disable=1 from
  /etc/default/grub + update-grub + reboot), the same 'ovs-vsctl
  add-port' command completes successfully. You can see that it is
  working properly by adding an IP to br1 and pinging each host.

  On kernel 4.4 (4.4.0-128-generic), the error message doesn't appear
  when using the 'ovs-vsctl add-port' command and no warning is shown
  in ovs-vswitchd.log, but the device genev_sys_6081 is likewise not
  created and the ping test won't work.

  With the fixed test kernel, the interfaces and the tunnel are
  created successfully.

  [Regression Potential]

  * Low -- this affects the geneve driver only, and only when ipv6 is
    disabled; since geneve doesn't work at all in that case, the fix
    simply gets the tunnel up and running for the common case.

  [Other Info]

  * Analysis

  Geneve tunnels should work in either IPv4 or IPv6 environments as a
  design and support principle. The current implementation, however,
  requires ipv6 support for metadata-based tunnels (which geneve
  tunnels are), rather than supporting:

    a) ipv4 + metadata         // whether ipv6 is compiled out or
                               // dynamically disabled
    b) ipv4 + metadata + ipv6

  What enforces this in the current 4.4.0-x code when opening a Geneve
  tunnel is the following in geneve_open():

    bool ipv6 = geneve->remote.sa.sa_family == AF_INET6;
    bool metadata = geneve->collect_md;
    ...
    #if IS_ENABLED(CONFIG_IPV6)
    geneve->sock6 = NULL;
    if (ipv6 || metadata)
            ret = geneve_sock_add(geneve, true);
    #endif
    if (!ret && (!ipv6 || metadata))
            ret = geneve_sock_add(geneve, false);
[Kernel-packages] [Bug 1820948] Re: i40e xps management broken when > 64 queues/cpus
** Changed in: linux (Ubuntu)
   Status: In Progress => Fix Released

--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1820948

Title:
  i40e xps management broken when > 64 queues/cpus

Status in linux package in Ubuntu: Fix Released
Status in linux source package in Bionic: Fix Committed

Bug description:

  [Impact]

  Transmit packet steering (xps) settings don't work when the number
  of queues (cpus) is higher than 64. This is currently still an issue
  on the 4.15 kernel (Xenial -hwe and Bionic kernels). It was fixed in
  Intel's i40e driver version 2.7.11 and in 4.16-rc1 mainline Linux
  (i.e. Cosmic and Disco have the fix).

  [Fix]

  The following commit fixes this issue (as identified by Lihong Yang
  in discussion with the Intel i40e team):

  "i40e: Fix the number of queues available to be mapped for use"
  Commit: bc6d33c8d93f520e97a8c6330b8910053d4f

  It requires the following commit as well:

  "i40e: Do not allow use more TC queue pairs than MSI-X vectors exist"
  Commit: 1563f2d2e01242f05dd523ffd56fe104bc1afd58

  [Test Case]

  1. Kernel version: Bionic/Xenial -hwe: any 4.15 kernel
     i40e driver version: 2.1.14-k
     Any system with > 64 CPUs

  2. For any queue 0 - 63, you can read/set tx xps:

     echo > /sys/class/net/eth2/queues/tx-63/xps_cpus
     echo $?
     0
     cat /sys/class/net/eth2/queues/tx-63/xps_cpus
     00,,

     But for any queue number > 63, we see this error:

     echo > /sys/class/net/eth2/queues/tx-64/xps_cpus
     echo: write error: Invalid argument
     cat /sys/class/net/eth2/queues/tx-64/xps_cpus
     cat: /sys/class/net/eth2/queues/tx-64/xps_cpus: Invalid argument

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1820948/+subscriptions

--
Mailing list: https://launchpad.net/~kernel-packages
Post to     : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp
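For reference, the xps_cpus files in the test case hold kernel cpumask bitmaps, printed as comma-separated 32-bit hex words with the most significant word first, so systems with more than 64 CPUs need masks wider than two words. A minimal userspace sketch of that format (the helper name cpu_to_xps_mask is ours, for illustration; the real kernel output also trims leading zeros on the top word):

```c
#include <stdio.h>
#include <string.h>

/* Build the comma-separated 32-bit hex-word bitmap string that sysfs
 * cpumask files such as xps_cpus use, with the single bit `cpu` set.
 * Illustrative only: the kernel additionally elides leading zeros on
 * the most significant word. */
void cpu_to_xps_mask(unsigned cpu, unsigned ncpus, char *buf, size_t len)
{
    unsigned nwords = (ncpus + 31) / 32;   /* one hex word per 32 CPUs */
    size_t off = 0;
    for (unsigned i = 0; i < nwords; i++) {
        /* sysfs prints the most significant 32-bit word first */
        unsigned word = nwords - 1 - i;
        unsigned bits = (cpu / 32 == word) ? (1u << (cpu % 32)) : 0u;
        off += snprintf(buf + off, len - off, "%s%08x", i ? "," : "", bits);
    }
}
```

With 72 CPUs, for example, steering to CPU 64 lands in a third 32-bit word ("00000001,00000000,00000000" in this sketch's fully padded form), matching the tx-64 queue range where the driver bug rejects the write.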
[Kernel-packages] [Bug 1779756] Re: Intel XL710 - i40e driver does not work with kernel 4.15 (Ubuntu 18.04)
I have installed and booted this kernel, and ensured no new regression
is introduced, although I cannot reproduce the issue.

** Tags removed: 4.15.0-24-generic cosmic kernel verification-needed-bionic verification-needed-cosmic
** Tags added: verification-done-bionic verification-done-cosmic

** Description changed:

  [Impact]

  The i40e driver can get stalled on tx timeouts. This can happen when
  DCB is enabled on the connected switch.

  This can also trigger a second situation, when a tx timeout occurs
  before the recovery of a previous timeout has completed due to CPU
  load, which is not handled correctly. This leads to networking
  delays, drops, and application timeouts and hangs. Note that the
  first tx timeout cause is just one of the ways to end up in the
  second situation.

  This issue was seen on a heavily loaded Kafka broker node running
  the 4.15.0-38-generic kernel on Xenial.

  Symptoms include messages in the kernel log of the form:
  ---
  [4733544.982116] i40e :18:00.1 eno2: tx_timeout: VSI_seid: 390, Q 6, NTC: 0x1a0, HWB: 0x66, NTU: 0x66, TAIL: 0x66, INT: 0x0
  [4733544.982119] i40e :18:00.1 eno2: tx_timeout recovery level 1, hung_queue 6

  With the test kernel provided in this LP bug, which had these two
  commits compiled in, the problem has not been seen again, and the
  node has been running successfully for several months:

  "i40e: Fix for Tx timeouts when interface is brought up if DCB is enabled"
  Commit: fa38e30ac73fbb01d7e5d0fd1b12d412fa3ac3ee

  "i40e: prevent overlapping tx_timeout recover"
  Commit: d5585b7b6846a6d0f9517afe57be3843150719da

  * The first commit is already in Disco, Cosmic
  * The second commit is already in Disco
  * Bionic needs both patches and Cosmic needs the second

  [Test Case]

  * We are considering the case of both issues above occurring.
  * Seen by the reporter on a Kafka broker node with heavy traffic.
  * Not easy to reproduce, as it requires something like the following
    example environment and heavy load:

    Kernel: 4.15.0-38-generic
    Network driver: i40e
      version: 2.1.14-k
      firmware-version: 6.00 0x800034e6 18.3.6
    NIC: Intel 40Gb XL710
    DCB enabled

  [Regression Potential]

  Low, as the first commit only impacts i40e DCB environments, and the
  fix has been running successfully for several months in
  production-load testing.

  ---
  Original Description

  Today the Ubuntu 16.04 LTS Enablement Stack has moved from kernel
  4.13 to kernel 4.15.0-24-generic. On a "Dell PowerEdge R330" server
  with an "Intel Ethernet Converged Network Adapter X710-DA2" network
  adapter (driver i40e), the network card no longer works and
  permanently displays these three lines:

  [ 98.012098] i40e :01:00.0 enp1s0f0: tx_timeout: VSI_seid: 388, Q 8, NTC: 0x0, HWB: 0x0, NTU: 0x1, TAIL: 0x1, INT: 0x1
  [ 98.012119] i40e :01:00.0 enp1s0f0: tx_timeout recovery level 11, hung_queue 8
  [ 98.012125] i40e :01:00.0 enp1s0f0: tx_timeout recovery unsuccessful

--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1779756

Title:
  Intel XL710 - i40e driver does not work with kernel 4.15 (Ubuntu 18.04)

Status in linux package in Ubuntu: Fix Released
Status in linux source package in Bionic: Fix Committed
Status in linux source package in Cosmic: Fix Committed
[Kernel-packages] [Bug 1794232] Re: Geneve tunnels don't work when ipv6 is disabled
We had a patch, discussed above and tested internally with success --
although our testing is limited (opening a geneve tunnel between two
kvm guests). Jiri has now pushed an identical patch upstream, which is
available in the v5.0 kernel and later:

"geneve: correctly handle ipv6.disable module parameter"
Commit: cf1c9ccba7308e48a68fa77f476287d9d614e4c7

Although I do not have testing validation from the original poster,
since it has been committed upstream I'm going to go ahead and get the
SRU request started.

** Changed in: linux (Ubuntu)
   Status: Triaged => In Progress

** Changed in: linux (Ubuntu)
   Importance: Medium => High

** Also affects: linux (Ubuntu Xenial)
   Importance: Undecided
   Status: New

** Also affects: linux (Ubuntu Cosmic)
   Importance: Undecided
   Status: New

** Also affects: linux (Ubuntu Disco)
   Importance: High
   Status: In Progress

** Changed in: linux (Ubuntu Cosmic)
   Status: New => In Progress

** Changed in: linux (Ubuntu Disco)
   Assignee: (unassigned) => Nivedita Singhvi (niveditasinghvi)

** Changed in: linux (Ubuntu Cosmic)
   Assignee: (unassigned) => Nivedita Singhvi (niveditasinghvi)

** Changed in: linux (Ubuntu Xenial)
   Assignee: (unassigned) => Nivedita Singhvi (niveditasinghvi)

** Changed in: linux (Ubuntu Xenial)
   Status: New => In Progress

** Changed in: linux (Ubuntu Cosmic)
   Importance: Undecided => High

** Changed in: linux (Ubuntu Xenial)
   Importance: Undecided => High

** Description changed:

  [Impact]

  When attempting to create a geneve tunnel on Ubuntu 16.04 Xenial, in
  an OS environment with Open vSwitch, where ipv6 has been disabled,
  the create fails with the error:

  "ovs-vsctl: Error detected while setting up 'geneve0': could not add
  network device geneve0 to ofproto (Address family not supported by
  protocol)."

  [Fix]

  There is an upstream commit for this in v5.0 mainline:

  "geneve: correctly handle ipv6.disable module parameter"
  Commit: cf1c9ccba7308e48a68fa77f476287d9d614e4c7

  This fix is needed on all our series: X, C, B, D

  [Test Case]

  (Best to do this on a kvm guest VM so as not to interfere with your
  system's networking)

  1. On any Ubuntu Xenial kernel, disable ipv6. This example is shown
     with the 4.15.0-23-generic kernel (which differs slightly from
     4.4.x in symptoms):
     - Edit /etc/default/grub to add the line:
       GRUB_CMDLINE_LINUX="ipv6.disable=1"
     - # update-grub
     - Reboot

  2. Install OVS
     # apt install openvswitch-switch

  3. Create a Geneve tunnel
     # ovs-vsctl add-br br1
     # ovs-vsctl add-port br1 geneve1 -- set interface geneve1 type=geneve options:remote_ip=192.168.x.z
     (where remote_ip is the IP of the other host)

  You will see the following error message:

  "ovs-vsctl: Error detected while setting up 'geneve1'. See
  ovs-vswitchd log for details."

  From /var/log/openvswitch/ovs-vswitchd.log you will see:

  "2018-07-02T16:48:13.295Z|00026|dpif|WARN|system@ovs-system: failed
  to add geneve1 as port: Address family not supported by protocol"

  You will notice from the "ifconfig" output that the device
  genev_sys_6081 is not created.

  If you do not disable IPv6 (remove ipv6.disable=1 from
  /etc/default/grub + update-grub + reboot), the same 'ovs-vsctl
  add-port' command completes successfully. You can see that it is
  working properly by adding an IP to br1 and pinging each host.

  On kernel 4.4 (4.4.0-128-generic), the error message doesn't appear
  when using the 'ovs-vsctl add-port' command and no warning is shown
  in ovs-vswitchd.log, but the device genev_sys_6081 is also not
  created and the ping test won't work.
[Kernel-packages] [Bug 1794232] Re: Geneve tunnels don't work when ipv6 is disabled
** Changed in: linux (Ubuntu Disco)
   Status: In Progress => Fix Released

** Description changed:

  [Impact]

  When attempting to create a geneve tunnel on Ubuntu 16.04 Xenial, in
  an OS environment with Open vSwitch, where ipv6 has been disabled,
  the create fails with the error:

  "ovs-vsctl: Error detected while setting up 'geneve0': could not add
  network device geneve0 to ofproto (Address family not supported by
  protocol)."

  [Fix]

  There is an upstream commit for this in v5.0 mainline:

  "geneve: correctly handle ipv6.disable module parameter"
  Commit: cf1c9ccba7308e48a68fa77f476287d9d614e4c7

  This fix is needed on all our series prior to Disco and the v5.0
  kernel: X, C, B. It is identical to the fix we implemented and
  tested internally, but had not yet pushed upstream.

  [Test Case]

  (Best to do this on a kvm guest VM so as not to interfere with your
  system's networking)

  1. On any Ubuntu Xenial kernel, disable ipv6. This example is shown
     with the 4.15.0-23-generic kernel (which differs slightly from
     4.4.x in symptoms):
     - Edit /etc/default/grub to add the line:
       GRUB_CMDLINE_LINUX="ipv6.disable=1"
     - # update-grub
     - Reboot

  2. Install OVS
     # apt install openvswitch-switch

  3. Create a Geneve tunnel
     # ovs-vsctl add-br br1
     # ovs-vsctl add-port br1 geneve1 -- set interface geneve1 type=geneve options:remote_ip=192.168.x.z
     (where remote_ip is the IP of the other host)

  You will see the following error message:

  "ovs-vsctl: Error detected while setting up 'geneve1'. See
  ovs-vswitchd log for details."

  From /var/log/openvswitch/ovs-vswitchd.log you will see:

  "2018-07-02T16:48:13.295Z|00026|dpif|WARN|system@ovs-system: failed
  to add geneve1 as port: Address family not supported by protocol"

  You will notice from the "ifconfig" output that the device
  genev_sys_6081 is not created.

  If you do not disable IPv6 (remove ipv6.disable=1 from
  /etc/default/grub + update-grub + reboot), the same 'ovs-vsctl
  add-port' command completes successfully. You can see that it is
  working properly by adding an IP to br1 and pinging each host.

  On kernel 4.4 (4.4.0-128-generic), the error message doesn't appear
  when using the 'ovs-vsctl add-port' command and no warning is shown
  in ovs-vswitchd.log, but the device genev_sys_6081 is also not
  created and the ping test won't work.

  With the fixed test kernel, the interfaces and the tunnel are
  created successfully.

  [Other Info]

  * Analysis

  Geneve tunnels should work in either IPv4 or IPv6 environments as a
  design and support principle. The current implementation, however,
  requires ipv6 support for metadata-based tunnels (which geneve
  tunnels are), rather than supporting:

    a) ipv4 + metadata         // whether ipv6 is compiled out or
                               // dynamically disabled
    b) ipv4 + metadata + ipv6

  What enforces this in the current 4.4.0-x code when opening a Geneve
  tunnel is the following in geneve_open():

    bool ipv6 = geneve->remote.sa.sa_family == AF_INET6;
    bool metadata = geneve->collect_md;
    ...
    #if IS_ENABLED(CONFIG_IPV6)
    geneve->sock6 = NULL;
    if (ipv6 || metadata)
            ret = geneve_sock_add(geneve, true);
    #endif
    if (!ret && (!ipv6 || metadata))
            ret = geneve_sock_add(geneve, false);

  CONFIG_IPV6 is enabled and IPv6 is disabled at boot, but even though
  ipv6 is false, metadata is always true for a geneve open, as it is
  set unconditionally in ovs. In /lib/dpif_netlink_rtnl.c:

    case OVS_VPORT_TYPE_GENEVE:
        nl_msg_put_flag(&request, IFLA_GENEVE_COLLECT_METADATA);

  The second argument of geneve_sock_add is a boolean indicating
  whether the socket is an ipv6 address-family socket or not, and we
  thus incorrectly pass true rather than false. The current
  "|| metadata" check is unnecessary and incorrectly sends the tunnel
  creation code down the ipv6 path, which subsequently fails when the
  code expects an ipv6 family socket.

  * This issue exists in all versions of the kernel up to the present
    mainline and net-next trees.

  * Testing with a trivial patch to remove that check and make changes
    similar to those made for vxlan (which had the same issue) has
    been successful. Patches for the various versions will be attached
    here soon.

  * Example versions (the bug exists in all versions of Ubuntu and
    mainline):

    $ uname -r
    4.4.0-135-generic
    $ lsb_release -rd
    Description: Ubuntu 16.04.5 LTS
    Release: 16.04
    $ dpkg -l | grep openvswitch-switch
    ii openvswitch-switch 2.5.4-0ubuntu0.16.04.1
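The geneve_open() excerpt in the analysis above boils down to a pair of boolean decisions, and the upstream fix amounts to gating the ipv6 branch on whether ipv6 is actually usable at runtime. A userspace model of that decision (not the literal kernel patch: `ipv6_enabled` stands in for a runtime check in the style of ipv6_mod_enabled(), and `fixed` switches between the old and new behaviour):

```c
#include <stdbool.h>

/* Sketch of the socket-family choice in geneve_open() as described in
 * the analysis: which UDP socket families the tunnel tries to open. */
bool opens_v6_socket(bool remote_is_v6, bool metadata,
                     bool ipv6_enabled, bool fixed)
{
    if (!fixed)
        /* pre-fix: a metadata tunnel always takes the ipv6 path,
         * even when ipv6 is disabled at boot -> EAFNOSUPPORT */
        return remote_is_v6 || metadata;
    /* post-fix: only take the ipv6 path when ipv6 is actually usable */
    return (remote_is_v6 || metadata) && ipv6_enabled;
}

bool opens_v4_socket(bool remote_is_v6, bool metadata)
{
    /* unchanged: an ipv4 socket for ipv4 remotes or metadata tunnels */
    return !remote_is_v6 || metadata;
}
```

An OVS geneve port is metadata-based with an ipv4 remote, so with ipv6.disable=1 the pre-fix logic still demands an ipv6 socket and the open fails, while the fixed logic falls back to the ipv4 socket alone.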
[Kernel-packages] [Bug 1840046] Re: BUG: non-zero pgtables_bytes on freeing mm: -16384
*** This bug is a duplicate of bug 1837664 ***
    https://bugs.launchpad.net/bugs/1837664

I'm not sure this bug should be DUP'd to the stable-release bug. Might
confuse the verification and handling triggers, perhaps? Will need to
make sure the fix is tested once the fix is uploaded.

--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1840046

Title:
  BUG: non-zero pgtables_bytes on freeing mm: -16384

Status in linux package in Ubuntu: Fix Released
Status in linux source package in Bionic: Fix Committed

Bug description:

  [impact]
  This message is printed repeatedly in the logs:
  BUG: non-zero pgtables_bytes on freeing mm: -16384

  [test case]
  boot the 4.15.0-58 kernel on s390x

  [regression potential]
  this affects task pud accounting; regressions may be around cleaning
  up task memory.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1840046/+subscriptions
[Kernel-packages] [Bug 1840046] Re: BUG: non-zero pgtables_bytes on freeing mm: -16384
*** This bug is a duplicate of bug 1837664 ***
    https://bugs.launchpad.net/bugs/1837664

I'll unDUP it unless the kernel team says otherwise in IRC.
[Kernel-packages] [Bug 1840046] Re: BUG: non-zero pgtables_bytes on freeing mm: -16384
** This bug is no longer a duplicate of bug 1837664
   Bionic update: upstream stable patchset 2019-07-23
[Kernel-packages] [Bug 1840046] Re: BUG: non-zero pgtables_bytes on freeing mm: -16384
I unduped it for test process clarity. Trying to get the relevant
people to test the fix.
[Kernel-packages] [Bug 1840046] Re: BUG: non-zero pgtables_bytes on freeing mm: -16384
I'll update here once the kernel is uploaded.
[Kernel-packages] [Bug 1840704] Re: ZFS kernel modules lack debug symbols
** Tags added: sts
** Tags added: linux

** Changed in: linux (Ubuntu)
   Importance: Undecided => High

--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1840704

Title:
  ZFS kernel modules lack debug symbols

Status in linux package in Ubuntu: In Progress

Bug description:

  The ZFS kernel modules aren't built with debug symbols, which causes
  problems for debugging and support. Patches are required in:

  1) linux kernel packaging, to add infrastructure to
     enable/build/strip/package debug symbols on DKMS builds. (This is
     sufficient with the zfs-linux now in Eoan.)

  2) zfs-linux and spl-linux, for the stable releases, which need a
     few patches to enable debug symbols (add the option
     './configure --enable-debuginfo' and
     '(ZFS|SPL)_DKMS_ENABLE_DEBUGINFO' to dkms.conf.)

  Initially submitting the kernel patchset for Unstable, for
  review/feedback. It backports nicely into B/D/E, should it be
  accepted; for X (which doesn't use DKMS builds) a simpler patch
  works for the moment (until it does).

  The zfs/spl-linux patches are ready, to be submitted once the
  approach used by the kernel package settles.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1840704/+subscriptions
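The dkms.conf hook described in (2) could look roughly like the following sketch. The option and variable names come from the bug text itself; everything else (defaults, structure) is hypothetical until the zfs-linux/spl-linux patches actually land:

```shell
# Hypothetical dkms.conf fragment (zfs-linux); names per the bug text,
# exact accepted form is whatever the packaging patches settle on.
# Debug symbols are opt-in via ZFS_DKMS_ENABLE_DEBUGINFO.
ZFS_DKMS_ENABLE_DEBUGINFO="${ZFS_DKMS_ENABLE_DEBUGINFO:-no}"
CONFIGURE_FLAGS=""
if [ "${ZFS_DKMS_ENABLE_DEBUGINFO}" = "yes" ]; then
    # passed through to the module build so objects keep debug info
    CONFIGURE_FLAGS="--enable-debuginfo"
fi
MAKE[0]="./configure ${CONFIGURE_FLAGS} && make"
```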
[Kernel-packages] [Bug 1840789] Re: bnx2x: fatal hardware error/reboot/tx timeout with LLDP enabled
** Tags added: sts

** Changed in: linux (Ubuntu Xenial)
   Importance: Undecided => High

** Changed in: linux (Ubuntu Bionic)
   Importance: Undecided => High

** Changed in: linux (Ubuntu Xenial)
   Importance: High => Critical

** Changed in: linux (Ubuntu Bionic)
   Importance: High => Critical

** Changed in: linux (Ubuntu Disco)
   Importance: Undecided => Critical

** Changed in: linux (Ubuntu Eoan)
   Importance: Undecided => Critical

--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1840789

Title:
  bnx2x: fatal hardware error/reboot/tx timeout with LLDP enabled

Status in linux package in Ubuntu: Fix Released
Status in linux source package in Xenial: In Progress
Status in linux source package in Bionic: In Progress
Status in linux source package in Disco: In Progress
Status in linux source package in Eoan: Fix Released

Bug description:

  [Impact]

  * The bnx2x driver may cause hardware faults (leading to
    panic/reboot) and other behaviors such as transmit timeouts, after
    commit 3968d38917eb ("bnx2x: Fix Multi-Cos.") was introduced.

  * This issue has been observed by a user shortly after starting
    docker & kubelet, with these adapters:
    - Broadcom NetXtreme II BCM57800 [14e4:168a] from Dell [1028:1f5c]
    - Broadcom NetXtreme II BCM57840 [14e4:16a1] from Dell [1028:1f79]

  * If options to ignore hardware faults are used (erst_disable=1
    hest_disable=1 ghes.disable=1), the system doesn't panic/reboot
    and continues on to time out on adapter stats, then on transmit,
    spewing some adapter firmware dumps, but the network interface is
    non-functional.

  * The issue only happened when LLDP is enabled on the network
    switches, and a crashdump shows the bnx2x driver stuck waiting for
    the firmware to complete the stop-traffic command in LLDP
    handling. The workaround used is to disable LLDP on the network
    switches/ports.

  * Analysis of the driver and firmware dumps didn't help
    significantly towards finding the root cause.

  * Upstream/mainline recently just reverted the patch, due to similar
    problem reports, while looking for the root cause/proper fix.

  [Test Case]

  * No reproducible test case was found outside the user's
    systems/cluster, where it is enough to start docker & kubelet and
    wait.

  * The user verified test kernels for Xenial and Bionic -- the
    problem does not happen; build-tested on Disco.

  [Regression Potential]

  * Users who significantly use/apply non-default traffic classes (tc)
    / classes of service (cos) might possibly see performance changes
    (if any at all) in such applications, but that is unclear now.

  * This is a recent revert upstream (v5.3-rc'ish), so there's a
    chance things might change in this area.

  * Nonetheless, the patch was authored by the driver vendor, and made
    its way into stable kernels (e.g., v5.2.8, which made it into
    Eoan/19.10 recently).

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1840789/+subscriptions
[Kernel-packages] [Bug 1794232] Re: Geneve tunnels don't work when ipv6 is disabled
A 4.4 test kernel with the fix backported is available at: https://people.canonical.com/~nivedita/geneve-xenial-test/ if anyone wishes to validate the 4.4 X solution. -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. https://bugs.launchpad.net/bugs/1794232 Title: Geneve tunnels don't work when ipv6 is disabled Status in linux package in Ubuntu: Fix Released Status in linux source package in Xenial: In Progress Status in linux source package in Bionic: Fix Committed Status in linux source package in Cosmic: Fix Committed Status in linux source package in Disco: Fix Released Bug description: SRU Justification Impact: Cannot create geneve tunnels if ipv6 is disabled dynamically. Fix: Fixed by upstream commit in v5.0: Commit: cf1c9ccba7308e48a68fa77f476287d9d614e4c7 "geneve: correctly handle ipv6.disable module parameter" Hence available in Disco and later; required in X,B,C. Testcase: 1. Boot with "ipv6.disable=1" 2. Then try and create a geneve tunnel using: # ovs-vsctl add-br br1 # ovs-vsctl add-port br1 geneve1 -- set interface geneve1 type=geneve options:remote_ip=192.168.x.z // ip of the other host Regression Potential: Low, only geneve tunnels when ipv6 dynamically disabled, current status is it doesn't work at all. Other Info: * Mainline commit msg includes reference to a fix for non-metadata tunnels (infrastructure is not yet in our tree prior to Disco), hence not being included at this time under this case. At this time, all geneve tunnels created as above are metadata-enabled. --- [Impact] When attempting to create a geneve tunnel on Ubuntu 16.04 Xenial, in an OS environment with open vswitch, where ipv6 has been disabled, the create fails with the error : “ovs-vsctl: Error detected while setting up 'geneve0': could not add network device geneve0 to ofproto (Address family not supported by protocol)." 
[Fix] There is an upstream commit for this in v5.0 mainline (and in Disco and later Ubuntu kernels): "geneve: correctly handle ipv6.disable module parameter" Commit: cf1c9ccba7308e48a68fa77f476287d9d614e4c7 This fix is needed on all our series prior to Disco and the v5.0 kernel: X, C, B. It is identical to the fix we implemented and tested with internally, but had not yet pushed upstream. [Test Case] (Best to do this on a kvm guest VM so as not to interfere with your system's networking) 1. On any Ubuntu Xenial kernel, disable ipv6. This example is shown with the 4.15.0-23-generic kernel (which differs slightly from 4.4.x in symptoms): - Edit /etc/default/grub to add the line: GRUB_CMDLINE_LINUX="ipv6.disable=1" - # update-grub - Reboot 2. Install OVS # apt install openvswitch-switch 3. Create a Geneve tunnel # ovs-vsctl add-br br1 # ovs-vsctl add-port br1 geneve1 -- set interface geneve1 type=geneve options:remote_ip=192.168.x.z (where remote_ip is the IP of the other host) You will see the following error message: "ovs-vsctl: Error detected while setting up 'geneve1'. See ovs-vswitchd log for details." In /var/log/openvswitch/ovs-vswitchd.log you will see: "2018-07-02T16:48:13.295Z|00026|dpif|WARN|system@ovs-system: failed to add geneve1 as port: Address family not supported by protocol" You will notice from the "ifconfig" output that the device genev_sys_6081 is not created. If you do not disable IPv6 (remove ipv6.disable=1 from /etc/default/grub + update-grub + reboot), the same 'ovs-vsctl add-port' command completes successfully. You can see that it is working properly by adding an IP to br1 and pinging each host. On kernel 4.4 (4.4.0-128-generic), the error message doesn't appear when using the 'ovs-vsctl add-port' command and no warning is shown in ovs-vswitchd.log, but the device genev_sys_6081 is likewise not created and the ping test won't work. With the fixed test kernel, the interfaces and the tunnel are created successfully.
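As a convenience when scripting the test case above, the "booted with ipv6.disable=1" precondition can be checked programmatically. This is a minimal sketch, not part of the bug report; the helper name and the use of /proc/cmdline are our own:

```python
def ipv6_disabled_on_cmdline(cmdline: str) -> bool:
    """True if the kernel command line contains ipv6.disable=1.

    `cmdline` is the space-separated parameter string found in
    /proc/cmdline on a running system.
    """
    return "ipv6.disable=1" in cmdline.split()

# On a live Linux system:
#   ipv6_disabled_on_cmdline(open("/proc/cmdline").read())
```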
[Regression Potential] * Low -- this affects the geneve driver only, and only when ipv6 is disabled; since geneve doesn't work at all in that case, this fix gets the tunnel up and running for the common case. [Other Info] * Analysis Geneve tunnels should work in either IPv4 or IPv6 environments, as a design and support principle. Currently, however, the implementation requires IPv6 support for metadata-based tunnels (which these geneve tunnels are), rather than allowing: a) ipv4 + metadata // whether ipv6 is compiled out or dynamically disabled b) ipv4 + metadata + ipv6 What enforces this in the current 4.4.0-x code when opening a Geneve tunnel is the following in geneve_open():

  bool ipv6 = geneve->remote.sa.sa_family == AF_INET6;
  bool metadata = geneve->collect_md;
  ...
  #if IS_ENABLED(CONFIG_IPV6)
  geneve->sock6 = NULL;
  if (ipv6 || metadata)
          ret = geneve_sock_add(geneve, true);
  #endif
  if (!ret && (!ipv6 || metadata))
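To make the geneve_open() logic above easier to follow, here is a rough Python model of the socket-selection behavior before and after the fix. This is our own sketch, not kernel code; the `fixed` flag encodes our reading of commit cf1c9ccba730 (the IPv6 socket failure is tolerated for metadata tunnels unless the tunnel is explicitly IPv6):

```python
AF_INET, AF_INET6 = 2, 10  # socket address family constants (Linux values)

def geneve_open_model(remote_family, collect_md, ipv6_available, fixed):
    """Return True if the tunnel device would come up.

    Mirrors the structure quoted above: an IPv6 socket is attempted
    whenever the tunnel is IPv6 or metadata-based; an IPv4 socket is
    attempted whenever the tunnel is IPv4 or metadata-based.
    """
    ipv6 = remote_family == AF_INET6
    metadata = collect_md
    if ipv6 or metadata:
        if not ipv6_available:       # geneve_sock_add(geneve, true) fails
            if not fixed or ipv6:    # unfixed: EAFNOSUPPORT aborts the open
                return False
    if (not ipv6) or metadata:
        return True                  # IPv4 socket succeeds
    return True                      # explicit IPv6 tunnel, sock6 was OK

# The reported failure: a metadata tunnel with IPv6 dynamically disabled.
# Unfixed kernels abort; fixed kernels fall back to IPv4.
```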
[Kernel-packages] [Bug 1820948] Re: i40e xps management broken when > 64 queues/cpus
Late update, but the original reporter did test the proposed kernel on systems able to reproduce the problem, and the tests were successful. We do not yet have a way of reproducing this on Xenial (i.e., any 4.4 kernel). I'm still leaving this open as an issue; we will keep trying to reproduce it, and once we can confirm/test, we will update and push an SRU for Xenial as well. -- https://bugs.launchpad.net/bugs/1820948 Title: i40e xps management broken when > 64 queues/cpus Status in linux package in Ubuntu: Fix Released Status in linux source package in Bionic: Fix Released Bug description: [Impact] Transmit packet steering (xps) settings don't work when the number of queues (cpus) is higher than 64. This is currently still an issue on the 4.15 kernel (Xenial -hwe and Bionic kernels). It was fixed in Intel's i40e driver version 2.7.11 and in 4.16-rc1 mainline Linux (i.e., Cosmic and Disco have the fix). Fix - The following commit fixes this issue (as identified by Lihong Yang in discussion with the Intel i40e team): "i40e: Fix the number of queues available to be mapped for use" Commit: bc6d33c8d93f520e97a8c6330b8910053d4f It requires the following commit as well: "i40e: Do not allow use more TC queue pairs than MSI-X vectors exist" Commit: 1563f2d2e01242f05dd523ffd56fe104bc1afd58 [Test Case] 1. Kernel version: Bionic/Xenial -hwe: any 4.15 kernel i40e driver version: 2.1.14-k Any system with > 64 CPUs 2. For any queue 0 - 63, you can read/set tx xps: echo > /sys/class/net/eth2/queues/tx-63/xps_cpus echo $?
0 cat /sys/class/net/eth2/queues/tx-63/xps_cpus 00,, But for any queue number > 63, we see this error: echo > /sys/class/net/eth2/queues/tx-64/xps_cpus echo: write error: Invalid argument cat /sys/class/net/eth2/queues/tx-64/xps_cpus cat: /sys/class/net/eth2/queues/tx-64/xps_cpus: Invalid argument To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1820948/+subscriptions
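For reference when working with xps_cpus as in the test case above, the sysfs value is a CPU bitmap rendered as comma-separated 32-bit hexadecimal groups, most significant group first. Below is a small helper to build such a mask; the function name is ours, and this is a sketch rather than Intel or kernel code:

```python
def cpus_to_sysfs_mask(cpus, ncpus):
    """Render a set of CPU numbers in the sysfs bitmap format that
    /sys/class/net/<dev>/queues/tx-<n>/xps_cpus expects:
    comma-separated 32-bit hex groups, most significant group first."""
    mask = 0
    for cpu in cpus:
        if not 0 <= cpu < ncpus:
            raise ValueError(f"cpu {cpu} out of range 0..{ncpus - 1}")
        mask |= 1 << cpu
    ngroups = max(1, (ncpus + 31) // 32)  # one group per 32 CPUs
    return ",".join(
        format((mask >> (32 * g)) & 0xFFFFFFFF, "08x")
        for g in range(ngroups - 1, -1, -1)
    )

# e.g. steering a queue to CPU 64 on a 96-CPU system:
#   cpus_to_sysfs_mask([64], 96) -> "00000001,00000000,00000000"
```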
[Kernel-packages] [Bug 1794232] Re: Geneve tunnels don't work when ipv6 is disabled
Bionic, Cosmic kernels successfully tested. I've updated the tags. ** Tags removed: verification-needed-bionic verification-needed-cosmic ** Tags added: verification-done-bionic verification-done-cosmic -- https://bugs.launchpad.net/bugs/1794232 Title: Geneve tunnels don't work when ipv6 is disabled Status in linux package in Ubuntu: Fix Released Status in linux source package in Xenial: In Progress Status in linux source package in Bionic: Fix Committed Status in linux source package in Cosmic: Fix Committed Status in linux source package in Disco: Fix Released
[Kernel-packages] [Bug 1794232] Re: Geneve tunnels don't work when ipv6 is disabled
** Tags added: sts -- https://bugs.launchpad.net/bugs/1794232 Title: Geneve tunnels don't work when ipv6 is disabled Status in linux package in Ubuntu: Fix Released Status in linux source package in Xenial: In Progress Status in linux source package in Bionic: Fix Committed Status in linux source package in Cosmic: Fix Committed Status in linux source package in Disco: Fix Released
[Kernel-packages] [Bug 1820948] Re: i40e xps management broken when > 64 queues/cpus
** Tags added: sts ** Tags removed: verification-needed-bionic ** Tags added: verification-done-bionic verification-done-cosmic ** Tags removed: verification-done-cosmic -- https://bugs.launchpad.net/bugs/1820948 Title: i40e xps management broken when > 64 queues/cpus Status in linux package in Ubuntu: Fix Released Status in linux source package in Bionic: Fix Released
[Kernel-packages] [Bug 1779756] Re: Intel XL710 - i40e driver does not work with kernel 4.15 (Ubuntu 18.04)
** Tags added: sts -- https://bugs.launchpad.net/bugs/1779756 Title: Intel XL710 - i40e driver does not work with kernel 4.15 (Ubuntu 18.04) Status in linux package in Ubuntu: Fix Released Status in linux source package in Bionic: Fix Released Status in linux source package in Cosmic: Fix Released Bug description: [Impact] The i40e driver can get stalled on tx timeouts. This can happen when DCB is enabled on the connected switch. It can also trigger a second situation, when a tx timeout occurs before the recovery of a previous timeout has completed due to CPU load, which is not handled correctly. This leads to networking delays, drops, and application timeouts and hangs. Note that the first tx timeout cause is just one of the ways to end up in the second situation. This issue was seen on a heavily loaded Kafka broker node running the 4.15.0-38-generic kernel on Xenial. Symptoms include messages in the kernel log of the form: --- [4733544.982116] i40e :18:00.1 eno2: tx_timeout: VSI_seid: 390, Q 6, NTC: 0x1a0, HWB: 0x66, NTU: 0x66, TAIL: 0x66, INT: 0x0 [4733544.982119] i40e :18:00.1 eno2: tx_timeout recovery level 1, hung_queue 6 With the test kernel provided in this LP bug, which had the following two commits compiled in, the problem has not been seen again; the kernel has been running successfully for several months: "i40e: Fix for Tx timeouts when interface is brought up if DCB is enabled" Commit: fa38e30ac73fbb01d7e5d0fd1b12d412fa3ac3ee "i40e: prevent overlapping tx_timeout recover" Commit: d5585b7b6846a6d0f9517afe57be3843150719da * The first commit is already in Disco, Cosmic * The second commit is already in Disco * Bionic needs both patches and Cosmic needs the second [Test Case] * We are considering the case of both issues above occurring. * Seen by the reporter on a Kafka broker node with heavy traffic. * Not easy to reproduce, as it requires something like the following example environment and heavy load: Kernel: 4.15.0-38-generic Network driver: i40e version: 2.1.14-k firmware-version: 6.00 0x800034e6 18.3.6 NIC: Intel 40Gb XL710 DCB enabled [Regression Potential] Low, as the first patch only impacts i40e DCB environments, and both have been running for several months in production-load testing successfully. --- Original Description Today the Ubuntu 16.04 LTS Enablement Stack has moved from kernel 4.13 to kernel 4.15.0-24-generic. On a "Dell PowerEdge R330" server with an "Intel Ethernet Converged Network Adapter X710-DA2" network adapter (driver i40e), the network card no longer works and permanently displays these three lines: [ 98.012098] i40e :01:00.0 enp1s0f0: tx_timeout: VSI_seid: 388, Q 8, NTC: 0x0, HWB: 0x0, NTU: 0x1, TAIL: 0x1, INT: 0x1 [ 98.012119] i40e :01:00.0 enp1s0f0: tx_timeout recovery level 11, hung_queue 8 [ 98.012125] i40e :01:00.0 enp1s0f0: tx_timeout recovery unsuccessful To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1779756/+subscriptions
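When triaging logs for reports like the two i40e bugs above, the tx_timeout lines can be parsed mechanically. This is a hedged sketch of our own; the regex is written against the messages quoted in this bug, not against the driver source:

```python
import re

# Example input (PCI address normally appears as e.g. "0000:18:00.1"):
#   i40e 0000:18:00.1 eno2: tx_timeout: VSI_seid: 390, Q 6, NTC: 0x1a0, ...
TX_TIMEOUT_RE = re.compile(
    r"i40e\s+\S*\s+(?P<iface>\S+): tx_timeout: VSI_seid: (?P<seid>\d+), "
    r"Q (?P<queue>\d+)"
)

def parse_tx_timeout(line):
    """Return (interface, vsi_seid, queue) for an i40e tx_timeout log
    line, or None if the line does not match."""
    m = TX_TIMEOUT_RE.search(line)
    if m is None:
        return None
    return m.group("iface"), int(m.group("seid")), int(m.group("queue"))
```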
[Kernel-packages] [Bug 1814095] Re: bnxt_en_po: TX timed out triggering Netdev Watchdog Timer
** Tags added: sts -- https://bugs.launchpad.net/bugs/1814095 Title: bnxt_en_po: TX timed out triggering Netdev Watchdog Timer Status in linux package in Ubuntu: Confirmed Status in linux source package in Xenial: Fix Released Bug description: [Impact] The bnxt_en_bpo driver experienced tx timeouts, causing the system to experience network stalls and fail to send data and heartbeat packets. The following 25Gb Broadcom NIC error was seen on Xenial running the 4.4.0-141-generic kernel on an amd64 host seeing moderate-heavy network traffic (just once): * The bnxt_en_bpo driver froze on a "TX timed out" error and triggered the Netdev Watchdog timer under load. * From the kernel log: "NETDEV WATCHDOG: eno2d1 (bnxt_en_bpo): transmit queue 0 timed out" See the attached kern.log excerpt file for the full error log. * Release = Xenial Kernel = 4.4.0-141-generic #167 eno2d1 = Product Name: Broadcom Adv. Dual 25Gb Ethernet * This caused the driver to reset in order to recover: "bnxt_en_bpo :19:00.1 eno2d1: TX timeout detected, starting reset task!" driver: bnxt_en_bpo version: 1.8.1 source: ubuntu/bnxt/bnxt.c: bnxt_tx_timeout() * The loss of connectivity and softirq stall caused other failures on the system. * The bnxt_en_bpo driver is the imported Broadcom driver pulled in to support newer Broadcom HW (specific boards), while the bnxt_en module continues to support the older HW. The current Linux upstream driver does not compile easily with the 4.4 kernel (too many changes). * This upstream bnxt_en driver fix is a likely solution: "bnxt_en: Fix TX timeout during netpoll" commit: 73f21c653f930f438d53eed29b5e4c65c8a0f906 This fix has not been applied to the bnxt_en_bpo driver version, but review of the code indicates that it is susceptible to the bug, and the fix would be reasonable. [Test Case] * Unfortunately, this is not easy to reproduce. Also, it is only seen on 4.4 kernels with newer Broadcom NICs supported by the bnxt_en_bpo driver. [Regression Potential] * The patch is restricted to the bpo driver, with very constrained scope - just the newest Broadcom NICs being used by the Xenial 4.4 kernel (as opposed to the hwe 4.15 etc. kernels, which have the fixed in-tree driver). * The patch is very small, and the backport is fairly minimal and simple. * The fix has been running in the in-tree driver in upstream mainline as well as in the Ubuntu in-tree driver; although the Broadcom driver has a lot of lower-level code that is different, this piece is still the same. To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1814095/+subscriptions
[Kernel-packages] [Bug 1779756] Re: Intel XL710 - i40e driver does not work with kernel 4.15 (Ubuntu 18.04)
** Description changed: - Today Ubuntu 16.04 LTS Enablement Stacks has moved from the Kernel 4.13 - to the Kernel 4.15.0-24-generic. + [Impact] + The i40e driver can get stalled on tx timeouts. This can happen when + DCB is enabled on the connected switch. This can also trigger a + second situation when a tx timeout occurs before the recovery of + a previous timeout has completed due to CPU load, which is not + handled correctly. This leads to networking delays, drops and + application timeouts and hangs. Note that the first tx timeout + cause is just one of the ways to end up in the second situation. + + This issue was seen on a heavily loaded Kafka broker node running + the 4.15.0-38-generic kernel on Xenial. + + Symptoms include messages in the kernel log of the form: + + --- + [4733544.982116] i40e :18:00.1 eno2: tx_timeout: VSI_seid: 390, Q 6, NTC: 0x1a0, HWB: 0x66, NTU: 0x66, TAIL: 0x66, INT: 0x0 + [4733544.982119] i40e :18:00.1 eno2: tx_timeout recovery level 1, hung_queue 6 + + + With the test kernel provided in this LP bug which had these + two commits compiled in, the problem has not been seen again, + and has been running successfully for several months: + + "i40e: Fix for Tx timeouts when interface is brought up if + DCB is enabled" + Commit: fa38e30ac73fbb01d7e5d0fd1b12d412fa3ac3ee + + "i40e: prevent overlapping tx_timeout recover" + Commit: d5585b7b6846a6d0f9517afe57be3843150719da + + * The first commit is already in Disco, Cosmic + * The second commit is already in Disco + * Bionic needs both patches and Cosmic needs the second + + [Test Case] + * We are considering the case of both issues above occurring. + * Seen by reporter on a Kafka broker node with heavy traffic. 
+ * Not easy to reproduce as it requires something like the + following example environment and heavy load: + + Kernel: 4.15.0-38-generic + Network driver: i40e + version: 2.1.14-k + firmware-version: 6.00 0x800034e6 18.3.6 + NIC: Intel 40Gb XL710 + DCB enabled + + + [Regression Potential] + Low, as the first only impacts i40e DCB environment, and has + been running for several months in production-load testing + successfully. + + + --- Original Description + Today Ubuntu 16.04 LTS Enablement Stacks has moved from the Kernel 4.13 to the Kernel 4.15.0-24-generic. On a "Dell PowerEdge R330" server with a network adapter "Intel Ethernet Converged Network Adapter X710-DA2" (driver i40e) the network card no longer works and permanently displays these three lines : - [ 98.012098] i40e :01:00.0 enp1s0f0: tx_timeout: VSI_seid: 388, Q 8, NTC: 0x0, HWB: 0x0, NTU: 0x1, TAIL: 0x1, INT: 0x1 [ 98.012119] i40e :01:00.0 enp1s0f0: tx_timeout recovery level 11, hung_queue 8 [ 98.012125] i40e :01:00.0 enp1s0f0: tx_timeout recovery unsuccessful -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. https://bugs.launchpad.net/bugs/1779756 Title: Intel XL710 - i40e driver does not work with kernel 4.15 (Ubuntu 18.04) Status in linux package in Ubuntu: Fix Released Status in linux source package in Bionic: In Progress Status in linux source package in Cosmic: In Progress Bug description: [Impact] The i40e driver can get stalled on tx timeouts. This can happen when DCB is enabled on the connected switch. This can also trigger a second situation when a tx timeout occurs before the recovery of a previous timeout has completed due to CPU load, which is not handled correctly. This leads to networking delays, drops and application timeouts and hangs. Note that the first tx timeout cause is just one of the ways to end up in the second situation. 
This issue was seen on a heavily loaded Kafka broker node running the 4.15.0-38-generic kernel on Xenial. Symptoms include messages in the kernel log of the form: --- [4733544.982116] i40e :18:00.1 eno2: tx_timeout: VSI_seid: 390, Q 6, NTC: 0x1a0, HWB: 0x66, NTU: 0x66, TAIL: 0x66, INT: 0x0 [4733544.982119] i40e :18:00.1 eno2: tx_timeout recovery level 1, hung_queue 6 With the test kernel provided in this LP bug which had these two commits compiled in, the problem has not been seen again, and has been running successfully for several months: "i40e: Fix for Tx timeouts when interface is brought up if DCB is enabled" Commit: fa38e30ac73fbb01d7e5d0fd1b12d412fa3ac3ee "i40e: prevent overlapping tx_timeout recover" Commit: d5585b7b6846a6d0f9517afe57be3843150719da * The first commit is already in Disco, Cosmic * The second commit is already in Disco * Bionic needs both patches and Cosmic needs the second [Test Case] * We are considering the case of both issues above occurring. * Seen by reporter on a Kafka broker node with heavy traffic. * Not easy to reproduce as it requires something like the following example environment and heavy load: Kernel:
[Kernel-packages] [Bug 1779756] Re: Intel XL710 - i40e driver does not work with kernel 4.15 (Ubuntu 18.04)
** Tags added: bionic cosmic -- https://bugs.launchpad.net/bugs/1779756 Title: Intel XL710 - i40e driver does not work with kernel 4.15 (Ubuntu 18.04) Status in linux package in Ubuntu: Fix Released Status in linux source package in Bionic: In Progress Status in linux source package in Cosmic: In Progress
[Kernel-packages] [Bug 1779756] Re: Intel XL710 - i40e driver does not work with kernel 4.15 (Ubuntu 18.04)
Submitted SRU request -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. https://bugs.launchpad.net/bugs/1779756 Title: Intel XL710 - i40e driver does not work with kernel 4.15 (Ubuntu 18.04) Status in linux package in Ubuntu: Fix Released Status in linux source package in Bionic: In Progress Status in linux source package in Cosmic: In Progress Bug description: [Impact] The i40e driver can get stalled on tx timeouts. This can happen when DCB is enabled on the connected switch. This can also trigger a second situation when a tx timeout occurs before the recovery of a previous timeout has completed due to CPU load, which is not handled correctly. This leads to networking delays, drops and application timeouts and hangs. Note that the first tx timeout cause is just one of the ways to end up in the second situation. This issue was seen on a heavily loaded Kafka broker node running the 4.15.0-38-generic kernel on Xenial. Symptoms include messages in the kernel log of the form: --- [4733544.982116] i40e :18:00.1 eno2: tx_timeout: VSI_seid: 390, Q 6, NTC: 0x1a0, HWB: 0x66, NTU: 0x66, TAIL: 0x66, INT: 0x0 [4733544.982119] i40e :18:00.1 eno2: tx_timeout recovery level 1, hung_queue 6 With the test kernel provided in this LP bug which had these two commits compiled in, the problem has not been seen again, and has been running successfully for several months: "i40e: Fix for Tx timeouts when interface is brought up if DCB is enabled" Commit: fa38e30ac73fbb01d7e5d0fd1b12d412fa3ac3ee "i40e: prevent overlapping tx_timeout recover" Commit: d5585b7b6846a6d0f9517afe57be3843150719da * The first commit is already in Disco, Cosmic * The second commit is already in Disco * Bionic needs both patches and Cosmic needs the second [Test Case] * We are considering the case of both issues above occurring. * Seen by reporter on a Kafka broker node with heavy traffic. 
* Not easy to reproduce as it requires something like the following example environment and heavy load: Kernel: 4.15.0-38-generic Network driver: i40e version: 2.1.14-k firmware-version: 6.00 0x800034e6 18.3.6 NIC: Intel 40Gb XL710 DCB enabled [Regression Potential] Low, as the first only impacts i40e DCB environment, and has been running for several months in production-load testing successfully. --- Original Description Today Ubuntu 16.04 LTS Enablement Stacks has moved from the Kernel 4.13 to the Kernel 4.15.0-24-generic. On a "Dell PowerEdge R330" server with a network adapter "Intel Ethernet Converged Network Adapter X710-DA2" (driver i40e) the network card no longer works and permanently displays these three lines : [ 98.012098] i40e :01:00.0 enp1s0f0: tx_timeout: VSI_seid: 388, Q 8, NTC: 0x0, HWB: 0x0, NTU: 0x1, TAIL: 0x1, INT: 0x1 [ 98.012119] i40e :01:00.0 enp1s0f0: tx_timeout recovery level 11, hung_queue 8 [ 98.012125] i40e :01:00.0 enp1s0f0: tx_timeout recovery unsuccessful To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1779756/+subscriptions -- Mailing list: https://launchpad.net/~kernel-packages Post to : kernel-packages@lists.launchpad.net Unsubscribe : https://launchpad.net/~kernel-packages More help : https://help.launchpad.net/ListHelp
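The tx_timeout log lines quoted above carry the queue number and ring pointers (NTC/HWB/NTU/TAIL) in a fixed format. As an illustrative aid (not part of the SRU), a small shell helper can pull those fields out so repeated occurrences can be compared; the helper name and field parsing are assumptions based only on the log format shown above.

```shell
#!/bin/sh
# Illustrative helper (not from the bug report): extract the queue number and
# ring pointers from an i40e tx_timeout log line, using the driver's own
# "Q <n>, NTC: <hex>, HWB: <hex>, ..." format quoted above.
parse_tx_timeout() {
    line=$1
    q=$(printf '%s\n' "$line"   | sed -n 's/.*Q \([0-9]*\),.*/\1/p')
    ntc=$(printf '%s\n' "$line" | sed -n 's/.*NTC: \(0x[0-9a-fA-F]*\),.*/\1/p')
    hwb=$(printf '%s\n' "$line" | sed -n 's/.*HWB: \(0x[0-9a-fA-F]*\),.*/\1/p')
    printf 'queue=%s ntc=%s hwb=%s\n' "$q" "$ntc" "$hwb"
}

parse_tx_timeout 'i40e :18:00.1 eno2: tx_timeout: VSI_seid: 390, Q 6, NTC: 0x1a0, HWB: 0x66, NTU: 0x66, TAIL: 0x66, INT: 0x0'
# prints: queue=6 ntc=0x1a0 hwb=0x66
```

Comparing these pointer values across successive timeouts (e.g. whether NTC keeps lagging the head-write-back value) is one way to see whether the queue stayed hung through the recovery attempts described above.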
[Kernel-packages] [Bug 1820948] [NEW] i40e xps management broken when > 64 queues/cpus
Public bug reported: [Impact] Transmit packet steering (xps) settings don't work when the number of queues (cpus) is higher than 64. This is currently still an issue on the 4.15 kernel (Xenial -hwe and Bionic kernels). It was fixed in Intel's i40e driver version 2.7.11 and in 4.16-rc1 mainline Linux (i.e. Cosmic, Disco have fix). Fix - The following commit fixes this issue (as identified by Lihong Yang in discussion with Intel i40e team): "i40e: Fix the number of queues available to be mapped for use" Commit: bc6d33c8d93f520e97a8c6330b8910053d4f [Test Case] 1. Kernel version: Bionic/Xenial -hwe: any 4.15 kernel i40e driver version: 2.1.14-k Any system with > 64 CPUs 2. For any queue 0 - 63, you can read/set tx xps: echo > /sys/class/net/eth2/queues/tx-63/xps_cpus echo $? 0 cat /sys/class/net/eth2/queues/tx-63/xps_cpus 00,, But for any queue number > 63, we see this error: echo > /sys/class/net/eth2/queues/tx-64/xps_cpus echo: write error: Invalid argument cat /sys/class/net/eth2/queues/tx-64/xps_cpus cat: /sys/class/net/eth2/queues/tx-64/xps_cpus: Invalid argument ** Affects: linux (Ubuntu) Importance: High Status: Confirmed ** Affects: linux (Ubuntu Bionic) Importance: High Assignee: Nivedita Singhvi (niveditasinghvi) Status: Confirmed ** Tags: bionic ** Also affects: linux (Ubuntu Bionic) Importance: Undecided Status: New ** Changed in: linux (Ubuntu) Status: New => Confirmed ** Changed in: linux (Ubuntu Bionic) Status: New => Confirmed ** Changed in: linux (Ubuntu) Importance: Undecided => High ** Changed in: linux (Ubuntu Bionic) Importance: Undecided => High ** Changed in: linux (Ubuntu Bionic) Assignee: (unassigned) => Nivedita Singhvi (niveditasinghvi) -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. 
https://bugs.launchpad.net/bugs/1820948 Title: i40e xps management broken when > 64 queues/cpus Status in linux package in Ubuntu: Confirmed Status in linux source package in Bionic: Confirmed Bug description: [Impact] Transmit packet steering (xps) settings don't work when the number of queues (cpus) is higher than 64. This is currently still an issue on the 4.15 kernel (Xenial -hwe and Bionic kernels). It was fixed in Intel's i40e driver version 2.7.11 and in 4.16-rc1 mainline Linux (i.e. Cosmic, Disco have fix). Fix - The following commit fixes this issue (as identified by Lihong Yang in discussion with Intel i40e team): "i40e: Fix the number of queues available to be mapped for use" Commit: bc6d33c8d93f520e97a8c6330b8910053d4f [Test Case] 1. Kernel version: Bionic/Xenial -hwe: any 4.15 kernel i40e driver version: 2.1.14-k Any system with > 64 CPUs 2. For any queue 0 - 63, you can read/set tx xps: echo > /sys/class/net/eth2/queues/tx-63/xps_cpus echo $? 0 cat /sys/class/net/eth2/queues/tx-63/xps_cpus 00,, But for any queue number > 63, we see this error: echo > /sys/class/net/eth2/queues/tx-64/xps_cpus echo: write error: Invalid argument cat /sys/class/net/eth2/queues/tx-64/xps_cpus cat: /sys/class/net/eth2/queues/tx-64/xps_cpus: Invalid argument To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1820948/+subscriptions -- Mailing list: https://launchpad.net/~kernel-packages Post to : kernel-packages@lists.launchpad.net Unsubscribe : https://launchpad.net/~kernel-packages More help : https://help.launchpad.net/ListHelp
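The xps_cpus files written in the test case above take a hexadecimal CPU bitmask, formatted as comma-separated 32-bit words with the most significant word first; systems with more than 64 CPUs therefore need three or more words. A minimal sketch of how such a mask is built (the helper name is an assumption, not from the bug report):

```shell
#!/bin/sh
# Illustrative: build the hex CPU bitmask that a write to
# /sys/class/net/<dev>/queues/tx-<N>/xps_cpus expects, pinning a single CPU.
# Masks are comma-separated 32-bit words, most significant word first.
cpu_to_xps_mask() {
    cpu=$1
    word=$((cpu / 32))              # which 32-bit word holds this CPU's bit
    bit=$((cpu % 32))
    mask=$(printf '%x' $((1 << bit)))
    while [ "$word" -gt 0 ]; do     # pad the lower words with zeroes
        mask="$mask,00000000"
        word=$((word - 1))
    done
    printf '%s\n' "$mask"
}

cpu_to_xps_mask 6    # prints: 40
cpu_to_xps_mask 65   # prints: 2,00000000,00000000
```

On an unfixed 4.15 kernel, writing such a mask to tx-64 or higher (e.g. `echo "$(cpu_to_xps_mask 65)" > /sys/class/net/eth2/queues/tx-64/xps_cpus`) fails with "Invalid argument" regardless of the mask's contents, which is the symptom shown above.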
[Kernel-packages] [Bug 1820948] Re: i40e xps management broken when > 64 queues/cpus
It's been reported by an external reporter and reproduced internally. -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. https://bugs.launchpad.net/bugs/1820948 Title: i40e xps management broken when > 64 queues/cpus Status in linux package in Ubuntu: Confirmed Status in linux source package in Bionic: Confirmed Bug description: [Impact] Transmit packet steering (xps) settings don't work when the number of queues (cpus) is higher than 64. This is currently still an issue on the 4.15 kernel (Xenial -hwe and Bionic kernels). It was fixed in Intel's i40e driver version 2.7.11 and in 4.16-rc1 mainline Linux (i.e. Cosmic, Disco have fix). Fix - The following commit fixes this issue (as identified by Lihong Yang in discussion with Intel i40e team): "i40e: Fix the number of queues available to be mapped for use" Commit: bc6d33c8d93f520e97a8c6330b8910053d4f [Test Case] 1. Kernel version: Bionic/Xenial -hwe: any 4.15 kernel i40e driver version: 2.1.14-k Any system with > 64 CPUs 2. For any queue 0 - 63, you can read/set tx xps: echo > /sys/class/net/eth2/queues/tx-63/xps_cpus echo $? 0 cat /sys/class/net/eth2/queues/tx-63/xps_cpus 00,, But for any queue number > 63, we see this error: echo > /sys/class/net/eth2/queues/tx-64/xps_cpus echo: write error: Invalid argument cat /sys/class/net/eth2/queues/tx-64/xps_cpus cat: /sys/class/net/eth2/queues/tx-64/xps_cpus: Invalid argument To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1820948/+subscriptions -- Mailing list: https://launchpad.net/~kernel-packages Post to : kernel-packages@lists.launchpad.net Unsubscribe : https://launchpad.net/~kernel-packages More help : https://help.launchpad.net/ListHelp
[Kernel-packages] [Bug 1814095] Re: bnxt_en_po: TX timed out triggering Netdev Watchdog Timer
Just briefly wanted to say that this is one we've discussed at length -- we may not be able to get someone who has the right NIC to test with it in time. I'm sanity checking the kernel, but that is not exercising the key change here. If we could assume verification-done for our purposes here, that might be needed. -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. https://bugs.launchpad.net/bugs/1814095 Title: bnxt_en_po: TX timed out triggering Netdev Watchdog Timer Status in linux package in Ubuntu: Confirmed Status in linux source package in Xenial: Fix Committed Bug description: [Impact] The bnxt_en_bpo driver experienced tx timeouts causing the system to experience network stalls and fail to send data and heartbeat packets. The following 25Gb Broadcom NIC error was seen on Xenial running the 4.4.0-141-generic kernel on an amd64 host seeing moderate-heavy network traffic (just once): * The bnxt_en_po driver froze on a "TX timed out" error and triggered the Netdev Watchdog timer under load. * From kernel log: "NETDEV WATCHDOG: eno2d1 (bnxt_en_bpo): transmit queue 0 timed out" See attached kern.log excerpt file for full excerpt of error log. * Release = Xenial Kernel = 4.4.0-141-generic #167 eno2d1 = Product Name: Broadcom Adv. Dual 25Gb Ethernet * This caused the driver to reset in order to recover: "bnxt_en_bpo :19:00.1 eno2d1: TX timeout detected, starting reset task!" driver: bnxt_en_bpo version: 1.8.1 source: ubuntu/bnxt/bnxt.c: bnxt_tx_timeout() * The loss of connectivity and softirq stall caused other failures on the system. * The bnxt_en_po driver is the imported Broadcom driver pulled in to support newer Broadcom HW (specific boards) while the bnx_en module continues to support the older HW. The current Linux upstream driver does not compile easily with the 4.4 kernel (too many changes). 
* This upstream and bnxt_en driver fix is a likely solution: "bnxt_en: Fix TX timeout during netpoll" commit: 73f21c653f930f438d53eed29b5e4c65c8a0f906 This fix has not been applied to the bnxt_en_po driver version, but review of the code indicates that it is susceptible to the bug, and the fix would be reasonable. [Test Case] * Unfortunately, this is not easy to reproduce. Also, it is only seen on 4.4 kernels with newer Broadcom NICs supported by the bnxt_en_bpo driver. [Regression Potential] * The patch is restricted to the bpo driver, with very constrained scope - just the newest Broadcom NICs being used by the Xenial 4.4 kernel (as opposed to the hwe 4.15 etc. kernels, which would have the in-tree fixed driver). * The patch is very small and backport is fairly minimal and simple. * The fix has been running on the in-tree driver in upstream mainline as well as the Ubuntu Linux in-tree driver, although the Broadcom driver has a lot of lower level code that is different, this piece is still the same. To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1814095/+subscriptions -- Mailing list: https://launchpad.net/~kernel-packages Post to : kernel-packages@lists.launchpad.net Unsubscribe : https://launchpad.net/~kernel-packages More help : https://help.launchpad.net/ListHelp
[Kernel-packages] [Bug 1820948] Re: i40e xps management broken when > 64 queues/cpus
Will be submitting SRU request early next week; trying to get it into this next kernel release cycle. ** Changed in: linux (Ubuntu) Assignee: (unassigned) => Nivedita Singhvi (niveditasinghvi) ** Changed in: linux (Ubuntu Bionic) Status: Confirmed => In Progress ** Changed in: linux (Ubuntu) Status: Confirmed => In Progress -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. https://bugs.launchpad.net/bugs/1820948 Title: i40e xps management broken when > 64 queues/cpus Status in linux package in Ubuntu: In Progress Status in linux source package in Bionic: In Progress Bug description: [Impact] Transmit packet steering (xps) settings don't work when the number of queues (cpus) is higher than 64. This is currently still an issue on the 4.15 kernel (Xenial -hwe and Bionic kernels). It was fixed in Intel's i40e driver version 2.7.11 and in 4.16-rc1 mainline Linux (i.e. Cosmic, Disco have fix). Fix - The following commit fixes this issue (as identified by Lihong Yang in discussion with Intel i40e team): "i40e: Fix the number of queues available to be mapped for use" Commit: bc6d33c8d93f520e97a8c6330b8910053d4f [Test Case] 1. Kernel version: Bionic/Xenial -hwe: any 4.15 kernel i40e driver version: 2.1.14-k Any system with > 64 CPUs 2. For any queue 0 - 63, you can read/set tx xps: echo > /sys/class/net/eth2/queues/tx-63/xps_cpus echo $? 
0 cat /sys/class/net/eth2/queues/tx-63/xps_cpus 00,, But for any queue number > 63, we see this error: echo > /sys/class/net/eth2/queues/tx-64/xps_cpus echo: write error: Invalid argument cat /sys/class/net/eth2/queues/tx-64/xps_cpus cat: /sys/class/net/eth2/queues/tx-64/xps_cpus: Invalid argument To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1820948/+subscriptions -- Mailing list: https://launchpad.net/~kernel-packages Post to : kernel-packages@lists.launchpad.net Unsubscribe : https://launchpad.net/~kernel-packages More help : https://help.launchpad.net/ListHelp
[Kernel-packages] [Bug 1814095] Re: bnxt_en_po: TX timed out triggering Netdev Watchdog Timer
I am not sure we could deterministically provoke the issue. At the very least, to ensure no other regression was introduced, I would run it under heavy network load. The environment in question which saw the issue had network load, contention for cpus and several other issues occur. The basic environment is:

1. For any 25Gb NIC/chipset that requires the 4.4 bnxt_en_bpo driver, set its 2 ports/interfaces up in bonding mode as follows:

   bond-lacp-rate fast
   bond-master bond0
   bond-miimon 100
   bond-mode 802.3ad
   bond-xmit-hash-policy layer3+4
   mtu 9000

2. Run any heavy TCP network load test over the systems (e.g. iperf, netperf, file transfer, etc.)

3. Theoretically, it would appear that if the number of tx ring descriptors were lower, then that would be more likely to hit this (not successfully proven by testing here), but you can lower it and see if that helps:

   # ethtool -G eno49 tx 128   // for example

I am not sure if that helps, Scott. I'll try to work up more specific steps but I cannot guarantee you will see the issue. -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. https://bugs.launchpad.net/bugs/1814095 Title: bnxt_en_po: TX timed out triggering Netdev Watchdog Timer Status in linux package in Ubuntu: Confirmed Status in linux source package in Xenial: Fix Committed Bug description: [Impact] The bnxt_en_bpo driver experienced tx timeouts causing the system to experience network stalls and fail to send data and heartbeat packets. The following 25Gb Broadcom NIC error was seen on Xenial running the 4.4.0-141-generic kernel on an amd64 host seeing moderate-heavy network traffic (just once): * The bnxt_en_po driver froze on a "TX timed out" error and triggered the Netdev Watchdog timer under load. * From kernel log: "NETDEV WATCHDOG: eno2d1 (bnxt_en_bpo): transmit queue 0 timed out" See attached kern.log excerpt file for full excerpt of error log. 
* Release = Xenial Kernel = 4.4.0-141-generic #167 eno2d1 = Product Name: Broadcom Adv. Dual 25Gb Ethernet * This caused the driver to reset in order to recover: "bnxt_en_bpo :19:00.1 eno2d1: TX timeout detected, starting reset task!" driver: bnxt_en_bpo version: 1.8.1 source: ubuntu/bnxt/bnxt.c: bnxt_tx_timeout() * The loss of connectivity and softirq stall caused other failures on the system. * The bnxt_en_po driver is the imported Broadcom driver pulled in to support newer Broadcom HW (specific boards) while the bnx_en module continues to support the older HW. The current Linux upstream driver does not compile easily with the 4.4 kernel (too many changes). * This upstream and bnxt_en driver fix is a likely solution: "bnxt_en: Fix TX timeout during netpoll" commit: 73f21c653f930f438d53eed29b5e4c65c8a0f906 This fix has not been applied to the bnxt_en_po driver version, but review of the code indicates that it is susceptible to the bug, and the fix would be reasonable. [Test Case] * Unfortunately, this is not easy to reproduce. Also, it is only seen on 4.4 kernels with newer Broadcom NICs supported by the bnxt_en_bpo driver. [Regression Potential] * The patch is restricted to the bpo driver, with very constrained scope - just the newest Broadcom NICs being used by the Xenial 4.4 kernel (as opposed to the hwe 4.15 etc. kernels, which would have the in-tree fixed driver). * The patch is very small and backport is fairly minimal and simple. * The fix has been running on the in-tree driver in upstream mainline as well as the Ubuntu Linux in-tree driver, although the Broadcom driver has a lot of lower level code that is different, this piece is still the same. 
To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1814095/+subscriptions -- Mailing list: https://launchpad.net/~kernel-packages Post to : kernel-packages@lists.launchpad.net Unsubscribe : https://launchpad.net/~kernel-packages More help : https://help.launchpad.net/ListHelp
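For concreteness, the bonding options listed in the reproduction steps above could be written as an /etc/network/interfaces fragment like the following; the interface names (eno49/eno50), bond0, and the address are placeholders assumed for illustration, not taken from the report.

```
# Hypothetical /etc/network/interfaces fragment reconstructing the bonding
# setup from step 1 of the reproduction environment above.
auto eno49
iface eno49 inet manual
    bond-master bond0

auto eno50
iface eno50 inet manual
    bond-master bond0

auto bond0
iface bond0 inet static
    address 192.168.10.2
    netmask 255.255.255.0
    bond-slaves eno49 eno50
    bond-mode 802.3ad
    bond-miimon 100
    bond-lacp-rate fast
    bond-xmit-hash-policy layer3+4
    mtu 9000
```

With both slave interfaces enslaved in 802.3ad mode, any heavy TCP load test (iperf, netperf, bulk file transfer) over bond0 matches step 2 of the environment described above.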
[Kernel-packages] [Bug 1820948] Re: i40e xps management broken when > 64 queues/cpus
Submitted patches for SRU. ** Description changed: [Impact] Transmit packet steering (xps) settings don't work when the number of queues (cpus) is higher than 64. This is currently still an issue on the 4.15 kernel (Xenial -hwe - and Bionic kernels). + and Bionic kernels). It was fixed in Intel's i40e driver version 2.7.11 and in 4.16-rc1 mainline Linux (i.e. Cosmic, Disco have fix). Fix - The following commit fixes this issue (as identified by Lihong Yang in discussion with Intel i40e team): "i40e: Fix the number of queues available to be mapped for use" Commit: bc6d33c8d93f520e97a8c6330b8910053d4f + It requires the following commit as well: + + i40e: Do not allow use more TC queue pairs than MSI-X vectors exist + Commit: 1563f2d2e01242f05dd523ffd56fe104bc1afd58 + [Test Case] 1. Kernel version: Bionic/Xenial -hwe: any 4.15 kernel -i40e driver version: 2.1.14-k -Any system with > 64 CPUs + i40e driver version: 2.1.14-k + Any system with > 64 CPUs 2. For any queue 0 - 63, you can read/set tx xps: - + echo > /sys/class/net/eth2/queues/tx-63/xps_cpus echo $? 0 cat /sys/class/net/eth2/queues/tx-63/xps_cpus - 00,, + 00,, - But for any queue number > 63, we see this error: + But for any queue number > 63, we see this error: echo > /sys/class/net/eth2/queues/tx-64/xps_cpus echo: write error: Invalid argument cat /sys/class/net/eth2/queues/tx-64/xps_cpus cat: /sys/class/net/eth2/queues/tx-64/xps_cpus: Invalid argument -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. https://bugs.launchpad.net/bugs/1820948 Title: i40e xps management broken when > 64 queues/cpus Status in linux package in Ubuntu: In Progress Status in linux source package in Bionic: In Progress Bug description: [Impact] Transmit packet steering (xps) settings don't work when the number of queues (cpus) is higher than 64. This is currently still an issue on the 4.15 kernel (Xenial -hwe and Bionic kernels). 
It was fixed in Intel's i40e driver version 2.7.11 and in 4.16-rc1 mainline Linux (i.e. Cosmic, Disco have fix). Fix - The following commit fixes this issue (as identified by Lihong Yang in discussion with Intel i40e team): "i40e: Fix the number of queues available to be mapped for use" Commit: bc6d33c8d93f520e97a8c6330b8910053d4f It requires the following commit as well: i40e: Do not allow use more TC queue pairs than MSI-X vectors exist Commit: 1563f2d2e01242f05dd523ffd56fe104bc1afd58 [Test Case] 1. Kernel version: Bionic/Xenial -hwe: any 4.15 kernel i40e driver version: 2.1.14-k Any system with > 64 CPUs 2. For any queue 0 - 63, you can read/set tx xps: echo > /sys/class/net/eth2/queues/tx-63/xps_cpus echo $? 0 cat /sys/class/net/eth2/queues/tx-63/xps_cpus 00,, But for any queue number > 63, we see this error: echo > /sys/class/net/eth2/queues/tx-64/xps_cpus echo: write error: Invalid argument cat /sys/class/net/eth2/queues/tx-64/xps_cpus cat: /sys/class/net/eth2/queues/tx-64/xps_cpus: Invalid argument To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1820948/+subscriptions -- Mailing list: https://launchpad.net/~kernel-packages Post to : kernel-packages@lists.launchpad.net Unsubscribe : https://launchpad.net/~kernel-packages More help : https://help.launchpad.net/ListHelp
[Kernel-packages] [Bug 1820948] Re: i40e xps management broken when > 64 queues/cpus
I'm still trying to confirm this for Xenial. -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. https://bugs.launchpad.net/bugs/1820948 Title: i40e xps management broken when > 64 queues/cpus Status in linux package in Ubuntu: In Progress Status in linux source package in Bionic: Fix Committed Bug description: [Impact] Transmit packet steering (xps) settings don't work when the number of queues (cpus) is higher than 64. This is currently still an issue on the 4.15 kernel (Xenial -hwe and Bionic kernels). It was fixed in Intel's i40e driver version 2.7.11 and in 4.16-rc1 mainline Linux (i.e. Cosmic, Disco have fix). Fix - The following commit fixes this issue (as identified by Lihong Yang in discussion with Intel i40e team): "i40e: Fix the number of queues available to be mapped for use" Commit: bc6d33c8d93f520e97a8c6330b8910053d4f It requires the following commit as well: i40e: Do not allow use more TC queue pairs than MSI-X vectors exist Commit: 1563f2d2e01242f05dd523ffd56fe104bc1afd58 [Test Case] 1. Kernel version: Bionic/Xenial -hwe: any 4.15 kernel i40e driver version: 2.1.14-k Any system with > 64 CPUs 2. For any queue 0 - 63, you can read/set tx xps: echo > /sys/class/net/eth2/queues/tx-63/xps_cpus echo $? 0 cat /sys/class/net/eth2/queues/tx-63/xps_cpus 00,, But for any queue number > 63, we see this error: echo > /sys/class/net/eth2/queues/tx-64/xps_cpus echo: write error: Invalid argument cat /sys/class/net/eth2/queues/tx-64/xps_cpus cat: /sys/class/net/eth2/queues/tx-64/xps_cpus: Invalid argument To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1820948/+subscriptions -- Mailing list: https://launchpad.net/~kernel-packages Post to : kernel-packages@lists.launchpad.net Unsubscribe : https://launchpad.net/~kernel-packages More help : https://help.launchpad.net/ListHelp
[Kernel-packages] [Bug 1794232] Re: Geneve tunnels don't work when ipv6 is disabled
** Description changed: [Impact] When attempting to create a geneve tunnel on Ubuntu 16.04 Xenial, in an OS environment with open vswitch, where ipv6 has been disabled, the create fails with the error: “ovs-vsctl: Error detected while setting up 'geneve0': could not add network device geneve0 to ofproto (Address family not supported by protocol)." [Fix] There is an upstream commit for this in v5.0 mainline (and in Disco and later Ubuntu kernels). "geneve: correctly handle ipv6.disable module parameter" Commit: cf1c9ccba7308e48a68fa77f476287d9d614e4c7 This fix is needed on all our series prior to Disco and the v5.0 kernel: X, C, B. It is identical to the fix we implemented and tested internally with, but had not pushed upstream yet. [Test Case] (Best to do this on a kvm guest VM so as not to interfere with your system's networking) 1. On any Ubuntu Xenial kernel, disable ipv6. This example is shown with the 4.15.0-23-generic kernel (which differs slightly from 4.4.x in symptoms): - Edit /etc/default/grub to add the line: GRUB_CMDLINE_LINUX="ipv6.disable=1" - # update-grub - Reboot 2. Install OVS # apt install openvswitch-switch 3. Create a Geneve tunnel # ovs-vsctl add-br br1 # ovs-vsctl add-port br1 geneve1 -- set interface geneve1 type=geneve options:remote_ip=192.168.x.z (where remote_ip is the IP of the other host) You will see the following error message: "ovs-vsctl: Error detected while setting up 'geneve1'. See ovs-vswitchd log for details." From /var/log/openvswitch/ovs-vswitchd.log you will see: "2018-07-02T16:48:13.295Z|00026|dpif|WARN|system@ovs-system: failed to add geneve1 as port: Address family not supported by protocol" You will notice from the "ifconfig" output that the device genev_sys_6081 is not created. If you do not disable IPv6 (remove ipv6.disable=1 from /etc/default/grub + update-grub + reboot), the same 'ovs-vsctl add-port' command completes successfully. 
You can see that it is working properly by adding an IP to br1 and pinging each host. On kernel 4.4 (4.4.0-128-generic), the error message doesn't happen using the 'ovs-vsctl add-port' command, no warning is shown in ovs-vswitchd.log, but the device genev_sys_6081 is also not created and the ping test won't work. With the fixed test kernel, the interfaces and tunnel are created successfully. [Regression Potential] - * Low -- affects the geneve driver only, and when ipv6 is - disabled, and since it doesn't work in that case at all, - this fix gets the tunnel up and running for the common case. - + * Low -- affects the geneve driver only, and when ipv6 is + disabled, and since it doesn't work in that case at all, + this fix gets the tunnel up and running for the common case. [Other Info] * Analysis Geneve tunnels should work with either IPv4 or IPv6 environments as a design and support principle. Currently, however, the implementation requires ipv6 support for metadata-based tunnels (which geneve tunnels are); that is, it supports only case (b) rather than also case (a):

a) ipv4 + metadata // whether ipv6 is compiled out or dynamically disabled
b) ipv4 + metadata + ipv6

What enforces this in the current 4.4.0-x code when opening a Geneve tunnel is the following in geneve_open():

	bool ipv6 = geneve->remote.sa.sa_family == AF_INET6;
	bool metadata = geneve->collect_md;
	...
#if IS_ENABLED(CONFIG_IPV6)
	geneve->sock6 = NULL;
	if (ipv6 || metadata)
		ret = geneve_sock_add(geneve, true);
#endif
	if (!ret && (!ipv6 || metadata))
		ret = geneve_sock_add(geneve, false);

CONFIG_IPV6 is enabled and IPv6 is disabled at boot, but even though ipv6 is false, metadata is always true for a geneve open, as it is set unconditionally in ovs, in /lib/dpif_netlink_rtnl.c:

	case OVS_VPORT_TYPE_GENEVE:
		nl_msg_put_flag(&request, IFLA_GENEVE_COLLECT_METADATA);

The second argument of geneve_sock_add is a boolean value indicating whether it's an ipv6 address family socket or not, and we thus incorrectly pass a true value rather than false. The current "|| metadata" check is unnecessary and incorrectly sends the tunnel creation code down the ipv6 path, which subsequently fails when the code expects an ipv6 family socket. * This issue exists in all versions of the kernel up to the present mainline and net-next trees. * Testing with a trivial patch to remove that check and make changes similar to those made for vxlan (which had the same issue) has been successful. Patches for various versions to be attached here soon. * Example Versions (bug exists in all versions of Ubuntu - and mainline): + and mainline) + + Update: This has been patched upstream after original description filed + here, fix available in v5.0 mainline and
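The broken condition described in the analysis above can be modelled outside the kernel. The following is a toy sketch in plain shell (not kernel code, and not the upstream patch): it only mimics the boolean logic of the quoted geneve_open() snippet, to show why an IPv4 metadata tunnel fails when IPv6 is dynamically disabled.

```shell
#!/bin/sh
# Toy model of the geneve_open() socket choice quoted above: the
# "|| metadata" term forces an AF_INET6 socket even for an IPv4 remote,
# which fails with EAFNOSUPPORT when booted with ipv6.disable=1.
geneve_open_model() {
    remote_is_ipv6=$1 metadata=$2 ipv6_enabled=$3
    if [ "$remote_is_ipv6" = 1 ] || [ "$metadata" = 1 ]; then
        # models geneve_sock_add(geneve, true): socket(AF_INET6, ...)
        if [ "$ipv6_enabled" != 1 ]; then
            echo 'EAFNOSUPPORT'   # Address family not supported by protocol
            return 1
        fi
    fi
    echo 'ok'
}

# IPv4 remote, metadata set (OVS always sets it), IPv6 enabled:
geneve_open_model 0 1 1          # prints: ok
# Same tunnel after booting with ipv6.disable=1:
geneve_open_model 0 1 0 || true  # prints: EAFNOSUPPORT
```

The second call corresponds to the failing 'ovs-vsctl add-port' case in the test instructions: ipv6 is false, but metadata pushes the code down the IPv6 socket path anyway.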
[Kernel-packages] [Bug 1794232] Re: Geneve tunnels don't work when ipv6 is disabled
** Description changed: + SRU Justification + + Impact: Cannot create geneve tunnels if ipv6 is disabled dynamically. + + Fix: + Fixed by upstream commit in v5.0: + Commit: cf1c9ccba7308e48a68fa77f476287d9d614e4c7 + "geneve: correctly handle ipv6.disable module parameter" + + Hence available in Disco and later; required in X,B,C + Cherry picked and tested successfully for X, B, C. + + Testcase: + 1. Boot with "ipv6.disable=1" + 2. Then try and create a geneve tunnel using: +# ovs-vsctl add-br br1 +# ovs-vsctl add-port br1 geneve1 -- set interface geneve1 + type=geneve options:remote_ip=192.168.x.z // ip of the other host + + Regression Potential: Low, only geneve tunnels when ipv6 dynamically + disabled, current status is it doesn't work at all. + + Other Info: + * Mainline commit msg includes reference to a fix for + non-metadata tunnels (infrastructure is not yet in + our tree prior to Disco), hence not being included + at this time under this case. + + At this time, all geneve tunnels created as above + are metadata-enabled. + + + --- [Impact] When attempting to create a geneve tunnel on Ubuntu 16.04 Xenial, in an OS environment with open vswitch, where ipv6 has been disabled, the create fails with the error : “ovs-vsctl: Error detected while setting up 'geneve0': could not add network device geneve0 to ofproto (Address family not supported by protocol)." [Fix] There is an upstream commit for this in v5.0 mainline (and in Disco and later Ubuntu kernels). "geneve: correctly handle ipv6.disable module parameter" Commit: cf1c9ccba7308e48a68fa77f476287d9d614e4c7 This fix is needed on all our series prior to Disco and the v5.0 kernel: X, C, B. It is identical to the fix we implemented and tested internally with, but had not pushed upstream yet. [Test Case] (Best to do this on a kvm guest VM so as not to interfere with your system's networking) 1. On any Ubuntu Xenial kernel, disable ipv6. 
This example is shown with the 4.15.0-23-generic kernel (which differs slightly from 4.4.x in symptoms): - Edit /etc/default/grub to add the line: GRUB_CMDLINE_LINUX="ipv6.disable=1" - # update-grub - Reboot 2. Install OVS # apt install openvswitch-switch 3. Create a Geneve tunnel # ovs-vsctl add-br br1 # ovs-vsctl add-port br1 geneve1 -- set interface geneve1 type=geneve options:remote_ip=192.168.x.z (where remote_ip is the IP of the other host) You will see the following error message: "ovs-vsctl: Error detected while setting up 'geneve1'. See ovs-vswitchd log for details." From /var/log/openvswitch/ovs-vswitchd.log you will see: "2018-07-02T16:48:13.295Z|00026|dpif|WARN|system@ovs-system: failed to add geneve1 as port: Address family not supported by protocol" You will notice from the "ifconfig" output that the device genev_sys_6081 is not created. If you do not disable IPv6 (remove ipv6.disable=1 from /etc/default/grub + update-grub + reboot), the same 'ovs-vsctl add-port' command completes successfully. You can see that it is working properly by adding an IP to br1 and pinging each host. On kernel 4.4 (4.4.0-128-generic), the error message doesn't happen using the 'ovs-vsctl add-port' command, no warning is shown in ovs-vswitchd.log, but the device genev_sys_6081 is also not created and the ping test won't work. With the fixed test kernel, the interfaces and tunnel are created successfully. [Regression Potential] * Low -- affects the geneve driver only, and when ipv6 is disabled, and since it doesn't work in that case at all, this fix gets the tunnel up and running for the common case. [Other Info] * Analysis Geneve tunnels should work with either IPv4 or IPv6 environments as a design and support principle. 
Currently, however, the implementation requires ipv6 support for metadata-based tunnels (which geneve tunnels are); that is, it supports only case (b) rather than also case (a):

a) ipv4 + metadata // whether ipv6 is compiled out or dynamically disabled
b) ipv4 + metadata + ipv6

What enforces this in the current 4.4.0-x code when opening a Geneve tunnel is the following in geneve_open():

	bool ipv6 = geneve->remote.sa.sa_family == AF_INET6;
	bool metadata = geneve->collect_md;
	...
#if IS_ENABLED(CONFIG_IPV6)
	geneve->sock6 = NULL;
	if (ipv6 || metadata)
		ret = geneve_sock_add(geneve, true);
#endif
	if (!ret && (!ipv6 || metadata))
		ret = geneve_sock_add(geneve, false);

CONFIG_IPV6 is enabled and IPv6 is disabled at boot, but even though ipv6 is false, metadata is always true for a geneve open, as it is set unconditionally in ovs, in /lib/dpif_netlink_rtnl.c:

	case OVS_VPORT_TYPE_GENEVE:
		nl_msg_put_flag(&request, IFLA_GENEVE_COLLECT_METADATA);

The second argument of geneve_sock_add is a
[Kernel-packages] [Bug 1794232] Re: Geneve tunnels don't work when ipv6 is disabled
** Description changed: SRU Justification Impact: Cannot create geneve tunnels if ipv6 is disabled dynamically. Fix: Fixed by upstream commit in v5.0: Commit: cf1c9ccba7308e48a68fa77f476287d9d614e4c7 "geneve: correctly handle ipv6.disable module parameter" - Hence available in Disco and later; required in X,B,C - Cherry picked and tested successfully for X, B, C. + Hence available in Disco and later; required in X,B,C. Testcase: 1. Boot with "ipv6.disable=1" 2. Then try and create a geneve tunnel using: -# ovs-vsctl add-br br1 -# ovs-vsctl add-port br1 geneve1 -- set interface geneve1 - type=geneve options:remote_ip=192.168.x.z // ip of the other host + # ovs-vsctl add-br br1 + # ovs-vsctl add-port br1 geneve1 -- set interface geneve1 + type=geneve options:remote_ip=192.168.x.z // ip of the other host Regression Potential: Low, only geneve tunnels when ipv6 dynamically disabled, current status is it doesn't work at all. Other Info: * Mainline commit msg includes reference to a fix for - non-metadata tunnels (infrastructure is not yet in - our tree prior to Disco), hence not being included - at this time under this case. + non-metadata tunnels (infrastructure is not yet in + our tree prior to Disco), hence not being included + at this time under this case. - At this time, all geneve tunnels created as above - are metadata-enabled. - + At this time, all geneve tunnels created as above + are metadata-enabled. --- [Impact] When attempting to create a geneve tunnel on Ubuntu 16.04 Xenial, in an OS environment with open vswitch, where ipv6 has been disabled, the create fails with the error : “ovs-vsctl: Error detected while setting up 'geneve0': could not add network device geneve0 to ofproto (Address family not supported by protocol)." [Fix] There is an upstream commit for this in v5.0 mainline (and in Disco and later Ubuntu kernels). 
"geneve: correctly handle ipv6.disable module parameter" Commit: cf1c9ccba7308e48a68fa77f476287d9d614e4c7 This fix is needed on all our series prior to Disco and the v5.0 kernel: X, C, B. It is identical to the fix we implemented and tested internally with, but had not pushed upstream yet. [Test Case] (Best to do this on a kvm guest VM so as not to interfere with your system's networking) 1. On any Ubuntu Xenial kernel, disable ipv6. This example - is shown with the4.15.0-23-generic kernel (which differs + is shown with the 4.15.0-23-generic kernel (which differs slightly from 4.4.x in symptoms): - Edit /etc/default/grub to add the line: GRUB_CMDLINE_LINUX="ipv6.disable=1" - # update-grub - Reboot 2. Install OVS # apt install openvswitch-switch 3. Create a Geneve tunnel # ovs-vsctl add-br br1 # ovs-vsctl add-port br1 geneve1 -- set interface geneve1 type=geneve options:remote_ip=192.168.x.z (where remote_ip is the IP of the other host) You will see the following error message: "ovs-vsctl: Error detected while setting up 'geneve1'. See ovs-vswitchd log for details." From /var/log/openvswitch/ovs-vswitchd.log you will see: "2018-07-02T16:48:13.295Z|00026|dpif|WARN|system@ovs-system: failed to add geneve1 as port: Address family not supported by protocol" You will notice from the "ifconfig" output that the device genev_sys_6081 is not created. If you do not disable IPv6 (remove ipv6.disable=1 from /etc/default/grub + update-grub + reboot), the same 'ovs-vsctl add-port' command completes successfully. You can see that it is working properly by adding an IP to the br1 and pinging each host. On kernel 4.4 (4.4.0-128-generic), the error message doesn't happen using the 'ovs-vsctl add-port' command, no warning is shown in ovs-vswitchd.log, but the device genev_sys_6081 is also not created and ping test won't work. With the fixed test kernel, the interfaces and tunnel is created successfully. 
[Regression Potential] * Low -- affects the geneve driver only, and when ipv6 is disabled, and since it doesn't work in that case at all, this fix gets the tunnel up and running for the common case. [Other Info] * Analysis Geneve tunnels should work with either IPv4 or IPv6 environments as a design and support principle. Currently, however, what's in the implementation requires support for ipv6 for metadata-based tunnels which geneve is: rather than: a) ipv4 + metadata // whether ipv6 compiled or dynamically disabled b) ipv4 + metadata + ipv6 What enforces this in the current 4.4.0-x code when opening a Geneve tunnel is the following in geneve_open() : bool ipv6 = geneve->remote.sa.sa_family == AF_INET6; bool metadata = geneve->collect_md; ... #if IS_ENABLED(CONFIG_IPV6) geneve->sock6 = NULL; if (ipv6 || metadata)
** Tags added: cosmic xenial
"geneve: correctly handle ipv6.disable module parameter" Commit: cf1c9ccba7308e48a68fa77f476287d9d614e4c7 This fix is needed on all our series prior to Disco and the v5.0 kernel: X, C, B. It is identical to the fix we implemented and tested internally with, but had not pushed upstream yet. [Test Case] (Best to do this on a kvm guest VM so as not to interfere with your system's networking) 1. On any Ubuntu Xenial kernel, disable ipv6. This example is shown with the 4.15.0-23-generic kernel (which differs slightly from 4.4.x in symptoms): - Edit /etc/default/grub to add the line: GRUB_CMDLINE_LINUX="ipv6.disable=1" - # update-grub - Reboot 2. Install OVS # apt install openvswitch-switch 3. Create a Geneve tunnel # ovs-vsctl add-br br1 # ovs-vsctl add-port br1 geneve1 -- set interface geneve1 type=geneve options:remote_ip=192.168.x.z (where remote_ip is the IP of the other host) You will see the following error message: "ovs-vsctl: Error detected while setting up 'geneve1'. See ovs-vswitchd log for details." From /var/log/openvswitch/ovs-vswitchd.log you will see: "2018-07-02T16:48:13.295Z|00026|dpif|WARN|system@ovs-system: failed to add geneve1 as port: Address family not supported by protocol" You will notice from the "ifconfig" output that the device genev_sys_6081 is not created. If you do not disable IPv6 (remove ipv6.disable=1 from /etc/default/grub + update-grub + reboot), the same 'ovs-vsctl add-port' command completes successfully. You can see that it is working properly by adding an IP to the br1 and pinging each host. On kernel 4.4 (4.4.0-128-generic), the error message doesn't happen using the 'ovs-vsctl add-port' command, no warning is shown in ovs-vswitchd.log, but the device genev_sys_6081 is also not created and ping test won't work. With the fixed test kernel, the interfaces and tunnel is created successfully. 
[Regression Potential]

* Low -- this affects the geneve driver only, and only when ipv6 is disabled; since geneve tunnels don't work at all in that case today, the fix gets the tunnel up and running for the common case.

[Other Info]

* Analysis

Geneve tunnels should work in either IPv4 or IPv6 environments as a design and support principle. The geneve_open() analysis earlier in this bug covers how the current code enforces an ipv6 dependency for metadata-based tunnels.
Submitted SRU request for Bionic, Cosmic. Huge thanks for the testing, Matthew!
"geneve: correctly handle ipv6.disable module parameter" Commit: cf1c9ccba7308e48a68fa77f476287d9d614e4c7 This fix is needed on all our series prior to Disco and the v5.0 kernel: X, C, B. It is identical to the fix we implemented and tested internally with, but had not pushed upstream yet. [Test Case] (Best to do this on a kvm guest VM so as not to interfere with your system's networking) 1. On any Ubuntu Xenial kernel, disable ipv6. This example is shown with the 4.15.0-23-generic kernel (which differs slightly from 4.4.x in symptoms): - Edit /etc/default/grub to add the line: GRUB_CMDLINE_LINUX="ipv6.disable=1" - # update-grub - Reboot 2. Install OVS # apt install openvswitch-switch 3. Create a Geneve tunnel # ovs-vsctl add-br br1 # ovs-vsctl add-port br1 geneve1 -- set interface geneve1 type=geneve options:remote_ip=192.168.x.z (where remote_ip is the IP of the other host) You will see the following error message: "ovs-vsctl: Error detected while setting up 'geneve1'. See ovs-vswitchd log for details." From /var/log/openvswitch/ovs-vswitchd.log you will see: "2018-07-02T16:48:13.295Z|00026|dpif|WARN|system@ovs-system: failed to add geneve1 as port: Address family not supported by protocol" You will notice from the "ifconfig" output that the device genev_sys_6081 is not created. If you do not disable IPv6 (remove ipv6.disable=1 from /etc/default/grub + update-grub + reboot), the same 'ovs-vsctl add-port' command completes successfully. You can see that it is working properly by adding an IP to the br1 and pinging each host. On kernel 4.4 (4.4.0-128-generic), the error message doesn't happen using the 'ovs-vsctl add-port' command, no warning is shown in ovs-vswitchd.log, but the device genev_sys_6081 is also not created and ping test won't work. With the fixed test kernel, the interfaces and tunnel is created successfully. 
[Regression Potential] * Low -- affects the geneve driver only, and when ipv6 is disabled, and since it doesn't work in that case at all, this fix gets the tunnel up and running for the common case. [Other Info] * Analysis Geneve tunnels should work with either IPv4 or IPv6 environments as a design and support principle. Currently, however, what's in the implementation requires support for ipv6 for metadata-based tunnels which geneve is: rather than: a) ipv4 + metadata // whether ipv6 compiled or dynamically disabled b) ipv4 + metadata + ipv6 What enforces this in the current 4.4.0-x code when opening a Geneve tunnel is the following in geneve_open() : bool ipv6 = geneve->remote.sa.sa_family == AF_INET6; bool metadata = geneve->collect_md; ... #if IS_ENABLED(CONFIG_IPV6) geneve->sock6 = NULL; if (ipv6 || metadata) ret = geneve_sock_add(geneve, true); #endif
Resubmitted SRU for B,C for this kernel cycle.
"geneve: correctly handle ipv6.disable module parameter" Commit: cf1c9ccba7308e48a68fa77f476287d9d614e4c7 This fix is needed on all our series prior to Disco and the v5.0 kernel: X, C, B. It is identical to the fix we implemented and tested internally with, but had not pushed upstream yet. [Test Case] (Best to do this on a kvm guest VM so as not to interfere with your system's networking) 1. On any Ubuntu Xenial kernel, disable ipv6. This example is shown with the 4.15.0-23-generic kernel (which differs slightly from 4.4.x in symptoms): - Edit /etc/default/grub to add the line: GRUB_CMDLINE_LINUX="ipv6.disable=1" - # update-grub - Reboot 2. Install OVS # apt install openvswitch-switch 3. Create a Geneve tunnel # ovs-vsctl add-br br1 # ovs-vsctl add-port br1 geneve1 -- set interface geneve1 type=geneve options:remote_ip=192.168.x.z (where remote_ip is the IP of the other host) You will see the following error message: "ovs-vsctl: Error detected while setting up 'geneve1'. See ovs-vswitchd log for details." From /var/log/openvswitch/ovs-vswitchd.log you will see: "2018-07-02T16:48:13.295Z|00026|dpif|WARN|system@ovs-system: failed to add geneve1 as port: Address family not supported by protocol" You will notice from the "ifconfig" output that the device genev_sys_6081 is not created. If you do not disable IPv6 (remove ipv6.disable=1 from /etc/default/grub + update-grub + reboot), the same 'ovs-vsctl add-port' command completes successfully. You can see that it is working properly by adding an IP to the br1 and pinging each host. On kernel 4.4 (4.4.0-128-generic), the error message doesn't happen using the 'ovs-vsctl add-port' command, no warning is shown in ovs-vswitchd.log, but the device genev_sys_6081 is also not created and ping test won't work. With the fixed test kernel, the interfaces and tunnel is created successfully. 
[Regression Potential] * Low -- affects the geneve driver only, and when ipv6 is disabled, and since it doesn't work in that case at all, this fix gets the tunnel up and running for the common case. [Other Info] * Analysis Geneve tunnels should work with either IPv4 or IPv6 environments as a design and support principle. Currently, however, what's in the implementation requires support for ipv6 for metadata-based tunnels which geneve is: rather than: a) ipv4 + metadata // whether ipv6 compiled or dynamically disabled b) ipv4 + metadata + ipv6 What enforces this in the current 4.4.0-x code when opening a Geneve tunnel is the following in geneve_open() : bool ipv6 = geneve->remote.sa.sa_family == AF_INET6; bool metadata = geneve->collect_md; ... #if IS_ENABLED(CONFIG_IPV6) geneve->sock6 = NULL; if (ipv6 || metadata) ret = geneve_sock_add(geneve, true); #endif if (!ret && (!ipv6 || meta
[Kernel-packages] [Bug 1853638] Re: BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet device seems to be dropping data
The second port on the NIC definitely works as the active interface in an active-backup bonding configuration on the other NICs. At the moment, it's only this particular NIC that is seeing this problem, as far as we know.

https://bugs.launchpad.net/bugs/1853638

Title:
  BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet device seems to be dropping data

Status in linux package in Ubuntu:
  Confirmed
Status in network-manager package in Ubuntu:
  Confirmed

Bug description:
  The issue appears to be that the BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet device is dropping data.

  Basically, we are dropping data, as you can see from the benchmark tool as follows:

  tcdforge@x310a:/usr/local/lib/lib/uhd/examples$ ./benchmark_rate --rx_rate 10e6 --tx_rate 10e6 --duration 300
  [INFO] [UHD] linux; GNU C++ version 5.4.0 20160609; Boost_105800; UHD_3.14.1.1-0-g98c7c986
  [WARNING] [UHD] Unable to set the thread priority. Performance may be negatively affected. Please see the general application notes in the manual for instructions.
  EnvironmentError: OSError: error in pthread_setschedparam
  [00:00:00.07] Creating the usrp device with: ...
  [INFO] [X300] X300 initialization sequence...
  [INFO] [X300] Maximum frame size: 1472 bytes.
  [INFO] [X300] Radio 1x clock: 200 MHz
  [INFO] [GPS] Found an internal GPSDO: LC_XO, Firmware Rev 0.929a
  [INFO] [0/DmaFIFO_0] Initializing block control (NOC ID: 0xF1F0D000)
  [INFO] [0/DmaFIFO_0] BIST passed (Throughput: 1308 MB/s)
  [INFO] [0/DmaFIFO_0] BIST passed (Throughput: 1316 MB/s)
  [INFO] [0/Radio_0] Initializing block control (NOC ID: 0x12AD1001)
  [INFO] [0/Radio_1] Initializing block control (NOC ID: 0x12AD1001)
  [INFO] [0/DDC_0] Initializing block control (NOC ID: 0xDDC0)
  [INFO] [0/DDC_1] Initializing block control (NOC ID: 0xDDC0)
  [INFO] [0/DUC_0] Initializing block control (NOC ID: 0xD0C0)
  [INFO] [0/DUC_1] Initializing block control (NOC ID: 0xD0C0)
  Using Device: Single USRP:
    Device: X-Series Device
    Mboard 0: X310
    RX Channel: 0, RX DSP: 0, RX Dboard: A, RX Subdev: SBX-120 RX
    RX Channel: 1, RX DSP: 0, RX Dboard: B, RX Subdev: SBX-120 RX
    TX Channel: 0, TX DSP: 0, TX Dboard: A, TX Subdev: SBX-120 TX
    TX Channel: 1, TX DSP: 0, TX Dboard: B, TX Subdev: SBX-120 TX
  [00:00:04.305374] Setting device timestamp to 0...
  [WARNING] [UHD] Unable to set the thread priority. Performance may be negatively affected. Please see the general application notes in the manual for instructions.
  EnvironmentError: OSError: error in pthread_setschedparam
  [00:00:04.310990] Testing receive rate 10.00 Msps on 1 channels
  [WARNING] [UHD] Unable to set the thread priority. Performance may be negatively affected. Please see the general application notes in the manual for instructions.
  EnvironmentError: OSError: error in pthread_setschedparam
  [00:00:04.318356] Testing transmit rate 10.00 Msps on 1 channels
  [00:00:06.693119] Detected Rx sequence error.
  D[00:00:09.402843] Detected Rx sequence error.
  DD[00:00:40.927978] Detected Rx sequence error.
  D[00:01:44.982243] Detected Rx sequence error.
  D[00:02:11.400692] Detected Rx sequence error.
  D[00:02:14.805292] Detected Rx sequence error.
  D[00:02:41.875596] Detected Rx sequence error.
  D[00:03:06.927743] Detected Rx sequence error.
  D[00:03:47.967891] Detected Rx sequence error.
  D[00:03:58.233659] Detected Rx sequence error.
  D[00:03:58.876588] Detected Rx sequence error.
  D[00:04:03.139770] Detected Rx sequence error.
  D[00:04:45.287465] Detected Rx sequence error.
  D[00:04:56.425845] Detected Rx sequence error.
  D[00:04:57.929209] Detected Rx sequence error.
  [00:05:04.529548] Benchmark complete.

  Benchmark rate summary:
    Num received samples:     2995435936
    Num dropped samples:      4622800
    Num overruns detected:    0
    Num transmitted samples:  3008276544
    Num sequence errors (Tx): 0
    Num sequence errors (Rx): 15
    Num underruns detected:   0
    Num late commands:        0
    Num timeouts (Tx):        0
    Num timeouts (Rx):        0
  Done!
  tcdforge@x310a:/usr/local/lib/lib/uhd/examples$

  In this particular case description, the nodes are USRP X310s. However, we have the same issue with N210 nodes dropping samples when connected to the BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet device. There is no problem with the USRPs themselves, as we have tested them with normal 1G network cards and have no dropped samples. Personally I think it's something to do with the 10G network card, possibly an Ubuntu driver issue. Note, Dell have said there is no hardware problem with the 10G interfaces. I have followed the troubleshooting information on thi
We have narrowed it down to a flaw in a specific configuration setting on this NIC, so we're comparing the good and bad configurations now.

Primary port:   enp94s0f0
Secondary port: enp94s0f1d1

A] Good config for fault-tolerance (active-backup) bonding mode:
   -- Primary port = active interface; Secondary port = backup

B] Bad config for fault-tolerance (active-backup) bonding mode:
   -- Primary port = backup interface; Secondary port = active

We are consistently able to reproduce a drop rate difference with UDP pkts, for the above good/bad cases:

Good Case UDP MTR Test Result
-----------------------------
mtr --no-dns --report --report-cycles 60 --udp -s 1428 $DEST
Start: 2020-02-10T10:14:01+
HOST: hostname          Loss%   Snt   Last   Avg  Best  Wrst StDev
  1.|-- nn.nn.nnn.nnn    0.0%    60    0.3   0.2   0.2   0.3   0.0

Bad Case UDP MTR Test Result
----------------------------
mtr --no-dns --report --report-cycles 60 --udp -s 1428 $DEST
Start: 2020-02-10T14:10:52+
HOST: hostname          Loss%   Snt   Last   Avg  Best  Wrst StDev
  1.|-- nn.nn.nnn.nnn    8.3%    60    0.3   0.3   0.2   0.4   0.0
"Bad" Configuration for active-backup mode: $ cat /proc/net/bonding/bond0 Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011) Bonding Mode: fault-tolerance (active-backup) Primary Slave: None Currently Active Slave: enp94s0f1d1 MII Status: up MII Polling Interval (ms): 100 Up Delay (ms): 0 Down Delay (ms): 0 Peer Notification Delay (ms): 0 Slave Interface: enp94s0f1d1 MII Status: up Speed: 1 Mbps Duplex: full Link Failure Count: 2 Permanent HW addr: 4c:d9:8f:48:08:da Slave queue ID: 0 Slave Interface: enp94s0f0 MII Status: up Speed: 1 Mbps Duplex: full Link Failure Count: 2 Permanent HW addr: 4c:d9:8f:48:08:d9 Slave queue ID: 0 --- $ cat uname-rv 5.3.0-28-generic #30~18.04.1-Ubuntu SMP Fri Jan 17 06:14:09 UTC 2020 --- Scrubbed /etc/netplan/50-cloud-init.yaml: network: bonds: bond0: addresses: - 0.0.235.177/25 gateway4: 0.0.235.129 interfaces: - enp94s0f0 - enp94s0f1d1 macaddress: 00:00:00:48:08:00 mtu: 9000 nameservers: addresses: - 0.0.235.171 - 0.0.235.172 search: - maas parameters: down-delay: 0 gratuitious-arp: 1 mii-monitor-interval: 100 mode: active-backup transmit-hash-policy: layer2 up-delay: 0 ethernets: eno1: match: macaddress: 00:00:00:76:6e:ca mtu: 1500 set-name: eno1 eno2: match: macaddress: 00:00:00:76:6e:cb mtu: 1500 set-name: eno2 enp94s0f0: match: macaddress: 00:00:00:48:08:00 mtu: 9000 set-name: enp94s0f0 enp94s0f1d1: match: macaddress: 00:00:00:48:08:da mtu: 9000 set-name: enp94s0f1d1 version: 2 -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. 
Good System/Good NIC (all configurations work)

Comparison NIC: NetXtreme II BCM57000 10 Gigabit Ethernet QLogic 57000
System: Dell
Kernel: 5.0.0-25-generic #26~18.04.1-Ubuntu

/proc/net/bonding/bond0
---
Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)

Bonding Mode: fault-tolerance (active-backup)
Primary Slave: None
Currently Active Slave: enp5s0f1
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0

Slave Interface: enp5s0f1
MII Status: up
Speed: 1 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 00:00:00:00:73:e2
Slave queue ID: 0

Slave Interface: enp5s0f0
MII Status: up
Speed: 1 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: 00:00:00:00:73:e0
Slave queue ID: 0

/etc/netplan/50-cloud-init.yaml

network:
  bonds:
    bond0:
      addresses:
      - 00.00.235.182/25
      gateway4: 00.00.235.129
      interfaces:
      - enp5s0f0
      - enp5s0f1
      macaddress: 00:00:00:00:73:e0
      mtu: 9000
      nameservers:
        addresses:
        - 00.00.235.172
        - 00.00.235.171
        search:
        - maas
      parameters:
        down-delay: 0
        gratuitious-arp: 1
        mii-monitor-interval: 100
        mode: active-backup
        transmit-hash-policy: layer2
        up-delay: 0
  ethernets:
    ...(snip)..
    enp5s0f0:
      match:
        macaddress: 00:00:00:00:73:e0
      mtu: 9000
      set-name: enp5s0f0
    enp5s0f1:
      match:
        macaddress: 00:00:00:00:73:e2
      mtu: 9000
      set-name: enp5s0f1
  version: 2
"Bad" System/NIC:

NIC: BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet Controller
System: Dell
Kernel: 5.3.0-28-generic #30~18.04.1-Ubuntu
(Note: this issue has been seen on prior kernels as well; upgraded to the latest to see if the various problems were resolved.)

Attaching stats/config files from the NICs on this system (seeing the issue).
** Attachment added: "ethtool -S for inactive interface enp94s0f0"
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1853638/+attachment/5327556/+files/ethtool-S-enp94s0f0
ethtool-enp94s0f0
--
Settings for enp94s0f0:
    Supported ports: [ FIBRE ]
    Supported link modes: 1baseT/Full
    Supported pause frame use: Symmetric Receive-only
    Supports auto-negotiation: Yes
    Supported FEC modes: Not reported
    Advertised link modes: Not reported
    Advertised pause frame use: No
    Advertised auto-negotiation: No
    Advertised FEC modes: Not reported
    Speed: 1Mb/s
    Duplex: Full
    Port: FIBRE
    PHYAD: 1
    Transceiver: internal
    Auto-negotiation: off
    Supports Wake-on: g
    Wake-on: d
    Current message level: 0x (0)
    Link detected: yes

ethtool-i-enp94s0f0
--
driver: bnxt_en
version: 1.10.0
firmware-version: 214.0.253.1/pkg 21.40.25.31
expansion-rom-version:
bus-info: :5e:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: no
supports-priv-flags: no

ethtool-c-enp94s0f0
--
Coalesce parameters for enp94s0f0:
Adaptive RX: off  TX: off
stats-block-usecs: 100
sample-interval: 0
pkt-rate-low: 0
pkt-rate-high: 0
rx-usecs: 10
rx-frames: 15
rx-usecs-irq: 1
rx-frames-irq: 1
tx-usecs: 28
tx-frames: 30
tx-usecs-irq: 2
tx-frames-irq: 2
rx-usecs-low: 0
rx-frame-low: 0
tx-usecs-low: 0
tx-frame-low: 0
rx-usecs-high: 0
rx-frame-high: 0
tx-usecs-high: 0
tx-frame-high: 0

ethtool-g-enp94s0f0
--
Ring parameters for enp94s0f0:
Pre-set maximums:
RX: 2047
RX Mini: 0
RX Jumbo: 8191
TX: 2047
Current hardware settings:
RX: 511
RX Mini: 0
RX Jumbo: 2044
TX: 511

ethtool-k-enp94s0f0
--
Features for enp94s0f0:
rx-checksumming: on
tx-checksumming: on
    tx-checksum-ipv4: on
    tx-checksum-ip-generic: off [fixed]
    tx-checksum-ipv6: on
    tx-checksum-fcoe-crc: off [fixed]
    tx-checksum-sctp: off [fixed]
scatter-gather: on
    tx-scatter-gather: on
    tx-scatter-gather-fraglist: off [fixed]
tcp-segmentation-offload: on
    tx-tcp-segmentation: on
    tx-tcp-ecn-segmentation: off [fixed]
    tx-tcp-mangleid-segmentation: off
    tx-tcp6-segmentation: on
udp-fragmentation-offload: off
generic-segmentation-offload: on
generic-receive-offload: on
large-receive-offload: off
rx-vlan-offload: on
tx-vlan-offload: on
ntuple-filters: on
receive-hashing: on
highdma: on [fixed]
rx-vlan-filter: off [fixed]
vlan-challenged: off [fixed]
tx-lockless: off [fixed]
netns-local: off [fixed]
tx-gso-robust: off [fixed]
tx-fcoe-segmentation: off [fixed]
tx-gre-segmentation: on
tx-gre-csum-segmentation: on
tx-ipxip4-segmentation: on
tx-ipxip6-segmentation: off [fixed]
tx-udp_tnl-segmentation: on
tx-udp_tnl-csum-segmentation: on
tx-gso-partial: on
tx-sctp-segmentation: off [fixed]
tx-esp-segmentation: off [fixed]
tx-udp-segmentation: off [fixed]
fcoe-mtu: off [fixed]
tx-nocache-copy: off
loopback: off [fixed]
rx-fcs: off [fixed]
rx-all: off [fixed]
tx-vlan-stag-hw-insert: on
rx-vlan-stag-hw-parse: on
rx-vlan-stag-filter: off [fixed]
l2-fwd-offload: off [fixed]
hw-tc-offload: on
esp-hw-offload: off [fixed]
esp-tx-csum-hw-offload: off [fixed]
rx-udp_tunnel-port-offload: on
tls-hw-tx-offload: off [fixed]
tls-hw-rx-offload: off [fixed]
rx-gro-hw: on
tls-hw-record: off [fixed]
Edwin, let me know if you can get in touch with me via the contact email on my Launchpad page. Thanks for all the help!
Additional observations. MAAS is being used to deploy the system and to configure the bond interface and its settings.

MAAS allows you to specify which interface is the primary, with the other being the backup, for the active-backup bonding mode. However, this does not appear to be working - it is not passing along a primary directive in the netplan yaml, or otherwise ensuring the setting is honored (still need to confirm).

MAAS also allows you to enter a MAC address for the bond interface; if one is not supplied, by default it will use the MAC address of the configured "primary" interface. MAAS then populates /etc/netplan/50-cloud-init.yaml, including a macaddress line with that default, and netplan passes it along to systemd-networkd.

The bonding driver, however, will use as the active interface whichever interface is attached to the bond first (i.e., whichever completes getting enslaved first) in the absence of a primary directive, while still applying the supplied MAC address as an override. So say the active interface was configured in MAAS to be f0, and its MAC is used as the MAC address of the bond, but f1 (the second port of the NIC) actually gets attached to the bond first and is used as the active interface. We then have a situation where f0 = backup, f1 = active, and bond0 is using the MAC of f0. While this should work, there is potential for problems depending on the circumstances. It is likely this has nothing to do with our current issue, but it is noted here for completeness. Will see if we can test/confirm.
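For reference, netplan's bond parameters mapping does accept a primary key for active-backup mode, which is the directive whose absence is suspected in the MAAS-generated yaml. A minimal sketch (interface names reused from the 50-cloud-init.yaml shown earlier; whether MAAS actually emits this key is exactly what remains to be confirmed):

```yaml
# Hypothetical hand-written fragment, not MAAS output: pin the active
# slave deterministically instead of relying on enslavement order.
network:
  version: 2
  bonds:
    bond0:
      interfaces: [enp5s0f0, enp5s0f1]
      parameters:
        mode: active-backup
        primary: enp5s0f0          # force f0 to be the active interface
        mii-monitor-interval: 100
```

With primary set, the bonding driver no longer picks whichever slave finishes enslaving first, so the bond MAC and the active interface stay consistent.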
Edwin, do you happen to notice any IPv6, LLDP, or other link-local traffic on the interfaces (including the backup interface)?

The MTR loss % is purely a capture of the packets transmitted and the responses received, so for that UDP MTR test this is saying that UDP packets were lost somewhere. The NIC does not show any drops via the ethtool -S stats, but I'm still hunting down the right pair of before/after captures. Other than the tpa_abort counts, there were no errors that I saw. I can't tell what a tpa_abort means for the frame - is it purely a failure to coalesce, or does it end up dropping packets at some point in that functionality? I'm assuming not; whatever the reason, those would be counted as drops, I hope, and printed in the interface stats.

I'll attach all the stats here once I get them sorted out. I thought I had a clean diff of before and after from the tester, but after looking through, I don't think the file I have is from before/after the mtr test, as there was negligible UDP traffic. I'll try to get clarification from the reporter.

Note that when primary= is used to configure which interface is primary, and the primary port is therefore the active interface for the bond, no problems are seen (and that works deterministically to set the correct active interface).
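To pin down the "right pair of before/afters" mentioned in the comments, one way is to snapshot `ethtool -S` before and after a test run and print only the counters that moved. This is a hypothetical helper, not part of the bug report; the interface name enp94s0f0 is taken from the attachments:

```shell
# stats_delta: diff two `ethtool -S` snapshots, printing every counter
# whose value changed between the "before" and "after" captures.
stats_delta() {
    # $1 = snapshot taken before the test, $2 = snapshot taken after
    awk -F': *' '
        NR == FNR { gsub(/^ +/, "", $1); before[$1] = $2; next }
        {
            gsub(/^ +/, "", $1)
            if (($1 in before) && before[$1] != $2)
                printf "%s: %s -> %s\n", $1, before[$1], $2
        }' "$1" "$2"
}

# Capture pattern on the affected host:
#   ethtool -S enp94s0f0 > before.txt
#   ./benchmark_rate --rx_rate 10e6 --tx_rate 10e6 --duration 300
#   ethtool -S enp94s0f0 > after.txt
#   stats_delta before.txt after.txt
```

Counters that increment only during the benchmark (rx drops, tpa_aborts, and so on) then stand out immediately instead of being buried in the full stats dump.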
[Kernel-packages] [Bug 1811963] Re: Sporadic problems with X710 (i40e) and bonding where one interface is shown as "state DOWN" and without LOWER_UP
Hi Malte, Was this issue resolved for you? There are several other possibilities that it could be - and if it's still a problem with current mainline, please let us know. -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. https://bugs.launchpad.net/bugs/1811963 Title: Sporadic problems with X710 (i40e) and bonding where one interface is shown as "state DOWN" and without LOWER_UP Status in linux package in Ubuntu: Confirmed Bug description: After rebooting the physical server there is a 50/50 chance of all connected interfaces coming up. This affects Dell EMC R740's and R440's equipped with the X710 network cards. As far as I noticed (~20 reboots on different machines), this happens only when using bonding (in this case active-backup or mode 1, did not test different modes yet). The networking-hardware on the other side shows the ports "connected". tcpdump shows frames being received, even if the interface is in "state DOWN". Tried with: Ubuntu 16.04, kernel 4.4.0-141, driver 2.7.26 (from the Intel-website), firmware 18.8.9 Ubuntu 16.04, kernel 4.4.0-141, driver 1.4.25-k, firmware 18.8.9 Ubuntu 16.04, kernel 4.15.0-43 (hwe), driver 2.1.14-k, firmware 18.8.9 The following excerpts are made using Intels driver in version 2.7.26, therefore tainting the kernel, but the same happens using the original kernel's version or the hardware enablement kernel's version. Sporadic failure case: [6.319226] i40e: loading out-of-tree module taints kernel. [6.319227] i40e: loading out-of-tree module taints kernel. [6.319422] i40e: module verification failed: signature and/or required key missing - tainting kernel [6.410837] i40e: Intel(R) 40-10 Gigabit Ethernet Connection Network Driver - version 2.7.26 [6.410838] i40e: Copyright(c) 2013 - 2018 Intel Corporation. 
[6.423542] i40e :3b:00.0: fw 6.81.49447 api 1.7 nvm 6.80 0x80003d72 18.8.9 [6.658526] i40e :3b:00.0: MAC address: ff:ff:ff:ff:ff:ff [6.710391] i40e :3b:00.0: PCI-Express: Speed 8.0GT/s Width x8 [6.725692] i40e :3b:00.0: Features: PF-id[0] VFs: 64 VSIs: 2 QP: 40 RSS FD_ATR FD_SB NTUPLE CloudF DCB VxLAN Geneve NVGRE PTP VEPA [6.750239] i40e :3b:00.1: fw 6.81.49447 api 1.7 nvm 6.80 0x80003d72 18.8.9 [6.987874] i40e :3b:00.1: MAC address: ff:ff:ff:ff:ff:f1 [7.005397] i40e :3b:00.1 eth0: NIC Link is Up, 10 Gbps Full Duplex, Flow Control: None [7.024993] i40e :3b:00.1: PCI-Express: Speed 8.0GT/s Width x8 [7.040298] i40e :3b:00.1: Features: PF-id[1] VFs: 64 VSIs: 2 QP: 40 RSS FD_ATR FD_SB NTUPLE CloudF DCB VxLAN Geneve NVGRE PTP VEPA [7.054384] i40e :3b:00.1 enp59s0f1: renamed from eth0 [7.079613] i40e :3b:00.0 enp59s0f0: renamed from eth1 [9.788893] i40e :3b:00.0 enp59s0f0: already using mac address ff:ff:ff:ff:ff:ff [9.819480] i40e :3b:00.1 enp59s0f1: set new mac address ff:ff:ff:ff:ff:ff [9.728194] bond0: Setting MII monitoring interval to 100 [9.788690] bond0: Adding slave enp59s0f0 [9.805195] bond0: Enslaving enp59s0f0 as a backup interface with a down link [9.819470] bond0: Adding slave enp59s0f1 [9.836360] bond0: making interface enp59s0f1 the new active one [9.836614] bond0: Enslaving enp59s0f1 as an active interface with an up link Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011) Bonding Mode: fault-tolerance (active-backup) Primary Slave: None Currently Active Slave: enp59s0f1 MII Status: up MII Polling Interval (ms): 100 Up Delay (ms): 0 Down Delay (ms): 0 Slave Interface: enp59s0f0 MII Status: down Speed: Unknown Duplex: Unknown Link Failure Count: 0 Permanent HW addr: ff:ff:ff:ff:ff:ff Slave queue ID: 0 Slave Interface: enp59s0f1 MII Status: up Speed: Unknown Duplex: Unknown Link Failure Count: 0 Permanent HW addr: ff:ff:ff:ff:ff:f1 Slave queue ID: 0 4: enp59s0f0: mtu 1500 qdisc mq master bond0 portid state DOWN group default qlen 1000 link/ether 
ff:ff:ff:ff:ff:ff brd ff:ff:ff:ff:ff:ff 5: enp59s0f1: mtu 1500 qdisc mq master bond0 portid fff1 state UP group default qlen 1000 link/ether ff:ff:ff:ff:ff:f1 brd ff:ff:ff:ff:ff:ff 6: bond0: mtu 1500 qdisc noqueue state UP group default qlen 1000 link/ether ff:ff:ff:ff:ff:ff brd ff:ff:ff:ff:ff:ff inet 123.123.123.123/24 brd 123.123.123.255 scope global bond0 valid_lft forever preferred_lft forever inet6 :::::/64 scope link valid_lft forever preferred_lft forever bond0 Link encap:Ethernet HWaddr ff:ff:ff:ff:ff:ff inet addr:123.123.123.123 Bcast:123.123.123.255 Mask:255.255.255.0 inet6 addr: :::::/64 Scope:Link UP BROADCAST RUNNING MASTER MULTICAST MTU:1500 Metric:1
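When chasing this kind of "state DOWN but frames still arrive" report, the bonding driver's own per-slave view in /proc/net/bonding/bond0 (quoted above) is the quickest thing to compare against `ip link`. A minimal sketch of pulling out just the per-slave MII status; the `parse_bond` helper is hypothetical, and the sample input is taken from the report:

```shell
#!/bin/sh
# Print "<slave>: <mii status>" for each slave from `cat /proc/net/bonding/bond0`
# style output. parse_bond is a hypothetical helper, not part of any tool.
parse_bond() {
    awk '/^Slave Interface:/ { iface = $3 }
         /^MII Status:/      { if (iface != "") { print iface ": " $3; iface = "" } }'
}

# Sample taken from the failure case above; on a real host you would pipe in
# `cat /proc/net/bonding/bond0` instead.
parse_bond <<'EOF'
MII Status: up
Slave Interface: enp59s0f0
MII Status: down
Slave Interface: enp59s0f1
MII Status: up
EOF
```

This prints `enp59s0f0: down` and `enp59s0f1: up` for the sample, matching the failure case where one slave enslaves with a down link. The bond-level "MII Status:" line (before any "Slave Interface:") is deliberately skipped.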
[Kernel-packages] [Bug 1853638] Re: BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet device seems to be dropping data
We are closing this LP bug for now as we aren't able to reproduce in-house, and we cannot get access to a live testing repro env at this time. Here is what we know: - There seems to be different performance for some tests when the NIC is configured with active-backup bonding mode, between the case when the active interface is the primary port, and when the active interface is the secondary port. i.e.: Primary port: enp94s0f0 // when this is the active, works fine Secondary port: enp94s0f1d1 // when this is the active, more drops - Switch info: 2 x Fortigate 1024D switches, each machine is connected to both - NIC info: root@u072:~# lspci | grep BCM57416 01:00.0 Ethernet controller: Broadcom Inc. and subsidiaries BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet Controller (rev 01) # ethtool -i enp1s0f0np0 driver: bnxt_en version: 1.10.0 firmware-version: 214.0.253.1/pkg 21.40.25.31 - Our attempt at a reproducer (initially reported in production env via graphical monitoring): mtr --no-dns --report --report-cycles 60 --udp -s 1428 $DEST good system = ~ 0% drops bad systems = ~ 8% drops We are not getting NIC stats drops, nor UDP kernel drops, so it's not clear where the packet is being dropped, whether it's being dropped silently somewhere (?), or if that's a red herring and a mtr test issue, and what's seen in production is something else. If someone can reproduce this, or something similar, or if we manage to, we will re-open this bug or file a new one. -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. 
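Since neither NIC stats nor UDP kernel drops moved during the mtr runs, one way to keep the counter checks honest is to snapshot the drop counters before and after a run and compare. A rough sketch; the `sum_drops` helper is hypothetical and the counter names are illustrative (real bnxt_en counter names vary by firmware):

```shell
#!/bin/sh
# Sum every counter whose name contains "drop" from `ethtool -S <iface>`
# style output. sum_drops is a hypothetical helper, not part of ethtool.
sum_drops() {
    awk -F': *' 'tolower($1) ~ /drop/ { total += $2 } END { print total + 0 }'
}

# Illustrative counters; on a real host: ethtool -S enp94s0f1d1 | sum_drops
sum_drops <<'EOF'
     rx_ucast_packets: 123456
     rx_drops: 3
     tx_drops: 0
EOF
```

Running this before and after the mtr test gives a single number to diff; if the delta stays at zero while mtr reports ~8% loss, the drop is happening somewhere the NIC does not count, which is consistent with the "silent drop or red herring" question above.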
https://bugs.launchpad.net/bugs/1853638 Title: BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet device seems to be dropping data Status in linux package in Ubuntu: Confirmed Status in network-manager package in Ubuntu: Confirmed Bug description: The issue appears to be with the BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet device seems to be dropping data Basically, we are dropping data, as you can see from the benchmark tool as follows: tcdforge@x310a:/usr/local/lib/lib/uhd/examples$ ./benchmark_rate --rx_rate 10e6 --tx_rate 10e6 --duration 300 [INFO] [UHD] linux; GNU C++ version 5.4.0 20160609; Boost_105800; UHD_3.14.1.1-0-g98c7c986 [WARNING] [UHD] Unable to set the thread priority. Performance may be negatively affected. Please see the general application notes in the manual for instructions. EnvironmentError: OSError: error in pthread_setschedparam [00:00:00.07] Creating the usrp device with: ... [INFO] [X300] X300 initialization sequence... [INFO] [X300] Maximum frame size: 1472 bytes. 
[INFO] [X300] Radio 1x clock: 200 MHz [INFO] [GPS] Found an internal GPSDO: LC_XO, Firmware Rev 0.929a [INFO] [0/DmaFIFO_0] Initializing block control (NOC ID: 0xF1F0D000) [INFO] [0/DmaFIFO_0] BIST passed (Throughput: 1308 MB/s) [INFO] [0/DmaFIFO_0] BIST passed (Throughput: 1316 MB/s) [INFO] [0/Radio_0] Initializing block control (NOC ID: 0x12AD1001) [INFO] [0/Radio_1] Initializing block control (NOC ID: 0x12AD1001) [INFO] [0/DDC_0] Initializing block control (NOC ID: 0xDDC0) [INFO] [0/DDC_1] Initializing block control (NOC ID: 0xDDC0) [INFO] [0/DUC_0] Initializing block control (NOC ID: 0xD0C0) [INFO] [0/DUC_1] Initializing block control (NOC ID: 0xD0C0) Using Device: Single USRP: Device: X-Series Device Mboard 0: X310 RX Channel: 0 RX DSP: 0 RX Dboard: A RX Subdev: SBX-120 RX RX Channel: 1 RX DSP: 0 RX Dboard: B RX Subdev: SBX-120 RX TX Channel: 0 TX DSP: 0 TX Dboard: A TX Subdev: SBX-120 TX TX Channel: 1 TX DSP: 0 TX Dboard: B TX Subdev: SBX-120 TX [00:00:04.305374] Setting device timestamp to 0... [WARNING] [UHD] Unable to set the thread priority. Performance may be negatively affected. Please see the general application notes in the manual for instructions. EnvironmentError: OSError: error in pthread_setschedparam [00:00:04.310990] Testing receive rate 10.00 Msps on 1 channels [WARNING] [UHD] Unable to set the thread priority. Performance may be negatively affected. Please see the general application notes in the manual for instructions. EnvironmentError: OSError: error in pthread_setschedparam [00:00:04.318356] Testing transmit rate 10.00 Msps on 1 channels [00:00:06.693119] Detected Rx sequence error. D[00:00:09.402843] Detected Rx sequence error. DD[00:00:40.927978] Detected Rx sequence error. D[00:01:44.982243] Detected Rx sequence error. D[00:02:11.400692] Detected Rx sequence error. D[00:02:14.805292] Detected Rx sequence error. D[00:02:41.875596] Detected Rx sequence error. D[00:03:06.927743] Detected Rx sequence error. 
D[00:03:47.967891] Detected Rx sequence error. D[00:03:58.233659] Detected Rx sequence error. D[00:03:58.876588] Detected Rx sequence error. D[00:04:03.139770] Detected Rx sequence error. D[00:04:45.287465] Detected Rx sequence error. D[00:04:56.425845] Detected Rx sequence error. D[00:04:57.929209] Detected Rx sequence error. [00:05:04.529548] Benchmark complete. Benchmark rate summary: Num received samples: 2995435936 Num dropped samples: 4622800 Num overruns detected:0 Num transmitted samples: 3008276544 Num sequence errors (Tx): 0 Num sequence errors (Rx): 15 Num underruns detected: 0 Num late commands:0 Num timeouts (Tx):0 Num timeouts (Rx):0 Done! tcdforge@x310a:/usr/local/lib/lib/uhd/examples$ In this particular case description, the nodes are USRP x310s. However, we have the same issue with N210 nodes dropping samples connected to the BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet device. There is no problem with the USRPs themselves, as we have tested them with normal 1G network cards and have no dropped samples. Personally I think its something to do with the 10G network card, possibly on a ubuntu driver??? Note, Dell have said there is no hardware problem with the 10G interfaces I have followed the troubleshooting information on this link to try determine the problem: https://files.ettus.com/manual/page_usrp_x3x0_config.html - There is no firew
[Kernel-packages] [Bug 1853638] Re: BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet device seems to be dropping data
Hello diarmuid, regarding the original issue report: were you able to resolve your issue? Please let us know. -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. https://bugs.launchpad.net/bugs/1853638 Title: BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet device seems to be dropping data Status in linux package in Ubuntu: Confirmed Status in network-manager package in Ubuntu: Confirmed
[Kernel-packages] [Bug 1853638] Re: BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet device seems to be dropping data
The issue we have reported is easily avoided by specifying the primary port to be the active interface of the bond. On netplan-using systems: add the directive "primary: $interface" (e.g. "primary: enp94s0f0") to the "parameters:" section of the netplan config file. -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. https://bugs.launchpad.net/bugs/1853638 Title: BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet device seems to be dropping data Status in linux package in Ubuntu: Confirmed Status in network-manager package in Ubuntu: Confirmed
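A sketch of that workaround in netplan syntax; the interface names are taken from the earlier comment in this bug, while the file path and surrounding layout are assumptions:

```yaml
# /etc/netplan/01-bond.yaml (illustrative path)
network:
  version: 2
  bonds:
    bond0:
      interfaces: [enp94s0f0, enp94s0f1d1]
      parameters:
        mode: active-backup
        # Pin the primary port as the active interface (the workaround above)
        primary: enp94s0f0
```

After editing, `netplan apply` (or a reboot) makes the bond prefer the primary port whenever its link is up.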
[Kernel-packages] [Bug 1882039] Re: The thread level parallelism would be a bottleneck when searching for the shared pmd by using hugetlbfs
** Changed in: linux (Ubuntu Bionic) Importance: Medium => High ** Changed in: linux (Ubuntu Bionic) Status: Triaged => In Progress ** Changed in: linux (Ubuntu Eoan) Status: Triaged => In Progress ** Changed in: linux (Ubuntu Bionic) Assignee: (unassigned) => Gavin Guo (mimi0213kimo) ** Changed in: linux (Ubuntu Focal) Status: Triaged => In Progress ** Changed in: linux (Ubuntu Focal) Importance: Medium => High ** Changed in: linux (Ubuntu Eoan) Importance: Medium => High ** Changed in: linux (Ubuntu) Importance: Medium => High ** Changed in: linux (Ubuntu Eoan) Assignee: (unassigned) => Gavin Guo (mimi0213kimo) ** Changed in: linux (Ubuntu Focal) Assignee: (unassigned) => Gavin Guo (mimi0213kimo) -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. https://bugs.launchpad.net/bugs/1882039 Title: The thread level parallelism would be a bottleneck when searching for the shared pmd by using hugetlbfs Status in linux package in Ubuntu: In Progress Status in linux source package in Bionic: In Progress Status in linux source package in Eoan: In Progress Status in linux source package in Focal: In Progress Bug description: [Impact] There is a performance overhead observed when many threads use hugetlbfs in a database environment. [Fix] bdfbd98bc018 hugetlbfs: take read_lock on i_mmap for PMD sharing The patch improves the locking by taking the read lock instead of the write lock on i_mmap, which allows multiple threads to search for a suitable shared VMA in parallel; since the search itself modifies nothing, this increases parallelism and decreases the waiting time of the other threads. [Test] The customer stood up a database with seed data. They then ran a load "driver" which makes a bunch of connections that look like user workflows from the database perspective.
Finally, a response-time improvement can be measured for these "users", as well as various other metrics at the database level. [Regression Potential] The modification only replaces the write lock with a read one, and nothing is modified inside the search loop, so the regression probability is low. To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1882039/+subscriptions -- Mailing list: https://launchpad.net/~kernel-packages Post to : kernel-packages@lists.launchpad.net Unsubscribe : https://launchpad.net/~kernel-packages More help : https://help.launchpad.net/ListHelp
[Kernel-packages] [Bug 1879658] Re: Cannot create ipvlans with > 1500 MTU on recent Bionic kernels
Tested. ** Tags removed: verification-needed-bionic ** Tags added: verification-done-bionic -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. https://bugs.launchpad.net/bugs/1879658 Title: Cannot create ipvlans with > 1500 MTU on recent Bionic kernels Status in linux package in Ubuntu: Invalid Status in linux source package in Bionic: Fix Committed Bug description: [IMPACT] Setting an MTU larger than the default 1500 results in an error on the recent (4.15.0-92+) Bionic/Xenial -hwe kernels when attempting to create ipvlan interfaces: # ip link add test0 mtu 9000 link eno1 type ipvlan mode l2 RTNETLINK answers: Invalid argument This breaks Docker and other applications which use a Jumbo MTU (9000) when using ipvlans. The bug is caused by the following recent commit to Bionic & Xenial-hwe; which is pulled in via the stable patchset below, which enforces a strict min/max MTU when MTUs are being set up via rtnetlink for ipvlans: Breaking commit: --- Ubuntu-hwe-4.15.0-92.93~16.04.1 * Bionic update: upstream stable patchset 2020-02-21 (LP: #1864261) * net: rtnetlink: validate IFLA_MTU attribute in rtnl_create_link() The above patch applies checks of dev->min_mtu and dev->max_mtu to avoid a malicious user from crashing the kernel with a bad value. It was patching the original patchset to centralize min/max MTU checking from various different subsystems of the networking kernel. However, in that patchset, the max_mtu had not been set to the largest phys (64K) or jumbo (9000 bytes), and defaults to 1500. The recent commit above which enforces strict bounds checking for MTU size exposes the bug of the max mtu not being set correctly for the ipvlan driver (this has been previously fixed in bonding, teaming drivers). Fix: --- This was fixed in the upstream kernel as of v4.18-rc2 for ipvlans, but was not backported to Bionic along with other patches. 
The missing commit in the Bionic backport: ipvlan: use ETH_MAX_MTU as max mtu commit 548feb33c598dfaf9f8e066b842441ac49b84a8a [Test Case] 1. Install any kernel earlier than 4.15.0-92 (Bionic/Xenial-hwe) 2. # ip link add test1 mtu 9000 link eno1 type ipvlan mode l2 (where test1 eno1 is the physical interface you are adding the ipvlan on) 3. # ip link ... 14: test1@eno1: mtu 9000 qdisc noop state DOWN mode DEFAULT group default qlen 1000 ... // check that your test1 ipvlan is created with mtu 9000 4. Install 4.15.0-92 kernel or later 5. # ip link add test1 mtu 9000 link eno1 type ipvlan mode l2 RTNETLINK answers: Invalid argument 6. With the above fix commit backported to the xenial-hwe/Bionic, the jumbo mtu ipvlan creation works again, identical to before 92. [Regression Potential] This commit is in upstream mainline as of v4.18-rc2, and hence is already in Cosmic and later, i.e. all post Bionic releases currently. Hence there's low regression potential here. It only impacts ipvlan functionality, and not other networking systems, so core systems should not be affected by this. And affects on setup so it either works or doesn't. Patch is trivial. It only impacts Bionic/Xenial-hwe 4.15.0-92 onwards versions (where the latent bug got exposed). To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1879658/+subscriptions -- Mailing list: https://launchpad.net/~kernel-packages Post to : kernel-packages@lists.launchpad.net Unsubscribe : https://launchpad.net/~kernel-packages More help : https://help.launchpad.net/ListHelp
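A quick way to confirm whether a running kernel carries the ETH_MAX_MTU fix is to read the device's advertised MTU bounds from `ip -d link show`. A sketch under stated assumptions: the sample line below is illustrative, not captured output, and 65535 is ETH_MAX_MTU, which the fixed ipvlan driver advertises:

```shell
#!/bin/sh
# Extract the "minmtu N maxmtu N" bounds that `ip -d link show <dev>` prints
# on kernels new enough to expose them. The echoed line is illustrative; on a
# real host you would run: ip -d link show test1 | grep -o 'minmtu [0-9]* maxmtu [0-9]*'
echo '14: test1@eno1: <BROADCAST,MULTICAST> mtu 9000 ... minmtu 68 maxmtu 65535' |
    grep -o 'minmtu [0-9]* maxmtu [0-9]*'
```

On an unfixed kernel the same check would show a maxmtu of 1500, which is exactly why the `mtu 9000` request above bounces with "RTNETLINK answers: Invalid argument".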
[Kernel-packages] [Bug 1879658] Re: Cannot create ipvlans with > 1500 MTU on recent Bionic kernels
Packages tested linux-gcp (4.15.0-1078.88~16.04.1) xenial; linux-hwe (4.15.0-107.108~16.04.1) xenial; linux-gcp-4.15 (4.15.0-1078.88) bionic; linux (4.15.0-107.108) bionic; -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. https://bugs.launchpad.net/bugs/1879658 Title: Cannot create ipvlans with > 1500 MTU on recent Bionic kernels Status in linux package in Ubuntu: Invalid Status in linux source package in Bionic: Fix Committed
[Kernel-packages] [Bug 1879658] [NEW] Cannot create ipvlans with > 1500 MTU on recent Bionic kernels
Public bug reported: [IMPACT] Setting an MTU larger than the default 1500 results in an error on the recent (4.15.0-92+) Bionic/Xenial -hwe kernels when attempting to create ipvlan interfaces: # ip link add test0 mtu 9000 link eno1 type ipvlan mode l2 RTNETLINK answers: Invalid argument This breaks Docker and other applications which use a Jumbo MTU (9000) when using ipvlans. The bug is caused by the following recent commit to Bionic & Xenial-hwe; which is pulled in via the stable patchset below, which enforces a strict min/max MTU when MTUs are being set up via rtnetlink for ipvlans: Breaking commit: --- Ubuntu-hwe-4.15.0-92.93~16.04.1 * Bionic update: upstream stable patchset 2020-02-21 (LP: #1864261) * net: rtnetlink: validate IFLA_MTU attribute in rtnl_create_link() The above patch applies checks of dev->min_mtu and dev->max_mtu to avoid a malicious user from crashing the kernel with a bad value. It was patching the original patchset to centralize min/max MTU checking from various different subsystems of the networking kernel. However, in that patchset, the max_mtu had not been set to the largest phys (64K) or jumbo (9000 bytes), and defaults to 1500. The recent commit above which enforces strict bounds checking for MTU size exposes the bug of the max mtu not being set correctly for the ipvlan driver (this has been previously fixed in bonding, teaming drivers). Fix: --- This was fixed in the upstream kernel as of v4.18-rc2 for ipvlans, but was not backported to Bionic along with other patches. The missing commit in the Bionic backport: ipvlan: use ETH_MAX_MTU as max mtu commit 548feb33c598dfaf9f8e066b842441ac49b84a8a [Test Case] 1. Install any kernel earlier than 4.15.0-92 (Bionic/Xenial-hwe) 2. # ip link add test1 mtu 9000 link eno1 type ipvlan mode l2 (where test1 eno1 is the physical interface you are adding the ipvlan on) 3. # ip link ... 14: test1@eno1: mtu 9000 qdisc noop state DOWN mode DEFAULT group default qlen 1000 ... 
// check that your test1 ipvlan is created with mtu 9000 4. Install 4.15.0-92 kernel or later 5. # ip link add test1 mtu 9000 link eno1 type ipvlan mode l2 RTNETLINK answers: Invalid argument 6. With the above fix commit backported to the xenial-hwe/Bionic, the jumbo mtu ipvlan creation works again, identical to before 92. [Regression Potential] This commit is in upstream mainline as of v4.18-rc2, and hence is already in Cosmic and later, i.e. all post Bionic releases currently. Hence there's low regression potential here. It only impacts ipvlan functionality, and not other networking systems, so core systems should not be affected by this. And affects on setup so it either works or doesn't. Patch is trivial. It only impacts Bionic/Xenial-hwe 4.15.0-92 onwards versions (where the latent bug got exposed). ** Affects: linux (Ubuntu) Importance: Critical Status: Incomplete ** Affects: linux (Ubuntu Bionic) Importance: Critical Status: Incomplete ** Tags: bionic sts ** Changed in: linux (Ubuntu) Importance: Undecided => Critical ** Also affects: linux (Ubuntu Bionic) Importance: Undecided Status: New ** Changed in: linux (Ubuntu Bionic) Importance: Undecided => Critical
[Kernel-packages] [Bug 1879658] Re: Cannot create ipvlans with > 1500 MTU on recent Bionic kernels
SRU request has been submitted. If anyone would like to test, there are test images up on: https://people.canonical.com/~nivedita/ipvlan-test-fix-278887/ You can 'wget' the files and then 'dpkg -i' the modules, linux-image, modules-extra debs in that order, and reboot. -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. https://bugs.launchpad.net/bugs/1879658 Title: Cannot create ipvlans with > 1500 MTU on recent Bionic kernels Status in linux package in Ubuntu: Incomplete Status in linux source package in Bionic: In Progress
Fix: --- This was fixed in the upstream kernel as of v4.18-rc2 for ipvlans, but was not backported to Bionic along with other patches. The missing commit in the Bionic backport: ipvlan: use ETH_MAX_MTU as max mtu commit 548feb33c598dfaf9f8e066b842441ac49b84a8a [Test Case] 1. Install any kernel earlier than 4.15.0-92 (Bionic/Xenial-hwe) 2. # ip link add test1 mtu 9000 link eno1 type ipvlan mode l2 (where test1 eno1 is the physical interface you are adding the ipvlan on) 3. # ip link ... 14: test1@eno1: mtu 9000 qdisc noop state DOWN mode DEFAULT group default qlen 1000 ... // check that your test1 ipvlan is created with mtu 9000 4. Install 4.15.0-92 kernel or later 5. # ip link add test1 mtu 9000 link eno1 type ipvlan mode l2 RTNETLINK answers: Invalid argument 6. With the above fix commit backported to the xenial-hwe/Bionic, the jumbo mtu ipvlan creation works again, identical to before 92. [Regression Potential] This commit is in upstream mainline as of v4.18-rc2, and hence is already in Cosmic and later, i.e. all post Bionic releases currently. Hence there's low regression potential here. It only impacts ipvlan functionality, and not other networking systems, so core systems should not be affected by this. And affects on setup so it either works or doesn't. Patch is trivial. It only impacts Bionic/Xenial-hwe 4.15.0-92 onwards versions (where the latent bug got exposed). To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1879658/+subscriptions -- Mailing list: https://launchpad.net/~kernel-packages Post to : kernel-packages@lists.launchpad.net Unsubscribe : https://launchpad.net/~kernel-packages More help : https://help.launchpad.net/ListHelp
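To make the failure mode concrete, here is a minimal sketch (illustrative Python, not kernel code; the function and constant names are assumptions for this example) of the range check that rtnl_create_link() now applies against dev->min_mtu/dev->max_mtu, showing why a 9000-byte MTU is rejected while ipvlan's max_mtu is left at the 1500 default, and accepted once the fix raises it to ETH_MAX_MTU:

```python
# Sketch of the dev->min_mtu/dev->max_mtu bounds check enforced by the
# "validate IFLA_MTU attribute" patch. Out-of-range values make the
# kernel return -EINVAL ("RTNETLINK answers: Invalid argument").
ETH_DATA_LEN = 1500   # pre-fix ipvlan max_mtu default
ETH_MAX_MTU = 0xFFFF  # 65535; what commit 548feb33 sets ipvlan max_mtu to

def mtu_valid(mtu: int, min_mtu: int = 68, max_mtu: int = ETH_DATA_LEN) -> bool:
    """Accept an MTU only if it falls inside the driver's declared range."""
    return min_mtu <= mtu <= max_mtu

# Broken Bionic backport: ipvlan max_mtu stuck at 1500, jumbo MTU rejected.
assert not mtu_valid(9000)

# With "ipvlan: use ETH_MAX_MTU as max mtu" applied, jumbo MTUs pass.
assert mtu_valid(9000, max_mtu=ETH_MAX_MTU)
```

The point the sketch makes is that the rtnetlink check itself is correct; the bug is purely that ipvlan never declared a large enough max_mtu.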
[Kernel-packages] [Bug 1879658] Re: Cannot create ipvlans with > 1500 MTU on recent Bionic kernels
** Changed in: linux (Ubuntu) Status: Incomplete => Confirmed
** Changed in: linux (Ubuntu) Status: Confirmed => In Progress
** Changed in: linux (Ubuntu) Status: In Progress => Invalid

Status in linux package in Ubuntu: Invalid
Status in linux source package in Bionic: In Progress
[Kernel-packages] [Bug 1879658] Re: Cannot create ipvlans with > 1500 MTU on recent Bionic kernels
Test kernel has been tested successfully so far by the original reporter and has fixed the Docker breakage.
[Kernel-packages] [Bug 1834322] Re: Losing port aggregate with 802.3ad port-channel/bonding aggregation on reboot
Could anyone hitting this bug confirm it is a DUP of LP Bug 1852077 and that the latest releases fix this issue? The state changes/updates on this bug got borked because it was not simply marked as a DUP and closed. I will close this next week otherwise.

** Changed in: linux (Ubuntu Focal) Status: In Progress => Fix Released
** Changed in: linux (Ubuntu Bionic) Status: Fix Committed => Fix Released
** Changed in: linux (Ubuntu Disco) Status: Fix Committed => Fix Released
** Changed in: linux (Ubuntu Eoan) Status: Fix Committed => Fix Released

--
You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1834322

Title: Losing port aggregate with 802.3ad port-channel/bonding aggregation on reboot

Status in linux package in Ubuntu: In Progress
Status in linux source package in Bionic: Fix Released
Status in linux source package in Disco: Fix Released
Status in linux source package in Eoan: Fix Released
Status in linux source package in Focal: Fix Released

Bug description:

We are losing port-channel aggregation on reboot. After the reboot, /var/log/syslog contains the entries:

[  250.790758] bond2: An illegal loopback occurred on adapter (enp24s0f1np1) Check the configuration to verify that all adapters are connected to 802.3ad compliant switch ports
[  282.029426] bond2: An illegal loopback occurred on adapter (enp24s0f1np1) Check the configuration to verify that all adapters are connected to 802.3ad compliant switch ports

Aggregator IDs of the slave interfaces are different:

ubuntu@node-6:~$ cat /proc/net/bonding/bond2
Ethernet Channel Bonding Driver: v3.7.1 (April 27, 2011)

Bonding Mode: IEEE 802.3ad Dynamic link aggregation
Transmit Hash Policy: layer3+4 (1)
MII Status: up
MII Polling Interval (ms): 100
Up Delay (ms): 0
Down Delay (ms): 0

802.3ad info
LACP rate: fast
Min links: 0
Aggregator selection policy (ad_select): stable

Slave Interface: enp24s0f1np1
MII Status: up
Speed: 1 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: b0:26:28:48:9f:51
Slave queue ID: 0
Aggregator ID: 1
Actor Churn State: none
Partner Churn State: none
Actor Churned Count: 0
Partner Churned Count: 0

Slave Interface: enp24s0f0np0
MII Status: up
Speed: 1 Mbps
Duplex: full
Link Failure Count: 0
Permanent HW addr: b0:26:28:48:9f:50
Slave queue ID: 0
Aggregator ID: 2
Actor Churn State: churned
Partner Churn State: churned
Actor Churned Count: 1
Partner Churned Count: 1

The mismatch in "Aggregator ID" between the ports is a symptom of the issue. If we do 'ip link set dev bond2 down' and 'ip link set dev bond2 up', the port with the mismatched ID appears to renegotiate with the port-channel and becomes aggregated. The other way to work around this issue is to take the bond ports down and bring up port enp24s0f0np0 first and port enp24s0f1np1 second. When I change the order of bringing the ports up (first enp24s0f1np1, second enp24s0f0np0), the issue is still there. When the issue occurs, the switch port corresponding to interface enp24s0f0np0 is in Suspended state. After applying the workaround, the port is no longer in Suspended state and the Aggregator IDs in /proc/net/bonding/bond2 are equal. I installed a 5.0.0 kernel; the issue is still there.

Operating System: Ubuntu 18.04.2 LTS (GNU/Linux 4.15.0-52-generic x86_64)
ubuntu@node-6:~$ uname -a
Linux node-6 4.15.0-52-generic #56-Ubuntu SMP Tue Jun 4 22:49:08 UTC 2019 x86_64 x86_64 x86_64 GNU/Linux
ubuntu@node-6:~$ sudo lspci -vnvn
https://pastebin.ubuntu.com/p/Dy2CKDbySC/

Hardware: Dell PowerEdge R740xd
BIOS version: 2.1.7
sosreport: https://drive.google.com/open?id=1-eN7cZJIeu-AQBEU7Gw8a_AJTuq0AOZO

ubuntu@node-6:~$ lspci | grep Ethernet | grep 10G
https://pastebin.ubuntu.com/p/sqCx79vZWM/
ubuntu@node-6:~$ lspci -n | grep 18:00
18:00.0 0200: 14e4:16d8 (rev 01)
18:00.1 0200: 14e4:16d8 (rev 01)
ubuntu@node-6:~$ modinfo bnx2x
https://pastebin.ubuntu.com/p/pkmzsFjK8M/
ubuntu@node-6:~$ ip -o l
https://pastebin.ubuntu.com/p/QpW7TjnT2v/
ubuntu@node-6:~$ ip -o a
https://pastebin.ubuntu.com/p/MczKtrnmDR/
ubuntu@node-6:~$ cat /etc/netplan/98-juju.yaml
https://pastebin.ubuntu.com/p/9cZpPc7C6P/
ubuntu@node-6:~$ sudo lshw -c network
https://pastebin.ubuntu.com/p/gmfgZptzDT/

---
ProblemType: Bug
AlsaDevices:
 total 0
 crw-rw 1 root audio 116, 1 Jun 26 10:21 seq
 crw-rw 1 root audio 116, 33 Jun 26 10:21 timer
AplayDevices: Error: [Errno 2] No such file or directory: 'aplay': 'aplay'
ApportVersion: 2.20.9-0ubuntu7.6
Architecture: amd64
ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord': 'arecord'
AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/
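The mismatched-Aggregator-ID symptom described above can be detected programmatically. Below is a minimal sketch (an assumption for illustration, not part of the bug report or the kernel) that parses /proc/net/bonding-style text and flags a bond whose slaves landed in different aggregators:

```python
import re

def aggregator_ids(bonding_text: str) -> dict:
    """Map each 'Slave Interface' in /proc/net/bonding/<bond> text to its
    'Aggregator ID'. A healthy 802.3ad bond has one ID shared by all slaves."""
    ids, slave = {}, None
    for line in bonding_text.splitlines():
        m = re.match(r"\s*Slave Interface:\s*(\S+)", line)
        if m:
            slave = m.group(1)
            continue
        m = re.match(r"\s*Aggregator ID:\s*(\d+)", line)
        if m and slave is not None:
            ids[slave] = int(m.group(1))
    return ids

# Sample mirroring the broken state reported above (IDs 1 vs 2).
sample = """\
Slave Interface: enp24s0f1np1
Aggregator ID: 1
Slave Interface: enp24s0f0np0
Aggregator ID: 2
"""

ids = aggregator_ids(sample)
# More than one distinct ID means the slaves are not aggregated together.
assert len(set(ids.values())) > 1
```

On a live system you would feed it `open('/proc/net/bonding/bond2').read()`; a mismatch is the cue to apply the 'ip link set dev bond2 down' / 'up' workaround described above.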
[Kernel-packages] [Bug 1834322] Re: Losing port aggregate with 802.3ad port-channel/bonding aggregation on reboot
Note that the fixes for all of the above series have already been released, i.e. from Ubuntu-4.15.0-73.82 onwards.
[Kernel-packages] [Bug 1852077] Re: Backport: bonding: fix state transition issue in link monitoring
Still waiting on these patches being committed to all the Ubuntu trees. Any ETA? Is this waiting on being picked up via -stable?

--
You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1852077

Title: Backport: bonding: fix state transition issue in link monitoring

Status in linux package in Ubuntu: In Progress
Status in linux source package in Bionic: In Progress
Status in linux source package in Disco: In Progress
Status in linux source package in Eoan: In Progress
Status in linux source package in Focal: In Progress

Bug description:

== Justification ==

From the well-explained commit message:

Since de77ecd4ef02 ("bonding: improve link-status update in mii-monitoring"), the bonding driver has utilized two separate variables to indicate the next link state a particular slave should transition to. Each is used to communicate to a different portion of the link state change commit logic: one to the bond_miimon_commit function itself, and another to the state transition logic. Unfortunately, the two variables can become unsynchronized, resulting in incorrect link state transitions within bonding. This can cause slaves to become stuck in an incorrect link state until a subsequent carrier state transition.

The issue occurs when a special case in bond_slave_netdev_event sets slave->link directly to BOND_LINK_FAIL. On the next pass through bond_miimon_inspect after the slave goes carrier up, the BOND_LINK_FAIL case will set the proposed next state (link_new_state) to BOND_LINK_UP, but new_link to BOND_LINK_DOWN. The setting of the final link state from new_link comes after that from link_new_state, and so the slave will end up incorrectly in _DOWN state. Resolve this by combining the two variables into one.

== Fixes ==

* 1899bb32 (bonding: fix state transition issue in link monitoring)

This patch can be cherry-picked into E/F. For older releases like B/D, it needs to be backported, as they are missing the slave_err() printk macro added in 5237ff79 (bonding: add slave_foo printk macros), as well as the commit replacing netdev_err() with slave_err(), e2a7420d (bonding/main: convert to using slave printk macros). For Xenial, the commit that causes this issue, de77ecd4, does not exist.

== Test ==

Test kernels can be found here: https://people.canonical.com/~phlin/kernel/lp-1852077-bonding/
The X-hwe and Disco kernels were tested by the bug reporter, Aleksei; the patched kernels work as expected.

== Regression Potential ==

Low. This patch just unifies the variables used in the link state change commit logic to prevent an incorrect state from being committed, and the changes are limited to the bonding driver itself. (Although include/net/bonding.h is used by other drivers, the change to that file only affects the bond_main.c driver.)

== Original Bug Report ==

There's an issue with the bonding driver in the current Ubuntu kernels: sometimes one link gets stuck in a weird state. It was fixed upstream with the patch https://www.spinics.net/lists/netdev/msg609506.html, commit 1899bb325149e481de31a4f32b59ea6f24e176ea. We see this bug with Linux 4.15 (Ubuntu Xenial, hwe kernel), but it should be reproducible with other current kernel versions.
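The desync described in the commit message can be illustrated with a toy model (plain Python, not kernel code; the function names and the reduction to two assignments are illustrative assumptions). The buggy path writes the proposed state into one variable but leaves a stale value in the second, and the second write wins because it is applied last; the fix collapses both into a single variable:

```python
# Toy model of the bond_miimon_inspect/commit desync fixed by 1899bb32.
BOND_LINK_UP, BOND_LINK_DOWN, BOND_LINK_FAIL = "up", "down", "fail"

def commit_buggy(link: str) -> str:
    """Pre-fix behavior: the BOND_LINK_FAIL case proposes UP via
    link_new_state, but new_link is left at DOWN. The final state is
    taken from new_link *after* link_new_state, so DOWN clobbers UP."""
    link_new_state = new_link = link
    if link == BOND_LINK_FAIL:        # carrier came back up
        link_new_state = BOND_LINK_UP # proposed next state
        new_link = BOND_LINK_DOWN     # stale second variable
    link = link_new_state
    link = new_link                   # later write wins: stuck DOWN
    return link

def commit_fixed(link: str) -> str:
    """Post-fix behavior: one variable, so the proposed state is the
    state that actually gets committed."""
    if link == BOND_LINK_FAIL:
        link = BOND_LINK_UP
    return link

assert commit_buggy(BOND_LINK_FAIL) == BOND_LINK_DOWN  # slave stuck down
assert commit_fixed(BOND_LINK_FAIL) == BOND_LINK_UP
```

This is why the symptom is a slave stuck in _DOWN until the next carrier transition, and why the one-line-of-intent fix (merge the two variables) is low risk.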
[Kernel-packages] [Bug 1834322] Re: Losing port aggregate with 802.3ad port-channel/bonding aggregation on reboot
This is being handled as a DUP of LP Bug 1852077: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1852077

** Changed in: linux (Ubuntu) Status: Expired => In Progress
** Tags added: sts
** Also affects: linux (Ubuntu Disco) Importance: Undecided Status: New
** Also affects: linux (Ubuntu Bionic) Importance: Undecided Status: New
** Also affects: linux (Ubuntu Focal) Importance: Undecided Status: In Progress
** Also affects: linux (Ubuntu Eoan) Importance: Undecided Status: New
[Kernel-packages] [Bug 1834322] Re: Losing port aggregate with 802.3ad port-channel/bonding aggregation on reboot
There is a test kernel (from that LP bug) at: https://people.canonical.com/~phlin/kernel/lp-1852077-bonding/
[Kernel-packages] [Bug 1852077] Re: Backport: bonding: fix state transition issue in link monitoring
FWIW, the fix has been committed to -stable: "bonding: fix state transition issue in link monitoring"
Commit: 1899bb325149e481de31a4f32b59ea6f24e176ea
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/drivers/net/bonding?id=1899bb325149e481de31a4f32b59ea6f24e176ea

** Tags added: sts

https://bugs.launchpad.net/bugs/1852077

Title: Backport: bonding: fix state transition issue in link monitoring

Status in linux package in Ubuntu: In Progress
Status in linux source package in Bionic: In Progress
Status in linux source package in Disco: In Progress
Status in linux source package in Eoan: In Progress
Status in linux source package in Focal: In Progress

Bug description:

== Justification ==

From the well-explained commit message:

Since de77ecd4ef02 ("bonding: improve link-status update in mii-monitoring"), the bonding driver has utilized two separate variables to indicate the next link state a particular slave should transition to. Each is used to communicate to a different portion of the link state change commit logic; one to the bond_miimon_commit function itself, and another to the state transition logic. Unfortunately, the two variables can become unsynchronized, resulting in incorrect link state transitions within bonding. This can cause slaves to become stuck in an incorrect link state until a subsequent carrier state transition.

The issue occurs when a special case in bond_slave_netdev_event sets slave->link directly to BOND_LINK_FAIL. On the next pass through bond_miimon_inspect after the slave goes carrier up, the BOND_LINK_FAIL case will set the proposed next state (link_new_state) to BOND_LINK_UP, but the new_link to BOND_LINK_DOWN. The setting of the final link state from new_link comes after that from link_new_state, and so the slave will end up incorrectly in _DOWN state.
Resolve this by combining the two variables into one.

== Fixes ==

* 1899bb32 (bonding: fix state transition issue in link monitoring)

This patch can be cherry-picked into E/F.

For older releases like B/D, it will need to be backported, as they are missing the slave_err() printk macro added in 5237ff79 (bonding: add slave_foo printk macros) as well as the commit to replace netdev_err() with slave_err() in e2a7420d (bonding/main: convert to using slave printk macros).

For Xenial, the commit that causes this issue, de77ecd4, does not exist.

== Test ==

Test kernels can be found here:
https://people.canonical.com/~phlin/kernel/lp-1852077-bonding/

The X-hwe and Disco kernels were tested by the bug reporter, Aleksei; the patched kernels work as expected.

== Regression Potential ==

Low. This patch just unifies the variable used in the link state change commit logic to prevent the occurrence of an incorrect state, and the changes are limited to the bonding driver itself. (Although include/net/bonding.h is used by other drivers, the changes to that file only affect bond_main.c.)

== Original Bug Report ==

There's an issue with the bonding driver in the current Ubuntu kernels. Sometimes one link gets stuck in a weird state. It was fixed upstream with the patch https://www.spinics.net/lists/netdev/msg609506.html, commit 1899bb325149e481de31a4f32b59ea6f24e176ea. We see this bug with Linux 4.15 (Ubuntu Xenial, HWE kernel), but it should be reproducible with other current kernel versions.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1852077/+subscriptions
--
Mailing list: https://launchpad.net/~kernel-packages
Post to: kernel-packages@lists.launchpad.net
Unsubscribe: https://launchpad.net/~kernel-packages
More help: https://help.launchpad.net/ListHelp
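The failure mode in the commit message can be sketched as a toy model. This is purely illustrative Python, not the kernel code; the variable names mirror the ones in the commit message but the control flow is drastically simplified. The point is only that two separately-written "next state" variables can disagree, and the one written last wins:

```python
# Toy model of the pre-fix bond_miimon_inspect/commit flow (illustrative only).
BOND_LINK_UP, BOND_LINK_DOWN, BOND_LINK_FAIL = "up", "down", "fail"

def commit_pre_fix(slave_link, carrier_up):
    # Two variables each propose a next state; new_link is applied last,
    # so it silently overrides link_new_state when they disagree.
    link_new_state = new_link = None
    if slave_link == BOND_LINK_FAIL:
        if carrier_up:
            link_new_state = BOND_LINK_UP  # state-transition logic proposes "up"
        new_link = BOND_LINK_DOWN          # commit logic still proposes "down"
    return new_link if new_link is not None else link_new_state

def commit_post_fix(slave_link, carrier_up):
    # After the fix there is a single unified variable, so no disagreement.
    if slave_link == BOND_LINK_FAIL and carrier_up:
        return BOND_LINK_UP
    return BOND_LINK_DOWN

# A slave forced into BOND_LINK_FAIL that then regains carrier:
print(commit_pre_fix(BOND_LINK_FAIL, True))   # "down" -- the stuck state
print(commit_post_fix(BOND_LINK_FAIL, True))  # "up"
```

This mirrors why the slave stays stuck until a later carrier transition: nothing ever re-runs the inspect logic to correct the overridden state.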
[Kernel-packages] [Bug 1834322] Re: Losing port aggregate with 802.3ad port-channel/bonding aggregation on reboot
FWIW, the fix has been committed to -stable: "bonding: fix state transition issue in link monitoring"
Commit: 1899bb325149e481de31a4f32b59ea6f24e176ea
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux.git/commit/drivers/net/bonding?id=1899bb325149e481de31a4f32b59ea6f24e176ea

https://bugs.launchpad.net/bugs/1834322

Title: Losing port aggregate with 802.3ad port-channel/bonding aggregation on reboot

Status in linux package in Ubuntu: In Progress
Status in linux source package in Bionic: New
Status in linux source package in Disco: New
Status in linux source package in Eoan: New
Status in linux source package in Focal: In Progress
[Kernel-packages] [Bug 1834322] Re: Losing port aggregate with 802.3ad port-channel/bonding aggregation on reboot
Fix has been committed to B, D, E. I've manually updated this bug for now (it was not formally DUP'd to LP Bug 1852077).

** Changed in: linux (Ubuntu Focal) Importance: Undecided => High
** Changed in: linux (Ubuntu Eoan) Importance: Undecided => High
** Changed in: linux (Ubuntu Disco) Importance: Undecided => High
** Changed in: linux (Ubuntu Bionic) Importance: Undecided => High
** Changed in: linux (Ubuntu Bionic) Status: New => Fix Committed
** Changed in: linux (Ubuntu Disco) Status: New => Fix Committed
** Changed in: linux (Ubuntu Eoan) Status: New => Fix Committed

https://bugs.launchpad.net/bugs/1834322

Title: Losing port aggregate with 802.3ad port-channel/bonding aggregation on reboot

Status in linux package in Ubuntu: In Progress
Status in linux source package in Bionic: Fix Committed
Status in linux source package in Disco: Fix Committed
Status in linux source package in Eoan: Fix Committed
Status in linux source package in Focal: In Progress
[Kernel-packages] [Bug 1853638] Re: BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet device seems to be dropping data
Hello, Edwin,

We have two separate users/customers filing reports, and I can answer for one of them. I'll ask the original poster separately as well to reply.

With respect to one of these situations, this is the system:
Dell PowerEdge R440/0XP8V5, BIOS 2.2.11 06/14/2019

Note that a similar system does not have any issues:
Dell Inc. PowerEdge R430/0CN7X8, BIOS 2.3.4 11/08/2016

So the NIC in the "bad" environment is:
BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet Controller (rev 01)
Product Name: Broadcom Adv. Dual 10G SFP+ Ethernet

The NIC in the "good" environment is:
Broadcom Inc. and subsidiaries NetXtreme II BCM57810 10 Gigabit Ethernet [14e4:1006]
Product Name: QLogic 57810 10 Gigabit Ethernet

I'll have to scrub some files and see what I can attach; apologies, I'll have it here by tomorrow.

Unfortunately, we don't have an easy reproducer. A single iperf and netperf test (both UDP and TCP) showed identical results from both "good" and "bad" environments. What we have is an identical kernel, network configuration and stack, with the "bad" system showing double or triple the latency to the systems from a remote server. I'll have more information for you shortly here regarding the exact k8 cmd.

https://bugs.launchpad.net/bugs/1853638

Title: BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet device seems to be dropping data

Status in linux package in Ubuntu: Confirmed
Status in network-manager package in Ubuntu: Confirmed

Bug description:

The issue appears to be that the BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet device is dropping data.

Basically, we are dropping data, as you can see from the benchmark tool as follows:

tcdforge@x310a:/usr/local/lib/lib/uhd/examples$ ./benchmark_rate --rx_rate 10e6 --tx_rate 10e6 --duration 300
[INFO] [UHD] linux; GNU C++ version 5.4.0 20160609; Boost_105800; UHD_3.14.1.1-0-g98c7c986
[WARNING] [UHD] Unable to set the thread priority. Performance may be negatively affected. Please see the general application notes in the manual for instructions. EnvironmentError: OSError: error in pthread_setschedparam
[00:00:00.07] Creating the usrp device with: ...
[INFO] [X300] X300 initialization sequence...
[INFO] [X300] Maximum frame size: 1472 bytes.
[INFO] [X300] Radio 1x clock: 200 MHz
[INFO] [GPS] Found an internal GPSDO: LC_XO, Firmware Rev 0.929a
[INFO] [0/DmaFIFO_0] Initializing block control (NOC ID: 0xF1F0D000)
[INFO] [0/DmaFIFO_0] BIST passed (Throughput: 1308 MB/s)
[INFO] [0/DmaFIFO_0] BIST passed (Throughput: 1316 MB/s)
[INFO] [0/Radio_0] Initializing block control (NOC ID: 0x12AD1001)
[INFO] [0/Radio_1] Initializing block control (NOC ID: 0x12AD1001)
[INFO] [0/DDC_0] Initializing block control (NOC ID: 0xDDC0)
[INFO] [0/DDC_1] Initializing block control (NOC ID: 0xDDC0)
[INFO] [0/DUC_0] Initializing block control (NOC ID: 0xD0C0)
[INFO] [0/DUC_1] Initializing block control (NOC ID: 0xD0C0)
Using Device: Single USRP:
 Device: X-Series Device
 Mboard 0: X310
 RX Channel: 0
  RX DSP: 0
  RX Dboard: A
  RX Subdev: SBX-120 RX
 RX Channel: 1
  RX DSP: 0
  RX Dboard: B
  RX Subdev: SBX-120 RX
 TX Channel: 0
  TX DSP: 0
  TX Dboard: A
  TX Subdev: SBX-120 TX
 TX Channel: 1
  TX DSP: 0
  TX Dboard: B
  TX Subdev: SBX-120 TX
[00:00:04.305374] Setting device timestamp to 0...
[WARNING] [UHD] Unable to set the thread priority. Performance may be negatively affected. Please see the general application notes in the manual for instructions. EnvironmentError: OSError: error in pthread_setschedparam
[00:00:04.310990] Testing receive rate 10.00 Msps on 1 channels
[WARNING] [UHD] Unable to set the thread priority. Performance may be negatively affected. Please see the general application notes in the manual for instructions. EnvironmentError: OSError: error in pthread_setschedparam
[00:00:04.318356] Testing transmit rate 10.00 Msps on 1 channels
[00:00:06.693119] Detected Rx sequence error.
D[00:00:09.402843] Detected Rx sequence error.
DD[00:00:40.927978] Detected Rx sequence error.
D[00:01:44.982243] Detected Rx sequence error.
D[00:02:11.400692] Detected Rx sequence error.
D[00:02:14.805292] Detected Rx sequence error.
D[00:02:41.875596] Detected Rx sequence error.
D[00:03:06.927743] Detected Rx sequence error.
D[00:03:47.967891] Detected Rx sequence error.
D[00:03:58.233659] Detected Rx sequence error.
D[00:03:58.876588] Detected Rx sequence error.
D[00:04:03.139770] Detected Rx sequence error.
D[00:04:45.287465] Detected Rx sequence error.
D[00:04:56.425845] Detected Rx sequence error.
D[00:04:57.929209] Detected Rx sequence error.
[00:05:04.529548] Benchmark complete.

Benchmark rate summary:
 Num received samples: 2995435936
 Num dropped samples: 4622800
 Num overruns detected: 0
 Num transmitted samples: 3008276544
 Num sequence errors (Tx): 0
 Num sequence errors (Rx): 15
 Num underruns detected: 0
 Num late commands: 0
 Num timeouts (Tx): 0
 Num timeouts (Rx): 0
Done!
tcdforge@x310a:/usr/local/lib/lib/uhd/examples$

In this particular case description, the nodes are USRP X310s. However, we have the same issue with N210 nodes dropping samples when connected to the BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet device. There is no problem with the USRPs themselves, as we have tested them with normal 1G network cards and have no dropped samples. Personally I think it's something to do with the 10G network card, possibly an Ubuntu driver issue. Note, Dell have said there is no hardware problem with the 10G interfaces. I have followed the troubleshooting information on this link to try to determine the problem: https://fi
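The benchmark summary quoted in this bug reports 4,622,800 dropped samples out of 2,995,435,936 received over the 300-second run at 10 Msps. A quick arithmetic check (illustrative Python, numbers taken from the report) puts that in perspective:

```python
# Sanity-check the drop rate from the reporter's benchmark_rate summary.
received = 2_995_435_936   # Num received samples
dropped = 4_622_800        # Num dropped samples
duration_s = 300           # --duration 300

drop_pct = 100 * dropped / received
print(f"drop rate: {drop_pct:.3f}%")                 # ~0.154%
print(f"~{dropped / duration_s:,.0f} samples lost per second on average")
```

A loss rate on the order of 0.15% is far too small to show up in a coarse iperf throughput test, which is consistent with the later comment that iperf looked identical while mtr and netperf exposed the difference.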
[Kernel-packages] [Bug 1853638] Re: BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet device seems to be dropping data
** Attachment added: "active interface ethtool-S"
https://bugs.launchpad.net/ubuntu/+source/network-manager/+bug/1853638/+attachment/5324070/+files/ethtool-S-enp94s0f0
[Kernel-packages] [Bug 1853638] Re: BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet device seems to be dropping data
** Attachment added: "backup interface ethtool-S"
https://bugs.launchpad.net/ubuntu/+source/network-manager/+bug/1853638/+attachment/5324071/+files/ethtool-S-enp94s0f1d1
[Kernel-packages] [Bug 1853638] Re: BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet device seems to be dropping data
Note that iperf was identical whereas netperf and mtr showed up differences (so it's possibly sporadic as well, not continuous).

1. iperf tcp test
   GoodSystem ... 9.84 Gbits/sec
   BadSystem1 ... 8.37 Gbits/sec
   BadSystem2 ... 9.85 Gbits/sec

2. iperf udp test
   GoodSystem ... 1.05 Mbits/sec
   BadSystem2 ... 1.05 Mbits/sec

3. mtr ping test
   GoodSystem ... 0.0% Loss; 0.2 Avg; 0.1 Best, 0.9 Worst, 0.1 StdDev
   BadSystem2 ... 11.7% Loss; 0.1 Avg; 0.1 Best, 0.2 Worst, 0.0 StdDev

4. netperf tcp_rr 1/1 bytes
   GoodSystem ... 17921.83 t/sec
   BadSystem1 ... 13912.45 t/sec
   BadSystem2

5. netperf tcp_rr 64/64 bytes
   GoodSystem ... 16987.48 t/sec
   BadSystem1 ... 13355.93 t/sec
   BadSystem2

6. netperf tcp_rr 128/8192 bytes
   GoodSystem ... 2396.45 t/sec
   BadSystem1 ... 1678.54 t/sec
   BadSystem2
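The netperf tcp_rr figures in this comment can be read as mean round-trip latency, since each tcp_rr transaction is one request/response exchange. An illustrative conversion (the function is mine, the rates are the 1-byte/1-byte results quoted above):

```python
# Convert netperf tcp_rr transaction rates (transactions/sec) into the
# implied mean microseconds per request/response round trip.
def usec_per_transaction(rate_tps):
    return 1_000_000 / rate_tps

good = usec_per_transaction(17921.83)  # GoodSystem, tcp_rr 1/1 bytes
bad = usec_per_transaction(13912.45)   # BadSystem1, tcp_rr 1/1 bytes
print(f"good: {good:.1f} us, bad: {bad:.1f} us")   # ~55.8 us vs ~71.9 us
print(f"added latency: {bad - good:.1f} us per round trip")
```

This shows the "bad" system adding roughly 16 microseconds per round trip even for minimal payloads, matching the reporter's observation of clearly higher latency despite identical bulk-throughput numbers.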
[Kernel-packages] [Bug 1853638] Re: BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet device seems to be dropping data
> The mtr packet loss is an interesting result. What mtr options did
> you use? Is this a UDP or ICMP test?

The mtr command was:

  mtr --no-dns --report --report-cycles 60 $IP_ADDR

so ICMP was going out.

--
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1853638

Title:
  BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet device seems to be
  dropping data

Status in linux package in Ubuntu: Confirmed
Status in network-manager package in Ubuntu: Confirmed

Bug description:
  The issue appears to be that the BCM57416 NetXtreme-E Dual-Media 10G
  RDMA Ethernet device is dropping data, as you can see from the
  benchmark tool output:

  tcdforge@x310a:/usr/local/lib/lib/uhd/examples$ ./benchmark_rate --rx_rate 10e6 --tx_rate 10e6 --duration 300
  [INFO] [UHD] linux; GNU C++ version 5.4.0 20160609; Boost_105800; UHD_3.14.1.1-0-g98c7c986
  [WARNING] [UHD] Unable to set the thread priority. Performance may be negatively affected. Please see the general application notes in the manual for instructions. EnvironmentError: OSError: error in pthread_setschedparam
  [00:00:00.07] Creating the usrp device with: ...
  [INFO] [X300] X300 initialization sequence...
  [INFO] [X300] Maximum frame size: 1472 bytes.
  [INFO] [X300] Radio 1x clock: 200 MHz
  [INFO] [GPS] Found an internal GPSDO: LC_XO, Firmware Rev 0.929a
  [INFO] [0/DmaFIFO_0] Initializing block control (NOC ID: 0xF1F0D000)
  [INFO] [0/DmaFIFO_0] BIST passed (Throughput: 1308 MB/s)
  [INFO] [0/DmaFIFO_0] BIST passed (Throughput: 1316 MB/s)
  [INFO] [0/Radio_0] Initializing block control (NOC ID: 0x12AD1001)
  [INFO] [0/Radio_1] Initializing block control (NOC ID: 0x12AD1001)
  [INFO] [0/DDC_0] Initializing block control (NOC ID: 0xDDC0)
  [INFO] [0/DDC_1] Initializing block control (NOC ID: 0xDDC0)
  [INFO] [0/DUC_0] Initializing block control (NOC ID: 0xD0C0)
  [INFO] [0/DUC_1] Initializing block control (NOC ID: 0xD0C0)
  Using Device: Single USRP:
    Device: X-Series Device
    Mboard 0: X310
    RX Channel: 0  RX DSP: 0  RX Dboard: A  RX Subdev: SBX-120 RX
    RX Channel: 1  RX DSP: 0  RX Dboard: B  RX Subdev: SBX-120 RX
    TX Channel: 0  TX DSP: 0  TX Dboard: A  TX Subdev: SBX-120 TX
    TX Channel: 1  TX DSP: 0  TX Dboard: B  TX Subdev: SBX-120 TX
  [00:00:04.305374] Setting device timestamp to 0...
  [WARNING] [UHD] Unable to set the thread priority. Performance may be negatively affected. Please see the general application notes in the manual for instructions. EnvironmentError: OSError: error in pthread_setschedparam
  [00:00:04.310990] Testing receive rate 10.00 Msps on 1 channels
  [WARNING] [UHD] Unable to set the thread priority. Performance may be negatively affected. Please see the general application notes in the manual for instructions. EnvironmentError: OSError: error in pthread_setschedparam
  [00:00:04.318356] Testing transmit rate 10.00 Msps on 1 channels
  [00:00:06.693119] Detected Rx sequence error.
  D[00:00:09.402843] Detected Rx sequence error.
  DD[00:00:40.927978] Detected Rx sequence error.
  D[00:01:44.982243] Detected Rx sequence error.
  D[00:02:11.400692] Detected Rx sequence error.
  D[00:02:14.805292] Detected Rx sequence error.
  D[00:02:41.875596] Detected Rx sequence error.
  D[00:03:06.927743] Detected Rx sequence error.
  D[00:03:47.967891] Detected Rx sequence error.
  D[00:03:58.233659] Detected Rx sequence error.
  D[00:03:58.876588] Detected Rx sequence error.
  D[00:04:03.139770] Detected Rx sequence error.
  D[00:04:45.287465] Detected Rx sequence error.
  D[00:04:56.425845] Detected Rx sequence error.
  D[00:04:57.929209] Detected Rx sequence error.
  [00:05:04.529548] Benchmark complete.

  Benchmark rate summary:
    Num received samples:     2995435936
    Num dropped samples:      4622800
    Num overruns detected:    0
    Num transmitted samples:  3008276544
    Num sequence errors (Tx): 0
    Num sequence errors (Rx): 15
    Num underruns detected:   0
    Num late commands:        0
    Num timeouts (Tx):        0
    Num timeouts (Rx):        0
  Done!
  tcdforge@x310a:/usr/local/lib/lib/uhd/examples$

  In this particular case the nodes are USRP X310s. However, we have the
  same issue with N210 nodes dropping samples when connected to the
  BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet device. There is no
  problem with the USRPs themselves, as we have tested them with normal
  1G network cards and had no dropped samples. Personally I think it's
  something to do with the 10G network card, possibly an Ubuntu driver
  issue. Note, Dell have said there is no hardware problem with the 10G
  interfaces. I have followed the troubleshooting information on this
  link to try de
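As an aside, the per-hop loss column of that mtr report can be pulled out with a small filter. This is a sketch only, assuming mtr's default --report layout (hop lines contain "|--" with loss in the third column); it is demonstrated on synthetic sample output rather than a live run:

```shell
# Sketch: pull per-hop loss out of an mtr report. Assumes mtr's default
# --report layout, where hop lines look like:
#   "  1.|-- <host>   <Loss%>   <Snt> ..."
# On the live system the pipeline would be (as used in this thread):
#   mtr --no-dns --report --report-cycles 60 "$IP_ADDR" | loss_summary
loss_summary() {
    awk 'index($1, "|--") { print $2, $3 }'
}

# Synthetic sample report, for illustration only:
sample='HOST: x310a             Loss%   Snt   Last   Avg  Best  Wrst StDev
  1.|-- 10.0.0.1             0.0%    60    0.3   0.4   0.2   1.1   0.1
  2.|-- 192.168.1.20         6.7%    60    0.5   0.6   0.3   2.4   0.3'

printf '%s\n' "$sample" | loss_summary
```

Loss above 0.0% on the final hop of an otherwise idle link would corroborate the NIC drop counters.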
[Kernel-packages] [Bug 1853638] Re: BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet device seems to be dropping data
Thanks very much for helping on this, Edwin! Please let me know if there's anything specific you need. I'm asking them to disable any IPv6 and LLDP traffic in their environment, then retest and collect information again.

Also, I'd like to disable TPA; would this be at all useful:

  modprobe bnx disable_tpa=1

??
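On the TPA question: aggregation on this driver is typically also visible through the ethtool offload features, so it may be worth checking (and toggling) those before reloading the module. A sketch only; the feature names and interface name are assumptions to verify against the real host, and the demo below runs on canned ethtool -k output rather than real hardware:

```shell
# Sketch: list aggregation-related offload features from "ethtool -k"
# output. The grep pattern and feature names are assumptions to check
# against the host. On the live host one might run:
#   ethtool -k enp94s0f1d1 | aggregation_features
#   ethtool -K enp94s0f1d1 lro off   # assumption: LRO maps to TPA here
aggregation_features() {
    grep -E 'receive-offload|gro-hw'
}

# Canned "ethtool -k" output, for illustration only:
sample='Features for enp94s0f1d1:
rx-checksumming: on
tx-checksumming: on
generic-receive-offload: on
large-receive-offload: off
rx-gro-hw: on'

printf '%s\n' "$sample" | aggregation_features
```

Comparing these flags between the "good" and "bad" systems would show whether an offload difference is in play before any module reload.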
[Kernel-packages] [Bug 1853638] Re: BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet device seems to be dropping data
> There are more than one variable at play here.
> Does the problem follow the NIC if you swap the
> NICs between systems? Are OS / kernel and driver
> versions the same on both systems?

Unfortunately, I've not been able to get them to try permutations or swaps yet, as this is still a production system/environment. I'll try to obtain more information about it.
[Kernel-packages] [Bug 1853638] Re: BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet device seems to be dropping data
> NICs between systems? Are OS / kernel and driver
> versions the same on both systems?

Yes: identical distro release, kernel, and most of the software stack (I have not obtained and examined the full software stack). The networking configuration is also the same.
[Kernel-packages] [Bug 1853638] Re: BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet device seems to be dropping data
Hey Edwin, sorry, I didn't see your last question. I'll try to confirm. I've seen loss in both directions, but it's not yet clear whether it's significant; e.g., TCP traffic is retransmitted, so it could be segments lost outgoing or ACKs lost incoming:

  4407 retransmitted TCP segments
  130 TCP timeouts

in stats collected about 5 minutes apart, which isn't a sufficient sample size. We're trying to get a new collection of stats and logs using the netperf TCP_RR test.

In our case, note, we're more concerned about (and have more solid data on) latency issues than dropped packets (which I expect some of with heavy network testing). For example, netperf TCP_RR latency is about 70-78% of the older systems for 1,1 request/response byte sizes, as well as for 64/64, 100/200, and 128/8192 sizes.

I'll update here as soon as we have more data from the production environment.
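For repeatability, the TCP_RR comparisons can be scripted. This sketch only generates the netperf invocations for the request/response sizes mentioned, rather than running them, so it is safe to run anywhere; the host name is a placeholder and "-l 30" is an arbitrary run length:

```shell
# Sketch: emit the netperf TCP_RR commands for the request/response byte
# sizes discussed above. NETSERVER_HOST is a placeholder; netserver must
# be running on the peer before any of these are actually executed.
gen_rr() {
    printf 'netperf -H %s -t TCP_RR -l 30 -- -r %s,%s\n' \
        "${NETSERVER_HOST:-192.168.0.2}" "$1" "$2"
}

# Request/response size pairs from the comparison above:
for pair in 1,1 64,64 100,200 128,8192; do
    gen_rr "${pair%,*}" "${pair#*,}"
done
```

Piping the generated lines through "sh" on both the old and new systems would give like-for-like transaction-rate numbers.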
[Kernel-packages] [Bug 1853638] Re: BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet device seems to be dropping data
Hello Edwin,

Here is more information on the issue we are seeing with regard to dropped packets and other connectivity issues with this NIC. The problem is *only* seen when the second port on the NIC is chosen as the active interface of an active-backup configuration. So on the "bad" system with the interfaces:

  enp94s0f0   -> when chosen as active, all OK
  enp94s0f1d1 -> when chosen as active, not OK

I'll see if the reporters can confirm that on the "good" systems there was no problem when the second interface is active.
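The active-port dependence can be checked and flipped from userspace via the kernel's bonding interfaces (/proc/net/bonding and sysfs). A sketch, with "bond0" as a placeholder bond name; the parsing is demonstrated on canned /proc output so the block runs without the hardware:

```shell
# Sketch: report which slave is currently active in an active-backup
# bond. "bond0" is a placeholder. On the live host:
#   active_slave < /proc/net/bonding/bond0
#   echo enp94s0f0 > /sys/class/net/bond0/bonding/active_slave  # force port 0
active_slave() {
    awk -F': ' '/Currently Active Slave/ { print $2 }'
}

# Canned /proc/net/bonding output, for illustration only:
sample='Bonding Mode: fault-tolerance (active-backup)
Currently Active Slave: enp94s0f1d1
MII Status: up'

printf '%s\n' "$sample" | active_slave
```

Forcing each port active in turn while re-running the benchmark would confirm the per-port correlation described above.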
[Kernel-packages] [Bug 1853638] Re: BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet device seems to be dropping data
I have reports of the same device appearing to drop packets and incur a greater number of retransmissions under certain circumstances, which we're still trying to nail down. I'm using this bug for now until it's proven to be a different problem. This is causing issues in a production environment.

** Changed in: network-manager (Ubuntu)
   Status: New => Confirmed

** Changed in: network-manager (Ubuntu)
   Importance: Undecided => Critical

** Tags added: sts

** Also affects: linux (Ubuntu)
   Importance: Undecided
   Status: New
[Kernel-packages] [Bug 1853638] Re: BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet device seems to be dropping data
We suspect this is a device (hw/fw) issue, not NetworkManager or the kernel (bnxt_en driver). I've added the kernel task for the driver impact (just in case, for now); this is really to eliminate all other causes and confirm whether the device is the root cause.

NIC Product Name: Broadcom Adv. Dual 10G SFP+ Ethernet
5e:00.1 Ethernet controller: Broadcom Inc. and subsidiaries BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet Controller (rev 01)

NIC Driver/FW ---
driver: bnxt_en
version: 1.10.0
firmware-version: 214.0.253.1/pkg 21.40.25.31
expansion-rom-version:
bus-info: :5e:00.1
supports-statistics: yes

Kernel - 5.0.0-37-generic #40~18.04.1-Ubuntu SMP Thu Nov 14 12:06:39 UTC 2019 (appears to be an issue on all kernel versions)

Environment Configuration - active-backup bonding mode (having the active backup up *might* potentially be the problem, but it might just be the device itself). The exact same distro, kernel, applications and configuration works fine with a different NIC (Broadcom 10g bnx2x).

There were quite a few total tpa_abort stat counts (1118473) during a 2 minute iperf test. Hoping to get more information from other users seeing the same issue.
[Kernel-packages] [Bug 1853638] Re: BCM57416 NetXtreme-E Dual-Media 10G RDMA Ethernet device seems to be dropping data
(active interface)
> cat ethtool-S-enp94s0f1d1 | grep abort
[0]: tpa_aborts: 19775497
[1]: tpa_aborts: 26758635
[2]: tpa_aborts: 12008147
[3]: tpa_aborts: 15829167
[4]: tpa_aborts: 25099500
[5]: tpa_aborts: 3292554
[6]: tpa_aborts: 2863692
[7]: tpa_aborts: 20224692

(backup interface)
> cat ethtool-S-enp94s0f0 | grep abort
[0]: tpa_aborts: 3158584
[1]: tpa_aborts: 1670319
[2]: tpa_aborts: 1749371
[3]: tpa_aborts: 1454301
[4]: tpa_aborts: 123020
[5]: tpa_aborts: 1403509
[6]: tpa_aborts: 1298383
[7]: tpa_aborts: 1858753

Netted out from the previous capture, there were:
*f0 = 2014 tpa_aborts
*d1 = 1118473 tpa_aborts

** Changed in: linux (Ubuntu) Status: Incomplete => Confirmed
** Changed in: linux (Ubuntu) Importance: Undecided => Critical
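The per-queue tpa_aborts counters above can be totalled with a small helper so a before/after delta can be computed around a test run. A sketch, assuming `ethtool -S <iface>` output in the `[N]: tpa_aborts: <count>` form shown above; the interface names in the usage comment are the ones from this report.

```shell
#!/bin/sh
# Sum all per-queue tpa_aborts counters in a saved `ethtool -S` dump.
# Typical use: snapshot before and after an iperf run, then diff the totals:
#   ethtool -S enp94s0f1d1 > before.txt
#   (run the 2 minute iperf test)
#   ethtool -S enp94s0f1d1 > after.txt
sum_tpa_aborts() {
    # $1: file containing `ethtool -S` output
    awk '/tpa_aborts:/ { total += $NF } END { print total + 0 }' "$1"
}
```

The delta between the two totals is the number of TPA (LRO/GRO aggregation) aborts incurred during the run, matching the "netted out" numbers quoted later in this bug.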
[Kernel-packages] [Bug 1794232] Re: Geneve tunnels don't work when ipv6 is disabled
As the test kernel with the backported Xenial fix has been up for almost 2 months now, I'm submitting the SRU for Xenial, although I have not received feedback from the original reporter or others. The backported patch for Xenial varies slightly from the cherry-picked patch for B, C. My testing has been successful (see original testing information in the description).

https://bugs.launchpad.net/bugs/1794232
Title: Geneve tunnels don't work when ipv6 is disabled
Status in linux package in Ubuntu: Fix Released
Status in linux source package in Xenial: In Progress
Status in linux source package in Bionic: Fix Released
Status in linux source package in Cosmic: Fix Released
Status in linux source package in Disco: Fix Released

Bug description:
SRU Justification

Impact: Cannot create geneve tunnels if ipv6 is disabled dynamically.

Fix: Fixed by upstream commit in v5.0:
Commit: cf1c9ccba7308e48a68fa77f476287d9d614e4c7
"geneve: correctly handle ipv6.disable module parameter"
Hence available in Disco and later; required in X, B, C.

Testcase:
1. Boot with "ipv6.disable=1"
2. Then try to create a geneve tunnel using:
# ovs-vsctl add-br br1
# ovs-vsctl add-port br1 geneve1 -- set interface geneve1 type=geneve options:remote_ip=192.168.x.z  // ip of the other host

Regression Potential: Low; this affects only geneve tunnels when ipv6 is dynamically disabled, and the current status is that they don't work at all.

Other Info:
* The mainline commit message includes a reference to a fix for non-metadata tunnels (that infrastructure is not yet in our tree prior to Disco), hence it is not being included at this time under this case. At this time, all geneve tunnels created as above are metadata-enabled.
---
[Impact]
When attempting to create a geneve tunnel on Ubuntu 16.04 Xenial, in an OS environment with Open vSwitch, where ipv6 has been disabled, the create fails with the error:
"ovs-vsctl: Error detected while setting up 'geneve0': could not add network device geneve0 to ofproto (Address family not supported by protocol)."

[Fix]
There is an upstream commit for this in v5.0 mainline (and in Disco and later Ubuntu kernels):
"geneve: correctly handle ipv6.disable module parameter"
Commit: cf1c9ccba7308e48a68fa77f476287d9d614e4c7
This fix is needed on all our series prior to Disco and the v5.0 kernel: X, C, B. It is identical to the fix we implemented and tested internally, but had not yet pushed upstream.

[Test Case]
(Best done on a KVM guest VM so as not to interfere with your system's networking.)
1. On any Ubuntu Xenial kernel, disable ipv6. This example is shown with the 4.15.0-23-generic kernel (which differs slightly from 4.4.x in symptoms):
   - Edit /etc/default/grub to add the line: GRUB_CMDLINE_LINUX="ipv6.disable=1"
   - # update-grub
   - Reboot
2. Install OVS:
   # apt install openvswitch-switch
3. Create a geneve tunnel:
   # ovs-vsctl add-br br1
   # ovs-vsctl add-port br1 geneve1 -- set interface geneve1 type=geneve options:remote_ip=192.168.x.z
   (where remote_ip is the IP of the other host)

You will see the following error message:
"ovs-vsctl: Error detected while setting up 'geneve1'. See ovs-vswitchd log for details."

From /var/log/openvswitch/ovs-vswitchd.log you will see:
"2018-07-02T16:48:13.295Z|00026|dpif|WARN|system@ovs-system: failed to add geneve1 as port: Address family not supported by protocol"

You will notice from the "ifconfig" output that the device genev_sys_6081 is not created.

If you do not disable IPv6 (remove ipv6.disable=1 from /etc/default/grub + update-grub + reboot), the same 'ovs-vsctl add-port' command completes successfully. You can verify that it is working properly by adding an IP to br1 and pinging each host.
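The pass/fail check in the test case above can be scripted: look for the EAFNOSUPPORT warning in ovs-vswitchd.log and for the genev_sys_6081 device. A sketch, assuming the log line format quoted above; the default log path is the one from this report.

```shell
#!/bin/sh
# Return 0 if the geneve tunnel appears healthy, 1 if the known failure
# signature from this bug is present or the tunnel device is missing.
check_geneve() {
    # $1 (optional): path to ovs-vswitchd.log
    log="${1:-/var/log/openvswitch/ovs-vswitchd.log}"
    if grep -q "failed to add geneve1 as port: Address family not supported by protocol" "$log"; then
        echo "FAIL: geneve port rejected (EAFNOSUPPORT)"
        return 1
    fi
    if ip link show genev_sys_6081 >/dev/null 2>&1; then
        echo "PASS: genev_sys_6081 exists"
        return 0
    fi
    echo "FAIL: genev_sys_6081 not created"
    return 1
}
```

Run it after step 3 of the test case; with the fix applied it should report the genev_sys_6081 device as present.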
On kernel 4.4 (4.4.0-128-generic), the error message does not appear when using the 'ovs-vsctl add-port' command and no warning is shown in ovs-vswitchd.log, but the device genev_sys_6081 is likewise not created and the ping test won't work. With the fixed test kernel, the interfaces and the tunnel are created successfully.

[Regression Potential]
* Low -- this affects the geneve driver only, and only when ipv6 is disabled; since geneve doesn't work at all in that case today, this fix gets the tunnel up and running for the common case.

[Other Info]
* Analysis
Geneve tunnels should work in either IPv4 or IPv6 environments as a design and support principle. Currently, however, the implementation requires ipv6 support for metadata-based tunnels (which geneve's are); that is, it supports only:
b) ipv4 + metadata + ipv6
rather than also:
a) ipv4 + metadata  // whether ipv6 is compiled out or dynamically disabled
What enforces this in the current 4.4.0-x code when opening a geneve tunnel is the following in geneve_open():
[Kernel-packages] [Bug 1840046] Re: BUG: non-zero pgtables_bytes on freeing mm: -16384
This issue has been tested and successfully verified: Verification successful!

"...test appliance built with 4.15.0-58 was unusable ... hundreds of "BUG: non-zero pgtables_bytes on freeing mm: -16384" in syslog, RestAPI interface timeouts, failed to produce FFDC data using sosreport. Build with 4.15.0-60.67 displays none of these behaviors ... smoke test completed successfully."

** Tags added: verification-done-bionic
** Changed in: linux (Ubuntu Bionic) Status: Fix Committed => Fix Released

https://bugs.launchpad.net/bugs/1840046
Title: BUG: non-zero pgtables_bytes on freeing mm: -16384
Status in linux package in Ubuntu: Fix Released
Status in linux source package in Bionic: Fix Released

Bug description:
[impact]
This message is printed repeatedly in the logs:
BUG: non-zero pgtables_bytes on freeing mm: -16384

[test case]
Boot the 4.15.0-58 kernel on s390x.

[regression potential]
This affects task pud accounting; regressions may be around cleaning up task memory.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1840046/+subscriptions

--
Mailing list: https://launchpad.net/~kernel-packages
Post to: kernel-packages@lists.launchpad.net
Unsubscribe: https://launchpad.net/~kernel-packages
More help: https://help.launchpad.net/ListHelp
[Kernel-packages] [Bug 1753662] Re: [i40e] LACP bonding start up race conditions
Hi Joseph,

We're continuing the investigation into this issue, and I was wondering if you and Nabuto could provide the last point you had reached, and/or the next step you were going to take.

From what I can summarize (please confirm/correct):
* Artful (4.13.*) kernels (with any Artful config) are good
* Artful (4.13.*) kernels (with any Xenial config) are also bad
* 4.12-rc4 - relatively good (1.x%) but still not 0% (<5%)
* 4.12-rc3 - also bad (~27%)
* Xenial (4.4.*) kernels (with any Xenial config) are bad
* Xenial (4.4.*) kernels (with any Artful config) are still bad
[data point: 4.12-rc4 with Artful configs is good. 4.12-rc4 with Xenial configs is bad.]

So a kernel change + config change results in masked/fixed behavior, I guess? Is the remaining bisect window basically 4.12-rc4 -> 4.13?

https://bugs.launchpad.net/bugs/1753662
Title: [i40e] LACP bonding start up race conditions
Status in linux package in Ubuntu: Triaged
Status in linux source package in Xenial: Triaged

Bug description:
When provisioning Ubuntu servers with MAAS at once, some bonding pairs will have an unexpected LACP status such as "Expired". It happens randomly at each provisioning with the default Xenial kernel (4.4), but is not reproducible with the HWE kernel (4.13). I'm using Intel X710 cards (Dell-branded). Using the HWE kernel works as a short-term workaround, but it's not ideal since 4.13 is not covered by the Canonical Livepatch service.

How to reproduce:
1. configure LACP bonding with MAAS
2. provision machines
3. check the bonding status in /proc/net/bonding/bond*

Frequency of occurrence: about 5 bond pairs in 22 pairs at each provisioning.
[reproducible combination]
$ uname -a
Linux comp006 4.4.0-116-generic #140-Ubuntu SMP Mon Feb 12 21:23:04 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
$ sudo ethtool -i eno1
driver: i40e
version: 1.4.25-k
firmware-version: 6.00 0x800034e6 18.3.6
expansion-rom-version:
bus-info: :01:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: yes

[non-reproducible combination]
$ uname -a
Linux comp006 4.13.0-36-generic #40~16.04.1-Ubuntu SMP Fri Feb 16 23:25:58 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
$ sudo ethtool -i eno1
driver: i40e
version: 2.1.14-k
firmware-version: 6.00 0x800034e6 18.3.6
expansion-rom-version:
bus-info: :01:00.0
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: yes
supports-priv-flags: yes

ProblemType: Bug
DistroRelease: Ubuntu 16.04
Package: linux-image-4.4.0-116-generic 4.4.0-116.140
ProcVersionSignature: Ubuntu 4.4.0-116.140-generic 4.4.98
Uname: Linux 4.4.0-116-generic x86_64
AlsaDevices:
 total 0
 crw-rw---- 1 root audio 116, 1 Mar 6 06:37 seq
 crw-rw---- 1 root audio 116, 33 Mar 6 06:37 timer
AplayDevices: Error: [Errno 2] No such file or directory: 'aplay'
ApportVersion: 2.20.1-0ubuntu2.15
Architecture: amd64
ArecordDevices: Error: [Errno 2] No such file or directory: 'arecord'
AudioDevicesInUse: Error: command ['fuser', '-v', '/dev/snd/seq', '/dev/snd/timer'] failed with exit code 1:
Date: Tue Mar 6 06:46:32 2018
IwConfig: Error: [Errno 2] No such file or directory: 'iwconfig'
Lsusb:
 Bus 002 Device 002: ID 8087:8002 Intel Corp.
 Bus 002 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
 Bus 001 Device 003: ID 413c:a001 Dell Computer Corp. Hub
 Bus 001 Device 002: ID 8087:800a Intel Corp.
 Bus 001 Device 001: ID 1d6b:0002 Linux Foundation 2.0 root hub
MachineType: Dell Inc. PowerEdge R730
PciMultimedia:
ProcEnviron:
 TERM=screen
 PATH=(custom, no user)
 LANG=en_US.UTF-8
 SHELL=/bin/bash
ProcFB: 0 EFI VGA
ProcKernelCmdLine: BOOT_IMAGE=/vmlinuz-4.4.0-116-generic.efi.signed root=UUID=0528f88e-cf1a-43e2-813a-e7261b88d460 ro console=tty0 console=ttyS0,115200n8
RelatedPackageVersions:
 linux-restricted-modules-4.4.0-116-generic N/A
 linux-backports-modules-4.4.0-116-generic N/A
 linux-firmware 1.157.17
RfKill: Error: [Errno 2] No such file or directory: 'rfkill'
SourcePackage: linux
UpgradeStatus: No upgrade log present (probably fresh install)
dmi.bios.date: 08/16/2017
dmi.bios.vendor: Dell Inc.
dmi.bios.version: 2.5.5
dmi.board.name: 072T6D
dmi.board.vendor: Dell Inc.
dmi.board.version: A08
dmi.chassis.asset.tag: 0018880
dmi.chassis.type: 23
dmi.chassis.vendor: Dell Inc.
dmi.modalias: dmi:bvnDellInc.:bvr2.5.5:bd08/16/2017:svnDellInc.:pnPowerEdgeR730:pvr:rvnDellInc.:rn072T6D:rvrA08:cvnDellInc.:ct23:cvr:
dmi.product.name: PowerEdge R730
dmi.sys.vendor: Dell Inc.

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1753662/+subscriptions
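Step 3 of the reproduction above (checking /proc/net/bonding/bond*) can be automated across many machines. A sketch that flags bonds whose status file mentions an unhealthy state such as "Expired", the state named in this report; the exact line format in /proc/net/bonding varies by kernel version, so the pattern here is illustrative. The function takes a directory argument so it also works on copied-out status files.

```shell
#!/bin/sh
# List bonding status files that mention an unhealthy LACP state.
# $1: directory of bonding status files (e.g. /proc/net/bonding)
find_bad_bonds() {
    grep -l "Expired" "$1"/bond* 2>/dev/null
}
```

After provisioning, `find_bad_bonds /proc/net/bonding` printing nothing means all bonds look healthy by this check.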
[Kernel-packages] [Bug 1753662] Re: [i40e] LACP bonding start up race conditions
I would have thought this would be the relevant patch:

bonding: speed/duplex update at NETDEV_UP event
Mahesh Bandewar authored and davem330 committed on Sep 28, 2017
commit 4d2c0cda07448ea6980f00102dc3964eb25e241c

However, it was first available in v4.15-rc1. At least as far as bonding kernel changes go, there does not seem to be another obvious candidate that might have fixed this problem between 4.12 and 4.13 (first skim).

At least in one scenario I looked at, we got a bad speed/duplex setting, which eventually ended up with the bond interface aggregating on a separate port, and/or ending up in LACP DISABLED state, which it never got out of. We only check for correct/latest device speed/duplex settings via the NETDEV_CHANGE path, where we call __ethtool_get_settings(). If we don't receive a change event again to correct the speed/duplex, we never recover.

There are some other patches which help address this at different points, but they are either earlier or later than the window (see above). I'll take a look at code outside the bonding directory which might impact this.

Joseph, could you provide the raw config files you used as well? It was not super clear from the png image whether those were the only diffs. They did not seem very relevant diffs either.
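The bad speed/duplex theory above can be spot-checked from sysfs without ethtool and compared against the "Speed"/"Duplex" lines /proc/net/bonding reports for each slave. A sketch, assuming the standard /sys/class/net/<iface>/{speed,duplex} attributes; the interface name in the usage comment is hypothetical, and the SYSFS_NET override exists only so the helper can be exercised against a copied-out tree.

```shell
#!/bin/sh
# Print speed/duplex for each named interface, as the kernel currently sees it.
# Compare against the per-slave Speed/Duplex lines in /proc/net/bonding/bondX.
show_link_settings() {
    base="${SYSFS_NET:-/sys/class/net}"   # overridable for testing
    for ifc in "$@"; do
        speed=$(cat "$base/$ifc/speed" 2>/dev/null || echo "?")
        duplex=$(cat "$base/$ifc/duplex" 2>/dev/null || echo "?")
        echo "$ifc: speed=${speed}Mb/s duplex=$duplex"
    done
}
# Usage: show_link_settings eno1 eno2
```

A mismatch between this output and the bond's recorded per-slave values would be consistent with the missed NETDEV_CHANGE update described above.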
[Kernel-packages] [Bug 1753662] Re: [i40e] LACP bonding start up race conditions
Jeff,

Please do provide your logs and whatever other information you can share from your error case; any piece of info will help here. I do not yet have a repro environment myself.

I suspect that most of the changes which seem to help or fix the issue are simply changing the timing enough to affect the race window, making it less likely to occur, so they are masking the problem rather than fixing the root cause.
[Kernel-packages] [Bug 1771480] Re: WARNING: CPU: 28 PID: 34085 at /build/linux-90Gc2C/linux-3.13.0/net/core/dev.c:1433 dev_disable_lro+0x87/0x90()
The warning message: "failed to disable LRO!" is coming from the function dev_disable_lro(): /** * dev_disable_lro - disable Large Receive Offload on a device * @dev: device * * Disable Large Receive Offload (LRO) on a net device. Must be * called under RTNL. This is needed if received packets may be * forwarded to another interface. */ dev_disable_lro() ... if (unlikely(dev->features & NETIF_F_LRO)) netdev_WARN(dev, "failed to disable LRO!\n"); ... Likely relevant callers here: bond_enslave() if (!(bond_dev->features & NETIF_F_LRO)) dev_disable_lro(slave_dev); br_add_if() dev_disable_lro(dev); ... Looking like the second, from the trace. I'd say if you can repro then turn on debug and also dynamic debug on the files br_if.c and dev.c. Possibly another issue with the device name? Is bond1.2001 a vlan interface? -- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu. https://bugs.launchpad.net/bugs/1771480 Title: WARNING: CPU: 28 PID: 34085 at /build/linux- 90Gc2C/linux-3.13.0/net/core/dev.c:1433 dev_disable_lro+0x87/0x90() Status in linux package in Ubuntu: Incomplete Bug description: I have multiple instances of this dev_disable_lro error in kern.log. Also seeing this: systemd-udevd[1452]: timeout: killing 'bridge-network-interface' [2765] <4>May 1 22:56:42 xxx kernel: [ 404.520990] bonding: bond0: Warning: No 802.3ad response from the link partner for any adapters in the bond <4>May 1 22:56:44 xxx kernel: [ 406.926429] bonding: bond0: Warning: No 802.3ad response from the link partner for any adapters in the bond <4>May 1 22:56:45 xxx kernel: [ 407.569020] [ cut here ] <4>May 1 22:56:45 xxx kernel: [ 407.569029] WARNING: CPU: 28 PID: 34085 at /build/linux-90Gc2C/linux-3.13.0/net/core/dev.c:1433 dev_disable_lro+0x87/0x90() <4>May 1 22:56:45 xxx kernel: [ 407.569032] netdevice: bond0.2004 <4>May 1 22:56:45 xxx kernel: [ 407.569032] failed to disable LRO! 
<4>May 1 22:56:45 xxx kernel: [ 407.569035] Modules linked in: 8021q garp mrp bridge stp llc bonding iptable_filter ip_tables x_tables nf_conntrack_proto_gre nf_conntrack_ipv6 nf_defrag_ipv6 nf_conntrack_ipv4 nf_defrag_ipv4 nf_conntrack ipmi_devintf mxm_wmi dcdbas x86_pkg_temp_thermal coretemp kvm_intel kvm crct10dif_pclmul crc32_pclmul ghash_clmulni_intel aesni_intel aes_x86_64 lrw gf128mul glue_helper ablk_helper cryptd lpc_ich mei_me mei ipmi_si shpchp wmi acpi_power_meter mac_hid xfs libcrc32c raid10 usb_storage raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq raid1 raid0 igb ixgbe i2c_algo_bit multipath ahci dca ptp libahci pps_core linear megaraid_sas mdio dm_multipath scsi_dh
<4>May 1 22:56:45 xxx kernel: [ 407.569112] CPU: 28 PID: 34085 Comm: brctl Not tainted 3.13.0-142-generic #191-Ubuntu
<4>May 1 22:56:45 xxx kernel: [ 407.569115] Hardware name: Dell Inc. PowerEdge R730xd/072T6D, BIOS 2.7.1 001/22/2018
<4>May 1 22:56:45 xxx kernel: [ 407.569118] 881fcc753c70 8172e7fc 881fcc753cb8
<4>May 1 22:56:45 xxx kernel: [ 407.569129] 0009 881fcc753ca8 8106afad 883fcc6f8000
<4>May 1 22:56:45 xxx kernel: [ 407.569139] 883fcc696880 883fcc6f8000 881fce82dd40
<4>May 1 22:56:45 xxx kernel: [ 407.569150] Call Trace:
<4>May 1 22:56:45 xxx kernel: [ 407.569160] [] dump_stack+0x64/0x82
<4>May 1 22:56:45 xxx kernel: [ 407.569168] [] warn_slowpath_common+0x7d/0xa0
<4>May 1 22:56:45 xxx kernel: [ 407.569175] [] warn_slowpath_fmt+0x4c/0x50
<4>May 1 22:56:45 xxx kernel: [ 407.569183] [] dev_disable_lro+0x87/0x90
<4>May 1 22:56:45 xxx kernel: [ 407.569195] [] br_add_if+0x1f3/0x430 [bridge]
<4>May 1 22:56:45 xxx kernel: [ 407.569205] [] add_del_if+0x5d/0x90 [bridge]
<4>May 1 22:56:45 xxx kernel: [ 407.569215] [] br_dev_ioctl+0x5b/0x90 [bridge]
<4>May 1 22:56:45 xxx kernel: [ 407.569223] [] dev_ifsioc+0x313/0x360
<4>May 1 22:56:45 xxx kernel: [ 407.569230] [] ? dev_get_by_name_rcu+0x69/0x90
<4>May 1 22:56:45 xxx kernel: [ 407.569237] [] dev_ioctl+0xe9/0x590
<4>May 1 22:56:45 xxx kernel: [ 407.569245] [] sock_do_ioctl+0x45/0x50
<4>May 1 22:56:45 xxx kernel: [ 407.569252] [] sock_ioctl+0x1f0/0x2c0
<4>May 1 22:56:45 xxx kernel: [ 407.569260] [] do_vfs_ioctl+0x2e0/0x4c0
<4>May 1 22:56:45 xxx kernel: [ 407.569267] [] ? fput+0xe/0x10
<4>May 1 22:56:45 xxx kernel: [ 407.569273] [] SyS_ioctl+0x81/0xa0
<4>May 1 22:56:45 xxx kernel: [ 407.569283] [] system_call_fastpath+0x2f/0x34
<4>May 1 22:56:45 xxx kernel: [ 407.569287] ---[ end trace df5aa31d75a7e2b1 ]---
<4>May 1 22:56:54 xxx kernel: [ 416.320138] bonding: bond1: Warning: the permanent HWaddr of enp131s0f0 - a0:36:9f:c1:25:d0 - is still in use by bond1. Se
[Kernel-packages] [Bug 1794232] [NEW] Geneve tunnels don't work when ipv6 is disabled
Public bug reported:

[Impact]

When attempting to create a geneve tunnel on Ubuntu 16.04 Xenial, in an OS environment with Open vSwitch where ipv6 has been disabled, the create fails with the error:

"ovs-vsctl: Error detected while setting up 'geneve0': could not add network device geneve0 to ofproto (Address family not supported by protocol)."

[Test Case]

(Best to do this on a kvm guest VM so as not to interfere with your system's networking)

1. On any Ubuntu Xenial kernel, disable ipv6. This example is shown with the 4.15.0-23-generic kernel (which differs slightly from 4.4.x in symptoms):
   - Edit /etc/default/grub to add the line: GRUB_CMDLINE_LINUX="ipv6.disable=1"
   - # update-grub
   - Reboot

2. Install OVS
   # apt install openvswitch-switch

3. Create a Geneve tunnel
   # ovs-vsctl add-br br1
   # ovs-vsctl add-port br1 geneve1 -- set interface geneve1 type=geneve options:remote_ip=192.168.x.z
   (where remote_ip is the IP of the other host)

You will see the following error message:
"ovs-vsctl: Error detected while setting up 'geneve1'. See ovs-vswitchd log for details."

From /var/log/openvswitch/ovs-vswitchd.log you will see:
"2018-07-02T16:48:13.295Z|00026|dpif|WARN|system@ovs-system: failed to add geneve1 as port: Address family not supported by protocol"

You will notice from the "ifconfig" output that the device genev_sys_6081 is not created.

If you do not disable IPv6 (remove ipv6.disable=1 from /etc/default/grub + update-grub + reboot), the same 'ovs-vsctl add-port' command completes successfully. You can see that it is working properly by adding an IP to br1 and pinging each host.

On kernel 4.4 (4.4.0-128-generic), the error message doesn't happen for the 'ovs-vsctl add-port' command and no warning is shown in ovs-vswitchd.log, but the device genev_sys_6081 is also not created and the ping test won't work.

[Other Info]

* Analysis

Geneve tunnels should work in either IPv4 or IPv6 environments as a design and support principle.
Currently, however, the implementation requires ipv6 support for metadata-based tunnels (which geneve tunnels are). It should support:

a) ipv4 + metadata  // whether ipv6 is compiled out or dynamically disabled

rather than only:

b) ipv4 + metadata + ipv6

What enforces this in the current 4.4.0-x code when opening a Geneve tunnel is the following in geneve_open():

bool ipv6 = geneve->remote.sa.sa_family == AF_INET6;
bool metadata = geneve->collect_md;
...
#if IS_ENABLED(CONFIG_IPV6)
	geneve->sock6 = NULL;
	if (ipv6 || metadata)
		ret = geneve_sock_add(geneve, true);
#endif
	if (!ret && (!ipv6 || metadata))
		ret = geneve_sock_add(geneve, false);

CONFIG_IPV6 is enabled and IPv6 is disabled at boot, so ipv6 is false; but metadata is always true for a geneve open, as it is set unconditionally in ovs, in /lib/dpif_netlink_rtnl.c:

case OVS_VPORT_TYPE_GENEVE:
	nl_msg_put_flag(&request, IFLA_GENEVE_COLLECT_METADATA);

The second argument of geneve_sock_add() is a boolean indicating whether the socket is of the ipv6 address family, so we incorrectly pass true rather than false. The current "|| metadata" check is unnecessary and incorrectly sends the tunnel creation code down the ipv6 path, which subsequently fails when the code expects an ipv6 family socket.

* This issue exists in all versions of the kernel up to the present mainline and net-next trees.

* Testing with a trivial patch to remove that check and make changes similar to those made for vxlan (which had the same issue) has been successful. Patches for various versions to be attached here soon.

* We are in the process of sending a patch for this upstream once it has completed adequate testing.
* Example Versions (bug exists in all versions of Ubuntu and mainline):

$ uname -r
4.4.0-135-generic

$ lsb_release -rd
Description: Ubuntu 16.04.5 LTS
Release: 16.04

$ dpkg -l | grep openvswitch-switch
ii openvswitch-switch 2.5.4-0ubuntu0.16.04.1

** Affects: linux (Ubuntu)
   Importance: Undecided
   Status: New

** Tags: geneve kernel-bug
** Tags added: geneve kernel-bug

-- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1794232

Title: Geneve tunnels don't work when ipv6 is disabled
Status in linux package in Ubuntu: New
[Kernel-packages] [Bug 1794232] Re: Geneve tunnels don't work when ipv6 is disabled
Logs not necessary at this time; will attach patches and other information as needed.

** Changed in: linux (Ubuntu)
   Status: Incomplete => Confirmed

-- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1794232

Title: Geneve tunnels don't work when ipv6 is disabled
Status in linux package in Ubuntu: Confirmed

To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1794232/+subscriptions
-- Mailing list: https://launchpad.net/~kernel-packages Post to : kernel-packages@lists.launchpad.net Unsubscribe : https://launchpad.net/~kernel-packages More help : https://help.
[Kernel-packages] [Bug 1781413] [NEW] Cannot set MTU higher than 1500 in Xen instance
Public bug reported:

[Impact]

The latest Xenial update has broken MTU functionality in Xen: specifically, setting MTUs larger than 1500 fails. This prevents Jumbo Frames and other features which require larger-than-1500-byte MTUs from being used. This can lead to a failure to sync/connect to other components in the cluster/cloud which expect higher MTUs, and can result in unavailable services.

This can be worked around by manually using ethtool to enable SCATTER/GATHER functionality:
$ sudo ethtool -K $interface_name sg on

The issue is caused by the following commit to the xen-netfront driver:
"xen-netfront: Fix race between device setup and open"
commit f599c64fdf7d9c108e8717fb04bc41c680120da4
Introduced: v4.16-rc1

Reverting the above fix has confirmed that the problem goes away. The following commits fix this issue in the mainline kernel:
"xen-netfront: Fix mismatched rtnl_unlock"
commit cb257783c2927b73614b20f915a91ff78aa6f3e8
Introduced: v4.18-rc3
"xen-netfront: Update features after registering netdev"
commit 45c8184c1bed1ca8a7f02918552063a00b909bf5
Introduced: v4.18-rc3

[Test Case]
1. Launch a Xen instance using the latest kernel version (e.g. 4.4.0-130, or 4.4.0-1062-aws)
2. Change the MTU to 9000 or another value > 1500.

[Regression Potential]
A regression in the patch could leave the MTU unable to be set.

** Affects: linux (Ubuntu)
   Importance: Undecided
   Status: New

** Affects: linux (Ubuntu Xenial)
   Importance: Undecided
   Status: New

-- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1781413

Title: Cannot set MTU higher than 1500 in Xen instance
Status in linux package in Ubuntu: New
Status in linux source package in Xenial: New

To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1781413/+subscriptions
[Kernel-packages] [Bug 1814095] [NEW] bnxt_en_po: TX timed out triggering Netdev Watchdog Timer
Public bug reported:

The following 25Gb Broadcom NIC error was seen on Xenial running the 4.4.0-141-generic kernel on an amd64 host seeing moderate-to-heavy network traffic (just once):

* The bnxt_en_bpo driver froze on a "TX timed out" error and triggered the Netdev Watchdog timer under load.

* From the kernel log:
"NETDEV WATCHDOG: eno2d1 (bnxt_en_bpo): transmit queue 0 timed out"
See the attached kern.log excerpt file for the full error log.

* Release = Xenial
Kernel = 4.4.0-141-generic #167
eno2d1 = Product Name: Broadcom Adv. Dual 25Gb Ethernet

* This caused the driver to reset in order to recover:
"bnxt_en_bpo :19:00.1 eno2d1: TX timeout detected, starting reset task!"
driver: bnxt_en_bpo
version: 1.8.1
source: ubuntu/bnxt/bnxt.c: bnxt_tx_timeout()

* The loss of connectivity and softirq stall caused other failures on the system.

* The bnxt_en_bpo driver is the imported Broadcom driver pulled in to support newer Broadcom HW (specific boards), while the bnxt_en module continues to support the older HW. The current Linux upstream driver does not compile easily with the 4.4 kernel (too many changes).

* This upstream bnxt_en driver fix is a likely solution:
"bnxt_en: Fix TX timeout during netpoll"
commit: 73f21c653f930f438d53eed29b5e4c65c8a0f906
This fix has not been applied to the bnxt_en_bpo driver version, but review of the code indicates that it is susceptible to the bug, and the fix would be reasonable.

* No easy way to reproduce this

** Affects: linux (Ubuntu)
   Importance: High
   Status: New

** Tags: xenial

-- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1814095

Title: bnxt_en_po: TX timed out triggering Netdev Watchdog Timer
Status in linux package in Ubuntu: New

To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1814095/+subscriptions
[Kernel-packages] [Bug 1814095] Re: bnxt_en_po: TX timed out triggering Netdev Watchdog Timer
** Attachment added: "kern.log.excerpt-netdev-watchdog-timeout.txt"
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1814095/+attachment/5234643/+files/kern.log.excerpt-netdev-watchdog-timeout.txt

-- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1814095

Title: bnxt_en_po: TX timed out triggering Netdev Watchdog Timer
Status in linux package in Ubuntu: New
[Kernel-packages] [Bug 1814095] Re: bnxt_en_po: TX timed out triggering Netdev Watchdog Timer
Due to earlier NIC flapping observed on systems with the 25Gb Broadcom NIC, originally with the following config, the firmware was upgraded to avoid a known FW bug:

$ cat ethtool_-i_enp59s0f1d1
driver: bnxt_en_bpo
version: 1.8.1
firmware-version: 20.8.163/1.8.4 pkg 20.08.04.03
expansion-rom-version:
bus-info: :3b:00.1
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: no
supports-priv-flags: no

The FW was upgraded on affected systems to:

$ cat ethtool_-i_eno2d1
driver: bnxt_en_bpo
version: 1.8.1
firmware-version: 214.0.166/1.9.2 pkg 21.40.16.6
expansion-rom-version:
bus-info: :19:00.1
supports-statistics: yes
supports-test: yes
supports-eeprom-access: yes
supports-register-dump: no
supports-priv-flags: no

Unfortunately, it's not quite clear which FW version the current bug happened on (I believe the newer, but can't confirm; it happened in the midst of several reboots).

-- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1814095

Title: bnxt_en_po: TX timed out triggering Netdev Watchdog Timer
Status in linux package in Ubuntu: Incomplete
[Kernel-packages] [Bug 1814095] Re: bnxt_en_po: TX timed out triggering Netdev Watchdog Timer
** Changed in: linux (Ubuntu)
   Status: Incomplete => Confirmed

-- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1814095

Title: bnxt_en_po: TX timed out triggering Netdev Watchdog Timer
Status in linux package in Ubuntu: Confirmed
[Kernel-packages] [Bug 1779756] Re: Intel XL710 - i40e driver does not work with kernel 4.15 (Ubuntu 18.04)
Any update on a Bionic fix?

-- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1779756

Title: Intel XL710 - i40e driver does not work with kernel 4.15 (Ubuntu 18.04)
Status in linux package in Ubuntu: Fix Released
Status in linux source package in Bionic: Confirmed

Bug description:
Today the Ubuntu 16.04 LTS Enablement Stack has moved from kernel 4.13 to kernel 4.15.0-24-generic. On a Dell PowerEdge R330 server with an "Intel Ethernet Converged Network Adapter X710-DA2" network adapter (driver i40e), the network card no longer works and permanently displays these three lines:

[ 98.012098] i40e :01:00.0 enp1s0f0: tx_timeout: VSI_seid: 388, Q 8, NTC: 0x0, HWB: 0x0, NTU: 0x1, TAIL: 0x1, INT: 0x1
[ 98.012119] i40e :01:00.0 enp1s0f0: tx_timeout recovery level 11, hung_queue 8
[ 98.012125] i40e :01:00.0 enp1s0f0: tx_timeout recovery unsuccessful

To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1779756/+subscriptions
[Kernel-packages] [Bug 1809168] [NEW] FIPS and Ubuntu standard kernels prior to 4.11.0 won't boot; root device not found
Public bug reported:

[IMPACT]

Booting of the Xenial-based FIPS kernel packages failed with disk-not-found errors on amd64. This was also observed on standard Ubuntu kernels prior to 4.11.0.

FIPS
1. linux-image-4.4.0-1002-fips <-- FAIL
2. linux-image-4.4.0-1006-fips <-- FAIL

UBUNTU
1. Bionic kernels all WORK
2. Artful kernels:
   Ubuntu-4.11.0-1.6 <-- WORKS
   Ubuntu-4.10.0-26.30 <-- FAILS
3. Xenial kernels:
   Ubuntu-hwe-4.11.0-12.17_16.04.1 <--- WORKS
   Ubuntu-hwe-4.10.0-43.47_16.04.1 <--- FAILS
   Ubuntu-lts-* <--- ALL FAIL
   Ubuntu-4.4.0-* <--- ALL FAIL

We have narrowed down the window to be:
4.11.0-1.6 (custom build) <--- WORKS
4.10.0-43.47~16.04.1 <-- FAILS

Also works:
https://kernel.ubuntu.com/~kernel-ppa/mainline/v4.11/linux-image-4.11.0-041100-generic_4.11.0-041100.201705041534_amd64.deb

Symptoms
The system cannot find the root disk and drops into an initramfs shell:

mdadm script local-block "CREATE group disk not found"
"Gave up waiting for root device. Common problems:
- Boot args (cat /proc/cmdline)
  - Check rootdelay=...
  - Check root= ...
- Missing modules (cat /proc/modules; ls /dev)
ALERT! UUID=... does not exist. Dropping to a shell!
...
(initramfs)_

There does not appear to be any workaround so far. The disks are encrypted SSDs.

Attaching the commit list between the last known failing Artful kernel and the earliest known working kernel (adjacent tags) and other info.

** Affects: linux (Ubuntu)
   Importance: Undecided
   Status: New

** Tags: artful fips xenial

** Package changed: libgcrypt20 (Ubuntu) => linux (Ubuntu)

-- You received this bug notification because you are a member of Kernel Packages, which is subscribed to linux in Ubuntu.
https://bugs.launchpad.net/bugs/1809168

Title:
  FIPS and Ubuntu standard kernels prior to 4.11.0 won't boot; root device not found

Status in linux package in Ubuntu:
  New
[Kernel-packages] [Bug 1809168] Re: FIPS and Ubuntu standard kernels prior to 4.11.0 won't boot; root device not found
** Attachment added: "Commit list for the artful window where fix went in"
   https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1809168/+attachment/5223616/+files/Ubuntu-4.10.0-26.30---Ubuntu-4.11.0-0.5---commitlist
To manage notifications about this bug go to: https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1809168/+subscriptions -- Mailing list: https://launchpad.net/~kernel-packages Post to : kernel-packages@lists.launchpad.net Unsubscribe : https://launchpad.net/~kernel-packages More help : https://help.launchpad.net/ListHelp
** Changed in: linux (Ubuntu)
   Importance: Undecided => High
** Changed in: linux (Ubuntu)
   Status: Incomplete => Confirmed
Thanks, Joe. I'll update this bug as soon as I get the results from the reporter.
The disk in question is a PERC_H740P_Adp.
v4.10 Final <-- FAILS
v4.11-rc1   <-- WORKS
There are some similar issues out there:

https://feeding.cloud.geek.nz/posts/recovering-from-unbootable-ubuntu-encrypted-lvm-root-partition/
https://bugs.launchpad.net/ubuntu/+source/xubuntu-meta/+bug/1801629
One check to see if the above is the issue:

1. dpkg -l | grep crypt
2. dpkg -l | grep lvm

If lvm2 is not installed, for instance, it should be possible to do the following to fix the problem:

1. # apt install lvm2
2. # update-initramfs -c -k all
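The check-and-fix sequence above can be sketched as a small script. This is illustrative only: it runs the package test against an embedded sample of "dpkg -l" output (an assumption, so the result is deterministic) and only prints the suggested repair commands rather than executing them. On a real system you would pipe the actual "dpkg -l" output and run the printed commands as root:

```shell
#!/bin/sh
# Sketch of the check described above: confirm the cryptsetup and lvm2
# userspace tools are installed before rebuilding the initramfs.
# The sample listing below deliberately omits lvm2.

sample_dpkg_list='ii  cryptsetup  2:1.6.6  amd64  disk encryption support
ii  coreutils   8.25    amd64  GNU core utilities'

missing=""
for pkg in cryptsetup lvm2; do
    # "ii" = package is installed; absent line means the tool is missing.
    if ! printf '%s\n' "$sample_dpkg_list" | grep -q "^ii  $pkg"; then
        missing="$missing $pkg"
    fi
done

if [ -n "$missing" ]; then
    echo "missing:$missing"
    echo "fix: apt install$missing && update-initramfs -c -k all"
fi
```

Against the sample listing this reports lvm2 as missing, matching the failure mode described in the linked reports: without lvm2 (or cryptsetup) in the package set, the generated initramfs cannot assemble an encrypted/LVM root and the boot drops to the initramfs shell.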