[Kernel-packages] [Bug 1921769] Re: Backport mlx5e fix for tunnel offload

2021-04-19 Thread Naadir Jeewa
** Tags removed: verification-needed-focal
** Tags added: verification-done-focal

-- 
You received this bug notification because you are a member of Kernel
Packages, which is subscribed to linux-azure in Ubuntu.
https://bugs.launchpad.net/bugs/1921769

Title:
  Backport mlx5e fix for tunnel offload

Status in linux package in Ubuntu:
  Fix Released
Status in linux-azure package in Ubuntu:
  Confirmed
Status in linux source package in Bionic:
  Invalid
Status in linux-azure source package in Bionic:
  Invalid
Status in linux source package in Focal:
  Fix Committed
Status in linux-azure source package in Focal:
  In Progress
Status in linux source package in Groovy:
  In Progress
Status in linux-azure source package in Groovy:
  Confirmed
Status in linux source package in Hirsute:
  Fix Released
Status in linux-azure source package in Hirsute:
  Confirmed

Bug description:
  [SRU Justification]

  We've discovered an issue on Ubuntu 20.04: when it is used with
  Kubernetes CNIs that perform offloading using Geneve, the kernel
  panics on Azure instances with accelerated networking, with the
  following errors:

  [ 307.561223] mlx5_core 0001:00:02.0 enP1s1: Error cqe on cqn 0x200, ci 0x3d4, sqn 0x2c5, opcode 0xd, syndrome 0x2, vendor syndrome 0x68
  [ 307.573864] mlx5_core 0001:00:02.0 enP1s1: ERR CQE on SQ: 0x2c5
  [ 307.764902] mlx5_core 0001:00:02.0 enP1s1: Error cqe on cqn 0x200, ci 0x3d7, sqn 0x2c5, opcode 0xd, syndrome 0x2, vendor syndrome 0x68
  [ 307.777332] mlx5_core 0001:00:02.0 enP1s1: ERR CQE on SQ: 0x2c5
  [ 322.814393] mlx5_core 0001:00:02.0 enP1s1: Error cqe on cqn 0x218, ci 0x1a7, sqn 0x2bd, opcode 0xd, syndrome 0x2, vendor syndrome 0x68
  [ 322.826685] mlx5_core 0001:00:02.0 enP1s1: ERR CQE on SQ: 0x2bd

  NVIDIA fixed this issue in the upstream commit below, so we're looking
  to have it backported to at least the linux-azure package:
  https://github.com/torvalds/linux/commit/5ccc0ecda9e8a67add654d93d7e0ac4346c0fa22

  [Test Plan]
  Spin up a Kubernetes CNI that uses Geneve offloading

  [Where problems could occur]
  It's possible some traffic won't get Geneve acceleration. This patch has
  been backported to v5.10.y and v5.11.y.
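
A minimal sketch, not part of the original report, of one way to check a
captured kernel log for the error signature quoted above while running the
test plan. The function name `find_error_cqes` and the regular expression are
assumptions derived only from the log lines in this report, not from the mlx5
driver source. Whitespace is normalised first so that entries wrapped across
lines still match.

```python
import re

# Pattern built from the "Error cqe" lines quoted in this bug report.
# This is an assumption about the message layout, not a driver-defined format.
ERROR_CQE = re.compile(
    r"mlx5_core (?P<pci>\S+) (?P<ifname>\S+): Error cqe on cqn (?P<cqn>0x[0-9a-fA-F]+), "
    r"ci (?P<ci>0x[0-9a-fA-F]+), sqn (?P<sqn>0x[0-9a-fA-F]+), "
    r"opcode (?P<opcode>0x[0-9a-fA-F]+), syndrome (?P<syndrome>0x[0-9a-fA-F]+), "
    r"vendor syndrome (?P<vendor_syndrome>0x[0-9a-fA-F]+)"
)


def find_error_cqes(log_text):
    """Return one dict of fields per 'Error cqe' entry found in log_text."""
    # Collapse all runs of whitespace (including line wraps) to single spaces.
    normalised = " ".join(log_text.split())
    return [match.groupdict() for match in ERROR_CQE.finditer(normalised)]


# One entry copied from this report, wrapped the way the notification wrapped it.
SAMPLE = (
    "[ 307.561223] mlx5_core 0001:00:02.0 enP1s1: Error cqe on cqn 0x200, ci \n"
    "0x3d4, sqn 0x2c5, opcode 0xd, syndrome 0x2, vendor syndrome 0x68\n"
    "[ 307.573864] mlx5_core 0001:00:02.0 enP1s1: ERR CQE on SQ: 0x2c5\n"
)

for entry in find_error_cqes(SAMPLE):
    print(entry["ifname"], entry["sqn"], entry["vendor_syndrome"])
```

An empty result on a fixed kernel, under the same Geneve workload that
previously produced these entries, would be consistent with the fix working,
though it is not proof on its own.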

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1921769/+subscriptions

-- 
Mailing list: https://launchpad.net/~kernel-packages
Post to : kernel-packages@lists.launchpad.net
Unsubscribe : https://launchpad.net/~kernel-packages
More help   : https://help.launchpad.net/ListHelp


[Kernel-packages] [Bug 1921769] Re: Backport mlx5e fix for tunnel offload

2021-04-13 Thread Naadir Jeewa
Follow-up comments, albeit a bit late.

The test PPA kernel did resolve the issue for us, and I can confirm that
our 18.04 images were using the 5.4 kernel.



[Kernel-packages] [Bug 1921769] Re: Backport mlx5e fix for tunnel offload

2021-03-29 Thread Naadir Jeewa
Can confirm this is also the case on Ubuntu 18.04.



[Kernel-packages] [Bug 1921769] [NEW] Backport mlx5e fix for tunnel offload

2021-03-29 Thread Naadir Jeewa
Public bug reported:

We've discovered an issue on Ubuntu 20.04: when it is used with Kubernetes
CNIs that perform offloading using Geneve, the kernel panics on Azure
instances with accelerated networking, with the following errors:

[ 307.561223] mlx5_core 0001:00:02.0 enP1s1: Error cqe on cqn 0x200, ci 0x3d4, sqn 0x2c5, opcode 0xd, syndrome 0x2, vendor syndrome 0x68
[ 307.573864] mlx5_core 0001:00:02.0 enP1s1: ERR CQE on SQ: 0x2c5
[ 307.764902] mlx5_core 0001:00:02.0 enP1s1: Error cqe on cqn 0x200, ci 0x3d7, sqn 0x2c5, opcode 0xd, syndrome 0x2, vendor syndrome 0x68
[ 307.777332] mlx5_core 0001:00:02.0 enP1s1: ERR CQE on SQ: 0x2c5
[ 322.814393] mlx5_core 0001:00:02.0 enP1s1: Error cqe on cqn 0x218, ci 0x1a7, sqn 0x2bd, opcode 0xd, syndrome 0x2, vendor syndrome 0x68
[ 322.826685] mlx5_core 0001:00:02.0 enP1s1: ERR CQE on SQ: 0x2bd

NVIDIA fixed this issue in the upstream commit below, so we're looking to
have it backported to at least the linux-azure package:
https://github.com/torvalds/linux/commit/5ccc0ecda9e8a67add654d93d7e0ac4346c0fa22

** Affects: linux-azure (Ubuntu)
 Importance: Undecided
 Status: New
