Public bug reported:

Support for ConnectX hardware through doca-ofed-26.01 DKMS in Ubuntu
Resolute

[ SRU Justification ]

This is an effort to repackage the DKMS portion of the upstream doca-ofed
package, which NVIDIA distributes through a public HTTP source (e.g.
https://linux.mellanox.com/public/repo/doca/).
NVIDIA distributes this as a set of deb packages in a tarball at the
following address: https://developer.nvidia.com/doca-downloads; this
package corresponds to version 3.3.0 of the upstream NVIDIA package.

Current Ubuntu packaging lacks support for the NVIDIA Bluefield/IGX/DGX
platforms. This package is needed so that customers using NVIDIA
hardware can get custom, high-performance drivers directly through
Canonical's apt system, without having to download an external tarball
and install it manually on the target machines.

This also helps any automated installation flow: declaring this package
as an apt dependency removes the need to find the HTTP address of the
latest doca-ofed version distributed by NVIDIA and to script its manual
installation.
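
As a minimal sketch of such an automated flow, the snippet below builds
the apt-get invocation a provisioning script might run. The package name
doca-ofed-26.01-dkms is taken from the bug title and is an assumption
about the final archive name; build_install_command is a hypothetical
helper, not part of any existing tool.

```python
import shlex

# Assumed package name, taken from the bug title; the name in the
# archive may differ.
PACKAGE = "doca-ofed-26.01-dkms"


def build_install_command(package: str = PACKAGE, assume_yes: bool = True) -> list[str]:
    """Return the apt-get argument list that installs the DKMS package."""
    cmd = ["apt-get", "install"]
    if assume_yes:
        cmd.append("--yes")  # non-interactive, as needed in automation
    cmd.append(package)
    return cmd


if __name__ == "__main__":
    # Print the command instead of running it, so the sketch has no
    # side effects on the machine.
    print(shlex.join(build_install_command()))
```

The returned list can be passed to subprocess.run() by a provisioning
script running with sufficient privileges.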

This package adds a set of kernel modules related to the NVIDIA mlnx5
hardware family (e.g. IB-capable ConnectX devices).

This package is related to the
https://launchpad.net/ubuntu/+source/mofed-modules-24.10/ package already
accepted last year. This is the newer LTS version distributed by NVIDIA.
No modifications have been applied in the repackaging, apart from a name
that is more coherent with the NVIDIA naming scheme.
Another difference from the 24.10 version is a smaller list of .ko files
built on the target machine, matching what the doca-ofed target of the
original package installs.

[ Current Test Plan ]

 * Hardware tested on:
     * Noble GA kernel 7.0.0.7 - Equivalent to Kernel 7.0rc4
         - Tested on DGX B200 (amd64) and IGX GH200 (arm64)

 * Before uploading an updated version of the repackaged kernel modules
to Ubuntu, a local test on the affected kernels must be executed. At the
moment, the main target kernel is noble:linux-nvidia-tegra, so the tests
will be thoroughly executed on the specific hardware running that
kernel. If the modules load correctly, the package is considered sane
and can be uploaded to the wider public. The compilation test is also
executed on the latest version of Resolute.

 * Another test suite, run on each new spin of the modules, is a smoke
test running on DGX B200 (amd64) and GH200 (arm64) machines; those
machines contain the hardware this set of kernel modules targets.
The suite checks that the produced modules can run IB interactions, that
all the built modules load fully, and that kernel-space modules and
user-space applications interact correctly.
A full test run can be found on the ARGOS-2019 Jira ticket.
The test plan can be found at
https://github.com/canonical/mofed-userspace-integration/tree/main/testcases
Test machines are freely available in Testflinger.

 * The produced modules have been tested by an NVIDIA team, which has
given us the green light for the release.
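
The "modules load correctly" check from the test plan above could be
sketched roughly as follows. This is not the actual test suite: the
module names mlx5_core and mlx5_ib are assumptions about what the
package builds, and both helper functions are hypothetical.

```python
# Assumed module names for the ConnectX driver family; adjust to the
# modules the package actually builds.
EXPECTED_MODULES = ("mlx5_core", "mlx5_ib")


def loaded_modules(proc_modules_text: str) -> set[str]:
    """Parse the contents of /proc/modules and return the loaded module names."""
    names = set()
    for line in proc_modules_text.splitlines():
        fields = line.split()
        if fields:
            names.add(fields[0])  # the first column is the module name
    return names


def missing_modules(proc_modules_text: str, expected=EXPECTED_MODULES) -> list[str]:
    """Return the expected modules that are not loaded, sorted by name."""
    return sorted(set(expected) - loaded_modules(proc_modules_text))
```

On a test machine, passing the text read from /proc/modules to
missing_modules() should return an empty list once every expected
module is loaded.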

[ Where problems could occur ]

 * As this set of drivers replaces some in-tree modules with
NVIDIA-modified ones, Ubuntu users installing this package may see their
non-NVIDIA Bluefield Infiniband devices stop working or regress in
performance.
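
To see whether an in-tree module has been overridden on a given machine,
something like the sketch below may help. It assumes the usual Ubuntu
DKMS layout, where DKMS-built modules are installed under
/lib/modules/<kernel version>/updates/dkms/; the helper names are
hypothetical, and mlx5_core in the usage note is an assumed module name.

```python
import subprocess


def module_path(name: str) -> str:
    """Ask modinfo which file would be loaded for this module name."""
    result = subprocess.run(
        ["modinfo", "-n", name], capture_output=True, text=True, check=True
    )
    return result.stdout.strip()


def is_dkms_override(path: str) -> bool:
    """Return True if the module file lives in the DKMS updates tree.

    Assumes DKMS installs into .../updates/dkms/, as is usual on Ubuntu.
    """
    return "/updates/dkms/" in path
```

For example, is_dkms_override(module_path("mlx5_core")) would be expected
to return True after the DKMS package is installed and False when only
the in-tree driver is present.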

[ Other Info ]

 * The repackaging code is contained in the repository at the following
address:
https://kernel.ubuntu.com/forgejo/alessiofaina/nvidia-doca-ofed-dkms

 * Examples of the repackaged DKMS can be found at the following
personal PPA:
https://launchpad.net/~alessiofaina/+archive/ubuntu/mofed-autoupload-test/+packages

 * All the modules are GPL-2 or Dual BSD/GPL licensed; no closed-source
binaries are redistributed.

** Affects: linux (Ubuntu)
     Importance: Critical
     Assignee: Alessio Faina (alessiofaina)
         Status: In Progress

** Affects: linux (Ubuntu Resolute)
     Importance: Critical
     Assignee: Alessio Faina (alessiofaina)
         Status: In Progress

** Also affects: linux (Ubuntu Resolute)
   Importance: Undecided
       Status: New

** Changed in: linux (Ubuntu Resolute)
   Importance: Undecided => Medium

** Changed in: linux (Ubuntu Resolute)
   Importance: Medium => Critical

** Changed in: linux (Ubuntu Resolute)
       Status: New => In Progress

** Changed in: linux (Ubuntu Resolute)
     Assignee: (unassigned) => Alessio Faina (alessiofaina)

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/2144874

Title:
   [NEEDS PACKAGING] [mlnx5 hardware not supported] NVIDIA
  doca-ofed-26.01-dkms DKMS suite

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/linux/+bug/2144874/+subscriptions


-- 
ubuntu-bugs mailing list
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
