** Description changed:

  [ Impact ]

  During the keynote presentation at Microsoft Ignite 2024, Microsoft
  announced the private preview of a new series of AI-oriented Azure
  virtual machines based on the NVIDIA GB200 superchip:
  https://techcommunity.microsoft.com/blog/azurehighperformancecomputingblog/microsoft-adopts-nvidia-blackwell-to-power-the-next-frontier-of-ai-supercomputin/4303541.
  As part of Microsoft’s and Canonical’s joint effort to ensure that the
  latest Ubuntu LTS release supports the most salient features of this new
  virtual machine series, Canonical would like to back-port all post-24.04
  changes to the Microsoft Azure Network Adapter (MANA) provider in
  rdma-core from ubuntu/devel to ubuntu/noble.

  [ Test Plan ]

  # Regression tests

  See
  https://github.com/linux-rdma/rdma-core/blob/master/Documentation/testing.md.

  # MANA (new functionality)

  ```
  ## Server terminal
  ### Install Azure CLI: https://learn.microsoft.com/en-us/cli/azure/install-azure-cli-linux?pivots=apt
  az group create --name mana --location westeurope
  marketplace_offer=... # ubuntu-24_04-lts for Noble, ubuntu-24_10 for Oracular
  path_to_ssh_public_key=...
  az vm create --resource-group mana --name mana-virtual-machine --image canonical:${marketplace_offer}:server-arm64:latest --size Standard_D2ps_v6 --admin-username ubuntu --ssh-key-values $path_to_ssh_public_key --accelerated-networking true
  machine_ip_address=$(az network public-ip list --resource-group mana --query '[0].ipAddress' | tr -d '"')
  path_to_ssh_private_key=...
  ssh -i $path_to_ssh_private_key ubuntu@$machine_ip_address
  ### Enable proposed archive: https://wiki.ubuntu.com/Testing/EnableProposed
  ### rdma-core is the source of the rdmacm-utils package
  sudo apt-add-repository -y universe && sudo apt install -y linux-azure-nvidia rdma-core rdmacm-utils
  installed_kernel_version=$(uname -r)
  sudo apt remove -y linux-image-$installed_kernel_version linux-modules-$installed_kernel_version # DO NOT abort kernel removal when prompted
  sudo reboot
  ssh -i $path_to_ssh_private_key ubuntu@$machine_ip_address
  rping -s -C 10 -v

  ## Client terminal
  az network nic list --resource-group mana --query '[0].ipConfigurations[0].privateIPAddress' | tr -d '"' # Note for use in rping command below
  path_to_ssh_private_key=...
  machine_ip_address=$(az network public-ip list --resource-group mana --query '[0].ipAddress' | tr -d '"')
  ssh -i $path_to_ssh_private_key ubuntu@$machine_ip_address
  machine_private_ip_address=... # MANA server private IP address noted above
  rping -c -a $machine_private_ip_address -C 10 -v
  ```

  [ Where problems could occur ]

  A regression could adversely impact remote direct memory access (RDMA)
  through one or more non-MANA providers (kernel drivers), potentially
  even preventing RDMA altogether.

  [ Other Info ]

  The second point under
  https://documentation.ubuntu.com/sru/en/latest/reference/requirements/#other-safe-cases
  describes the changes that Canonical is seeking to back-port to
  ubuntu/noble. As such, the changes, while representing new features,
  appear to qualify for SRU.

- The patches associated with the SRU request only touch the following
- files, all of which are MANA-specific:
+ All but one of the patches associated with the SRU request only touch
+ the following files, all of which are MANA-specific:

  providers/mana/*
  kernel-headers/rdma/mana-abi.h

- Furthermore, the patches encapsulate all the changes to those files
- since the Ubuntu 24.04 release; there are no residual changes to those
- files that the patches do not cover.
+ The lone patch that touches other files (lp-2100089-mana-15.patch in the
+ case of Noble, lp-2100089-mana-14.patch in the case of Oracular; see
+ https://launchpad.net/~danpdraper/+archive/ubuntu/rdma-core-mana) adds a
+ driver parameter to the ibv_cmd_reg_dmabuf_mr function, replaces the
+ invocation of DECLARE_COMMAND_BUFFER in that function with an invocation
+ of DECLARE_COMMAND_BUFFER_LINK, and updates all invocations of the
+ ibv_cmd_reg_dmabuf_mr function to provide NULL as the driver. The
+ difference between the signature of the DECLARE_COMMAND_BUFFER_LINK
+ macro and the signature of the DECLARE_COMMAND_BUFFER macro is that the
+ former includes a driver parameter, and the implementation of the
+ DECLARE_COMMAND_BUFFER macro invokes the DECLARE_COMMAND_BUFFER_LINK
+ macro with NULL as the driver:
+ https://git.launchpad.net/ubuntu/+source/rdma-core/tree/libibverbs/cmd_ioctl.h?h=ubuntu/noble-devel#n189
+ and
+ https://git.launchpad.net/ubuntu/+source/rdma-core/tree/libibverbs/cmd_ioctl.h?h=ubuntu/oracular-devel#n189.
+ As such, the effect of the non-MANA patch is simply to remove the
+ DECLARE_COMMAND_BUFFER macro from the ibv_cmd_reg_dmabuf_mr function’s
+ call stack.
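
The equivalence argument for the non-MANA patch can be illustrated with a small toy model. This is only a sketch under stated assumptions: the struct, the TOY_ macros and the toy_ functions below are hypothetical stand-ins invented for illustration, not the definitions in libibverbs/cmd_ioctl.h. The sketch shows only the delegation pattern described above: a macro without a driver parameter that expands to the linked variant with NULL, so that calling the linked variant directly with NULL produces the same result.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a command buffer; NOT the rdma-core struct. */
struct toy_cmd_buffer {
    const void *link;  /* driver-supplied buffer to chain, or NULL */
    int num_attrs;     /* number of attributes reserved in the buffer */
};

/* Toy analogue of the "_LINK" macro: takes an explicit driver argument. */
#define TOY_DECLARE_BUFFER_LINK(name, num_attrs_, link_) \
    struct toy_cmd_buffer name = { .link = (link_), .num_attrs = (num_attrs_) }

/* Toy analogue of the plain macro: delegates with NULL as the driver. */
#define TOY_DECLARE_BUFFER(name, num_attrs_) \
    TOY_DECLARE_BUFFER_LINK(name, num_attrs_, NULL)

/* Shape of the function before the patch: no driver parameter. */
struct toy_cmd_buffer toy_reg_mr_before(void)
{
    TOY_DECLARE_BUFFER(cmd, 4);
    return cmd;
}

/* Shape of the function after the patch: driver passed straight through. */
struct toy_cmd_buffer toy_reg_mr_after(const void *driver)
{
    TOY_DECLARE_BUFFER_LINK(cmd, 4, driver);
    return cmd;
}

/* Callers that pass NULL observe the same buffer as before the patch. */
void toy_check_equivalence(void)
{
    struct toy_cmd_buffer before = toy_reg_mr_before();
    struct toy_cmd_buffer after = toy_reg_mr_after(NULL);

    assert(before.link == after.link);
    assert(before.num_attrs == after.num_attrs);
}
```

In this model, existing callers that pass NULL see no change in behavior, while a caller that supplies a non-NULL driver gets that pointer stored in the link field.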
--
https://bugs.launchpad.net/bugs/2100089

Title:
  rdma-core in latest Ubuntu LTS does not support Microsoft Azure Network Adapter
