** Description changed:

  [ Impact ]
  
  During the keynote presentation at Microsoft Ignite 2024, Microsoft
  announced the private preview of a new series of AI-oriented Azure
  virtual machines based on the NVIDIA GB200 superchip:
  
  https://techcommunity.microsoft.com/blog/azurehighperformancecomputingblog/microsoft-adopts-nvidia-blackwell-to-power-the-next-frontier-of-ai-supercomputin/4303541 .
  As part of Microsoft’s and Canonical’s joint
  effort to ensure that the latest Ubuntu LTS release supports the most
  salient features of this new virtual machine series, Canonical would
  like to back-port all post-24.04 changes to the Microsoft Azure Network
  Adapter (MANA) provider in rdma-core from ubuntu/devel to ubuntu/noble.
  
  [ Test Plan ]
  
  # Regression tests
  
  See https://github.com/linux-rdma/rdma-core/blob/master/Documentation/testing.md.
  
  # MANA (new functionality)
  
  ```
  ## Server terminal
  
  ### Install Azure CLI: https://learn.microsoft.com/en-us/cli/azure/install-azure-cli-linux?pivots=apt
  
  az group create --name mana --location westeurope
  
  marketplace_offer=... # ubuntu-24_04-lts for Noble, ubuntu-24_10 for Oracular
  
  path_to_ssh_public_key=...
  
  az vm create --resource-group mana --name mana-virtual-machine \
    --image canonical:${marketplace_offer}:server-arm64:latest \
    --size Standard_D2ps_v6 --admin-username ubuntu \
    --ssh-key-values $path_to_ssh_public_key --accelerated-networking true
  
  machine_ip_address=$(az network public-ip list --resource-group mana \
    --query '[0].ipAddress' | tr -d '"')
  
  path_to_ssh_private_key=...
  
  ssh -i $path_to_ssh_private_key ubuntu@$machine_ip_address
  
  ### Enable proposed archive: https://wiki.ubuntu.com/Testing/EnableProposed
  
  ### rdma-core is the source of the rdmacm-utils package
  sudo apt-add-repository -y universe && sudo apt install -y linux-azure-nvidia rdma-core rdmacm-utils
  
  installed_kernel_version=$(uname -r)
  
  # DO NOT abort kernel removal when prompted
  sudo apt remove -y linux-image-$installed_kernel_version linux-modules-$installed_kernel_version
  
  sudo reboot
  
  ssh -i $path_to_ssh_private_key ubuntu@$machine_ip_address
  
  rping -s -C 10 -v
  
  ## Client terminal
  
  # Note the printed address for use in the rping command below
  az network nic list --resource-group mana \
    --query '[0].ipConfigurations[0].privateIPAddress' | tr -d '"'
  
  path_to_ssh_private_key=...
  
  machine_ip_address=$(az network public-ip list --resource-group mana \
    --query '[0].ipAddress' | tr -d '"')
  
  ssh -i $path_to_ssh_private_key ubuntu@$machine_ip_address
  
  machine_private_ip_address=... # MANA server private IP address obtained above
  
  rping -c -a $machine_private_ip_address -C 10 -v
  ```
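  
  The client step above is judged by reading the rping output. As an
  optional, unofficial aid, that check can be made mechanical. This is a
  minimal sketch, assuming that the rping client run with -v prints one
  line beginning "ping data: rdma-ping-N: " per completed ping;
  check_rping_output is a hypothetical helper, not part of rdma-core or
  any Ubuntu package.
  
  ```shell
  # Hypothetical helper (not part of rdma-core): read rping client output on
  # stdin and succeed only if exactly the expected number of pings appear.
  check_rping_output() {
    expected=$1
    # Assumption: the client prints "ping data: rdma-ping-N: ..." once per
    # ping when run with -v; count only lines that start that way.
    actual=$(grep -c '^ping data: rdma-ping-')
    echo "observed $actual of $expected pings"
    [ "$actual" -eq "$expected" ]
  }
  ```
  
  For example, on the client: rping -c -a $machine_private_ip_address -C 10 -v | check_rping_output 10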
  
  [ Where problems could occur ]
  
  A regression could adversely impact remote direct memory access (RDMA)
  through one or more non-MANA providers (rdma-core's userspace driver
  libraries), potentially even preventing RDMA altogether.
  
  [ Other Info ]
  
  The second point under
  https://documentation.ubuntu.com/sru/en/latest/reference/requirements/#other-safe-cases
  describes the changes that Canonical is seeking to back-port to
  ubuntu/noble. As such, the changes, while representing new features,
  appear to qualify for SRU.
  
- The patches associated with the SRU request only touch the following
- files, all of which are MANA-specific:
+ All but one of the patches associated with the SRU request only touch
+ the following files, all of which are MANA-specific:
  
  providers/mana/*
  kernel-headers/rdma/mana-abi.h
  
- Furthermore, the patches encapsulate all the changes to those files
- since the Ubuntu 24.04 release; there are no residual changes to those
- files that the patches do not cover.
+ The lone patch that touches other files (lp-2100089-mana-15.patch in the
+ case of Noble, lp-2100089-mana-14.patch in the case of Oracular; see
+ https://launchpad.net/~danpdraper/+archive/ubuntu/rdma-core-mana) adds a
+ driver parameter to the ibv_cmd_reg_dmabuf_mr function, replaces the
+ invocation of DECLARE_COMMAND_BUFFER in that function with an invocation
+ of DECLARE_COMMAND_BUFFER_LINK, and updates all invocations of the
+ ibv_cmd_reg_dmabuf_mr function to provide NULL as the driver. The
+ DECLARE_COMMAND_BUFFER_LINK macro differs from the DECLARE_COMMAND_BUFFER
+ macro only in taking an additional driver parameter, and
+ DECLARE_COMMAND_BUFFER is implemented by invoking
+ DECLARE_COMMAND_BUFFER_LINK with NULL as the driver:
+ https://git.launchpad.net/ubuntu/+source/rdma-core/tree/libibverbs/cmd_ioctl.h?h=ubuntu/noble-devel#n189
+ and
+ https://git.launchpad.net/ubuntu/+source/rdma-core/tree/libibverbs/cmd_ioctl.h?h=ubuntu/oracular-devel#n189.
+ As such, the effect of the non-MANA patch is simply that
+ ibv_cmd_reg_dmabuf_mr now uses DECLARE_COMMAND_BUFFER_LINK directly
+ instead of through the DECLARE_COMMAND_BUFFER wrapper; because every
+ existing caller passes NULL as the driver, their behavior is unchanged.
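  
  The macro relationship described above can be checked directly against
  the linked header. This is a minimal sketch, assuming a local copy of
  libibverbs/cmd_ioctl.h in which DECLARE_COMMAND_BUFFER is defined at the
  start of a line as a backslash-continued #define;
  show_declare_command_buffer is a hypothetical name, not an rdma-core tool.
  
  ```shell
  # Hypothetical helper (not part of rdma-core): print the definition of the
  # DECLARE_COMMAND_BUFFER macro from a copy of libibverbs/cmd_ioctl.h.
  show_declare_command_buffer() {
    header=$1
    # Start at the "#define DECLARE_COMMAND_BUFFER(" line (the _LINK variant
    # does not match because "(" must follow the name directly), print each
    # backslash-continued line, and stop after the first line that does not
    # end in a backslash.
    awk '/^#define DECLARE_COMMAND_BUFFER\(/ { p = 1 }
         p { print }
         p && !/\\$/ { exit }' "$header"
  }
  ```
  
  For example, assuming a clone made with
  git clone --branch ubuntu/noble-devel https://git.launchpad.net/ubuntu/+source/rdma-core,
  running show_declare_command_buffer rdma-core/libibverbs/cmd_ioctl.h
  should print a definition that forwards to DECLARE_COMMAND_BUFFER_LINK
  with NULL as its last argument.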

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/2100089

Title:
  rdma-core in latest Ubuntu LTS does not support Microsoft Azure
  Network Adapter

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/rdma-core/+bug/2100089/+subscriptions


-- 
ubuntu-bugs mailing list
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
