Public bug reported:

[Impact]

Resolute (26.04 LTS) currently ships rocm-llvm 7.1.1+dfsg-0ubuntu1, which
predates the ROCm 7.2.x point releases. Users on resolute therefore lack:

 * device-libs and comgr support for hardware enabled in ROCm 7.2.x
   (notably the GFX12.5 cluster intrinsics and other newer AMDGPU
   subtarget features), which means downstream ROCm components built
   against this rocm-llvm cannot generate or load code for these
   targets.
 * Upstream bug fixes and toolchain hardening accumulated in ROCm
   7.2.0 -> 7.2.3 across comgr, device-libs and hipcc.
 * Compatibility with the LLVM 22 toolchain that the rest of the ROCm
   7.2.x stack expects.

The SRU updates rocm-llvm to 7.2.3+dfsg-0ubuntu1, the latest 7.2.x
point release. The fix is delivered as a new upstream release: the
package is rebased onto upstream ROCm 7.2.3, the build moves to the
LLVM 22 toolchain (clang-22 / libclang-cpp22-dev / libclang-rt-22-dev),
the binary package rocm-device-libs-21 is renamed to
rocm-device-libs-22 to match, and four delta patches that were either
applied upstream or specific to LLVM 21 are dropped. A single new patch
(llvm22-options-header-rename.patch) adapts comgr to the LLVM 22
clang/Options/ split.

This update is a prerequisite for the rest of the ROCm 7.2.x stack
(rocr-runtime, rocblas, rccl, rocthrust, rocwmma, amdsmi, ...) to be
SRUed into resolute; without it those packages either fail to build or
fail at runtime against the older rocm-llvm 7.1.1.

[Test Plan]

The package builds on amd64, arm64 and ppc64el. The verification steps
below assume an amd64 host with `-proposed` enabled.

  1. Enable -proposed for resolute and refresh:
       sudo sed -i 's/^# *deb /deb /' /etc/apt/sources.list.d/ubuntu.sources \
         || true
       printf 'Types: deb\nURIs: http://archive.ubuntu.com/ubuntu\n'\
'Suites: resolute-proposed\nComponents: main universe\n'\
'Signed-By: /usr/share/keyrings/ubuntu-archive-keyring.gpg\n' \
         | sudo tee /etc/apt/sources.list.d/ubuntu-proposed.sources
       sudo apt update

  2. Confirm the candidate version is the SRU:
       apt-cache policy hipcc libamd-comgr3 libamd-comgr-dev \
                        rocm-device-libs-22
       # Expect 7.2.3+dfsg-0ubuntu1 as the -proposed candidate.

  3. Install the SRU candidate and the matching toolchain:
       sudo apt install -t resolute-proposed \
         hipcc libamd-comgr3 libamd-comgr-dev rocm-device-libs-22 \
         clang-22 libclang-cpp22-dev libclang-rt-22-dev

  4. Sanity-check the comgr ABI is intact (libamd-comgr3 SONAME):
       dpkg -L libamd-comgr3 | grep -E 'libamd_comgr\.so'
       readelf -d /usr/lib/*/libamd_comgr.so.3 | grep SONAME
       # Expect: Library soname: [libamd_comgr.so.3]

  5. Compile a trivial HIP program with hipcc to exercise comgr and the
     device-libs end-to-end. No GPU is required for compile-only checks
     such as `-c` or `--cuda-device-only -emit-llvm`:

       cat > /tmp/hip_smoke.hip <<'EOF'
       #include <hip/hip_runtime.h>
       __global__ void k(int *p) { p[threadIdx.x] = threadIdx.x; }
       int main() {
         int *p = nullptr;
         hipMalloc(&p, 64*sizeof(int));
         hipLaunchKernelGGL(k, dim3(1), dim3(64), 0, 0, p);
         hipDeviceSynchronize();
         return 0;
       }
       EOF
       hipcc --offload-arch=gfx1100 -c -o /tmp/hip_smoke.o /tmp/hip_smoke.hip
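       # Expect a clean compile.

  6. Repeat the compile with --offload-arch=gfx1250 to exercise the new
     GFX12.5 path that motivated the LLVM 22 bump:
       hipcc --offload-arch=gfx1250 -c -o /tmp/hip_smoke_gfx1250.o \
         /tmp/hip_smoke.hip
       # Expect a clean compile.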

  7. (Optional, with a supported GPU) Run the rocr-runtime / rocminfo
     smoke test from -proposed to confirm runtime loading still works
     against the rebuilt libamd-comgr3.

  8. Run the package's own autopkgtests:
       autopkgtest -U rocm-llvm=7.2.3+dfsg-0ubuntu1 -- lxd ubuntu:resolute

The update is considered verified when steps 4, 5, 6 and 8 all pass on
the -proposed candidate and the equivalent steps still pass on the
release pocket after the SRU is published.
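
The version check in step 2 can also be scripted. The following is a
minimal sketch and not part of the package: `policy_lists_version` is a
hypothetical helper name, and it assumes the usual `apt-cache policy`
layout in which each row of the "Version table:" starts with the version
string (optionally preceded by `***` for the installed version). It
inspects the version table rather than the `Candidate:` line because,
depending on how -proposed is pinned on the host, the candidate may
still be the release-pocket version.

```shell
#!/bin/sh
# Hypothetical helper: reads `apt-cache policy <pkg>` output on stdin and
# succeeds (exit 0) if the wanted version is listed in the version table.
policy_lists_version() {
    awk -v want="$1" '
        /Version table:/ { in_table = 1; next }
        in_table {
            # Version rows look like "     <version> <priority>", or
            # " *** <version> <priority>" for the installed version.
            v = ($1 == "***") ? $2 : $1
            if (v == want) found = 1
        }
        END { exit !found }
    '
}
```

It would be called once per package, for example
`apt-cache policy hipcc | policy_lists_version 7.2.3+dfsg-0ubuntu1`,
and the exit status used to decide whether to continue with step 3.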

[Where problems could occur]

The change is a new upstream version, not a targeted patch, so the
blast radius is the union of (a) the upstream 7.1.1 -> 7.2.3 delta in
comgr / device-libs / hipcc and (b) the toolchain move from LLVM 21 to
LLVM 22. Concretely, problems could appear in:

 * libamd-comgr3: the comgr action API is consumed at runtime by every
   ROCm component that JIT-compiles or relocates GPU code (rocr-runtime,
   HIP, OpenCL, rocBLAS tuning, etc.). The SONAME stays at
   libamd_comgr.so.3, but any silent behavioural change in
   action-codegen / action-link / metadata parsing would surface as
   runtime failures in those consumers. The new
   llvm22-options-header-rename.patch reaches into clang's driver
   internals (GetResourcesPath / clang/Options/Options.h); if the LLVM
   22 packaging in resolute differs subtly from what the patch assumes,
   comgr could mis-resolve the clang resource directory and fail to
   find builtin headers or device-libs at runtime.

 * rocm-device-libs-22 (renamed from rocm-device-libs-21): consumers
   that hard-coded a dependency on rocm-device-libs-21 will not pull
   the new binary automatically. The rest of the ROCm 7.2.x stack
   needs to be rebuilt against the renamed package; until those
   rebuilds land in -proposed alongside this SRU, mixed installs could
   end up with neither -21 nor -22 satisfied.

 * hipcc: changes to the driver wrapper or to default include / link
   paths can break out-of-tree HIP builds in non-obvious ways
   (e.g. picking up the wrong libclang_rt, or losing a default
   --offload-arch). Regressions here usually present as link-time
   "undefined reference" errors or as runtime "no kernel image
   available" errors on previously-working GPUs.

 * Architecture coverage: the build is exercised on amd64, arm64 and
   ppc64el. arm64 and ppc64el have historically been the long pole for
   ROCm rebuilds; a successful build there does not guarantee that
   downstream rebuilds against this rocm-llvm will succeed on the same
   architectures.
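
To gauge the rename risk described above, the packages that still
reference the old binary name can be listed. This is a minimal sketch
under assumptions: `list_rdepends` is a hypothetical helper, and it
assumes the `apt-cache rdepends` layout of a "Reverse Depends:" header
followed by indented package names (alternatives prefixed with `|`),
queried for a single package at a time.

```shell
#!/bin/sh
# Hypothetical helper: reads `apt-cache rdepends <pkg>` output on stdin and
# prints the unique reverse dependencies, one per line.
list_rdepends() {
    awk '
        # Once past the header, strip the leading indent / "|" marker.
        seen { sub(/^[ |]+/, ""); if ($0 != "") print }
        /^Reverse Depends:/ { seen = 1 }
    ' | sort -u
}
```

For example, `apt-cache rdepends rocm-device-libs-21 | list_rdepends`
would name each package that references the old binary and may
therefore need a rebuild against rocm-device-libs-22.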

Mitigations: the SRU is gated behind successful builds on all three
release architectures, the autopkgtests above, and verification of the
end-to-end hipcc -> device-libs -> comgr path on a real GPU. The
upstream 7.2.3 tag is the same source the rest of the ROCm 7.2.x SRU
train is built against, so any divergence between this package and its
consumers is bounded by the packaging delta, which is small and
reviewed (4 patches, see debian/patches/series).
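
A cheap pre-check for the resource-directory concern raised for
libamd-comgr3 is to confirm that the directory reported by the system
clang contains the builtin headers. The sketch below is an assumption,
not part of the SRU: `resource_dir_ok` is a hypothetical helper, it uses
`stddef.h` as a representative builtin header, and it checks only the
system clang layout, not the path comgr itself resolves at runtime.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if the given clang resource directory
# contains the builtin headers (stddef.h is used as a representative).
resource_dir_ok() {
    [ -n "$1" ] && [ -f "$1/include/stddef.h" ]
}
```

With clang-22 on PATH it could be used as
`resource_dir_ok "$(clang-22 -print-resource-dir)" || echo "builtin headers missing"`.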

** Affects: rocm-llvm (Ubuntu)
     Importance: Undecided
     Assignee: Talha Can Havadar (tchavadar)
         Status: New

** Affects: rocm-llvm (Ubuntu Resolute)
     Importance: Undecided
         Status: New

** Summary changed:

- [SRU] rocm-llvm 7.2.3+dfsg-0ubuntu1 for resolute
+ [SRU] rocm-llvm 7.2.3

** Also affects: rocm-llvm (Ubuntu Resolute)
   Importance: Undecided
       Status: New

** Changed in: rocm-llvm (Ubuntu)
     Assignee: (unassigned) => Talha Can Havadar (tchavadar)

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/2153424

Title:
  [SRU] rocm-llvm 7.2.3
