Public bug reported:

[Impact]

The rocr-runtime package provides the HSA Runtime API and ROCm core
runtime used by GPU compute workloads. The 7.1.1 version currently
shipped is missing several upstream fixes available in ROCm 7.2.3:

1. Trap handler regression on gfx120x (RDNA 4) GPUs - the upstream
   fix restores exception-handling behavior that regressed in 7.1.x.
2. Potential buffer overflow in rocr - defensive bounds fix.
3. Uninitialized Prefetch range - the Prefetch range was not
   initialized in its constructor, leading to non-deterministic
   behavior.
4. Uninitialized IPC import results - hsaKmtHandleImport could be
   invoked with a non-zeroed result struct, potentially returning
   stale data on partial failures.
5. mwaitx enabled by default - improves host-side wait performance
   on AMD CPUs.
6. libhsakmt: correct printf format specifiers in fmm.c for unsigned
   integer types, eliminating undefined behavior in log output.
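
For illustration only, items 3 and 4 can be sketched as below. This is
not the upstream rocr or libhsakmt code: PrefetchRange, ImportResult,
import_handle and import_zeroed are hypothetical names, and the field
layout is an assumption. The sketch only shows the general pattern of
giving members a defined initial state and zeroing a result struct
before a call that may fail part-way.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a prefetch range; not the actual rocr type.
struct PrefetchRange {
    // Default member initializers give every instance a defined state,
    // even if a constructor does not assign the fields explicitly.
    std::uint64_t base = 0;
    std::uint64_t size = 0;

    PrefetchRange() = default;
    PrefetchRange(std::uint64_t b, std::uint64_t s) : base(b), size(s) {}

    bool empty() const { return size == 0; }
};

// Hypothetical result struct filled in by an import-style call.
struct ImportResult {
    std::uint64_t handle;
    std::uint64_t size_in_bytes;
};

// Hypothetical import routine. When fail_midway is true it writes
// `handle` but returns before writing `size_in_bytes`, modelling a
// partial failure.
inline bool import_handle(bool fail_midway, ImportResult* out) {
    assert(out != nullptr);
    out->handle = 42;
    if (fail_midway) return false;
    out->size_in_bytes = 4096;
    return true;
}

// Caller-side pattern: zero the result before the call so that fields
// the callee never reached read as 0 rather than stale stack contents.
inline ImportResult import_zeroed(bool fail_midway, bool* ok) {
    ImportResult res{};  // value-initialization zeroes every field
    *ok = import_handle(fail_midway, &res);
    return res;
}
```

With this pattern a default-constructed PrefetchRange is always empty,
and a caller that inspects the result after a failed import sees zeroed
fields instead of whatever happened to be on the stack.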

Packaging changes in this upload:
- d/watch updated to point at the new upstream rocm-systems mono-repo.
- d/salsa-ci.yml and d/ci/ updated for Ubuntu.
- d/p/0017-enable-building-rocrtst-as-part-of-rocr.patch refreshed for
  the new upstream CMakeLists.
- d/not-installed: hsa-rocr/LICENSE.md added (now ships in the upstream
  tarball under hsa-rocr/).
- d/control: drop redundant Rules-Requires-Root: no (lintian).

The public libhsa-runtime64 ABI is unchanged in this upload; the
symbols file is enforced with DPKG_GENSYMBOLS_CHECK_LEVEL=4.

[Test Plan]

1. Build the package in a clean chroot for all supported
   architectures (amd64, arm64, ppc64el).
2. Install from -proposed:
     apt install libhsa-runtime64-1 libhsa-runtime-dev
3. Run autopkgtests:
   - libhsa-runtime64-tests (rocrtst suite) on amd64
   - run-tests binary
4. Confirm dpkg-gensymbols reports no ABI regressions
   (DPKG_GENSYMBOLS_CHECK_LEVEL=4 enforced via d/rules).
5. Smoke-test reverse dependencies that link against
   libhsa-runtime64-1 (e.g. hipcc, rocBLAS) to confirm they still
   load and run a trivial kernel.

[Where Problems Could Occur]

- Trap handler change for gfx120x touches GPU exception handling.
  A subtle defect could surface as application crashes or hangs on
  RDNA 4 hardware. Mitigation: rocrtst + autopkgtest coverage, plus the
  application tests at https://canonical.github.io/rocm-qa/.
- mwaitx-by-default changes host-side waiting. On systems with
  quirky CPU mwaitx implementations this could regress latency or,
  worst case, hang a worker thread. Mitigation: behavior is gated
  by an existing runtime knob and can be disabled.
- Patch 0017 refresh: upstream restructured rocrtst CMakeLists.
  A missed hunk could break the test package (libhsa-runtime64-tests)
  build or ship stale binaries. Mitigation: autopkgtest builds and
  exercises the tests.
- Upstream moved to the rocm-systems mono-repo. Tarball layout
  shifted (hence the d/not-installed addition). Other path changes
  could leave runtime files unshipped. Mitigation: lintian and the
  .install files; rocrtst exercises the runtime paths.
- Prefetch-init and zero-on-import fixes change control flow in
  hot paths. The most likely failure mode is a latent caller bug
  exposed by the now-correct initial state. Mitigation: rocrtst
  regression suite.
- Buffer overflow fix introduces a bounds check that could in
  theory reject previously-tolerated inputs. Mitigation: rocrtst
  regression suite, as for the previous item.
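
As a rough sketch of the last risk, a defensive bounds check of this
kind turns a silent overflow into an explicit rejection. The code below
is not the upstream fix: copy_name and kNameCapacity are hypothetical,
and the 16-byte capacity is an assumption chosen for the example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical fixed-size destination; the capacity is an assumption.
constexpr std::size_t kNameCapacity = 16;

// Copies `len` bytes of `src` into `dst` and NUL-terminates it.
// Returns false, leaving dst untouched, when the input plus its
// terminator would not fit. Without the check, an oversized input
// would write past the end of dst.
inline bool copy_name(char (&dst)[kNameCapacity], const char* src,
                      std::size_t len) {
    if (src == nullptr || len >= kNameCapacity) return false;
    std::memcpy(dst, src, len);
    dst[len] = '\0';
    // Postcondition: the string ends inside the buffer.
    assert(std::strlen(dst) <= len);
    return true;
}
```

An input of exactly kNameCapacity bytes is the interesting case: it is
rejected after the check is added, so any caller that relied on it
being accepted would now see a failure return.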

** Affects: rocr-runtime (Ubuntu)
     Importance: Undecided
     Assignee: Talha Can Havadar (tchavadar)
         Status: New

** Affects: rocr-runtime (Ubuntu Resolute)
     Importance: Undecided
         Status: New

** Description changed:

  [Impact]
  
  The rocr-runtime package provides the HSA Runtime API and ROCm core
  runtime used by GPU compute workloads. The 7.1.1 version currently
  shipped is missing several upstream fixes available in ROCm 7.2.3:
  
  1. Trap handler regression on gfx120x (RDNA 4) GPUs - the upstream
-    fix restores exception-handling behavior that regressed in 7.1.x.
+    fix restores exception-handling behavior that regressed in 7.1.x.
  2. Potential buffer overflow in rocr - defensive bounds fix.
  3. Uninitialized Prefetch range - the Prefetch range was not
-    initialized in its constructor, leading to non-deterministic
-    behavior.
+    initialized in its constructor, leading to non-deterministic
+    behavior.
  4. Uninitialized IPC import results - hsaKmtHandleImport could be
-    invoked with a non-zeroed result struct, potentially returning
-    stale data on partial failures.
+    invoked with a non-zeroed result struct, potentially returning
+    stale data on partial failures.
  5. mwaitx enabled by default - improves host-side wait performance
-    on AMD CPUs.
+    on AMD CPUs.
  6. libhsakmt: correct printf format specifiers in fmm.c for unsigned
-    integer types, eliminating undefined behavior in log output.
+    integer types, eliminating undefined behavior in log output.
  
  Packaging changes in this upload:
  - d/watch updated to point at the new upstream rocm-systems mono-repo.
  - d/salsa-ci.yml and d/ci/ updated for Ubuntu.
  - d/p/0017-enable-building-rocrtst-as-part-of-rocr.patch refreshed for
-   the new upstream CMakeLists.
+   the new upstream CMakeLists.
  - d/not-installed: hsa-rocr/LICENSE.md added (now ships in the upstream
-   tarball under hsa-rocr/).
+   tarball under hsa-rocr/).
  - d/control: drop redundant Rules-Requires-Root: no (lintian).
  
  The public libhsa-runtime64 ABI is unchanged in this upload; the
  symbols file is enforced with DPKG_GENSYMBOLS_CHECK_LEVEL=4.
  
  [Test Plan]
  
  1. Build the package in a clean chroot for all supported
-    architectures (amd64, arm64, ppc64el).
+    architectures (amd64, arm64, ppc64el).
  2. Install from -proposed:
-      apt install libhsa-runtime64-1 libhsa-runtime-dev
+      apt install libhsa-runtime64-1 libhsa-runtime-dev
  3. Run autopkgtests:
-    - libhsa-runtime64-tests (rocrtst suite) on amd64
-    - run-tests binary
+    - libhsa-runtime64-tests (rocrtst suite) on amd64
+    - run-tests binary
  4. Confirm dpkg-gensymbols reports no ABI regressions
-    (DPKG_GENSYMBOLS_CHECK_LEVEL=4 enforced via d/rules).
+    (DPKG_GENSYMBOLS_CHECK_LEVEL=4 enforced via d/rules).
  5. Smoke-test reverse dependencies that link against
-    libhsa-runtime64-1 (e.g. hipcc, rocBLAS) to confirm they still
-    load and run a trivial kernel.
- 6. On gfx120x (RDNA 4) hardware, exercise a workload that previously
-    hit the trap-handler regression and confirm it is resolved.
+    libhsa-runtime64-1 (e.g. hipcc, rocBLAS) to confirm they still
+    load and run a trivial kernel.
  
  [Where Problems Could Occur]
  
  - Trap handler change for gfx120x touches GPU exception handling.
-   A subtle defect could surface as application crashes or hangs on
-   RDNA 4 hardware. Mitigation: rocrtst + autopkgtest coverage.
+   A subtle defect could surface as application crashes or hangs on
+   RDNA 4 hardware. Mitigation: rocrtst + autopkgtest coverage. And
+   https://canonical.github.io/rocm-qa/ application tests
  - mwaitx-by-default changes host-side waiting. On systems with
-   quirky CPU mwaitx implementations this could regress latency or,
-   worst case, hang a worker thread. Mitigation: behavior is gated
-   by an existing runtime knob and can be disabled.
+   quirky CPU mwaitx implementations this could regress latency or,
+   worst case, hang a worker thread. Mitigation: behavior is gated
+   by an existing runtime knob and can be disabled.
  - Patch 0017 refresh: upstream restructured rocrtst CMakeLists.
-   A missed hunk could break the test package (libhsa-runtime64-tests)
-   build or ship stale binaries. Mitigation: autopkgtest builds and
-   exercises the tests.
+   A missed hunk could break the test package (libhsa-runtime64-tests)
+   build or ship stale binaries. Mitigation: autopkgtest builds and
+   exercises the tests.
  - Upstream moved to the rocm-systems mono-repo. Tarball layout
-   shifted (hence the d/not-installed addition). Other path changes
-   could leave runtime files unshipped. Mitigation: lintian and the
-   .install files; rocrtst exercises the runtime paths.
+   shifted (hence the d/not-installed addition). Other path changes
+   could leave runtime files unshipped. Mitigation: lintian and the
+   .install files; rocrtst exercises the runtime paths.
  - Prefetch-init and zero-on-import fixes change control flow in
-   hot paths. The most likely failure mode is a latent caller bug
-   exposed by the now-correct initial state. Mitigation: rocrtst
-   regression suite.
+   hot paths. The most likely failure mode is a latent caller bug
+   exposed by the now-correct initial state. Mitigation: rocrtst
+   regression suite.
  - Buffer overflow fix introduces a bounds check that could in
-   theory reject previously-tolerated inputs. Mitigation: same as
-   above.
- 
- [Other Info]
- 
- - Upload target: stonking (next Ubuntu release). Follows the same
-   workflow as LP: #2150430 (7.1.1 upload).
- - Upstream tag: rocm-7.2.3 (https://github.com/ROCm/ROCR-Runtime/).
- - Co-author credit on patch refreshes: Mario Limonciello (AMD).
- - Build-Depends on rocm-device-libs-21 (>= 7.1.0~), already in the
-   archive.
- - Full list of packaging changes is in d/changelog for
-   7.2.3+dfsg-0ubuntu1.
+   theory reject previously-tolerated inputs. Mitigation: same as
+   above.

** Changed in: rocr-runtime (Ubuntu)
     Assignee: (unassigned) => Talha Can Havadar (tchavadar)

** Also affects: rocr-runtime (Ubuntu Resolute)
   Importance: Undecided
       Status: New

https://bugs.launchpad.net/bugs/2153419

Title:
  SRU: New Upstream Version 7.2.3
