Public bug reported:

[Impact]

The rocr-runtime package provides the HSA Runtime API and ROCm core
runtime used by GPU compute workloads. The 7.1.1 version currently
shipped is missing several upstream fixes available in ROCm 7.2.3:

  1. Trap handler regression on gfx120x (RDNA 4) GPUs - the upstream
     fix restores exception-handling behavior that regressed in 7.1.x.
  2. Potential buffer overflow in rocr - defensive bounds fix.
  3. Uninitialized Prefetch range - the Prefetch range was not
     initialized in its constructor, leading to non-deterministic
     behavior.
  4. Uninitialized IPC import results - hsaKmtHandleImport could be
     invoked with a non-zeroed result struct, potentially returning
     stale data on partial failures.
  5. mwaitx enabled by default - improves host-side wait performance
     on AMD CPUs.
  6. libhsakmt: correct printf format specifiers in fmm.c for unsigned
     integer types, eliminating undefined behavior in log output.

Packaging changes in this upload:

  - d/watch updated to point at the new upstream rocm-systems mono-repo.
  - d/salsa-ci.yml and d/ci/ updated for Ubuntu.
  - d/p/0017-enable-building-rocrtst-as-part-of-rocr.patch refreshed for
    the new upstream CMakeLists.
  - d/not-installed: hsa-rocr/LICENSE.md added (now ships in the upstream
    tarball under hsa-rocr/).
  - d/control: drop redundant Rules-Requires-Root: no (lintian).

The public libhsa-runtime64 ABI is unchanged in this upload; the symbols
file is enforced with DPKG_GENSYMBOLS_CHECK_LEVEL=4.

[Test Plan]

  1. Build the package in a clean chroot for all supported
     architectures (amd64, arm64, ppc64el).
  2. Install from -proposed:
       apt install libhsa-runtime64-1 libhsa-runtime-dev
  3. Run autopkgtests:
       - libhsa-runtime64-tests (rocrtst suite) on amd64
       - run-tests binary
  4. Confirm dpkg-gensymbols reports no ABI regressions
     (DPKG_GENSYMBOLS_CHECK_LEVEL=4 enforced via d/rules).
  5. Smoke-test reverse dependencies that link against
     libhsa-runtime64-1 (e.g. hipcc, rocBLAS) to confirm they still
     load and run a trivial kernel.

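Step 4 of the test plan relies on dpkg-gensymbols comparing the symbols
exported by the freshly built library against the packaged symbols file.
As a rough illustration of what that comparison looks for, here is a
minimal Python sketch. It is a simplified model and not how
dpkg-gensymbols is implemented: it covers only the symbol-level checks
(disappeared symbols fail from check level 1, new symbols from level 2)
and ignores the library-level checks of levels 3 and 4. The helper names
parse_symbols and check_symbols, the sample entries, their version
strings, and hsa_new_call are all hypothetical.

```python
# Simplified model of the symbol comparison behind Test Plan step 4.
# NOT dpkg-gensymbols itself: only symbol-level checks are modelled.


def parse_symbols(text):
    """Collect symbol names from symbols-file style text.

    Symbol entries are the lines indented by one space, in the form
    ' name@VERSION_NODE minimal-version'. The library header and any
    '|' or '*' metadata lines are not indented, so they are skipped.
    """
    names = set()
    for line in text.splitlines():
        if not line.startswith(" "):
            continue
        fields = line.split()
        if fields:
            names.add(fields[0])
    return names


def check_symbols(reference, built, level):
    """Return failure messages for the given check level (0 to 4)."""
    failures = []
    removed = sorted(reference - built)
    added = sorted(built - reference)
    if level >= 1 and removed:
        failures.append("symbols disappeared: " + ", ".join(removed))
    if level >= 2 and added:
        failures.append("new symbols introduced: " + ", ".join(added))
    return failures


# Hypothetical sample data; version nodes and versions are placeholders.
REFERENCE = """libhsa-runtime64.so.1 libhsa-runtime64-1 #MINVER#
 hsa_init@ROCR_1 1.0
 hsa_shut_down@ROCR_1 1.0
"""

BUILT_SAME = REFERENCE

BUILT_CHANGED = """libhsa-runtime64.so.1 libhsa-runtime64-1 #MINVER#
 hsa_init@ROCR_1 1.0
 hsa_new_call@ROCR_1 1.0
"""

if __name__ == "__main__":
    for label, built in (("same", BUILT_SAME), ("changed", BUILT_CHANGED)):
        problems = check_symbols(
            parse_symbols(REFERENCE), parse_symbols(built), level=4
        )
        print(label, "->", problems if problems else "no symbol changes")
```

An empty result corresponds to the "ABI is unchanged" claim above; any
entry in the returned list is the kind of difference that would make the
build fail under the enforced check level.
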
[Where Problems Could Occur]

  - Trap handler change for gfx120x touches GPU exception handling.
    A subtle defect could surface as application crashes or hangs on
    RDNA 4 hardware. Mitigation: rocrtst + autopkgtest coverage.
    And https://canonical.github.io/rocm-qa/ application tests
  - mwaitx-by-default changes host-side waiting. On systems with
    quirky CPU mwaitx implementations this could regress latency or,
    worst case, hang a worker thread. Mitigation: behavior is gated
    by an existing runtime knob and can be disabled.
  - Patch 0017 refresh: upstream restructured rocrtst CMakeLists.
    A missed hunk could break the test package (libhsa-runtime64-tests)
    build or ship stale binaries. Mitigation: autopkgtest builds and
    exercises the tests.
  - Upstream moved to the rocm-systems mono-repo. Tarball layout
    shifted (hence the d/not-installed addition). Other path changes
    could leave runtime files unshipped. Mitigation: lintian and the
    .install files; rocrtst exercises the runtime paths.
  - Prefetch-init and zero-on-import fixes change control flow in
    hot paths. The most likely failure mode is a latent caller bug
    exposed by the now-correct initial state. Mitigation: rocrtst
    regression suite.
  - Buffer overflow fix introduces a bounds check that could in
    theory reject previously-tolerated inputs. Mitigation: same as
    above.

** Affects: rocr-runtime (Ubuntu)
     Importance: Undecided
     Assignee: Talha Can Havadar (tchavadar)
         Status: New

** Affects: rocr-runtime (Ubuntu Resolute)
     Importance: Undecided
         Status: New

** Description changed:

  [Impact]

  The rocr-runtime package provides the HSA Runtime API and ROCm core
  runtime used by GPU compute workloads. The 7.1.1 version currently
  shipped is missing several upstream fixes available in ROCm 7.2.3:

    1. Trap handler regression on gfx120x (RDNA 4) GPUs - the upstream
-      fix restores exception-handling behavior that regressed in 7.1.x.
+      fix restores exception-handling behavior that regressed in 7.1.x.
    2. Potential buffer overflow in rocr - defensive bounds fix.
    3. Uninitialized Prefetch range - the Prefetch range was not
-      initialized in its constructor, leading to non-deterministic
-      behavior.
+      initialized in its constructor, leading to non-deterministic
+      behavior.
    4. Uninitialized IPC import results - hsaKmtHandleImport could be
-      invoked with a non-zeroed result struct, potentially returning
-      stale data on partial failures.
+      invoked with a non-zeroed result struct, potentially returning
+      stale data on partial failures.
    5. mwaitx enabled by default - improves host-side wait performance
-      on AMD CPUs.
+      on AMD CPUs.
    6. libhsakmt: correct printf format specifiers in fmm.c for unsigned
-      integer types, eliminating undefined behavior in log output.
+      integer types, eliminating undefined behavior in log output.

  Packaging changes in this upload:

    - d/watch updated to point at the new upstream rocm-systems mono-repo.
    - d/salsa-ci.yml and d/ci/ updated for Ubuntu.
    - d/p/0017-enable-building-rocrtst-as-part-of-rocr.patch refreshed for
-     the new upstream CMakeLists.
+     the new upstream CMakeLists.
    - d/not-installed: hsa-rocr/LICENSE.md added (now ships in the upstream
-     tarball under hsa-rocr/).
+     tarball under hsa-rocr/).
    - d/control: drop redundant Rules-Requires-Root: no (lintian).

  The public libhsa-runtime64 ABI is unchanged in this upload; the symbols
  file is enforced with DPKG_GENSYMBOLS_CHECK_LEVEL=4.

  [Test Plan]

    1. Build the package in a clean chroot for all supported
-      architectures (amd64, arm64, ppc64el).
+      architectures (amd64, arm64, ppc64el).
    2. Install from -proposed:
-        apt install libhsa-runtime64-1 libhsa-runtime-dev
+        apt install libhsa-runtime64-1 libhsa-runtime-dev
    3. Run autopkgtests:
-        - libhsa-runtime64-tests (rocrtst suite) on amd64
-        - run-tests binary
+        - libhsa-runtime64-tests (rocrtst suite) on amd64
+        - run-tests binary
    4. Confirm dpkg-gensymbols reports no ABI regressions
-      (DPKG_GENSYMBOLS_CHECK_LEVEL=4 enforced via d/rules).
+      (DPKG_GENSYMBOLS_CHECK_LEVEL=4 enforced via d/rules).
    5. Smoke-test reverse dependencies that link against
-      libhsa-runtime64-1 (e.g. hipcc, rocBLAS) to confirm they still
-      load and run a trivial kernel.
-   6. On gfx120x (RDNA 4) hardware, exercise a workload that previously
-      hit the trap-handler regression and confirm it is resolved.
+      libhsa-runtime64-1 (e.g. hipcc, rocBLAS) to confirm they still
+      load and run a trivial kernel.

  [Where Problems Could Occur]

    - Trap handler change for gfx120x touches GPU exception handling.
-     A subtle defect could surface as application crashes or hangs on
-     RDNA 4 hardware. Mitigation: rocrtst + autopkgtest coverage.
+     A subtle defect could surface as application crashes or hangs on
+     RDNA 4 hardware. Mitigation: rocrtst + autopkgtest coverage.
      And https://canonical.github.io/rocm-qa/ application tests
    - mwaitx-by-default changes host-side waiting. On systems with
-     quirky CPU mwaitx implementations this could regress latency or,
-     worst case, hang a worker thread. Mitigation: behavior is gated
-     by an existing runtime knob and can be disabled.
+     quirky CPU mwaitx implementations this could regress latency or,
+     worst case, hang a worker thread. Mitigation: behavior is gated
+     by an existing runtime knob and can be disabled.
    - Patch 0017 refresh: upstream restructured rocrtst CMakeLists.
-     A missed hunk could break the test package (libhsa-runtime64-tests)
-     build or ship stale binaries. Mitigation: autopkgtest builds and
-     exercises the tests.
+     A missed hunk could break the test package (libhsa-runtime64-tests)
+     build or ship stale binaries. Mitigation: autopkgtest builds and
+     exercises the tests.
    - Upstream moved to the rocm-systems mono-repo. Tarball layout
-     shifted (hence the d/not-installed addition). Other path changes
-     could leave runtime files unshipped. Mitigation: lintian and the
-     .install files; rocrtst exercises the runtime paths.
+     shifted (hence the d/not-installed addition). Other path changes
+     could leave runtime files unshipped. Mitigation: lintian and the
+     .install files; rocrtst exercises the runtime paths.
    - Prefetch-init and zero-on-import fixes change control flow in
-     hot paths. The most likely failure mode is a latent caller bug
-     exposed by the now-correct initial state. Mitigation: rocrtst
-     regression suite.
+     hot paths. The most likely failure mode is a latent caller bug
+     exposed by the now-correct initial state. Mitigation: rocrtst
+     regression suite.
    - Buffer overflow fix introduces a bounds check that could in
-     theory reject previously-tolerated inputs. Mitigation: same as
-     above.
-
- [Other Info]
-
-   - Upload target: stonking (next Ubuntu release). Follows the same
-     workflow as LP: #2150430 (7.1.1 upload).
-   - Upstream tag: rocm-7.2.3 (https://github.com/ROCm/ROCR-Runtime/).
-   - Co-author credit on patch refreshes: Mario Limonciello (AMD).
-   - Build-Depends on rocm-device-libs-21 (>= 7.1.0~), already in the
-     archive.
-   - Full list of packaging changes is in d/changelog for
-     7.2.3+dfsg-0ubuntu1.
+     theory reject previously-tolerated inputs. Mitigation: same as
+     above.

** Changed in: rocr-runtime (Ubuntu)
     Assignee: (unassigned) => Talha Can Havadar (tchavadar)

** Also affects: rocr-runtime (Ubuntu Resolute)
   Importance: Undecided
       Status: New

--
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/2153419

Title:
  SRU: New Upstream Version 7.2.3

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/rocr-runtime/+bug/2153419/+subscriptions

--
ubuntu-bugs mailing list
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
