Public bug reported:

[Impact]
rocblas is the AMD ROCm Basic Linear Algebra Subprograms library, providing
GPU-accelerated BLAS routines. This update from 7.1.0 to 7.2.4 is part of the
coordinated ROCm stack update in Ubuntu Resolute.

Key improvements:
- Level 2 optimizations for tpmv and sbmv functions (performance)
- New rocblas_syrk_ex API enabling mixed-precision symmetric rank-k updates
  (bf16/f16 input with f32 accumulation, f32 input with f64 accumulation).
  This is required by downstream consumers (MIOpen, hipBLAS) that use
  mixed-precision GEMM-like operations for AI/ML training workloads on AMD GPUs.
- Memory allocation behavior change: the default allocation strategy is now
  standard hipMalloc instead of stream-order allocation (hipMallocAsync). The
  previous default could cause issues with certain HIP runtime versions and
  multi-stream workloads. Users who relied on stream-order allocation can
  restore it by setting ROCBLAS_STREAM_ORDER_ALLOC=1.
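
For reviewers unfamiliar with the mixed-precision modes, the f32-input /
f64-accumulation case can be pictured with a small CPU reference in plain
Python. This is only a sketch of the arithmetic C = alpha*A*A^T + beta*C with
wider accumulation. It is not the rocblas_syrk_ex API, whose signature is not
given in this report, and the helper names to_f32 and syrk_f32_in_f64_acc are
hypothetical.

```python
import struct


def to_f32(x):
    """Round a Python float (f64) to the nearest f32 value."""
    return struct.unpack("f", struct.pack("f", x))[0]


def syrk_f32_in_f64_acc(alpha, a, beta, c):
    """Reference C = alpha * A * A^T + beta * C on the lower triangle.

    A is n x k and C is n x n (lists of lists). Entries of A are rounded
    to f32 to model f32 input; the dot products are accumulated in f64,
    which is Python's native float. The upper triangle of C is copied
    through unchanged, and output rounding is not modelled.
    """
    n, k = len(a), len(a[0])
    a32 = [[to_f32(v) for v in row] for row in a]
    out = [row[:] for row in c]
    for i in range(n):
        for j in range(i + 1):
            acc = 0.0  # f64 accumulator
            for p in range(k):
                acc += a32[i][p] * a32[j][p]
            out[i][j] = alpha * acc + beta * c[i][j]
    return out
```

Accumulating in the wider type limits the rounding error that builds up over
long dot products, which is the motivation for the mixed-precision variants.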

ABI analysis (abipkgdiff): 0 removed functions, 0 changed functions, 8 added
symbols (syrk_ex templates + device_allocator). The only removed symbols are
453 __hip_cuid_* variables, which are auto-generated compile-unit identifiers
from the LLVM/HIP toolchain and are not part of any public or stable ABI.
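
The claim that only __hip_cuid_* symbols were removed can be re-checked
mechanically from a list of removed symbol names. The following is a minimal
sketch, assuming the names have already been extracted from the abipkgdiff
report into a Python list; the function name and the sample symbols are
hypothetical.

```python
HIP_CUID_PREFIX = "__hip_cuid_"


def split_removed_symbols(removed):
    """Separate toolchain-generated __hip_cuid_* symbols from all others."""
    generated = [s for s in removed if s.startswith(HIP_CUID_PREFIX)]
    other = [s for s in removed if not s.startswith(HIP_CUID_PREFIX)]
    return generated, other


# Hypothetical sample input; real names come from the abipkgdiff report.
removed = ["__hip_cuid_1a2b3c", "__hip_cuid_4d5e6f"]
generated, other = split_removed_symbols(removed)
print(len(generated), "generated symbols removed;", len(other), "other removals")
```

Any entry landing in the second list would be a removal that needs a closer
look before the update is accepted.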

Reverse dependencies of librocblas5 in resolute:
- ROCm stack internal: librocsolver0, libmiopen1, libhipsolver1, libhipblas3,
  librocblas5-bench, librocblas5-tests, librocwmma-tests-validate,
  librocsolver0-bench, librocsolver0-tests, libmiopen1-tests
- External consumers: libtorch-rocm-2.9, libggml0-backend-hip

All reverse dependencies are either part of the same coordinated ROCm update
or are PyTorch/GGML backends that link against the stable C API (which has no
removals). The new allocation default, standard hipMalloc, is the more
conservative and compatible path; no external consumer would have depended on
the stream-order behavior, since it was internal to the handle.

[Test Plan]
1. Build rocblas 7.2.4 in the resolute PPA and verify it produces librocblas5.
2. Run the rocblas-test suite (librocblas5-tests) on a system with a supported
   AMD GPU to verify GEMM, TRSM, SYRK correctness.
3. Verify all reverse dependencies (rocsolver, miopen, hipsolver, hipblas)
   build successfully against the updated librocblas5.
4. Confirm ABI compatibility via abipkgdiff (already done: no function
   removals or changes).
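
Step 2 can be narrowed to the routines named above with a GoogleTest filter.
The sketch below only builds the command line. It assumes the test binary is
invoked as rocblas-test and that the relevant test names contain the routine
name; both should be checked against the installed librocblas5-tests package.
The helper name is hypothetical.

```python
import shlex


def rocblas_test_command(routines, binary="rocblas-test"):
    """Build a GoogleTest invocation restricted to the given routine names."""
    # --gtest_filter takes ':'-separated wildcard patterns.
    pattern = ":".join(f"*{name}*" for name in routines)
    return [binary, f"--gtest_filter={pattern}"]


cmd = rocblas_test_command(["gemm", "trsm", "syrk"])
print(shlex.join(cmd))  # shell-quoted form, ready to paste into a terminal
```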

[Where problems could occur]
- Memory allocation default change: applications that create and destroy
  rocblas handles at high frequency may see different allocation timing
  characteristics. Mitigation: set ROCBLAS_STREAM_ORDER_ALLOC=1 to restore
  the previous behavior. Risk is low since the new default (hipMalloc) is the
  more traditional/conservative approach.
- Tensile codegen updated (4.44.0 to 4.45.0): kernel selection YAML tuning
  files changed for gfx942, gfx1103, and strixhalo targets. Different kernels
  could be selected for certain problem sizes, though the correctness tests
  cover this.
- New syrk_ex kernels: new code path, but additive only and gated behind
  explicit API calls, so it cannot affect existing workloads.
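
The ROCBLAS_STREAM_ORDER_ALLOC mitigation can be applied per process rather
than system-wide. A minimal sketch, assuming the variable is read by rocblas
in the child process; the helper names are hypothetical.

```python
import os
import subprocess


def env_with_stream_order_alloc(base=None):
    """Return a copy of the environment with stream-order allocation enabled."""
    env = dict(os.environ if base is None else base)
    env["ROCBLAS_STREAM_ORDER_ALLOC"] = "1"
    return env


def run_with_stream_order_alloc(argv):
    """Run argv with the previous allocation default restored for that process only."""
    return subprocess.run(argv, env=env_with_stream_order_alloc(), check=True)
```

For example, run_with_stream_order_alloc(["./my_rocblas_app"]) would start a
(hypothetical) application with the previous behavior, leaving every other
process on the new hipMalloc default.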

Full abigail report: https://pastebin.ubuntu.com/p/fcxCsZBPNn/

** Affects: rocblas (Ubuntu)
     Importance: Undecided
         Status: New

** Affects: rocblas (Ubuntu Resolute)
     Importance: Undecided
         Status: New

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/2155118

Title:
  [SRU] Update rocblas to 7.2.4 in resolute
