[Bug 2131041] Re: [SRU] Incorrect Computation Result on Noble When Multiplying Complex-Valued NumPy Matrices (via OpenBLAS) on GH200 and GB200 machines (Neoverse V2 CPU)

Bryan Fraschetti Tue, 25 Nov 2025 13:30:48 -0800

** Description changed:

  [Impact]
  
- - When multiplying complex-valued matrices in Numpy using OpenBLAS as
- the optimization / computation engine, if OpenBLAS was compiled with
- DYNAMIC_ARCH=1 (as is done in the Noble deb) the real-valued component
- is not calculated correctly on Nvidia GH200 and GB200 machines.
+ - When multiplying complex-valued matrices in NumPy with OpenBLAS as the
+ computation backend, and OpenBLAS was compiled with DYNAMIC_ARCH=1 (as
+ is done in the Noble deb), the real-valued component of the matrix
+ product is not calculated correctly on machines with Neoverse V2 CPUs
+ (e.g. Nvidia GH200 and GB200).
  
  - Anyone using NumPy / OpenBLAS to multiply matrices on Noble on GH200
  and GB200 machines could hit this bug. AI / ML workloads in particular
  may be affected, as the bug compromises the numerical accuracy of
  their results.
  
  [RCA]
  
  - The root cause is that OpenBLAS is compiled with DYNAMIC_ARCH=1, which
  detects the CPU at runtime and selects the matching kernel path
  (including the SVE kernels). The GB200 and GH200 use Neoverse V2 CPUs
  (ARMv8), which the OpenBLAS version in the existing deb does not
  recognize, so runtime detection selects the wrong kernel path. This
  was fixed upstream in [1].
  
  - The correct hardware detection was added in 0.3.27, while Noble is on
  0.3.26. All currently supported releases newer than Noble ship
  versions newer than 0.3.27, so nothing needs to be done for
  Plucky, Questing, or Resolute.
  
  - The issue can be worked around by setting the environment variable
  OPENBLAS_CORETYPE=ARMV8 before computation, but this disables the SVE
  optimizations, which reduces overall performance and prevents users
  from leveraging all of their hardware's features.
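A minimal sketch of applying this workaround from inside Python rather than from the shell. It assumes OpenBLAS reads OPENBLAS_CORETYPE once, when the library is loaded, so the variable has to be set before NumPy is first imported. The helper name force_generic_armv8_kernels is hypothetical.

```python
import os
import sys


def force_generic_armv8_kernels() -> bool:
    """Request the generic ARMV8 kernels; return True if the setting can still take effect."""
    # Assumption: OpenBLAS reads OPENBLAS_CORETYPE when the library is
    # loaded, so setting it after NumPy is imported would be too late.
    if "numpy" in sys.modules:
        return False
    os.environ["OPENBLAS_CORETYPE"] = "ARMV8"
    return True


applied = force_generic_armv8_kernels()
print(f"OPENBLAS_CORETYPE workaround applied: {applied}")

# Import NumPy only after this point, e.g.:
#   import numpy as np
```

Setting the variable in the shell (or in the service unit) before starting Python avoids the import-order concern altogether.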
  
  [Test Plan]
  
  To reproduce, run the following commands on Noble in a Python 3.12
  environment with NumPy 1.26.4 installed (these are the default
  versions on Noble).
  
  import numpy as np
  a = np.array([2 + 3j, 3], dtype=np.complex64)
  b = np.array([5, 6], dtype=np.complex64)
  result = np.dot(a, b)
  print(f"np.dot(a, b) = {result}")
  
  This produces the output:
  
  np.dot(a, b) = (73+15j)
  
  which is incorrect. The correct result is np.dot(a, b) = (2+3j)*5 +
  3*6 = (28+15j).
  
  With the patched OpenBLAS package installed, the correct result must be
  produced to pass verification.
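The reproducer could be turned into a pass/fail script along these lines. This is a sketch, not part of the official test plan: reference_dot is a hypothetical helper that computes the expected value in plain Python without going through BLAS, and the tolerance is an arbitrary choice.

```python
def reference_dot(a, b):
    """Plain-Python dot product (no conjugation), independent of BLAS."""
    return sum(x * y for x, y in zip(a, b))


a_vals = [2 + 3j, 3]
b_vals = [5, 6]
expected = reference_dot(a_vals, b_vals)  # (2+3j)*5 + 3*6

try:
    import numpy as np
except ImportError:
    np = None  # NumPy missing: only the reference value is available

if np is not None:
    a = np.array(a_vals, dtype=np.complex64)
    b = np.array(b_vals, dtype=np.complex64)
    result = complex(np.dot(a, b))
    # Both operands are small integers, so a tight tolerance is enough.
    status = "PASS" if abs(result - expected) < 1e-5 else "FAIL"
    print(f"np.dot(a, b) = {result}, expected {expected}: {status}")
```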
  
  [What can go wrong]
  
  - The dynamic arch detection for GH200 / GB200, which use Neoverse V2,
  may still fail in some cases. The most likely outcome is then that
  the fallback arch is chosen and this bug is still hit.
  
  - Since Neoverse V2 is mapped to the existing Neoverse V1 kernels,
  performance on the Neoverse V2 hardware may not be completely optimal,
  but at least correctness will be guaranteed and the performance will be
  better than disabling SVE altogether.
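To see which kernel set was actually selected at runtime, something like the following may help. It is a sketch that assumes the third-party threadpoolctl package is installed and that its threadpool_info() entries carry "internal_api", "version" and "architecture" keys for OpenBLAS (true for recent releases, as far as I know). openblas_cores is a hypothetical helper, and the exact core name reported on a fixed build is not asserted here.

```python
def openblas_cores(pool_info):
    """Return (version, architecture) for every OpenBLAS entry in pool_info."""
    return [
        (entry.get("version"), entry.get("architecture"))
        for entry in pool_info
        if entry.get("internal_api") == "openblas"
    ]


try:
    import numpy  # noqa: F401  (importing NumPy loads OpenBLAS into the process)
    from threadpoolctl import threadpool_info

    detected = openblas_cores(threadpool_info())
except ImportError:
    detected = []  # NumPy or threadpoolctl is not available

for version, core in detected:
    print(f"OpenBLAS {version}: selected core {core}")
```

If the reported core is a generic fallback rather than a Neoverse one, the runtime detection most likely did not take effect.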
  
  [Extra Info]
  
  - Customer has confirmed that this patch produces the correct
  computation in their testing environment
  
  - PPA to demonstrate build success on amd64 and arm64: [2]
  
  [1] https://github.com/OpenMathLib/OpenBLAS/commit/aaf65210ccba0c53408c242a2e0f5ad5d798d532
  [2] https://launchpad.net/~bryanfraschetti/+archive/ubuntu/lws-openblas/+packages


-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/2131041

Title:
  [SRU] Incorrect Computation Result on Noble When Multiplying Complex-
  Valued NumPy Matrices (via OpenBLAS) on GH200 and GB200 machines
  (Neoverse V2 CPU)

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/openblas/+bug/2131041/+subscriptions


-- 
ubuntu-bugs mailing list
[email protected]
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs
