[Bug 2131041] Re: [SRU] Incorrect Computation Result on Noble When Multiplying Complex-Valued NumPy Matrices (via OpenBLAS) on GH200 and GB200 machines (Neoverse V2 CPU)

Bryan Fraschetti Mon, 24 Nov 2025 09:51:56 -0800

** Description changed:

  [Impact]
  
- - When multiplying complex-valued matrices in Numpy using OpenBLAS
- compiled with DYNAMIC_ARCH=1 (as is done in the Noble deb) as the
- optimization / computation engine on the Nvidia GH200 and GB200
- machines, the real-valued component is not calculated correctly.
+ - When multiplying complex-valued matrices in NumPy with OpenBLAS as
+ the computation backend, the real component of the result is calculated
+ incorrectly on Nvidia GH200 and GB200 machines if OpenBLAS was compiled
+ with DYNAMIC_ARCH=1 (as is done in the Noble deb).
  
  - Anyone using NumPy / OpenBLAS to multiply matrices on Noble on GH200
  and GB200 machines could hit this bug. AI / ML workloads in particular
  may be affected, and the bug compromises the numerical correctness of
  their results.
  
  [RCA]
  
  - The root cause is that OpenBLAS is compiled with DYNAMIC_ARCH=1, which
  detects the CPU at runtime and selects the kernel path, including the
  SVE kernels, accordingly. The GB200 and GH200 use Neoverse V2 CPUs
  (ARMv8). The existing deb has no dynamic support for the Neoverse V2, so
  detection fails on that CPU and the wrong instruction path is chosen.
  This was fixed upstream in [1].
  
  - The correct hardware detection was added in 0.3.27, while Noble is on
  0.3.26. All currently supported releases newer than Noble ship versions
  newer than 0.3.27, so nothing needs to be done for Plucky, Questing, or
  Resolute.
  
  - The issue can be worked around by setting the environment variable
  OPENBLAS_CORETYPE=ARMV8 before computation, but this disables the SVE
  optimizations, which reduces overall performance and prevents users from
  leveraging all of their hardware's features.
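
A minimal sketch of applying the workaround above to a child process, so the variable is already in the environment before OpenBLAS is loaded. The helper names `env_with_armv8_coretype` and `run_with_workaround` are hypothetical and not part of OpenBLAS or NumPy; only the variable name and value come from the description above.

```python
import os
import subprocess
import sys


def env_with_armv8_coretype(base=None):
    """Return a copy of an environment with OPENBLAS_CORETYPE=ARMV8 set.

    Hypothetical helper: `base` defaults to the current process environment.
    The input mapping is not modified.
    """
    env = dict(os.environ if base is None else base)
    env["OPENBLAS_CORETYPE"] = "ARMV8"
    return env


def run_with_workaround(script_args):
    """Run a Python script in a child process with the workaround applied.

    The variable is set before the child interpreter starts, so it is in
    place before the child imports NumPy and loads OpenBLAS.
    """
    return subprocess.run(
        [sys.executable, *script_args],
        env=env_with_armv8_coretype(),
        check=False,
    )
```

For example, `run_with_workaround(["my_job.py"])` (with `my_job.py` standing in for any script) runs that script with the generic ARMv8 kernels selected. As noted above, this trades SVE performance for correctness.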
  
  [Test Plan]
  
  To reproduce, run the following commands on Noble in a Python 3.12
  environment with NumPy 1.26.4 installed (these are the default versions
  on Noble).
  
  import numpy as np
  a = np.array([2 + 3j, 3], dtype=np.complex64)
  b = np.array([5, 6], dtype=np.complex64)
  result = np.dot(a, b)
  print(f"np.dot(a, b) = {result}")
  
  This produces the output:
  
  np.dot(a, b) = (73+15j)
  
  which is incorrect. The correct result is np.dot(a, b) = (28+15j),
  since (2+3j)*5 + 3*6 = 28+15j.
  
  With the patched OpenBLAS package installed, the correct result must be
  produced to pass verification.
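
The verification step above could be automated along these lines. This is a sketch, not part of the SRU tooling: `reference_dot` and `check_numpy_dot` are hypothetical helper names, and the reference value is computed with plain Python complex arithmetic so that it does not go through BLAS.

```python
def reference_dot(a, b):
    """Unconjugated dot product using Python complex arithmetic (no BLAS)."""
    return sum(x * y for x, y in zip(a, b))


def check_numpy_dot():
    """Compare np.dot on the reproducer inputs against the reference.

    Returns (got, expected, ok). NumPy is imported here so the reference
    helper above stays usable without it.
    """
    import numpy as np

    values_a = [2 + 3j, 3]
    values_b = [5, 6]
    a = np.array(values_a, dtype=np.complex64)
    b = np.array(values_b, dtype=np.complex64)

    got = complex(np.dot(a, b))
    expected = reference_dot(values_a, values_b)
    # The inputs and the result are small integers, exactly representable
    # in complex64, so exact equality is appropriate here.
    return got, expected, got == expected
```

Calling `check_numpy_dot()` on an affected machine is expected to return `ok == False` with the unpatched package and `ok == True` with the patched one.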
  
  [What can go wrong]
  
  - The dynamic arch detection for GH200 / GB200, which use the Neoverse
  V2, may not work perfectly. In such a case, the most likely scenario is
  that the fallback arch is chosen and this bug would be hit again.
  
  - Since Neoverse V2 is mapped to the existing Neoverse V1 kernels,
  performance on the Neoverse V2 hardware may not be completely optimal,
  but at least correctness will be guaranteed and the performance should
  be better than disabling SVE.
  
  [Extra Info]
  
  - Customer has confirmed that this patch produces the correct
  computation in their testing environment
  
  - PPA to demonstrate build success on amd and arm: [2]
  
  [1] https://github.com/OpenMathLib/OpenBLAS/commit/aaf65210ccba0c53408c242a2e0f5ad5d798d532
  [2] https://launchpad.net/~bryanfraschetti/+archive/ubuntu/lws-openblas/+packages


** Description changed:

  [Impact]
  
  - When multiplying complex-valued matrices in NumPy with OpenBLAS as
  the computation backend, the real component of the result is calculated
  incorrectly on Nvidia GH200 and GB200 machines if OpenBLAS was compiled
  with DYNAMIC_ARCH=1 (as is done in the Noble deb).
  
  - Anyone using NumPy / OpenBLAS to multiply matrices on Noble on GH200
  and GB200 machines could hit this bug. AI / ML workloads in particular
  may be affected, and the bug compromises the numerical correctness of
  their results.
  
  [RCA]
  
  - The root cause is that OpenBLAS is compiled with DYNAMIC_ARCH=1, which
  detects the CPU at runtime and selects the kernel path, including the
  SVE kernels, accordingly. The GB200 and GH200 use Neoverse V2 CPUs
  (ARMv8). The existing deb has no dynamic support for the Neoverse V2, so
  detection fails on that CPU and the wrong instruction path is chosen.
  This was fixed upstream in [1].
  
  - The correct hardware detection was added in 0.3.27, while Noble is on
  0.3.26. All currently supported releases newer than Noble ship versions
  newer than 0.3.27, so nothing needs to be done for Plucky, Questing, or
  Resolute.
  
  - The issue can be worked around by setting the environment variable
  OPENBLAS_CORETYPE=ARMV8 before computation, but this disables the SVE
  optimizations, which reduces overall performance and prevents users from
  leveraging all of their hardware's features.
  
  [Test Plan]
  
  To reproduce, run the following commands on Noble in a Python 3.12
  environment with NumPy 1.26.4 installed (these are the default versions
  on Noble).
  
  import numpy as np
  a = np.array([2 + 3j, 3], dtype=np.complex64)
  b = np.array([5, 6], dtype=np.complex64)
  result = np.dot(a, b)
  print(f"np.dot(a, b) = {result}")
  
  This produces the output:
  
  np.dot(a, b) = (73+15j)
  
  which is incorrect. The correct result is np.dot(a, b) = (28+15j),
  since (2+3j)*5 + 3*6 = 28+15j.
  
  With the patched OpenBLAS package installed, the correct result must be
  produced to pass verification.
  
  [What can go wrong]
  
- - The dynamic arch detection for GH200 / GB200, which use the Neoverse
- V2, may not work perfectly. In such a case, the most likely scenario is
- that the fallback arch is chosen and this bug would be hit
+ - The dynamic arch detection for GH200 / GB200, which use Neoverse V2,
+ may not work perfectly. In such a case, the most likely scenario is that
+ the fallback arch is chosen and this bug would be hit again.
  
  - Since Neoverse V2 is mapped to the existing Neoverse V1 kernels,
  performance on the Neoverse V2 hardware may not be completely optimal,
- but at least correctness will be guaranteed and the performance should
- be better than disabling SVE.
+ but at least correctness will be guaranteed and the performance will be
+ better than disabling SVE altogether.
  
  [Extra Info]
  
  - Customer has confirmed that this patch produces the correct
  computation in their testing environment
  
  - PPA to demonstrate build success on amd and arm: [2]
  
  [1] https://github.com/OpenMathLib/OpenBLAS/commit/aaf65210ccba0c53408c242a2e0f5ad5d798d532
  [2] https://launchpad.net/~bryanfraschetti/+archive/ubuntu/lws-openblas/+packages

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/2131041

Title:
  [SRU] Incorrect Computation Result on Noble When Multiplying Complex-
  Valued NumPy Matrices (via OpenBLAS) on GH200 and GB200 machines
  (Neoverse V2 CPU)

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/openblas/+bug/2131041/+subscriptions


