[Bug 2131041] Re: [SRU] Incorrect Computation Result on Noble When Multiplying Complex-Valued NumPy Matrices (via OpenBLAS) on GH200 and GB200 machines (Neoverse V2 CPU)

Bryan Fraschetti Mon, 10 Nov 2025 13:45:35 -0800

** Summary changed:

- [SRU] Incorrect Computation Result on Noble When Multiplying Complex-Valued 
NumPy Matrices (via OpenBLAS) on GH200 and GB200 machines
+ [SRU] Incorrect Computation Result on Noble When Multiplying Complex-Valued 
NumPy Matrices (via OpenBLAS) on GH200 and GB200 machines (Neoverse V2 CPU)


** Description changed:

  [Impact]
  
  - When multiplying complex-valued matrices in NumPy using OpenBLAS
  compiled with DYNAMIC_ARCH=1 (as is done in the Noble deb) as the
  optimization / computation engine on the Nvidia GH200 and GB200
  machines, the real-valued component is not calculated correctly.
  
  - Anyone using NumPy / OpenBLAS to multiply matrices on Noble on GH200
  and GB200 machines could hit this bug. Particularly AI / ML workloads
  may be affected and this bug can affect the computational accuracy of
  their results.
  
  [RCA]
  
  - The root cause is that OpenBLAS is compiled with DYNAMIC_ARCH=1, which
  auto-detects the CPU and determines the SVE kernel path at runtime. The
  GB200 and GH200 use Neoverse V2 CPUs and unfortunately, this dynamic
- detection doesn't work on the GH200 or GB200 and the wrong instruction
- path is chosen as the existing deb doesn't have dynamic support for the
- Neoverse V2. This was fixed upstream in [1]
+ detection doesn't work on that CPU and the wrong instruction path is
+ chosen as the existing deb doesn't have dynamic support for the Neoverse
+ V2. This was fixed upstream in [1]
  
  - The issue can be worked around by setting the environment variable
  OPENBLAS_CORETYPE=ARMV8 before computation, but this disables the SVE
  optimizations, which reduces overall performance and prevents users
  from leveraging all of their hardware's features.
  
  [Test Plan]
  
  To reproduce, run the following commands on Noble in a Python 3.12
  environment with NumPy 1.26.4 installed (these are the default
  versions on Noble).
  
  import numpy as np
  a = np.array([2 + 3j, 3], dtype=np.complex64)
  b = np.array([5, 6], dtype=np.complex64)
  result = np.dot(a, b)
  print(f"np.dot(a, b) = {result}")
  
  On an affected machine, this produces the output:
  
  np.dot(a, b) = (73+15j)
  
  which is incorrect. The correct result is np.dot(a, b) = (28+15j).
  
  With the patched OpenBLAS package installed, the correct result must be
  produced to pass verification.
  
  [What can go wrong]
  
  - The dynamic arch detection for GH200 / GB200, which use the Neoverse
  V2, may not work perfectly. In such a case, the most likely scenario is
  that the fallback arch is chosen and this bug would be hit
  
  - Since Neoverse V2 is mapped to the existing Neoverse V1 kernels,
  performance on the Neoverse V2 hardware may not be completely optimal,
  but at least correctness will be guaranteed and the performance should
  be better than disabling SVE.
  
  [Extra Info]
  
  - Customer has confirmed that this patch produces the correct
  computation in their testing environment
  
  - PPA to demonstrate build success on amd64 and arm64: [2]
  
  [1] https://github.com/OpenMathLib/OpenBLAS/commit/aaf65210ccba0c53408c242a2e0f5ad5d798d532
  [2] https://launchpad.net/~bryanfraschetti/+archive/ubuntu/lws-openblas/+packages

** Description changed:

  [Impact]
  
  - When multiplying complex-valued matrices in NumPy using OpenBLAS
  compiled with DYNAMIC_ARCH=1 (as is done in the Noble deb) as the
  optimization / computation engine on the Nvidia GH200 and GB200
  machines, the real-valued component is not calculated correctly.
  
  - Anyone using NumPy / OpenBLAS to multiply matrices on Noble on GH200
  and GB200 machines could hit this bug. Particularly AI / ML workloads
  may be affected and this bug can affect the computational accuracy of
  their results.
  
  [RCA]
  
  - The root cause is that OpenBLAS is compiled with DYNAMIC_ARCH=1, which
  auto-detects the CPU and determines the SVE kernel path at runtime. The
- GB200 and GH200 use Neoverse V2 CPUs and unfortunately, this dynamic
- detection doesn't work on that CPU and the wrong instruction path is
- chosen as the existing deb doesn't have dynamic support for the Neoverse
- V2. This was fixed upstream in [1]
+ GB200 and GH200 use Neoverse V2 CPUs (Arm CPUs) and unfortunately, this
+ dynamic detection doesn't work on that CPU and the wrong instruction
+ path is chosen as the existing deb doesn't have dynamic support for the
+ Neoverse V2. This was fixed upstream in [1]
  
  - The issue can be worked around by setting the environment variable
  OPENBLAS_CORETYPE=ARMV8 before computation, but this disables the SVE
  optimizations, which reduces overall performance and prevents users
  from leveraging all of their hardware's features.
  
  [Test Plan]
  
  To reproduce, run the following commands on Noble in a Python 3.12
  environment with NumPy 1.26.4 installed (these are the default
  versions on Noble).
  
  import numpy as np
  a = np.array([2 + 3j, 3], dtype=np.complex64)
  b = np.array([5, 6], dtype=np.complex64)
  result = np.dot(a, b)
  print(f"np.dot(a, b) = {result}")
  
  On an affected machine, this produces the output:
  
  np.dot(a, b) = (73+15j)
  
  which is incorrect. The correct result is np.dot(a, b) = (28+15j).
  
  With the patched OpenBLAS package installed, the correct result must be
  produced to pass verification.
  
  [What can go wrong]
  
  - The dynamic arch detection for GH200 / GB200, which use the Neoverse
  V2, may not work perfectly. In such a case, the most likely scenario is
  that the fallback arch is chosen and this bug would be hit
  
  - Since Neoverse V2 is mapped to the existing Neoverse V1 kernels,
  performance on the Neoverse V2 hardware may not be completely optimal,
  but at least correctness will be guaranteed and the performance should
  be better than disabling SVE.
  
  [Extra Info]
  
  - Customer has confirmed that this patch produces the correct
  computation in their testing environment
  
  - PPA to demonstrate build success on amd64 and arm64: [2]
  
  [1] https://github.com/OpenMathLib/OpenBLAS/commit/aaf65210ccba0c53408c242a2e0f5ad5d798d532
  [2] https://launchpad.net/~bryanfraschetti/+archive/ubuntu/lws-openblas/+packages

** Description changed:

  [Impact]
  
  - When multiplying complex-valued matrices in NumPy using OpenBLAS
  compiled with DYNAMIC_ARCH=1 (as is done in the Noble deb) as the
  optimization / computation engine on the Nvidia GH200 and GB200
  machines, the real-valued component is not calculated correctly.
  
  - Anyone using NumPy / OpenBLAS to multiply matrices on Noble on GH200
  and GB200 machines could hit this bug. Particularly AI / ML workloads
  may be affected and this bug can affect the computational accuracy of
  their results.
  
  [RCA]
  
  - The root cause is that OpenBLAS is compiled with DYNAMIC_ARCH=1, which
  auto-detects the CPU and determines the SVE kernel path at runtime. The
- GB200 and GH200 use Neoverse V2 CPUs (Arm CPUs) and unfortunately, this
+ GB200 and GH200 use Neoverse V2 CPUs (ARMv8) and unfortunately, this
  dynamic detection doesn't work on that CPU and the wrong instruction
  path is chosen as the existing deb doesn't have dynamic support for the
  Neoverse V2. This was fixed upstream in [1]
  
  - The issue can be worked around by setting the environment variable
  OPENBLAS_CORETYPE=ARMV8 before computation, but this disables the SVE
  optimizations, which reduces overall performance and prevents users
  from leveraging all of their hardware's features.
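
The workaround can also be applied from inside a Python program, as a minimal sketch. The helper name `apply_workaround` is hypothetical, and the sketch assumes that OpenBLAS reads OPENBLAS_CORETYPE when the library is loaded, so the variable has to be set before NumPy is imported.

```python
import os
import platform


def apply_workaround(env, machine):
    """Request OpenBLAS's generic ARMV8 kernels on 64-bit Arm hosts.

    `env` is a mutable mapping such as os.environ, `machine` is the value
    of platform.machine(). Returns True if this call set the variable.
    """
    if machine not in ("aarch64", "arm64"):
        return False  # not an Arm host, nothing to do
    if "OPENBLAS_CORETYPE" in env:
        return False  # respect a value the user already chose
    env["OPENBLAS_CORETYPE"] = "ARMV8"
    return True


# Assumption: this runs before anything imports NumPy in the process.
changed = apply_workaround(os.environ, platform.machine())
print(f"OPENBLAS_CORETYPE set by this script: {changed}")

# import numpy as np   # import NumPy only after the variable is in place
```

Exporting the variable in the shell before starting Python has the same effect and avoids any dependence on import order.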
  
  [Test Plan]
  
  To reproduce, run the following commands on Noble in a Python 3.12
  environment with NumPy 1.26.4 installed (these are the default
  versions on Noble).
  
  import numpy as np
  a = np.array([2 + 3j, 3], dtype=np.complex64)
  b = np.array([5, 6], dtype=np.complex64)
  result = np.dot(a, b)
  print(f"np.dot(a, b) = {result}")
  
  On an affected machine, this produces the output:
  
  np.dot(a, b) = (73+15j)
  
  which is incorrect. The correct result is np.dot(a, b) = (28+15j).
  
  With the patched OpenBLAS package installed, the correct result must be
  produced to pass verification.
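
One possible way to script this verification is sketched below. The helper names `reference_dot` and `matches` and the tolerance are assumptions made for illustration and are not part of the test plan. The expected value is computed with Python's built-in complex type, so it does not go through OpenBLAS.

```python
def reference_dot(xs, ys):
    """Unconjugated dot product using Python's built-in complex arithmetic."""
    return sum(x * y for x, y in zip(xs, ys))


def matches(actual, expected, tol=1e-5):
    """True if two complex values agree within an absolute tolerance."""
    return abs(complex(actual) - complex(expected)) <= tol


A = [2 + 3j, 3]
B = [5, 6]
expected = reference_dot(A, B)  # (2+3j)*5 + 3*6

try:
    import numpy as np
except ImportError:  # lets the reference part run without NumPy installed
    np = None

if np is not None:
    a = np.array(A, dtype=np.complex64)
    b = np.array(B, dtype=np.complex64)
    result = np.dot(a, b)
    status = "PASS" if matches(result, expected) else "FAIL"
    print(f"np.dot(a, b) = {result}, expected {expected}: {status}")
```

On an affected machine the script should report FAIL, and PASS once the patched package is installed.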
  
  [What can go wrong]
  
  - The dynamic arch detection for GH200 / GB200, which use the Neoverse
  V2, may not work perfectly. In such a case, the most likely scenario is
  that the fallback arch is chosen and this bug would be hit
  
  - Since Neoverse V2 is mapped to the existing Neoverse V1 kernels,
  performance on the Neoverse V2 hardware may not be completely optimal,
  but at least correctness will be guaranteed and the performance should
  be better than disabling SVE.
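
When triaging a report, it can help to confirm that the host really is a Neoverse V2 machine. The sketch below parses /proc/cpuinfo text. It is not OpenBLAS's detection code, the function names are hypothetical, and the part number 0xd4f is taken as an assumption from Arm's published MIDR values for Neoverse V2 and should be checked against the Arm documentation.

```python
ARM_IMPLEMENTER = 0x41     # Arm Ltd.
NEOVERSE_V2_PART = 0xD4F   # assumption: Neoverse V2 part number per Arm docs


def cpu_ids(cpuinfo_text):
    """Return the set of (implementer, part) pairs found in cpuinfo text."""
    ids = set()
    implementer = None
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "CPU implementer":
            implementer = int(value, 16)
        elif key == "CPU part" and implementer is not None:
            ids.add((implementer, int(value, 16)))
    return ids


def is_neoverse_v2(cpuinfo_text):
    """True if any core reports Arm's Neoverse V2 identifiers."""
    return (ARM_IMPLEMENTER, NEOVERSE_V2_PART) in cpu_ids(cpuinfo_text)


try:
    with open("/proc/cpuinfo") as f:
        print("Neoverse V2 host:", is_neoverse_v2(f.read()))
except OSError:
    print("/proc/cpuinfo is not available on this system")
```

If this reports a Neoverse V2 host and the test plan still fails with the patched package, the fallback path described above is the first thing to suspect.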
  
  [Extra Info]
  
  - Customer has confirmed that this patch produces the correct
  computation in their testing environment
  
  - PPA to demonstrate build success on amd64 and arm64: [2]
  
  [1] https://github.com/OpenMathLib/OpenBLAS/commit/aaf65210ccba0c53408c242a2e0f5ad5d798d532
  [2] https://launchpad.net/~bryanfraschetti/+archive/ubuntu/lws-openblas/+packages

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/2131041

Title:
  [SRU] Incorrect Computation Result on Noble When Multiplying Complex-
  Valued NumPy Matrices (via OpenBLAS) on GH200 and GB200 machines
  (Neoverse V2 CPU)

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/openblas/+bug/2131041/+subscriptions


