Public bug reported:

[Impact]

- When complex-valued matrices are multiplied in NumPy with OpenBLAS as
the computation backend, and OpenBLAS is compiled with DYNAMIC_ARCH=1
(as it is in the Noble deb), the real-valued component of the result is
calculated incorrectly on the GH200 and GB200.

- Anyone using NumPy / OpenBLAS to multiply matrices on Noble on GH200
and GB200 machines could hit this bug. AI / ML workloads are
particularly likely to be affected, and the bug can compromise the
computational accuracy of their results.

[RCA]

- The root cause is that OpenBLAS is compiled with DYNAMIC_ARCH=1, which
auto-detects the CPU and selects the SVE kernel path at runtime. The
GH200 and GB200 use Neoverse V2 CPUs, and the existing deb has no
dynamic support for the Neoverse V2, so detection fails on these
machines and the wrong instruction path is chosen. This was fixed
upstream in [1].

- The issue can be worked around by setting the environment variable
OPENBLAS_CORETYPE=ARMV8 before computation, but this disables the SVE
optimizations, which reduces overall performance and prevents users
from leveraging all of their hardware's features.
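
A minimal sketch of applying the workaround from inside Python rather
than from the shell. It assumes that OpenBLAS reads OPENBLAS_CORETYPE
once, when the library is loaded, so the variable has to be set before
NumPy is first imported. The helper name apply_workaround is
hypothetical and is not part of NumPy or OpenBLAS.

```python
import os
import sys

WORKAROUND_VAR = "OPENBLAS_CORETYPE"
WORKAROUND_VALUE = "ARMV8"  # generic ARMv8 kernels, no SVE


def apply_workaround(environ=os.environ, modules=sys.modules):
    """Set OPENBLAS_CORETYPE=ARMV8 in the given environment.

    Returns False when NumPy is already imported, because (by assumption)
    OpenBLAS has then already picked its kernels and the setting comes
    too late for the current process.
    """
    too_late = "numpy" in modules
    environ[WORKAROUND_VAR] = WORKAROUND_VALUE
    return not too_late
```

Calling apply_workaround() at the very top of a script, before
"import numpy", should have the same effect as exporting the variable
in the shell. A False return value means the process needs to be
restarted with the variable already set.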

[Test Plan]

To reproduce, run the following commands on Noble in a Python 3.12
environment with NumPy version 1.26.4 installed (these are the default
versions on Noble).

import numpy as np

a = np.array([2 + 3j, 3], dtype=np.complex64)
b = np.array([5, 6], dtype=np.complex64)
result = np.dot(a, b)
print(f"np.dot(a, b) = {result}")

This produces the output:

np.dot(a, b) = (73+15j)

which is incorrect. The correct result is np.dot(a, b) = (28+15j).

With the patched OpenBLAS package installed, the same commands must
produce the correct result, (28+15j), for the fix to be accepted.
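
The expected value can be checked without going through BLAS at all.
The sketch below recomputes the dot product with Python's built-in
complex arithmetic: (2+3j)*5 + 3*6 = (10+15j) + 18 = (28+15j). The
function name reference_dot is hypothetical and exists only for
illustration.

```python
def reference_dot(a, b):
    """Plain-Python dot product (no conjugation), independent of OpenBLAS."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    total = 0j
    for x, y in zip(a, b):
        total += x * y
    return total


expected = reference_dot([2 + 3j, 3], [5, 6])
print(f"reference_dot(a, b) = {expected}")
```

Comparing this value with the output of np.dot(a, b) on the machine
under test gives a pass / fail check for the patched package.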

[What can go wrong]

- The dynamic arch detection for GH200 / GB200, which use the Neoverse
V2, may not work perfectly. In such a case, the most likely scenario is
that the fallback arch is chosen and this bug would still be hit.

- Since Neoverse V2 is mapped to the existing Neoverse V1 kernels,
performance on the Neoverse V2 hardware may not be completely optimal,
but at least correctness will be guaranteed and the performance should
be better than disabling SVE.
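
When triaging a report, it can help to confirm that the machine really
reports a Neoverse V2 core. The sketch below parses the text of
/proc/cpuinfo. It rests on two assumptions not stated in this report:
that the Arm implementer code is 0x41 and that the Neoverse V2 part
number is 0xd4f. The function names are hypothetical.

```python
ARM_IMPLEMENTER = 0x41    # assumption: implementer code for Arm Ltd.
NEOVERSE_V2_PART = 0xD4F  # assumption: MIDR part number for Neoverse V2


def cpu_parts(cpuinfo_text):
    """Return the set of (implementer, part) pairs found in cpuinfo text."""
    pairs = set()
    implementer = None
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key == "CPU implementer":
            implementer = int(value, 16)
        elif key == "CPU part" and implementer is not None:
            pairs.add((implementer, int(value, 16)))
    return pairs


def is_neoverse_v2(cpuinfo_text):
    """True if any core in the cpuinfo text looks like a Neoverse V2."""
    return (ARM_IMPLEMENTER, NEOVERSE_V2_PART) in cpu_parts(cpuinfo_text)
```

On the machine under test this would be called as
is_neoverse_v2(open("/proc/cpuinfo").read()). It only identifies the
hardware; it does not show which kernel path OpenBLAS selected.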

[Extra Info]

- The customer has confirmed that this patch produces the correct
computation in their testing environment.

- PPA to demonstrate build success on amd and arm: [2]

[1] https://github.com/OpenMathLib/OpenBLAS/commit/aaf65210ccba0c53408c242a2e0f5ad5d798d532
[2] https://launchpad.net/~bryanfraschetti/+archive/ubuntu/lws-openblas/+packages

** Affects: openblas (Ubuntu)
     Importance: Undecided
         Status: New

https://bugs.launchpad.net/bugs/2131041

Title:
  [SRU] Incorrect Computation Result on Noble When Multiplying Complex-
  Valued NumPy Matrices (via OpenBLAS) on GH200 and GB200 machines
