** Description changed:
[Impact]
- - When multiplying complex-valued matrices in Numpy using OpenBLAS as
- the optimization / computation engine, if OpenBLAS was compiled with
- DYNAMIC_ARCH=1 (as is done in the Noble deb) the real-valued component
- is not calculated correctly on Nvidia GH200 and GB200 machines.
+ - When multiplying complex-valued matrices in NumPy using OpenBLAS
compiled with DYNAMIC_ARCH=1 (as is done in the Noble deb) as the
optimization / computation engine on machines with the Neoverse V2
architecture (e.g. Nvidia GH200 and GB200 machines), the real-valued
component of the matrix product is not calculated correctly.
- Anyone using NumPy / OpenBLAS to multiply matrices on Noble on GH200
and GB200 machines could hit this bug. AI / ML workloads in particular
may be affected, and the bug can compromise the computational accuracy
of their results.
[RCA]
- The root cause is that OpenBLAS is compiled with DYNAMIC_ARCH=1, which
auto-detects the CPU and selects the SVE kernel path at runtime. The
GB200 and GH200 use Neoverse V2 CPUs (Armv9-A). Unfortunately, the
existing deb has no dynamic support for the Neoverse V2, so detection
fails on that CPU and the wrong instruction path is chosen. This was
fixed upstream in [1].
- The correct hardware detection was added in 0.3.27, while Noble is on
0.3.26. All currently supported releases newer than Noble ship versions
later than 0.3.27, so nothing needs to be done for Plucky, Questing, or
Resolute.
- The issue can be worked around by setting the environment variable
OPENBLAS_CORETYPE=ARMV8 before computation, but this disables the SVE
optimizations, which reduces overall performance and prevents users
from leveraging all of their hardware's features.
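The workaround above can be sketched as a small helper. This is a minimal sketch, not part of the proposed fix: the helper name run_with_coretype is hypothetical, and it assumes OpenBLAS reads OPENBLAS_CORETYPE when the library is first loaded, so the variable is placed in the environment of a fresh child process rather than set after NumPy has already been imported.

```python
import os
import subprocess
import sys


def run_with_coretype(code, coretype="ARMV8"):
    """Run a Python snippet in a child process with OPENBLAS_CORETYPE set.

    The helper name is hypothetical. A child process is used so that the
    variable is already present in the environment when OpenBLAS is loaded.
    """
    # Copy the current environment and add/override the OpenBLAS core type.
    env = dict(os.environ, OPENBLAS_CORETYPE=coretype)
    proc = subprocess.run(
        [sys.executable, "-c", code],
        env=env,
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if the snippet fails
    )
    return proc.stdout.strip()
```

For example, passing the reproducer from the test plan as the code string would run it with the generic ARMV8 kernels selected, at the cost of the SVE optimizations.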
[Test Plan]
To reproduce, run the following commands on Noble in a Python 3.12
environment with NumPy 1.26.4 installed (these are the default
versions on Noble).
import numpy as np
a = np.array([2 + 3j, 3], dtype=np.complex64)
b = np.array([5, 6], dtype=np.complex64)
result = np.dot(a, b)
print(f"np.dot(a, b) = {result}")
This produces the output:
np.dot(a, b) = (73+15j)
which is incorrect. The correct computation is
np.dot(a, b) = (2+3j)*5 + 3*6 = (28+15j).
With the patched OpenBLAS package installed, the correct result must be
produced to pass verification.
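The verification step could be automated along these lines. This is only a sketch: the function names reference_dot and verify_openblas_dot are hypothetical, and the reference value is computed in pure Python so that it does not depend on any BLAS backend.

```python
def reference_dot(a, b):
    """Pure-Python dot product, independent of OpenBLAS."""
    return sum(complex(x) * complex(y) for x, y in zip(a, b))


def verify_openblas_dot():
    """Compare np.dot against the pure-Python reference.

    Returns (ok, got, expected). NumPy is imported lazily so that the
    reference function can be used on its own.
    """
    import numpy as np

    values_a = [2 + 3j, 3]
    values_b = [5, 6]
    a = np.array(values_a, dtype=np.complex64)
    b = np.array(values_b, dtype=np.complex64)

    got = complex(np.dot(a, b))
    expected = reference_dot(values_a, values_b)
    # The operands are small integers, so the complex64 result is exact
    # and a direct equality comparison is sufficient here.
    return got == expected, got, expected
```

Calling verify_openblas_dot() on an affected machine should report a mismatch with the unpatched package and a match once the patched OpenBLAS package is installed.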
[What can go wrong]
- The dynamic arch detection for GH200 / GB200, which use Neoverse V2,
may not work perfectly. In such a case, the most likely scenario is that
the fallback arch is chosen and this bug would still be hit.
- Since Neoverse V2 is mapped to the existing Neoverse V1 kernels,
performance on the Neoverse V2 hardware may not be completely optimal,
but at least correctness will be guaranteed and the performance will be
better than disabling SVE altogether.
[Extra Info]
- The customer has confirmed that this patch produces the correct
computation in their testing environment.
- PPA to demonstrate build success on amd64 and arm64: [2]
[1]
https://github.com/OpenMathLib/OpenBLAS/commit/aaf65210ccba0c53408c242a2e0f5ad5d798d532
[2]
https://launchpad.net/~bryanfraschetti/+archive/ubuntu/lws-openblas/+packages
--
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/2131041
Title:
[SRU] Incorrect Computation Result on Noble When Multiplying Complex-
Valued NumPy Matrices (via OpenBLAS) on GH200 and GB200 machines
(Neoverse V2 CPU)