Public bug reported:
Investigation and testing on a machine with the gfx1151 ROCm_ISA shows that
some rocsolver autopkgtests are failing. Here is the log of one run:
```
[ FAILED ] 29 tests, listed below:
[ FAILED ] daily_lapack/POSV.strided_batched__float/23, where GetParam() = ({
1000, 2000, 2000, 0 }, { 524, 1 })
[ FAILED ] daily_lapack/POTRF_64.strided_batched__float/9, where GetParam() =
({ 2000, 2000, 0 }, U)
- [ FAILED ] checkin_lapack/GESVDX.__double/162, where GetParam() = ({ 20,
20, 0, 0, 0 }, { 0, 0, 0, 0, 0, 0, 0 })
- [ FAILED ] checkin_lapack/GESVDX.batched__double/167, where GetParam() = ({
20, 20, 0, 0, 0 }, { 0, 1, 1, 5, 12, 0, 0 })
[ FAILED ] daily_lapack/SYGV.strided_batched__float/0, where GetParam() = ({
192, 192, 192, 0 }, { 1, N, U })
[ FAILED ] daily_lapack/SYGVJ.strided_batched__float/16, where GetParam() =
({ 300, 300, 310, 0 }, { 2, V, U })
- [ FAILED ] daily_lapack/HEEVX.__float_complex/2, where GetParam() = ({ 192,
192, 192, 5, 15, 100, 170 }, { V, V, L })
- [ FAILED ] daily_lapack/HEEVX.batched__float_complex/2, where GetParam() =
({ 192, 192, 192, 5, 15, 100, 170 }, { V, V, L })
- [ FAILED ] daily_lapack/HEEVX.batched__float_complex/3, where GetParam() =
({ 192, 192, 192, 5, 15, 100, 170 }, { V, I, U })
- [ FAILED ] daily_lapack/SYEVDX_INPLACE.__float/8, where GetParam() = ({
300, 300, 330, -15, -5, 200, 300 }, { N, V, L })
- [ FAILED ] daily_lapack/SYGVX.batched__float/8, where GetParam() = ({ 256,
270, 256, 260, -10, 10, 1, 100, 0 }, { 3, N, I, U })
- [ FAILED ] daily_lapack/SYGVX.batched__float/10, where GetParam() = ({ 256,
270, 256, 260, -10, 10, 1, 100, 0 }, { 2, V, I, U })
- [ FAILED ] daily_lapack/SYGVX.strided_batched__float/14, where GetParam() =
({ 300, 300, 310, 320, -15, -5, 200, 300, 0 }, { 3, N, I, U })
- [ FAILED ] daily_lapack/HEGVX.__float_complex/6, where GetParam() = ({ 256,
270, 256, 260, -10, 10, 1, 100, 0 }, { 1, N, A, U })
- [ FAILED ] daily_lapack/HEGVX.__float_complex/12, where GetParam() = ({
300, 300, 310, 320, -15, -5, 200, 300, 0 }, { 1, N, A, U })
- [ FAILED ] daily_lapack/HEGVX.batched__float_complex/1, where GetParam() =
({ 192, 192, 192, 192, 5, 15, 100, 150, 0 }, { 2, N, V, L })
- [ FAILED ] daily_lapack/HEGVX.batched__float_complex/2, where GetParam() =
({ 192, 192, 192, 192, 5, 15, 100, 150, 0 }, { 3, N, I, U })
[ FAILED ] daily_lapack/SYGVDX.batched__float/0, where GetParam() = ({ 192,
192, 192, 192, 5, 10, 10, 15, 0 }, { 1, N, A, U })
[ FAILED ] daily_lapack/SYGVDX.batched__float/6, where GetParam() = ({ 256,
270, 256, 260, -10, 10, 1, 100, 0 }, { 1, N, A, U })
[ FAILED ] daily_lapack/SYGVDX.strided_batched__float/6, where GetParam() =
({ 256, 270, 256, 260, -10, 10, 1, 100, 0 }, { 1, N, A, U })
[ FAILED ] daily_lapack/HEGVDX.batched__float_complex/17, where GetParam() =
({ 300, 300, 310, 320, -15, -10, 20, 30, 0 }, { 3, V, A, L })
[ FAILED ] daily_lapack/HEGVDX.strided_batched__float_complex/8, where
GetParam() = ({ 256, 270, 256, 260, -10, 10, 1, 100, 0 }, { 3, N, I, U })
[ FAILED ] checkin_lapack/SYGVDX_INPLACE.__float/41, where GetParam() = ({
35, 35, 35, 35, -10, 10, 3, 15, 0 }, { 3, V, A, L })
- [ FAILED ] checkin_lapack/BDSVDX.__double/83, where GetParam() = (U, { 64,
128, 0 }, { 2, 0, 0, 1, 5 })
- [ FAILED ] checkin_lapack/BDSVDX.__double/85, where GetParam() = (U, { 64,
128, 0 }, { 2, 0, 0, 7, 12 })
- [ FAILED ] checkin_lapack/BDSVDX.__double/86, where GetParam() = (U, { 64,
128, 0 }, { 0, 0, 0, 0, 0 })
- [ FAILED ] checkin_lapack/BDSVDX.__double/87, where GetParam() = (U, { 64,
128, 0 }, { 1, 5, 15, 0, 0 })
- [ FAILED ] checkin_lapack/BDSVDX.__double/88, where GetParam() = (U, { 64,
128, 0 }, { 1, 0, 15, 0, 0 })
- [ FAILED ] checkin_lapack/BDSVDX.__double/179, where GetParam() = (L, { 64,
128, 0 }, { 1, 0, 15, 0, 0 })
29 FAILED TESTS
```
(Expanded log: https://paste.ubuntu.com/p/Wq4Xjdmhy8/)
1) The failing GESVDX, BDSVDX, HEEVX, SYGVX, SYEVDX_INPLACE, and HEGVX tests
(marked with a - prefix in the log above) have been fixed.
The fix can mostly be credited to the STEBZ
(https://github.com/ROCm/rocm-libraries/pull/4735) and GETF2
(https://github.com/ROCm/rocm-libraries/pull/3743) synchronization bug fix
patches.
The patch that increases the hegvdx test tolerance
(https://github.com/ROCm/rocm-libraries/pull/2380) is also worth mentioning:
the tests listed above passed without it here, but they might not in some
environments.
2) The occasional failure of SYGVDX_INPLACE during some test runs
```
[ FAILED ] checkin_lapack/SYGVDX_INPLACE.__float/41, where GetParam() = ({
35, 35, 35, 35, -10, 10, 3, 15, 0 }, { 3, V, A, L })
```
has been resolved by introducing a patch that increases the sygvdx inplace
test tolerance (https://github.com/ROCm/rocm-libraries/pull/4436), changing the
tolerance factor from 8 * n to 10 * n.
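To put the 8 * n to 10 * n change in numbers, here is a minimal sketch. It assumes that get_epsilon<float>() returns the single-precision machine epsilon (2^-23, which matches the tolerances printed in the logs) and that n is the first value in GetParam(), i.e. 35 for the failing case. The helper name `tolerance` is hypothetical and not part of rocsolver.

```python
# Assumption: get_epsilon<float>() equals the float32 machine epsilon, 2**-23.
FLOAT32_EPS = 2.0 ** -23


def tolerance(factor: int, n: int, eps: float = FLOAT32_EPS) -> float:
    """Upper bound on max_error for a check of the form (factor * n) * eps."""
    return factor * n * eps


# Assumption: n is the first GetParam() value of SYGVDX_INPLACE.__float/41.
n = 35
old_tol = tolerance(8, n)   # bound before the patch
new_tol = tolerance(10, n)  # bound after the patch

print(f"old: {old_tol:.6e}  new: {new_tol:.6e}  ratio: {new_tol / old_tol}")
```

The patch only widens the bound by 25%, so it can absorb a marginal failure like this one but not errors that are orders of magnitude above the tolerance.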
__
Additionally, to account for differing test environments, the following
patches have been applied:
- skip-test-if-vram-is-insufficient.patch
(https://github.com/ROCm/rocm-libraries/pull/3886)
- fix-buffer-overflow-causing-test-fails.patch
(https://github.com/ROCm/rocm-libraries/commit/5ecfb5741a1f0584f1d9b249d4a952e183803c90)
- fix-getri-in-rocsolver-failing.patch
(https://github.com/ROCm/rocm-libraries/pull/1954)
The current status is that a few tests (2 to 5, depending on the run) are
still failing. Across multiple test runs, the failing tests are always batched
or strided_batched versions of some of the following:
- POSV
- POTRF_64
- SYGV
- SYGVDJ
- SYGVDX
- HEGVDX
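One way to check the batched / strided_batched observation across runs is to tally the failure lines by routine and variant. The following is a rough sketch only: it assumes gtest lines of the form shown in this report, and the function name `tally_failures` and the sample text are hypothetical.

```python
import re
from collections import Counter

# Matches e.g. "[ FAILED ] daily_lapack/POSV.strided_batched__float/23, where ..."
# An empty variant (as in "GESVDX.__double/162") is the non-batched version.
FAILED_RE = re.compile(
    r"\[\s*FAILED\s*\]\s+\w+/(?P<routine>\w+)\.(?P<variant>\w*?)__(?P<dtype>\w+)/\d+"
)


def tally_failures(log_text: str) -> Counter:
    """Count failing tests per (routine, variant) pair."""
    counts = Counter()
    for match in FAILED_RE.finditer(log_text):
        variant = match.group("variant") or "normal"
        counts[(match.group("routine"), variant)] += 1
    return counts


# Illustrative input only. gtest prints each failure both inline and in the
# final summary, so pass just the summary section to avoid double counting.
sample = """\
[ FAILED ] 4 tests, listed below:
[ FAILED ] daily_lapack/POSV.strided_batched__float/23, where GetParam() = ...
[ FAILED ] checkin_lapack/GESVDX.__double/162, where GetParam() = ...
[ FAILED ] daily_lapack/SYGVDX.batched__float/0, where GetParam() = ...
[ FAILED ] daily_lapack/SYGVDX.batched__float/6, where GetParam() = ...
"""

counts = tally_failures(sample)
for (routine, variant), count in sorted(counts.items()):
    print(f"{routine:16} {variant:16} {count}")
```
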
Examples:
Run 1
```
[ RUN ] daily_lapack/POSV.strided_batched__float_complex/22
clients/common/lapack/testing_posv.hpp:503: Failure
Expected: ((max_error)) <= ((n)*get_epsilon<T>()), actual: 0.025169280421464928
vs 0.00011920928955078125
[ FAILED ] daily_lapack/POSV.strided_batched__float_complex/22, where
GetParam() = ({ 1000, 2000, 2000, 0 }, { 200, 1 }) (126 ms)
[ RUN ] daily_lapack/POTRF_64.strided_batched__float_complex/5
clients/common/lapack/testing_potf2_potrf.hpp:475: Failure
Expected: ((max_error)) <= ((n)*get_epsilon<T>()), actual:
0.00026901860011981017 vs 0.00011920928955078125
[ FAILED ] daily_lapack/POTRF_64.strided_batched__float_complex/5,
where GetParam() = ({ 1000, 1000, 0 }, U) (130 ms)
[ RUN ] daily_lapack/SYGV.strided_batched__float/17
clients/common/lapack/testing_sygv_hegv.hpp:706: Failure
Expected: ((max_error)) <= ((n)*get_epsilon<T>()), actual: 0.012406964249268786
vs 3.5762786865234375e-05
[ FAILED ] daily_lapack/SYGV.strided_batched__float/17, where GetParam() = ({
300, 300, 310, 0 }, { 3, V, L }) (237 ms)
```
Run 2
```
[ RUN ] checkin_lapack/POTRF_64.strided_batched__float_complex/11
clients/common/lapack/testing_potf2_potrf.hpp:475: Failure
Expected: ((max_error)) <= ((n)*get_epsilon<T>()), actual:
0.0015920922572822285 vs 5.9604644775390625e-06
[ FAILED ] checkin_lapack/POTRF_64.strided_batched__float_complex/11,
where GetParam() = ({ 50, 50, 1 }, U) (6 ms)
[ RUN ] daily_lapack/SYGVDJ.strided_batched__float/16
clients/common/lapack/testing_sygvdj_hegvdj.hpp:668: Failure
Expected: ((max_error)) <= ((2 * n)*get_epsilon<T>()), actual:
0.030438767109022009 vs 7.152557373046875e-05
[ FAILED ] daily_lapack/SYGVDJ.strided_batched__float/16, where
GetParam() = ({ 300, 300, 310, 0 }, { 2, V, U }) (130 ms)
[ RUN ] daily_lapack/SYGVDX.strided_batched__float/17
clients/common/lapack/testing_sygvdx_hegvdx.hpp:1109: Failure
Expected: ((max_error)) <= ((8 * n)*get_epsilon<T>()), actual:
0.0072121107950806618 vs 0.000286102294921875
[ FAILED ] daily_lapack/SYGVDX.strided_batched__float/17, where GetParam() =
({ 300, 300, 310, 320, -15, -10, 20, 30, 0 }, { 3, V, A, L }) (266 ms)
```
The tests fail because the measured error (from comparing the CPU and GPU
results) is well above the tolerance. Floating-point imprecision is one
possible cause, perhaps combined with lossy math optimizations.
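To give a sense of how far above the tolerance these errors are, the sketch below recomputes each bound from the example runs and divides the logged error by it. It assumes get_epsilon<T>() is the float32 machine epsilon (2^-23) for both float and float_complex, which reproduces the logged tolerances. The factor and n values are read off the Expected: lines and GetParam() output above; the helper name `excess` is hypothetical.

```python
# Assumption: get_epsilon<T>() is 2**-23 for float and float_complex.
FLOAT32_EPS = 2.0 ** -23

# (test name, tolerance factor, n, logged max_error) from Run 1 and Run 2.
failures = [
    ("POSV.strided_batched__float_complex/22", 1, 1000, 0.025169280421464928),
    ("POTRF_64.strided_batched__float_complex/5", 1, 1000, 0.00026901860011981017),
    ("SYGV.strided_batched__float/17", 1, 300, 0.012406964249268786),
    ("POTRF_64.strided_batched__float_complex/11", 1, 50, 0.0015920922572822285),
    ("SYGVDJ.strided_batched__float/16", 2, 300, 0.030438767109022009),
    ("SYGVDX.strided_batched__float/17", 8, 300, 0.0072121107950806618),
]


def excess(factor: int, n: int, actual: float, eps: float = FLOAT32_EPS) -> float:
    """How many times larger the logged error is than (factor * n) * eps."""
    return actual / (factor * n * eps)


ratios = {name: excess(factor, n, actual) for name, factor, n, actual in failures}
for name, ratio in ratios.items():
    print(f"{name}: {ratio:.1f}x the tolerance")
```

Under these assumptions the errors range from about 2x to more than 400x the bound, so for most of these cases a modest tolerance increase would not be enough.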
Users have opened upstream issues [1 -
https://github.com/ROCm/rocm-libraries/issues/3169, 2 -
https://github.com/ROCm/rocm-libraries/issues/3171, 3 -
https://github.com/ROCm/rocm-libraries/issues/3380] that have not been
addressed yet, so newer versions probably have the same problem.
As rocsolver depends on rocblas (rocblas = building blocks, rocsolver =
LAPACK algorithms assembled from those blocks), moving forward we might
want to see rocblas tests passing.
** Affects: rocsolver (Ubuntu)
Importance: Undecided
Assignee: Bojan Aleksovski (b0b0a)
Status: In Progress
** Changed in: rocsolver (Ubuntu)
Status: New => In Progress
** Changed in: rocsolver (Ubuntu)
Assignee: (unassigned) => Bojan Aleksovski (b0b0a)
--
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/2144027
Title:
Fix tests for 7.1.0