On Mon, 18 Mar 2024, Pierre Jolivet wrote:

> And here we go:
> https://urldefense.us/v3/__https://gitlab.com/petsc/petsc/-/jobs/6420606887__;!!G_uCfscf7eWS!alfBlmyFQ5JJUYKxxFdETav6xjHOl5W54BPrmJEyXdSakVXnj8eYIRZdknOI-FK4uiaPdL4zSdJlD2zrcw$
>
> 20 minutes in, and still in the dm_* tests with timeouts right, left, and
> center.
> For reference, this prior job
> https://urldefense.us/v3/__https://gitlab.com/petsc/petsc/-/jobs/6418468279__;!!G_uCfscf7eWS!alfBlmyFQ5JJUYKxxFdETav6xjHOl5W54BPrmJEyXdSakVXnj8eYIRZdknOI-FK4uiaPdL4zSdJj83LENQ$
> completed in 3 minutes (OK, maybe add a couple of minutes to rebuild the
> packages to have a fair comparison).
> What did they do to OpenBLAS? Add a sleep() in their axpy?

(gdb) r
Starting program: /home/petsc/petsc/src/dm/dt/tests/ex13
^C
Program received signal SIGINT, Interrupt.
0x0000fffff331ad10 in dgemm_otcopy (m=m@entry=8, n=n@entry=7, a=a@entry=0x58f150, lda=lda@entry=15, b=b@entry=0xffffefae0000) at ../kernel/arm64/../generic/gemm_tcopy_2.c:69
69          *(b_offset1 + 3) = *(a_offset2 + 1);
(gdb) where
#0  0x0000fffff331ad10 in dgemm_otcopy (m=m@entry=8, n=n@entry=7, a=a@entry=0x58f150, lda=lda@entry=15, b=b@entry=0xffffefae0000) at ../kernel/arm64/../generic/gemm_tcopy_2.c:69
#1  0x0000fffff3342e68 in dgetrf_single (args=args@entry=0xffffffffe9d8, range_m=range_m@entry=0x0, range_n=range_n@entry=0x0, sa=sa@entry=0xffffefae0000, sb=<optimized out>, myid=myid@entry=0) at getrf_single.c:157
#2  0x0000fffff3255ec4 in dgetrf_ (M=<optimized out>, N=<optimized out>, a=<optimized out>, ldA=<optimized out>, ipiv=<optimized out>, Info=0xffffffffeaa8) at lapack/getrf.c:110
#3  0x0000fffff50b8dd8 in MatLUFactor_SeqDense (A=0x598360, row=0x0, col=0x0, minfo=0xffffffffeba8) at /home/petsc/petsc/src/mat/impls/dense/seq/dense.c:801
#4  0x0000fffff559b8b4 in MatLUFactor (mat=0x598360, row=0x0, col=0x0, info=0xffffffffeba8) at /home/petsc/petsc/src/mat/interface/matrix.c:3087
#5  0x00000000004149e0 in test (dim=2, deg=3, form=-1, jetDegree=3, cond=PETSC_FALSE) at ex13.c:141
#6  0x0000000000418f20 in main (argc=1, argv=0xfffffffff158) at ex13.c:303
(gdb)

It appears to get stuck in a loop here.

This test runs fine - if I remove "--download-openblas-make-options=TARGET=GENERIC" option.

Ok - trying out "git bisect"

ea6c5f3cf553a23f8e2e787307805e7874e1f9c6 is the first bad commit
commit ea6c5f3cf553a23f8e2e787307805e7874e1f9c6
Author: Martin Kroeker <mar...@ruby.chemie.uni-freiburg.de>
Date:   Sun Oct 30 12:55:23 2022 +0100

    Add option RELAPACK_REPLACE

 Makefile.rule   | 5 ++++-
 Makefile.system | 4 ++++
 2 files changed, 8 insertions(+), 1 deletion(-)

Don't really understand why this change is triggering this hang.
Or the correct way to build the latest OpenBLAS [do we need "BUILD_RELAPACK=1"?]

Satish