Thanks,
Pierre
http://joliv.et/irene-rome-configure.log
<http://joliv.et/irene-rome-configure.log>
$ /usr/bin/gmake -f gmakefile test test-fail=1
Using MAKEFLAGS: test-fail=1
TEST
arch-linux2-c-opt-ompi/tests/counts/snes_tutorials-ex12_quad_hpddm_reuse_baij.counts
ok snes_tutorials-ex12_quad_hpddm_reuse_baij
ok diff-snes_tutorials-ex12_quad_hpddm_reuse_baij
TEST
arch-linux2-c-opt-ompi/tests/counts/ksp_ksp_tutorials-ex50_tut_2.counts
ok ksp_ksp_tutorials-ex50_tut_2 # SKIP PETSC_HAVE_SUPERLU_DIST
requirement not met
TEST
arch-linux2-c-opt-ompi/tests/counts/snes_tutorials-ex56_hypre.counts
ok snes_tutorials-ex56_hypre
ok diff-snes_tutorials-ex56_hypre
TEST
arch-linux2-c-opt-ompi/tests/counts/snes_tutorials-ex17_3d_q3_trig_elas.counts
ok snes_tutorials-ex17_3d_q3_trig_elas
ok diff-snes_tutorials-ex17_3d_q3_trig_elas
TEST
arch-linux2-c-opt-ompi/tests/counts/snes_tutorials-ex12_quad_hpddm_reuse_threshold_baij.counts
ok snes_tutorials-ex12_quad_hpddm_reuse_threshold_baij
ok diff-snes_tutorials-ex12_quad_hpddm_reuse_threshold_baij
TEST
arch-linux2-c-opt-ompi/tests/counts/snes_tutorials-ex12_tri_parmetis_hpddm_baij.counts
ok snes_tutorials-ex12_tri_parmetis_hpddm_baij
ok diff-snes_tutorials-ex12_tri_parmetis_hpddm_baij
TEST
arch-linux2-c-opt-ompi/tests/counts/snes_tutorials-ex19_tut_3.counts
ok snes_tutorials-ex19_tut_3
ok diff-snes_tutorials-ex19_tut_3
TEST
arch-linux2-c-opt-ompi/tests/counts/mat_tests-ex242_3.counts
not ok mat_tests-ex242_3 # Error code: 137
#[1]PETSC ERROR:
------------------------------------------------------------------------
#[1]PETSC ERROR: Caught signal number 11 SEGV: Segmentation
Violation, probably memory access out of range
#[1]PETSC ERROR: Try option -start_in_debugger or
-on_error_attach_debugger
#[1]PETSC ERROR: or see
https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
<https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind>
#[1]PETSC ERROR: or try http://valgrind.org <http://valgrind.org/>
on GNU/linux and Apple Mac OS X to find memory corruption errors
#[1]PETSC ERROR: configure using --with-debugging=yes, recompile,
link, and run
#[1]PETSC ERROR: to get more information on the crash.
#[1]PETSC ERROR: --------------------- Error Message
--------------------------------------------------------------
#[1]PETSC ERROR: Signal received
#[1]PETSC ERROR: See
https://www.mcs.anl.gov/petsc/documentation/faq.html
<https://www.mcs.anl.gov/petsc/documentation/faq.html> for trouble
shooting.
#[1]PETSC ERROR: Petsc Development GIT revision:
v3.14.4-733-g7ab9467ef9 GIT Date: 2021-03-02 16:15:11 +0000
#[2]PETSC ERROR:
------------------------------------------------------------------------
#[2]PETSC ERROR: Caught signal number 11 SEGV: Segmentation
Violation, probably memory access out of range
#[2]PETSC ERROR: Try option -start_in_debugger or
-on_error_attach_debugger
#[2]PETSC ERROR: or see
https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
<https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind>
#[2]PETSC ERROR: or try http://valgrind.org <http://valgrind.org/>
on GNU/linux and Apple Mac OS X to find memory corruption errors
#[2]PETSC ERROR: configure using --with-debugging=yes, recompile,
link, and run
#[2]PETSC ERROR: to get more information on the crash.
#[2]PETSC ERROR: --------------------- Error Message
--------------------------------------------------------------
#[2]PETSC ERROR: Signal received
#[2]PETSC ERROR: See
https://www.mcs.anl.gov/petsc/documentation/faq.html
<https://www.mcs.anl.gov/petsc/documentation/faq.html> for trouble
shooting.
#[2]PETSC ERROR: Petsc Development GIT revision:
v3.14.4-733-g7ab9467ef9 GIT Date: 2021-03-02 16:15:11 +0000
#[2]PETSC ERROR:
/ccc/work/cont003/rndm/rndm/petsc/arch-linux2-c-opt-ompi/tests/mat/tests/runex242_3/../ex242
on a arch-linux2-c-opt-ompi named irene4047 by jolivetp Wed Mar 3
08:21:20 2021
#[2]PETSC ERROR: Configure options --download-hpddm
--download-hpddm-commit=origin/main --download-hypre
--download-metis --download-mumps --download-parmetis
--download-ptscotch --download-slepc
--download-slepc-commit=origin/main --download-tetgen
--known-mpi-c-double-complex --known-mpi-int64_t
--known-mpi-long-double --with-avx512-kernels=1
--with-blaslapack-dir=/ccc/products/mkl-19.0.5.281/intel--19.0.5.281__openmpi--4.0.1/default/19.0.5.281/mkl/lib/intel64
--with-cc=mpicc --with-cxx=mpicxx --with-debugging=0
--with-fc=mpifort --with-fortran-bindings=0 --with-make-np=40
--with-mkl_cpardiso-dir=/ccc/products/mkl-19.0.5.281/intel--19.0.5.281__openmpi--4.0.1/default/19.0.5.281
--with-mkl_cpardiso=1
--with-mkl_pardiso-dir=/ccc/products/mkl-19.0.5.281/intel--19.0.5.281__openmpi--4.0.1/default/19.0.5.281/mkl
--with-mkl_pardiso=1 --with-mpiexec=ccc_mprun --with-openmp=1
--with-packages-download-dir=/ccc/cont003/home/enseeiht/jolivetp/Dude/externalpackages/
--with-scalapack-include=/ccc/products/mkl-19.0.5.281/intel--19.0.5.281__openmpi--4.0.1/default/19.0.5.281/mkl/include
--with-scalapack-lib="[/ccc/products/mkl-19.0.5.281/intel--19.0.5.281__openmpi--4.0.1/default/19.0.5.281/mkl/lib/intel64/libmkl_scalapack_lp64.so,/ccc/products/mkl-19.0.5.281/intel--19.0.5.281__openmpi--4.0.1/default/19.0.5.281/mkl/lib/intel64/libmkl_blacs_openmpi_lp64.so]"
--with-scalar-type=real --with-x=0 COPTFLAGS="-O3 -fp-model fast
-mavx2" CXXOPTFLAGS="-O3 -fp-model fast -mavx2" FOPTFLAGS="-O3
-fp-model fast -mavx2" PETSC_ARCH=arch-linux2-c-opt-ompi
#[2]PETSC ERROR: #1 User provided function() line 0 in unknown file
#[2]PETSC ERROR: Run with -malloc_debug to check if memory
corruption is causing the crash.
#[1]PETSC ERROR:
/ccc/work/cont003/rndm/rndm/petsc/arch-linux2-c-opt-ompi/tests/mat/tests/runex242_3/../ex242
on a arch-linux2-c-opt-ompi named irene4047 by jolivetp Wed Mar 3
08:21:20 2021
#[1]PETSC ERROR: Configure options --download-hpddm
--download-hpddm-commit=origin/main --download-hypre
--download-metis --download-mumps --download-parmetis
--download-ptscotch --download-slepc
--download-slepc-commit=origin/main --download-tetgen
--known-mpi-c-double-complex --known-mpi-int64_t
--known-mpi-long-double --with-avx512-kernels=1
--with-blaslapack-dir=/ccc/products/mkl-19.0.5.281/intel--19.0.5.281__openmpi--4.0.1/default/19.0.5.281/mkl/lib/intel64
--with-cc=mpicc --with-cxx=mpicxx --with-debugging=0
--with-fc=mpifort --with-fortran-bindings=0 --with-make-np=40
--with-mkl_cpardiso-dir=/ccc/products/mkl-19.0.5.281/intel--19.0.5.281__openmpi--4.0.1/default/19.0.5.281
--with-mkl_cpardiso=1
--with-mkl_pardiso-dir=/ccc/products/mkl-19.0.5.281/intel--19.0.5.281__openmpi--4.0.1/default/19.0.5.281/mkl
--with-mkl_pardiso=1 --with-mpiexec=ccc_mprun --with-openmp=1
--with-packages-download-dir=/ccc/cont003/home/enseeiht/jolivetp/Dude/externalpackages/
--with-scalapack-include=/ccc/products/mkl-19.0.5.281/intel--19.0.5.281__openmpi--4.0.1/default/19.0.5.281/mkl/include
--with-scalapack-lib="[/ccc/products/mkl-19.0.5.281/intel--19.0.5.281__openmpi--4.0.1/default/19.0.5.281/mkl/lib/intel64/libmkl_scalapack_lp64.so,/ccc/products/mkl-19.0.5.281/intel--19.0.5.281__openmpi--4.0.1/default/19.0.5.281/mkl/lib/intel64/libmkl_blacs_openmpi_lp64.so]"
--with-scalar-type=real --with-x=0 COPTFLAGS="-O3 -fp-model fast
-mavx2" CXXOPTFLAGS="-O3 -fp-model fast -mavx2" FOPTFLAGS="-O3
-fp-model fast -mavx2" PETSC_ARCH=arch-linux2-c-opt-ompi
#[1]PETSC ERROR: #1 User provided function() line 0 in unknown file
#[1]PETSC ERROR: Run with -malloc_debug to check if memory
corruption is causing the crash.
#--------------------------------------------------------------------------
#MPI_ABORT was invoked on rank 2 in communicator MPI_COMM_WORLD
#with errorcode 50176059.
#
#NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
#You may or may not see output from other processes, depending on
#exactly when Open MPI kills them.
#--------------------------------------------------------------------------
#--------------------------------------------------------------------------
#MPI_ABORT was invoked on rank 1 in communicator MPI_COMM_WORLD
#with errorcode 50176059.
#
#NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
#You may or may not see output from other processes, depending on
#exactly when Open MPI kills them.
#--------------------------------------------------------------------------
#srun: Job step aborted: Waiting up to 302 seconds for job step to
finish.
#slurmstepd-irene4047: error: *** STEP 1374176.36 ON irene4047
CANCELLED AT 2021-03-03T08:21:20 ***
#srun: error: irene4047: task 0: Killed
#srun: error: irene4047: tasks 1-2: Exited with exit code 16
ok mat_tests-ex242_3 # SKIP Command failed so no diff
TEST
arch-linux2-c-opt-ompi/tests/counts/snes_tutorials-ex17_3d_q3_trig_vlap.counts
ok snes_tutorials-ex17_3d_q3_trig_vlap
ok diff-snes_tutorials-ex17_3d_q3_trig_vlap
TEST
arch-linux2-c-opt-ompi/tests/counts/snes_tutorials-ex56_attach_mat_nearnullspace-1_bddc_approx_hypre.counts
ok snes_tutorials-ex56_attach_mat_nearnullspace-1_bddc_approx_hypre
ok
diff-snes_tutorials-ex56_attach_mat_nearnullspace-1_bddc_approx_hypre
TEST
arch-linux2-c-opt-ompi/tests/counts/ksp_ksp_tutorials-ex49_hypre_nullspace.counts
ok ksp_ksp_tutorials-ex49_hypre_nullspace
ok diff-ksp_ksp_tutorials-ex49_hypre_nullspace
TEST
arch-linux2-c-opt-ompi/tests/counts/ts_tutorials-ex18_p1p1_xper_ref.counts
ok ts_tutorials-ex18_p1p1_xper_ref
ok diff-ts_tutorials-ex18_p1p1_xper_ref
TEST
arch-linux2-c-opt-ompi/tests/counts/ts_tutorials-ex18_p1p1_xyper_ref.counts
ok ts_tutorials-ex18_p1p1_xyper_ref
ok diff-ts_tutorials-ex18_p1p1_xyper_ref
TEST
arch-linux2-c-opt-ompi/tests/counts/snes_tutorials-ex56_attach_mat_nearnullspace-0_bddc_approx_hypre.counts
ok snes_tutorials-ex56_attach_mat_nearnullspace-0_bddc_approx_hypre
ok
diff-snes_tutorials-ex56_attach_mat_nearnullspace-0_bddc_approx_hypre
TEST
arch-linux2-c-opt-ompi/tests/counts/ksp_ksp_tutorials-ex64_1.counts
ok ksp_ksp_tutorials-ex64_1 # SKIP PETSC_HAVE_SUPERLU_DIST
requirement not met
On 3 Mar 2021, at 6:21 AM, Eric Chamberland
<[email protected]
<mailto:[email protected]>> wrote:
Just started a discussion on the side:
https://community.intel.com/t5/Intel-oneAPI-Math-Kernel-Library/Intel-MKL-Link-Line-Advisor-as-external-tool/m-p/1260895#M30974
Eric
On 2021-03-02 3:50 p.m., Pierre Jolivet wrote:
Hello Eric,
src/mat/tests/ex237.c is a recent test with some code paths that
should be disabled for “old” MKL versions. It’s tricky to check
directly in the source (we do check in BuildSystem) because there
is no such thing as PETSC_PKG_MKL_VERSION_LT, but I guess we can
change if defined(PETSC_HAVE_MKL) to if defined(PETSC_HAVE_MKL) &&
defined(PETSC_HAVE_MKL_SPARSE_OPTIMIZE), I’ll make a MR, thanks
for reporting this.
For the other issues, I’m sensing this is a problem with gomp +
intel_gnu_thread, but this is pure speculation… sorry.
I’ll try to reproduce some of these problems if you are not given
a more meaningful answer.
Thanks,
Pierre
On 2 Mar 2021, at 9:14 PM, Eric Chamberland
<[email protected]
<mailto:[email protected]>> wrote:
Hi,
It all started when I wanted to test PETSC/CUDA compatibility for
our code.
I had to activate --with-openmp to configure with --with-cuda=1
successfully.
I then saw that PETSC_HAVE_OPENMP is used at least in MUMPS (and
some other places).
So, I configured and tested petsc with openmp activated, without
CUDA.
The first thing I see is that our code CI pipelines now fails for
many tests.
After looking deeper, it seems that PETSc itself fails many tests
when I activate openmp!
Here are all the configurations I have results for, after/before
activating OpenMP for PETSc:
==============================================================================
==============================================================================
For petsc/master + OpenMPI 4.0.4 + MKL 2019.4.243:
With OpenMP=1
https://giref.ulaval.ca/~cmpgiref/petsc-master-debug/2021.03.02.02h00m02s_make_test.log
https://giref.ulaval.ca/~cmpgiref/petsc-master-debug/2021.03.02.02h00m02s_configure.log
# -------------
# Summary
# -------------
# FAILED snes_tutorials-ex12_quad_hpddm_reuse_baij
diff-ksp_ksp_tests-ex33_superlu_dist_2
diff-ksp_ksp_tests-ex49_superlu_dist+nsize-1herm-0_conv-0
diff-ksp_ksp_tests-ex49_superlu_dist+nsize-1herm-0_conv-1
diff-ksp_ksp_tests-ex49_superlu_dist+nsize-1herm-1_conv-0
diff-ksp_ksp_tests-ex49_superlu_dist+nsize-1herm-1_conv-1
diff-ksp_ksp_tests-ex49_superlu_dist+nsize-4herm-0_conv-0
diff-ksp_ksp_tests-ex49_superlu_dist+nsize-4herm-0_conv-1
diff-ksp_ksp_tests-ex49_superlu_dist+nsize-4herm-1_conv-0
diff-ksp_ksp_tests-ex49_superlu_dist+nsize-4herm-1_conv-1
ksp_ksp_tutorials-ex50_tut_2 diff-ksp_ksp_tests-ex33_superlu_dist
diff-snes_tutorials-ex56_hypre snes_tutorials-ex17_3d_q3_trig_elas
snes_tutorials-ex12_quad_hpddm_reuse_threshold_baij
ksp_ksp_tutorials-ex5_superlu_dist_3 ksp_ksp_tutorials-ex5f_superlu_dist
snes_tutorials-ex12_tri_parmetis_hpddm_baij diff-snes_tutorials-ex19_tut_3
mat_tests-ex242_3 snes_tutorials-ex17_3d_q3_trig_vlap
ksp_ksp_tutorials-ex5f_superlu_dist_3 snes_tutorials-ex19_superlu_dist
diff-snes_tutorials-ex56_attach_mat_nearnullspace-1_bddc_approx_hypre
diff-ksp_ksp_tutorials-ex49_hypre_nullspace ts_tutorials-ex18_p1p1_xper_ref
ts_tutorials-ex18_p1p1_xyper_ref snes_tutorials-ex19_superlu_dist_2
ksp_ksp_tutorials-ex5_superlu_dist_2
diff-snes_tutorials-ex56_attach_mat_nearnullspace-0_bddc_approx_hypre
ksp_ksp_tutorials-ex64_1 ksp_ksp_tutorials-ex5_superlu_dist
ksp_ksp_tutorials-ex5f_superlu_dist_2
# success 8275/10003 tests (82.7%)
#*failed 33/10003* tests (0.3%)
With OpenMP=0
https://giref.ulaval.ca/~cmpgiref/petsc-master-debug/2021.02.26.02h00m16s_make_test.log
https://giref.ulaval.ca/~cmpgiref/petsc-master-debug/2021.02.26.02h00m16s_configure.log
# -------------
# Summary
# -------------
# FAILED tao_constrained_tutorials-tomographyADMM_6
snes_tutorials-ex17_3d_q3_trig_elas mat_tests-ex242_3
snes_tutorials-ex17_3d_q3_trig_vlap tao_leastsquares_tutorials-tomography_1
tao_constrained_tutorials-tomographyADMM_5
# success 8262/9983 tests (82.8%)
#*failed 6/9983* tests (0.1%)
==============================================================================
==============================================================================
For OpenMPI 3.1.x/master:
With OpenMP=1:
https://giref.ulaval.ca/~cmpgiref/ompi_3.x/2021.03.01.22h00m01s_make_test.log
https://giref.ulaval.ca/~cmpgiref/ompi_3.x/2021.03.01.22h00m01s_configure.log
# -------------
# Summary
# -------------
# FAILED mat_tests-ex242_3 mat_tests-ex242_2 diff-mat_tests-ex219f_1
diff-dm_tutorials-ex11f90_1 ksp_ksp_tutorials-ex5_superlu_dist_3
diff-ksp_ksp_tutorials-ex49_hypre_nullspace
ksp_ksp_tutorials-ex5f_superlu_dist_3 snes_tutorials-ex17_3d_q3_trig_vlap
diff-snes_tutorials-ex56_attach_mat_nearnullspace-1_bddc_approx_hypre
diff-snes_tutorials-ex19_tut_3 diff-snes_tutorials-ex56_hypre
diff-snes_tutorials-ex56_attach_mat_nearnullspace-0_bddc_approx_hypre
tao_leastsquares_tutorials-tomography_1
tao_constrained_tutorials-tomographyADMM_4
tao_constrained_tutorials-tomographyADMM_6 diff-tao_constrained_tutorials-toyf_1
# success 8142/9765 tests (83.4%)
#*failed 16/9765* tests (0.2%)
With OpenMP=0:
https://giref.ulaval.ca/~cmpgiref/ompi_3.x/2021.02.28.22h00m02s_make_test.log
https://giref.ulaval.ca/~cmpgiref/ompi_3.x/2021.02.28.22h00m02s_configure.log
# -------------
# Summary
# -------------
# FAILED mat_tests-ex242_3 mat_tests-ex242_2 diff-mat_tests-ex219f_1
diff-dm_tutorials-ex11f90_1 ksp_ksp_tutorials-ex56_2
snes_tutorials-ex17_3d_q3_trig_vlap tao_leastsquares_tutorials-tomography_1
tao_constrained_tutorials-tomographyADMM_4 diff-tao_constrained_tutorials-toyf_1
# success 8151/9767 tests (83.5%)
#*failed 9/9767* tests (0.1%)
==============================================================================
==============================================================================
For OpenMPI 4.0.x/master:
With OpenMP=1:
https://giref.ulaval.ca/~cmpgiref/ompi_4.x/2021.03.01.20h00m01s_make_test.log
https://giref.ulaval.ca/~cmpgiref/ompi_4.x/2021.03.01.20h00m01s_configure.log
# FAILED snes_tutorials-ex17_3d_q3_trig_elas snes_tutorials-ex19_hypre
ksp_ksp_tutorials-ex56_2 tao_leastsquares_tutorials-tomography_1
tao_constrained_tutorials-tomographyADMM_5 mat_tests-ex242_3
ksp_ksp_tutorials-ex55_hypre ksp_ksp_tutorials-ex5_superlu_dist_2
tao_constrained_tutorials-tomographyADMM_6 snes_tutorials-ex56_hypre
snes_tutorials-ex56_attach_mat_nearnullspace-0_bddc_approx_hypre
ksp_ksp_tutorials-ex5f_superlu_dist_3 ksp_ksp_tutorials-ex34_hyprestruct
diff-ksp_ksp_tutorials-ex49_hypre_nullspace
snes_tutorials-ex56_attach_mat_nearnullspace-1_bddc_approx_hypre
ksp_ksp_tutorials-ex5f_superlu_dist ksp_ksp_tutorials-ex5f_superlu_dist_2
ksp_ksp_tutorials-ex5_superlu_dist snes_tutorials-ex19_tut_3
snes_tutorials-ex19_superlu_dist ksp_ksp_tutorials-ex50_tut_2
snes_tutorials-ex17_3d_q3_trig_vlap ksp_ksp_tutorials-ex5_superlu_dist_3
snes_tutorials-ex19_superlu_dist_2 tao_constrained_tutorials-tomographyADMM_4
ts_tutorials-ex26_2
# success 8125/9753 tests (83.3%)
#*failed 26/9753* tests (0.3%)
With OpenMP=0
https://giref.ulaval.ca/~cmpgiref/ompi_4.x/2021.02.28.20h00m04s_make_test.log
https://giref.ulaval.ca/~cmpgiref/ompi_4.x/2021.02.28.20h00m04s_configure.log
# FAILED mat_tests-ex242_3
# success 8174/9777 tests (83.6%)
#*failed 1/9777* tests (0.0%)
==============================================================================
==============================================================================
Is that known and normal?
In all cases, I am using MKL and I suspect it may come from
there... :/
I also saw a second problem, "make test" fails to compile petsc
examples on older versions of MKL (but that's less important for
me, I just upgraded to OneAPI to avoid this, but you may want to
know):
https://giref.ulaval.ca/~cmpgiref/dernier_ompi/2021.03.02.02h16m01s_make_test.log
https://giref.ulaval.ca/~cmpgiref/dernier_ompi/2021.03.02.02h16m01s_configure.log
Thanks,
Eric
--
Eric Chamberland, ing., M. Ing
Professionnel de recherche
GIREF/Université Laval
(418) 656-2131 poste 41 22 42
--
Eric Chamberland, ing., M. Ing
Professionnel de recherche
GIREF/Université Laval
(418) 656-2131 poste 41 22 42