On Thu, Mar 18, 2021 at 11:51 PM Jed Brown <[email protected]> wrote:
> Note that this is specific to the node numbering, and that node numbering > tends to produce poor results even for MatMult due to poor cache reuse of > the vector. It's good practice after partitioning to use a > locality-preserving ordering of dofs on a process (e.g., RCM if you use > MatOrdering). This was shown in the PETSc-FUN3D papers circa 1999 and has > been confirmed multiple times over the years by various members of this > list (including me). I believe FEniCS and libMesh now do this by default > (or at least have an option) and it was shown to perform better. It's a > notable weakness of DMPlex that it does not apply such an ordering of dofs > and I've complained to Matt about it many times over the years, but any > blame rests solely with me for not carving out time to implement it here. > Jesus. Of course Plex can do this. It is the default for PyLith. Less complaining, more looking. Matt > Better SGS/SOR smoothing factors with simple OpenMP partitioning is an > additional bonus, though I'm not a fan of using OpenMP in this way. > > Eric Chamberland <[email protected]> writes: > > > Hi, > > > > For the knowledge of readers, I just read section 7.3 here: > > > > > https://www.researchgate.net/publication/220411740_Multigrid_Smoothers_for_Ultraparallel_Computing > > > > And it is explained why multi-threading gives a poor result with the > > Hybrid−SGS smoother... > > > > Eric > > > > > > On 2021-03-15 2:50 p.m., Barry Smith wrote: > >> > >> I posted some information at the issue. > >> > >> IMHO it is likely a bug in one or more of hypre's smoothers that > >> use OpenMP. We have never tested them before (and likely hypre has not > >> tested all the combinations) and so would not have seen the bug. > >> Hopefully they can just fix it. > >> > >> Barry > >> > >> I got the problem to occur with ex56 with 2 MPI ranks and 4 OpenMP > >> threads, if I used less than 4 threads it did not generate an > >> indefinite preconditioner. > >> > >> > >>> On Mar 14, 2021, at 1:18 PM, Eric Chamberland > >>> <[email protected] > >>> <mailto:[email protected]>> wrote: > >>> > >>> Done: > >>> > >>> https://github.com/hypre-space/hypre/issues/303 > >>> > >>> Maybe I will need some help about PETSc to answer their questions... > >>> > >>> Eric > >>> > >>> On 2021-03-14 3:44 a.m., Stefano Zampini wrote: > >>>> Eric > >>>> > >>>> You should report these HYPRE issues upstream > >>>> https://github.com/hypre-space/hypre/issues > >>>> <https://github.com/hypre-space/hypre/issues> > >>>> > >>>> > >>>>> On Mar 14, 2021, at 3:44 AM, Eric Chamberland > >>>>> <[email protected] > >>>>> <mailto:[email protected]>> wrote: > >>>>> > >>>>> For us it clearly creates problems in real computations... > >>>>> > >>>>> I understand the need to have clean test for PETSc, but for me, it > >>>>> reveals that hypre isn't usable with more than one thread for now... > >>>>> > >>>>> Another solution: force single-threaded configuration for hypre > >>>>> until this is fixed? 
> Better SGS/SOR smoothing factors with simple OpenMP partitioning are an
> additional bonus, though I'm not a fan of using OpenMP in this way.
>
> Eric Chamberland <[email protected]> writes:
>
> > Hi,
> >
> > For the knowledge of readers, I just read section 7.3 here:
> >
> > https://www.researchgate.net/publication/220411740_Multigrid_Smoothers_for_Ultraparallel_Computing
> >
> > and it explains why multi-threading gives poor results with the
> > Hybrid-SGS smoother...
> >
> > Eric
> >
> > On 2021-03-15 2:50 p.m., Barry Smith wrote:
> >>
> >> I posted some information at the issue.
> >>
> >> IMHO it is likely a bug in one or more of hypre's smoothers that
> >> use OpenMP. We have never tested them before (and likely hypre has not
> >> tested all the combinations), so we would not have seen the bug.
> >> Hopefully they can just fix it.
> >>
> >> Barry
> >>
> >> I got the problem to occur with ex56 with 2 MPI ranks and 4 OpenMP
> >> threads; with fewer than 4 threads it did not generate an
> >> indefinite preconditioner.
> >>
> >>> On Mar 14, 2021, at 1:18 PM, Eric Chamberland
> >>> <[email protected]> wrote:
> >>>
> >>> Done:
> >>>
> >>> https://github.com/hypre-space/hypre/issues/303
> >>>
> >>> Maybe I will need some help with PETSc to answer their questions...
> >>>
> >>> Eric
> >>>
> >>> On 2021-03-14 3:44 a.m., Stefano Zampini wrote:
> >>>> Eric,
> >>>>
> >>>> You should report these HYPRE issues upstream:
> >>>> https://github.com/hypre-space/hypre/issues
> >>>>
> >>>>> On Mar 14, 2021, at 3:44 AM, Eric Chamberland
> >>>>> <[email protected]> wrote:
> >>>>>
> >>>>> For us it clearly creates problems in real computations...
> >>>>>
> >>>>> I understand the need to have clean tests for PETSc, but to me it
> >>>>> reveals that hypre isn't usable with more than one thread for now...
> >>>>>
> >>>>> Another solution: force a single-threaded configuration for hypre
> >>>>> until this is fixed?
> >>>>>
> >>>>> Eric
> >>>>>
> >>>>> On 2021-03-13 8:50 a.m., Pierre Jolivet wrote:
> >>>>>> -pc_hypre_boomeramg_relax_type_all Jacobi =>
> >>>>>> Linear solve did not converge due to DIVERGED_INDEFINITE_PC iterations 3
> >>>>>> -pc_hypre_boomeramg_relax_type_all l1scaled-Jacobi =>
> >>>>>> OK, independently of the architecture it seems (Eric's Docker image
> >>>>>> with 1 or 2 threads, or my macOS), but the contraction factor is higher
> >>>>>> Linear solve converged due to CONVERGED_RTOL iterations 8
> >>>>>> Linear solve converged due to CONVERGED_RTOL iterations 24
> >>>>>> Linear solve converged due to CONVERGED_RTOL iterations 26
> >>>>>> v. currently
> >>>>>> Linear solve converged due to CONVERGED_RTOL iterations 7
> >>>>>> Linear solve converged due to CONVERGED_RTOL iterations 9
> >>>>>> Linear solve converged due to CONVERGED_RTOL iterations 10
> >>>>>>
> >>>>>> Do we change this? Or should we force OMP_NUM_THREADS=1 for make test?
> >>>>>>
> >>>>>> Thanks,
> >>>>>> Pierre
> >>>>>>
> >>>>>>> On 13 Mar 2021, at 2:26 PM, Mark Adams <[email protected]> wrote:
> >>>>>>>
> >>>>>>> Hypre uses a multiplicative smoother by default. It has a Chebyshev
> >>>>>>> smoother; that with a Jacobi PC should be thread invariant.
> >>>>>>>
> >>>>>>> Mark
> >>>>>>>
> >>>>>>> On Sat, Mar 13, 2021 at 8:18 AM Pierre Jolivet <[email protected]> wrote:
> >>>>>>>
> >>>>>>>> On 13 Mar 2021, at 9:17 AM, Pierre Jolivet <[email protected]> wrote:
> >>>>>>>>
> >>>>>>>> Hello Eric,
> >>>>>>>> I’ve made an “interesting” discovery, so I’ll put the list back in c/c.
> >>>>>>>> It appears the following snippet of code, which uses
> >>>>>>>> Allreduce() + lambda function + MPI_IN_PLACE, is:
> >>>>>>>> - Valgrind-clean with MPICH;
> >>>>>>>> - Valgrind-clean with OpenMPI 4.0.5;
> >>>>>>>> - not Valgrind-clean with OpenMPI 4.1.0.
> >>>>>>>> I’m not sure who is to blame here; I’ll need to look at the
> >>>>>>>> MPI specification for what is required of the implementors
> >>>>>>>> and the users in that case.
> >>>>>>>>
> >>>>>>>> In the meantime, I’ll do the following:
> >>>>>>>> - update config/BuildSystem/config/packages/OpenMPI.py to
> >>>>>>>> use OpenMPI 4.1.0, and see if any other error appears;
> >>>>>>>> - provide a hotfix to bypass the segfaults;
> >>>>>>>
> >>>>>>> I can confirm that splitting the single Allreduce with my own
> >>>>>>> MPI_Op into two Allreduce with MAX and BAND fixes the
> >>>>>>> segfaults with OpenMPI (*).
> >>>>>>>
> >>>>>>>> - look at the hypre issues and whether they should be
> >>>>>>>> deferred to the hypre team.
> >>>>>>>
> >>>>>>> I don’t know if there is something wrong in hypre threading
> >>>>>>> or if it’s just a side effect of threading, but it seems that
> >>>>>>> the number of threads has a drastic effect on the quality of the PC.
> >>>>>>> By default, it looks like there are two threads per process
> >>>>>>> with your Docker image.
> >>>>>>> If I force OMP_NUM_THREADS=1, then I get the same convergence
> >>>>>>> as in the output file.
> >>>>>>>
> >>>>>>> Thanks,
> >>>>>>> Pierre
> >>>>>>>
> >>>>>>> (*) https://gitlab.com/petsc/petsc/-/merge_requests/3712
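For readers following along, below is a generic sketch in C of the pattern being discussed. It is not the actual code fixed in the merge request above; the Pair struct, the counts, and the reduction semantics are invented for illustration. It shows a single in-place Allreduce with a user-defined MPI_Op, versus the workaround of splitting it into two Allreduce calls using the predefined MPI_MAX and MPI_BAND operations.

#include <mpi.h>

typedef struct { int max_val; unsigned mask; } Pair;

/* user-defined reduction: component-wise max on max_val, bitwise AND on mask */
static void PairReduce(void *in, void *inout, int *len, MPI_Datatype *dtype)
{
  Pair *a = (Pair *)in, *b = (Pair *)inout;
  (void)dtype;
  for (int i = 0; i < *len; ++i) {
    if (a[i].max_val > b[i].max_val) b[i].max_val = a[i].max_val;
    b[i].mask &= a[i].mask;
  }
}

/* Variant 1: single Allreduce with a custom MPI_Op and MPI_IN_PLACE,
   i.e., the combination reported above as not Valgrind-clean with
   Open MPI 4.1.0. */
static void reduce_custom_op(Pair *p, MPI_Comm comm)
{
  MPI_Datatype ptype;
  MPI_Op       op;
  MPI_Type_contiguous((int)sizeof(Pair), MPI_BYTE, &ptype); /* treat Pair as raw bytes */
  MPI_Type_commit(&ptype);
  MPI_Op_create(PairReduce, 1 /* commutative */, &op);
  MPI_Allreduce(MPI_IN_PLACE, p, 1, ptype, op, comm);
  MPI_Op_free(&op);
  MPI_Type_free(&ptype);
}

/* Variant 2: the workaround, two Allreduce calls with predefined ops. */
static void reduce_split(Pair *p, MPI_Comm comm)
{
  MPI_Allreduce(MPI_IN_PLACE, &p->max_val, 1, MPI_INT, MPI_MAX, comm);
  MPI_Allreduce(MPI_IN_PLACE, &p->mask, 1, MPI_UNSIGNED, MPI_BAND, comm);
}

The second variant avoids the user-defined operation entirely, which is what reportedly makes the Open MPI 4.1.0 runs clean again, at the cost of one extra collective.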
> >>>>>>>> Thank you for the Docker files, they were really useful.
> >>>>>>>> If you want to avoid oversubscription failures, you can edit
> >>>>>>>> the file /opt/openmpi-4.1.0/etc/openmpi-default-hostfile and
> >>>>>>>> append the line:
> >>>>>>>> localhost slots=12
> >>>>>>>> If you want to increase the timeout limit of the PETSc test
> >>>>>>>> suite for each test, you can add the extra flag TIMEOUT=180 to
> >>>>>>>> your command line (the default is 60; units are seconds).
> >>>>>>>>
> >>>>>>>> Thanks, I’ll ping you on GitLab when I’ve got something
> >>>>>>>> ready for you to try,
> >>>>>>>> Pierre
> >>>>>>>>
> >>>>>>>> <ompi.cxx>
> >>>>>>>>
> >>>>>>>>> On 12 Mar 2021, at 8:54 PM, Eric Chamberland
> >>>>>>>>> <[email protected]> wrote:
> >>>>>>>>>
> >>>>>>>>> Hi Pierre,
> >>>>>>>>>
> >>>>>>>>> I now have a docker container reproducing the problems here.
> >>>>>>>>>
> >>>>>>>>> Actually, if I look at
> >>>>>>>>> snes_tutorials-ex12_quad_singular_hpddm, it fails like this:
> >>>>>>>>>
> >>>>>>>>> not ok snes_tutorials-ex12_quad_singular_hpddm # Error code: 59
> >>>>>>>>> # Initial guess
> >>>>>>>>> # L_2 Error: 0.00803099
> >>>>>>>>> # Initial Residual
> >>>>>>>>> # L_2 Residual: 1.09057
> >>>>>>>>> # Au - b = Au + F(0)
> >>>>>>>>> # Linear L_2 Residual: 1.09057
> >>>>>>>>> # [d470c54ce086:14127] Read -1, expected 4096, errno = 1
> >>>>>>>>> # [d470c54ce086:14128] Read -1, expected 4096, errno = 1
> >>>>>>>>> # [d470c54ce086:14129] Read -1, expected 4096, errno = 1
> >>>>>>>>> # [3]PETSC ERROR: ------------------------------------------------------------------------
> >>>>>>>>> # [3]PETSC ERROR: Caught signal number 11 SEGV: Segmentation Violation, probably memory access out of range
> >>>>>>>>> # [3]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
> >>>>>>>>> # [3]PETSC ERROR: or see https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
> >>>>>>>>> # [3]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS X to find memory corruption errors
> >>>>>>>>> # [3]PETSC ERROR: likely location of problem given in stack below
> >>>>>>>>> # [3]PETSC ERROR: --------------------- Stack Frames ------------------------------------
> >>>>>>>>> # [3]PETSC ERROR: Note: The EXACT line numbers in the stack are not available,
> >>>>>>>>> # [3]PETSC ERROR: INSTEAD the line number of the start of the function
> >>>>>>>>> # [3]PETSC ERROR: is given.
> >>>>>>>>> # [3]PETSC ERROR: [3] buildTwo line 987 /opt/petsc-main/include/HPDDM_schwarz.hpp
> >>>>>>>>> # [3]PETSC ERROR: [3] next line 1130 /opt/petsc-main/include/HPDDM_schwarz.hpp
> >>>>>>>>> # [3]PETSC ERROR: --------------------- Error Message --------------------------------------------------------------
> >>>>>>>>> # [3]PETSC ERROR: Signal received
> >>>>>>>>> # [3]PETSC ERROR: [0]PETSC ERROR: ------------------------------------------------------------------------
> >>>>>>>>>
> >>>>>>>>> Also, ex12_quad_hpddm_reuse_baij fails with a lot more "Read
> >>>>>>>>> -1, expected ..." messages, and I don't know where they come from...?
> >>>>>>>>>
> >>>>>>>>> Hypre (like in diff-snes_tutorials-ex56_hypre) is also
> >>>>>>>>> having DIVERGED_INDEFINITE_PC failures...
> >>>>>>>>>
> >>>>>>>>> Please see the 3 attached docker files:
> >>>>>>>>>
> >>>>>>>>> 1) fedora_mkl_and_devtools: the Dockerfile which installs
> >>>>>>>>> Fedora 33 with the GNU compilers, MKL, and everything needed to develop.
> >>>>>>>>>
> >>>>>>>>> 2) openmpi: the Dockerfile to build OpenMPI.
> >>>>>>>>>
> >>>>>>>>> 3) petsc: the last Dockerfile, which builds, installs, and tests PETSc.
> >>>>>>>>>
> >>>>>>>>> I build the 3 like this:
> >>>>>>>>>
> >>>>>>>>> docker build -t fedora_mkl_and_devtools -f fedora_mkl_and_devtools .
> >>>>>>>>>
> >>>>>>>>> docker build -t openmpi -f openmpi .
> >>>>>>>>>
> >>>>>>>>> docker build -t petsc -f petsc .
> >>>>>>>>>
> >>>>>>>>> Disclaimer: I am not a docker expert, so I may do things
> >>>>>>>>> that are not docker-state-of-the-art, but I am open to
> >>>>>>>>> suggestions... ;)
> >>>>>>>>>
> >>>>>>>>> I have just run it on my laptop (long), which does not have
> >>>>>>>>> enough cores, so many more tests failed (I should force
> >>>>>>>>> --oversubscribe but don't know how to). I will relaunch on
> >>>>>>>>> my workstation in a few minutes.
> >>>>>>>>>
> >>>>>>>>> I will now test your branch! (sorry for the delay)
> >>>>>>>>>
> >>>>>>>>> Thanks,
> >>>>>>>>>
> >>>>>>>>> Eric
> >>>>>>>>>
> >>>>>>>>> On 2021-03-11 9:03 a.m., Eric Chamberland wrote:
> >>>>>>>>>>
> >>>>>>>>>> Hi Pierre,
> >>>>>>>>>>
> >>>>>>>>>> ok, that's interesting!
> >>>>>>>>>>
> >>>>>>>>>> I will try to build a docker image by tomorrow and give
> >>>>>>>>>> you the exact recipe to reproduce the bugs.
> >>>>>>>>>>
> >>>>>>>>>> Eric
> >>>>>>>>>>
> >>>>>>>>>> On 2021-03-11 2:46 a.m., Pierre Jolivet wrote:
> >>>>>>>>>>>
> >>>>>>>>>>>> On 11 Mar 2021, at 6:16 AM, Barry Smith <[email protected]> wrote:
> >>>>>>>>>>>>
> >>>>>>>>>>>> Eric,
> >>>>>>>>>>>>
> >>>>>>>>>>>> Sorry about not being more immediate. We still have
> >>>>>>>>>>>> this in our active email, so you don't need to submit
> >>>>>>>>>>>> individual issues. We'll try to get to them as soon as we can.
> >>>>>>>>>>>
> >>>>>>>>>>> Indeed, I’m still trying to figure this out.
> >>>>>>>>>>> I realized that some of my configure flags were different
> >>>>>>>>>>> from yours, e.g., no --with-memalign.
> >>>>>>>>>>> I’ve also added SuperLU_DIST to my installation.
> >>>>>>>>>>> Still, I can’t reproduce any issue.
> >>>>>>>>>>> I will continue looking into this; it appears I’m seeing
> >>>>>>>>>>> some valgrind errors, but I don’t know if this is some
> >>>>>>>>>>> side effect of OpenMPI not being valgrind-clean (last
> >>>>>>>>>>> time I checked, there was no error with MPICH).
> >>>>>>>>>>>
> >>>>>>>>>>> Thank you for your patience,
> >>>>>>>>>>> Pierre
> >>>>>>>>>>>
> >>>>>>>>>>> /usr/bin/gmake -f gmakefile test test-fail=1
> >>>>>>>>>>> Using MAKEFLAGS: test-fail=1
> >>>>>>>>>>> TEST arch-linux2-c-opt-ompi/tests/counts/snes_tutorials-ex12_quad_hpddm_reuse_baij.counts
> >>>>>>>>>>> ok snes_tutorials-ex12_quad_hpddm_reuse_baij
> >>>>>>>>>>> ok diff-snes_tutorials-ex12_quad_hpddm_reuse_baij
> >>>>>>>>>>> TEST arch-linux2-c-opt-ompi/tests/counts/ksp_ksp_tests-ex33_superlu_dist_2.counts
> >>>>>>>>>>> ok ksp_ksp_tests-ex33_superlu_dist_2
> >>>>>>>>>>> ok diff-ksp_ksp_tests-ex33_superlu_dist_2
> >>>>>>>>>>> TEST arch-linux2-c-opt-ompi/tests/counts/ksp_ksp_tests-ex49_superlu_dist.counts
> >>>>>>>>>>> ok ksp_ksp_tests-ex49_superlu_dist+nsize-1herm-0_conv-0
> >>>>>>>>>>> ok diff-ksp_ksp_tests-ex49_superlu_dist+nsize-1herm-0_conv-0
> >>>>>>>>>>> ok ksp_ksp_tests-ex49_superlu_dist+nsize-1herm-0_conv-1
> >>>>>>>>>>> ok diff-ksp_ksp_tests-ex49_superlu_dist+nsize-1herm-0_conv-1
> >>>>>>>>>>> ok ksp_ksp_tests-ex49_superlu_dist+nsize-1herm-1_conv-0
> >>>>>>>>>>> ok diff-ksp_ksp_tests-ex49_superlu_dist+nsize-1herm-1_conv-0
> >>>>>>>>>>> ok ksp_ksp_tests-ex49_superlu_dist+nsize-1herm-1_conv-1
> >>>>>>>>>>> ok diff-ksp_ksp_tests-ex49_superlu_dist+nsize-1herm-1_conv-1
> >>>>>>>>>>> ok ksp_ksp_tests-ex49_superlu_dist+nsize-4herm-0_conv-0
> >>>>>>>>>>> ok diff-ksp_ksp_tests-ex49_superlu_dist+nsize-4herm-0_conv-0
> >>>>>>>>>>> ok ksp_ksp_tests-ex49_superlu_dist+nsize-4herm-0_conv-1
> >>>>>>>>>>> ok diff-ksp_ksp_tests-ex49_superlu_dist+nsize-4herm-0_conv-1
> >>>>>>>>>>> ok ksp_ksp_tests-ex49_superlu_dist+nsize-4herm-1_conv-0
> >>>>>>>>>>> ok diff-ksp_ksp_tests-ex49_superlu_dist+nsize-4herm-1_conv-0
> >>>>>>>>>>> ok ksp_ksp_tests-ex49_superlu_dist+nsize-4herm-1_conv-1
> >>>>>>>>>>> ok diff-ksp_ksp_tests-ex49_superlu_dist+nsize-4herm-1_conv-1
> >>>>>>>>>>> TEST arch-linux2-c-opt-ompi/tests/counts/ksp_ksp_tutorials-ex50_tut_2.counts
> >>>>>>>>>>> ok ksp_ksp_tutorials-ex50_tut_2
> >>>>>>>>>>> ok diff-ksp_ksp_tutorials-ex50_tut_2
> >>>>>>>>>>> TEST arch-linux2-c-opt-ompi/tests/counts/ksp_ksp_tests-ex33_superlu_dist.counts
> >>>>>>>>>>> ok ksp_ksp_tests-ex33_superlu_dist
> >>>>>>>>>>> ok diff-ksp_ksp_tests-ex33_superlu_dist
> >>>>>>>>>>> TEST arch-linux2-c-opt-ompi/tests/counts/snes_tutorials-ex56_hypre.counts
> >>>>>>>>>>> ok snes_tutorials-ex56_hypre
> >>>>>>>>>>> ok diff-snes_tutorials-ex56_hypre
> >>>>>>>>>>> TEST arch-linux2-c-opt-ompi/tests/counts/ksp_ksp_tutorials-ex56_2.counts
> >>>>>>>>>>> ok ksp_ksp_tutorials-ex56_2
> >>>>>>>>>>> ok diff-ksp_ksp_tutorials-ex56_2
> >>>>>>>>>>> TEST arch-linux2-c-opt-ompi/tests/counts/snes_tutorials-ex17_3d_q3_trig_elas.counts
> >>>>>>>>>>> ok snes_tutorials-ex17_3d_q3_trig_elas
> >>>>>>>>>>> ok diff-snes_tutorials-ex17_3d_q3_trig_elas
> >>>>>>>>>>> TEST arch-linux2-c-opt-ompi/tests/counts/snes_tutorials-ex12_quad_hpddm_reuse_threshold_baij.counts
> >>>>>>>>>>> ok snes_tutorials-ex12_quad_hpddm_reuse_threshold_baij
> >>>>>>>>>>> ok diff-snes_tutorials-ex12_quad_hpddm_reuse_threshold_baij
> >>>>>>>>>>> TEST arch-linux2-c-opt-ompi/tests/counts/ksp_ksp_tutorials-ex5_superlu_dist_3.counts
> >>>>>>>>>>> not ok ksp_ksp_tutorials-ex5_superlu_dist_3 # Error code: 1
> >>>>>>>>>>> #srun: error: Unable to create step for job 1426755: More processors requested than permitted
> >>>>>>>>>>> ok ksp_ksp_tutorials-ex5_superlu_dist_3 # SKIP Command failed so no diff
> >>>>>>>>>>> TEST arch-linux2-c-opt-ompi/tests/counts/ksp_ksp_tutorials-ex5f_superlu_dist.counts
> >>>>>>>>>>> ok ksp_ksp_tutorials-ex5f_superlu_dist # SKIP Fortran required for this test
> >>>>>>>>>>> TEST arch-linux2-c-opt-ompi/tests/counts/snes_tutorials-ex12_tri_parmetis_hpddm_baij.counts
> >>>>>>>>>>> ok snes_tutorials-ex12_tri_parmetis_hpddm_baij
> >>>>>>>>>>> ok diff-snes_tutorials-ex12_tri_parmetis_hpddm_baij
> >>>>>>>>>>> TEST arch-linux2-c-opt-ompi/tests/counts/snes_tutorials-ex19_tut_3.counts
> >>>>>>>>>>> ok snes_tutorials-ex19_tut_3
> >>>>>>>>>>> ok diff-snes_tutorials-ex19_tut_3
> >>>>>>>>>>> TEST arch-linux2-c-opt-ompi/tests/counts/snes_tutorials-ex17_3d_q3_trig_vlap.counts
> >>>>>>>>>>> ok snes_tutorials-ex17_3d_q3_trig_vlap
> >>>>>>>>>>> ok diff-snes_tutorials-ex17_3d_q3_trig_vlap
> >>>>>>>>>>> TEST arch-linux2-c-opt-ompi/tests/counts/ksp_ksp_tutorials-ex5f_superlu_dist_3.counts
> >>>>>>>>>>> ok ksp_ksp_tutorials-ex5f_superlu_dist_3 # SKIP Fortran required for this test
> >>>>>>>>>>> TEST arch-linux2-c-opt-ompi/tests/counts/snes_tutorials-ex19_superlu_dist.counts
> >>>>>>>>>>> ok snes_tutorials-ex19_superlu_dist
> >>>>>>>>>>> ok diff-snes_tutorials-ex19_superlu_dist
> >>>>>>>>>>> TEST arch-linux2-c-opt-ompi/tests/counts/snes_tutorials-ex56_attach_mat_nearnullspace-1_bddc_approx_hypre.counts
> >>>>>>>>>>> ok snes_tutorials-ex56_attach_mat_nearnullspace-1_bddc_approx_hypre
> >>>>>>>>>>> ok diff-snes_tutorials-ex56_attach_mat_nearnullspace-1_bddc_approx_hypre
> >>>>>>>>>>> TEST arch-linux2-c-opt-ompi/tests/counts/ksp_ksp_tutorials-ex49_hypre_nullspace.counts
> >>>>>>>>>>> ok ksp_ksp_tutorials-ex49_hypre_nullspace
> >>>>>>>>>>> ok diff-ksp_ksp_tutorials-ex49_hypre_nullspace
> >>>>>>>>>>> TEST arch-linux2-c-opt-ompi/tests/counts/snes_tutorials-ex19_superlu_dist_2.counts
> >>>>>>>>>>> ok snes_tutorials-ex19_superlu_dist_2
> >>>>>>>>>>> ok diff-snes_tutorials-ex19_superlu_dist_2
> >>>>>>>>>>> TEST arch-linux2-c-opt-ompi/tests/counts/ksp_ksp_tutorials-ex5_superlu_dist_2.counts
> >>>>>>>>>>> not ok ksp_ksp_tutorials-ex5_superlu_dist_2 # Error code: 1
> >>>>>>>>>>> #srun: error: Unable to create step for job 1426755: More processors requested than permitted
> >>>>>>>>>>> ok ksp_ksp_tutorials-ex5_superlu_dist_2 # SKIP Command failed so no diff
> >>>>>>>>>>> TEST arch-linux2-c-opt-ompi/tests/counts/snes_tutorials-ex56_attach_mat_nearnullspace-0_bddc_approx_hypre.counts
> >>>>>>>>>>> ok snes_tutorials-ex56_attach_mat_nearnullspace-0_bddc_approx_hypre
> >>>>>>>>>>> ok diff-snes_tutorials-ex56_attach_mat_nearnullspace-0_bddc_approx_hypre
> >>>>>>>>>>> TEST arch-linux2-c-opt-ompi/tests/counts/ksp_ksp_tutorials-ex64_1.counts
> >>>>>>>>>>> ok ksp_ksp_tutorials-ex64_1
> >>>>>>>>>>> ok diff-ksp_ksp_tutorials-ex64_1
> >>>>>>>>>>> TEST arch-linux2-c-opt-ompi/tests/counts/ksp_ksp_tutorials-ex5_superlu_dist.counts
> >>>>>>>>>>> not ok ksp_ksp_tutorials-ex5_superlu_dist # Error code: 1
> >>>>>>>>>>> #srun: error: Unable to create step for job 1426755: More processors requested than permitted
> >>>>>>>>>>> ok ksp_ksp_tutorials-ex5_superlu_dist # SKIP Command failed so no diff
> >>>>>>>>>>> TEST arch-linux2-c-opt-ompi/tests/counts/ksp_ksp_tutorials-ex5f_superlu_dist_2.counts
> >>>>>>>>>>> ok ksp_ksp_tutorials-ex5f_superlu_dist_2 # SKIP Fortran required for this test
> >>>>>>>>>>>
> >>>>>>>>>>>> Barry
> >>>>>>>>>>>>
> >>>>>>>>>>>>> On Mar 10, 2021, at 11:03 PM, Eric Chamberland
> >>>>>>>>>>>>> <[email protected]> wrote:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Barry,
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> to get some follow-up on the --with-openmp=1 failures,
> >>>>>>>>>>>>> shall I open gitlab issues for:
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> a) all hypre failures giving DIVERGED_INDEFINITE_PC
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> b) all superlu_dist failures giving different results
> >>>>>>>>>>>>> with initia and "Exceeded timeout limit of 60 s"
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> c) hpddm failures "free(): invalid next size (fast)"
> >>>>>>>>>>>>> and "Segmentation Violation"
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> d) all tao's "Exceeded timeout limit of 60 s"
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> I don't see how I could do all this debugging by myself...
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Thanks,
> >>>>>>>>>>>>>
> >>>>>>>>>>>>> Eric
> >>>>>>>>>
> >>>>>>>>> --
> >>>>>>>>> Eric Chamberland, ing., M. Ing
> >>>>>>>>> Professionnel de recherche
> >>>>>>>>> GIREF/Université Laval
> >>>>>>>>> (418) 656-2131 poste 41 22 42
> >>>>>>>>> <fedora_mkl_and_devtools.txt><openmpi.txt><petsc.txt>
>
--
What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/
