Eric,
How are things going on this OpenMP front? Any bug fixes from hypre or
SuperLU_DIST?
BTW: we have upgraded to OpenMPI 4.1; perhaps this resolves some issues?
Barry
> On Mar 22, 2021, at 2:07 PM, Eric Chamberland
> <[email protected]> wrote:
>
> I added some information here:
>
> https://github.com/xiaoyeli/superlu_dist/issues/69#issuecomment-804318719
> Maybe someone can say more than I can about what PETSc tries to do with
> the two tutorials mentioned there that are timing out...
>
> Thanks,
>
> Eric
>
>
>
> On 2021-03-15 11:31 a.m., Eric Chamberland wrote:
>> Reported the timeout bugs to SuperLU_DIST too:
>>
>> https://github.com/xiaoyeli/superlu_dist/issues/69
>> Eric
>>
>>
>>
>> On 2021-03-14 2:18 p.m., Eric Chamberland wrote:
>>> Done:
>>>
>>> https://github.com/hypre-space/hypre/issues/303
>>> I may need some help with PETSc to answer their questions...
>>>
>>> Eric
>>>
>>> On 2021-03-14 3:44 a.m., Stefano Zampini wrote:
>>>> Eric
>>>>
>>>> You should report these HYPRE issues upstream
>>>> https://github.com/hypre-space/hypre/issues
>>>>
>>>>
>>>>> On Mar 14, 2021, at 3:44 AM, Eric Chamberland
>>>>> <[email protected]> wrote:
>>>>>
>>>>> For us it clearly creates problems in real computations...
>>>>>
>>>>> I understand the need to have clean tests for PETSc, but for me, it
>>>>> reveals that hypre isn't usable with more than one thread for now...
>>>>>
>>>>> Another solution: force single-threaded configuration for hypre until
>>>>> this is fixed?
>>>>>
>>>>> Eric
>>>>>
>>>>> On 2021-03-13 8:50 a.m., Pierre Jolivet wrote:
>>>>>> -pc_hypre_boomeramg_relax_type_all Jacobi =>
>>>>>> Linear solve did not converge due to DIVERGED_INDEFINITE_PC iterations 3
>>>>>> -pc_hypre_boomeramg_relax_type_all l1scaled-Jacobi =>
>>>>>> OK, independently of the architecture it seems (Eric's Docker image with 1
>>>>>> or 2 threads, or my macOS), but the contraction factor is higher:
>>>>>> Linear solve converged due to CONVERGED_RTOL iterations 8
>>>>>> Linear solve converged due to CONVERGED_RTOL iterations 24
>>>>>> Linear solve converged due to CONVERGED_RTOL iterations 26
>>>>>> vs. currently:
>>>>>> Linear solve converged due to CONVERGED_RTOL iterations 7
>>>>>> Linear solve converged due to CONVERGED_RTOL iterations 9
>>>>>> Linear solve converged due to CONVERGED_RTOL iterations 10
>>>>>>
>>>>>> Do we change this? Or should we force OMP_NUM_THREADS=1 for make test?
>>>>>>
>>>>>> Thanks,
>>>>>> Pierre
>>>>>>
>>>>>>> On 13 Mar 2021, at 2:26 PM, Mark Adams <[email protected]> wrote:
>>>>>>>
>>>>>>> Hypre uses a multiplicative smoother by default. It also has a Chebyshev
>>>>>>> smoother; that, with a Jacobi PC, should be thread invariant.
>>>>>>> Mark
>>>>>>>
>>>>>>> On Sat, Mar 13, 2021 at 8:18 AM Pierre Jolivet <[email protected]> wrote:
>>>>>>>
>>>>>>>> On 13 Mar 2021, at 9:17 AM, Pierre Jolivet <[email protected]> wrote:
>>>>>>>>
>>>>>>>> Hello Eric,
>>>>>>>> I’ve made an “interesting” discovery, so I’ll put the list back in cc.
>>>>>>>> It appears that the following snippet of code, which uses Allreduce() +
>>>>>>>> lambda function + MPI_IN_PLACE, is:
>>>>>>>> - Valgrind-clean with MPICH;
>>>>>>>> - Valgrind-clean with OpenMPI 4.0.5;
>>>>>>>> - not Valgrind-clean with OpenMPI 4.1.0.
>>>>>>>> I’m not sure who is to blame here; I’ll need to look at the MPI
>>>>>>>> specification for what is required of implementors and users in
>>>>>>>> that case.
>>>>>>>>
>>>>>>>> In the meantime, I’ll do the following:
>>>>>>>> - update config/BuildSystem/config/packages/OpenMPI.py to use OpenMPI
>>>>>>>> 4.1.0, see if any other error appears;
>>>>>>>> - provide a hotfix to bypass the segfaults;
>>>>>>>
>>>>>>> I can confirm that splitting the single Allreduce with my own MPI_Op
>>>>>>> into two Allreduce with MAX and BAND fixes the segfaults with OpenMPI
>>>>>>> (*).
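
The split described above can be illustrated without MPI. The following is a minimal sketch, not the code from the merge request: it assumes a hypothetical payload (`Payload`, one integer reduced by maximum and one bit mask reduced by bitwise AND) and models the per-rank contributions as a plain vector. All names are made up for illustration. It only shows why one reduction with a combined operator and two separate reductions give the same result when the two fields are independent.

```cpp
// Illustrative only: models the reduction logic without MPI.
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical payload: one value reduced with max, one bit mask reduced with AND.
struct Payload {
    int level;          // combined with max, as MPI_MAX would
    unsigned int flags; // combined with bitwise AND, as MPI_BAND would
};

// What a single user-defined reduction operator would compute for two contributions.
Payload combine(const Payload& a, const Payload& b) {
    return {std::max(a.level, b.level), a.flags & b.flags};
}

// One reduction with the combined operator (stands in for one Allreduce + custom op).
Payload reduceCombined(const std::vector<Payload>& ranks) {
    assert(!ranks.empty());
    Payload out = ranks.front();
    for (std::size_t i = 1; i < ranks.size(); ++i) out = combine(out, ranks[i]);
    return out;
}

// Two independent reductions (stand in for one Allreduce with max, one with AND).
Payload reduceSplit(const std::vector<Payload>& ranks) {
    assert(!ranks.empty());
    int level = ranks.front().level;
    unsigned int flags = ranks.front().flags;
    for (std::size_t i = 1; i < ranks.size(); ++i) level = std::max(level, ranks[i].level);
    for (std::size_t i = 1; i < ranks.size(); ++i) flags &= ranks[i].flags;
    return {level, flags};
}
```

With real MPI, the two loops in `reduceSplit` would correspond to one Allreduce with MPI_MAX and one with MPI_BAND, so no user-defined MPI_Op needs to be registered.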
>>>>>>>
>>>>>>>> - look at the hypre issues and whether they should be deferred to the
>>>>>>>> hypre team.
>>>>>>>
>>>>>>> I don’t know if there is something wrong in hypre threading or if it’s
>>>>>>> just a side effect of threading, but it seems that the number of
>>>>>>> threads has a drastic effect on the quality of the PC.
>>>>>>> By default, it looks like there are two threads per process with your
>>>>>>> Docker image.
>>>>>>> If I force OMP_NUM_THREADS=1, then I get the same convergence as in the
>>>>>>> output file.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Pierre
>>>>>>>
>>>>>>> (*) https://gitlab.com/petsc/petsc/-/merge_requests/3712
>>>>>>>> Thank you for the Docker files, they were really useful.
>>>>>>>> If you want to avoid oversubscription failures, you can edit the file
>>>>>>>> /opt/openmpi-4.1.0/etc/openmpi-default-hostfile and append the line:
>>>>>>>> localhost slots=12
>>>>>>>> If you want to increase the per-test timeout limit of the PETSc test
>>>>>>>> suite, you can add the extra flag TIMEOUT=180 to your command line
>>>>>>>> (default is 60, units are seconds).
>>>>>>>>
>>>>>>>> Thanks, I’ll ping you on GitLab when I’ve got something ready for you
>>>>>>>> to try,
>>>>>>>> Pierre
>>>>>>>>
>>>>>>>> <ompi.cxx>
>>>>>>>>
>>>>>>>>> On 12 Mar 2021, at 8:54 PM, Eric Chamberland
>>>>>>>>> <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>> Hi Pierre,
>>>>>>>>>
>>>>>>>>> I now have a docker container reproducing the problems here.
>>>>>>>>>
>>>>>>>>> Actually, if I look at snes_tutorials-ex12_quad_singular_hpddm it
>>>>>>>>> fails like this:
>>>>>>>>>
>>>>>>>>> not ok snes_tutorials-ex12_quad_singular_hpddm # Error code: 59
>>>>>>>>> # Initial guess
>>>>>>>>> # L_2 Error: 0.00803099
>>>>>>>>> # Initial Residual
>>>>>>>>> # L_2 Residual: 1.09057
>>>>>>>>> # Au - b = Au + F(0)
>>>>>>>>> # Linear L_2 Residual: 1.09057
>>>>>>>>> # [d470c54ce086:14127] Read -1, expected 4096, errno = 1
>>>>>>>>> # [d470c54ce086:14128] Read -1, expected 4096, errno = 1
>>>>>>>>> # [d470c54ce086:14129] Read -1, expected 4096, errno = 1
>>>>>>>>> # [3]PETSC ERROR:
>>>>>>>>> ------------------------------------------------------------------------
>>>>>>>>> # [3]PETSC ERROR: Caught signal number 11 SEGV: Segmentation
>>>>>>>>> Violation, probably memory access out of range
>>>>>>>>> # [3]PETSC ERROR: Try option -start_in_debugger or
>>>>>>>>> -on_error_attach_debugger
>>>>>>>>> # [3]PETSC ERROR: or see
>>>>>>>>> https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
>>>>>>>>> # [3]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple
>>>>>>>>> Mac OS X to find memory corruption errors
>>>>>>>>> # [3]PETSC ERROR: likely location of problem given in stack
>>>>>>>>> below
>>>>>>>>> # [3]PETSC ERROR: --------------------- Stack Frames
>>>>>>>>> ------------------------------------
>>>>>>>>> # [3]PETSC ERROR: Note: The EXACT line numbers in the stack are
>>>>>>>>> not available,
>>>>>>>>> # [3]PETSC ERROR: INSTEAD the line number of the start of
>>>>>>>>> the function
>>>>>>>>> # [3]PETSC ERROR: is given.
>>>>>>>>> # [3]PETSC ERROR: [3] buildTwo line 987
>>>>>>>>> /opt/petsc-main/include/HPDDM_schwarz.hpp
>>>>>>>>> # [3]PETSC ERROR: [3] next line 1130
>>>>>>>>> /opt/petsc-main/include/HPDDM_schwarz.hpp
>>>>>>>>> # [3]PETSC ERROR: --------------------- Error Message
>>>>>>>>> --------------------------------------------------------------
>>>>>>>>> # [3]PETSC ERROR: Signal received
>>>>>>>>> # [3]PETSC ERROR: [0]PETSC ERROR:
>>>>>>>>> ------------------------------------------------------------------------
>>>>>>>>>
>>>>>>>>> Also, ex12_quad_hpddm_reuse_baij fails with many more "Read -1,
>>>>>>>>> expected ..." messages; I don't know where they come from.
>>>>>>>>>
>>>>>>>>> Hypre (like in diff-snes_tutorials-ex56_hypre) is also having
>>>>>>>>> DIVERGED_INDEFINITE_PC failures...
>>>>>>>>>
>>>>>>>>> Please see the 3 attached docker files:
>>>>>>>>>
>>>>>>>>> 1) fedora_mkl_and_devtools: the Dockerfile which installs Fedora 33
>>>>>>>>> with GNU compilers, MKL, and everything needed for development.
>>>>>>>>>
>>>>>>>>> 2) openmpi: the Dockerfile to build OpenMPI
>>>>>>>>>
>>>>>>>>> 3) petsc: the last Dockerfile, which builds, installs, and tests PETSc
>>>>>>>>>
>>>>>>>>> I build the 3 like this:
>>>>>>>>>
>>>>>>>>> docker build -t fedora_mkl_and_devtools -f fedora_mkl_and_devtools .
>>>>>>>>>
>>>>>>>>> docker build -t openmpi -f openmpi .
>>>>>>>>>
>>>>>>>>> docker build -t petsc -f petsc .
>>>>>>>>>
>>>>>>>>> Disclaimer: I am not a Docker expert, so I may do things that are not
>>>>>>>>> Docker state of the art, but I am open to suggestions... ;)
>>>>>>>>>
>>>>>>>>> I have just run it on my laptop (it took a long time), which does not
>>>>>>>>> have enough cores, so many more tests failed (I should force
>>>>>>>>> --oversubscribe but don't know how to). I will relaunch on my
>>>>>>>>> workstation in a few minutes.
>>>>>>>>>
>>>>>>>>> I will now test your branch! (sorry for the delay).
>>>>>>>>>
>>>>>>>>> Thanks,
>>>>>>>>>
>>>>>>>>> Eric
>>>>>>>>>
>>>>>>>>> On 2021-03-11 9:03 a.m., Eric Chamberland wrote:
>>>>>>>>>> Hi Pierre,
>>>>>>>>>>
>>>>>>>>>> ok, that's interesting!
>>>>>>>>>>
>>>>>>>>>> I will try to build a Docker image by tomorrow and give you the
>>>>>>>>>> exact recipe to reproduce the bugs.
>>>>>>>>>>
>>>>>>>>>> Eric
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> On 2021-03-11 2:46 a.m., Pierre Jolivet wrote:
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>>> On 11 Mar 2021, at 6:16 AM, Barry Smith <[email protected]> wrote:
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>> Eric,
>>>>>>>>>>>>
>>>>>>>>>>>> Sorry about not being more immediate. We still have this in our
>>>>>>>>>>>> active email so you don't need to submit individual issues. We'll
>>>>>>>>>>>> try to get to them as soon as we can.
>>>>>>>>>>>
>>>>>>>>>>> Indeed, I’m still trying to figure this out.
>>>>>>>>>>> I realized that some of my configure flags were different than
>>>>>>>>>>> yours, e.g., no --with-memalign.
>>>>>>>>>>> I’ve also added SuperLU_DIST to my installation.
>>>>>>>>>>> Still, I can’t reproduce any issue.
>>>>>>>>>>> I will continue looking into this. It appears I’m seeing some
>>>>>>>>>>> valgrind errors, but I don’t know if this is some side effect of
>>>>>>>>>>> OpenMPI not being valgrind-clean (last time I checked, there was no
>>>>>>>>>>> error with MPICH).
>>>>>>>>>>>
>>>>>>>>>>> Thank you for your patience,
>>>>>>>>>>> Pierre
>>>>>>>>>>>
>>>>>>>>>>> /usr/bin/gmake -f gmakefile test test-fail=1
>>>>>>>>>>> Using MAKEFLAGS: test-fail=1
>>>>>>>>>>> TEST
>>>>>>>>>>> arch-linux2-c-opt-ompi/tests/counts/snes_tutorials-ex12_quad_hpddm_reuse_baij.counts
>>>>>>>>>>> ok snes_tutorials-ex12_quad_hpddm_reuse_baij
>>>>>>>>>>> ok diff-snes_tutorials-ex12_quad_hpddm_reuse_baij
>>>>>>>>>>> TEST
>>>>>>>>>>> arch-linux2-c-opt-ompi/tests/counts/ksp_ksp_tests-ex33_superlu_dist_2.counts
>>>>>>>>>>> ok ksp_ksp_tests-ex33_superlu_dist_2
>>>>>>>>>>> ok diff-ksp_ksp_tests-ex33_superlu_dist_2
>>>>>>>>>>> TEST
>>>>>>>>>>> arch-linux2-c-opt-ompi/tests/counts/ksp_ksp_tests-ex49_superlu_dist.counts
>>>>>>>>>>> ok ksp_ksp_tests-ex49_superlu_dist+nsize-1herm-0_conv-0
>>>>>>>>>>> ok diff-ksp_ksp_tests-ex49_superlu_dist+nsize-1herm-0_conv-0
>>>>>>>>>>> ok ksp_ksp_tests-ex49_superlu_dist+nsize-1herm-0_conv-1
>>>>>>>>>>> ok diff-ksp_ksp_tests-ex49_superlu_dist+nsize-1herm-0_conv-1
>>>>>>>>>>> ok ksp_ksp_tests-ex49_superlu_dist+nsize-1herm-1_conv-0
>>>>>>>>>>> ok diff-ksp_ksp_tests-ex49_superlu_dist+nsize-1herm-1_conv-0
>>>>>>>>>>> ok ksp_ksp_tests-ex49_superlu_dist+nsize-1herm-1_conv-1
>>>>>>>>>>> ok diff-ksp_ksp_tests-ex49_superlu_dist+nsize-1herm-1_conv-1
>>>>>>>>>>> ok ksp_ksp_tests-ex49_superlu_dist+nsize-4herm-0_conv-0
>>>>>>>>>>> ok diff-ksp_ksp_tests-ex49_superlu_dist+nsize-4herm-0_conv-0
>>>>>>>>>>> ok ksp_ksp_tests-ex49_superlu_dist+nsize-4herm-0_conv-1
>>>>>>>>>>> ok diff-ksp_ksp_tests-ex49_superlu_dist+nsize-4herm-0_conv-1
>>>>>>>>>>> ok ksp_ksp_tests-ex49_superlu_dist+nsize-4herm-1_conv-0
>>>>>>>>>>> ok diff-ksp_ksp_tests-ex49_superlu_dist+nsize-4herm-1_conv-0
>>>>>>>>>>> ok ksp_ksp_tests-ex49_superlu_dist+nsize-4herm-1_conv-1
>>>>>>>>>>> ok diff-ksp_ksp_tests-ex49_superlu_dist+nsize-4herm-1_conv-1
>>>>>>>>>>> TEST
>>>>>>>>>>> arch-linux2-c-opt-ompi/tests/counts/ksp_ksp_tutorials-ex50_tut_2.counts
>>>>>>>>>>> ok ksp_ksp_tutorials-ex50_tut_2
>>>>>>>>>>> ok diff-ksp_ksp_tutorials-ex50_tut_2
>>>>>>>>>>> TEST
>>>>>>>>>>> arch-linux2-c-opt-ompi/tests/counts/ksp_ksp_tests-ex33_superlu_dist.counts
>>>>>>>>>>> ok ksp_ksp_tests-ex33_superlu_dist
>>>>>>>>>>> ok diff-ksp_ksp_tests-ex33_superlu_dist
>>>>>>>>>>> TEST
>>>>>>>>>>> arch-linux2-c-opt-ompi/tests/counts/snes_tutorials-ex56_hypre.counts
>>>>>>>>>>> ok snes_tutorials-ex56_hypre
>>>>>>>>>>> ok diff-snes_tutorials-ex56_hypre
>>>>>>>>>>> TEST
>>>>>>>>>>> arch-linux2-c-opt-ompi/tests/counts/ksp_ksp_tutorials-ex56_2.counts
>>>>>>>>>>> ok ksp_ksp_tutorials-ex56_2
>>>>>>>>>>> ok diff-ksp_ksp_tutorials-ex56_2
>>>>>>>>>>> TEST
>>>>>>>>>>> arch-linux2-c-opt-ompi/tests/counts/snes_tutorials-ex17_3d_q3_trig_elas.counts
>>>>>>>>>>> ok snes_tutorials-ex17_3d_q3_trig_elas
>>>>>>>>>>> ok diff-snes_tutorials-ex17_3d_q3_trig_elas
>>>>>>>>>>> TEST
>>>>>>>>>>> arch-linux2-c-opt-ompi/tests/counts/snes_tutorials-ex12_quad_hpddm_reuse_threshold_baij.counts
>>>>>>>>>>> ok snes_tutorials-ex12_quad_hpddm_reuse_threshold_baij
>>>>>>>>>>> ok diff-snes_tutorials-ex12_quad_hpddm_reuse_threshold_baij
>>>>>>>>>>> TEST
>>>>>>>>>>> arch-linux2-c-opt-ompi/tests/counts/ksp_ksp_tutorials-ex5_superlu_dist_3.counts
>>>>>>>>>>> not ok ksp_ksp_tutorials-ex5_superlu_dist_3 # Error code: 1
>>>>>>>>>>> # srun: error: Unable to create step for job 1426755: More
>>>>>>>>>>> processors requested than permitted
>>>>>>>>>>> ok ksp_ksp_tutorials-ex5_superlu_dist_3 # SKIP Command failed so
>>>>>>>>>>> no diff
>>>>>>>>>>> TEST
>>>>>>>>>>> arch-linux2-c-opt-ompi/tests/counts/ksp_ksp_tutorials-ex5f_superlu_dist.counts
>>>>>>>>>>> ok ksp_ksp_tutorials-ex5f_superlu_dist # SKIP Fortran required for
>>>>>>>>>>> this test
>>>>>>>>>>> TEST
>>>>>>>>>>> arch-linux2-c-opt-ompi/tests/counts/snes_tutorials-ex12_tri_parmetis_hpddm_baij.counts
>>>>>>>>>>> ok snes_tutorials-ex12_tri_parmetis_hpddm_baij
>>>>>>>>>>> ok diff-snes_tutorials-ex12_tri_parmetis_hpddm_baij
>>>>>>>>>>> TEST
>>>>>>>>>>> arch-linux2-c-opt-ompi/tests/counts/snes_tutorials-ex19_tut_3.counts
>>>>>>>>>>> ok snes_tutorials-ex19_tut_3
>>>>>>>>>>> ok diff-snes_tutorials-ex19_tut_3
>>>>>>>>>>> TEST
>>>>>>>>>>> arch-linux2-c-opt-ompi/tests/counts/snes_tutorials-ex17_3d_q3_trig_vlap.counts
>>>>>>>>>>> ok snes_tutorials-ex17_3d_q3_trig_vlap
>>>>>>>>>>> ok diff-snes_tutorials-ex17_3d_q3_trig_vlap
>>>>>>>>>>> TEST
>>>>>>>>>>> arch-linux2-c-opt-ompi/tests/counts/ksp_ksp_tutorials-ex5f_superlu_dist_3.counts
>>>>>>>>>>> ok ksp_ksp_tutorials-ex5f_superlu_dist_3 # SKIP Fortran required
>>>>>>>>>>> for this test
>>>>>>>>>>> TEST
>>>>>>>>>>> arch-linux2-c-opt-ompi/tests/counts/snes_tutorials-ex19_superlu_dist.counts
>>>>>>>>>>> ok snes_tutorials-ex19_superlu_dist
>>>>>>>>>>> ok diff-snes_tutorials-ex19_superlu_dist
>>>>>>>>>>> TEST
>>>>>>>>>>> arch-linux2-c-opt-ompi/tests/counts/snes_tutorials-ex56_attach_mat_nearnullspace-1_bddc_approx_hypre.counts
>>>>>>>>>>> ok snes_tutorials-ex56_attach_mat_nearnullspace-1_bddc_approx_hypre
>>>>>>>>>>> ok
>>>>>>>>>>> diff-snes_tutorials-ex56_attach_mat_nearnullspace-1_bddc_approx_hypre
>>>>>>>>>>> TEST
>>>>>>>>>>> arch-linux2-c-opt-ompi/tests/counts/ksp_ksp_tutorials-ex49_hypre_nullspace.counts
>>>>>>>>>>> ok ksp_ksp_tutorials-ex49_hypre_nullspace
>>>>>>>>>>> ok diff-ksp_ksp_tutorials-ex49_hypre_nullspace
>>>>>>>>>>> TEST
>>>>>>>>>>> arch-linux2-c-opt-ompi/tests/counts/snes_tutorials-ex19_superlu_dist_2.counts
>>>>>>>>>>> ok snes_tutorials-ex19_superlu_dist_2
>>>>>>>>>>> ok diff-snes_tutorials-ex19_superlu_dist_2
>>>>>>>>>>> TEST
>>>>>>>>>>> arch-linux2-c-opt-ompi/tests/counts/ksp_ksp_tutorials-ex5_superlu_dist_2.counts
>>>>>>>>>>> not ok ksp_ksp_tutorials-ex5_superlu_dist_2 # Error code: 1
>>>>>>>>>>> # srun: error: Unable to create step for job 1426755: More
>>>>>>>>>>> processors requested than permitted
>>>>>>>>>>> ok ksp_ksp_tutorials-ex5_superlu_dist_2 # SKIP Command failed so
>>>>>>>>>>> no diff
>>>>>>>>>>> TEST
>>>>>>>>>>> arch-linux2-c-opt-ompi/tests/counts/snes_tutorials-ex56_attach_mat_nearnullspace-0_bddc_approx_hypre.counts
>>>>>>>>>>> ok snes_tutorials-ex56_attach_mat_nearnullspace-0_bddc_approx_hypre
>>>>>>>>>>> ok
>>>>>>>>>>> diff-snes_tutorials-ex56_attach_mat_nearnullspace-0_bddc_approx_hypre
>>>>>>>>>>> TEST
>>>>>>>>>>> arch-linux2-c-opt-ompi/tests/counts/ksp_ksp_tutorials-ex64_1.counts
>>>>>>>>>>> ok ksp_ksp_tutorials-ex64_1
>>>>>>>>>>> ok diff-ksp_ksp_tutorials-ex64_1
>>>>>>>>>>> TEST
>>>>>>>>>>> arch-linux2-c-opt-ompi/tests/counts/ksp_ksp_tutorials-ex5_superlu_dist.counts
>>>>>>>>>>> not ok ksp_ksp_tutorials-ex5_superlu_dist # Error code: 1
>>>>>>>>>>> # srun: error: Unable to create step for job 1426755: More
>>>>>>>>>>> processors requested than permitted
>>>>>>>>>>> ok ksp_ksp_tutorials-ex5_superlu_dist # SKIP Command failed so no
>>>>>>>>>>> diff
>>>>>>>>>>> TEST
>>>>>>>>>>> arch-linux2-c-opt-ompi/tests/counts/ksp_ksp_tutorials-ex5f_superlu_dist_2.counts
>>>>>>>>>>> ok ksp_ksp_tutorials-ex5f_superlu_dist_2 # SKIP Fortran required
>>>>>>>>>>> for this test
>>>>>>>>>>>
>>>>>>>>>>>> Barry
>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>>> On Mar 10, 2021, at 11:03 PM, Eric Chamberland
>>>>>>>>>>>>> <[email protected]> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Barry,
>>>>>>>>>>>>>
>>>>>>>>>>>>> to get some follow-up on the --with-openmp=1 failures, shall I open
>>>>>>>>>>>>> GitLab issues for:
>>>>>>>>>>>>>
>>>>>>>>>>>>> a) all hypre failures giving DIVERGED_INDEFINITE_PC
>>>>>>>>>>>>>
>>>>>>>>>>>>> b) all superlu_dist failures giving different results with initia
>>>>>>>>>>>>> and "Exceeded timeout limit of 60 s"
>>>>>>>>>>>>>
>>>>>>>>>>>>> c) hpddm failures "free(): invalid next size (fast)" and
>>>>>>>>>>>>> "Segmentation Violation"
>>>>>>>>>>>>>
>>>>>>>>>>>>> d) all tao's "Exceeded timeout limit of 60 s"
>>>>>>>>>>>>>
>>>>>>>>>>>>> I don't see how I could do all this debugging by myself...
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Eric
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Eric Chamberland, ing., M. Ing
>>>>>>>>>> Professionnel de recherche
>>>>>>>>>> GIREF/Université Laval
>>>>>>>>>> (418) 656-2131 poste 41 22 42
>>>>>>>>> <fedora_mkl_and_devtools.txt><openmpi.txt><petsc.txt>
>>>>>>>>
>>>>>>>
>>>>>>
>>>>