> On Mar 30, 2021, at 10:18 PM, Eric Chamberland 
> <[email protected]> wrote:
> 
> Hi Barry,
> 
> Here is what I have:
> 
> 1. The hpddm issues have all been solved (there are no more hpddm failures 
> here: 
> https://giref.ulaval.ca/~cmpgiref/petsc-main-debug/2021.03.29.02h00m02s_make_test.log)
> 
> 
  Great.

> 2. For Hypre, I think it is indeed not a bug but a feature: as far as I can 
> tell from what has been said on the hypre discussion list, "It still depends 
> on the number of threads, that can’t be avoided" 
> (https://github.com/hypre-space/hypre/issues/303#issuecomment-800442755) 
> 

  This is nonsense; they know better. Sure, the convergence "decays", but no 
longer producing a positive definite preconditioner when the problem is 
positive definite is not due to "convergence decaying"; it is much more 
fundamental. They are all good numerical analysts; they know this. They are 
basically saying that if you start with a positive definite problem that 
supports the use of CG, but use OpenMP threading, then you need to switch to 
GMRES. That is a high price to pay; I suspect there is a bug in the code, or 
that it is just not designed correctly, and they don't want to deal with 
hunting down the issue. 

  The point is that even if the smoother does absolutely nothing to improve the 
solution (it just copies the current value), it cannot make the preconditioner 
lose positive definiteness. So my conclusion is that the smoother is broken, 
since it does worse than nothing.  

  Do they propose a solution? Simply not using OpenMP threading for positive 
definite problems, or always using GMRES when OpenMP is in use?

  I am not sure what to do with your (and PETSc's) test cases in this 
situation. I guess the PETSc test could switch to GMRES when hypre is using 
OpenMP with a number of threads greater than 1, but that is kind of cumbersome 
and annoying. 
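
  Roughly, such a switch could look like the snippet below. This is only a 
sketch: the helper name is made up and using omp_get_max_threads() to detect 
threading is just one possible check, not what any current test does.

    #include <petscksp.h>
    #if defined(_OPENMP)
    #include <omp.h>
    #endif

    /* Hypothetical helper: fall back to GMRES when more than one OpenMP
       thread is in use, since the threaded hypre smoothers apparently no
       longer guarantee a positive definite preconditioner and CG is then
       no longer appropriate. */
    static PetscErrorCode KSPSetTypeForThreadedHypre(KSP ksp)
    {
      PetscErrorCode ierr;
      PetscInt       nthreads = 1;

      PetscFunctionBeginUser;
    #if defined(_OPENMP)
      nthreads = (PetscInt)omp_get_max_threads();
    #endif
      if (nthreads > 1) {
        ierr = KSPSetType(ksp,KSPGMRES);CHKERRQ(ierr);
      } else {
        ierr = KSPSetType(ksp,KSPCG);CHKERRQ(ierr);
      }
      PetscFunctionReturn(0);
    }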

  Junchao and Scott have some ideas on adding OpenMP threading to our CI tests. 
If we make sure this particular problem is in there, then we will need to add a 
switch to handle it.
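
  On the test-harness side, the switch would presumably just be an extra test 
variant in the example's /*TEST ... TEST*/ block, something along these lines 
(the suffix and the exact args are made up and not verified against the 
harness):

    test:
      suffix: hypre_omp
      requires: hypre
      args: -pc_type hypre -ksp_type gmres -ksp_converged_reason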


> and here, in Section 7.3 of 
> https://www.researchgate.net/publication/220411740_Multigrid_Smoothers_for_Ultraparallel_Computing,
> there is some interesting information, such as:
> 
> Figure 7.6 clearly illustrates that convergence degrades with the addition of 
> threads for hybrid SGS; 
> 
> .... 
> 
> The 3D sphere problem is the most extreme example because AMG-CG with hybrid 
> SGS no longer converges with the addition of threading.
> 
> but I might have misunderstood, since I am not an expert in this area...
> 
> 3. For SuperLU_Dist, I have tried to build SuperLU_dist outside of PETSc to 
> run the tests from superlu itself: sadly, the bug does not show up (see 
> https://github.com/xiaoyeli/superlu_dist/issues/69).  
> 
> I would like to build a standalone superlu_dist reproducer from what is done 
> in the faulty test:
> 
> ksp_ksp_tutorials-ex5
> 
> which is buggy when called from PETSc. What puzzles me is that many other 
> PETSc tests run fine with superlu_dist: maybe something is done uniquely in 
> ksp_ksp_tutorials-ex5?
> 
> So I think it is worth digging into #3: the simple thing I have not yet done 
> is retrieving the stack trace when it fails (times out).
> 

  I wish I had infinite time to fix these things. One could run it for a while 
until it "hangs" and then attach a debugger to the hanging process to see where 
it is. This would help determine the problem.
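
  Something like the following (the PID and the exact ex5 arguments are 
placeholders; the real arguments can be copied from the test harness output 
for the failing superlu_dist variant):

    # run the failing case by hand, without the harness timeout
    mpiexec -n 4 ./ex5 -pc_type lu -pc_factor_mat_solver_type superlu_dist ...
    # once it hangs, from another terminal
    ps aux | grep ex5           # find the PIDs of the hung ranks
    gdb -p <pid>                # attach to one of them
    (gdb) thread apply all bt   # see where it is stuck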

> And a question: when you say you upgraded to OpenMPI 4.1, do you mean for one 
> of your automated (Docker?) builds in the GitLab pipelines?
> 
> 
  Both for our testing and for our --download-openmpi configure option. I do 
not know if this is related to the problem at hand or not.
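
  For reference, the OpenMPI/OpenMP part of the configure line is just the 
following; the rest of the options stay whatever you already use:

    ./configure --download-openmpi --with-openmp=1 ...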

Barry


> Thanks for checking in! :)
> 
> Eric
> 
> 
> 
> On 2021-03-30 1:47 p.m., Barry Smith wrote:
>> 
>>   Eric,
>> 
>>     How are things going on this OpenMP front? Any bug fixes from hypre or 
>> SuperLU_DIST?
>> 
>>     BTW: we have upgraded to OpenMPI 4.1; perhaps this resolves some issues?
>> 
>>    Barry
>> 
>> 
>>> On Mar 22, 2021, at 2:07 PM, Eric Chamberland 
>>> <[email protected]> wrote:
>>> 
>>> I added some information here:
>>> 
>>> https://github.com/xiaoyeli/superlu_dist/issues/69#issuecomment-804318719
>>> Maybe someone can say more than I can about what PETSc tries to do in the 2 
>>> mentioned tutorials that are timing out...
>>> 
>>> Thanks,
>>> 
>>> Eric
>>> 
>>> 
>>> 
>>> On 2021-03-15 11:31 a.m., Eric Chamberland wrote:
>>>> Reported timeout bugs to SuperLU_dist too:
>>>> 
>>>> https://github.com/xiaoyeli/superlu_dist/issues/69
>>>> Eric
>>>> 
>>>> 
>>>> 
>>>> On 2021-03-14 2:18 p.m., Eric Chamberland wrote:
>>>>> Done:
>>>>> 
>>>>> https://github.com/hypre-space/hypre/issues/303
>>>>> Maybe I will need some help with PETSc to answer their questions...
>>>>> 
>>>>> Eric
>>>>> 
>>>>> On 2021-03-14 3:44 a.m., Stefano Zampini wrote:
>>>>>> Eric
>>>>>> 
>>>>>> You should report these HYPRE issues upstream 
>>>>>> https://github.com/hypre-space/hypre/issues
>>>>>> 
>>>>>> 
>>>>>>> On Mar 14, 2021, at 3:44 AM, Eric Chamberland 
>>>>>>> <[email protected]> wrote:
>>>>>>> 
>>>>>>> For us it clearly creates problems in real computations...
>>>>>>> 
>>>>>>> I understand the need to have clean tests for PETSc, but for me, it 
>>>>>>> reveals that hypre isn't usable with more than one thread for now...
>>>>>>> 
>>>>>>> Another solution:  force single-threaded configuration for hypre until 
>>>>>>> this is fixed?
>>>>>>> 
>>>>>>> Eric
>>>>>>> 
>>>>>>> On 2021-03-13 8:50 a.m., Pierre Jolivet wrote:
>>>>>>>> -pc_hypre_boomeramg_relax_type_all Jacobi => 
>>>>>>>>   Linear solve did not converge due to DIVERGED_INDEFINITE_PC 
>>>>>>>> iterations 3
>>>>>>>> -pc_hypre_boomeramg_relax_type_all l1scaled-Jacobi => 
>>>>>>>> OK, independently of the architecture it seems (Eric's Docker image 
>>>>>>>> with 1 or 2 threads, or my macOS), but the contraction factor is higher
>>>>>>>>   Linear solve converged due to CONVERGED_RTOL iterations 8
>>>>>>>>   Linear solve converged due to CONVERGED_RTOL iterations 24
>>>>>>>>   Linear solve converged due to CONVERGED_RTOL iterations 26
>>>>>>>> v. currently
>>>>>>>>   Linear solve converged due to CONVERGED_RTOL iterations 7
>>>>>>>>   Linear solve converged due to CONVERGED_RTOL iterations 9
>>>>>>>>   Linear solve converged due to CONVERGED_RTOL iterations 10
>>>>>>>> 
>>>>>>>> Do we change this? Or should we force OMP_NUM_THREADS=1 for make test?
>>>>>>>> 
>>>>>>>> Thanks,
>>>>>>>> Pierre
>>>>>>>> 
>>>>>>>>> On 13 Mar 2021, at 2:26 PM, Mark Adams <[email protected]> wrote:
>>>>>>>>> 
>>>>>>>>> Hypre uses a multiplicative smoother by default. It has a Chebyshev 
>>>>>>>>> smoother; that with a Jacobi PC should be thread invariant.
>>>>>>>>> Mark
>>>>>>>>> 
>>>>>>>>> On Sat, Mar 13, 2021 at 8:18 AM Pierre Jolivet <[email protected]> wrote:
>>>>>>>>> 
>>>>>>>>>> On 13 Mar 2021, at 9:17 AM, Pierre Jolivet <[email protected]> wrote:
>>>>>>>>>> 
>>>>>>>>>> Hello Eric,
>>>>>>>>>> I’ve made an “interesting” discovery, so I’ll put the list back in 
>>>>>>>>>> CC.
>>>>>>>>>> It appears the following snippet of code which uses Allreduce() + 
>>>>>>>>>> lambda function + MPI_IN_PLACE is:
>>>>>>>>>> - Valgrind-clean with MPICH;
>>>>>>>>>> - Valgrind-clean with OpenMPI 4.0.5;
>>>>>>>>>> - not Valgrind-clean with OpenMPI 4.1.0.
>>>>>>>>>> I’m not sure who is to blame here, I’ll need to look at the MPI 
>>>>>>>>>> specification for what is required by the implementors and users in 
>>>>>>>>>> that case.
>>>>>>>>>> 
>>>>>>>>>> In the meantime, I’ll do the following:
>>>>>>>>>> - update config/BuildSystem/config/packages/OpenMPI.py to use 
>>>>>>>>>> OpenMPI 4.1.0, see if any other error appears;
>>>>>>>>>> - provide a hotfix to bypass the segfaults;
>>>>>>>>> 
>>>>>>>>> I can confirm that splitting the single Allreduce with my own MPI_Op 
>>>>>>>>> into two Allreduce with MAX and BAND fixes the segfaults with OpenMPI 
>>>>>>>>> (*).
>>>>>>>>> 
>>>>>>>>>> - look at the hypre issue and whether they should be deferred to the 
>>>>>>>>>> hypre team.
>>>>>>>>> 
>>>>>>>>> I don’t know if there is something wrong in hypre threading or if 
>>>>>>>>> it’s just a side effect of threading, but it seems that the number of 
>>>>>>>>> threads has a drastic effect on the quality of the PC.
>>>>>>>>> By default, it looks like there are two threads per process with your 
>>>>>>>>> Docker image.
>>>>>>>>> If I force OMP_NUM_THREADS=1, then I get the same convergence as in 
>>>>>>>>> the output file.
>>>>>>>>> 
>>>>>>>>> Thanks,
>>>>>>>>> Pierre
>>>>>>>>> 
>>>>>>>>> (*) https://gitlab.com/petsc/petsc/-/merge_requests/3712
>>>>>>>>>> Thank you for the Docker files, they were really useful.
>>>>>>>>>> If you want to avoid oversubscription failures, you can edit the 
>>>>>>>>>> file /opt/openmpi-4.1.0/etc/openmpi-default-hostfile and append the 
>>>>>>>>>> line:
>>>>>>>>>> localhost slots=12
>>>>>>>>>> If you want to increase the timeout limit of the PETSc test suite 
>>>>>>>>>> for each test, you can add the extra flag TIMEOUT=180 to your 
>>>>>>>>>> command line (default is 60, units are seconds).
>>>>>>>>>> 
>>>>>>>>>> Thanks, I’ll ping you on GitLab when I’ve got something ready for 
>>>>>>>>>> you to try,
>>>>>>>>>> Pierre
>>>>>>>>>> 
>>>>>>>>>> <ompi.cxx>
>>>>>>>>>> 
>>>>>>>>>>> On 12 Mar 2021, at 8:54 PM, Eric Chamberland 
>>>>>>>>>>> <[email protected]> wrote:
>>>>>>>>>>> 
>>>>>>>>>>> Hi Pierre,
>>>>>>>>>>> 
>>>>>>>>>>> I now have a docker container reproducing the problems here.
>>>>>>>>>>> 
>>>>>>>>>>> Actually, if I look at snes_tutorials-ex12_quad_singular_hpddm  it 
>>>>>>>>>>> fails like this:
>>>>>>>>>>> 
>>>>>>>>>>> not ok snes_tutorials-ex12_quad_singular_hpddm # Error code: 59
>>>>>>>>>>> #       Initial guess
>>>>>>>>>>> #       L_2 Error: 0.00803099
>>>>>>>>>>> #       Initial Residual
>>>>>>>>>>> #       L_2 Residual: 1.09057
>>>>>>>>>>> #       Au - b = Au + F(0)
>>>>>>>>>>> #       Linear L_2 Residual: 1.09057
>>>>>>>>>>> #       [d470c54ce086:14127] Read -1, expected 4096, errno = 1
>>>>>>>>>>> #       [d470c54ce086:14128] Read -1, expected 4096, errno = 1
>>>>>>>>>>> #       [d470c54ce086:14129] Read -1, expected 4096, errno = 1
>>>>>>>>>>> #       [3]PETSC ERROR: 
>>>>>>>>>>> ------------------------------------------------------------------------
>>>>>>>>>>> #       [3]PETSC ERROR: Caught signal number 11 SEGV: Segmentation 
>>>>>>>>>>> Violation, probably memory access out of range
>>>>>>>>>>> #       [3]PETSC ERROR: Try option -start_in_debugger or 
>>>>>>>>>>> -on_error_attach_debugger
>>>>>>>>>>> #       [3]PETSC ERROR: or see 
>>>>>>>>>>> https://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
>>>>>>>>>>> #       [3]PETSC ERROR: or try http://valgrind.org on GNU/linux and 
>>>>>>>>>>> Apple Mac OS X to find memory corruption errors
>>>>>>>>>>> #       [3]PETSC ERROR: likely location of problem given in stack 
>>>>>>>>>>> below
>>>>>>>>>>> #       [3]PETSC ERROR: ---------------------  Stack Frames 
>>>>>>>>>>> ------------------------------------
>>>>>>>>>>> #       [3]PETSC ERROR: Note: The EXACT line numbers in the stack 
>>>>>>>>>>> are not available,
>>>>>>>>>>> #       [3]PETSC ERROR:       INSTEAD the line number of the start 
>>>>>>>>>>> of the function
>>>>>>>>>>> #       [3]PETSC ERROR:       is given.
>>>>>>>>>>> #       [3]PETSC ERROR: [3] buildTwo line 987 
>>>>>>>>>>> /opt/petsc-main/include/HPDDM_schwarz.hpp
>>>>>>>>>>> #       [3]PETSC ERROR: [3] next line 1130 
>>>>>>>>>>> /opt/petsc-main/include/HPDDM_schwarz.hpp
>>>>>>>>>>> #       [3]PETSC ERROR: --------------------- Error Message 
>>>>>>>>>>> --------------------------------------------------------------
>>>>>>>>>>> #       [3]PETSC ERROR: Signal received
>>>>>>>>>>> #       [3]PETSC ERROR: [0]PETSC ERROR: 
>>>>>>>>>>> ------------------------------------------------------------------------
>>>>>>>>>>> 
>>>>>>>>>>> also, ex12_quad_hpddm_reuse_baij fails with a lot more "Read -1, 
>>>>>>>>>>> expected ..." messages, and I don't know where they come from...
>>>>>>>>>>> 
>>>>>>>>>>> Hypre (like in diff-snes_tutorials-ex56_hypre)  is also having 
>>>>>>>>>>> DIVERGED_INDEFINITE_PC failures...
>>>>>>>>>>> 
>>>>>>>>>>> Please see the 3 attached docker files:
>>>>>>>>>>> 
>>>>>>>>>>> 1) fedora_mkl_and_devtools: the Dockerfile which installs Fedora 33 
>>>>>>>>>>> with GNU compilers, MKL, and everything needed for development.
>>>>>>>>>>> 
>>>>>>>>>>> 2) openmpi: the Dockerfile to build OpenMPI
>>>>>>>>>>> 
>>>>>>>>>>> 3) petsc: the last Dockerfile, which builds, installs, and tests PETSc
>>>>>>>>>>> 
>>>>>>>>>>> I build the 3 like this:
>>>>>>>>>>> 
>>>>>>>>>>> docker build -t fedora_mkl_and_devtools -f fedora_mkl_and_devtools .
>>>>>>>>>>> 
>>>>>>>>>>> docker build -t openmpi -f openmpi .
>>>>>>>>>>> 
>>>>>>>>>>> docker build -t petsc -f petsc .
>>>>>>>>>>> 
>>>>>>>>>>> Disclaimer: I am not a docker expert, so I may do things that are 
>>>>>>>>>>> not docker state-of-the-art, but I am open to suggestions... ;)
>>>>>>>>>>> 
>>>>>>>>>>> I have just run it on my laptop (it takes a while), which does not 
>>>>>>>>>>> have enough cores, so many more tests failed (I should force 
>>>>>>>>>>> --oversubscribe but don't know how to).  I will relaunch on my 
>>>>>>>>>>> workstation in a few minutes.
>>>>>>>>>>> 
>>>>>>>>>>> I will now test your branch! (sorry for the delay).
>>>>>>>>>>> 
>>>>>>>>>>> Thanks,
>>>>>>>>>>> 
>>>>>>>>>>> Eric
>>>>>>>>>>> 
>>>>>>>>>>> On 2021-03-11 9:03 a.m., Eric Chamberland wrote:
>>>>>>>>>>>> Hi Pierre,
>>>>>>>>>>>> 
>>>>>>>>>>>> ok, that's interesting!
>>>>>>>>>>>> 
>>>>>>>>>>>> I will try to build a docker image by tomorrow and give you the 
>>>>>>>>>>>> exact recipe to reproduce the bugs.
>>>>>>>>>>>> 
>>>>>>>>>>>> Eric
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> 
>>>>>>>>>>>> On 2021-03-11 2:46 a.m., Pierre Jolivet wrote:
>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>>>> On 11 Mar 2021, at 6:16 AM, Barry Smith <[email protected]> wrote:
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>   Eric,
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>    Sorry about not being more prompt. We still have this in 
>>>>>>>>>>>>>> our active email, so you don't need to submit individual issues. 
>>>>>>>>>>>>>> We'll try to get to them as soon as we can.
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Indeed, I’m still trying to figure this out.
>>>>>>>>>>>>> I realized that some of my configure flags were different than 
>>>>>>>>>>>>> yours, e.g., no --with-memalign.
>>>>>>>>>>>>> I’ve also added SuperLU_DIST to my installation.
>>>>>>>>>>>>> Still, I can’t reproduce any issue.
>>>>>>>>>>>>> I will continue looking into this; it appears I’m seeing some 
>>>>>>>>>>>>> Valgrind errors, but I don’t know if this is some side effect of 
>>>>>>>>>>>>> OpenMPI not being Valgrind-clean (last time I checked, there was 
>>>>>>>>>>>>> no error with MPICH).
>>>>>>>>>>>>> 
>>>>>>>>>>>>> Thank you for your patience,
>>>>>>>>>>>>> Pierre
>>>>>>>>>>>>> 
>>>>>>>>>>>>> /usr/bin/gmake -f gmakefile test test-fail=1
>>>>>>>>>>>>> Using MAKEFLAGS: test-fail=1
>>>>>>>>>>>>>         TEST 
>>>>>>>>>>>>> arch-linux2-c-opt-ompi/tests/counts/snes_tutorials-ex12_quad_hpddm_reuse_baij.counts
>>>>>>>>>>>>>  ok snes_tutorials-ex12_quad_hpddm_reuse_baij
>>>>>>>>>>>>>  ok diff-snes_tutorials-ex12_quad_hpddm_reuse_baij
>>>>>>>>>>>>>         TEST 
>>>>>>>>>>>>> arch-linux2-c-opt-ompi/tests/counts/ksp_ksp_tests-ex33_superlu_dist_2.counts
>>>>>>>>>>>>>  ok ksp_ksp_tests-ex33_superlu_dist_2
>>>>>>>>>>>>>  ok diff-ksp_ksp_tests-ex33_superlu_dist_2
>>>>>>>>>>>>>         TEST 
>>>>>>>>>>>>> arch-linux2-c-opt-ompi/tests/counts/ksp_ksp_tests-ex49_superlu_dist.counts
>>>>>>>>>>>>>  ok ksp_ksp_tests-ex49_superlu_dist+nsize-1herm-0_conv-0
>>>>>>>>>>>>>  ok diff-ksp_ksp_tests-ex49_superlu_dist+nsize-1herm-0_conv-0
>>>>>>>>>>>>>  ok ksp_ksp_tests-ex49_superlu_dist+nsize-1herm-0_conv-1
>>>>>>>>>>>>>  ok diff-ksp_ksp_tests-ex49_superlu_dist+nsize-1herm-0_conv-1
>>>>>>>>>>>>>  ok ksp_ksp_tests-ex49_superlu_dist+nsize-1herm-1_conv-0
>>>>>>>>>>>>>  ok diff-ksp_ksp_tests-ex49_superlu_dist+nsize-1herm-1_conv-0
>>>>>>>>>>>>>  ok ksp_ksp_tests-ex49_superlu_dist+nsize-1herm-1_conv-1
>>>>>>>>>>>>>  ok diff-ksp_ksp_tests-ex49_superlu_dist+nsize-1herm-1_conv-1
>>>>>>>>>>>>>  ok ksp_ksp_tests-ex49_superlu_dist+nsize-4herm-0_conv-0
>>>>>>>>>>>>>  ok diff-ksp_ksp_tests-ex49_superlu_dist+nsize-4herm-0_conv-0
>>>>>>>>>>>>>  ok ksp_ksp_tests-ex49_superlu_dist+nsize-4herm-0_conv-1
>>>>>>>>>>>>>  ok diff-ksp_ksp_tests-ex49_superlu_dist+nsize-4herm-0_conv-1
>>>>>>>>>>>>>  ok ksp_ksp_tests-ex49_superlu_dist+nsize-4herm-1_conv-0
>>>>>>>>>>>>>  ok diff-ksp_ksp_tests-ex49_superlu_dist+nsize-4herm-1_conv-0
>>>>>>>>>>>>>  ok ksp_ksp_tests-ex49_superlu_dist+nsize-4herm-1_conv-1
>>>>>>>>>>>>>  ok diff-ksp_ksp_tests-ex49_superlu_dist+nsize-4herm-1_conv-1
>>>>>>>>>>>>>         TEST 
>>>>>>>>>>>>> arch-linux2-c-opt-ompi/tests/counts/ksp_ksp_tutorials-ex50_tut_2.counts
>>>>>>>>>>>>>  ok ksp_ksp_tutorials-ex50_tut_2
>>>>>>>>>>>>>  ok diff-ksp_ksp_tutorials-ex50_tut_2
>>>>>>>>>>>>>         TEST 
>>>>>>>>>>>>> arch-linux2-c-opt-ompi/tests/counts/ksp_ksp_tests-ex33_superlu_dist.counts
>>>>>>>>>>>>>  ok ksp_ksp_tests-ex33_superlu_dist
>>>>>>>>>>>>>  ok diff-ksp_ksp_tests-ex33_superlu_dist
>>>>>>>>>>>>>         TEST 
>>>>>>>>>>>>> arch-linux2-c-opt-ompi/tests/counts/snes_tutorials-ex56_hypre.counts
>>>>>>>>>>>>>  ok snes_tutorials-ex56_hypre
>>>>>>>>>>>>>  ok diff-snes_tutorials-ex56_hypre
>>>>>>>>>>>>>         TEST 
>>>>>>>>>>>>> arch-linux2-c-opt-ompi/tests/counts/ksp_ksp_tutorials-ex56_2.counts
>>>>>>>>>>>>>  ok ksp_ksp_tutorials-ex56_2
>>>>>>>>>>>>>  ok diff-ksp_ksp_tutorials-ex56_2
>>>>>>>>>>>>>         TEST 
>>>>>>>>>>>>> arch-linux2-c-opt-ompi/tests/counts/snes_tutorials-ex17_3d_q3_trig_elas.counts
>>>>>>>>>>>>>  ok snes_tutorials-ex17_3d_q3_trig_elas
>>>>>>>>>>>>>  ok diff-snes_tutorials-ex17_3d_q3_trig_elas
>>>>>>>>>>>>>         TEST 
>>>>>>>>>>>>> arch-linux2-c-opt-ompi/tests/counts/snes_tutorials-ex12_quad_hpddm_reuse_threshold_baij.counts
>>>>>>>>>>>>>  ok snes_tutorials-ex12_quad_hpddm_reuse_threshold_baij
>>>>>>>>>>>>>  ok diff-snes_tutorials-ex12_quad_hpddm_reuse_threshold_baij
>>>>>>>>>>>>>         TEST 
>>>>>>>>>>>>> arch-linux2-c-opt-ompi/tests/counts/ksp_ksp_tutorials-ex5_superlu_dist_3.counts
>>>>>>>>>>>>> not ok ksp_ksp_tutorials-ex5_superlu_dist_3 # Error code: 1
>>>>>>>>>>>>> # srun: error: Unable to create step for job 1426755: More 
>>>>>>>>>>>>> processors requested than permitted
>>>>>>>>>>>>>  ok ksp_ksp_tutorials-ex5_superlu_dist_3 # SKIP Command failed so 
>>>>>>>>>>>>> no diff
>>>>>>>>>>>>>         TEST 
>>>>>>>>>>>>> arch-linux2-c-opt-ompi/tests/counts/ksp_ksp_tutorials-ex5f_superlu_dist.counts
>>>>>>>>>>>>>  ok ksp_ksp_tutorials-ex5f_superlu_dist # SKIP Fortran required 
>>>>>>>>>>>>> for this test
>>>>>>>>>>>>>         TEST 
>>>>>>>>>>>>> arch-linux2-c-opt-ompi/tests/counts/snes_tutorials-ex12_tri_parmetis_hpddm_baij.counts
>>>>>>>>>>>>>  ok snes_tutorials-ex12_tri_parmetis_hpddm_baij
>>>>>>>>>>>>>  ok diff-snes_tutorials-ex12_tri_parmetis_hpddm_baij
>>>>>>>>>>>>>         TEST 
>>>>>>>>>>>>> arch-linux2-c-opt-ompi/tests/counts/snes_tutorials-ex19_tut_3.counts
>>>>>>>>>>>>>  ok snes_tutorials-ex19_tut_3
>>>>>>>>>>>>>  ok diff-snes_tutorials-ex19_tut_3
>>>>>>>>>>>>>         TEST 
>>>>>>>>>>>>> arch-linux2-c-opt-ompi/tests/counts/snes_tutorials-ex17_3d_q3_trig_vlap.counts
>>>>>>>>>>>>>  ok snes_tutorials-ex17_3d_q3_trig_vlap
>>>>>>>>>>>>>  ok diff-snes_tutorials-ex17_3d_q3_trig_vlap
>>>>>>>>>>>>>         TEST 
>>>>>>>>>>>>> arch-linux2-c-opt-ompi/tests/counts/ksp_ksp_tutorials-ex5f_superlu_dist_3.counts
>>>>>>>>>>>>>  ok ksp_ksp_tutorials-ex5f_superlu_dist_3 # SKIP Fortran required 
>>>>>>>>>>>>> for this test
>>>>>>>>>>>>>         TEST 
>>>>>>>>>>>>> arch-linux2-c-opt-ompi/tests/counts/snes_tutorials-ex19_superlu_dist.counts
>>>>>>>>>>>>>  ok snes_tutorials-ex19_superlu_dist
>>>>>>>>>>>>>  ok diff-snes_tutorials-ex19_superlu_dist
>>>>>>>>>>>>>         TEST 
>>>>>>>>>>>>> arch-linux2-c-opt-ompi/tests/counts/snes_tutorials-ex56_attach_mat_nearnullspace-1_bddc_approx_hypre.counts
>>>>>>>>>>>>>  ok 
>>>>>>>>>>>>> snes_tutorials-ex56_attach_mat_nearnullspace-1_bddc_approx_hypre
>>>>>>>>>>>>>  ok 
>>>>>>>>>>>>> diff-snes_tutorials-ex56_attach_mat_nearnullspace-1_bddc_approx_hypre
>>>>>>>>>>>>>         TEST 
>>>>>>>>>>>>> arch-linux2-c-opt-ompi/tests/counts/ksp_ksp_tutorials-ex49_hypre_nullspace.counts
>>>>>>>>>>>>>  ok ksp_ksp_tutorials-ex49_hypre_nullspace
>>>>>>>>>>>>>  ok diff-ksp_ksp_tutorials-ex49_hypre_nullspace
>>>>>>>>>>>>>         TEST 
>>>>>>>>>>>>> arch-linux2-c-opt-ompi/tests/counts/snes_tutorials-ex19_superlu_dist_2.counts
>>>>>>>>>>>>>  ok snes_tutorials-ex19_superlu_dist_2
>>>>>>>>>>>>>  ok diff-snes_tutorials-ex19_superlu_dist_2
>>>>>>>>>>>>>         TEST 
>>>>>>>>>>>>> arch-linux2-c-opt-ompi/tests/counts/ksp_ksp_tutorials-ex5_superlu_dist_2.counts
>>>>>>>>>>>>> not ok ksp_ksp_tutorials-ex5_superlu_dist_2 # Error code: 1
>>>>>>>>>>>>> # srun: error: Unable to create step for job 1426755: More 
>>>>>>>>>>>>> processors requested than permitted
>>>>>>>>>>>>>  ok ksp_ksp_tutorials-ex5_superlu_dist_2 # SKIP Command failed so 
>>>>>>>>>>>>> no diff
>>>>>>>>>>>>>         TEST 
>>>>>>>>>>>>> arch-linux2-c-opt-ompi/tests/counts/snes_tutorials-ex56_attach_mat_nearnullspace-0_bddc_approx_hypre.counts
>>>>>>>>>>>>>  ok 
>>>>>>>>>>>>> snes_tutorials-ex56_attach_mat_nearnullspace-0_bddc_approx_hypre
>>>>>>>>>>>>>  ok 
>>>>>>>>>>>>> diff-snes_tutorials-ex56_attach_mat_nearnullspace-0_bddc_approx_hypre
>>>>>>>>>>>>>         TEST 
>>>>>>>>>>>>> arch-linux2-c-opt-ompi/tests/counts/ksp_ksp_tutorials-ex64_1.counts
>>>>>>>>>>>>>  ok ksp_ksp_tutorials-ex64_1
>>>>>>>>>>>>>  ok diff-ksp_ksp_tutorials-ex64_1
>>>>>>>>>>>>>         TEST 
>>>>>>>>>>>>> arch-linux2-c-opt-ompi/tests/counts/ksp_ksp_tutorials-ex5_superlu_dist.counts
>>>>>>>>>>>>> not ok ksp_ksp_tutorials-ex5_superlu_dist # Error code: 1
>>>>>>>>>>>>> # srun: error: Unable to create step for job 1426755: More 
>>>>>>>>>>>>> processors requested than permitted
>>>>>>>>>>>>>  ok ksp_ksp_tutorials-ex5_superlu_dist # SKIP Command failed so 
>>>>>>>>>>>>> no diff
>>>>>>>>>>>>>         TEST 
>>>>>>>>>>>>> arch-linux2-c-opt-ompi/tests/counts/ksp_ksp_tutorials-ex5f_superlu_dist_2.counts
>>>>>>>>>>>>>  ok ksp_ksp_tutorials-ex5f_superlu_dist_2 # SKIP Fortran required 
>>>>>>>>>>>>> for this test
>>>>>>>>>>>>> 
>>>>>>>>>>>>>>    Barry
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> On Mar 10, 2021, at 11:03 PM, Eric Chamberland 
>>>>>>>>>>>>>>> <[email protected]> wrote:
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Barry,
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> to get some follow-up on --with-openmp=1 failures, shall I 
>>>>>>>>>>>>>>> open GitLab issues for:
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> a) all hypre failures giving DIVERGED_INDEFINITE_PC
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> b) all superlu_dist failures giving different results with 
>>>>>>>>>>>>>>> initia and "Exceeded timeout limit of 60 s"
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> c) hpddm failures "free(): invalid next size (fast)" and 
>>>>>>>>>>>>>>> "Segmentation Violation"
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> d) all tao's "Exceeded timeout limit of 60 s"
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> I don't see how I could do all this debugging by myself...
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> Eric
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>>> 
>>>>>>>>>>>>>> 
>>>>>>>>>>>>> 
>>>>>>>>>>>> -- 
>>>>>>>>>>>> Eric Chamberland, ing., M. Ing
>>>>>>>>>>>> Professionnel de recherche
>>>>>>>>>>>> GIREF/Université Laval
>>>>>>>>>>>> (418) 656-2131 poste 41 22 42
>>>>>>>>>>> -- 
>>>>>>>>>>> Eric Chamberland, ing., M. Ing
>>>>>>>>>>> Professionnel de recherche
>>>>>>>>>>> GIREF/Université Laval
>>>>>>>>>>> (418) 656-2131 poste 41 22 42
>>>>>>>>>>> <fedora_mkl_and_devtools.txt><openmpi.txt><petsc.txt>
>>>>>>>>>> 
>>>>>>>>> 
>>>>>>>> 
>>>>>>> -- 
>>>>>>> Eric Chamberland, ing., M. Ing
>>>>>>> Professionnel de recherche
>>>>>>> GIREF/Université Laval
>>>>>>> (418) 656-2131 poste 41 22 42
>>>>>> 
>>>>> -- 
>>>>> Eric Chamberland, ing., M. Ing
>>>>> Professionnel de recherche
>>>>> GIREF/Université Laval
>>>>> (418) 656-2131 poste 41 22 42
>>>> -- 
>>>> Eric Chamberland, ing., M. Ing
>>>> Professionnel de recherche
>>>> GIREF/Université Laval
>>>> (418) 656-2131 poste 41 22 42
>>> -- 
>>> Eric Chamberland, ing., M. Ing
>>> Professionnel de recherche
>>> GIREF/Université Laval
>>> (418) 656-2131 poste 41 22 42
>> 
> -- 
> Eric Chamberland, ing., M. Ing
> Professionnel de recherche
> GIREF/Université Laval
> (418) 656-2131 poste 41 22 42
