True, the bug reports come to us and we get the blame.
> On Aug 21, 2020, at 2:50 PM, Matthew Knepley <knep...@gmail.com> wrote:
>
> On Fri, Aug 21, 2020 at 3:32 PM Barry Smith <bsm...@petsc.dev> wrote:
>
>    Yes, absolutely, a test suite will not solve all problems. In the PETSc
> model, which is not uncommon, each bug/problem found is supposed to result in
> another test that detects that problem, so the test suite can catch repeats of
> the problem without redoing all the hard work from scratch.
>
>    So this OpenMPI suite, if it gets off the ground, will be valuable ONLY if
> they accept community additions efficiently and happily. For example, would
> the test suite detect the problem reported by the PETSc user? It should be
> trivial to have the user run the suite on their system (which is why it needs
> to be very easy to run) and find out. If it does not detect the problem, then,
> working with the appropriate "test suite" community, we could submit an MR to
> the test suite that looks for the problem and finds it. Now the test suite is
> better, and we have one less hassle that comes up multiple times for us. In
> addition, the OpenMPI and MPICH developers, etc., should do the same thing: each
> time they fix a bug that was not detected by testing, they should donate to
> the universal test suite the code that reproduces the bug.
>
>    The question is whether our effort in helping the MPI test suite community
> would be more than our "wasted" effort dealing with buggy MPIs.
>
>    Barry
>
>    It is a bit curious that after 25 years no friendly, extensible, universal
> MPI test suite community has emerged. Perhaps it is because each MPI
> implementation has its own test processes and suites and so cannot form the
> wider community needed to maintain a single friendly, extensible, universal
> MPI test suite. Looking back, one could say this was a mistake of the MPI
> Forum; they should have set that in motion in 1995. It would have saved a lot
> of duplicated effort and would be very, very good now.
>
> I think they do not do it because people do not hold implementors
> accountable, only the packages using MPI.
>
>    Matt
>
>> On Aug 21, 2020, at 2:17 PM, Junchao Zhang <junchao.zh...@gmail.com> wrote:
>>
>> Barry,
>>   I mentioned a test suite from MPICH at
>> https://lists.mcs.anl.gov/pipermail/petsc-users/2020-July/041738.html.
>> Since it is not easy to use, I did not put it in the PETSc FAQ.
>>   I also asked on the OpenMPI mailing list. An OpenMPI developer said he
>> could make their tests public and is in the process of checking with all
>> the authors about a license :). If that works out, the tests will be at
>> https://github.com/open-mpi/ompi-tests-public
>>
>>   A test suite will be helpful, but I doubt it will solve the problem. A
>> user's particular case (number of ranks, message size, communication
>> pattern, etc.) might not be covered by a test suite.
>> --Junchao Zhang
>>
>>
>> On Fri, Aug 21, 2020 at 12:33 PM Barry Smith <bsm...@petsc.dev> wrote:
>>
>>    There really needs to be a usable, extensive MPI test suite that can find
>> these performance issues; we spend time helping users with these problems
>> when it is really the MPI community's job.
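A minimal sketch of the kind of self-contained regression test Barry describes contributing back to such a suite. The exchange pattern, message size, and PASS/FAIL convention below are illustrative assumptions, not taken from any existing suite; the point is only that each fixed bug becomes a small program a harness can rerun indefinitely.

/* Sketch: nonblocking all-pairs exchange, loosely mimicking the sparse
 * assembly traffic discussed in this thread. Exits nonzero on data
 * corruption so a test harness can flag regressions. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
  int rank, size, i, j, errs = 0, toterrs;
  const int n = 1024; /* ints per peer; vary to probe message-size effects */

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &size);

  int *sendbuf = malloc((size_t)n * size * sizeof(int));
  int *recvbuf = malloc((size_t)n * size * sizeof(int));
  MPI_Request *reqs = malloc(2 * (size_t)size * sizeof(MPI_Request));

  for (i = 0; i < n * size; i++) sendbuf[i] = rank;

  /* Post all receives, then all sends, and wait on everything. */
  for (i = 0; i < size; i++) {
    MPI_Irecv(recvbuf + i * n, n, MPI_INT, i, 0, MPI_COMM_WORLD, &reqs[i]);
    MPI_Isend(sendbuf + i * n, n, MPI_INT, i, 0, MPI_COMM_WORLD, &reqs[size + i]);
  }
  MPI_Waitall(2 * size, reqs, MPI_STATUSES_IGNORE);

  /* The block received from rank i must contain i everywhere. */
  for (i = 0; i < size; i++)
    for (j = 0; j < n; j++)
      if (recvbuf[i * n + j] != i) errs++;

  MPI_Allreduce(&errs, &toterrs, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);
  if (rank == 0) printf("%s\n", toterrs ? "FAIL" : "PASS");

  free(sendbuf); free(recvbuf); free(reqs);
  MPI_Finalize();
  return toterrs ? 1 : 0;
}

A hang rather than corruption would surface as a harness timeout; the requirement Barry states -- easy to build, easy to run, unambiguous result -- is what would make such a suite usable for bug triage.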
>>
>>
>>> On Aug 21, 2020, at 11:55 AM, Manav Bhatia <bhatiama...@gmail.com> wrote:
>>>
>>> I built petsc with mpich-3.3.2 on my MacBook Pro with Apple clang 11.0.3,
>>> and the test is finishing at my end.
>>>
>>> So, it appears that there is some issue with openmpi-4.0.1 on this machine.
>>>
>>> I will now build all of my dependency toolchain with mpich, and hopefully
>>> things will work for my application code.
>>>
>>> Thank you again for your help.
>>>
>>> Regards,
>>> Manav
>>>
>>>
>>>> On Aug 20, 2020, at 10:45 PM, Junchao Zhang <junchao.zh...@gmail.com> wrote:
>>>>
>>>> Manav,
>>>>   I downloaded your petsc_mat.tgz but could not reproduce the problem, on
>>>> both Linux and Mac. I used the petsc commit id df0e4300 you mentioned.
>>>>   On Linux, I have openmpi-4.0.2 + gcc-8.3.0, and petsc is configured with
>>>> --with-debugging --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpifort
>>>> --COPTFLAGS="-g -O0" --FOPTFLAGS="-g -O0" --CXXOPTFLAGS="-g -O0"
>>>> --PETSC_ARCH=linux-host-dbg
>>>>   On Mac, I have mpich-3.3.1 + clang-11.0.0-apple, and petsc is configured with
>>>> --with-debugging=1 --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpifort
>>>> --with-ctable=0 COPTFLAGS="-O0 -g" CXXOPTFLAGS="-O0 -g"
>>>> PETSC_ARCH=mac-clang-dbg
>>>>
>>>> mpirun -n 8 ./test
>>>> rank: 1 : stdout.processor.1
>>>> rank: 4 : stdout.processor.4
>>>> rank: 0 : stdout.processor.0
>>>> rank: 5 : stdout.processor.5
>>>> rank: 6 : stdout.processor.6
>>>> rank: 7 : stdout.processor.7
>>>> rank: 3 : stdout.processor.3
>>>> rank: 2 : stdout.processor.2
>>>> rank: 1 : Beginning reading nnz...
>>>> rank: 4 : Beginning reading nnz...
>>>> rank: 0 : Beginning reading nnz...
>>>> rank: 5 : Beginning reading nnz...
>>>> rank: 7 : Beginning reading nnz...
>>>> rank: 2 : Beginning reading nnz...
>>>> rank: 3 : Beginning reading nnz...
>>>> rank: 6 : Beginning reading nnz...
>>>> rank: 5 : Finished reading nnz
>>>> rank: 5 : Beginning mat preallocation...
>>>> rank: 3 : Finished reading nnz
>>>> rank: 3 : Beginning mat preallocation...
>>>> rank: 4 : Finished reading nnz
>>>> rank: 4 : Beginning mat preallocation...
>>>> rank: 7 : Finished reading nnz
>>>> rank: 7 : Beginning mat preallocation...
>>>> rank: 1 : Finished reading nnz
>>>> rank: 1 : Beginning mat preallocation...
>>>> rank: 0 : Finished reading nnz
>>>> rank: 0 : Beginning mat preallocation...
>>>> rank: 2 : Finished reading nnz
>>>> rank: 2 : Beginning mat preallocation...
>>>> rank: 6 : Finished reading nnz
>>>> rank: 6 : Beginning mat preallocation...
>>>> rank: 5 : Finished preallocation
>>>> rank: 5 : Beginning reading and setting matrix values...
>>>> rank: 1 : Finished preallocation
>>>> rank: 1 : Beginning reading and setting matrix values...
>>>> rank: 7 : Finished preallocation
>>>> rank: 7 : Beginning reading and setting matrix values...
>>>> rank: 2 : Finished preallocation
>>>> rank: 2 : Beginning reading and setting matrix values...
>>>> rank: 4 : Finished preallocation
>>>> rank: 4 : Beginning reading and setting matrix values...
>>>> rank: 0 : Finished preallocation
>>>> rank: 0 : Beginning reading and setting matrix values...
>>>> rank: 3 : Finished preallocation
>>>> rank: 3 : Beginning reading and setting matrix values...
>>>> rank: 6 : Finished preallocation
>>>> rank: 6 : Beginning reading and setting matrix values...
>>>> rank: 1 : Finished reading and setting matrix values
>>>> rank: 1 : Beginning mat assembly...
>>>> rank: 5 : Finished reading and setting matrix values
>>>> rank: 5 : Beginning mat assembly...
>>>> rank: 4 : Finished reading and setting matrix values
>>>> rank: 4 : Beginning mat assembly...
>>>> rank: 2 : Finished reading and setting matrix values
>>>> rank: 2 : Beginning mat assembly...
>>>> rank: 3 : Finished reading and setting matrix values
>>>> rank: 3 : Beginning mat assembly...
>>>> rank: 7 : Finished reading and setting matrix values
>>>> rank: 7 : Beginning mat assembly...
>>>> rank: 6 : Finished reading and setting matrix values
>>>> rank: 6 : Beginning mat assembly...
>>>> rank: 0 : Finished reading and setting matrix values
>>>> rank: 0 : Beginning mat assembly...
>>>> rank: 1 : Finished mat assembly
>>>> rank: 3 : Finished mat assembly
>>>> rank: 7 : Finished mat assembly
>>>> rank: 0 : Finished mat assembly
>>>> rank: 5 : Finished mat assembly
>>>> rank: 2 : Finished mat assembly
>>>> rank: 4 : Finished mat assembly
>>>> rank: 6 : Finished mat assembly
>>>>
>>>> --Junchao Zhang
>>>>
>>>>
>>>> On Thu, Aug 20, 2020 at 5:29 PM Junchao Zhang <junchao.zh...@gmail.com> wrote:
>>>>   I will have a look and report back to you. Thanks.
>>>> --Junchao Zhang
>>>>
>>>>
>>>> On Thu, Aug 20, 2020 at 5:23 PM Manav Bhatia <bhatiama...@gmail.com> wrote:
>>>>   I have created a standalone test that demonstrates the problem at my end.
>>>> I have stored the indices, etc. from my problem in a text file for each
>>>> rank, which I use to initialize the matrix.
>>>>   Please note that the test is specifically for 8 ranks.
>>>>
>>>>   The .tgz file is on my Google Drive:
>>>> https://drive.google.com/file/d/1R-WjS36av3maXX3pUyiR3ndGAxteTVj-/view?usp=sharing
>>>>
>>>>   This contains a README file with instructions on running. Please note
>>>> that the work directory needs the index files.
>>>>
>>>>   Please let me know if I can provide any further information.
>>>>
>>>>   Thank you all for your help.
>>>>
>>>>   Regards,
>>>>   Manav
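The stage markers in the log above outline the structure of the test. For orientation, here is a hedged sketch of that preallocate/set/assemble flow in PETSc; the sizes, nnz counts, and values below are placeholders, whereas Manav's actual test reads them from the per-rank index files shipped in the tarball.

/* Sketch of the assembly pattern the log stages suggest; not Manav's
 * actual test. All counts and values are placeholders. */
#include <petscmat.h>

int main(int argc, char **argv)
{
  Mat            A;
  PetscInt       nlocal = 100, rstart, rend, N, i;
  PetscInt      *dnnz, *onnz;
  PetscErrorCode ierr;

  ierr = PetscInitialize(&argc, &argv, NULL, NULL); if (ierr) return ierr;

  /* "Beginning reading nnz...": per-row nonzero counts for the diagonal
     and off-diagonal blocks (placeholders here, read from files there). */
  ierr = PetscMalloc2(nlocal, &dnnz, nlocal, &onnz); CHKERRQ(ierr);
  for (i = 0; i < nlocal; i++) { dnnz[i] = 5; onnz[i] = 2; }

  /* "Beginning mat preallocation..." */
  ierr = MatCreate(PETSC_COMM_WORLD, &A); CHKERRQ(ierr);
  ierr = MatSetSizes(A, nlocal, nlocal, PETSC_DETERMINE, PETSC_DETERMINE); CHKERRQ(ierr);
  ierr = MatSetType(A, MATMPIAIJ); CHKERRQ(ierr);
  ierr = MatMPIAIJSetPreallocation(A, 0, dnnz, 0, onnz); CHKERRQ(ierr);

  /* "Beginning reading and setting matrix values...": entries destined for
     rows owned by other ranks are stashed and exchanged during assembly. */
  ierr = MatGetOwnershipRange(A, &rstart, &rend); CHKERRQ(ierr);
  ierr = MatGetSize(A, &N, NULL); CHKERRQ(ierr);
  for (i = rstart; i < rend; i++) {
    ierr = MatSetValue(A, i, i, 1.0, ADD_VALUES); CHKERRQ(ierr);
  }
  /* One deliberately off-rank entry so the stash communication at issue
     in this thread is actually exercised. */
  ierr = MatSetValue(A, rend % N, rstart, 1.0, ADD_VALUES); CHKERRQ(ierr);

  /* "Beginning mat assembly...": the communication phase where the
     reported hang occurred. */
  ierr = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY); CHKERRQ(ierr);
  ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY); CHKERRQ(ierr);

  ierr = PetscFree2(dnnz, onnz); CHKERRQ(ierr);
  ierr = MatDestroy(&A); CHKERRQ(ierr);
  ierr = PetscFinalize();
  return ierr;
}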
>>>>> On Aug 20, 2020, at 12:54 PM, Jed Brown <j...@jedbrown.org> wrote:
>>>>>
>>>>> Matthew Knepley <knep...@gmail.com> writes:
>>>>>
>>>>>> On Thu, Aug 20, 2020 at 11:09 AM Manav Bhatia <bhatiama...@gmail.com> wrote:
>>>>>>
>>>>>>> On Aug 20, 2020, at 8:31 AM, Stefano Zampini <stefano.zamp...@gmail.com> wrote:
>>>>>>>
>>>>>>> Can you add an MPI_Barrier before
>>>>>>>
>>>>>>> ierr = MatAssemblyBegin(aij->A,mode);CHKERRQ(ierr);
>>>>>>>
>>>>>>>
>>>>>>> With an MPI_Barrier before this function call:
>>>>>>> - three of the processes have already hit this barrier,
>>>>>>> - the other five are inside MatStashScatterGetMesg_Private ->
>>>>>>> MatStashScatterGetMesg_BTS -> MPI_Waitsome (2 processes) /
>>>>>>> MPI_Waitall (3 processes)
>>>>>
>>>>> This is not itself evidence of inconsistent state. You can use
>>>>>
>>>>>   -build_twosided allreduce
>>>>>
>>>>> to avoid the nonblocking sparse algorithm.
>>>>>
>>>>>> Okay, you should run this with -matstash_legacy just to make sure it is
>>>>>> not a bug in your MPI implementation. But it looks like
>>>>>> there is an inconsistency in the parallel state. This can happen because
>>>>>> we have a bug, or it could be that you called a collective
>>>>>> operation on a subset of the processes. Is there any way you could cut
>>>>>> down the example (say, put all 1s in the matrix, etc.) so
>>>>>> that you could give it to us to run?
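A hedged sketch of a user-level variant of this barrier experiment. Stefano's suggestion targets the MatAssemblyBegin(aij->A,mode) call inside PETSc's MPIAIJ assembly code, which requires editing the library; bracketing the application's own assembly calls, as below, is a coarser but related check.

/* If a rank hangs at the first barrier, some rank never reached assembly
 * (e.g., a collective was called on a subset of processes); if all ranks
 * pass the first barrier but hang inside MatAssemblyBegin/End, the stash
 * exchange itself is stuck, and the options discussed above
 * (-build_twosided allreduce, -matstash_legacy) select alternative
 * exchange algorithms to help isolate an MPI-implementation bug. */
#include <petscmat.h>

static PetscErrorCode AssembleWithBarriers(Mat A)
{
  MPI_Comm       comm;
  PetscErrorCode ierr;

  PetscFunctionBeginUser;
  ierr = PetscObjectGetComm((PetscObject)A, &comm); CHKERRQ(ierr);
  ierr = MPI_Barrier(comm); CHKERRQ(ierr);  /* did every rank get here? */
  ierr = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY); CHKERRQ(ierr);
  ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY); CHKERRQ(ierr);
  ierr = MPI_Barrier(comm); CHKERRQ(ierr);  /* did every rank finish? */
  PetscFunctionReturn(0);
}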
> --
> What most experimenters take for granted before they begin their experiments
> is infinitely more interesting than any results to which their experiments
> lead.
> -- Norbert Wiener
>
> https://www.cse.buffalo.edu/~knepley/