There really needs to be a usable, extensive MPI test suite that can find 
these performance issues. We spend time helping users with these problems when 
it is really the MPI community's job.



> On Aug 21, 2020, at 11:55 AM, Manav Bhatia <bhatiama...@gmail.com> wrote:
> 
> I built petsc with mpich-3.3.2 on my MacBook Pro with Apple clang 11.0.3 and 
> the test is finishing at my end. 
> 
> So, it appears that there is some issue with openmpi-4.0.1 on this machine. 
> 
> I will now build all my dependency toolchain with mpich and hopefully things 
> will work for my application code. 
> 
> Thank you again for your help. 
> 
> Regards, 
> Manav
> 
> 
>> On Aug 20, 2020, at 10:45 PM, Junchao Zhang <junchao.zh...@gmail.com 
>> <mailto:junchao.zh...@gmail.com>> wrote:
>> 
>> Manav,
>>  I downloaded your petsc_mat.tgz but could not reproduce the problem, on 
>> both Linux and Mac. I used the petsc commit id df0e4300 you mentioned.
>>  On Linux, I have openmpi-4.0.2 + gcc-8.3.0, and petsc is configured  
>> --with-debugging --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpifort 
>> --COPTFLAGS="-g -O0" --FOPTFLAGS="-g -O0" --CXXOPTFLAGS="-g -O0" 
>> --PETSC_ARCH=linux-host-dbg
>>  On Mac, I have mpich-3.3.1 + clang-11.0.0-apple, and petsc is configured 
>> --with-debugging=1 --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpifort 
>> --with-ctable=0 COPTFLAGS="-O0 -g" CXXOPTFLAGS="-O0 -g" 
>> PETSC_ARCH=mac-clang-dbg
>> 
>> mpirun -n 8 ./test
>> rank: 1 : stdout.processor.1
>> rank: 4 : stdout.processor.4
>> rank: 0 : stdout.processor.0
>> rank: 5 : stdout.processor.5
>> rank: 6 : stdout.processor.6
>> rank: 7 : stdout.processor.7
>> rank: 3 : stdout.processor.3
>> rank: 2 : stdout.processor.2
>> rank: 1 : Beginning reading nnz...
>> rank: 4 : Beginning reading nnz...
>> rank: 0 : Beginning reading nnz...
>> rank: 5 : Beginning reading nnz...
>> rank: 7 : Beginning reading nnz...
>> rank: 2 : Beginning reading nnz...
>> rank: 3 : Beginning reading nnz...
>> rank: 6 : Beginning reading nnz...
>> rank: 5 : Finished reading nnz
>> rank: 5 : Beginning mat preallocation...
>> rank: 3 : Finished reading nnz
>> rank: 3 : Beginning mat preallocation...
>> rank: 4 : Finished reading nnz
>> rank: 4 : Beginning mat preallocation...
>> rank: 7 : Finished reading nnz
>> rank: 7 : Beginning mat preallocation...
>> rank: 1 : Finished reading nnz
>> rank: 1 : Beginning mat preallocation...
>> rank: 0 : Finished reading nnz
>> rank: 0 : Beginning mat preallocation...
>> rank: 2 : Finished reading nnz
>> rank: 2 : Beginning mat preallocation...
>> rank: 6 : Finished reading nnz
>> rank: 6 : Beginning mat preallocation...
>> rank: 5 : Finished preallocation
>> rank: 5 : Beginning reading and setting matrix values...
>> rank: 1 : Finished preallocation
>> rank: 1 : Beginning reading and setting matrix values...
>> rank: 7 : Finished preallocation
>> rank: 7 : Beginning reading and setting matrix values...
>> rank: 2 : Finished preallocation
>> rank: 2 : Beginning reading and setting matrix values...
>> rank: 4 : Finished preallocation
>> rank: 4 : Beginning reading and setting matrix values...
>> rank: 0 : Finished preallocation
>> rank: 0 : Beginning reading and setting matrix values...
>> rank: 3 : Finished preallocation
>> rank: 3 : Beginning reading and setting matrix values...
>> rank: 6 : Finished preallocation
>> rank: 6 : Beginning reading and setting matrix values...
>> rank: 1 : Finished reading and setting matrix values
>> rank: 1 : Beginning mat assembly...
>> rank: 5 : Finished reading and setting matrix values
>> rank: 5 : Beginning mat assembly...
>> rank: 4 : Finished reading and setting matrix values
>> rank: 4 : Beginning mat assembly...
>> rank: 2 : Finished reading and setting matrix values
>> rank: 2 : Beginning mat assembly...
>> rank: 3 : Finished reading and setting matrix values
>> rank: 3 : Beginning mat assembly...
>> rank: 7 : Finished reading and setting matrix values
>> rank: 7 : Beginning mat assembly...
>> rank: 6 : Finished reading and setting matrix values
>> rank: 6 : Beginning mat assembly...
>> rank: 0 : Finished reading and setting matrix values
>> rank: 0 : Beginning mat assembly...
>> rank: 1 : Finished mat assembly
>> rank: 3 : Finished mat assembly
>> rank: 7 : Finished mat assembly
>> rank: 0 : Finished mat assembly
>> rank: 5 : Finished mat assembly
>> rank: 2 : Finished mat assembly
>> rank: 4 : Finished mat assembly
>> rank: 6 : Finished mat assembly
>> 
>> --Junchao Zhang
>> 
>> 
>> On Thu, Aug 20, 2020 at 5:29 PM Junchao Zhang <junchao.zh...@gmail.com 
>> <mailto:junchao.zh...@gmail.com>> wrote:
>> I will have a look and report back to you. Thanks.
>> --Junchao Zhang
>> 
>> 
>> On Thu, Aug 20, 2020 at 5:23 PM Manav Bhatia <bhatiama...@gmail.com 
>> <mailto:bhatiama...@gmail.com>> wrote:
>> I have created a standalone test that demonstrates the problem at my end. I 
>> have stored the indices, etc.  from my problem in a text file for each rank, 
>> which I use to initialize the matrix.
>> Please note that the test is specifically for 8 ranks. 
>> 
>> The .tgz file is on my google drive: 
>> https://drive.google.com/file/d/1R-WjS36av3maXX3pUyiR3ndGAxteTVj-/view?usp=sharing
>> 
>> This contains a README file with instructions on running. Please note that 
>> the work directory needs the index files. 
>> 
>> Please let me know if I can provide any further information. 
>> 
>> Thank you all for your help. 
>> 
>> Regards,
>> Manav
>> 
>>> On Aug 20, 2020, at 12:54 PM, Jed Brown <j...@jedbrown.org 
>>> <mailto:j...@jedbrown.org>> wrote:
>>> 
>>> Matthew Knepley <knep...@gmail.com <mailto:knep...@gmail.com>> writes:
>>> 
>>>> On Thu, Aug 20, 2020 at 11:09 AM Manav Bhatia <bhatiama...@gmail.com 
>>>> <mailto:bhatiama...@gmail.com>> wrote:
>>>> 
>>>>> 
>>>>> 
>>>>> On Aug 20, 2020, at 8:31 AM, Stefano Zampini <stefano.zamp...@gmail.com 
>>>>> <mailto:stefano.zamp...@gmail.com>>
>>>>> wrote:
>>>>> 
>>>>> Can you add an MPI_Barrier before
>>>>> 
>>>>> ierr = MatAssemblyBegin(aij->A,mode);CHKERRQ(ierr);
>>>>> 
>>>>> 
>>>>> With an MPI_Barrier before this function call:
>>>>> —  three of the processes have already hit this barrier,
>>>>> —  the other 5 are inside MatStashScatterGetMesg_Private ->
>>>>> MatStashScatterGetMesg_BTS -> MPI_Waitsome(2 processes)/MPI_Waitall(3
>>>>> processes)
>>> 
>>> This is not itself evidence of inconsistent state.  You can use
>>> 
>>>  -build_twosided allreduce
>>> 
>>> to avoid the nonblocking sparse algorithm.
>>> 
>>>> 
>>>> Okay, you should run this with -matstash_legacy just to make sure it is not
>>>> a bug in your MPI implementation. But it looks like
>>>> there is an inconsistency in the parallel state. This can happen because we
>>>> have a bug, or it could be that you called a collective
>>>> operation on a subset of the processes. Is there any way you could cut down
>>>> the example (say put all 1s in the matrix, etc) so
>>>> that you could give it to us to run?
>> 
> 
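
For anyone landing on this thread with a similar hang, the option-based checks suggested above can be run back to back. What follows is only a rough sketch, not something posted by anyone in the thread: it assumes the 8-rank reproducer is launched as "mpirun -n 8 ./test" (as in Junchao's run), and the variables LAUNCHER, EXE, NP and DRY_RUN are names made up for this sketch. The two PETSc options are the ones Jed and Matt mentioned. By default it only prints the commands; set DRY_RUN=0 to actually run them.

```shell
#!/bin/sh
# Sketch: run the reproducer once per PETSc option set mentioned in the thread.
# LAUNCHER, EXE, NP and DRY_RUN are hypothetical knobs for this sketch only.
LAUNCHER=${LAUNCHER:-mpirun}
EXE=${EXE:-./test}
NP=${NP:-8}
DRY_RUN=${DRY_RUN:-1}   # 1 = only print the command, 0 = execute it

run_variant() {
    # $1 is a label; any remaining arguments are extra PETSc options.
    label=$1
    shift
    echo "== $label: $LAUNCHER -n $NP $EXE $*"
    if [ "$DRY_RUN" = "0" ]; then
        # If this variant hangs, the script stops here; interrupt with Ctrl-C.
        "$LAUNCHER" -n "$NP" "$EXE" "$@"
        echo "== $label: exit status $?"
    fi
}

# Default run, then the allreduce-based rendezvous (avoids the nonblocking
# sparse algorithm), then the legacy matrix stash.
run_variant "default"
run_variant "allreduce" -build_twosided allreduce
run_variant "legacy stash" -matstash_legacy
```

If the default run hangs in MatAssemblyBegin but one of the other two finishes, that points toward the MPI implementation rather than the application code, which is consistent with the mpich result reported above.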
