Re: [petsc-users] Reading VTK files in PETSc

2023-11-29 Thread Jed Brown
Is it necessary that it be VTK format or can it be PETSc's binary format or a 
different mesh format? VTK (be it legacy .vtk or the XML-based .vtu, etc.) is a 
bad format for parallel reading, no matter how much effort might go into an 
implementation.
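For reference, a minimal sketch of the same round trip with PETSc's binary
viewer (the file name and the DM handle "dm" are illustrative, error checking
omitted) would look something like:

~
  PetscViewer viewer;

  /* Write: same pattern as the VTK case, but with the binary viewer. */
  PetscViewerBinaryOpen(PetscObjectComm((PetscObject)dm), "globalVec.dat",
                        FILE_MODE_WRITE, &viewer);
  VecView(globalVec, viewer);
  PetscViewerDestroy(&viewer);

  /* Read back: create a vector with the same layout, then VecLoad() fills it. */
  Vec loadedVec;
  DMCreateGlobalVector(dm, &loadedVec);
  PetscViewerBinaryOpen(PetscObjectComm((PetscObject)dm), "globalVec.dat",
                        FILE_MODE_READ, &viewer);
  VecLoad(loadedVec, viewer);
  PetscViewerDestroy(&viewer);
~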

"Kevin G. Wang"  writes:

> Good morning everyone.
>
> I use the following functions to output parallel vectors --- "globalVec" in
> this example --- to VTK files. It works well, and is quite convenient.
>
> ~
>   PetscViewer viewer;
>   PetscViewerVTKOpen(PetscObjectComm((PetscObject)*dm), filename,
> FILE_MODE_WRITE, &viewer);
>   VecView(globalVec, viewer);
>   PetscViewerDestroy(&viewer);
> ~
>
> Now, I am trying to do the opposite. I would like to read the VTK files
> generated by PETSc back into memory, and assign each one to a Vec. Could
> someone let me know how this can be done?
>
> Thanks!
> Kevin
>
>
> -- 
> Kevin G. Wang, Ph.D.
> Associate Professor
> Kevin T. Crofton Department of Aerospace and Ocean Engineering
> Virginia Tech
> 1600 Innovation Dr., VTSS Rm 224H, Blacksburg, VA 24061
> Office: (540) 231-7547  |  Mobile: (650) 862-2663
> URL: https://www.aoe.vt.edu/people/faculty/wang.html
> Codes: https://github.com/kevinwgy


Re: [petsc-users] [Xolotl-psi-development] [EXTERNAL] Re: Unexpected performance losses switching to COO interface

2023-11-29 Thread Jed Brown
"Blondel, Sophie"  writes:

> Hi Jed,
>
> I'm not sure I'm going to reply to your question correctly because I don't 
> really understand how the split is done. Is it related to on diagonal and off 
> diagonal? If so, the off-diagonal part is usually pretty small (less than 20 
> DOFs) and related to diffusion, the diagonal part involves thousands of DOFs 
> for the reaction term.

From the run-time option, it'll be a default (additive) split and we're 
interested in the two diagonal blocks. One currently has a cheap solver that 
would only be efficient with a well-conditioned positive definite matrix and 
the other is using a direct solver ('redundant'). If you were to run with 
-ksp_view and share the output, it would be informative.

Either way, I'd like to understand what physics are behind the equation 
currently being solved with 'redundant'. If it's diffusive, then algebraic 
multigrid would be a good place to start.
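As a purely illustrative starting point (the actual prefixes depend on what
-ksp_view reports, and split 1 is assumed to be the 'redundant' one), algebraic
multigrid on that block would look something like:

  -pc_type fieldsplit -pc_fieldsplit_detect_coupling -fieldsplit_1_pc_type gamg -ksp_view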

> Let us know what we can do to answer this question more accurately.
>
> Cheers,
>
> Sophie
> 
> From: Jed Brown 
> Sent: Tuesday, November 28, 2023 19:07
> To: Fackler, Philip ; Junchao Zhang 
> 
> Cc: petsc-users@mcs.anl.gov ; 
> xolotl-psi-developm...@lists.sourceforge.net 
> 
> Subject: Re: [Xolotl-psi-development] [petsc-users] [EXTERNAL] Re: Unexpected 
> performance losses switching to COO interface
>
>
> "Fackler, Philip via petsc-users"  writes:
>
>> That makes sense. Here are the arguments that I think are relevant:
>>
>> -fieldsplit_1_pc_type redundant -fieldsplit_0_pc_type sor -pc_type 
>> fieldsplit -pc_fieldsplit_detect_coupling
>
> What sort of physics are in splits 0 and 1?
>
> SOR is not a good GPU algorithm, so we'll want to change that one way or 
> another. Are the splits of similar size or very different?
>
>> What would you suggest to make this better?
>>
>> Also, note that the cases marked "serial" are running on CPU only, that is, 
>> using only the SERIAL backend for kokkos.
>>
>> Philip Fackler
>> Research Software Engineer, Application Engineering Group
>> Advanced Computing Systems Research Section
>> Computer Science and Mathematics Division
>> Oak Ridge National Laboratory
>> 
>> From: Junchao Zhang 
>> Sent: Tuesday, November 28, 2023 15:51
>> To: Fackler, Philip 
>> Cc: petsc-users@mcs.anl.gov ; 
>> xolotl-psi-developm...@lists.sourceforge.net 
>> 
>> Subject: Re: [EXTERNAL] Re: [petsc-users] Unexpected performance losses 
>> switching to COO interface
>>
>> Hi, Philip,
>>I opened hpcdb-PSI_9-serial and it seems you used PCLU.  Since Kokkos 
>> does not have a GPU LU implementation, we do it on CPU via 
>> MatLUFactorNumeric_SeqAIJ(). Perhaps you can try other PC types?
>>
>> [Screenshot 2023-11-28 at 2.43.03 PM.png]
>> --Junchao Zhang
>>
>>
>> On Wed, Nov 22, 2023 at 10:43 AM Fackler, Philip 
>> mailto:fackle...@ornl.gov>> wrote:
>> I definitely dropped the ball on this. I'm sorry for that. I have new 
>> profiling data using the latest (as of yesterday) of petsc/main. I've put 
>> them in a single google drive folder linked here:
>>
>> https://drive.google.com/drive/folders/14ScvyfxOzc4OzXs9HZVeQDO-g6FdIVAI?usp=drive_link
>>
>> Have a happy holiday weekend!
>>
>> Thanks,
>>
>> Philip Fackler
>> Research Software Engineer, Application Engineering Group
>> Advanced Computing Systems Research Section
>> Computer Science and Mathematics Division
>> Oak Ridge National Laboratory
>> 
>> From: Junchao Zhang mailto:junchao.zh...@gmail.com>>
>> Sent: Monday, October 16, 2023 15:24
>> To: Fackler, Philip mailto:fackle...@ornl.gov>>
>> Cc: petsc-users@mcs.anl.gov 
>> mailto:petsc-users@mcs.anl.gov>>; 
>> xolotl-psi-developm...@lists.sourceforge.net
>>  
>> mailto:xolotl-psi-developm...@lists.sourceforge.net>>
>> Subject: Re: [EXTERNAL] Re: [petsc-users] Unexpected performance losses 
>> switching to COO interface
>>
>> Hi, Philip,
>>That branch was merged to petsc/main today. Let me know once you have new 
>> profiling results.
>>
>>Thanks.
>> --Junchao Zhang
>>
>>
>> On Mon, Oct 16, 2023 at 9:33 AM Fackler, Philip 
>> mailto:fackle...@ornl.gov>> wrote:
>> Junchao,
>>
>> I've attached updated timing plots (red and blue are swapped from before; 
>> yellow is the new one). There is an improvement for the NE_3 case only with 
>> CUDA. Serial stays the same, and the PSI 

Re: [petsc-users] MPI barrier issue using MatZeroRows

2023-11-29 Thread Amneet Bhalla
Awesome! MatZeroRows is very useful and simplified the code logic.

On Wed, Nov 29, 2023 at 5:14 PM Matthew Knepley  wrote:

> On Wed, Nov 29, 2023 at 7:27 PM Amneet Bhalla 
> wrote:
>
>> Ah, I also tried without step 2 (i.e., manually doing MPI_allgatherv for
>> Dirichlet rows), and that also works. So it seems that each processor needs
>> to send in their own Dirichlet rows, and not a union of them. Is that
>> correct?
>>
>
> Yes, that is correct.
>
>   Thanks,
>
> Matt
>
>
>> On Wed, Nov 29, 2023 at 3:48 PM Amneet Bhalla 
>> wrote:
>>
>>> Thanks Barry! I tried that and it seems to be working. This is what I
>>> did. It would be great if you could take a look at it and let me know if
>>> this is what you had in mind.
>>>
>>> 1. Collected Dirichlet rows locally
>>>
>>> https://github.com/IBAMR/IBAMR/blob/amneetb/acoustically-driven-flows/src/acoustic_streaming/AcousticStreamingPETScMatUtilities.cpp#L731
>>>
>>> https://github.com/IBAMR/IBAMR/blob/amneetb/acoustically-driven-flows/src/acoustic_streaming/AcousticStreamingPETScMatUtilities.cpp#L797
>>>
>>>
>>> 2. MPI_allgatherv Dirichlet rows
>>>
>>> https://github.com/IBAMR/IBAMR/blob/amneetb/acoustically-driven-flows/src/acoustic_streaming/AcousticStreamingPETScMatUtilities.cpp#L805-L810
>>>
>>> 3. Called the MatZeroRows function
>>>
>>> https://github.com/IBAMR/IBAMR/blob/amneetb/acoustically-driven-flows/src/acoustic_streaming/AcousticStreamingPETScMatUtilities.cpp#L812-L814
>>>
>>>
>>>
>>>
>>> On Wed, Nov 29, 2023 at 11:32 AM Barry Smith  wrote:
>>>


 On Nov 29, 2023, at 2:11 PM, Matthew Knepley  wrote:

 On Wed, Nov 29, 2023 at 1:55 PM Amneet Bhalla 
 wrote:

> So the code logic is after the matrix is assembled, I iterate over all
> distributed patches in the domain to see which of the patch is abutting a
> Dirichlet boundary. Depending upon which patch abuts a physical and
> Dirichlet boundary, a processor will call this routine. However, that same
> processor is “owning” that DoF, which would be on its diagonal.
>
> I think Barry already mentioned this is not going to work unless I use
> the flag to not communicate explicitly. However, that flag is not working
> as it should over here for some reason.
>

 Oh, I do not think that is right.

 Barry, when I look at the code, MPIU_Allreduce is always going to be
 called to fix up the nonzero_state. Am I wrong about that?


   No, you are correct. I missed that in my earlier look. Setting those
 flags reduces the number of MPI reductions but does not eliminate them
 completely.

   MatZeroRows is collective (as its manual page indicates) so you have
 to do the second thing I suggested. Inside your for loop construct an array
 containing all the local
 rows being zeroed and then make a single call by all MPI processes to
 MatZeroRows().  Note this is a small change of just a handful of lines of
 code.

Barry


   Thanks,

 Matt


> I can always change the matrix coefficients for Dirichlet rows during
> MatSetValues. However, that would lengthen my code and I was trying to
> avoid that.
>
> On Wed, Nov 29, 2023 at 10:02 AM Matthew Knepley 
> wrote:
>
>> On Wed, Nov 29, 2023 at 12:30 PM Amneet Bhalla 
>> wrote:
>>
>>> Ok, I added both, but it still hangs. Here, is bt from all three
>>> tasks:
>>>
>>
>> It looks like two processes are calling AllReduce, but one is not.
>> Are all procs not calling MatZeroRows?
>>
>>   Thanks,
>>
>>  Matt
>>
>>
>>> Task 1:
>>>
>>> amneetb@APSB-MacBook-Pro-16:~$ lldb  -p 44691
>>> (lldb) process attach --pid 44691
>>> Process 44691 stopped
>>> * thread #1, queue = 'com.apple.main-thread', stop reason = signal
>>> SIGSTOP
>>> frame #0: 0x00018a2d750c
>>> libsystem_kernel.dylib`__semwait_signal + 8
>>> libsystem_kernel.dylib`:
>>> ->  0x18a2d750c <+8>:  b.lo   0x18a2d752c   ; <+40>
>>> 0x18a2d7510 <+12>: pacibsp
>>> 0x18a2d7514 <+16>: stpx29, x30, [sp, #-0x10]!
>>> 0x18a2d7518 <+20>: movx29, sp
>>> Target 0: (fo_acoustic_streaming_solver_2d) stopped.
>>> Executable module set to
>>> "/Users/amneetb/Softwares/IBAMR-Git/objs-dbg/tests/IBTK/fo_acoustic_streaming_solver_2d".
>>> Architecture set to: arm64-apple-macosx-.
>>> (lldb) cont
>>> Process 44691 resuming
>>> Process 44691 stopped
>>> * thread #1, queue = 'com.apple.main-thread', stop reason = signal
>>> SIGSTOP
>>> frame #0: 0x00010ba40b60
>>> libpmpi.12.dylib`MPIDI_POSIX_mpi_release_gather_release + 752
>>> libpmpi.12.dylib`MPIDI_POSIX_mpi_release_gather_release:
>>> ->  0x10ba40b60 <+752>: addw8, w8, #0x1
>>> 0x10ba40b64 <+756>: ldrw9, [x22]
>>> 0x10ba40b68 <+760>: cmpw8, 

Re: [petsc-users] MPI barrier issue using MatZeroRows

2023-11-29 Thread Amneet Bhalla
Ah, I also tried without step 2 (i.e., manually doing MPI_allgatherv for
Dirichlet rows), and that also works. So it seems that each processor needs
to send in their own Dirichlet rows, and not a union of them. Is that
correct?

On Wed, Nov 29, 2023 at 3:48 PM Amneet Bhalla  wrote:

> Thanks Barry! I tried that and it seems to be working. This is what I did.
> It would be great if you could take a look at it and let me know if this is
> what you had in mind.
>
> 1. Collected Dirichlet rows locally
>
> https://github.com/IBAMR/IBAMR/blob/amneetb/acoustically-driven-flows/src/acoustic_streaming/AcousticStreamingPETScMatUtilities.cpp#L731
>
> https://github.com/IBAMR/IBAMR/blob/amneetb/acoustically-driven-flows/src/acoustic_streaming/AcousticStreamingPETScMatUtilities.cpp#L797
>
>
> 2. MPI_allgatherv Dirichlet rows
>
> https://github.com/IBAMR/IBAMR/blob/amneetb/acoustically-driven-flows/src/acoustic_streaming/AcousticStreamingPETScMatUtilities.cpp#L805-L810
>
> 3. Called the MatZeroRows function
>
> https://github.com/IBAMR/IBAMR/blob/amneetb/acoustically-driven-flows/src/acoustic_streaming/AcousticStreamingPETScMatUtilities.cpp#L812-L814
>
>
>
>
> On Wed, Nov 29, 2023 at 11:32 AM Barry Smith  wrote:
>
>>
>>
>> On Nov 29, 2023, at 2:11 PM, Matthew Knepley  wrote:
>>
>> On Wed, Nov 29, 2023 at 1:55 PM Amneet Bhalla 
>> wrote:
>>
>>> So the code logic is after the matrix is assembled, I iterate over all
>>> distributed patches in the domain to see which of the patch is abutting a
>>> Dirichlet boundary. Depending upon which patch abuts a physical and
>>> Dirichlet boundary, a processor will call this routine. However, that same
>>> processor is “owning” that DoF, which would be on its diagonal.
>>>
>>> I think Barry already mentioned this is not going to work unless I use
>>> the flag to not communicate explicitly. However, that flag is not working
>>> as it should over here for some reason.
>>>
>>
>> Oh, I do not think that is right.
>>
>> Barry, when I look at the code, MPIU_Allreduce is always going to be
>> called to fix up the nonzero_state. Am I wrong about that?
>>
>>
>>   No, you are correct. I missed that in my earlier look. Setting those
>> flags reduces the number of MPI reductions but does not eliminate them
>> completely.
>>
>>   MatZeroRows is collective (as its manual page indicates) so you have to
>> do the second thing I suggested. Inside your for loop construct an array
>> containing all the local
>> rows being zeroed and then make a single call by all MPI processes to
>> MatZeroRows().  Note this is a small change of just a handful of lines of
>> code.
>>
>>Barry
>>
>>
>>   Thanks,
>>
>> Matt
>>
>>
>>> I can always change the matrix coefficients for Dirichlet rows during
>>> MatSetValues. However, that would lengthen my code and I was trying to
>>> avoid that.
>>>
>>> On Wed, Nov 29, 2023 at 10:02 AM Matthew Knepley 
>>> wrote:
>>>
 On Wed, Nov 29, 2023 at 12:30 PM Amneet Bhalla 
 wrote:

> Ok, I added both, but it still hangs. Here, is bt from all three tasks:
>

 It looks like two processes are calling AllReduce, but one is not. Are
 all procs not calling MatZeroRows?

   Thanks,

  Matt


> Task 1:
>
> amneetb@APSB-MacBook-Pro-16:~$ lldb  -p 44691
> (lldb) process attach --pid 44691
> Process 44691 stopped
> * thread #1, queue = 'com.apple.main-thread', stop reason = signal
> SIGSTOP
> frame #0: 0x00018a2d750c
> libsystem_kernel.dylib`__semwait_signal + 8
> libsystem_kernel.dylib`:
> ->  0x18a2d750c <+8>:  b.lo   0x18a2d752c   ; <+40>
> 0x18a2d7510 <+12>: pacibsp
> 0x18a2d7514 <+16>: stpx29, x30, [sp, #-0x10]!
> 0x18a2d7518 <+20>: movx29, sp
> Target 0: (fo_acoustic_streaming_solver_2d) stopped.
> Executable module set to
> "/Users/amneetb/Softwares/IBAMR-Git/objs-dbg/tests/IBTK/fo_acoustic_streaming_solver_2d".
> Architecture set to: arm64-apple-macosx-.
> (lldb) cont
> Process 44691 resuming
> Process 44691 stopped
> * thread #1, queue = 'com.apple.main-thread', stop reason = signal
> SIGSTOP
> frame #0: 0x00010ba40b60
> libpmpi.12.dylib`MPIDI_POSIX_mpi_release_gather_release + 752
> libpmpi.12.dylib`MPIDI_POSIX_mpi_release_gather_release:
> ->  0x10ba40b60 <+752>: addw8, w8, #0x1
> 0x10ba40b64 <+756>: ldrw9, [x22]
> 0x10ba40b68 <+760>: cmpw8, w9
> 0x10ba40b6c <+764>: b.lt   0x10ba40b4c   ; <+732>
> Target 0: (fo_acoustic_streaming_solver_2d) stopped.
> (lldb) bt
> * thread #1, queue = 'com.apple.main-thread', stop reason = signal
> SIGSTOP
>   * frame #0: 0x00010ba40b60
> libpmpi.12.dylib`MPIDI_POSIX_mpi_release_gather_release + 752
> frame #1: 0x00010ba48528
> libpmpi.12.dylib`MPIDI_POSIX_mpi_allreduce_release_gather + 1088
> frame #2: 0x00010ba47964
> 

Re: [petsc-users] MPI barrier issue using MatZeroRows

2023-11-29 Thread Amneet Bhalla
Thanks Barry! I tried that and it seems to be working. This is what I did.
It would be great if you could take a look at it and let me know if this is
what you had in mind.

1. Collected Dirichlet rows locally
https://github.com/IBAMR/IBAMR/blob/amneetb/acoustically-driven-flows/src/acoustic_streaming/AcousticStreamingPETScMatUtilities.cpp#L731
https://github.com/IBAMR/IBAMR/blob/amneetb/acoustically-driven-flows/src/acoustic_streaming/AcousticStreamingPETScMatUtilities.cpp#L797


2. MPI_allgatherv Dirichlet rows
https://github.com/IBAMR/IBAMR/blob/amneetb/acoustically-driven-flows/src/acoustic_streaming/AcousticStreamingPETScMatUtilities.cpp#L805-L810

3. Called the MatZeroRows function
https://github.com/IBAMR/IBAMR/blob/amneetb/acoustically-driven-flows/src/acoustic_streaming/AcousticStreamingPETScMatUtilities.cpp#L812-L814




On Wed, Nov 29, 2023 at 11:32 AM Barry Smith  wrote:

>
>
> On Nov 29, 2023, at 2:11 PM, Matthew Knepley  wrote:
>
> On Wed, Nov 29, 2023 at 1:55 PM Amneet Bhalla 
> wrote:
>
>> So the code logic is after the matrix is assembled, I iterate over all
>> distributed patches in the domain to see which of the patch is abutting a
>> Dirichlet boundary. Depending upon which patch abuts a physical and
>> Dirichlet boundary, a processor will call this routine. However, that same
>> processor is “owning” that DoF, which would be on its diagonal.
>>
>> I think Barry already mentioned this is not going to work unless I use
>> the flag to not communicate explicitly. However, that flag is not working
>> as it should over here for some reason.
>>
>
> Oh, I do not think that is right.
>
> Barry, when I look at the code, MPIU_Allreduce is always going to be
> called to fix up the nonzero_state. Am I wrong about that?
>
>
>   No, you are correct. I missed that in my earlier look. Setting those
> flags reduces the number of MPI reductions but does not eliminate them
> completely.
>
>   MatZeroRows is collective (as its manual page indicates) so you have to
> do the second thing I suggested. Inside your for loop construct an array
> containing all the local
> rows being zeroed and then make a single call by all MPI processes to
> MatZeroRows().  Note this is a small change of just a handful of lines of
> code.
>
>Barry
>
>
>   Thanks,
>
> Matt
>
>
>> I can always change the matrix coefficients for Dirichlet rows during
>> MatSetValues. However, that would lengthen my code and I was trying to
>> avoid that.
>>
>> On Wed, Nov 29, 2023 at 10:02 AM Matthew Knepley 
>> wrote:
>>
>>> On Wed, Nov 29, 2023 at 12:30 PM Amneet Bhalla 
>>> wrote:
>>>
 Ok, I added both, but it still hangs. Here, is bt from all three tasks:

>>>
>>> It looks like two processes are calling AllReduce, but one is not. Are
>>> all procs not calling MatZeroRows?
>>>
>>>   Thanks,
>>>
>>>  Matt
>>>
>>>
 Task 1:

 amneetb@APSB-MacBook-Pro-16:~$ lldb  -p 44691
 (lldb) process attach --pid 44691
 Process 44691 stopped
 * thread #1, queue = 'com.apple.main-thread', stop reason = signal
 SIGSTOP
 frame #0: 0x00018a2d750c
 libsystem_kernel.dylib`__semwait_signal + 8
 libsystem_kernel.dylib`:
 ->  0x18a2d750c <+8>:  b.lo   0x18a2d752c   ; <+40>
 0x18a2d7510 <+12>: pacibsp
 0x18a2d7514 <+16>: stpx29, x30, [sp, #-0x10]!
 0x18a2d7518 <+20>: movx29, sp
 Target 0: (fo_acoustic_streaming_solver_2d) stopped.
 Executable module set to
 "/Users/amneetb/Softwares/IBAMR-Git/objs-dbg/tests/IBTK/fo_acoustic_streaming_solver_2d".
 Architecture set to: arm64-apple-macosx-.
 (lldb) cont
 Process 44691 resuming
 Process 44691 stopped
 * thread #1, queue = 'com.apple.main-thread', stop reason = signal
 SIGSTOP
 frame #0: 0x00010ba40b60
 libpmpi.12.dylib`MPIDI_POSIX_mpi_release_gather_release + 752
 libpmpi.12.dylib`MPIDI_POSIX_mpi_release_gather_release:
 ->  0x10ba40b60 <+752>: addw8, w8, #0x1
 0x10ba40b64 <+756>: ldrw9, [x22]
 0x10ba40b68 <+760>: cmpw8, w9
 0x10ba40b6c <+764>: b.lt   0x10ba40b4c   ; <+732>
 Target 0: (fo_acoustic_streaming_solver_2d) stopped.
 (lldb) bt
 * thread #1, queue = 'com.apple.main-thread', stop reason = signal
 SIGSTOP
   * frame #0: 0x00010ba40b60
 libpmpi.12.dylib`MPIDI_POSIX_mpi_release_gather_release + 752
 frame #1: 0x00010ba48528
 libpmpi.12.dylib`MPIDI_POSIX_mpi_allreduce_release_gather + 1088
 frame #2: 0x00010ba47964
 libpmpi.12.dylib`MPIDI_Allreduce_intra_composition_gamma + 368
 frame #3: 0x00010ba35e78 libpmpi.12.dylib`MPIR_Allreduce + 1588
 frame #4: 0x000103f587dc libmpi.12.dylib`MPI_Allreduce + 2280
 frame #5: 0x000106d67650
 libpetsc.3.17.dylib`MatZeroRows_MPIAIJ(A=0x000105846470, N=1,
 rows=0x00016dbfa9f4, diag=1, x=0x,
 b=0x) at mpiaij.c:827:3
 

Re: [petsc-users] petsc build could not pass make check

2023-11-29 Thread Di Miao via petsc-users
Yes, it is caused by the .petscrc.

Thank you for your help!
Di

-Original Message-
From: Satish Balay  
Sent: Wednesday, November 29, 2023 11:41 AM
To: Di Miao 
Cc: petsc-users@mcs.anl.gov
Subject: Re: [petsc-users] petsc build could not pass make check

Do you have a ~/.petscrc file - with -log_view enabled?

Satish

On Wed, 29 Nov 2023, Di Miao via petsc-users wrote:

> Hi,
> 
> I tried to compile PETSc with the following configuration:
> 
> ./configure --with-debugging=0 COPTFLAGS='-O3' CXXOPTFLAGS='-O3' 
> FOPTFLAGS='-O3' --with-clean=1 
> --with-make-exec=/SCRATCH/dimiao/test_space/installed/make/bin/make 
> --with-cmake-exec=/SCRATCH/dimiao/test_space/cmake-3.27.9-linux-x86_64/bin/cmake 
> --prefix=/SCRATCH/dimiao/test_space/installed/petsc_opt_mpi 
> --with-mpi-dir=/SCRATCH/dimiao/test_space/installed/mpich 
> PETSC_ARCH=petsc_opt_mpi 
> --with-blaslapack-dir=/SCRATCH/dimiao/oneapi/mkl/latest 
> --with-mkl_pardiso-dir=/SCRATCH/dimiao/oneapi/mkl/latest --with-x=0
> 
> I got three errors:
> Possible error running C/C++ src/snes/tutorials/ex19 with 1 MPI 
> process Possible error running C/C++ src/snes/tutorials/ex19 with 2 
> MPI processes Possible error running Fortran example 
> src/snes/tutorials/ex5f with 1 MPI process
> 
> Below each error message is nothing but PETSc's performance summary.
> 
> I have attached make.log, configure.log and the message from make 
> check(make_check.log). Could you please give me some guidance on how to fix 
> this issue?
> 
> Thank you,
> Di
> 
> 
> 



Re: [petsc-users] petsc build could not pass make check

2023-11-29 Thread Barry Smith

   It appears you may have the environment variable PETSC_OPTIONS set to 
-log_view, or a .petscrc file containing -log_view, which triggers the 
printing of the logging information.

   The logging output confuses the error checker in make check, making it 
think there may be an error in the output when there is not.

   The tests ran fine.
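A quick way to check both possibilities, and then remove -log_view (or unset
the variable) before re-running make check:

  echo "$PETSC_OPTIONS"
  cat ~/.petscrc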

   Barry


> On Nov 29, 2023, at 2:37 PM, Di Miao via petsc-users 
>  wrote:
> 
> Hi,
>  
> I tried to compile PETSc with the following configuration:
>  
> ./configure --with-debugging=0 COPTFLAGS='-O3' CXXOPTFLAGS='-O3' 
> FOPTFLAGS='-O3' --with-clean=1 
> --with-make-exec=/SCRATCH/dimiao/test_space/installed/make/bin/make 
> --with-cmake-exec=/SCRATCH/dimiao/test_space/cmake-3.27.9-linux-x86_64/bin/cmake
>  --prefix=/SCRATCH/dimiao/test_space/installed/petsc_opt_mpi 
> --with-mpi-dir=/SCRATCH/dimiao/test_space/installed/mpich 
> PETSC_ARCH=petsc_opt_mpi 
> --with-blaslapack-dir=/SCRATCH/dimiao/oneapi/mkl/latest 
> --with-mkl_pardiso-dir=/SCRATCH/dimiao/oneapi/mkl/latest --with-x=0
>  
> I got three errors:
> Possible error running C/C++ src/snes/tutorials/ex19 with 1 MPI process
> Possible error running C/C++ src/snes/tutorials/ex19 with 2 MPI processes
> Possible error running Fortran example src/snes/tutorials/ex5f with 1 MPI 
> process
>  
> Below each error message is nothing but PETSc’s performance summary.
>  
> I have attached make.log, configure.log and the message from make 
> check(make_check.log). Could you please give me some guidance on how to fix 
> this issue?
>  
> Thank you,
> Di
>  
>  
> 



Re: [petsc-users] petsc build could not pass make check

2023-11-29 Thread Satish Balay via petsc-users
Do you have a ~/.petscrc file - with -log_view enabled?

Satish

On Wed, 29 Nov 2023, Di Miao via petsc-users wrote:

> Hi,
> 
> I tried to compile PETSc with the following configuration:
> 
> ./configure --with-debugging=0 COPTFLAGS='-O3' CXXOPTFLAGS='-O3' 
> FOPTFLAGS='-O3' --with-clean=1 
> --with-make-exec=/SCRATCH/dimiao/test_space/installed/make/bin/make 
> --with-cmake-exec=/SCRATCH/dimiao/test_space/cmake-3.27.9-linux-x86_64/bin/cmake
>  --prefix=/SCRATCH/dimiao/test_space/installed/petsc_opt_mpi 
> --with-mpi-dir=/SCRATCH/dimiao/test_space/installed/mpich 
> PETSC_ARCH=petsc_opt_mpi 
> --with-blaslapack-dir=/SCRATCH/dimiao/oneapi/mkl/latest 
> --with-mkl_pardiso-dir=/SCRATCH/dimiao/oneapi/mkl/latest --with-x=0
> 
> I got three errors:
> Possible error running C/C++ src/snes/tutorials/ex19 with 1 MPI process
> Possible error running C/C++ src/snes/tutorials/ex19 with 2 MPI processes
> Possible error running Fortran example src/snes/tutorials/ex5f with 1 MPI 
> process
> 
> Below each error message is nothing but PETSc's performance summary.
> 
> I have attached make.log, configure.log and the message from make 
> check(make_check.log). Could you please give me some guidance on how to fix 
> this issue?
> 
> Thank you,
> Di
> 
> 
> 



Re: [petsc-users] MPI barrier issue using MatZeroRows

2023-11-29 Thread Barry Smith


> On Nov 29, 2023, at 2:11 PM, Matthew Knepley  wrote:
> 
> On Wed, Nov 29, 2023 at 1:55 PM Amneet Bhalla  > wrote:
>> So the code logic is after the matrix is assembled, I iterate over all 
>> distributed patches in the domain to see which of the patch is abutting a 
>> Dirichlet boundary. Depending upon which patch abuts a physical and 
>> Dirichlet boundary, a processor will call this routine. However, that same 
>> processor is “owning” that DoF, which would be on its diagonal. 
>> 
>> I think Barry already mentioned this is not going to work unless I use the 
>> flag to not communicate explicitly. However, that flag is not working as it 
>> should over here for some reason.
> 
> Oh, I do not think that is right.
> 
> Barry, when I look at the code, MPIU_Allreduce is always going to be called 
> to fix up the nonzero_state. Am I wrong about that?

  No, you are correct. I missed that in my earlier look. Setting those flags 
reduces the number of MPI reductions but does not eliminate them completely.

  MatZeroRows is collective (as its manual page indicates) so you have to do 
the second thing I suggested. Inside your for loop construct an array 
containing all the local
rows being zeroed and then make a single call by all MPI processes to 
MatZeroRows().  Note this is a small change of just a handful of lines of code.
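  For concreteness, a minimal sketch of that restructuring, built on the loop
quoted earlier in this thread (the name dirichlet_rows is illustrative, error
checking abbreviated, and a uniform diagonal value a is assumed):

    std::vector<PetscInt> dirichlet_rows;  // local rows to zero, gathered inside the loop
    for (int comp = 0; comp < 2; ++comp)
    {
        for (Box::Iterator bc(bc_coef_box); bc; bc++)
        {
            if (IBTK::abs_equal_eps(b, 0.0))
            {
                dirichlet_rows.push_back(u_dof_index);  // only record the row here
            }
        }
    }
    // One collective call made by every MPI process (possibly with zero rows).
    ierr = MatZeroRows(mat, static_cast<PetscInt>(dirichlet_rows.size()),
                       dirichlet_rows.data(), a, NULL, NULL);
    IBTK_CHKERRQ(ierr);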

   Barry

> 
>   Thanks,
> 
> Matt
>  
>> I can always change the matrix coefficients for Dirichlet rows during 
>> MatSetValues. However, that would lengthen my code and I was trying to avoid 
>> that. 
>> 
>> On Wed, Nov 29, 2023 at 10:02 AM Matthew Knepley > > wrote:
>>> On Wed, Nov 29, 2023 at 12:30 PM Amneet Bhalla >> > wrote:
 Ok, I added both, but it still hangs. Here, is bt from all three tasks:
>>> 
>>> It looks like two processes are calling AllReduce, but one is not. Are all 
>>> procs not calling MatZeroRows?
>>> 
>>>   Thanks,
>>> 
>>>  Matt
>>>  
 Task 1:
 
 amneetb@APSB-MacBook-Pro-16:~$ lldb  -p 44691 
 (lldb) process attach --pid 44691
 Process 44691 stopped
 * thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
 frame #0: 0x00018a2d750c libsystem_kernel.dylib`__semwait_signal + 
 8
 libsystem_kernel.dylib`:
 ->  0x18a2d750c <+8>:  b.lo   0x18a2d752c   ; <+40>
 0x18a2d7510 <+12>: pacibsp 
 0x18a2d7514 <+16>: stpx29, x30, [sp, #-0x10]!
 0x18a2d7518 <+20>: movx29, sp
 Target 0: (fo_acoustic_streaming_solver_2d) stopped.
 Executable module set to 
 "/Users/amneetb/Softwares/IBAMR-Git/objs-dbg/tests/IBTK/fo_acoustic_streaming_solver_2d".
 Architecture set to: arm64-apple-macosx-.
 (lldb) cont
 Process 44691 resuming
 Process 44691 stopped
 * thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
 frame #0: 0x00010ba40b60 
 libpmpi.12.dylib`MPIDI_POSIX_mpi_release_gather_release + 752
 libpmpi.12.dylib`MPIDI_POSIX_mpi_release_gather_release:
 ->  0x10ba40b60 <+752>: addw8, w8, #0x1
 0x10ba40b64 <+756>: ldrw9, [x22]
 0x10ba40b68 <+760>: cmpw8, w9
 0x10ba40b6c <+764>: b.lt    0x10ba40b4c   ; 
 <+732>
 Target 0: (fo_acoustic_streaming_solver_2d) stopped.
 (lldb) bt
 * thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
   * frame #0: 0x00010ba40b60 
 libpmpi.12.dylib`MPIDI_POSIX_mpi_release_gather_release + 752
 frame #1: 0x00010ba48528 
 libpmpi.12.dylib`MPIDI_POSIX_mpi_allreduce_release_gather + 1088
 frame #2: 0x00010ba47964 
 libpmpi.12.dylib`MPIDI_Allreduce_intra_composition_gamma + 368
 frame #3: 0x00010ba35e78 libpmpi.12.dylib`MPIR_Allreduce + 1588
 frame #4: 0x000103f587dc libmpi.12.dylib`MPI_Allreduce + 2280
 frame #5: 0x000106d67650 
 libpetsc.3.17.dylib`MatZeroRows_MPIAIJ(A=0x000105846470, N=1, 
 rows=0x00016dbfa9f4, diag=1, x=0x, 
 b=0x) at mpiaij.c:827:3
 frame #6: 0x000106aadfac 
 libpetsc.3.17.dylib`MatZeroRows(mat=0x000105846470, numRows=1, 
 rows=0x00016dbfa9f4, diag=1, x=0x, 
 b=0x) at matrix.c:5935:3
 frame #7: 0x0001023952d0 
 fo_acoustic_streaming_solver_2d`IBAMR::AcousticStreamingPETScMatUtilities::constructPatchLevelFOAcousticStreamingOp(mat=0x00016dc04168,
  omega=1, sound_speed=1, rho_idx=3, mu_idx=2, lambda_idx=4, 
 u_bc_coefs=0x00016dc04398, data_time=NaN, num_dofs_per_proc=size=3, 
 u_dof_index_idx=27, p_dof_index_idx=28, 
 patch_level=Pointer > @ 0x00016dbfcec0, 
 mu_interp_type=VC_HARMONIC_INTERP) at 
 AcousticStreamingPETScMatUtilities.cpp:799:36
 frame #8: 0x0001023acb8c 
 

Re: [petsc-users] MPI barrier issue using MatZeroRows

2023-11-29 Thread Matthew Knepley
On Wed, Nov 29, 2023 at 1:55 PM Amneet Bhalla  wrote:

> So the code logic is after the matrix is assembled, I iterate over all
> distributed patches in the domain to see which of the patch is abutting a
> Dirichlet boundary. Depending upon which patch abuts a physical and
> Dirichlet boundary, a processor will call this routine. However, that same
> processor is “owning” that DoF, which would be on its diagonal.
>
> I think Barry already mentioned this is not going to work unless I use the
> flag to not communicate explicitly. However, that flag is not working as it
> should over here for some reason.
>

Oh, I do not think that is right.

Barry, when I look at the code, MPIU_Allreduce is always going to be called
to fix up the nonzero_state. Am I wrong about that?

  Thanks,

Matt


> I can always change the matrix coefficients for Dirichlet rows during
> MatSetValues. However, that would lengthen my code and I was trying to
> avoid that.
>
> On Wed, Nov 29, 2023 at 10:02 AM Matthew Knepley 
> wrote:
>
>> On Wed, Nov 29, 2023 at 12:30 PM Amneet Bhalla 
>> wrote:
>>
>>> Ok, I added both, but it still hangs. Here, is bt from all three tasks:
>>>
>>
>> It looks like two processes are calling AllReduce, but one is not. Are
>> all procs not calling MatZeroRows?
>>
>>   Thanks,
>>
>>  Matt
>>
>>
>>> Task 1:
>>>
>>> amneetb@APSB-MacBook-Pro-16:~$ lldb  -p 44691
>>>
>>> (lldb) process attach --pid 44691
>>>
>>> Process 44691 stopped
>>>
>>> * thread #1, queue = 'com.apple.main-thread', stop reason = signal
>>> SIGSTOP
>>>
>>> frame #0: 0x00018a2d750c
>>> libsystem_kernel.dylib`__semwait_signal + 8
>>>
>>> libsystem_kernel.dylib`:
>>>
>>> ->  0x18a2d750c <+8>:  b.lo   0x18a2d752c   ; <+40>
>>>
>>> 0x18a2d7510 <+12>: pacibsp
>>>
>>> 0x18a2d7514 <+16>: stpx29, x30, [sp, #-0x10]!
>>>
>>> 0x18a2d7518 <+20>: movx29, sp
>>>
>>> Target 0: (fo_acoustic_streaming_solver_2d) stopped.
>>>
>>> Executable module set to
>>> "/Users/amneetb/Softwares/IBAMR-Git/objs-dbg/tests/IBTK/fo_acoustic_streaming_solver_2d".
>>>
>>> Architecture set to: arm64-apple-macosx-.
>>>
>>> (lldb) cont
>>>
>>> Process 44691 resuming
>>>
>>> Process 44691 stopped
>>>
>>> * thread #1, queue = 'com.apple.main-thread', stop reason = signal
>>> SIGSTOP
>>>
>>> frame #0: 0x00010ba40b60
>>> libpmpi.12.dylib`MPIDI_POSIX_mpi_release_gather_release + 752
>>>
>>> libpmpi.12.dylib`MPIDI_POSIX_mpi_release_gather_release:
>>>
>>> ->  0x10ba40b60 <+752>: addw8, w8, #0x1
>>>
>>> 0x10ba40b64 <+756>: ldrw9, [x22]
>>>
>>> 0x10ba40b68 <+760>: cmpw8, w9
>>>
>>> 0x10ba40b6c <+764>: b.lt   0x10ba40b4c   ; <+732>
>>>
>>> Target 0: (fo_acoustic_streaming_solver_2d) stopped.
>>>
>>> (lldb) bt
>>>
>>> * thread #1, queue = 'com.apple.main-thread', stop reason = signal
>>> SIGSTOP
>>>
>>>   * frame #0: 0x00010ba40b60
>>> libpmpi.12.dylib`MPIDI_POSIX_mpi_release_gather_release + 752
>>>
>>> frame #1: 0x00010ba48528
>>> libpmpi.12.dylib`MPIDI_POSIX_mpi_allreduce_release_gather + 1088
>>>
>>> frame #2: 0x00010ba47964
>>> libpmpi.12.dylib`MPIDI_Allreduce_intra_composition_gamma + 368
>>>
>>> frame #3: 0x00010ba35e78 libpmpi.12.dylib`MPIR_Allreduce + 1588
>>>
>>> frame #4: 0x000103f587dc libmpi.12.dylib`MPI_Allreduce + 2280
>>>
>>> frame #5: 0x000106d67650
>>> libpetsc.3.17.dylib`MatZeroRows_MPIAIJ(A=0x000105846470, N=1,
>>> rows=0x00016dbfa9f4, diag=1, x=0x,
>>> b=0x) at mpiaij.c:827:3
>>>
>>> frame #6: 0x000106aadfac
>>> libpetsc.3.17.dylib`MatZeroRows(mat=0x000105846470, numRows=1,
>>> rows=0x00016dbfa9f4, diag=1, x=0x,
>>> b=0x) at matrix.c:5935:3
>>>
>>> frame #7: 0x0001023952d0
>>> fo_acoustic_streaming_solver_2d`IBAMR::AcousticStreamingPETScMatUtilities::constructPatchLevelFOAcousticStreamingOp(mat=0x00016dc04168,
>>> omega=1, sound_speed=1, rho_idx=3, mu_idx=2, lambda_idx=4,
>>> u_bc_coefs=0x00016dc04398, data_time=NaN, num_dofs_per_proc=size=3,
>>> u_dof_index_idx=27, p_dof_index_idx=28,
>>> patch_level=Pointer > @ 0x00016dbfcec0,
>>> mu_interp_type=VC_HARMONIC_INTERP) at
>>> AcousticStreamingPETScMatUtilities.cpp:799:36
>>>
>>> frame #8: 0x0001023acb8c
>>> fo_acoustic_streaming_solver_2d`IBAMR::FOAcousticStreamingPETScLevelSolver::initializeSolverStateSpecialized(this=0x00016dc04018,
>>> x=0x00016dc05778, (null)=0x00016dc05680) at
>>> FOAcousticStreamingPETScLevelSolver.cpp:149:5
>>>
>>> frame #9: 0x00010254a2dc
>>> fo_acoustic_streaming_solver_2d`IBTK::PETScLevelSolver::initializeSolverState(this=0x00016dc04018,
>>> x=0x00016dc05778, b=0x00016dc05680) at PETScLevelSolver.cpp:340:
>>> 5
>>>
>>> frame #10: 0x000102202e5c
>>> fo_acoustic_streaming_solver_2d`main(argc=11, argv=0x00016dc07450) at
>>> fo_acoustic_streaming_solver.cpp:400:22
>>>
>>> frame #11: 0x000189fbbf28 

Re: [petsc-users] MPI barrier issue using MatZeroRows

2023-11-29 Thread Amneet Bhalla
So the code logic is: after the matrix is assembled, I iterate over all
distributed patches in the domain to see which of the patches is abutting a
Dirichlet boundary. Depending upon which patch abuts a physical and
Dirichlet boundary, a processor will call this routine. However, that same
processor is “owning” that DoF, which would be on its diagonal.

I think Barry already mentioned this is not going to work unless I use the
flag to not communicate explicitly. However, that flag is not working as it
should over here for some reason.

I can always change the matrix coefficients for Dirichlet rows during
MatSetValues. However, that would lengthen my code and I was trying to
avoid that.

On Wed, Nov 29, 2023 at 10:02 AM Matthew Knepley  wrote:

> On Wed, Nov 29, 2023 at 12:30 PM Amneet Bhalla 
> wrote:
>
>> Ok, I added both, but it still hangs. Here, is bt from all three tasks:
>>
>
> It looks like two processes are calling AllReduce, but one is not. Are all
> procs not calling MatZeroRows?
>
>   Thanks,
>
>  Matt
>
>
>> Task 1:
>>
>> amneetb@APSB-MacBook-Pro-16:~$ lldb  -p 44691
>>
>> (lldb) process attach --pid 44691
>>
>> Process 44691 stopped
>>
>> * thread #1, queue = 'com.apple.main-thread', stop reason = signal
>> SIGSTOP
>>
>> frame #0: 0x00018a2d750c libsystem_kernel.dylib`__semwait_signal
>> + 8
>>
>> libsystem_kernel.dylib`:
>>
>> ->  0x18a2d750c <+8>:  b.lo   0x18a2d752c   ; <+40>
>>
>> 0x18a2d7510 <+12>: pacibsp
>>
>> 0x18a2d7514 <+16>: stpx29, x30, [sp, #-0x10]!
>>
>> 0x18a2d7518 <+20>: movx29, sp
>>
>> Target 0: (fo_acoustic_streaming_solver_2d) stopped.
>>
>> Executable module set to
>> "/Users/amneetb/Softwares/IBAMR-Git/objs-dbg/tests/IBTK/fo_acoustic_streaming_solver_2d".
>>
>> Architecture set to: arm64-apple-macosx-.
>>
>> (lldb) cont
>>
>> Process 44691 resuming
>>
>> Process 44691 stopped
>>
>> * thread #1, queue = 'com.apple.main-thread', stop reason = signal
>> SIGSTOP
>>
>> frame #0: 0x00010ba40b60
>> libpmpi.12.dylib`MPIDI_POSIX_mpi_release_gather_release + 752
>>
>> libpmpi.12.dylib`MPIDI_POSIX_mpi_release_gather_release:
>>
>> ->  0x10ba40b60 <+752>: addw8, w8, #0x1
>>
>> 0x10ba40b64 <+756>: ldrw9, [x22]
>>
>> 0x10ba40b68 <+760>: cmpw8, w9
>>
>> 0x10ba40b6c <+764>: b.lt   0x10ba40b4c   ; <+732>
>>
>> Target 0: (fo_acoustic_streaming_solver_2d) stopped.
>>
>> (lldb) bt
>>
>> * thread #1, queue = 'com.apple.main-thread', stop reason = signal
>> SIGSTOP
>>
>>   * frame #0: 0x00010ba40b60
>> libpmpi.12.dylib`MPIDI_POSIX_mpi_release_gather_release + 752
>>
>> frame #1: 0x00010ba48528
>> libpmpi.12.dylib`MPIDI_POSIX_mpi_allreduce_release_gather + 1088
>>
>> frame #2: 0x00010ba47964
>> libpmpi.12.dylib`MPIDI_Allreduce_intra_composition_gamma + 368
>>
>> frame #3: 0x00010ba35e78 libpmpi.12.dylib`MPIR_Allreduce + 1588
>>
>> frame #4: 0x000103f587dc libmpi.12.dylib`MPI_Allreduce + 2280
>>
>> frame #5: 0x000106d67650
>> libpetsc.3.17.dylib`MatZeroRows_MPIAIJ(A=0x000105846470, N=1,
>> rows=0x00016dbfa9f4, diag=1, x=0x,
>> b=0x) at mpiaij.c:827:3
>>
>> frame #6: 0x000106aadfac
>> libpetsc.3.17.dylib`MatZeroRows(mat=0x000105846470, numRows=1,
>> rows=0x00016dbfa9f4, diag=1, x=0x,
>> b=0x) at matrix.c:5935:3
>>
>> frame #7: 0x0001023952d0
>> fo_acoustic_streaming_solver_2d`IBAMR::AcousticStreamingPETScMatUtilities::constructPatchLevelFOAcousticStreamingOp(mat=0x00016dc04168,
>> omega=1, sound_speed=1, rho_idx=3, mu_idx=2, lambda_idx=4,
>> u_bc_coefs=0x00016dc04398, data_time=NaN, num_dofs_per_proc=size=3,
>> u_dof_index_idx=27, p_dof_index_idx=28,
>> patch_level=Pointer > @ 0x00016dbfcec0,
>> mu_interp_type=VC_HARMONIC_INTERP) at
>> AcousticStreamingPETScMatUtilities.cpp:799:36
>>
>> frame #8: 0x0001023acb8c
>> fo_acoustic_streaming_solver_2d`IBAMR::FOAcousticStreamingPETScLevelSolver::initializeSolverStateSpecialized(this=0x00016dc04018,
>> x=0x00016dc05778, (null)=0x00016dc05680) at
>> FOAcousticStreamingPETScLevelSolver.cpp:149:5
>>
>> frame #9: 0x00010254a2dc
>> fo_acoustic_streaming_solver_2d`IBTK::PETScLevelSolver::initializeSolverState(this=0x00016dc04018,
>> x=0x00016dc05778, b=0x00016dc05680) at PETScLevelSolver.cpp:340:5
>>
>> frame #10: 0x000102202e5c
>> fo_acoustic_streaming_solver_2d`main(argc=11, argv=0x00016dc07450) at
>> fo_acoustic_streaming_solver.cpp:400:22
>>
>> frame #11: 0x000189fbbf28 dyld`start + 2236
>>
>> (lldb)
>>
>>
>> Task 2:
>>
>> amneetb@APSB-MacBook-Pro-16:~$ lldb  -p 44692
>>
>> (lldb) process attach --pid 44692
>>
>> Process 44692 stopped
>>
>> * thread #1, queue = 'com.apple.main-thread', stop reason = signal
>> SIGSTOP
>>
>> frame #0: 0x00018a2d750c libsystem_kernel.dylib`__semwait_signal
>> + 8
>>
>> libsystem_kernel.dylib`:
>>
>> ->  0x18a2d750c <+8>:  

Re: [petsc-users] MPI barrier issue using MatZeroRows

2023-11-29 Thread Matthew Knepley
On Wed, Nov 29, 2023 at 12:30 PM Amneet Bhalla 
wrote:

> Ok, I added both, but it still hangs. Here, is bt from all three tasks:
>

It looks like two processes are calling AllReduce, but one is not. Are all
procs not calling MatZeroRows?

  Thanks,

 Matt


> Task 1:
>
> amneetb@APSB-MacBook-Pro-16:~$ lldb  -p 44691
>
> (lldb) process attach --pid 44691
>
> Process 44691 stopped
>
> * thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
>
> frame #0: 0x00018a2d750c libsystem_kernel.dylib`__semwait_signal
> + 8
>
> libsystem_kernel.dylib`:
>
> ->  0x18a2d750c <+8>:  b.lo   0x18a2d752c   ; <+40>
>
> 0x18a2d7510 <+12>: pacibsp
>
> 0x18a2d7514 <+16>: stpx29, x30, [sp, #-0x10]!
>
> 0x18a2d7518 <+20>: movx29, sp
>
> Target 0: (fo_acoustic_streaming_solver_2d) stopped.
>
> Executable module set to
> "/Users/amneetb/Softwares/IBAMR-Git/objs-dbg/tests/IBTK/fo_acoustic_streaming_solver_2d".
>
> Architecture set to: arm64-apple-macosx-.
>
> (lldb) cont
>
> Process 44691 resuming
>
> Process 44691 stopped
>
> * thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
>
> frame #0: 0x00010ba40b60
> libpmpi.12.dylib`MPIDI_POSIX_mpi_release_gather_release + 752
>
> libpmpi.12.dylib`MPIDI_POSIX_mpi_release_gather_release:
>
> ->  0x10ba40b60 <+752>: addw8, w8, #0x1
>
> 0x10ba40b64 <+756>: ldrw9, [x22]
>
> 0x10ba40b68 <+760>: cmpw8, w9
>
> 0x10ba40b6c <+764>: b.lt   0x10ba40b4c   ; <+732>
>
> Target 0: (fo_acoustic_streaming_solver_2d) stopped.
>
> (lldb) bt
>
> * thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
>
>   * frame #0: 0x00010ba40b60
> libpmpi.12.dylib`MPIDI_POSIX_mpi_release_gather_release + 752
>
> frame #1: 0x00010ba48528
> libpmpi.12.dylib`MPIDI_POSIX_mpi_allreduce_release_gather + 1088
>
> frame #2: 0x00010ba47964
> libpmpi.12.dylib`MPIDI_Allreduce_intra_composition_gamma + 368
>
> frame #3: 0x00010ba35e78 libpmpi.12.dylib`MPIR_Allreduce + 1588
>
> frame #4: 0x000103f587dc libmpi.12.dylib`MPI_Allreduce + 2280
>
> frame #5: 0x000106d67650
> libpetsc.3.17.dylib`MatZeroRows_MPIAIJ(A=0x000105846470, N=1,
> rows=0x00016dbfa9f4, diag=1, x=0x,
> b=0x) at mpiaij.c:827:3
>
> frame #6: 0x000106aadfac
> libpetsc.3.17.dylib`MatZeroRows(mat=0x000105846470, numRows=1,
> rows=0x00016dbfa9f4, diag=1, x=0x,
> b=0x) at matrix.c:5935:3
>
> frame #7: 0x0001023952d0
> fo_acoustic_streaming_solver_2d`IBAMR::AcousticStreamingPETScMatUtilities::constructPatchLevelFOAcousticStreamingOp(mat=0x00016dc04168,
> omega=1, sound_speed=1, rho_idx=3, mu_idx=2, lambda_idx=4,
> u_bc_coefs=0x00016dc04398, data_time=NaN, num_dofs_per_proc=size=3,
> u_dof_index_idx=27, p_dof_index_idx=28,
> patch_level=Pointer > @ 0x00016dbfcec0,
> mu_interp_type=VC_HARMONIC_INTERP) at
> AcousticStreamingPETScMatUtilities.cpp:799:36
>
> frame #8: 0x0001023acb8c
> fo_acoustic_streaming_solver_2d`IBAMR::FOAcousticStreamingPETScLevelSolver::initializeSolverStateSpecialized(this=0x00016dc04018,
> x=0x00016dc05778, (null)=0x00016dc05680) at
> FOAcousticStreamingPETScLevelSolver.cpp:149:5
>
> frame #9: 0x00010254a2dc
> fo_acoustic_streaming_solver_2d`IBTK::PETScLevelSolver::initializeSolverState(this=0x00016dc04018,
> x=0x00016dc05778, b=0x00016dc05680) at PETScLevelSolver.cpp:340:5
>
> frame #10: 0x000102202e5c
> fo_acoustic_streaming_solver_2d`main(argc=11, argv=0x00016dc07450) at
> fo_acoustic_streaming_solver.cpp:400:22
>
> frame #11: 0x000189fbbf28 dyld`start + 2236
>
> (lldb)
>
>
> Task 2:
>
> amneetb@APSB-MacBook-Pro-16:~$ lldb  -p 44692
>
> (lldb) process attach --pid 44692
>
> Process 44692 stopped
>
> * thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
>
> frame #0: 0x00018a2d750c libsystem_kernel.dylib`__semwait_signal
> + 8
>
> libsystem_kernel.dylib`:
>
> ->  0x18a2d750c <+8>:  b.lo   0x18a2d752c   ; <+40>
>
> 0x18a2d7510 <+12>: pacibsp
>
> 0x18a2d7514 <+16>: stpx29, x30, [sp, #-0x10]!
>
> 0x18a2d7518 <+20>: movx29, sp
>
> Target 0: (fo_acoustic_streaming_solver_2d) stopped.
>
> Executable module set to
> "/Users/amneetb/Softwares/IBAMR-Git/objs-dbg/tests/IBTK/fo_acoustic_streaming_solver_2d".
>
> Architecture set to: arm64-apple-macosx-.
>
> (lldb) cont
>
> Process 44692 resuming
>
> Process 44692 stopped
>
> * thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
>
> frame #0: 0x00010e5a022c
> libpmpi.12.dylib`MPIDI_POSIX_mpi_barrier_release_gather + 516
>
> libpmpi.12.dylib`MPIDI_POSIX_mpi_barrier_release_gather:
>
> ->  0x10e5a022c <+516>: ldrx10, [x19, #0x4e8]
>
> 0x10e5a0230 <+520>: cmpx9, x10
>
> 0x10e5a0234 <+524>: b.hs   0x10e5a0254   ; <+556>
>
> 0x10e5a0238 <+528>: add  

Re: [petsc-users] [Xolotl-psi-development] [EXTERNAL] Re: Unexpected performance losses switching to COO interface

2023-11-29 Thread Fackler, Philip via petsc-users
I'm sorry for the extra confusion. I copied those arguments from the wrong 
place. We're actually using jacobi instead of sor for fieldsplit 0.
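So the option set actually in use presumably reads, reconstructing from the
two messages (for clarity only):

  -fieldsplit_0_pc_type jacobi -fieldsplit_1_pc_type redundant -pc_type fieldsplit -pc_fieldsplit_detect_coupling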

Philip Fackler
Research Software Engineer, Application Engineering Group
Advanced Computing Systems Research Section
Computer Science and Mathematics Division
Oak Ridge National Laboratory

From: Blondel, Sophie 
Sent: Wednesday, November 29, 2023 11:03
To: Brown, Jed ; Fackler, Philip ; 
Junchao Zhang 
Cc: petsc-users@mcs.anl.gov ; 
xolotl-psi-developm...@lists.sourceforge.net 

Subject: Re: [Xolotl-psi-development] [petsc-users] [EXTERNAL] Re: Unexpected 
performance losses switching to COO interface

Hi Jed,

I'm not sure I'm going to reply to your question correctly because I don't 
really understand how the split is done. Is it related to on diagonal and off 
diagonal? If so, the off-diagonal part is usually pretty small (less than 20 
DOFs) and related to diffusion, the diagonal part involves thousands of DOFs 
for the reaction term.

Let us know what we can do to answer this question more accurately.

Cheers,

Sophie

From: Jed Brown 
Sent: Tuesday, November 28, 2023 19:07
To: Fackler, Philip ; Junchao Zhang 

Cc: petsc-users@mcs.anl.gov ; 
xolotl-psi-developm...@lists.sourceforge.net 

Subject: Re: [Xolotl-psi-development] [petsc-users] [EXTERNAL] Re: Unexpected 
performance losses switching to COO interface


"Fackler, Philip via petsc-users"  writes:

> That makes sense. Here are the arguments that I think are relevant:
>
> -fieldsplit_1_pc_type redundant -fieldsplit_0_pc_type sor -pc_type fieldsplit 
> -pc_fieldsplit_detect_coupling

What sort of physics are in splits 0 and 1?

SOR is not a good GPU algorithm, so we'll want to change that one way or 
another. Are the splits of similar size or very different?

> What would you suggest to make this better?
>
> Also, note that the cases marked "serial" are running on CPU only, that is, 
> using only the SERIAL backend for kokkos.
>
> Philip Fackler
> Research Software Engineer, Application Engineering Group
> Advanced Computing Systems Research Section
> Computer Science and Mathematics Division
> Oak Ridge National Laboratory
> 
> From: Junchao Zhang 
> Sent: Tuesday, November 28, 2023 15:51
> To: Fackler, Philip 
> Cc: petsc-users@mcs.anl.gov ; 
> xolotl-psi-developm...@lists.sourceforge.net 
> 
> Subject: Re: [EXTERNAL] Re: [petsc-users] Unexpected performance losses 
> switching to COO interface
>
> Hi, Philip,
>I opened hpcdb-PSI_9-serial and it seems you used PCLU.  Since Kokkos does 
> not have a GPU LU implementation, we do it on CPU via 
> MatLUFactorNumeric_SeqAIJ(). Perhaps you can try other PC types?
>
> [Screenshot 2023-11-28 at 2.43.03 PM.png]
> --Junchao Zhang
>
>
> On Wed, Nov 22, 2023 at 10:43 AM Fackler, Philip 
> mailto:fackle...@ornl.gov>> wrote:
> I definitely dropped the ball on this. I'm sorry for that. I have new 
> profiling data using the latest (as of yesterday) of petsc/main. I've put 
> them in a single google drive folder linked here:
>
> https://drive.google.com/drive/folders/14ScvyfxOzc4OzXs9HZVeQDO-g6FdIVAI?usp=drive_link
>
> Have a happy holiday weekend!
>
> Thanks,
>
> Philip Fackler
> Research Software Engineer, Application Engineering Group
> Advanced Computing Systems Research Section
> Computer Science and Mathematics Division
> Oak Ridge National 

Re: [petsc-users] [Xolotl-psi-development] [EXTERNAL] Re: Unexpected performance losses switching to COO interface

2023-11-29 Thread Blondel, Sophie via petsc-users
Hi Jed,

I'm not sure I'm going to reply to your question correctly because I don't 
really understand how the split is done. Is it related to on diagonal and off 
diagonal? If so, the off-diagonal part is usually pretty small (less than 20 
DOFs) and related to diffusion, the diagonal part involves thousands of DOFs 
for the reaction term.

Let us know what we can do to answer this question more accurately.

Cheers,

Sophie

From: Jed Brown 
Sent: Tuesday, November 28, 2023 19:07
To: Fackler, Philip ; Junchao Zhang 

Cc: petsc-users@mcs.anl.gov ; 
xolotl-psi-developm...@lists.sourceforge.net 

Subject: Re: [Xolotl-psi-development] [petsc-users] [EXTERNAL] Re: Unexpected 
performance losses switching to COO interface


"Fackler, Philip via petsc-users"  writes:

> That makes sense. Here are the arguments that I think are relevant:
>
> -fieldsplit_1_pc_type redundant -fieldsplit_0_pc_type sor -pc_type fieldsplit 
> -pc_fieldsplit_detect_coupling

What sort of physics are in splits 0 and 1?

SOR is not a good GPU algorithm, so we'll want to change that one way or 
another. Are the splits of similar size or very different?

> What would you suggest to make this better?
>
> Also, note that the cases marked "serial" are running on CPU only, that is, 
> using only the SERIAL backend for kokkos.
>
> Philip Fackler
> Research Software Engineer, Application Engineering Group
> Advanced Computing Systems Research Section
> Computer Science and Mathematics Division
> Oak Ridge National Laboratory
> 
> From: Junchao Zhang 
> Sent: Tuesday, November 28, 2023 15:51
> To: Fackler, Philip 
> Cc: petsc-users@mcs.anl.gov ; 
> xolotl-psi-developm...@lists.sourceforge.net 
> 
> Subject: Re: [EXTERNAL] Re: [petsc-users] Unexpected performance losses 
> switching to COO interface
>
> Hi, Philip,
>I opened hpcdb-PSI_9-serial and it seems you used PCLU.  Since Kokkos does 
> not have a GPU LU implementation, we do it on CPU via 
> MatLUFactorNumeric_SeqAIJ(). Perhaps you can try other PC types?
>
> [Screenshot 2023-11-28 at 2.43.03 PM.png]
> --Junchao Zhang
>
>
> On Wed, Nov 22, 2023 at 10:43 AM Fackler, Philip 
> mailto:fackle...@ornl.gov>> wrote:
> I definitely dropped the ball on this. I'm sorry for that. I have new 
> profiling data using the latest (as of yesterday) of petsc/main. I've put 
> them in a single google drive folder linked here:
>
> https://drive.google.com/drive/folders/14ScvyfxOzc4OzXs9HZVeQDO-g6FdIVAI?usp=drive_link
>
> Have a happy holiday weekend!
>
> Thanks,
>
> Philip Fackler
> Research Software Engineer, Application Engineering Group
> Advanced Computing Systems Research Section
> Computer Science and Mathematics Division
> Oak Ridge National Laboratory
> 
> From: Junchao Zhang mailto:junchao.zh...@gmail.com>>
> Sent: Monday, October 16, 2023 15:24
> To: Fackler, Philip mailto:fackle...@ornl.gov>>
> Cc: petsc-users@mcs.anl.gov 
> mailto:petsc-users@mcs.anl.gov>>; 
> xolotl-psi-developm...@lists.sourceforge.net
>  
> mailto:xolotl-psi-developm...@lists.sourceforge.net>>
> Subject: Re: [EXTERNAL] Re: [petsc-users] Unexpected performance losses 
> switching to COO interface
>
> Hi, Philip,
>That branch was merged to petsc/main today. Let me know once you have new 
> profiling results.
>
>Thanks.
> --Junchao Zhang
>
>
> On Mon, Oct 16, 2023 at 9:33 AM Fackler, Philip 
> mailto:fackle...@ornl.gov>> wrote:
> Junchao,
>
> I've attached updated timing plots (red and blue are swapped from before; 
> yellow is the new one). There is an improvement for the NE_3 case only with 
> CUDA. Serial stays the same, and the PSI cases stay the same. In the PSI 
> cases, MatShift doesn't show up (I assume because we're using different 
> preconditioner arguments). So, there must be some other primary culprit. I'll 
> try to get updated profiling data to you soon.
>
> Thanks,
>
> Philip Fackler
> Research Software Engineer, Application Engineering Group
> Advanced Computing Systems Research Section
> Computer Science and Mathematics Division
> Oak Ridge National Laboratory
> 
> From: Fackler, Philip via Xolotl-psi-development 
> mailto:xolotl-psi-developm...@lists.sourceforge.net>>
> Sent: Wednesday, October 11, 2023 11:31
> To: Junchao Zhang mailto:junchao.zh...@gmail.com>>
> Cc: 

Re: [petsc-users] MPI barrier issue using MatZeroRows

2023-11-29 Thread Barry Smith


> On Nov 29, 2023, at 1:16 AM, Amneet Bhalla  wrote:
> 
> BTW, I think you meant using MatSetOption(mat, MAT_NO_OFF_PROC_ZERO_ROWS, 
> PETSC_TRUE)

Yes

>  instead of MatSetOption(mat, MAT_NO_OFF_PROC_ENTRIES, PETSC_TRUE) ??

  Please try setting both flags.

>  However, that also did not help to overcome the MPI Barrier issue. 

  If there is still a problem please trap all the MPI processes when they hang 
in the debugger and send the output from using bt on all of them. This way
we can see the different places the different MPI processes are stuck at.
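  For concreteness, setting both flags is a two-line change along these lines
(a sketch only; MAT_NO_OFF_PROC_ENTRIES is typically set before
MatAssemblyBegin/End so the assembly itself can also skip its reduction):

    MatSetOption(mat, MAT_NO_OFF_PROC_ENTRIES, PETSC_TRUE);   /* no rank sets values in rows it does not own */
    MatSetOption(mat, MAT_NO_OFF_PROC_ZERO_ROWS, PETSC_TRUE); /* rows passed to MatZeroRows() are all locally owned */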


> 
> On Tue, Nov 28, 2023 at 9:57 PM Amneet Bhalla  > wrote:
>> I added that option but the code still gets stuck at the same call 
>> MatZeroRows with 3 processors. 
>> 
>> On Tue, Nov 28, 2023 at 7:23 PM Amneet Bhalla > > wrote:
>>> 
>>> 
>>> On Tue, Nov 28, 2023 at 6:42 PM Barry Smith >> > wrote:
 
   for (int comp = 0; comp < 2; ++comp)
   {
       ...
       for (Box::Iterator bc(bc_coef_box); bc; bc++)
       {
           ..
           if (IBTK::abs_equal_eps(b, 0.0))
           {
               const double diag_value = a;
               ierr = MatZeroRows(mat, 1, &u_dof_index, diag_value, NULL, NULL);
               IBTK_CHKERRQ(ierr);
           }
       }
   }
 
 In general, this code will not work because each process calls MatZeroRows 
 a different number of times, so it cannot match up with all the processes.
 
 If u_dof_index is always local to the current process, you can call 
 MatSetOption(mat, MAT_NO_OFF_PROC_ENTRIES,PETSC_TRUE) above the for loop 
 and 
 the MatZeroRows will not synchronize across the MPI processes (since it 
 does not need to and you told it that).
>>> 
>>> Yes, u_dof_index is going to be local and I put a check on it a few lines 
>>> before calling MatZeroRows.
>>> 
>>> Can MatSetOption() be called after the matrix has been assembled?
>>> 
 
 If the u_dof_index will not always be local, then you need, on each 
 process, to list all the u_dof_index for each process in an array and then 
 call MatZeroRows()
 once after the loop so it can exchange the needed information with the 
 other MPI processes to get the row indices to the right place.
 
 Barry
 
 
 
 
> On Nov 28, 2023, at 6:44 PM, Amneet Bhalla  > wrote:
> 
> 
> Hi Folks, 
> 
> I am using MatZeroRows() to set Dirichlet boundary conditions. This works 
> fine for the serial run and the solver produces correct results (verified 
> through analytical solution). However, when I run the case in parallel, 
> the simulation gets stuck at MatZeroRows(). My understanding is that this 
> function needs to be called after the MatAssemblyBegin{End}() has been 
> called, and should be called by all processors. Here is that bit of the 
> code which calls MatZeroRows() after the matrix has been assembled
> 
> https://github.com/IBAMR/IBAMR/blob/amneetb/acoustically-driven-flows/src/acoustic_streaming/AcousticStreamingPETScMatUtilities.cpp#L724-L801
> 
> I ran the parallel code (on 3 processors) in the debugger 
> (-start_in_debugger). Below is the call stack from the processor that 
> gets stuck
> 
> amneetb@APSB-MBP-16:~$ lldb  -p 4307 
> (lldb) process attach --pid 4307
> Process 4307 stopped
> * thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
> frame #0: 0x00018a2d750c libsystem_kernel.dylib`__semwait_signal 
> + 8
> libsystem_kernel.dylib`:
> ->  0x18a2d750c <+8>:  b.lo   0x18a2d752c   ; <+40>
> 0x18a2d7510 <+12>: pacibsp 
> 0x18a2d7514 <+16>: stpx29, x30, [sp, #-0x10]!
> 0x18a2d7518 <+20>: movx29, sp
> Target 0: (fo_acoustic_streaming_solver_2d) stopped.
> Executable module set to 
> "/Users/amneetb/Softwares/IBAMR-Git/objs-dbg/tests/IBTK/fo_acoustic_streaming_solver_2d".
> Architecture set to: arm64-apple-macosx-.
> (lldb) cont
> Process 4307 resuming
> Process 4307 stopped
> * thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGSTOP
> frame #0: 0x000109d281b8 
> libpmpi.12.dylib`MPIDI_POSIX_mpi_barrier_release_gather + 400
> libpmpi.12.dylib`MPIDI_POSIX_mpi_barrier_release_gather:
> ->  0x109d281b8 <+400>: ldrw9, [x24]
> 0x109d281bc <+404>: cmpw8, w9
> 0x109d281c0 <+408>: b.lt    0x109d281a0   ; 
> <+376>
> 0x109d281c4 <+412>: bl 0x109d28e64   ; 
> MPID_Progress_test
> Target 0: 

[petsc-users] Reading VTK files in PETSc

2023-11-29 Thread Kevin G. Wang
Good morning everyone.

I use the following functions to output parallel vectors --- "globalVec" in
this example --- to VTK files. It works well, and is quite convenient.

~
  PetscViewer viewer;
  PetscViewerVTKOpen(PetscObjectComm((PetscObject)*dm), filename,
FILE_MODE_WRITE, &viewer);
  VecView(globalVec, viewer);
  PetscViewerDestroy(&viewer);
~

Now, I am trying to do the opposite. I would like to read the VTK files
generated by PETSc back into memory, and assign each one to a Vec. Could
someone let me know how this can be done?

Thanks!
Kevin


-- 
Kevin G. Wang, Ph.D.
Associate Professor
Kevin T. Crofton Department of Aerospace and Ocean Engineering
Virginia Tech
1600 Innovation Dr., VTSS Rm 224H, Blacksburg, VA 24061
Office: (540) 231-7547  |  Mobile: (650) 862-2663
URL: https://www.aoe.vt.edu/people/faculty/wang.html
Codes: https://github.com/kevinwgy