Re: [petsc-users] Nonconforming object sizes using TAO (TAOBNTR)

2023-01-17 Thread Barry Smith

   It appears that Tao is not written to allow multiple calls to TaoSolve() on the 
same Tao object; this is different from KSP, SNES, and TS. 

   If you look at the converged reason at the beginning of the second 
TaoSolve() you will see it is the reason that occurred in the first solve, and 
all the Tao data structures are still in the state they were in when the 
previous TaoSolve() ended. Thus it uses stale flags and the previous matrices 
incorrectly.

   Fixing this would be a largish process I think. 

   I added an error check to TaoSolve() that errors out if the converged reason 
is not "iterating" (meaning the Tao object was previously used and left in a 
stale state), so that this same problem won't come up for other users: 
https://gitlab.com/petsc/petsc/-/merge_requests/5986
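
   In the meantime, a possible workaround is to create a fresh Tao object for 
each solve instead of reusing one. A minimal sketch only, assuming the current 
Tao C API; FormFunctionGradient/FormHessian are illustrative user callbacks:

  for (PetscInt s = 0; s < nsolves; ++s) {   /* nsolves: however many solves are needed */
    Tao tao;
    PetscCall(TaoCreate(PETSC_COMM_WORLD, &tao));
    PetscCall(TaoSetType(tao, TAOBNTR));
    PetscCall(TaoSetSolution(tao, x));       /* reuse the previous solution as the initial guess */
    PetscCall(TaoSetVariableBounds(tao, xl, xu));
    PetscCall(TaoSetObjectiveAndGradient(tao, NULL, FormFunctionGradient, &user));
    PetscCall(TaoSetHessian(tao, H, H, FormHessian, &user));
    PetscCall(TaoSetFromOptions(tao));
    PetscCall(TaoSolve(tao));
    PetscCall(TaoDestroy(&tao));             /* discard all internal state before the next solve */
  }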

  Barry


> On Jan 16, 2023, at 5:07 PM, Blaise Bourdin  wrote:
> 
> Hi,
> 
> I am attaching a small modification of the eptorsion1.c example that 
> replicates the issue. It looks like this bug is triggered when the upper and 
> lower bounds are equal on enough (10) degrees of freedom.
>  
> 
>  Elastic-Plastic Torsion Problem -
> mx: 10 my: 10   
> 
> i: 0
> i: 1
> i: 2
> i: 3
> i: 4
> i: 5
> i: 6
> i: 7
> i: 8
> i: 9
> [0]PETSC ERROR: - Error Message 
> --
> [0]PETSC ERROR: Nonconforming object sizes
> [0]PETSC ERROR: Preconditioner number of local rows 44 does not equal input 
> vector size 54
> [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting.
> [0]PETSC ERROR: Petsc Development GIT revision: v3.18.2-242-g4615508c7fc  GIT 
> Date: 2022-11-28 10:21:46 -0600
> [0]PETSC ERROR: ./eptorsion1 on a ventura-gcc12.2-arm64-g64 named 
> bblaptop.math.mcmaster.ca by blaise Mon Jan 16 17:06:57 2023
> [0]PETSC ERROR: Configure options --CFLAGS="-Wimplicit-function-declaration 
> -Wunused" --FFLAGS="-ffree-line-length-none -fallow-argument-mismatch 
> -Wunused" --download-ctetgen=1 --download-exodusii=1 --download-hdf5=1 
> --download-hypre=1 --download-metis=1 --download-netcdf=1 --download-mumps=1 
> --download-parmetis=1 --download-pnetcdf=1 --download-scalapack 
> --download-triangle=1 --download-zlib=1 --with-64-bit-indices=1 
> --with-debugging=1 --with-exodusii-fortran-bindings --with-shared-libraries=1 
> --with-x11=0
> [0]PETSC ERROR: #1 PCApply() at 
> /opt/HPC/petsc-main/src/ksp/pc/interface/precon.c:434
> [0]PETSC ERROR: #2 KSP_PCApply() at 
> /opt/HPC/petsc-main/include/petsc/private/kspimpl.h:380
> [0]PETSC ERROR: #3 KSPCGSolve_STCG() at 
> /opt/HPC/petsc-main/src/ksp/ksp/impls/cg/stcg/stcg.c:76
> [0]PETSC ERROR: #4 KSPSolve_Private() at 
> /opt/HPC/petsc-main/src/ksp/ksp/interface/itfunc.c:898
> [0]PETSC ERROR: #5 KSPSolve() at 
> /opt/HPC/petsc-main/src/ksp/ksp/interface/itfunc.c:1070
> [0]PETSC ERROR: #6 TaoBNKComputeStep() at 
> /opt/HPC/petsc-main/src/tao/bound/impls/bnk/bnk.c:459
> [0]PETSC ERROR: #7 TaoSolve_BNTR() at 
> /opt/HPC/petsc-main/src/tao/bound/impls/bnk/bntr.c:138
> [0]PETSC ERROR: #8 TaoSolve() at 
> /opt/HPC/petsc-main/src/tao/interface/taosolver.c:177
> [0]PETSC ERROR: #9 main() at eptorsion1.c:166
> [0]PETSC ERROR: No PETSc Option Table entries
> [0]PETSC ERROR: End of Error Message ---send entire error 
> message to petsc-ma...@mcs.anl.gov--
> Abort(60) on node 0 (rank 0 in comm 16): application called 
> MPI_Abort(MPI_COMM_SELF, 60) - process 0
> 
> I hope that this helps.
> 
> Blaise
> 
> 
>> On Jan 16, 2023, at 3:14 PM, Alexis Marboeuf  
>> wrote:
>> 
>> Hi Matt,
>> After investigation, it fails because, at some point, the boolean needH is 
>> set to PETSC_FALSE when initializing the BNK method with TaoBNKInitialize 
>> (line 103 of $PETSC_DIR/src/tao/bound/impls/bnk/bntr.c). The Hessian and the 
>> preconditioner are thus not updated throughout the TAO iterations. It has 
>> something to do with the option BNK_INIT_INTERPOLATION, which is set by 
>> default; it works when I choose BNK_INIT_CONSTANT. In my case, in all the 
>> successful calls of TaoSolve, the computed trial objective value is better 
>> than the current value, which implies needH = PETSC_TRUE within 
>> TaoBNKInitialize. At some point, the trial value becomes equal to the 
>> current objective value up to machine precision, and then needH = 
>> PETSC_FALSE. I have to admit I am struggling to understand how the boolean 
>> needH is computed when BNK is initialized with BNK_INIT_INTERPOLATION. Can 
>> you help me with that?
>> Thanks a lot.
>> Alexis
>> From: Alexis Marboeuf 
>> Sent: Saturday, January 14, 2023 05:24
>> To: Matthew Knepley 
>> Cc: petsc-users@mcs.anl.gov 
>> Subject: RE: [petsc-users] Nonconforming object sizes using TAO (TAOBNTR)
>>  
>> Hi Matt,
>> Indeed, it fails on 1 process with the same error. The source code is 
>> available here: https://github.com/bourdin/mef90 (branch 
>> marboeuf/vdef-tao-test)

Re: [petsc-users] about repeat of expensive functions using VecScatterCreateToAll

2023-01-17 Thread Venugopal, Vysakh (venugovh) via petsc-users
Sure, I will try this. I will update this thread once I get it working using 
the suggested method. Thank you!

Vysakh

From: Blaise Bourdin 
Sent: Tuesday, January 17, 2023 5:13 PM
To: Venugopal, Vysakh (venugovh) 
Cc: Barry Smith ; petsc-users@mcs.anl.gov
Subject: Re: [petsc-users] about repeat of expensive functions using 
VecScatterCreateToAll


External Email: Use Caution


Got it. Can you partition your mesh with only one processor in the z-direction? 
(Trivial if using DMDA)
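For a DMDA, a rough sketch of such a layout (grid sizes M, N, P are 
placeholders; forcing p = 1 keeps each rank's subdomain full-height in z):

  DM da;
  PetscCall(DMDACreate3d(PETSC_COMM_WORLD, DM_BOUNDARY_NONE, DM_BOUNDARY_NONE, DM_BOUNDARY_NONE,
                         DMDA_STENCIL_BOX, M, N, P,
                         PETSC_DECIDE, PETSC_DECIDE, 1,   /* procs in x, y chosen by PETSc; exactly 1 in z */
                         1, 1, NULL, NULL, NULL, &da));   /* dof = 1, stencil width = 1 */
  PetscCall(DMSetFromOptions(da));
  PetscCall(DMSetUp(da));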
Blaise



On Jan 17, 2023, at 4:49 PM, Venugopal, Vysakh (venugovh) 
mailto:venug...@mail.uc.edu>> wrote:

This is the support structure minimization filter. So I need to go 
layer-by-layer from the bottommost slice of the array and update it as I move 
up. Every slice needs the updated values below that slice.

Vysakh

From: Blaise Bourdin mailto:bour...@mcmaster.ca>>
Sent: Tuesday, January 17, 2023 4:47 PM
To: Venugopal, Vysakh (venugovh) 
mailto:venug...@mail.uc.edu>>
Cc: Barry Smith mailto:bsm...@petsc.dev>>; 
petsc-users@mcs.anl.gov
Subject: Re: [petsc-users] about repeat of expensive functions using 
VecScatterCreateToAll


External Email: Use Caution



What type of filter are you implementing?
Convolution filters are expensive to parallelize since you need an overlap of 
the size of the support of the filter, but it may still not be worse than doing 
it sequentially (typically the filter size is only one or two element diameters). 
Or you may be able to apply the filter in Fourier space.
PDE-filters are typically elliptic and can be parallelized.

Blaise



On Jan 17, 2023, at 4:38 PM, Venugopal,
Vysakh (venugovh) via petsc-users 
mailto:petsc-users@mcs.anl.gov>> wrote:

Thank you! I am doing a structural optimization filter that inherently cannot 
be parallelized.

Vysakh

From: Barry Smith mailto:bsm...@petsc.dev>>
Sent: Tuesday, January 17, 2023 3:28 PM
To: Venugopal, Vysakh (venugovh) 
mailto:venug...@mail.uc.edu>>
Cc: petsc-users@mcs.anl.gov
Subject: Re: [petsc-users] about repeat of expensive functions using 
VecScatterCreateToAll


External Email: Use Caution

On Jan 17, 2023, at 3:12 PM, Venugopal, Vysakh (venugovh) via petsc-users 
mailto:petsc-users@mcs.anl.gov>> wrote:

Hi,

I am doing the following thing.

Step 1. Create DM object and get global vector 'V' using DMGetGlobalVector.
Step 2. Doing some parallel operations on V.
Step 3. I am using VecScatterCreateToAll on V to create a sequential vector 
'V_SEQ' using VecScatterBegin/End with SCATTER_FORWARD.
Step 4. I am performing an expensive operation on V_SEQ and outputting the 
updated V_SEQ.
Step 5. I am using VecScatterBegin/End with SCATTER_REVERSE (global and 
sequential flipped) to get V that is updated with new values from V_SEQ.
Step 6. I continue using this new V on the rest of the parallelized program.

Question: Suppose I have n MPI processes, is the expensive operation in Step 4 
repeated n times? If yes, is there a workaround such that the operation in Step 
4 is performed only once? I would like to follow the same structure as steps 1 
to 6 with step 4 only performed once.

  Each MPI rank is doing the same operations on its copy of the sequential 
vector. Since they are running in parallel it probably does not matter much 
that each is doing the same computation. Step 5 does not require any MPI since 
only part of the sequential vector (which everyone has) is needed in the 
parallel vector.

  You could use VecScatterCreateToZero(), but then step 3 would require less 
communication while step 5 would require communication to get parts of the 
solution from rank 0 to the other ranks. The time for step 4 would be roughly 
the same.

  You will likely only see a worthwhile improvement in performance if you can 
parallelize the computation in step 4. What are you doing that is 
computationally intense and requires all the data on a rank?

Barry

Thanks,

Vysakh Venugopal
---
Vysakh Venugopal
Ph.D. Candidate
Department of Mechanical Engineering
University of Cincinnati, Cincinnati, OH 45221-0072

-
Canada Research Chair in Mathematical and Computational Aspects of Solid 
Mechanics (Tier 1)
Professor, Department of Mathematics & Statistics
Hamilton Hall room 409A, McMaster University
1280 Main Street West, Hamilton, Ontario L8S 4K1, Canada
https://www.math.mcmaster.ca/bourdin | +1 (905) 525 9140 ext. 27243

Re: [petsc-users] Performance problem using COO interface

2023-01-17 Thread Zhang, Junchao via petsc-users
Hi, Philip,
  Could you add -log_view and see what functions are used in the solve? Since 
it is CPU-only, perhaps with -log_view of different runs, we can easily see 
which functions slowed down.
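For example (executable name and arguments are illustrative), something like:

  mpiexec -n 1 ./xolotl <your usual arguments> -log_view > dev-log.txt
  mpiexec -n 1 ./xolotl <your usual arguments> -dm_mat_type aijkokkos -dm_vec_type kokkos -log_view > port-log.txt

and then compare the time and flop columns of the event tables in the two logs.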

--Junchao Zhang

From: Fackler, Philip 
Sent: Tuesday, January 17, 2023 4:13 PM
To: xolotl-psi-developm...@lists.sourceforge.net 
; petsc-users@mcs.anl.gov 

Cc: Mills, Richard Tran ; Zhang, Junchao 
; Blondel, Sophie ; Roth, Philip 

Subject: Performance problem using COO interface

In Xolotl's feature-petsc-kokkos branch I have ported the code to use petsc's 
COO interface for creating the Jacobian matrix (and the Kokkos interface for 
interacting with Vec entries). As the attached plots show for one case, while 
the code for computing the RHSFunction and RHSJacobian perform similarly (or 
slightly better) after the port, the performance for the solve as a whole is 
significantly worse.

Note:
This is all CPU-only (so kokkos and kokkos-kernels are built with only the 
serial backend).
The dev version is using MatSetValuesStencil with the default implementations 
for Mat and Vec.
The port version is using MatSetValuesCOO and is run with -dm_mat_type 
aijkokkos -dm_vec_type kokkos.
The port/def version is using MatSetValuesCOO and is run with -dm_vec_type 
kokkos (using the default Mat implementation).

So, this seems to be due to a performance difference in the petsc 
implementations. Please advise. Is this a known issue? Or am I missing 
something?

Thank you for the help,

Philip Fackler
Research Software Engineer, Application Engineering Group
Advanced Computing Systems Research Section
Computer Science and Mathematics Division
Oak Ridge National Laboratory


Re: [petsc-users] about repeat of expensive functions using VecScatterCreateToAll

2023-01-17 Thread Blaise Bourdin

Got it. Can you partition your mesh with only one processor in the z-direction? 
(Trivial if using DMDA)
Blaise


On Jan 17, 2023, at 4:49 PM, Venugopal, Vysakh (venugovh)  wrote:

This is the support structure minimization filter. So I need to go 
layer-by-layer from the bottommost slice of the array and update it as I move 
up. Every slice needs the updated values below that slice.

Vysakh

From: Blaise Bourdin 
Sent: Tuesday, January 17, 2023 4:47 PM
To: Venugopal, Vysakh (venugovh) 
Cc: Barry Smith ; petsc-users@mcs.anl.gov
Subject: Re: [petsc-users] about repeat of expensive functions using 
VecScatterCreateToAll


External Email: Use Caution


What type of filter are you implementing?
Convolution filters are expensive to parallelize since you need an overlap of 
the size of the support of the filter, but it may still not be worse than doing 
it sequentially (typically the filter size is only one or two element diameters). 
Or you may be able to apply the filter in Fourier space.
PDE-filters are typically elliptic and can be parallelized.

Blaise


On Jan 17, 2023, at 4:38 PM, Venugopal, Vysakh (venugovh) via petsc-users 
 wrote:

Thank you! I am doing a structural optimization filter that inherently cannot 
be parallelized.

Vysakh

From: Barry Smith 
Sent: Tuesday, January 17, 2023 3:28 PM
To: Venugopal, Vysakh (venugovh) 
Cc: petsc-users@mcs.anl.gov
Subject: Re: [petsc-users] about repeat of expensive functions using 
VecScatterCreateToAll


External Email: Use Caution


On Jan 17, 2023, at 3:12 PM, Venugopal, Vysakh (venugovh) via petsc-users 
 wrote:

Hi,

I am doing the following thing.

Step 1. Create DM object and get global vector 'V' using DMGetGlobalVector.
Step 2. Doing some parallel operations on V.
Step 3. I am using VecScatterCreateToAll on V to create a sequential vector 
'V_SEQ' using VecScatterBegin/End with SCATTER_FORWARD.
Step 4. I am performing an expensive operation on V_SEQ and outputting the 
updated V_SEQ.
Step 5. I am using VecScatterBegin/End with SCATTER_REVERSE (global and 
sequential flipped) to get V that is updated with new values from V_SEQ.
Step 6. I continue using this new V on the rest of the parallelized program.

Question: Suppose I have n MPI processes, is the expensive operation in Step 4 
repeated n times? If yes, is there a workaround such that the operation in Step 
4 is performed only once? I would like to follow the same structure as steps 1 
to 6 with step 4 only performed once.

  Each MPI rank is doing the same operations on its copy of the sequential 
vector. Since they are running in parallel it probably does not matter much 
that each is doing the same computation. Step 5 does not require any MPI since 
only part of the sequential vector (which everyone has) is needed in the 
parallel vector.

  You could use VecScatterCreateToZero(), but then step 3 would require less 
communication while step 5 would require communication to get parts of the 
solution from rank 0 to the other ranks. The time for step 4 would be roughly 
the same.

  You will likely only see a worthwhile improvement in performance if you can 
parallelize the computation in step 4. What are you doing that is 
computationally intense and requires all the data on a rank?

Barry

Thanks,

Vysakh Venugopal
---
Vysakh Venugopal
Ph.D. Candidate
Department of Mechanical Engineering
University of Cincinnati, Cincinnati, OH 45221-0072

— 
Canada Research Chair in Mathematical and Computational Aspects of Solid 
Mechanics (Tier 1)
Professor, Department of Mathematics & Statistics
Hamilton Hall room 409A, McMaster University
1280 Main Street West, Hamilton, Ontario L8S 4K1, Canada 
https://www.math.mcmaster.ca/bourdin | +1 (905) 525 9140 ext. 27243

— 
Canada Research Chair in Mathematical and Computational Aspects of Solid 
Mechanics (Tier 1)
Professor, Department of Mathematics & Statistics
Hamilton Hall room 409A, McMaster University
1280 Main Street West, Hamilton, Ontario L8S 4K1, Canada 
https://www.math.mcmaster.ca/bourdin | +1 (905) 525 9140 ext. 27243


[petsc-users] Performance problem using COO interface

2023-01-17 Thread Fackler, Philip via petsc-users
In Xolotl's feature-petsc-kokkos branch I have ported the code to use petsc's 
COO interface for creating the Jacobian matrix (and the Kokkos interface for 
interacting with Vec entries). As the attached plots show for one case, while 
the code for computing the RHSFunction and RHSJacobian perform similarly (or 
slightly better) after the port, the performance for the solve as a whole is 
significantly worse.

Note:
This is all CPU-only (so kokkos and kokkos-kernels are built with only the 
serial backend).
The dev version is using MatSetValuesStencil with the default implementations 
for Mat and Vec.
The port version is using MatSetValuesCOO and is run with -dm_mat_type 
aijkokkos -dm_vec_type kokkos.
The port/def version is using MatSetValuesCOO and is run with -dm_vec_type 
kokkos (using the default Mat implementation).
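
For reference, the COO assembly path amounts to something like the following 
sketch (variable names are illustrative; the index arrays are built once from 
the stencil, the values are recomputed at every Jacobian evaluation):

  Mat          J;                /* created by the DM, e.g. with -dm_mat_type aijkokkos */
  PetscCount   ncoo;             /* total number of (i, j, v) triplets contributed      */
  PetscInt    *coo_i, *coo_j;    /* row/column indices, set up once                     */
  PetscScalar *coo_v;            /* values, refilled in the same order each time        */

  PetscCall(MatSetPreallocationCOO(J, ncoo, coo_i, coo_j));   /* one-time setup        */
  PetscCall(MatSetValuesCOO(J, coo_v, INSERT_VALUES));        /* each RHSJacobian call */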

So, this seems to be due to a performance difference in the petsc 
implementations. Please advise. Is this a known issue? Or am I missing 
something?

Thank you for the help,

Philip Fackler
Research Software Engineer, Application Engineering Group
Advanced Computing Systems Research Section
Computer Science and Mathematics Division
Oak Ridge National Laboratory


Re: [petsc-users] about repeat of expensive functions using VecScatterCreateToAll

2023-01-17 Thread Venugopal, Vysakh (venugovh) via petsc-users
This is the support structure minimization filter. So I need to go 
layer-by-layer from the bottommost slice of the array and update it as I move 
up. Every slice needs the updated values below that slice.

Vysakh

From: Blaise Bourdin 
Sent: Tuesday, January 17, 2023 4:47 PM
To: Venugopal, Vysakh (venugovh) 
Cc: Barry Smith ; petsc-users@mcs.anl.gov
Subject: Re: [petsc-users] about repeat of expensive functions using 
VecScatterCreateToAll


External Email: Use Caution


What type of filter are you implementing?
Convolution filters are expensive to parallelize since you need an overlap of 
the size of the support of the filter, but it may still not be worse than doing 
it sequentially (typically the filter size is only one or two element diameters). 
Or you may be able to apply the filter in Fourier space.
PDE-filters are typically elliptic and can be parallelized.

Blaise


On Jan 17, 2023, at 4:38 PM, Venugopal,
Vysakh (venugovh) via petsc-users 
mailto:petsc-users@mcs.anl.gov>> wrote:

Thank you! I am doing a structural optimization filter that inherently cannot 
be parallelized.

Vysakh

From: Barry Smith mailto:bsm...@petsc.dev>>
Sent: Tuesday, January 17, 2023 3:28 PM
To: Venugopal, Vysakh (venugovh) 
mailto:venug...@mail.uc.edu>>
Cc: petsc-users@mcs.anl.gov
Subject: Re: [petsc-users] about repeat of expensive functions using 
VecScatterCreateToAll


External Email: Use Caution







On Jan 17, 2023, at 3:12 PM, Venugopal, Vysakh (venugovh) via petsc-users 
mailto:petsc-users@mcs.anl.gov>> wrote:

Hi,

I am doing the following thing.

Step 1. Create DM object and get global vector ‘V’ using DMGetGlobalVector.
Step 2. Doing some parallel operations on V.
Step 3. I am using VecScatterCreateToAll on V to create a sequential vector 
‘V_SEQ’ using VecScatterBegin/End with SCATTER_FORWARD.
Step 4. I am performing an expensive operation on V_SEQ and outputting the 
updated V_SEQ.
Step 5. I am using VecScatterBegin/End with SCATTER_REVERSE (global and 
sequential flipped) to get V that is updated with new values from V_SEQ.
Step 6. I continue using this new V on the rest of the parallelized program.

Question: Suppose I have n MPI processes, is the expensive operation in Step 4 
repeated n times? If yes, is there a workaround such that the operation in Step 
4 is performed only once? I would like to follow the same structure as steps 1 
to 6 with step 4 only performed once.

  Each MPI rank is doing the same operations on its copy of the sequential 
vector. Since they are running in parallel it probably does not matter much 
that each is doing the same computation. Step 5 does not require any MPI since 
only part of the sequential vector (which everyone has) is needed in the 
parallel vector.

  You could use VecScatterCreateToZero(), but then step 3 would require less 
communication while step 5 would require communication to get parts of the 
solution from rank 0 to the other ranks. The time for step 4 would be roughly 
the same.

  You will likely only see a worthwhile improvement in performance if you can 
parallelize the computation in step 4. What are you doing that is 
computationally intense and requires all the data on a rank?

Barry




Thanks,

Vysakh Venugopal
---
Vysakh Venugopal
Ph.D. Candidate
Department of Mechanical Engineering
University of Cincinnati, Cincinnati, OH 45221-0072

—
Canada Research Chair in Mathematical and Computational Aspects of Solid 
Mechanics (Tier 1)
Professor, Department of Mathematics & Statistics
Hamilton Hall room 409A, McMaster University
1280 Main Street West, Hamilton, Ontario L8S 4K1, Canada
https://www.math.mcmaster.ca/bourdin | +1 (905) 525 9140 ext. 27243



Re: [petsc-users] about repeat of expensive functions using VecScatterCreateToAll

2023-01-17 Thread Blaise Bourdin

What type of filter are you implementing?
Convolution filters are expensive to parallelize since you need an overlap of 
the size of the support of the filter, but it may still not be worse than doing 
it sequentially (typically the filter size is only one or two element diameters). 
Or you may be able to apply the filter in Fourier space.
PDE-filters are typically elliptic and can be parallelized.

Blaise


On Jan 17, 2023, at 4:38 PM, Venugopal, Vysakh (venugovh) via petsc-users 
 wrote:

Thank you! I am doing a structural optimization filter that inherently cannot 
be parallelized.

Vysakh

From: Barry Smith 
Sent: Tuesday, January 17, 2023 3:28 PM
To: Venugopal, Vysakh (venugovh) 
Cc: petsc-users@mcs.anl.gov
Subject: Re: [petsc-users] about repeat of expensive functions using 
VecScatterCreateToAll


External Email: Use Caution


On Jan 17, 2023, at 3:12 PM, Venugopal, Vysakh (venugovh) via petsc-users 
 wrote:

Hi,

I am doing the following thing.

Step 1. Create DM object and get global vector 'V' using DMGetGlobalVector.
Step 2. Doing some parallel operations on V.
Step 3. I am using VecScatterCreateToAll on V to create a sequential vector 
'V_SEQ' using VecScatterBegin/End with SCATTER_FORWARD.
Step 4. I am performing an expensive operation on V_SEQ and outputting the 
updated V_SEQ.
Step 5. I am using VecScatterBegin/End with SCATTER_REVERSE (global and 
sequential flipped) to get V that is updated with new values from V_SEQ.
Step 6. I continue using this new V on the rest of the parallelized program.

Question: Suppose I have n MPI processes, is the expensive operation in Step 4 
repeated n times? If yes, is there a workaround such that the operation in Step 
4 is performed only once? I would like to follow the same structure as steps 1 
to 6 with step 4 only performed once.

  Each MPI rank is doing the same operations on its copy of the sequential 
vector. Since they are running in parallel it probably does not matter much 
that each is doing the same computation. Step 5 does not require any MPI since 
only part of the sequential vector (which everyone has) is needed in the 
parallel vector.

  You could use VecScatterCreateToZero(), but then step 3 would require less 
communication while step 5 would require communication to get parts of the 
solution from rank 0 to the other ranks. The time for step 4 would be roughly 
the same.

  You will likely only see a worthwhile improvement in performance if you can 
parallelize the computation in step 4. What are you doing that is 
computationally intense and requires all the data on a rank?

Barry

Thanks,

Vysakh Venugopal
---
Vysakh Venugopal
Ph.D. Candidate
Department of Mechanical Engineering
University of Cincinnati, Cincinnati, OH 45221-0072

— 
Canada Research Chair in Mathematical and Computational Aspects of Solid 
Mechanics (Tier 1)
Professor, Department of Mathematics & Statistics
Hamilton Hall room 409A, McMaster University
1280 Main Street West, Hamilton, Ontario L8S 4K1, Canada 
https://www.math.mcmaster.ca/bourdin | +1 (905) 525 9140 ext. 27243


Re: [petsc-users] about repeat of expensive functions using VecScatterCreateToAll

2023-01-17 Thread Venugopal, Vysakh (venugovh) via petsc-users
Thank you! I am doing a structural optimization filter that inherently cannot 
be parallelized.

Vysakh

From: Barry Smith 
Sent: Tuesday, January 17, 2023 3:28 PM
To: Venugopal, Vysakh (venugovh) 
Cc: petsc-users@mcs.anl.gov
Subject: Re: [petsc-users] about repeat of expensive functions using 
VecScatterCreateToAll


External Email: Use Caution





On Jan 17, 2023, at 3:12 PM, Venugopal, Vysakh (venugovh) via petsc-users 
mailto:petsc-users@mcs.anl.gov>> wrote:

Hi,

I am doing the following thing.

Step 1. Create DM object and get global vector ‘V’ using DMGetGlobalVector.
Step 2. Doing some parallel operations on V.
Step 3. I am using VecScatterCreateToAll on V to create a sequential vector 
‘V_SEQ’ using VecScatterBegin/End with SCATTER_FORWARD.
Step 4. I am performing an expensive operation on V_SEQ and outputting the 
updated V_SEQ.
Step 5. I am using VecScatterBegin/End with SCATTER_REVERSE (global and 
sequential flipped) to get V that is updated with new values from V_SEQ.
Step 6. I continue using this new V on the rest of the parallelized program.

Question: Suppose I have n MPI processes, is the expensive operation in Step 4 
repeated n times? If yes, is there a workaround such that the operation in Step 
4 is performed only once? I would like to follow the same structure as steps 1 
to 6 with step 4 only performed once.

  Each MPI rank is doing the same operations on its copy of the sequential 
vector. Since they are running in parallel it probably does not matter much 
that each is doing the same computation. Step 5 does not require any MPI since 
only part of the sequential vector (which everyone has) is needed in the 
parallel vector.

  You could use VecScatterCreateToZero(), but then step 3 would require less 
communication while step 5 would require communication to get parts of the 
solution from rank 0 to the other ranks. The time for step 4 would be roughly 
the same.

  You will likely only see a worthwhile improvement in performance if you can 
parallelize the computation in step 4. What are you doing that is 
computationally intense and requires all the data on a rank?

Barry



Thanks,

Vysakh Venugopal
---
Vysakh Venugopal
Ph.D. Candidate
Department of Mechanical Engineering
University of Cincinnati, Cincinnati, OH 45221-0072



Re: [petsc-users] about repeat of expensive functions using VecScatterCreateToAll

2023-01-17 Thread Barry Smith


> On Jan 17, 2023, at 3:12 PM, Venugopal, Vysakh (venugovh) via petsc-users 
>  wrote:
> 
> Hi,
>  
> I am doing the following thing.
>  
> Step 1. Create DM object and get global vector ‘V’ using DMGetGlobalVector.
> Step 2. Doing some parallel operations on V.
> Step 3. I am using VecScatterCreateToAll on V to create a sequential vector 
> ‘V_SEQ’ using VecScatterBegin/End with SCATTER_FORWARD.
> Step 4. I am performing an expensive operation on V_SEQ and outputting the 
> updated V_SEQ.
> Step 5. I am using VecScatterBegin/End with SCATTER_REVERSE (global and 
> sequential flipped) to get V that is updated with new values from V_SEQ.
> Step 6. I continue using this new V on the rest of the parallelized program.
>  
> Question: Suppose I have n MPI processes, is the expensive operation in Step 
> 4 repeated n times? If yes, is there a workaround such that the operation in 
> Step 4 is performed only once? I would like to follow the same structure as 
> steps 1 to 6 with step 4 only performed once.

  Each MPI rank is doing the same operations on its copy of the sequential 
vector. Since they are running in parallel it probably does not matter much 
that each is doing the same computation. Step 5 does not require any MPI since 
only part of the sequential vector (which everyone has) is needed in the 
parallel vector.

  You could use VecScatterCreateToZero(), but then step 3 would require less 
communication while step 5 would require communication to get parts of the 
solution from rank 0 to the other ranks. The time for step 4 would be roughly 
the same.
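
A rough sketch of that variant (assuming the expensive routine, called 
ExpensiveOp() here purely for illustration, can run on rank 0 alone):

  Vec         V_0;
  VecScatter  ctx;
  PetscMPIInt rank;

  PetscCallMPI(MPI_Comm_rank(PETSC_COMM_WORLD, &rank));
  PetscCall(VecScatterCreateToZero(V, &ctx, &V_0));   /* V_0 has full length on rank 0, length 0 elsewhere */
  PetscCall(VecScatterBegin(ctx, V, V_0, INSERT_VALUES, SCATTER_FORWARD));
  PetscCall(VecScatterEnd(ctx, V, V_0, INSERT_VALUES, SCATTER_FORWARD));
  if (rank == 0) PetscCall(ExpensiveOp(V_0));         /* step 4, now done only once */
  PetscCall(VecScatterBegin(ctx, V_0, V, INSERT_VALUES, SCATTER_REVERSE));
  PetscCall(VecScatterEnd(ctx, V_0, V, INSERT_VALUES, SCATTER_REVERSE));
  PetscCall(VecScatterDestroy(&ctx));
  PetscCall(VecDestroy(&V_0));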

  You will likely only see a worthwhile improvement in performance if you can 
parallelize the computation in step 4. What are you doing that is 
computationally intense and requires all the data on a rank?

Barry

>  
> Thanks,
>  
> Vysakh Venugopal
> ---
> Vysakh Venugopal
> Ph.D. Candidate
> Department of Mechanical Engineering
> University of Cincinnati, Cincinnati, OH 45221-0072



[petsc-users] about repeat of expensive functions using VecScatterCreateToAll

2023-01-17 Thread Venugopal, Vysakh (venugovh) via petsc-users
Hi,

I am doing the following thing.

Step 1. Create DM object and get global vector 'V' using DMGetGlobalVector.
Step 2. Doing some parallel operations on V.
Step 3. I am using VecScatterCreateToAll on V to create a sequential vector 
'V_SEQ' using VecScatterBegin/End with SCATTER_FORWARD.
Step 4. I am performing an expensive operation on V_SEQ and outputting the 
updated V_SEQ.
Step 5. I am using VecScatterBegin/End with SCATTER_REVERSE (global and 
sequential flipped) to get V that is updated with new values from V_SEQ.
Step 6. I continue using this new V on the rest of the parallelized program.

Question: Suppose I have n MPI processes, is the expensive operation in Step 4 
repeated n times? If yes, is there a workaround such that the operation in Step 
4 is performed only once? I would like to follow the same structure as steps 1 
to 6 with step 4 only performed once.
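
For concreteness, steps 3-5 look roughly like this (a sketch only; ExpensiveOp() 
stands in for the operation in step 4, and error checking uses PetscCall):

  Vec        V_SEQ;
  VecScatter ctx;

  /* Step 3: every rank gets a full sequential copy of V */
  PetscCall(VecScatterCreateToAll(V, &ctx, &V_SEQ));
  PetscCall(VecScatterBegin(ctx, V, V_SEQ, INSERT_VALUES, SCATTER_FORWARD));
  PetscCall(VecScatterEnd(ctx, V, V_SEQ, INSERT_VALUES, SCATTER_FORWARD));

  /* Step 4: expensive operation on the sequential copy */
  PetscCall(ExpensiveOp(V_SEQ));

  /* Step 5: push the updated values back into the parallel vector */
  PetscCall(VecScatterBegin(ctx, V_SEQ, V, INSERT_VALUES, SCATTER_REVERSE));
  PetscCall(VecScatterEnd(ctx, V_SEQ, V, INSERT_VALUES, SCATTER_REVERSE));
  PetscCall(VecScatterDestroy(&ctx));
  PetscCall(VecDestroy(&V_SEQ));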

Thanks,

Vysakh Venugopal
---
Vysakh Venugopal
Ph.D. Candidate
Department of Mechanical Engineering
University of Cincinnati, Cincinnati, OH 45221-0072



Re: [petsc-users] DMPlex and CGNS

2023-01-17 Thread Jed Brown
Copying my private reply that appeared off-list. If you have one base with 
different element types, that's in scope for what I plan to develop soon.

Congrats, you crashed cgnsview.

$ cgnsview dl/HybridGrid.cgns
Error in startup script: file was not found
while executing
"CGNSfile $ProgData(file)"
(procedure "file_stats" line 4)
invoked from within
"file_stats"
(procedure "file_load" line 53)
invoked from within
"file_load $fname"
invoked from within
"if {$argc} {
  set fname [lindex $argv [expr $argc - 1]]
  if {[file isfile $fname] && [file readable $fname]} {
file_load $fname
  }
}"
(file "/usr/share/cgnstools/cgnsview.tcl" line 3013)

This file looks okay in cgnscheck and paraview, but I don't have a need for 
multi-block and I'm stretched really thin so probably won't make it work any 
time soon. But if
you make a single block with HexElements alongside PyramidElements and 
TetElements, I should be able to read it. If you don't mind prepping such a 
file (this size or
smaller), it would help me test.


"Engblom, William A."  writes:

> Jesus,
>
> The CGNS files we get from Pointwise have only one base, so that should not 
> be an issue.  However, sections are needed to contain each cell type, the 
> BCs, and zonal boundaries. So, there are always several sections.  The grid 
> that Spencer made for you must have multiple sections.  We have to be able to 
> deal with grids like Spencer's example or else it's not useful.
>
> B.
>
>
>
>
>
>
> 
> From: Ferrand, Jesus A. 
> Sent: Monday, January 16, 2023 5:41 PM
> To: petsc-users@mcs.anl.gov 
> Subject: DMPlex and CGNS
>
> Dear PETSc team:
>
> I would like to use DMPlex to partition a mesh stored as a CGNS file. I 
> configured my installation with --download-cgns=1, got a .cgns file, and 
> called DMPlexCreateCGNSFromFile() on it. Doing so got me this error:
>
> [0]PETSC ERROR: - Error Message 
> --
> [0]PETSC ERROR: Error in external library
> [0]PETSC ERROR: CGNS file must have a single section, not 4
> [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting.
> [0]PETSC ERROR: Petsc Release Version 3.18.3, unknown
> [0]PETSC ERROR: ./program.exe on a arch-linux-c-debug named F86 by jesus Mon 
> Jan 16 17:25:11 2023
> [0]PETSC ERROR: Configure options --download-mpich=yes --download-hdf5=yes 
> --download-cgns=yes --download-metis=yes --download-parmetis=yes 
> --download-ptscotch=yes --download-chaco=yes --with-32bits-pci-domain=1
> [0]PETSC ERROR: #1 DMPlexCreateCGNS_Internal() at 
> /home/jesus/Desktop/JAF_NML/3rd_Party/PETSc/petsc/src/dm/impls/plex/cgns/plexcgns2.c:104
> [0]PETSC ERROR: #2 DMPlexCreateCGNS() at 
> /home/jesus/Desktop/JAF_NML/3rd_Party/PETSc/petsc/src/dm/impls/plex/plexcgns.c:60
> [0]PETSC ERROR: #3 DMPlexCreateCGNSFromFile_Internal() at 
> /home/jesus/Desktop/JAF_NML/3rd_Party/PETSc/petsc/src/dm/impls/plex/cgns/plexcgns2.c:27
> [0]PETSC ERROR: #4 DMPlexCreateCGNSFromFile() at 
> /home/jesus/Desktop/JAF_NML/3rd_Party/PETSc/petsc/src/dm/impls/plex/plexcgns.c:29
>
> I looked around the mail archives for clues and found this one 
> (https://lists.mcs.anl.gov/pipermail/petsc-users/2018-June/035544.html). 
> There, Matt provides a link to the source code for DMPlexCreateCGNSFromFile() 
> and another (seemingly broken) link to CGNS files that can be opened with the 
> former.  After reading the source code I now understand that it is hardcoded 
> for CGNS files that feature a single "base" and a single "section", whatever 
> those are.
>
> After navigating the CGNS documentation, I can sympathize with the comments 
> in the source code.
>
> Anyhow, I wanted to ask if I could be furnished with one such CGNS file that 
> is compatible with DMPlexCreateCGNSFromFile() to see if I can modify my CGNS 
> files to conform to it. If not, I will look into building the DAG myself 
> using DMPlex APIs.
>
>
> Sincerely:
>
> J.A. Ferrand
>
> Embry-Riddle Aeronautical University - Daytona Beach FL
>
> M.Sc. Aerospace Engineering
>
> B.Sc. Aerospace Engineering
>
> B.Sc. Computational Mathematics
>
>
> Phone: (386)-843-1829
>
> Email(s): ferra...@my.erau.edu
>
> jesus.ferr...@gmail.com


Re: [petsc-users] DMPlex and CGNS

2023-01-17 Thread Engblom, William A.
Jesus,

The CGNS files we get from Pointwise have only one base, so that should not be 
an issue.  However, sections are needed to contain each cell type, the BCs, and 
zonal boundaries. So, there are always several sections.  The grid that Spencer 
made for you must have multiple sections.  We have to be able to deal with 
grids like Spencer's example or else it's not useful.

B.

From: Ferrand, Jesus A. 
Sent: Monday, January 16, 2023 5:41 PM
To: petsc-users@mcs.anl.gov 
Subject: DMPlex and CGNS

Dear PETSc team:

I would like to use DMPlex to partition a mesh stored as a CGNS file. I 
configured my installation with --download-cgns=1, got a .cgns file, and 
called DMPlexCreateCGNSFromFile() on it. Doing so got me this error:

[0]PETSC ERROR: - Error Message 
--
[0]PETSC ERROR: Error in external library
[0]PETSC ERROR: CGNS file must have a single section, not 4
[0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting.
[0]PETSC ERROR: Petsc Release Version 3.18.3, unknown
[0]PETSC ERROR: ./program.exe on a arch-linux-c-debug named F86 by jesus Mon 
Jan 16 17:25:11 2023
[0]PETSC ERROR: Configure options --download-mpich=yes --download-hdf5=yes 
--download-cgns=yes --download-metis=yes --download-parmetis=yes 
--download-ptscotch=yes --download-chaco=yes --with-32bits-pci-domain=1
[0]PETSC ERROR: #1 DMPlexCreateCGNS_Internal() at 
/home/jesus/Desktop/JAF_NML/3rd_Party/PETSc/petsc/src/dm/impls/plex/cgns/plexcgns2.c:104
[0]PETSC ERROR: #2 DMPlexCreateCGNS() at 
/home/jesus/Desktop/JAF_NML/3rd_Party/PETSc/petsc/src/dm/impls/plex/plexcgns.c:60
[0]PETSC ERROR: #3 DMPlexCreateCGNSFromFile_Internal() at 
/home/jesus/Desktop/JAF_NML/3rd_Party/PETSc/petsc/src/dm/impls/plex/cgns/plexcgns2.c:27
[0]PETSC ERROR: #4 DMPlexCreateCGNSFromFile() at 
/home/jesus/Desktop/JAF_NML/3rd_Party/PETSc/petsc/src/dm/impls/plex/plexcgns.c:29

I looked around the mail archives for clues and found this one 
(https://lists.mcs.anl.gov/pipermail/petsc-users/2018-June/035544.html). There, 
Matt provides a link to the source code for DMPlexCreateCGNSFromFile() and 
another (seemingly broken) link to CGNS files that can be opened with the 
former.  After reading the source code I now understand that it is hardcoded 
for CGNS files that feature a single "base" and a single "section", whatever 
those are.

After navigating the CGNS documentation, I can sympathize with the comments in 
the source code.

Anyhow, I wanted to ask if I could be furnished with one such CGNS file that is 
compatible with DMPlexCreateCGNSFromFile() to see if I can modify my CGNS files 
to conform to it. If not, I will look into building the DAG myself using DMPlex 
APIs.
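
For reference, a minimal sketch of the intended use (file name illustrative; 
DMPlexDistribute added here for the partitioning step):

  DM dm, dmDist;
  PetscCall(DMPlexCreateCGNSFromFile(PETSC_COMM_WORLD, "mesh.cgns", PETSC_TRUE, &dm));
  PetscCall(DMPlexDistribute(dm, 0, NULL, &dmDist));   /* partition across the MPI ranks */
  if (dmDist) {                                        /* NULL when no redistribution was needed */
    PetscCall(DMDestroy(&dm));
    dm = dmDist;
  }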


Sincerely:

J.A. Ferrand

Embry-Riddle Aeronautical University - Daytona Beach FL

M.Sc. Aerospace Engineering

B.Sc. Aerospace Engineering

B.Sc. Computational Mathematics


Phone: (386)-843-1829

Email(s): ferra...@my.erau.edu

jesus.ferr...@gmail.com