This is the support structure minimization filter. I need to go layer by 
layer, starting from the bottommost slice of the array and updating it as I 
move up. Every slice needs the updated values below that slice.
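
The layer dependency described above can be illustrated with a minimal Python
sketch. The update rule used here is an assumption made purely for
illustration (a hard min/max over the three nearest elements in the slice
below), not the actual filter from this thread, and the names `sweep_layers`,
`blueprint`, and `printed` are hypothetical.

```python
def sweep_layers(blueprint):
    """Bottom-up sweep over a 2D grid stored as a list of slices.

    blueprint[0] is the bottommost slice. Assumed rule (illustrative only):
    an element can be at most as dense as the densest of its nearest
    neighbours in the already-updated slice directly below it.
    """
    width = len(blueprint[0])
    printed = [list(blueprint[0])]  # the bottom slice is kept as-is
    for k in range(1, len(blueprint)):
        below = printed[k - 1]  # updated values, not the original ones
        layer = []
        for i in range(width):
            lo, hi = max(0, i - 1), min(width, i + 2)
            support = max(below[lo:hi])
            layer.append(min(blueprint[k][i], support))
        printed.append(layer)
    return printed


blueprint = [
    [1, 0, 0, 0, 0],  # bottom slice
    [1, 1, 1, 1, 1],
    [1, 1, 1, 1, 1],  # top slice
]
printed = sweep_layers(blueprint)
```

In this sketch the loop over slices has to run in order, but the inner loop
only reads the slice below, so the elements within one slice do not depend on
each other. Whether the same holds for the real filter depends on how it is
defined.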

Vysakh

From: Blaise Bourdin <[email protected]>
Sent: Tuesday, January 17, 2023 4:47 PM
To: Venugopal, Vysakh (venugovh) <[email protected]>
Cc: Barry Smith <[email protected]>; [email protected]
Subject: Re: [petsc-users] about repeat of expensive functions using 
VecScatterCreateToAll


External Email: Use Caution


What type of filter are you implementing?
Convolution filters are expensive to parallelize since you need an overlap of 
the size of the support of the filter, but it may still not be worse than doing 
it sequentially (typically the filter size is only one or two element 
diameters). Or you may be able to apply the filter in Fourier space.
PDE-filters are typically elliptic and can be parallelized.
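
The overlap requirement for convolution filters mentioned above can be
sketched in plain Python. This is a minimal illustration under stated
assumptions: a 1D box (averaging) filter stands in for the convolution, a loop
over chunks stands in for the MPI ranks, and the names `box_filter` and
`box_filter_chunked` are hypothetical. Each chunk is filtered using `radius`
extra ghost entries on each side, which is the overlap that would have to be
communicated.

```python
def box_filter(values, radius):
    """Average each entry with its neighbours within `radius` (clipped at the ends)."""
    n = len(values)
    out = []
    for i in range(n):
        lo, hi = max(0, i - radius), min(n, i + radius + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out


def box_filter_chunked(values, radius, n_chunks):
    """Filter each owned chunk separately, using `radius` ghost entries per side."""
    n = len(values)
    out = []
    for c in range(n_chunks):
        start, end = c * n // n_chunks, (c + 1) * n // n_chunks  # owned range
        g_lo, g_hi = max(0, start - radius), min(n, end + radius)  # with ghosts
        local = box_filter(values[g_lo:g_hi], radius)
        out.extend(local[start - g_lo:end - g_lo])  # keep owned entries only
    return out


values = [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
sequential = box_filter(values, radius=2)
chunked = box_filter_chunked(values, radius=2, n_chunks=3)
```

With an overlap equal to the filter radius, the chunked result matches the
sequential one; the cost is that the ghost entries grow with the filter
support.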

Blaise


On Jan 17, 2023, at 4:38 PM, Venugopal, Vysakh (venugovh) via petsc-users 
<[email protected]<mailto:[email protected]>> wrote:

Thank you! I am doing a structural optimization filter that inherently cannot 
be parallelized.

Vysakh

From: Barry Smith <[email protected]<mailto:[email protected]>>
Sent: Tuesday, January 17, 2023 3:28 PM
To: Venugopal, Vysakh (venugovh) 
<[email protected]<mailto:[email protected]>>
Cc: [email protected]<mailto:[email protected]>
Subject: Re: [petsc-users] about repeat of expensive functions using 
VecScatterCreateToAll

On Jan 17, 2023, at 3:12 PM, Venugopal, Vysakh (venugovh) via petsc-users 
<[email protected]<mailto:[email protected]>> wrote:

Hi,

I am doing the following:

Step 1. Create DM object and get global vector ‘V’ using DMGetGlobalVector.
Step 2. Do some parallel operations on V.
Step 3. I am using VecScatterCreateToAll on V to create a sequential vector 
‘V_SEQ’ using VecScatterBegin/End with SCATTER_FORWARD.
Step 4. I am performing an expensive operation on V_SEQ and outputting the 
updated V_SEQ.
Step 5. I am using VecScatterBegin/End with SCATTER_REVERSE (global and 
sequential flipped) to get V that is updated with new values from V_SEQ.
Step 6. I continue using this new V on the rest of the parallelized program.

Question: Suppose I have n MPI processes, is the expensive operation in Step 4 
repeated n times? If yes, is there a workaround such that the operation in Step 
4 is performed only once? I would like to follow the same structure as steps 1 
to 6 with step 4 only performed once.

  Each MPI rank is doing the same operations on its copy of the sequential 
vector. Since they are running in parallel it probably does not matter much 
that each is doing the same computation. Step 5 does not require any MPI since 
only part of the sequential vector (which everyone has) is needed in the 
parallel vector.

  You could use VecScatterCreateToZero() but then step 3 would require less 
communication but step 5 would require communication to get parts of the 
solution from rank 0 to the other ranks. The time for step 4 would be roughly 
the same.
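
The trade-off between the two scatter strategies can be illustrated with a
small plain-Python simulation. It does not use PETSc or MPI: loops stand in
for ranks, and every name below (`ownership_ranges`, `run_to_all`,
`run_to_zero`, `running_sum`) is hypothetical. A running sum is used as a
stand-in for an operation that needs the whole vector.

```python
def ownership_ranges(n, n_ranks):
    """Split indices 0..n-1 into contiguous owned ranges, one per rank."""
    return [(r * n // n_ranks, (r + 1) * n // n_ranks) for r in range(n_ranks)]


def run_to_all(global_vec, n_ranks, op):
    """Every rank gets the full vector, applies `op`, keeps its owned slice."""
    calls = 0
    result = []
    for start, end in ownership_ranges(len(global_vec), n_ranks):
        full_copy = list(global_vec)       # step 3: scatter to all ranks
        updated = op(full_copy)            # step 4: repeated on every rank
        calls += 1
        result.extend(updated[start:end])  # step 5: purely local copy
    return result, calls


def run_to_zero(global_vec, n_ranks, op):
    """Only rank 0 gets the full vector and applies `op`; slices are sent back."""
    full_copy = list(global_vec)           # step 3: gather on rank 0 only
    updated = op(full_copy)                # step 4: performed once
    result = []
    for start, end in ownership_ranges(len(global_vec), n_ranks):
        result.extend(updated[start:end])  # step 5: rank 0 sends each slice out
    return result, 1


def running_sum(v):
    """Stand-in for an operation that needs all the data in order."""
    out, total = [], 0
    for x in v:
        total += x
        out.append(total)
    return out


vec = [1, 2, 3, 4, 5, 6, 7]
all_result, all_calls = run_to_all(vec, 3, running_sum)
zero_result, zero_calls = run_to_zero(vec, 3, running_sum)
```

Both variants produce the same vector. The to-all variant calls the operation
once per rank, but on a real machine those calls run concurrently, so the
elapsed time for step 4 is about that of a single call in either case.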

  You will likely only see a worthwhile improvement in performance if you can 
parallelize the computation in step 4. What are you doing that is 
computationally intensive and requires all the data on a rank?

Barry




Thanks,

Vysakh Venugopal
---
Vysakh Venugopal
Ph.D. Candidate
Department of Mechanical Engineering
University of Cincinnati, Cincinnati, OH 45221-0072

—
Canada Research Chair in Mathematical and Computational Aspects of Solid 
Mechanics (Tier 1)
Professor, Department of Mathematics & Statistics
Hamilton Hall room 409A, McMaster University
1280 Main Street West, Hamilton, Ontario L8S 4K1, Canada
https://www.math.mcmaster.ca/bourdin | +1 (905) 525 9140 ext. 27243
