> On Jan 17, 2023, at 3:12 PM, Venugopal, Vysakh (venugovh) via petsc-users 
> <[email protected]> wrote:
> 
> Hi,
>  
> I am doing the following thing.
>  
> Step 1. Create DM object and get global vector ‘V’ using DMGetGlobalVector.
> Step 2. Doing some parallel operations on V.
> Step 3. I am using VecScatterCreateToAll on V to create a sequential vector 
> ‘V_SEQ’ using VecScatterBegin/End with SCATTER_FORWARD.
> Step 4. I am performing an expensive operation on V_SEQ and outputting the 
> updated V_SEQ.
> Step 5. I am using VecScatterBegin/End with SCATTER_REVERSE (global and 
> sequential flipped) to get V that is updated with new values from V_SEQ.
> Step 6. I continue using this new V on the rest of the parallelized program.
>  
> Question: Suppose I have n MPI processes, is the expensive operation in Step 
> 4 repeated n times? If yes, is there a workaround such that the operation in 
> Step 4 is performed only once? I would like to follow the same structure as 
> steps 1 to 6 with step 4 only performed once.

  Yes, each MPI rank performs the same operations on its own copy of the 
sequential vector. Since the ranks run in parallel, it probably does not 
matter much that each one repeats the same computation. Step 5 does not 
require any MPI communication, since each rank needs only its own part of the 
sequential vector (which every rank already has) for the parallel vector.
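
  As a rough illustration of that data flow, here is a toy model in plain 
Python. It is only a sketch: lists stand in for PETSc Vecs, there is no MPI, 
and ownership_ranges, expensive_op, and run_scatter_to_all are hypothetical 
names made up for this example. It assumes a contiguous block distribution 
and a deterministic step 4.

```python
# Toy model only: plain Python lists stand in for PETSc Vecs; no MPI is used.

def ownership_ranges(n_global, n_ranks):
    """Split n_global entries into contiguous per-rank [start, end) ranges."""
    base, extra = divmod(n_global, n_ranks)
    ranges, start = [], 0
    for r in range(n_ranks):
        end = start + base + (1 if r < extra else 0)
        ranges.append((start, end))
        start = end
    return ranges


def expensive_op(full):
    """Hypothetical stand-in for step 4: it needs the whole vector at once."""
    total = sum(full)
    return [x + total for x in full]


def run_scatter_to_all(global_values, n_ranks):
    """Mimic steps 3 to 5 when every rank receives a full copy."""
    ranges = ownership_ranges(len(global_values), n_ranks)
    # Distributed V: each rank holds only its own slice.
    local = [global_values[s:e] for s, e in ranges]
    # Step 3 (forward scatter): every rank ends up with the full vector.
    full = [x for part in local for x in part]
    copies = [list(full) for _ in range(n_ranks)]
    # Step 4: every rank redundantly runs the same computation on its copy.
    updated = [expensive_op(copy) for copy in copies]
    # Step 5 (reverse scatter): each rank reads its owned slice from its
    # own updated copy, so no data has to move between ranks.
    return [updated[r][s:e] for r, (s, e) in enumerate(ranges)]
```

  Concatenating the slices returned by run_scatter_to_all gives the same 
result as calling expensive_op once on the whole vector, which is why the 
redundant work is harmless as long as step 4 is deterministic.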

  You could use VecScatterCreateToZero() instead. Step 3 would then require 
less communication, but step 5 would require communication to get parts of 
the solution from rank 0 to the other ranks. The time for step 4 would be 
roughly the same.
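
  To make the trade-off concrete, here is a crude counting sketch in plain 
Python. It counts only how many vector entries cross rank boundaries in 
steps 3 and 5, and it assumes a contiguous block distribution. The function 
values_moved is a hypothetical helper, not a PETSc call, and the counts are 
not a model of how PETSc or MPI actually implement the scatters.

```python
def values_moved(n_global, n_ranks, to_zero):
    """Count vector entries that cross rank boundaries in steps 3 and 5.

    to_zero=False models gathering a full copy on every rank;
    to_zero=True models gathering the full vector on rank 0 only.
    """
    base, extra = divmod(n_global, n_ranks)
    owned = [base + (1 if r < extra else 0) for r in range(n_ranks)]
    if to_zero:
        # Step 3: every rank except 0 sends its slice to rank 0.
        forward = n_global - owned[0]
        # Step 5: rank 0 sends every other rank its slice back.
        reverse = n_global - owned[0]
    else:
        # Step 3: each rank receives every entry it does not own.
        forward = sum(n_global - o for o in owned)
        # Step 5: each rank copies from its own full copy.
        reverse = 0
    return forward, reverse
```

  For example, values_moved(12, 4, to_zero=False) and 
values_moved(12, 4, to_zero=True) can be compared directly. Neither variant 
changes the cost of step 4 itself.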

  You will likely see a worthwhile improvement in performance only if you can 
parallelize the computation in step 4. What are you doing that is 
computationally intensive and requires all the data on a single rank?

Barry

>  
> Thanks,
>  
> Vysakh Venugopal
> ---
> Vysakh Venugopal
> Ph.D. Candidate
> Department of Mechanical Engineering
> University of Cincinnati, Cincinnati, OH 45221-0072
