On 08/20/14 12:11, Karl Rupp wrote:
> Hi Pierre,

>> I have a cluster whose nodes have 2 sockets of 4 cores + 1 GPU.
>>
>> Is there a way to run a calculation with 4*N MPI tasks where my matrix
>> is first built outside PETSc, and then to solve the linear system with
>> PETSc Mat, Vec, KSP on only N MPI tasks, so as to address the N GPUs
>> efficiently?

> As far as I can tell, this should be possible with a suitable
> subcommunicator. The tricky piece, however, is to select the right MPI
> ranks for this. Note that you generally have no guarantee on how the
> MPI ranks are distributed across the nodes, so be prepared for
> something fairly specific to your MPI installation.
Yes, I am ready to face this point too.
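For what it's worth, here is how I was thinking of selecting the ranks, as a minimal sketch assuming an MPI-3 implementation; the choice of letting the first rank on each node drive the GPU is only an example:

/* Minimal sketch: build a subcommunicator with one rank per node.
   Assumes MPI-3 (MPI_Comm_split_type with MPI_COMM_TYPE_SHARED);
   the "node-local rank 0 drives the GPU" policy is only an example. */
#include <mpi.h>

MPI_Comm build_gpu_comm(void)
{
  MPI_Comm node_comm, gpu_comm;
  int      node_rank, world_rank;

  MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);
  /* Group the ranks that share a node. */
  MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0,
                      MPI_INFO_NULL, &node_comm);
  MPI_Comm_rank(node_comm, &node_rank);
  /* Keep only the first rank of each node; the others get MPI_COMM_NULL. */
  MPI_Comm_split(MPI_COMM_WORLD, node_rank == 0 ? 0 : MPI_UNDEFINED,
                 world_rank, &gpu_comm);
  MPI_Comm_free(&node_comm);
  return gpu_comm;
}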


>> I am playing with the communicators without success, but I am surely
>> confusing things...

> To keep matters simple, try to get this scenario working with a purely
> CPU-based solve. Once this works, the switch to GPUs should be just a
> matter of passing the right flags. Have a look at PetscInitialize() here:
> http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Sys/PetscInitialize.html
> which mentions that you need to create the subcommunicator of
> MPI_COMM_WORLD first.

I also started with a purely CPU-based solve, just as a test, but without
success. Then I read this:

"If you wish PETSc code to run ONLY on a subcommunicator of MPI_COMM_WORLD, create that communicator first and assign it to PETSC_COMM_WORLD <http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Sys/PETSC_COMM_WORLD.html#PETSC_COMM_WORLD> BEFORE calling PetscInitialize <http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Sys/PetscInitialize.html#PetscInitialize>(). Thus if you are running a four process job and two processes will run PETSc and have PetscInitialize <http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Sys/PetscInitialize.html#PetscInitialize>() and PetscFinalize <http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Sys/PetscFinalize.html#PetscFinalize>() and two process will not, then do this. If ALL processes in the job are using PetscInitialize <http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Sys/PetscInitialize.html#PetscInitialize>() and PetscFinalize <http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Sys/PetscFinalize.html#PetscFinalize>() then you don't need to do this, even if different subcommunicators of the job are doing different things with PETSc."

I think I am not in this special scenario: since my matrix is initially
partitioned over 4 processes, I need to call PetscInitialize() on all 4
processes in order to build the PETSc matrix with MatSetValues(). My goal
is then to solve the linear system on only 2 processes... So will
building a sub-communicator really do the trick? Or am I missing something?
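In case it clarifies my question, the structure I have in mind looks roughly like the sketch below, assuming all four ranks call PetscInitialize() on MPI_COMM_WORLD and the matrix entries are first shipped to the subcommunicator ranks with plain MPI; the function and variable names are mine, and error checking is omitted:

/* Sketch, assuming all ranks have called PetscInitialize() on
   MPI_COMM_WORLD: PETSc objects are then created on "subcomm" only.
   Moving the matrix entries from the other ranks to the subcomm ranks
   (plain MPI sends/receives) is omitted here. */
#include <petscksp.h>

PetscErrorCode solve_on_subcomm(MPI_Comm subcomm, PetscInt n)
{
  Mat A;
  KSP ksp;

  if (subcomm == MPI_COMM_NULL) return 0;  /* ranks outside the solve */
  MatCreate(subcomm, &A);
  MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, n, n);
  MatSetFromOptions(A);
  MatSetUp(A);
  /* ... MatSetValues() with entries received from the other ranks ... */
  MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);
  MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);
  KSPCreate(subcomm, &ksp);
  KSPSetOperators(ksp, A, A);  /* two-argument form of recent PETSc */
  KSPSetFromOptions(ksp);
  /* ... KSPSolve(ksp, b, x) on the subcommunicator ... */
  KSPDestroy(&ksp);
  MatDestroy(&A);
  return 0;
}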

Thanks Karli for your answer,

Pierre
> Best regards,
> Karli



--
*Trio_U support team*
Marthe ROUX (01 69 08 00 02) Saclay
Pierre LEDAC (04 38 78 91 49) Grenoble
