On 08/20/14 12:11, Karl Rupp wrote:
> Hi Pierre,
>> I have a cluster with nodes of 2 sockets of 4 cores + 1 GPU.
>> Is there a way to run a calculation with 4*N MPI tasks where
>> my matrix is first built outside PETSc, and then to solve the
>> linear system using PETSc Mat, Vec, KSP on only N MPI
>> tasks, to address the N GPUs efficiently?
> As far as I can tell, this should be possible with a suitable
> subcommunicator. The tricky piece, however, is to select the right MPI
> ranks for this. Note that you generally have no guarantee on how the
> MPI ranks are distributed across the nodes, so be prepared for
> something fairly specific to your MPI installation.
Yes, I am ready to face this point too.
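For reference, this is the kind of node-aware rank selection I have in
mind (a minimal sketch, not tested: it assumes an MPI-3 library so that
MPI_Comm_split_type() is available, and 1 GPU per node as on our
cluster; the selection rule "first rank on each node" is just an
illustration):

/* Minimal sketch (untested): group the ranks by shared-memory node
 * with MPI-3's MPI_Comm_split_type(), then keep the first
 * GPUS_PER_NODE ranks of each node for the GPU subcommunicator.
 * Error checking omitted for brevity. */
#include <mpi.h>

#define GPUS_PER_NODE 1

int main(int argc, char **argv)
{
  MPI_Comm node_comm, gpu_comm;
  int      world_rank, node_rank, color;

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

  /* All ranks sharing a node end up in the same node_comm. */
  MPI_Comm_split_type(MPI_COMM_WORLD, MPI_COMM_TYPE_SHARED, 0,
                      MPI_INFO_NULL, &node_comm);
  MPI_Comm_rank(node_comm, &node_rank);

  /* Keep one rank per node; the other ranks get MPI_COMM_NULL. */
  color = (node_rank < GPUS_PER_NODE) ? 0 : MPI_UNDEFINED;
  MPI_Comm_split(MPI_COMM_WORLD, color, world_rank, &gpu_comm);

  /* ... gpu_comm now contains N ranks, one per GPU ... */

  if (gpu_comm != MPI_COMM_NULL) MPI_Comm_free(&gpu_comm);
  MPI_Comm_free(&node_comm);
  MPI_Finalize();
  return 0;
}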
>> I am playing with the communicators without success, but I
>> am surely confusing things...
> To keep matters simple, try to get this scenario working with a purely
> CPU-based solve. Once this works, the switch to GPUs should be just a
> matter of passing the right flags. Have a look at PetscInitialize() here:
> http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Sys/PetscInitialize.html
> which mentions that you need to create the subcommunicator of
> MPI_COMM_WORLD first.
I also started with a purely CPU-based solve, just as a test, but
without success. When I read this:
"If you wish PETSc code to run ONLY on a subcommunicator of
MPI_COMM_WORLD, create that communicator first and assign it to
PETSC_COMM_WORLD
<http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Sys/PETSC_COMM_WORLD.html#PETSC_COMM_WORLD>
BEFORE calling PetscInitialize
<http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Sys/PetscInitialize.html#PetscInitialize>().
Thus if you are running a four process job and two processes will run
PETSc and have PetscInitialize
<http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Sys/PetscInitialize.html#PetscInitialize>()
and PetscFinalize
<http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Sys/PetscFinalize.html#PetscFinalize>()
and two process will not, then do this. If ALL processes in
the job are using PetscInitialize
<http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Sys/PetscInitialize.html#PetscInitialize>()
and PetscFinalize
<http://www.mcs.anl.gov/petsc/petsc-current/docs/manualpages/Sys/PetscFinalize.html#PetscFinalize>()
then you don't need to do this, even if different subcommunicators of
the job are doing different things with PETSc."
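In code form, I understand that pattern as follows (a minimal sketch,
not tested; "even ranks run PETSc, odd ranks do not" is just an
arbitrary selection rule for illustration):

/* Sketch of the pattern described above: build the subcommunicator
 * first, assign it to PETSC_COMM_WORLD, then call PetscInitialize()
 * only on the ranks that belong to it. Error checking omitted. */
#include <petscsys.h>

int main(int argc, char **argv)
{
  MPI_Comm subcomm;
  int      rank, color;

  MPI_Init(&argc, &argv);
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);

  /* Arbitrary selection rule: even ranks run PETSc. */
  color = (rank % 2 == 0) ? 0 : MPI_UNDEFINED;
  MPI_Comm_split(MPI_COMM_WORLD, color, rank, &subcomm);

  if (subcomm != MPI_COMM_NULL) {
    PETSC_COMM_WORLD = subcomm;  /* must happen BEFORE PetscInitialize() */
    PetscInitialize(&argc, &argv, NULL, NULL);
    /* ... Mat/Vec/KSP work on PETSC_COMM_WORLD ... */
    PetscFinalize();
    MPI_Comm_free(&subcomm);
  }

  MPI_Finalize();
  return 0;
}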
I think I am not in this special scenario: because my matrix is
initially partitioned across 4 processes, I need to call
PetscInitialize() on all 4 processes in order to build the PETSc matrix
with MatSetValues(), and my goal is then to solve the linear system on
only 2 processes... So will building a sub-communicator really do the
trick? Or am I missing something?
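If it helps to make the question concrete, this is the variant I have
in mind (a rough sketch, not tested: all 4 ranks call PetscInitialize(),
ranks 0 and 1 form the solver subcommunicator, and ranks 2 and 3 would
ship their entries over with plain MPI before assembly; the
entry-shipping part is only indicated by comments, and N is a
placeholder size):

/* Rough sketch (untested): all ranks initialize PETSc, but the Mat
 * and the solve live on a 2-rank subcommunicator. Ranks outside the
 * subcommunicator cannot call MatSetValues() on that Mat, so they
 * would have to send their (row, col, value) triplets to a partner
 * rank first. Error checking omitted for brevity. */
#include <petscmat.h>

int main(int argc, char **argv)
{
  const PetscInt N = 1000;       /* placeholder global size */
  PetscMPIInt    rank;
  MPI_Comm       subcomm;
  int            in_solver;

  PetscInitialize(&argc, &argv, NULL, NULL);  /* all 4 ranks */
  MPI_Comm_rank(PETSC_COMM_WORLD, &rank);

  /* Ranks 0 and 1 form the solver subcommunicator. */
  in_solver = (rank < 2);
  MPI_Comm_split(PETSC_COMM_WORLD, in_solver ? 0 : MPI_UNDEFINED,
                 rank, &subcomm);

  if (!in_solver) {
    /* Send locally built (row, col, value) triplets to rank - 2,
       e.g. with MPI_Send() on PETSC_COMM_WORLD. */
  } else {
    Mat A;
    MatCreate(subcomm, &A);
    MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, N, N);
    MatSetFromOptions(A);
    MatSetUp(A);
    /* Insert own entries, then receive the partner's triplets
       and insert them too, all via MatSetValues(). */
    MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);
    MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);
    /* ... create the KSP on subcomm and solve ... */
    MatDestroy(&A);
    MPI_Comm_free(&subcomm);
  }

  PetscFinalize();
  return 0;
}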
Thanks Karli for your answer,
Pierre
> Best regards,
> Karli
--
*Trio_U support team*
Marthe ROUX (01 69 08 00 02) Saclay
Pierre LEDAC (04 38 78 91 49) Grenoble