Received from Rolf vandeVaart on Wed, May 20, 2015 at 07:48:15AM EDT: (snip)
> I see that you mentioned you are starting 4 MPS daemons. Are you following
> the instructions here?
>
> http://cudamusing.blogspot.de/2013/07/enabling-cuda-multi-process-service-mps.html

Yes - also
https://docs.nvidia.com/deploy/pdf/CUDA_Multi_Process_Service_Overview.pdf

> This relies on setting CUDA_VISIBLE_DEVICES which can cause problems for
> CUDA IPC. Since you are using CUDA 7 there is no more need to start
> multiple daemons. You simply leave CUDA_VISIBLE_DEVICES untouched and
> start a single MPS control daemon which will handle all GPUs. Can you try
> that?

I assume this means that a single CUDA_MPS_PIPE_DIRECTORY value should be
passed to all MPI processes.

Several questions related to your comment above:

- Should the MPI processes select and initialize the GPUs they respectively
  need to access as they normally would when MPS is not in use? (A sketch of
  what I mean appears at the end of this message.)

- Can CUDA_VISIBLE_DEVICES be used to control which GPUs are visible to MPS
  (and hence to the client processes)? I ask because SLURM uses
  CUDA_VISIBLE_DEVICES to control GPU resource allocation, and I would like
  to run my program (and the MPS control daemon) on a cluster via SLURM.

- Does the clash between setting CUDA_VISIBLE_DEVICES and CUDA IPC imply
  that MPS and CUDA IPC cannot reliably be used simultaneously in a
  multi-GPU setting with CUDA 6.5, even when one starts multiple MPS control
  daemons as described in the aforementioned blog post? (A sketch of the IPC
  pattern I have in mind also appears at the end of this message.)

> Because of this question, we realized we need to update our documentation
> as well.

--
Lev Givon
Bionet Group | Neurokernel Project
http://www.columbia.edu/~lev/
http://lebedov.github.io/
http://neurokernel.github.io/
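
P.S. To make the first question concrete, here is a minimal sketch of what I
mean by "as they normally would": each rank picks a device from its MPI
rank, exactly as in a non-MPS run. This is only a hypothetical example
(names are mine, error checking omitted); run with e.g. mpiexec -n 4:

    #include <mpi.h>
    #include <cuda_runtime.h>
    #include <stdio.h>

    int main(int argc, char *argv[])
    {
        int rank, ndev;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        /* With a single MPS control daemon and CUDA_VISIBLE_DEVICES left
           untouched, all GPUs should be visible here: */
        cudaGetDeviceCount(&ndev);

        /* Per-rank device selection, same as without MPS: */
        cudaSetDevice(rank % ndev);

        /* First runtime call on the device creates the context (which the
           MPS server should manage when the daemon is running): */
        float *d;
        cudaMalloc((void **)&d, 1024 * sizeof(float));

        printf("rank %d -> device %d of %d\n", rank, rank % ndev, ndev);

        cudaFree(d);
        MPI_Finalize();
        return 0;
    }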
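
P.P.S. And for the third question, the CUDA IPC pattern I have in mind is
the usual cudaIpcGetMemHandle()/cudaIpcOpenMemHandle() exchange between two
ranks on the same node. Again only a hypothetical sketch with error checking
omitted; run with exactly 2 ranks:

    #include <mpi.h>
    #include <cuda_runtime.h>

    int main(int argc, char *argv[])
    {
        int rank;
        cudaIpcMemHandle_t h;
        float *d = NULL;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0) {
            cudaSetDevice(0);
            cudaMalloc((void **)&d, 1024 * sizeof(float));
            cudaIpcGetMemHandle(&h, d);      /* export the allocation */
            MPI_Send(&h, sizeof(h), MPI_BYTE, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            cudaSetDevice(0);                /* same device, same node */
            MPI_Recv(&h, sizeof(h), MPI_BYTE, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            cudaIpcOpenMemHandle((void **)&d, h,
                                 cudaIpcMemLazyEnablePeerAccess);
            /* ... read/write through d here ... */
            cudaIpcCloseMemHandle(d);
            d = NULL;
        }

        /* Keep rank 0's allocation alive until rank 1 closes its mapping: */
        MPI_Barrier(MPI_COMM_WORLD);
        if (rank == 0)
            cudaFree(d);

        MPI_Finalize();
        return 0;
    }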