Received from Rolf vandeVaart on Wed, May 20, 2015 at 07:48:15AM EDT: (snip)
> I see that you mentioned you are starting 4 MPS daemons. Are you following
> the instructions here?
>
> http://cudamusing.blogspot.de/2013/07/enabling-cuda-multi-process-service-mps.html

Yes - also
https://docs.nvidia.com/deploy/pdf/CUDA_Multi_Process_Service_Overview.pdf

> This relies on setting CUDA_VISIBLE_DEVICES which can cause problems for
> CUDA IPC. Since you are using CUDA 7 there is no more need to start
> multiple daemons. You simply leave CUDA_VISIBLE_DEVICES untouched and
> start a single MPS control daemon which will handle all GPUs. Can you try
> that?

I assume this means that a single CUDA_MPS_PIPE_DIRECTORY value should be
passed to all MPI processes.

Several questions related to your comment above:

- Should the MPI processes select and initialize the GPUs they respectively
  need to access as they normally would when MPS is not in use? (A sketch of
  what I mean appears at the end of this message.)

- Can CUDA_VISIBLE_DEVICES be used to control which GPUs are visible to MPS
  (and hence to the client processes)? I ask because SLURM uses
  CUDA_VISIBLE_DEVICES to control GPU resource allocation, and I would like
  to run my program (and the MPS control daemon) on a cluster via SLURM.

- Does the clash between setting CUDA_VISIBLE_DEVICES and CUDA IPC imply
  that MPS and CUDA IPC cannot reliably be used simultaneously in a
  multi-GPU setting with CUDA 6.5, even when one starts multiple MPS control
  daemons as described in the aforementioned blog post? (A sketch of the IPC
  pattern I have in mind also appears at the end of this message.)

> Because of this question, we realized we need to update our documentation
> as well.

--
Lev Givon
Bionet Group | Neurokernel Project
http://www.columbia.edu/~lev/
http://lebedov.github.io/
http://neurokernel.github.io/
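
P.S. To make the first question concrete, here is a minimal sketch of what I
mean by "as they normally would": each rank picks a device from its MPI
rank, exactly as in a non-MPS run. This is only a hypothetical example
(names are mine, error checking omitted); run with e.g. mpiexec -n 4:

    #include <mpi.h>
    #include <cuda_runtime.h>
    #include <stdio.h>

    int main(int argc, char *argv[])
    {
        int rank, ndev;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        /* With a single MPS control daemon and CUDA_VISIBLE_DEVICES left
           untouched, all GPUs should be visible here: */
        cudaGetDeviceCount(&ndev);

        /* Per-rank device selection, same as without MPS: */
        cudaSetDevice(rank % ndev);

        /* First runtime call on the device creates the context (which the
           MPS server should manage when the daemon is running): */
        float *d;
        cudaMalloc((void **)&d, 1024 * sizeof(float));

        printf("rank %d -> device %d of %d\n", rank, rank % ndev, ndev);

        cudaFree(d);
        MPI_Finalize();
        return 0;
    }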
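
P.P.S. And for the third question, the CUDA IPC pattern I have in mind is
the usual cudaIpcGetMemHandle()/cudaIpcOpenMemHandle() exchange between two
ranks on the same node. Again only a hypothetical sketch with error checking
omitted; run with exactly 2 ranks:

    #include <mpi.h>
    #include <cuda_runtime.h>

    int main(int argc, char *argv[])
    {
        int rank;
        cudaIpcMemHandle_t h;
        float *d = NULL;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        if (rank == 0) {
            cudaSetDevice(0);
            cudaMalloc((void **)&d, 1024 * sizeof(float));
            cudaIpcGetMemHandle(&h, d);      /* export the allocation */
            MPI_Send(&h, sizeof(h), MPI_BYTE, 1, 0, MPI_COMM_WORLD);
        } else if (rank == 1) {
            cudaSetDevice(0);                /* same device, same node */
            MPI_Recv(&h, sizeof(h), MPI_BYTE, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            cudaIpcOpenMemHandle((void **)&d, h,
                                 cudaIpcMemLazyEnablePeerAccess);
            /* ... read/write through d here ... */
            cudaIpcCloseMemHandle(d);
            d = NULL;
        }

        /* Keep rank 0's allocation alive until rank 1 closes its mapping: */
        MPI_Barrier(MPI_COMM_WORLD);
        if (rank == 0)
            cudaFree(d);

        MPI_Finalize();
        return 0;
    }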