Hi Akshay,

I'm building both UCX and OpenMPI as you mention. The relevant portions of the
build script are:

For UCX:

./configure --prefix=/usr/local/ucx-cuda-install --with-cuda=/usr/local/cuda-10.1 --with-gdrcopy=/home/odyhpc/gdrcopy
sudo make install

For OpenMPI:

./configure --with-cuda=/usr/local/cuda-10.1 --with-ucx=/usr/local/ucx-cuda-install --prefix=/opt/openmpi
sudo make all install
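In case it helps with debugging, one sanity check I can run against the installed
UCX (a rough sketch; it assumes ucx_info from the install prefix is the one used):

/usr/local/ucx-cuda-install/bin/ucx_info -v                            # prints the configure line UCX was built with
/usr/local/ucx-cuda-install/bin/ucx_info -d | grep -i -e cuda -e gdr   # cuda_copy/cuda_ipc (and gdr_copy) should appear if CUDA/gdrcopy support was built in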

As for job submission, I have tried several combinations of MCA flags (yesterday
I forgot to include the '--mca pml ucx' flag, as it had made no difference in the
past). I just tried your suggested syntax (mpirun -np 2 --mca pml ucx --mca btl
^smcuda,openib ./osu_latency D H) with the same results. The latency times are of
the same order no matter which flags I include. As for checking GPU usage, I'm not
familiar with nvprof and have simply been watching the basic continuous output
(nvidia-smi -l). I'm running all of this in a cloud environment, and my suspicion
is that there is some interference (perhaps from a virtualization component), but
I cannot pinpoint the cause.
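If nvprof turns out to be needed, the simplest approach I can think of (a sketch,
not something I've run yet; the output filename pattern is only an example) would
be to wrap each rank so CUDA activity is captured per process:

mpirun -np 2 --mca pml ucx nvprof -o osu.%q{OMPI_COMM_WORLD_RANK}.nvprof ./osu_latency D H   # one profile file per rank

Since nvidia-smi -l only samples utilization once per interval, very short
transfers might barely register there even when the GPU path is being used.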




From: Akshay Venkatesh <akshay.v.3...@gmail.com> 
Sent: Friday, September 06, 2019 11:14 AM
To: Open MPI Users <users@lists.open-mpi.org>
Cc: Joshua Ladd <jladd.m...@gmail.com>; Arturo Fernandez <afernan...@odyhpc.com>
Subject: Re: [OMPI users] CUDA-aware codes not using GPU


Hi, Arturo.


Usually, for OpenMPI+UCX we use the following recipe:


for UCX:

./configure --prefix=/path/to/ucx-cuda-install --with-cuda=/usr/local/cuda 
make -j install

then OpenMPI:


./configure --with-cuda=/usr/local/cuda --with-ucx=/path/to/ucx-cuda-install
make -j install

Can you run with the following to see if it helps: 

mpirun -np 2 --mca pml ucx --mca btl ^smcuda,openib ./osu_latency D H
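If the numbers still look host-like, bumping UCX logging can show which transports
actually get selected (a sketch; the exact messages and the level at which they
appear vary across UCX versions):

mpirun -np 2 --mca pml ucx --mca btl ^smcuda,openib -x UCX_LOG_LEVEL=info ./osu_latency D H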

There are details here that may be useful: 


Also, note that for short messages the D->H path for inter-node transfers may not
involve any CUDA API calls (relevant if you're using nvprof to detect CUDA
activity), because the GPUDirect RDMA path and gdrcopy are used instead.
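A quick way to check that the gdrcopy/GPUDirect pieces are even loaded on the
nodes (a sketch; module names assume the stock gdrcopy and peer-memory packages):

lsmod | grep gdrdrv       # gdrcopy kernel module
lsmod | grep nv_peer_mem  # GPUDirect RDMA peer-memory module (from MOFED/nvidia-peer-memory)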


On Fri, Sep 6, 2019 at 7:36 AM Arturo Fernandez via users 
<users@lists.open-mpi.org> wrote:


Thank you. Yes, I built UCX with CUDA and gdrcopy support. I also had to 
disable numa (--disable-numa) as requested during the installation. 



Joshua Ladd wrote 

Did you build UCX with CUDA support (--with-cuda)?




On Thu, Sep 5, 2019 at 8:45 PM AFernandez via users <users@lists.open-mpi.org> wrote:

Hello OpenMPI Team, 

I'm trying to use CUDA-aware OpenMPI but the system simply ignores the GPU and 
the code runs on the CPUs. I've tried different software but will focus on the 
OSU benchmarks (collective and pt2pt communications). Let me provide some data 
about the configuration of the system: 

-OFED v4.17-1-rc2 (the NIC is virtualized but I also tried a Mellanox card with 
MOFED a few days ago and found the same issue) 

-CUDA v10.1 

-gdrcopy v1.3 

-UCX 1.6.0 

-OpenMPI 4.0.1 

Everything looks good (CUDA programs work fine, and MPI programs run on the CPUs
without any problem), and ompi_info outputs what I was expecting (though maybe I'm
missing something):

mca:mpi:base:param:mpi_built_with_cuda_support:help:Whether CUDA GPU buffer support is built into library or not
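For completeness, a query along these lines shows the value next to that help text
(a sketch; my exact invocation may have differed slightly):

ompi_info --parsable --all | grep mpi_built_with_cuda_support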

The available BTLs are the usual self, openib, tcp & vader, plus smcuda, uct &
usnic. The full output from ompi_info is attached. If I add the flag '--mca
opal_cuda_verbose 10', it doesn't output anything, which seems consistent with the
GPU not being used. If I try '--mca btl smcuda', it makes no difference. I have
also tried telling the program to use host and device buffers (e.g. mpirun -np 2
./osu_latency D H) but got the same result. I am probably missing something but am
not sure where else to look or what else to try.
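One more thing I could try (a sketch; assuming the standard component-selection
verbosity parameters) is raising the PML/BTL verbosity to see which components
actually win the selection:

mpirun -np 2 --mca pml_base_verbose 10 --mca btl_base_verbose 10 ./osu_latency D H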

Thank you, 

