Hi, Arturo.

Usually, for OpenMPI+UCX we use the following recipe.

For UCX:

./configure --prefix=/path/to/ucx-cuda-install \
    --with-cuda=/usr/local/cuda --with-gdrcopy=/usr
make -j install
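
As a quick sanity check (assuming you use the ucx_info binary from that same
install; the transport names below are what I'd expect from a UCX 1.6.0 CUDA
build), you can confirm the CUDA and gdrcopy transports actually got built:

/path/to/ucx-cuda-install/bin/ucx_info -d | grep -i -e cuda -e gdr

If the CUDA bits were picked up you should see transports such as cuda_copy,
cuda_ipc and gdr_copy listed; ucx_info -v should also show the configure flags
that were used.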


Then for OpenMPI:

./configure --with-cuda=/usr/local/cuda --with-ucx=/path/to/ucx-cuda-install
make -j install
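
To double-check the resulting Open MPI build at runtime, the usual check is
something like:

ompi_info --parsable --all | grep mpi_built_with_cuda_support:value

Judging by the ompi_info output you pasted below, that already reports true,
so the build side looks fine and the question is what gets selected at run
time.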


Can you run with the following to see if it helps:

mpirun -np 2 --mca pml ucx --mca btl ^smcuda,openib ./osu_latency D H

There are details here that may be useful:
https://www.open-mpi.org/faq/?category=runcuda#run-ompi-cuda-ucx
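
If that run still doesn't show any GPU activity, a more verbose variant can
reveal which transports UCX actually selects. UCX_LOG_LEVEL and UCX_TLS are
standard UCX environment variables, but the transport list below is only an
example and may need adjusting for your fabric:

mpirun -np 2 --mca pml ucx --mca btl ^smcuda,openib \
    -x UCX_LOG_LEVEL=info \
    -x UCX_TLS=rc,sm,self,cuda_copy,cuda_ipc,gdr_copy \
    ./osu_latency D H

Seeing cuda_copy or gdr_copy mentioned in the UCX output would confirm the
CUDA-aware path is being taken.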

Also, note that for short messages the D->H path for inter-node transfers may
not involve any CUDA API calls (relevant if you're using nvprof to detect CUDA
activity), because the GPUDirect RDMA path and gdrcopy are used instead.
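
One more thing that may be worth a quick look on the compute nodes; the module
names here assume a typical gdrcopy / GPUDirect RDMA setup (gdrdrv from
gdrcopy, nv_peer_mem from the Mellanox stack), so adjust as needed for your
virtualized NIC:

lsmod | grep -e gdrdrv -e nv_peer_mem

If either module is missing, UCX will quietly fall back to slower host-staged
paths for GPU buffers.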

On Fri, Sep 6, 2019 at 7:36 AM Arturo Fernandez via users <users@lists.open-mpi.org> wrote:

> Josh,
> Thank you. Yes, I built UCX with CUDA and gdrcopy support. I also had to
> disable numa (--disable-numa) as requested during the installation.
> AFernandez
>
> Joshua Ladd wrote
>
> Did you build UCX with CUDA support (--with-cuda) ?
>
> Josh
>
> On Thu, Sep 5, 2019 at 8:45 PM AFernandez via users <users@lists.open-mpi.org> wrote:
>
>> Hello OpenMPI Team,
>>
>> I'm trying to use CUDA-aware OpenMPI but the system simply ignores the
>> GPU and the code runs on the CPUs. I've tried different software but will
>> focus on the OSU benchmarks (collective and pt2pt communications). Let me
>> provide some data about the configuration of the system:
>>
>> -OFED v4.17-1-rc2 (the NIC is virtualized but I also tried a Mellanox
>> card with MOFED a few days ago and found the same issue)
>>
>> -CUDA v10.1
>>
>> -gdrcopy v1.3
>>
>> -UCX 1.6.0
>>
>> -OpenMPI 4.0.1
>>
>> Everything looks good (CUDA programs work fine, MPI programs run on
>> the CPUs without any problem), and the ompi_info output is what I was
>> expecting (but maybe I'm missing something):
>>
>> mca:opal:base:param:opal_built_with_cuda_support:synonym:name:mpi_built_with_cuda_support
>>
>>
>> mca:mpi:base:param:mpi_built_with_cuda_support:value:true
>>
>> mca:mpi:base:param:mpi_built_with_cuda_support:source:default
>>
>> mca:mpi:base:param:mpi_built_with_cuda_support:status:read-only
>>
>> mca:mpi:base:param:mpi_built_with_cuda_support:level:4
>>
>> mca:mpi:base:param:mpi_built_with_cuda_support:help:Whether CUDA GPU
>> buffer support is built into library or not
>>
>> mca:mpi:base:param:mpi_built_with_cuda_support:enumerator:value:0:false
>>
>> mca:mpi:base:param:mpi_built_with_cuda_support:enumerator:value:1:true
>>
>> mca:mpi:base:param:mpi_built_with_cuda_support:deprecated:no
>>
>> mca:mpi:base:param:mpi_built_with_cuda_support:type:bool
>>
>> mca:mpi:base:param:mpi_built_with_cuda_support:synonym_of:name:opal_built_with_cuda_support
>>
>>
>> mca:mpi:base:param:mpi_built_with_cuda_support:disabled:false
>>
>> The available btls are the usual self, openib, tcp & vader plus smcuda,
>> uct & usnic. The full output from ompi_info is attached. If I try the flag
>> '--mca opal_cuda_verbose 10', it doesn't output anything, which seems to
>> agree with the lack of GPU use. If I try with '--mca btl smcuda', it makes
>> no difference. I have also tried to specify that the program use host and
>> device (e.g. mpirun -np 2 ./osu_latency D H) but got the same result. I am
>> probably missing something but not sure where else to look or what else
>> to try.
>>
>> Thank you,
>>
>> AFernandez
>>
>



-- 
-Akshay
NVIDIA
