Hello OpenMPI Team,

I'm trying to use CUDA-aware OpenMPI, but the system simply ignores the GPU and the code runs on the CPUs. I've tried several applications, but I will focus on the OSU benchmarks (collective and point-to-point communications). Here are some details about the system configuration:
- OFED v4.17-1-rc2 (the NIC is virtualized, but a few days ago I also tried a Mellanox card with MOFED and found the same issue)
- CUDA v10.1
- gdrcopy v1.3
- UCX 1.6.0
- OpenMPI 4.0.1

Everything looks good (CUDA programs work fine, and MPI programs run on the CPUs without any problem), and ompi_info shows what I was expecting (but maybe I'm missing something):

mca:opal:base:param:opal_built_with_cuda_support:synonym:name:mpi_built_with_cuda_support
mca:mpi:base:param:mpi_built_with_cuda_support:value:true
mca:mpi:base:param:mpi_built_with_cuda_support:source:default
mca:mpi:base:param:mpi_built_with_cuda_support:status:read-only
mca:mpi:base:param:mpi_built_with_cuda_support:level:4
mca:mpi:base:param:mpi_built_with_cuda_support:help:Whether CUDA GPU buffer support is built into library or not
mca:mpi:base:param:mpi_built_with_cuda_support:enumerator:value:0:false
mca:mpi:base:param:mpi_built_with_cuda_support:enumerator:value:1:true
mca:mpi:base:param:mpi_built_with_cuda_support:deprecated:no
mca:mpi:base:param:mpi_built_with_cuda_support:type:bool
mca:mpi:base:param:mpi_built_with_cuda_support:synonym_of:name:opal_built_with_cuda_support
mca:mpi:base:param:mpi_built_with_cuda_support:disabled:false

The available BTLs are the usual self, openib, tcp, and vader, plus smcuda, uct, and usnic. The full output from ompi_info is attached.

If I add the flag '--mca opal_cuda_verbose 10', nothing is printed, which seems consistent with the GPU not being used. Adding '--mca btl smcuda' makes no difference either. I have also tried telling the benchmark to use device and host buffers (e.g. mpirun -np 2 ./osu_latency D H), with the same result.

I am probably missing something, but I'm not sure where else to look or what else to try.

Thank you,
AFernandez
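The record that matters in that ompi_info output is the `value` line for `mpi_built_with_cuda_support`. As a minimal sketch of checking it programmatically, assuming the text comes from `ompi_info --parsable --all` with one record per line (a mail client may re-wrap long records), the helper below looks for `:param:<name>:value:true`. The function name `ompi_param_is_true` is hypothetical and not part of Open MPI; it deliberately ignores the `enumerator` records, which also end in `true`.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper, not part of Open MPI: scans `ompi_info --parsable`
 * output held in `text` and reports whether it contains the record
 *   mca:<project>:<framework>:param:<name>:value:true
 * Assumes one record per line, as ompi_info prints it. */
static bool ompi_param_is_true(const char *text, const char *name)
{
    char needle[256];

    /* ":param:<name>:value:" skips records such as
     * "...:param:<name>:enumerator:value:1:true" and parameters whose
     * names merely start with <name>. */
    int n = snprintf(needle, sizeof needle, ":param:%s:value:", name);
    if (n < 0 || (size_t)n >= sizeof needle)
        return false; /* name too long for the search buffer */

    for (const char *p = strstr(text, needle); p != NULL;
         p = strstr(p + 1, needle)) {
        const char *value = p + n;           /* text after "...:value:" */
        size_t len = strcspn(value, "\r\n"); /* value ends at end of line */
        if (len == 4 && strncmp(value, "true", 4) == 0)
            return true;
    }
    return false;
}
```

For example, `ompi_param_is_true(buf, "mpi_built_with_cuda_support")` returns true for the records quoted in the message. This flag only describes how the library was built; it says nothing about which transport carries a given message at run time.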
$ ompi_info -param all all
MCA allocator: basic (MCA v2.1.0, API v2.0.0, Component v4.0.1)
MCA allocator: bucket (MCA v2.1.0, API v2.0.0, Component v4.0.1)
MCA backtrace: execinfo (MCA v2.1.0, API v2.0.0, Component v4.0.1)
MCA btl: self (MCA v2.1.0, API v3.1.0, Component v4.0.1)
MCA btl: openib (MCA v2.1.0, API v3.1.0, Component v4.0.1)
MCA btl: smcuda (MCA v2.1.0, API v3.1.0, Component v4.0.1)
MCA btl: tcp (MCA v2.1.0, API v3.1.0, Component v4.0.1)
MCA btl: uct (MCA v2.1.0, API v3.1.0, Component v4.0.1)
MCA btl: usnic (MCA v2.1.0, API v3.1.0, Component v4.0.1)
MCA btl: vader (MCA v2.1.0, API v3.1.0, Component v4.0.1)
MCA compress: bzip (MCA v2.1.0, API v2.0.0, Component v4.0.1)
MCA compress: gzip (MCA v2.1.0, API v2.0.0, Component v4.0.1)
MCA crs: none (MCA v2.1.0, API v2.0.0, Component v4.0.1)
MCA dl: dlopen (MCA v2.1.0, API v1.0.0, Component v4.0.1)
MCA event: libevent2022 (MCA v2.1.0, API v2.0.0, Component v4.0.1)
MCA hwloc: hwloc201 (MCA v2.1.0, API v2.0.0, Component v4.0.1)
MCA if: linux_ipv6 (MCA v2.1.0, API v2.0.0, Component v4.0.1)
MCA if: posix_ipv4 (MCA v2.1.0, API v2.0.0, Component v4.0.1)
MCA installdirs: env (MCA v2.1.0, API v2.0.0, Component v4.0.1)
MCA installdirs: config (MCA v2.1.0, API v2.0.0, Component v4.0.1)
MCA memory: patcher (MCA v2.1.0, API v2.0.0, Component v4.0.1)
MCA mpool: hugepage (MCA v2.1.0, API v3.0.0, Component v4.0.1)
MCA patcher: overwrite (MCA v2.1.0, API v1.0.0, Component v4.0.1)
MCA pmix: isolated (MCA v2.1.0, API v2.0.0, Component v4.0.1)
MCA pmix: flux (MCA v2.1.0, API v2.0.0, Component v4.0.1)
MCA pmix: pmix3x (MCA v2.1.0, API v2.0.0, Component v4.0.1)
MCA pstat: linux (MCA v2.1.0, API v2.0.0, Component v4.0.1)
MCA rcache: grdma (MCA v2.1.0, API v3.3.0, Component v4.0.1)
MCA rcache: gpusm (MCA v2.1.0, API v3.3.0, Component v4.0.1)
MCA rcache: rgpusm (MCA v2.1.0, API v3.3.0, Component v4.0.1)
MCA reachable: weighted (MCA v2.1.0, API v2.0.0, Component v4.0.1)
MCA reachable: netlink (MCA v2.1.0, API v2.0.0, Component v4.0.1)
MCA shmem: mmap (MCA v2.1.0, API v2.0.0, Component v4.0.1)
MCA shmem: posix (MCA v2.1.0, API v2.0.0, Component v4.0.1)
MCA shmem: sysv (MCA v2.1.0, API v2.0.0, Component v4.0.1)
MCA timer: linux (MCA v2.1.0, API v2.0.0, Component v4.0.1)
MCA dfs: app (MCA v2.1.0, API v1.0.0, Component v4.0.1)
MCA dfs: orted (MCA v2.1.0, API v1.0.0, Component v4.0.1)
MCA dfs: test (MCA v2.1.0, API v1.0.0, Component v4.0.1)
MCA errmgr: default_app (MCA v2.1.0, API v3.0.0, Component v4.0.1)
MCA errmgr: default_hnp (MCA v2.1.0, API v3.0.0, Component v4.0.1)
MCA errmgr: default_orted (MCA v2.1.0, API v3.0.0, Component v4.0.1)
MCA errmgr: default_tool (MCA v2.1.0, API v3.0.0, Component v4.0.1)
MCA ess: env (MCA v2.1.0, API v3.0.0, Component v4.0.1)
MCA ess: hnp (MCA v2.1.0, API v3.0.0, Component v4.0.1)
MCA ess: pmi (MCA v2.1.0, API v3.0.0, Component v4.0.1)
MCA ess: singleton (MCA v2.1.0, API v3.0.0, Component v4.0.1)
MCA ess: tool (MCA v2.1.0, API v3.0.0, Component v4.0.1)
MCA ess: slurm (MCA v2.1.0, API v3.0.0, Component v4.0.1)
MCA filem: raw (MCA v2.1.0, API v2.0.0, Component v4.0.1)
MCA grpcomm: direct (MCA v2.1.0, API v3.0.0, Component v4.0.1)
MCA iof: hnp (MCA v2.1.0, API v2.0.0, Component v4.0.1)
MCA iof: orted (MCA v2.1.0, API v2.0.0, Component v4.0.1)
MCA iof: tool (MCA v2.1.0, API v2.0.0, Component v4.0.1)
MCA notifier: syslog (MCA v2.1.0, API v1.0.0, Component v4.0.1)
MCA odls: default (MCA v2.1.0, API v2.0.0, Component v4.0.1)
MCA odls: pspawn (MCA v2.1.0, API v2.0.0, Component v4.0.1)
MCA oob: tcp (MCA v2.1.0, API v2.0.0, Component v4.0.1)
MCA plm: isolated (MCA v2.1.0, API v2.0.0, Component v4.0.1)
MCA plm: rsh (MCA v2.1.0, API v2.0.0, Component v4.0.1)
MCA plm: slurm (MCA v2.1.0, API v2.0.0, Component v4.0.1)
MCA ras: simulator (MCA v2.1.0, API v2.0.0, Component v4.0.1)
MCA ras: slurm (MCA v2.1.0, API v2.0.0, Component v4.0.1)
MCA regx: fwd (MCA v2.1.0, API v1.0.0, Component v4.0.1)
MCA regx: reverse (MCA v2.1.0, API v1.0.0, Component v4.0.1)
MCA rmaps: mindist (MCA v2.1.0, API v2.0.0, Component v4.0.1)
MCA rmaps: ppr (MCA v2.1.0, API v2.0.0, Component v4.0.1)
MCA rmaps: rank_file (MCA v2.1.0, API v2.0.0, Component v4.0.1)
MCA rmaps: resilient (MCA v2.1.0, API v2.0.0, Component v4.0.1)
MCA rmaps: round_robin (MCA v2.1.0, API v2.0.0, Component v4.0.1)
MCA rmaps: seq (MCA v2.1.0, API v2.0.0, Component v4.0.1)
MCA rml: oob (MCA v2.1.0, API v3.0.0, Component v4.0.1)
MCA routed: binomial (MCA v2.1.0, API v3.0.0, Component v4.0.1)
MCA routed: debruijn (MCA v2.1.0, API v3.0.0, Component v4.0.1)
MCA routed: direct (MCA v2.1.0, API v3.0.0, Component v4.0.1)
MCA routed: radix (MCA v2.1.0, API v3.0.0, Component v4.0.1)
MCA rtc: hwloc (MCA v2.1.0, API v1.0.0, Component v4.0.1)
MCA schizo: flux (MCA v2.1.0, API v1.0.0, Component v4.0.1)
MCA schizo: ompi (MCA v2.1.0, API v1.0.0, Component v4.0.1)
MCA schizo: orte (MCA v2.1.0, API v1.0.0, Component v4.0.1)
MCA schizo: slurm (MCA v2.1.0, API v1.0.0, Component v4.0.1)
MCA state: app (MCA v2.1.0, API v1.0.0, Component v4.0.1)
MCA state: hnp (MCA v2.1.0, API v1.0.0, Component v4.0.1)
MCA state: novm (MCA v2.1.0, API v1.0.0, Component v4.0.1)
MCA state: orted (MCA v2.1.0, API v1.0.0, Component v4.0.1)
MCA state: tool (MCA v2.1.0, API v1.0.0, Component v4.0.1)
MCA bml: r2 (MCA v2.1.0, API v2.0.0, Component v4.0.1)
MCA coll: basic (MCA v2.1.0, API v2.0.0, Component v4.0.1)
MCA coll: inter (MCA v2.1.0, API v2.0.0, Component v4.0.1)
MCA coll: libnbc (MCA v2.1.0, API v2.0.0, Component v4.0.1)
MCA coll: self (MCA v2.1.0, API v2.0.0, Component v4.0.1)
MCA coll: sm (MCA v2.1.0, API v2.0.0, Component v4.0.1)
MCA coll: sync (MCA v2.1.0, API v2.0.0, Component v4.0.1)
MCA coll: tuned (MCA v2.1.0, API v2.0.0, Component v4.0.1)
MCA coll: cuda (MCA v2.1.0, API v2.0.0, Component v4.0.1)
MCA coll: monitoring (MCA v2.1.0, API v2.0.0, Component v4.0.1)
MCA fbtl: posix (MCA v2.1.0, API v2.0.0, Component v4.0.1)
MCA fcoll: dynamic (MCA v2.1.0, API v2.0.0, Component v4.0.1)
MCA fcoll: dynamic_gen2 (MCA v2.1.0, API v2.0.0, Component v4.0.1)
MCA fcoll: individual (MCA v2.1.0, API v2.0.0, Component v4.0.1)
MCA fcoll: two_phase (MCA v2.1.0, API v2.0.0, Component v4.0.1)
MCA fcoll: vulcan (MCA v2.1.0, API v2.0.0, Component v4.0.1)
MCA fs: ufs (MCA v2.1.0, API v2.0.0, Component v4.0.1)
MCA io: ompio (MCA v2.1.0, API v2.0.0, Component v4.0.1)
MCA io: romio321 (MCA v2.1.0, API v2.0.0, Component v4.0.1)
MCA mtl: ofi (MCA v2.1.0, API v2.0.0, Component v4.0.1)
MCA osc: sm (MCA v2.1.0, API v3.0.0, Component v4.0.1)
MCA osc: monitoring (MCA v2.1.0, API v3.0.0, Component v4.0.1)
MCA osc: pt2pt (MCA v2.1.0, API v3.0.0, Component v4.0.1)
MCA osc: rdma (MCA v2.1.0, API v3.0.0, Component v4.0.1)
MCA osc: ucx (MCA v2.1.0, API v3.0.0, Component v4.0.1)
MCA pml: v (MCA v2.1.0, API v2.0.0, Component v4.0.1)
MCA pml: cm (MCA v2.1.0, API v2.0.0, Component v4.0.1)
MCA pml: monitoring (MCA v2.1.0, API v2.0.0, Component v4.0.1)
MCA pml: ob1 (MCA v2.1.0, API v2.0.0, Component v4.0.1)
MCA pml: ucx (MCA v2.1.0, API v2.0.0, Component v4.0.1)
MCA rte: orte (MCA v2.1.0, API v2.0.0, Component v4.0.1)
MCA sharedfp: individual (MCA v2.1.0, API v2.0.0, Component v4.0.1)
MCA sharedfp: lockedfile (MCA v2.1.0, API v2.0.0, Component v4.0.1)
MCA sharedfp: sm (MCA v2.1.0, API v2.0.0, Component v4.0.1)
MCA topo: basic (MCA v2.1.0, API v2.2.0, Component v4.0.1)
MCA topo: treematch (MCA v2.1.0, API v2.2.0, Component v4.0.1)
MCA vprotocol: pessimist (MCA v2.1.0, API v2.0.0, Component v4.0.1)
MCA btl tcp: ---------------------------------------------------
MCA btl tcp: parameter "btl_tcp_if_include" (current value: "", data source: default, level: 1 user/basic, type: string)
    Comma-delimited list of devices and/or CIDR notation of networks to use for MPI communication (e.g., "eth0,192.168.0.0/16"). Mutually exclusive with btl_tcp_if_exclude.
MCA btl tcp: parameter "btl_tcp_if_exclude" (current value: "127.0.0.1/8,sppp", data source: default, level: 1 user/basic, type: string)
    Comma-delimited list of devices and/or CIDR notation of networks to NOT use for MPI communication -- all devices not matching these specifications will be used (e.g., "eth0,192.168.0.0/16"). If set to a non-default value, it is mutually exclusive with btl_tcp_if_include.
MCA btl tcp: parameter "btl_tcp_progress_thread" (current value: "0", data source: default, level: 1 user/basic, type: int)
MCA btl usnic: ---------------------------------------------------
MCA btl usnic: parameter "btl_usnic_if_include" (current value: "", data source: default, level: 1 user/basic, type: string)
    Comma-delimited list of usNIC devices/networks to be used (e.g. "eth3,usnic_0,10.10.0.0/16"; empty value means to use all available usNICs). Mutually exclusive with btl_usnic_if_exclude.
MCA btl usnic: parameter "btl_usnic_if_exclude" (current value: "", data source: default, level: 1 user/basic, type: string)
    Comma-delimited list of usNIC devices/networks to be excluded (empty value means to not exclude any usNICs). Mutually exclusive with btl_usnic_if_include.
MCA mtl ofi: ---------------------------------------------------
MCA mtl ofi: parameter "mtl_ofi_provider_include" (current value: "", data source: default, level: 1 user/basic, type: string)
    Comma-delimited list of OFI providers that are considered for use (e.g., "psm,psm2"; an empty value means that all providers will be considered). Mutually exclusive with mtl_ofi_provider_exclude.
MCA mtl ofi: parameter "mtl_ofi_provider_exclude" (current value: "shm,sockets,tcp,udp,rstream", data source: default, level: 1 user/basic, type: string)
    Comma-delimited list of OFI providers that are not considered for use (default: "sockets,mxm"; empty value means that all providers will be considered). Mutually exclusive with mtl_ofi_provider_include.
MCA pml monitoring: ---------------------------------------------------
MCA pml monitoring: performance "pml_monitoring_flush" (type: string, class: generic)
    Flush the monitoring information in the provided file. The filename is append with the .%d.prof suffix, where %d is replaced with the processus rank in MPI_COMM_WORLD.
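The component listing above includes the CUDA-specific pieces one would expect from a CUDA-enabled build, such as `btl smcuda`, `coll cuda`, and the GPU-related `rcache` components `gpusm` and `rgpusm`. A minimal sketch for checking a saved copy of this listing for them follows, assuming the `MCA <framework>: <component> (` layout shown above. The helper names and the exact list of expected components are illustrative assumptions, not an Open MPI API.

```c
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: true if the human-readable ompi_info listing in
 * `text` contains an entry "MCA <framework>: <component> (", e.g.
 * "MCA btl: smcuda (MCA v2.1.0, API v3.1.0, Component v4.0.1)".
 * The "MCA <framework>: " prefix and the trailing " (" make the match
 * exact, so "gpusm" does not match "rgpusm" and "sm" does not match "smcuda". */
static bool has_mca_component(const char *text, const char *framework,
                              const char *component)
{
    char needle[256];
    int n = snprintf(needle, sizeof needle, "MCA %s: %s (", framework, component);
    if (n < 0 || (size_t)n >= sizeof needle)
        return false; /* names too long for the search buffer */
    return strstr(text, needle) != NULL;
}

/* Illustrative list: the CUDA-related components visible in the listing. */
static const char *const expected[][2] = {
    {"btl", "smcuda"}, {"coll", "cuda"}, {"rcache", "gpusm"}, {"rcache", "rgpusm"},
};

/* Prints each expected component that is absent; returns how many are missing. */
static int count_missing_cuda_components(const char *text)
{
    int missing = 0;
    for (size_t i = 0; i < sizeof expected / sizeof expected[0]; i++) {
        if (!has_mca_component(text, expected[i][0], expected[i][1])) {
            printf("missing: MCA %s: %s\n", expected[i][0], expected[i][1]);
            missing++;
        }
    }
    return missing;
}
```

Finding all four only shows that these components were built; whether they are actually used depends on run-time component selection (BTLs such as smcuda, for instance, are driven by the ob1 PML rather than by pml ucx).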
_______________________________________________ users mailing list users@lists.open-mpi.org https://lists.open-mpi.org/mailman/listinfo/users