Hello OpenMPI Team,

I'm trying to use CUDA-aware OpenMPI but the system simply ignores the GPU
and the code runs on the CPUs. I've tried different software but will focus
on the OSU benchmarks (collective and pt2pt communications). Let me provide
some data about the configuration of the system:

-OFED v4.17-1-rc2 (the NIC is virtualized but I also tried a Mellanox card
with MOFED a few days ago and found the same issue)

-CUDA v10.1

-gdrcopy v1.3

-UCX 1.6.0

-OpenMPI 4.0.1

Everything looks good (CUDA programs work fine, MPI programs run on the
CPUs without any problem), and ompi_info outputs what I was expecting
(but maybe I'm missing something):

mca:opal:base:param:opal_built_with_cuda_support:synonym:name:mpi_built_with_cuda_support

mca:mpi:base:param:mpi_built_with_cuda_support:value:true

mca:mpi:base:param:mpi_built_with_cuda_support:source:default

mca:mpi:base:param:mpi_built_with_cuda_support:status:read-only

mca:mpi:base:param:mpi_built_with_cuda_support:level:4

mca:mpi:base:param:mpi_built_with_cuda_support:help:Whether CUDA GPU buffer support is built into library or not

mca:mpi:base:param:mpi_built_with_cuda_support:enumerator:value:0:false

mca:mpi:base:param:mpi_built_with_cuda_support:enumerator:value:1:true

mca:mpi:base:param:mpi_built_with_cuda_support:deprecated:no

mca:mpi:base:param:mpi_built_with_cuda_support:type:bool

mca:mpi:base:param:mpi_built_with_cuda_support:synonym_of:name:opal_built_with_cuda_support

mca:mpi:base:param:mpi_built_with_cuda_support:disabled:false
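
For reference, the parsable lines above can be checked with a short script
rather than by eye. This is only a minimal sketch: the function names
(parse_param_fields, built_with_cuda) are made up for illustration, the
colon-separated layout is assumed from the lines shown here, and SAMPLE is
a subset copied from this message rather than live output.

```python
# Subset of the parsable ompi_info lines quoted in this message.
SAMPLE = """\
mca:mpi:base:param:mpi_built_with_cuda_support:value:true
mca:mpi:base:param:mpi_built_with_cuda_support:source:default
mca:mpi:base:param:mpi_built_with_cuda_support:status:read-only
mca:mpi:base:param:mpi_built_with_cuda_support:type:bool
"""


def parse_param_fields(text, param):
    """Return {field: value} for one MCA parameter (hypothetical helper).

    Assumed layout: mca:<framework>:<component>:param:<name>:<field>:<value>
    """
    fields = {}
    for line in text.splitlines():
        parts = line.strip().split(":")
        if len(parts) < 7 or parts[0] != "mca" or parts[3] != "param":
            continue  # not a parameter line in the assumed layout
        if parts[4] != param:
            continue  # a different parameter
        # Re-join the tail so values that contain ':' are kept whole.
        fields[parts[5]] = ":".join(parts[6:])
    return fields


def built_with_cuda(text):
    """True only if the 'value' field of the CUDA flag is 'true'."""
    fields = parse_param_fields(text, "mpi_built_with_cuda_support")
    return fields.get("value") == "true"


if __name__ == "__main__":
    print(built_with_cuda(SAMPLE))
```

A true result only says the library was built with CUDA support; it does
not show that a given run actually moves GPU buffers.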

The available btls are the usual self, openib, tcp & vader plus smcuda, uct
& usnic. The full output from ompi_info is attached. If I try the flag
'--mca opal_cuda_verbose 10', it doesn't output anything, which seems to
agree with the lack of GPU use. If I try with '--mca btl smcuda', it makes
no difference. I have also tried telling the program to use host and
device buffers (e.g. mpirun -np 2 ./osu_latency D H), but with the same
result. I am probably missing something but am not sure where else to look
or what else to try.
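
The component list mentioned above can also be pulled out of the
human-readable ompi_info listing with a few lines of Python. Again this is
just a sketch: components_by_framework is a hypothetical helper, the regular
expression is an assumption based on the line format in the attached output,
and SAMPLE holds lines copied from that output.

```python
import re

# Lines copied from the attached ompi_info listing (btl, coll and pml only).
SAMPLE = """\
                 MCA btl: self (MCA v2.1.0, API v3.1.0, Component v4.0.1)
                 MCA btl: openib (MCA v2.1.0, API v3.1.0, Component v4.0.1)
                 MCA btl: smcuda (MCA v2.1.0, API v3.1.0, Component v4.0.1)
                 MCA btl: tcp (MCA v2.1.0, API v3.1.0, Component v4.0.1)
                 MCA btl: uct (MCA v2.1.0, API v3.1.0, Component v4.0.1)
                 MCA btl: usnic (MCA v2.1.0, API v3.1.0, Component v4.0.1)
                 MCA btl: vader (MCA v2.1.0, API v3.1.0, Component v4.0.1)
                MCA coll: cuda (MCA v2.1.0, API v2.0.0, Component v4.0.1)
                 MCA pml: ob1 (MCA v2.1.0, API v2.0.0, Component v4.0.1)
                 MCA pml: ucx (MCA v2.1.0, API v2.0.0, Component v4.0.1)
             MCA btl tcp: parameter "btl_tcp_progress_thread" (current value:
"""

# Matches "MCA <framework>: <component> (MCA v..." and skips parameter lines
# such as "MCA btl tcp: parameter ...", which have two words before the colon.
COMPONENT_LINE = re.compile(r"^\s*MCA (\w+): (\w+) \(MCA v")


def components_by_framework(text):
    """Group component names by framework, in the order they are listed."""
    found = {}
    for line in text.splitlines():
        match = COMPONENT_LINE.match(line)
        if match:
            framework, component = match.groups()
            found.setdefault(framework, []).append(component)
    return found


if __name__ == "__main__":
    for framework, names in components_by_framework(SAMPLE).items():
        print(framework, ",".join(names))
```

This only shows which components were built and installed, not which ones
are selected at run time.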

Thank you,

AFernandez

 

$ ompi_info -param all all
           MCA allocator: basic (MCA v2.1.0, API v2.0.0, Component v4.0.1)
           MCA allocator: bucket (MCA v2.1.0, API v2.0.0, Component v4.0.1)
           MCA backtrace: execinfo (MCA v2.1.0, API v2.0.0, Component v4.0.1)
                 MCA btl: self (MCA v2.1.0, API v3.1.0, Component v4.0.1)
                 MCA btl: openib (MCA v2.1.0, API v3.1.0, Component v4.0.1)
                 MCA btl: smcuda (MCA v2.1.0, API v3.1.0, Component v4.0.1)
                 MCA btl: tcp (MCA v2.1.0, API v3.1.0, Component v4.0.1)
                 MCA btl: uct (MCA v2.1.0, API v3.1.0, Component v4.0.1)
                 MCA btl: usnic (MCA v2.1.0, API v3.1.0, Component v4.0.1)
                 MCA btl: vader (MCA v2.1.0, API v3.1.0, Component v4.0.1)
            MCA compress: bzip (MCA v2.1.0, API v2.0.0, Component v4.0.1)
            MCA compress: gzip (MCA v2.1.0, API v2.0.0, Component v4.0.1)
                 MCA crs: none (MCA v2.1.0, API v2.0.0, Component v4.0.1)
                  MCA dl: dlopen (MCA v2.1.0, API v1.0.0, Component v4.0.1)
               MCA event: libevent2022 (MCA v2.1.0, API v2.0.0, Component
                          v4.0.1)
               MCA hwloc: hwloc201 (MCA v2.1.0, API v2.0.0, Component v4.0.1)
                  MCA if: linux_ipv6 (MCA v2.1.0, API v2.0.0, Component
                          v4.0.1)
                  MCA if: posix_ipv4 (MCA v2.1.0, API v2.0.0, Component
                          v4.0.1)
         MCA installdirs: env (MCA v2.1.0, API v2.0.0, Component v4.0.1)
         MCA installdirs: config (MCA v2.1.0, API v2.0.0, Component v4.0.1)
              MCA memory: patcher (MCA v2.1.0, API v2.0.0, Component v4.0.1)
               MCA mpool: hugepage (MCA v2.1.0, API v3.0.0, Component v4.0.1)
             MCA patcher: overwrite (MCA v2.1.0, API v1.0.0, Component
                          v4.0.1)
                MCA pmix: isolated (MCA v2.1.0, API v2.0.0, Component v4.0.1)
                MCA pmix: flux (MCA v2.1.0, API v2.0.0, Component v4.0.1)
                MCA pmix: pmix3x (MCA v2.1.0, API v2.0.0, Component v4.0.1)
               MCA pstat: linux (MCA v2.1.0, API v2.0.0, Component v4.0.1)
              MCA rcache: grdma (MCA v2.1.0, API v3.3.0, Component v4.0.1)
              MCA rcache: gpusm (MCA v2.1.0, API v3.3.0, Component v4.0.1)
              MCA rcache: rgpusm (MCA v2.1.0, API v3.3.0, Component v4.0.1)
           MCA reachable: weighted (MCA v2.1.0, API v2.0.0, Component v4.0.1)
           MCA reachable: netlink (MCA v2.1.0, API v2.0.0, Component v4.0.1)
               MCA shmem: mmap (MCA v2.1.0, API v2.0.0, Component v4.0.1)
               MCA shmem: posix (MCA v2.1.0, API v2.0.0, Component v4.0.1)
               MCA shmem: sysv (MCA v2.1.0, API v2.0.0, Component v4.0.1)
               MCA timer: linux (MCA v2.1.0, API v2.0.0, Component v4.0.1)
                 MCA dfs: app (MCA v2.1.0, API v1.0.0, Component v4.0.1)
                 MCA dfs: orted (MCA v2.1.0, API v1.0.0, Component v4.0.1)
                 MCA dfs: test (MCA v2.1.0, API v1.0.0, Component v4.0.1)
              MCA errmgr: default_app (MCA v2.1.0, API v3.0.0, Component
                          v4.0.1)
              MCA errmgr: default_hnp (MCA v2.1.0, API v3.0.0, Component
                          v4.0.1)
              MCA errmgr: default_orted (MCA v2.1.0, API v3.0.0, Component
                          v4.0.1)
              MCA errmgr: default_tool (MCA v2.1.0, API v3.0.0, Component
                          v4.0.1)
                 MCA ess: env (MCA v2.1.0, API v3.0.0, Component v4.0.1)
                 MCA ess: hnp (MCA v2.1.0, API v3.0.0, Component v4.0.1)
                 MCA ess: pmi (MCA v2.1.0, API v3.0.0, Component v4.0.1)
                 MCA ess: singleton (MCA v2.1.0, API v3.0.0, Component
                          v4.0.1)
                 MCA ess: tool (MCA v2.1.0, API v3.0.0, Component v4.0.1)
                 MCA ess: slurm (MCA v2.1.0, API v3.0.0, Component v4.0.1)
               MCA filem: raw (MCA v2.1.0, API v2.0.0, Component v4.0.1)
             MCA grpcomm: direct (MCA v2.1.0, API v3.0.0, Component v4.0.1)
                 MCA iof: hnp (MCA v2.1.0, API v2.0.0, Component v4.0.1)
                 MCA iof: orted (MCA v2.1.0, API v2.0.0, Component v4.0.1)
                 MCA iof: tool (MCA v2.1.0, API v2.0.0, Component v4.0.1)
            MCA notifier: syslog (MCA v2.1.0, API v1.0.0, Component v4.0.1)
                MCA odls: default (MCA v2.1.0, API v2.0.0, Component v4.0.1)
                MCA odls: pspawn (MCA v2.1.0, API v2.0.0, Component v4.0.1)
                 MCA oob: tcp (MCA v2.1.0, API v2.0.0, Component v4.0.1)
                 MCA plm: isolated (MCA v2.1.0, API v2.0.0, Component v4.0.1)
                 MCA plm: rsh (MCA v2.1.0, API v2.0.0, Component v4.0.1)
                 MCA plm: slurm (MCA v2.1.0, API v2.0.0, Component v4.0.1)
                 MCA ras: simulator (MCA v2.1.0, API v2.0.0, Component
                          v4.0.1)
                 MCA ras: slurm (MCA v2.1.0, API v2.0.0, Component v4.0.1)
                MCA regx: fwd (MCA v2.1.0, API v1.0.0, Component v4.0.1)
                MCA regx: reverse (MCA v2.1.0, API v1.0.0, Component v4.0.1)
               MCA rmaps: mindist (MCA v2.1.0, API v2.0.0, Component v4.0.1)
               MCA rmaps: ppr (MCA v2.1.0, API v2.0.0, Component v4.0.1)
               MCA rmaps: rank_file (MCA v2.1.0, API v2.0.0, Component
                          v4.0.1)
               MCA rmaps: resilient (MCA v2.1.0, API v2.0.0, Component
                          v4.0.1)
               MCA rmaps: round_robin (MCA v2.1.0, API v2.0.0, Component
                          v4.0.1)
               MCA rmaps: seq (MCA v2.1.0, API v2.0.0, Component v4.0.1)
                 MCA rml: oob (MCA v2.1.0, API v3.0.0, Component v4.0.1)
              MCA routed: binomial (MCA v2.1.0, API v3.0.0, Component v4.0.1)
              MCA routed: debruijn (MCA v2.1.0, API v3.0.0, Component v4.0.1)
              MCA routed: direct (MCA v2.1.0, API v3.0.0, Component v4.0.1)
              MCA routed: radix (MCA v2.1.0, API v3.0.0, Component v4.0.1)
                 MCA rtc: hwloc (MCA v2.1.0, API v1.0.0, Component v4.0.1)
              MCA schizo: flux (MCA v2.1.0, API v1.0.0, Component v4.0.1)
              MCA schizo: ompi (MCA v2.1.0, API v1.0.0, Component v4.0.1)
              MCA schizo: orte (MCA v2.1.0, API v1.0.0, Component v4.0.1)
              MCA schizo: slurm (MCA v2.1.0, API v1.0.0, Component v4.0.1)
               MCA state: app (MCA v2.1.0, API v1.0.0, Component v4.0.1)
               MCA state: hnp (MCA v2.1.0, API v1.0.0, Component v4.0.1)
               MCA state: novm (MCA v2.1.0, API v1.0.0, Component v4.0.1)
               MCA state: orted (MCA v2.1.0, API v1.0.0, Component v4.0.1)
               MCA state: tool (MCA v2.1.0, API v1.0.0, Component v4.0.1)
                 MCA bml: r2 (MCA v2.1.0, API v2.0.0, Component v4.0.1)
                MCA coll: basic (MCA v2.1.0, API v2.0.0, Component v4.0.1)
                MCA coll: inter (MCA v2.1.0, API v2.0.0, Component v4.0.1)
                MCA coll: libnbc (MCA v2.1.0, API v2.0.0, Component v4.0.1)
                MCA coll: self (MCA v2.1.0, API v2.0.0, Component v4.0.1)
                MCA coll: sm (MCA v2.1.0, API v2.0.0, Component v4.0.1)
                MCA coll: sync (MCA v2.1.0, API v2.0.0, Component v4.0.1)
                MCA coll: tuned (MCA v2.1.0, API v2.0.0, Component v4.0.1)
                MCA coll: cuda (MCA v2.1.0, API v2.0.0, Component v4.0.1)
                MCA coll: monitoring (MCA v2.1.0, API v2.0.0, Component
                          v4.0.1)
                MCA fbtl: posix (MCA v2.1.0, API v2.0.0, Component v4.0.1)
               MCA fcoll: dynamic (MCA v2.1.0, API v2.0.0, Component v4.0.1)
               MCA fcoll: dynamic_gen2 (MCA v2.1.0, API v2.0.0, Component
                          v4.0.1)
               MCA fcoll: individual (MCA v2.1.0, API v2.0.0, Component
                          v4.0.1)
               MCA fcoll: two_phase (MCA v2.1.0, API v2.0.0, Component
                          v4.0.1)
               MCA fcoll: vulcan (MCA v2.1.0, API v2.0.0, Component v4.0.1)
                  MCA fs: ufs (MCA v2.1.0, API v2.0.0, Component v4.0.1)
                  MCA io: ompio (MCA v2.1.0, API v2.0.0, Component v4.0.1)
                  MCA io: romio321 (MCA v2.1.0, API v2.0.0, Component v4.0.1)
                 MCA mtl: ofi (MCA v2.1.0, API v2.0.0, Component v4.0.1)
                 MCA osc: sm (MCA v2.1.0, API v3.0.0, Component v4.0.1)
                 MCA osc: monitoring (MCA v2.1.0, API v3.0.0, Component
                          v4.0.1)
                 MCA osc: pt2pt (MCA v2.1.0, API v3.0.0, Component v4.0.1)
                 MCA osc: rdma (MCA v2.1.0, API v3.0.0, Component v4.0.1)
                 MCA osc: ucx (MCA v2.1.0, API v3.0.0, Component v4.0.1)
                 MCA pml: v (MCA v2.1.0, API v2.0.0, Component v4.0.1)
                 MCA pml: cm (MCA v2.1.0, API v2.0.0, Component v4.0.1)
                 MCA pml: monitoring (MCA v2.1.0, API v2.0.0, Component
                          v4.0.1)
                 MCA pml: ob1 (MCA v2.1.0, API v2.0.0, Component v4.0.1)
                 MCA pml: ucx (MCA v2.1.0, API v2.0.0, Component v4.0.1)
                 MCA rte: orte (MCA v2.1.0, API v2.0.0, Component v4.0.1)
            MCA sharedfp: individual (MCA v2.1.0, API v2.0.0, Component
                          v4.0.1)
            MCA sharedfp: lockedfile (MCA v2.1.0, API v2.0.0, Component
                          v4.0.1)
            MCA sharedfp: sm (MCA v2.1.0, API v2.0.0, Component v4.0.1)
                MCA topo: basic (MCA v2.1.0, API v2.2.0, Component v4.0.1)
                MCA topo: treematch (MCA v2.1.0, API v2.2.0, Component
                          v4.0.1)
           MCA vprotocol: pessimist (MCA v2.1.0, API v2.0.0, Component
                          v4.0.1)
             MCA btl tcp: ---------------------------------------------------
             MCA btl tcp: parameter "btl_tcp_if_include" (current value: "",
                          data source: default, level: 1 user/basic, type:
                          string)
                          Comma-delimited list of devices and/or CIDR
                          notation of networks to use for MPI communication
                          (e.g., "eth0,192.168.0.0/16").  Mutually exclusive
                          with btl_tcp_if_exclude.
             MCA btl tcp: parameter "btl_tcp_if_exclude" (current value:
                          "127.0.0.1/8,sppp", data source: default, level: 1
                          user/basic, type: string)
                          Comma-delimited list of devices and/or CIDR
                          notation of networks to NOT use for MPI
                          communication -- all devices not matching these
                          specifications will be used (e.g.,
                          "eth0,192.168.0.0/16").  If set to a non-default
                          value, it is mutually exclusive with
                          btl_tcp_if_include.
             MCA btl tcp: parameter "btl_tcp_progress_thread" (current value:
                          "0", data source: default, level: 1 user/basic,
                          type: int)
           MCA btl usnic: ---------------------------------------------------
           MCA btl usnic: parameter "btl_usnic_if_include" (current value:
                          "", data source: default, level: 1 user/basic,
                          type: string)
                          Comma-delimited list of usNIC devices/networks to
                          be used (e.g. "eth3,usnic_0,10.10.0.0/16"; empty
                          value means to use all available usNICs).  Mutually
                          exclusive with btl_usnic_if_exclude.
           MCA btl usnic: parameter "btl_usnic_if_exclude" (current value:
                          "", data source: default, level: 1 user/basic,
                          type: string)
                          Comma-delimited list of usNIC devices/networks to
                          be excluded (empty value means to not exclude any
                          usNICs).  Mutually exclusive with
                          btl_usnic_if_include.
             MCA mtl ofi: ---------------------------------------------------
             MCA mtl ofi: parameter "mtl_ofi_provider_include" (current
                          value: "", data source: default, level: 1
                          user/basic, type: string)
                          Comma-delimited list of OFI providers that are
                          considered for use (e.g., "psm,psm2"; an empty
                          value means that all providers will be considered).
                          Mutually exclusive with mtl_ofi_provider_exclude.
             MCA mtl ofi: parameter "mtl_ofi_provider_exclude" (current
                          value: "shm,sockets,tcp,udp,rstream", data source:
                          default, level: 1 user/basic, type: string)
                          Comma-delimited list of OFI providers that are not
                          considered for use (default: "sockets,mxm"; empty
                          value means that all providers will be considered).
                          Mutually exclusive with mtl_ofi_provider_include.
      MCA pml monitoring: ---------------------------------------------------
      MCA pml monitoring: performance "pml_monitoring_flush" (type: string,
                          class: generic)
                          Flush the monitoring information in the provided
                          file. The filename is append with the .%d.prof
                          suffix, where %d is replaced with the processus
                          rank in MPI_COMM_WORLD.

