On 27.07.2020, at 21:18, Prentice Bisbal via users wrote:
> Can anyone explain why my job still calls libibverbs when I run it with
> '-mca btl ^openib'?

I observed similar behavior in a mixed cluster where some nodes have
InfiniBand and others don't. Even checking the node beforehand and applying
'-mca btl ^openib' didn't suppress the warnings about the missing
libibverbs. While the IB nodes need even more libraries anyway, libibverbs
at least seems to have to be present on every node just to avoid the
warning about its absence (the job continued despite the warning):

[node01:119439] mca_base_component_repository_open: unable to open
mca_oob_ud: libibverbs.so.1: cannot open shared object file: No such file
or directory (ignored)

> If I instead use '-mca btl tcp', my jobs don't segfault. I would assume
> '-mca btl ^openib' and '-mca btl tcp' to be essentially equivalent, but
> there's obviously a difference between the two.

I didn't check this; I just ignored the warning later on. Would '-mca btl
tcp' also allow local communication without involving the network, and/or
replace vader? This was the reason I found '-mca btl ^openib' more
appealing than listing all the others.
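(If I understand the BTL selection logic correctly, the two are in fact not
equivalent: '-mca btl ^openib' is an exclude list, leaving every other BTL
(self, vader, tcp, ...) available, whereas '-mca btl tcp' is an include
list, restricting Open MPI to exactly the named components. With an include
list one would therefore presumably have to spell out loopback and shared
memory as well, along the lines of

mpirun -mca btl tcp,self,vader ./mpihello

which is precisely the enumeration that '^openib' avoids.)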
--
Reuti

> Prentice
>
> On 7/23/20 3:34 PM, Prentice Bisbal wrote:
>> I manage a cluster that is very heterogeneous. Some nodes have
>> InfiniBand, while others have 10 Gb/s Ethernet. We recently upgraded to
>> CentOS 7 and built a new software stack for it. We are using Open MPI
>> 4.0.3, and Slurm 19.05.5 as our job scheduler.
>>
>> We just noticed that when jobs are sent to the nodes with IB, they
>> segfault immediately, with the segfault appearing to come from
>> libibverbs.so. This is what I see in the stderr output for one of these
>> failed jobs:
>>
>> srun: error: greene021: tasks 0-3: Segmentation fault
>>
>> And here is what I see in the log messages of the compute node where
>> that segfault happened:
>>
>> Jul 23 15:19:41 greene021 kernel: mpihello[7911]: segfault at 7f0635f38910 ip 00007f0635f49405 sp 00007ffe354485a0 error 4 in libibverbs.so.1.5.22.4[7f0635f3a000+18000]
>> Jul 23 15:19:41 greene021 kernel: mpihello[7912]: segfault at 7f23d51ea910 ip 00007f23d51fb405 sp 00007ffef250a9a0 error 4 in libibverbs.so.1.5.22.4[7f23d51ec000+18000]
>> Jul 23 15:19:41 greene021 kernel: mpihello[7909]: segfault at 7ff504ba5910 ip 00007ff504bb6405 sp 00007ffff917ccb0 error 4 in libibverbs.so.1.5.22.4[7ff504ba7000+18000]
>> Jul 23 15:19:41 greene021 kernel: mpihello[7910]: segfault at 7fa58abc5910 ip 00007fa58abd6405 sp 00007ffdde50c0d0 error 4 in libibverbs.so.1.5.22.4[7fa58abc7000+18000]
>>
>> Any idea what is going on here, or how to debug further? I've been
>> using Open MPI for years, and it usually just works.
>>
>> I normally start my job with srun, like this:
>>
>> srun ./mpihello
>>
>> But even if I try to take IB out of the equation by starting the job
>> like this:
>>
>> mpirun -mca btl ^openib ./mpihello
>>
>> I still get a segfault, although the message to stderr is now a little
>> different:
>>
>> --------------------------------------------------------------------------
>> Primary job terminated normally, but 1 process returned
>> a non-zero exit code. Per user-direction, the job has been aborted.
>> --------------------------------------------------------------------------
>> --------------------------------------------------------------------------
>> mpirun noticed that process rank 1 with PID 8502 on node greene021
>> exited on signal 11 (Segmentation fault).
>> --------------------------------------------------------------------------
>>
>> The segfault happens immediately, apparently as soon as MPI_Init() is
>> called. The program I'm running is a very simple MPI "Hello world!"
>> program.
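>> For reference, it is essentially the textbook example; a minimal sketch
>> along these lines (not necessarily the exact source of mpihello):
>>
>> /* mpihello.c -- minimal MPI "Hello world!" (illustrative sketch) */
>> #include <mpi.h>
>> #include <stdio.h>
>>
>> int main(int argc, char *argv[])
>> {
>>     int rank, size;
>>
>>     MPI_Init(&argc, &argv);               /* the crash happens in here */
>>     MPI_Comm_rank(MPI_COMM_WORLD, &rank); /* rank of this process */
>>     MPI_Comm_size(MPI_COMM_WORLD, &size); /* total number of ranks */
>>     printf("Hello world from rank %d of %d\n", rank, size);
>>     MPI_Finalize();
>>     return 0;
>> }
>>
>> compiled with the Open MPI wrapper compiler: mpicc -o mpihello mpihello.c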
>> The output of ompi_info is below my signature, in case that helps.
>>
>> Prentice
>>
>> $ ompi_info
>> Package: Open MPI u...@host.example.com Distribution
>> Open MPI: 4.0.3
>> Open MPI repo revision: v4.0.3
>> Open MPI release date: Mar 03, 2020
>> Open RTE: 4.0.3
>> Open RTE repo revision: v4.0.3
>> Open RTE release date: Mar 03, 2020
>> OPAL: 4.0.3
>> OPAL repo revision: v4.0.3
>> OPAL release date: Mar 03, 2020
>> MPI API: 3.1.0
>> Ident string: 4.0.3
>> Prefix: /usr/pppl/gcc/9.3-pkgs/openmpi-4.0.3
>> Configured architecture: x86_64-unknown-linux-gnu
>> Configure host: dawson027.pppl.gov
>> Configured by: lglant
>> Configured on: Mon Jun 1 12:37:07 EDT 2020
>> Configure host: dawson027.pppl.gov
>> Configure command line: '--prefix=/usr/pppl/gcc/9.3-pkgs/openmpi-4.0.3'
>>   '--with-ucx' '--with-verbs' '--with-libfabric' '--with-libevent=/usr'
>>   '--with-libevent-libdir=/usr/lib64' '--with-pmix=/usr/pppl/pmix/3.1.5'
>>   '--with-pmi'
>> Built by: lglant
>> Built on: Mon Jun 1 13:05:40 EDT 2020
>> Built host: dawson027.pppl.gov
>> C bindings: yes
>> C++ bindings: no
>> Fort mpif.h: yes (all)
>> Fort use mpi: yes (full: ignore TKR)
>> Fort use mpi size: deprecated-ompi-info-value
>> Fort use mpi_f08: yes
>> Fort mpi_f08 compliance: The mpi_f08 module is available, but due to
>>   limitations in the gfortran compiler and/or Open MPI, does not support
>>   the following: array subsections, direct passthru (where possible) to
>>   underlying Open MPI's C functionality
>> Fort mpi_f08 subarrays: no
>> Java bindings: no
>> Wrapper compiler rpath: runpath
>> C compiler: gcc
>> C compiler absolute: /usr/pppl/gcc/9.3.0/bin/gcc
>> C compiler family name: GNU
>> C compiler version: 9.3.0
>> C++ compiler: g++
>> C++ compiler absolute: /usr/pppl/gcc/9.3.0/bin/g++
>> Fort compiler: gfortran
>> Fort compiler abs: /usr/pppl/gcc/9.3.0/bin/gfortran
>> Fort ignore TKR: yes (!GCC$ ATTRIBUTES NO_ARG_CHECK ::)
>> Fort 08 assumed shape: yes
>> Fort optional args: yes
>> Fort INTERFACE: yes
>> Fort ISO_FORTRAN_ENV: yes
>> Fort STORAGE_SIZE: yes
>> Fort BIND(C) (all): yes
>> Fort ISO_C_BINDING: yes
>> Fort SUBROUTINE BIND(C): yes
>> Fort TYPE,BIND(C): yes
>> Fort T,BIND(C,name="a"): yes
>> Fort PRIVATE: yes
>> Fort PROTECTED: yes
>> Fort ABSTRACT: yes
>> Fort ASYNCHRONOUS: yes
>> Fort PROCEDURE: yes
>> Fort USE...ONLY: yes
>> Fort C_FUNLOC: yes
>> Fort f08 using wrappers: yes
>> Fort MPI_SIZEOF: yes
>> C profiling: yes
>> C++ profiling: no
>> Fort mpif.h profiling: yes
>> Fort use mpi profiling: yes
>> Fort use mpi_f08 prof: yes
>> C++ exceptions: no
>> Thread support: posix (MPI_THREAD_MULTIPLE: yes, OPAL support: yes,
>>   OMPI progress: no, ORTE progress: yes, Event lib: yes)
>> Sparse Groups: no
>> Internal debug support: no
>> MPI interface warnings: yes
>> MPI parameter check: runtime
>> Memory profiling support: no
>> Memory debugging support: no
>> dl support: yes
>> Heterogeneous support: no
>> mpirun default --prefix: no
>> MPI_WTIME support: native
>> Symbol vis. support: yes
>> Host topology support: yes
>> IPv6 support: no
>> MPI1 compatibility: no
>> MPI extensions: affinity, cuda, pcollreq
>> FT Checkpoint support: no (checkpoint thread: no)
>> C/R Enabled Debugging: no
>> MPI_MAX_PROCESSOR_NAME: 256
>> MPI_MAX_ERROR_STRING: 256
>> MPI_MAX_OBJECT_NAME: 64
>> MPI_MAX_INFO_KEY: 36
>> MPI_MAX_INFO_VAL: 256
>> MPI_MAX_PORT_NAME: 1024
>> MPI_MAX_DATAREP_STRING: 128
>> MCA allocator: bucket (MCA v2.1.0, API v2.0.0, Component v4.0.3)
>> MCA allocator: basic (MCA v2.1.0, API v2.0.0, Component v4.0.3)
>> MCA backtrace: execinfo (MCA v2.1.0, API v2.0.0, Component v4.0.3)
>> MCA btl: self (MCA v2.1.0, API v3.1.0, Component v4.0.3)
>> MCA btl: uct (MCA v2.1.0, API v3.1.0, Component v4.0.3)
>> MCA btl: tcp (MCA v2.1.0, API v3.1.0, Component v4.0.3)
>> MCA btl: usnic (MCA v2.1.0, API v3.1.0, Component v4.0.3)
>> MCA btl: vader (MCA v2.1.0, API v3.1.0, Component v4.0.3)
>> MCA btl: openib (MCA v2.1.0, API v3.1.0, Component v4.0.3)
>> MCA compress: gzip (MCA v2.1.0, API v2.0.0, Component v4.0.3)
>> MCA compress: bzip (MCA v2.1.0, API v2.0.0, Component v4.0.3)
>> MCA crs: none (MCA v2.1.0, API v2.0.0, Component v4.0.3)
>> MCA dl: dlopen (MCA v2.1.0, API v1.0.0, Component v4.0.3)
>> MCA event: external (MCA v2.1.0, API v2.0.0, Component v4.0.3)
>> MCA hwloc: hwloc201 (MCA v2.1.0, API v2.0.0, Component v4.0.3)
>> MCA if: linux_ipv6 (MCA v2.1.0, API v2.0.0, Component v4.0.3)
>> MCA if: posix_ipv4 (MCA v2.1.0, API v2.0.0, Component v4.0.3)
>> MCA installdirs: env (MCA v2.1.0, API v2.0.0, Component v4.0.3)
>> MCA installdirs: config (MCA v2.1.0, API v2.0.0, Component v4.0.3)
>> MCA memory: patcher (MCA v2.1.0, API v2.0.0, Component v4.0.3)
>> MCA mpool: hugepage (MCA v2.1.0, API v3.0.0, Component v4.0.3)
>> MCA patcher: overwrite (MCA v2.1.0, API v1.0.0, Component v4.0.3)
>> MCA pmix: flux (MCA v2.1.0, API v2.0.0, Component v4.0.3)
>> MCA pmix: isolated (MCA v2.1.0, API v2.0.0, Component v4.0.3)
>> MCA pmix: s2 (MCA v2.1.0, API v2.0.0, Component v4.0.3)
>> MCA pmix: ext3x (MCA v2.1.0, API v2.0.0, Component v4.0.3)
>> MCA pmix: s1 (MCA v2.1.0, API v2.0.0, Component v4.0.3)
>> MCA pmix: pmix3x (MCA v2.1.0, API v2.0.0, Component v4.0.3)
>> MCA pstat: linux (MCA v2.1.0, API v2.0.0, Component v4.0.3)
>> MCA rcache: grdma (MCA v2.1.0, API v3.3.0, Component v4.0.3)
>> MCA reachable: weighted (MCA v2.1.0, API v2.0.0, Component v4.0.3)
>> MCA shmem: posix (MCA v2.1.0, API v2.0.0, Component v4.0.3)
>> MCA shmem: mmap (MCA v2.1.0, API v2.0.0, Component v4.0.3)
>> MCA shmem: sysv (MCA v2.1.0, API v2.0.0, Component v4.0.3)
>> MCA timer: linux (MCA v2.1.0, API v2.0.0, Component v4.0.3)
>> MCA errmgr: default_app (MCA v2.1.0, API v3.0.0, Component v4.0.3)
>> MCA errmgr: default_orted (MCA v2.1.0, API v3.0.0, Component v4.0.3)
>> MCA errmgr: default_tool (MCA v2.1.0, API v3.0.0, Component v4.0.3)
>> MCA errmgr: default_hnp (MCA v2.1.0, API v3.0.0, Component v4.0.3)
>> MCA ess: env (MCA v2.1.0, API v3.0.0, Component v4.0.3)
>> MCA ess: hnp (MCA v2.1.0, API v3.0.0, Component v4.0.3)
>> MCA ess: pmi (MCA v2.1.0, API v3.0.0, Component v4.0.3)
>> MCA ess: slurm (MCA v2.1.0, API v3.0.0, Component v4.0.3)
>> MCA ess: singleton (MCA v2.1.0, API v3.0.0, Component v4.0.3)
>> MCA ess: tool (MCA v2.1.0, API v3.0.0, Component v4.0.3)
>> MCA filem: raw (MCA v2.1.0, API v2.0.0, Component v4.0.3)
>> MCA grpcomm: direct (MCA v2.1.0, API v3.0.0, Component v4.0.3)
>> MCA iof: tool (MCA v2.1.0, API v2.0.0, Component v4.0.3)
>> MCA iof: hnp (MCA v2.1.0, API v2.0.0, Component v4.0.3)
>> MCA iof: orted (MCA v2.1.0, API v2.0.0, Component v4.0.3)
>> MCA odls: pspawn (MCA v2.1.0, API v2.0.0, Component v4.0.3)
>> MCA odls: default (MCA v2.1.0, API v2.0.0, Component v4.0.3)
>> MCA oob: tcp (MCA v2.1.0, API v2.0.0, Component v4.0.3)
>> MCA plm: slurm (MCA v2.1.0, API v2.0.0, Component v4.0.3)
>> MCA plm: rsh (MCA v2.1.0, API v2.0.0, Component v4.0.3)
>> MCA plm: isolated (MCA v2.1.0, API v2.0.0, Component v4.0.3)
>> MCA ras: slurm (MCA v2.1.0, API v2.0.0, Component v4.0.3)
>> MCA ras: simulator (MCA v2.1.0, API v2.0.0, Component v4.0.3)
>> MCA regx: fwd (MCA v2.1.0, API v1.0.0, Component v4.0.3)
>> MCA regx: naive (MCA v2.1.0, API v1.0.0, Component v4.0.3)
>> MCA regx: reverse (MCA v2.1.0, API v1.0.0, Component v4.0.3)
>> MCA rmaps: rank_file (MCA v2.1.0, API v2.0.0, Component v4.0.3)
>> MCA rmaps: ppr (MCA v2.1.0, API v2.0.0, Component v4.0.3)
>> MCA rmaps: resilient (MCA v2.1.0, API v2.0.0, Component v4.0.3)
>> MCA rmaps: round_robin (MCA v2.1.0, API v2.0.0, Component v4.0.3)
>> MCA rmaps: mindist (MCA v2.1.0, API v2.0.0, Component v4.0.3)
>> MCA rmaps: seq (MCA v2.1.0, API v2.0.0, Component v4.0.3)
>> MCA rml: oob (MCA v2.1.0, API v3.0.0, Component v4.0.3)
>> MCA routed: direct (MCA v2.1.0, API v3.0.0, Component v4.0.3)
>> MCA routed: binomial (MCA v2.1.0, API v3.0.0, Component v4.0.3)
>> MCA routed: radix (MCA v2.1.0, API v3.0.0, Component v4.0.3)
>> MCA rtc: hwloc (MCA v2.1.0, API v1.0.0, Component v4.0.3)
>> MCA schizo: slurm (MCA v2.1.0, API v1.0.0, Component v4.0.3)
>> MCA schizo: orte (MCA v2.1.0, API v1.0.0, Component v4.0.3)
>> MCA schizo: ompi (MCA v2.1.0, API v1.0.0, Component v4.0.3)
>> MCA schizo: flux (MCA v2.1.0, API v1.0.0, Component v4.0.3)
>> MCA state: orted (MCA v2.1.0, API v1.0.0, Component v4.0.3)
>> MCA state: novm (MCA v2.1.0, API v1.0.0, Component v4.0.3)
>> MCA state: hnp (MCA v2.1.0, API v1.0.0, Component v4.0.3)
>> MCA state: app (MCA v2.1.0, API v1.0.0, Component v4.0.3)
>> MCA state: tool (MCA v2.1.0, API v1.0.0, Component v4.0.3)
>> MCA bml: r2 (MCA v2.1.0, API v2.0.0, Component v4.0.3)
>> MCA coll: self (MCA v2.1.0, API v2.0.0, Component v4.0.3)
>> MCA coll: sm (MCA v2.1.0, API v2.0.0, Component v4.0.3)
>> MCA coll: inter (MCA v2.1.0, API v2.0.0, Component v4.0.3)
>> MCA coll: libnbc (MCA v2.1.0, API v2.0.0, Component v4.0.3)
>> MCA coll: sync (MCA v2.1.0, API v2.0.0, Component v4.0.3)
>> MCA coll: monitoring (MCA v2.1.0, API v2.0.0, Component v4.0.3)
>> MCA coll: basic (MCA v2.1.0, API v2.0.0, Component v4.0.3)
>> MCA coll: tuned (MCA v2.1.0, API v2.0.0, Component v4.0.3)
>> MCA fbtl: posix (MCA v2.1.0, API v2.0.0, Component v4.0.3)
>> MCA fcoll: dynamic (MCA v2.1.0, API v2.0.0, Component v4.0.3)
>> MCA fcoll: two_phase (MCA v2.1.0, API v2.0.0, Component v4.0.3)
>> MCA fcoll: dynamic_gen2 (MCA v2.1.0, API v2.0.0, Component v4.0.3)
>> MCA fcoll: vulcan (MCA v2.1.0, API v2.0.0, Component v4.0.3)
>> MCA fcoll: individual (MCA v2.1.0, API v2.0.0, Component v4.0.3)
>> MCA fs: ufs (MCA v2.1.0, API v2.0.0, Component v4.0.3)
>> MCA io: romio321 (MCA v2.1.0, API v2.0.0, Component v4.0.3)
>> MCA io: ompio (MCA v2.1.0, API v2.0.0, Component v4.0.3)
>> MCA mtl: ofi (MCA v2.1.0, API v2.0.0, Component v4.0.3)
>> MCA mtl: psm (MCA v2.1.0, API v2.0.0, Component v4.0.3)
>> MCA osc: ucx (MCA v2.1.0, API v3.0.0, Component v4.0.3)
>> MCA osc: pt2pt (MCA v2.1.0, API v3.0.0, Component v4.0.3)
>> MCA osc: monitoring (MCA v2.1.0, API v3.0.0, Component v4.0.3)
>> MCA osc: sm (MCA v2.1.0, API v3.0.0, Component v4.0.3)
>> MCA osc: rdma (MCA v2.1.0, API v3.0.0, Component v4.0.3)
>> MCA pml: v (MCA v2.1.0, API v2.0.0, Component v4.0.3)
>> MCA pml: ob1 (MCA v2.1.0, API v2.0.0, Component v4.0.3)
>> MCA pml: ucx (MCA v2.1.0, API v2.0.0, Component v4.0.3)
>> MCA pml: cm (MCA v2.1.0, API v2.0.0, Component v4.0.3)
>> MCA pml: monitoring (MCA v2.1.0, API v2.0.0, Component v4.0.3)
>> MCA rte: orte (MCA v2.1.0, API v2.0.0, Component v4.0.3)
>> MCA sharedfp: individual (MCA v2.1.0, API v2.0.0, Component v4.0.3)
>> MCA sharedfp: sm (MCA v2.1.0, API v2.0.0, Component v4.0.3)
>> MCA sharedfp: lockedfile (MCA v2.1.0, API v2.0.0, Component v4.0.3)
>> MCA topo: treematch (MCA v2.1.0, API v2.2.0, Component v4.0.3)
>> MCA topo: basic (MCA v2.1.0, API v2.2.0, Component v4.0.3)
>> MCA vprotocol: pessimist (MCA v2.1.0, API v2.0.0, Component v4.0.3)
>
> --
> Prentice Bisbal
> Lead Software Engineer
> Research Computing
> Princeton Plasma Physics Laboratory
> http://www.pppl.gov