Re: [OMPI users] Setting LD_LIBRARY_PATH for orted
Gary,

One option (as mentioned in the error message) is to configure Open MPI with
--enable-orterun-prefix-by-default. This will force the build process to use
rpath, so you do not have to set LD_LIBRARY_PATH. This is the easiest option,
but it cannot be used if you plan to relocate the Open MPI installation
directory.

Another option is to use a wrapper for orted:

mpirun --mca orte_launch_agent /.../myorted ...

where myorted is a script that looks like:

#!/bin/sh
export LD_LIBRARY_PATH=...
exec /.../bin/orted "$@"

You can make this setting system-wide by adding the following line to
/.../etc/openmpi-mca-params.conf:

orte_launch_agent = /.../myorted

Cheers,

Gilles

On 8/22/2017 1:06 AM, Jackson, Gary L. wrote:
> I’m using a binary distribution of OpenMPI 1.10.2. As linked, it requires
> certain shared libraries outside of OpenMPI for orted itself to start. So,
> passing in LD_LIBRARY_PATH with the “-x” flag to mpirun doesn’t do anything:
>
> $ mpirun --hostfile ${HOSTFILE} -N 1 -n 2 -x LD_LIBRARY_PATH hostname
> /path/to/orted: error while loading shared libraries: LIBRARY.so: cannot open
> shared object file: No such file or directory
> --------------------------------------------------------------------------
> ORTE was unable to reliably start one or more daemons.
> This usually is caused by:
>
> * not finding the required libraries and/or binaries on
>   one or more nodes. Please check your PATH and LD_LIBRARY_PATH
>   settings, or configure OMPI with --enable-orterun-prefix-by-default
>
> * lack of authority to execute on one or more specified nodes.
>   Please verify your allocation and authorities.
>
> * the inability to write startup files into /tmp (--tmpdir/orte_tmpdir_base).
>   Please check with your sys admin to determine the correct location to use.
>
> * compilation of the orted with dynamic libraries when static are required
>   (e.g., on Cray). Please check your configure cmd line and consider using
>   one of the contrib/platform definitions for your system type.
>
> * an inability to create a connection back to mpirun due to a
>   lack of common network interfaces and/or no route found between
>   them. Please check network connectivity (including firewalls
>   and network routing requirements).
> --------------------------------------------------------------------------
>
> How do I get around this cleanly? This works just fine when I set
> LD_LIBRARY_PATH in my .bashrc, but I’d rather not pollute that if I can
> avoid it.
>
> --
> Gary Jackson, Ph.D.
> Johns Hopkins University Applied Physics Laboratory

___
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users
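Gilles's wrapper approach can be sketched end-to-end as follows. This is a minimal sketch, not a tested recipe: /opt/deps/lib and /opt/openmpi stand in for the elided /.../ paths in his message, so adjust both for your installation.

```shell
# Generate a hypothetical "myorted" wrapper in the current directory.
cat > myorted <<'EOF'
#!/bin/sh
# Prepend the directory holding the extra shared libraries orted needs,
# preserving any LD_LIBRARY_PATH the launcher may already have set.
export LD_LIBRARY_PATH=/opt/deps/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}
# Hand control to the real orted, forwarding all arguments unchanged.
exec /opt/openmpi/bin/orted "$@"
EOF
chmod +x myorted
```

mpirun would then be pointed at the wrapper with --mca orte_launch_agent /path/to/myorted, or system-wide via the orte_launch_agent line in openmpi-mca-params.conf as Gilles describes.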
Re: [OMPI users] Setting LD_LIBRARY_PATH for orted
Hi,

> Am 21.08.2017 um 18:06 schrieb Jackson, Gary L.:
>
> I’m using a binary distribution of OpenMPI 1.10.2. As linked, it requires
> certain shared libraries outside of OpenMPI for orted itself to start. So,
> passing in LD_LIBRARY_PATH with the “-x” flag to mpirun doesn’t do anything:
>
> $ mpirun --hostfile ${HOSTFILE} -N 1 -n 2 -x LD_LIBRARY_PATH hostname
> /path/to/orted: error while loading shared libraries: LIBRARY.so: cannot open
> shared object file: No such file or directory
>
> [ORTE "unable to reliably start one or more daemons" help text snipped;
> it appears in full in the original message below.]
>
> How do I get around this cleanly? This works just fine when I set
> LD_LIBRARY_PATH in my .bashrc, but I’d rather not pollute that if I can
> avoid it.

Do you set or extend the LD_LIBRARY_PATH in your .bashrc?

-- Reuti
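The set-versus-extend distinction Reuti is asking about can be illustrated in plain shell (the /opt/deps/lib path is a hypothetical example):

```shell
# Setting overwrites whatever LD_LIBRARY_PATH the environment already had:
export LD_LIBRARY_PATH=/opt/deps/lib

# Extending prepends while keeping the existing value; the ${VAR:+...}
# expansion avoids emitting a stray ':' when the variable was empty:
export LD_LIBRARY_PATH=/opt/deps/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}

echo "$LD_LIBRARY_PATH"   # prints /opt/deps/lib:/opt/deps/lib
```

The distinction matters for remote launches: a value set in a startup file on the remote node is in place before orted is loaded, whereas anything passed with -x can only take effect once orted itself has already started.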
[OMPI users] Setting LD_LIBRARY_PATH for orted
I’m using a binary distribution of OpenMPI 1.10.2. As linked, it requires certain shared libraries outside of OpenMPI for orted itself to start. So, passing in LD_LIBRARY_PATH with the “-x” flag to mpirun doesn’t do anything:

$ mpirun --hostfile ${HOSTFILE} -N 1 -n 2 -x LD_LIBRARY_PATH hostname
/path/to/orted: error while loading shared libraries: LIBRARY.so: cannot open shared object file: No such file or directory
--------------------------------------------------------------------------
ORTE was unable to reliably start one or more daemons.
This usually is caused by:

* not finding the required libraries and/or binaries on
  one or more nodes. Please check your PATH and LD_LIBRARY_PATH
  settings, or configure OMPI with --enable-orterun-prefix-by-default

* lack of authority to execute on one or more specified nodes.
  Please verify your allocation and authorities.

* the inability to write startup files into /tmp (--tmpdir/orte_tmpdir_base).
  Please check with your sys admin to determine the correct location to use.

* compilation of the orted with dynamic libraries when static are required
  (e.g., on Cray). Please check your configure cmd line and consider using
  one of the contrib/platform definitions for your system type.

* an inability to create a connection back to mpirun due to a
  lack of common network interfaces and/or no route found between
  them. Please check network connectivity (including firewalls
  and network routing requirements).
--------------------------------------------------------------------------

How do I get around this cleanly? This works just fine when I set LD_LIBRARY_PATH in my .bashrc, but I’d rather not pollute that if I can avoid it.

--
Gary Jackson, Ph.D.
Johns Hopkins University Applied Physics Laboratory
[OMPI users] MIMD execution with global "--map-by node"
Hello,

I am trying to place executables on different sockets on different nodes with Open MPI 2.1.1. Therefore I use something like the following command:

mpirun --map-by ppr:1:node -np 1 numactl -N 0 /bin/hostname : -np 1 numactl -N 1 /bin/hostname

But from the output I see that all processes are placed on the same node. Trying the same with Intel MPI using the following commands

export I_MPI_JOB_RESPECT_PROCESS_PLACEMENT=no
mpirun -ppn 1 -np 1 /bin/hostname : -np 1 /bin/hostname

the processes go to different nodes as desired. So I am wondering whether "--map-by" is a "local option" in the sense of the mpirun man page, and how I can achieve the desired behaviour with Open MPI.

Thanks for your help.

Best,
Christoph Niethammer
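One direction worth trying for the placement described above (an untested sketch; node01/node02 are hypothetical node names, substitute the hosts from your allocation) is to pin each app context of the MIMD command line to an explicit node with -host, rather than relying on a single --map-by applied to the whole job:

```shell
# Each app context gets its own -host, so the two ranks land on
# different nodes regardless of how the global mapping policy is applied.
mpirun -host node01 -np 1 numactl -N 0 /bin/hostname : \
       -host node02 -np 1 numactl -N 1 /bin/hostname
```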
[OMPI users] openmpi-2.1.2rc2: warnings from "make" and "make check"
Hi,

I've installed openmpi-2.1.2rc2 on my "SUSE Linux Enterprise Server 12.2 (x86_64)" with Sun C 5.15 (Oracle Developer Studio 12.6) and gcc-7.1.0. Perhaps somebody wants to eliminate the following warnings.

openmpi-2.1.2rc2-Linux.x86_64.64_gcc/log.make.Linux.x86_64.64_gcc:openmpi-2.1.2rc2/ompi/mca/io/romio314/romio/adio/common/utils.c:97:3: warning: passing argument 3 of 'PMPI_Type_hindexed' discards 'const' qualifier from pointer target type [-Wdiscarded-qualifiers]
openmpi-2.1.2rc2-Linux.x86_64.64_gcc/log.make.Linux.x86_64.64_gcc:openmpi-2.1.2rc2/ompi/mpiext/cuda/c/mpiext_cuda_c.h:16:0: warning: "MPIX_CUDA_AWARE_SUPPORT" redefined
openmpi-2.1.2rc2-Linux.x86_64.64_cc/log.make.Linux.x86_64.64_cc:"openmpi-2.1.2rc2/opal/mca/hwloc/hwloc1112/hwloc/src/topology-custom.c", line 88: warning: initializer will be sign-extended: -1
openmpi-2.1.2rc2-Linux.x86_64.64_cc/log.make.Linux.x86_64.64_cc:"openmpi-2.1.2rc2/opal/mca/hwloc/hwloc1112/hwloc/src/topology-linux.c", line 2640: warning: initializer will be sign-extended: -1
openmpi-2.1.2rc2-Linux.x86_64.64_cc/log.make.Linux.x86_64.64_cc:"openmpi-2.1.2rc2/opal/mca/hwloc/hwloc1112/hwloc/src/topology-synthetic.c", line 851: warning: initializer will be sign-extended: -1
openmpi-2.1.2rc2-Linux.x86_64.64_cc/log.make.Linux.x86_64.64_cc:"openmpi-2.1.2rc2/opal/mca/hwloc/hwloc1112/hwloc/src/topology-x86.c", line 113: warning: initializer will be sign-extended: -1
openmpi-2.1.2rc2-Linux.x86_64.64_cc/log.make.Linux.x86_64.64_cc:"openmpi-2.1.2rc2/opal/mca/hwloc/hwloc1112/hwloc/src/topology-xml.c", line 1667: warning: initializer will be sign-extended: -1
openmpi-2.1.2rc2-Linux.x86_64.64_cc/log.make.Linux.x86_64.64_cc:"openmpi-2.1.2rc2/ompi/mca/io/romio314/romio/adio/common/ad_fstype.c", line 428: warning: statement not reached
openmpi-2.1.2rc2-Linux.x86_64.64_cc/log.make.Linux.x86_64.64_cc:"openmpi-2.1.2rc2/ompi/mca/io/romio314/romio/adio/common/ad_threaded_io.c", line 31: warning: statement not reached
openmpi-2.1.2rc2-Linux.x86_64.64_cc/log.make.Linux.x86_64.64_cc:"openmpi-2.1.2rc2/ompi/mca/io/romio314/romio/adio/common/utils.c", line 97: warning: argument #3 is incompatible with prototype:
openmpi-2.1.2rc2-Linux.x86_64.64_cc/log.make.Linux.x86_64.64_cc:"openmpi-2.1.2rc2/opal/include/opal/sys/x86_64/atomic.h", line 161: warning: parameter in inline asm statement unused: %3
openmpi-2.1.2rc2-Linux.x86_64.64_cc/log.make.Linux.x86_64.64_cc:"openmpi-2.1.2rc2/opal/include/opal/sys/x86_64/atomic.h", line 207: warning: parameter in inline asm statement unused: %2
openmpi-2.1.2rc2-Linux.x86_64.64_cc/log.make.Linux.x86_64.64_cc:"openmpi-2.1.2rc2/opal/include/opal/sys/x86_64/atomic.h", line 228: warning: parameter in inline asm statement unused: %2
openmpi-2.1.2rc2-Linux.x86_64.64_cc/log.make.Linux.x86_64.64_cc:"openmpi-2.1.2rc2/opal/include/opal/sys/x86_64/atomic.h", line 249: warning: parameter in inline asm statement unused: %2
openmpi-2.1.2rc2-Linux.x86_64.64_cc/log.make.Linux.x86_64.64_cc:"openmpi-2.1.2rc2/opal/include/opal/sys/x86_64/atomic.h", line 270: warning: parameter in inline asm statement unused: %2
openmpi-2.1.2rc2-Linux.x86_64.64_cc/log.make.Linux.x86_64.64_cc:"openmpi-2.1.2rc2/opal/mca/pmix/pmix112/pmix/src/client/pmi1.c", line 708: warning: null dimension: argvp
openmpi-2.1.2rc2-Linux.x86_64.64_cc/log.make.Linux.x86_64.64_cc:"openmpi-2.1.2rc2/opal/mca/pmix/pmix112/pmix/src/server/pmix_server.c", line 266: warning: initializer will be sign-extended: -1
openmpi-2.1.2rc2-Linux.x86_64.64_cc/log.make.Linux.x86_64.64_cc:"openmpi-2.1.2rc2/opal/mca/pmix/pmix112/pmix/src/server/pmix_server.c", line 267: warning: initializer will be sign-extended: -1
openmpi-2.1.2rc2-Linux.x86_64.64_cc/log.make.Linux.x86_64.64_cc:"openmpi-2.1.2rc2/ompi/mpiext/cuda/c/mpiext_cuda_c.h", line 16: warning: macro redefined: MPIX_CUDA_AWARE_SUPPORT
openmpi-2.1.2rc2-Linux.x86_64.64_cc/log.make.Linux.x86_64.64_cc:"openmpi-2.1.2rc2/opal/include/opal/sys/x86_64/timer.h", line 49: warning: initializer does not fit or is out of range: 0x8007
openmpi-2.1.2rc2-Linux.x86_64.64_cc/log.make.Linux.x86_64.64_cc:"openmpi-2.1.2rc2/opal/mca/pmix/pmix112/pmix1_client.c", line 408: warning: enum type mismatch: arg #1
openmpi-2.1.2rc2-Linux.x86_64.64_cc/log.make.Linux.x86_64.64_cc:"openmpi-2.1.2rc2/opal/mca/base/mca_base_component_repository.c", line 265: warning: statement not reached
openmpi-2.1.2rc2-Linux.x86_64.64_cc/log.make.Linux.x86_64.64_cc:"/export2/src/openmpi-2.1.2/openmpi-2.1.2rc2/opal/mca/pmix/pmix112/pmix/include/pmi.h", line 788: warning: null dimension: argvp
openmpi-2.1.2rc2-Linux.x86_64.64_gcc/log.make-check.Linux.x86_64.64_gcc:openmpi-2.1.2rc2/test/class/opal_fifo.c:109:26: warning: assignment discards 'volatile' qualifier from pointer target type [-Wdiscarded-qualifiers]
openmpi-2.1.2rc2-Linux.x86_64.64_gcc/log.make-check.Linux.x86_64.64_gcc:openmpi-2.1.2rc2/test/class/opal_lifo.c:72:26: warning: assignment discards 'volatile' qualifier from pointer target type [-Wdiscarded-qualifiers]
[OMPI users] Bottleneck of OpenMPI over 100Gbps ROCE
Hi all,

Sorry for resubmitting this problem; I found I didn't add a subject to the last email.

I encountered a problem when I tested the performance of OpenMPI over 100Gbps RoCE. I have two servers connected with Mellanox 100Gbps ConnectX-4 RoCE NICs. I used the Intel MPI Benchmarks to test the performance of OpenMPI (1.10.3) over RDMA. I found the bandwidth of the PingPong benchmark (2 ranks, one rank per server) could reach only 6 GB/s (with the openib btl). I also used the OSU MPI benchmarks; there the bandwidth could reach only 6.5 GB/s. However, when I start two benchmarks at the same time (two ranks per server), the total bandwidth can reach about 11 GB/s. It seems that the CPU is the bottleneck. Obviously, the bottleneck is not memcpy, and RDMA itself ought not to consume too much CPU, since the perftest ib_write_bw can reach 11 GB/s easily.

Is this bandwidth limit normal? Does anyone know what the real bottleneck is?

Thanks for your kind help in advance.

Regards,
Zhaogeng
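For reference, the "two benchmarks at the same time" measurement described above can also be made inside a single job with the OSU multi-pair bandwidth test. This is a hedged sketch, not a verified command: the hostfile name "hosts" and the btl list are assumptions to adapt to the actual setup.

```shell
# Four ranks, two per server; osu_mbw_mr pairs the first half of the ranks
# with the second half, so with ppr:2:node each pair spans the two servers
# and the reported bandwidth is the aggregate across both pairs.
mpirun --hostfile hosts --map-by ppr:2:node -np 4 \
       --mca btl openib,self,sm osu_mbw_mr
```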