On Mon, Feb 23, 2015 at 4:37 PM, Joshua Ladd <jladd.m...@gmail.com> wrote:

> Nathan,
>
> I do, but the hang comes later on. It looks like it's a situation where
> the root is way, way faster than the children and it's inducing an
> overrun in the unexpected message queue. I think the queue is set to just
> keep growing and it eventually blows up the memory?
>

This is indeed possible: there is no flow control in the PML (nor in most of
the BTLs). However, the fact that the receiver prefers to drain the incoming
queues instead of returning back to MPI is slightly disturbing.
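
For reference, here is a minimal sketch of the failing pattern, written from
the description in this thread rather than taken from the actual bcast_loop.c
reproducer. The commented-out MPI_Barrier shows one hypothetical way to keep
the root from running unboundedly ahead until real flow control exists on this
path:

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank;
        int buf[1024];

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        for (int m = 0; m < 100000; m++) {
            /* With small eager messages the root's sends complete locally,
             * so it can race thousands of iterations ahead of the other
             * ranks and flood their unexpected-message queues. */
            MPI_Bcast(buf, 1024, MPI_INT, 0, MPI_COMM_WORLD);

            if (m % 1000 == 0)
                printf("rank %d, m = %d\n", rank, m);

            /* Hypothetical workaround: periodically resynchronize so the
             * root cannot outrun the receivers without bound.
             * if (m % 1000 == 0) MPI_Barrier(MPI_COMM_WORLD);
             */
        }

        MPI_Finalize();
        return 0;
    }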

  George.


> $/hpc/mtl_scrap/users/joshual/openmpi-1.8.4/ompi_install/bin/mpirun -np 3
> --display-map -mca btl vader,self ./a.out
>  Data for JOB [14187,1] offset 0
>
>  ========================   JOB MAP   ========================
>
>  Data for node: mngx-apl-01     Num slots: 16   Max slots: 0    Num procs:
> 3
>         Process OMPI jobid: [14187,1] App: 0 Process rank: 0
>         Process OMPI jobid: [14187,1] App: 0 Process rank: 1
>         Process OMPI jobid: [14187,1] App: 0 Process rank: 2
>
>  =============================================================
> rank 2, m = 0
> rank 0, m = 0
> rank 1, m = 0
> rank 0, m = 1000
> rank 0, m = 2000
> rank 0, m = 3000
> rank 2, m = 1000
> rank 1, m = 1000
> rank 0, m = 4000
> rank 0, m = 5000
> rank 0, m = 6000
> rank 0, m = 7000
> rank 1, m = 2000
> rank 2, m = 2000
> rank 0, m = 8000
> rank 0, m = 9000
> rank 0, m = 10000
> rank 0, m = 11000
> rank 2, m = 3000
> rank 1, m = 3000
> rank 0, m = 12000
> rank 0, m = 13000
> rank 0, m = 14000
> rank 1, m = 4000
> rank 2, m = 4000
> rank 0, m = 15000
> rank 0, m = 16000
> rank 0, m = 17000
> rank 0, m = 18000
> rank 1, m = 5000
> rank 2, m = 5000
> rank 0, m = 19000
> rank 0, m = 20000
> rank 0, m = 21000
> rank 0, m = 22000
> rank 2, m = 6000     <--- Finally hangs when Ranks 2 and 1 are at 6000 but
> rank 0, the root, is at 22,000
> rank 1, m = 6000
>
> It fails in the ompi_coll_tuned_bcast_intra_split_bintree algorithm in the
> tuned component - it looks like a scatter/allgather type of operation. It's
> in the allgather phase, during the bidirectional send/recv, that things go
> bad. There are no issues running this under the "basic" colls.
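
As an aside, for anyone debugging this: the coll framework can be pinned with
the "coll" MCA parameter, and the tuned component's dynamic rules can force a
specific bcast algorithm. The commands below are illustrative sketches, not
taken from the run above, and the parameter values are assumptions to adapt:

    # restrict the coll framework to the basic and self components
    mpirun -np 3 --mca btl vader,self --mca coll basic,self ./a.out

    # or keep tuned but override its bcast algorithm selection
    mpirun -np 3 --mca btl vader,self \
           --mca coll_tuned_use_dynamic_rules 1 \
           --mca coll_tuned_bcast_algorithm 1 ./a.out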
>
> Josh
>
>
>
>
> On Mon, Feb 23, 2015 at 4:13 PM, Nathan Hjelm <hje...@lanl.gov> wrote:
>
>>
>> Josh, do you see a hang when using vader? It is preferred over the old
>> sm btl.
>>
>> -Nathan
>>
>> On Mon, Feb 23, 2015 at 03:48:17PM -0500, Joshua Ladd wrote:
>> >    Sachin,
>> >
>> >    I am able to reproduce something funny that looks like your issue. When
>> >    I run on a single host with two ranks, the test works fine. However,
>> >    when I try three or more, it looks like only the root, rank 0, is making
>> >    any progress after the first iteration.
>> >
>> >    $/hpc/mtl_scrap/users/joshual/openmpi-1.8.4/ompi_install/bin/mpirun
>> -np 3
>> >    -mca btl self,sm ./bcast_loop
>> >    rank 0, m = 0
>> >    rank 1, m = 0
>> >    rank 2, m = 0
>> >    rank 0, m = 1000
>> >    rank 0, m = 2000
>> >    rank 0, m = 3000
>> >    rank 0, m = 4000
>> >    rank 0, m = 5000
>> >    rank 0, m = 6000
>> >    rank 0, m = 7000
>> >    rank 0, m = 8000
>> >    rank 0, m = 9000
>> >    rank 0, m = 10000
>> >    rank 0, m = 11000
>> >    rank 0, m = 12000
>> >    rank 0, m = 13000
>> >    rank 0, m = 14000
>> >    rank 0, m = 15000
>> >    rank 0, m = 16000   <----- Hanging
>> >
>> >    After hanging for a while, I get an OOM kernel panic message:
>> >
>> >    joshual@mngx-apl-01 ~
>> >    $
>> >    Message from syslogd@localhost at Feb 23 22:42:17 ...
>> >     kernel:Kernel panic - not syncing: Out of memory: system-wide
>> >    panic_on_oom is enabled
>> >
>> >    Message from syslogd@localhost at Feb 23 22:42:17 ...
>> >     kernel:
>> >
>> >    With the TCP BTL the result is sensible, i.e., I see three ranks
>> >    reporting for each multiple of 1000:
>> >    $/hpc/mtl_scrap/users/joshual/openmpi-1.8.4/ompi_install/bin/mpirun
>> -np 3
>> >    -mca btl self,tcp ./a.out
>> >    rank 1, m = 0
>> >    rank 2, m = 0
>> >    rank 0, m = 0
>> >    rank 0, m = 1000
>> >    rank 2, m = 1000
>> >    rank 1, m = 1000
>> >    rank 1, m = 2000
>> >    rank 0, m = 2000
>> >    rank 2, m = 2000
>> >    rank 0, m = 3000
>> >    rank 2, m = 3000
>> >    rank 1, m = 3000
>> >    rank 0, m = 4000
>> >    rank 1, m = 4000
>> >    rank 2, m = 4000
>> >    rank 0, m = 5000
>> >    rank 2, m = 5000
>> >    rank 1, m = 5000
>> >    rank 0, m = 6000
>> >    rank 1, m = 6000
>> >    rank 2, m = 6000
>> >    rank 2, m = 7000
>> >    rank 1, m = 7000
>> >    rank 0, m = 7000
>> >    rank 0, m = 8000
>> >    rank 2, m = 8000
>> >    rank 1, m = 8000
>> >    rank 0, m = 9000
>> >    rank 2, m = 9000
>> >    rank 1, m = 9000
>> >    rank 2, m = 10000
>> >    rank 0, m = 10000
>> >    rank 1, m = 10000
>> >    rank 1, m = 11000
>> >    rank 0, m = 11000
>> >    rank 2, m = 11000
>> >    rank 2, m = 12000
>> >    rank 1, m = 12000
>> >    rank 0, m = 12000
>> >    rank 1, m = 13000
>> >    rank 0, m = 13000
>> >    rank 2, m = 13000
>> >    rank 1, m = 14000
>> >    rank 2, m = 14000
>> >    rank 0, m = 14000
>> >    rank 1, m = 15000
>> >    rank 0, m = 15000
>> >    rank 2, m = 15000
>> >    etc...
>> >
>> >    It looks like a bug in the SM BTL. I can poke some more at this
>> tomorrow.
>> >
>> >    Josh
>> >    On Sun, Feb 22, 2015 at 11:18 PM, Sachin Krishnan <
>> sachk...@gmail.com>
>> >    wrote:
>> >
>> >      George,
>> >      I was able to run the code without any errors with an older version
>> >      of Open MPI on another machine. It looks like some problem with my
>> >      machine, as Josh pointed out.
>> >      Adding --mca coll tuned or basic to the mpirun command resulted in an
>> >      MPI_Init failed error with the following additional information for
>> >      the Open MPI developer:
>> >       mca_coll_base_comm_select(MPI_COMM_WORLD) failed
>> >        --> Returned "Not found" (-13) instead of "Success" (0)
>> >      Thanks for the help.
>> >      Sachin
>> >      On Mon, Feb 23, 2015 at 4:17 AM, George Bosilca <
>> bosi...@icl.utk.edu>
>> >      wrote:
>> >
>> >        Sachin,
>> >        I can't replicate your issue with either the latest 1.8 or the
>> >        trunk. I tried using a single host, forcing SM and then TCP, to no
>> >        avail.
>> >        Can you try restricting the collective modules in use (adding --mca
>> >        coll tuned,basic) to your mpirun command?
>> >          George.
>> >        On Fri, Feb 20, 2015 at 9:31 PM, Sachin Krishnan <
>> sachk...@gmail.com>
>> >        wrote:
>> >
>> >          Josh,
>> >          Thanks for the help.
>> >          I'm running on a single host. How do I confirm that it is an
>> issue
>> >          with the shared memory?
>> >          Sachin
>> >          On Fri, Feb 20, 2015 at 11:58 PM, Joshua Ladd <
>> jladd.m...@gmail.com>
>> >          wrote:
>> >
>> >            Sachin,
>> >
>> >            Are you running this on a single host or across multiple hosts
>> >            (i.e., are you communicating between processes via networking?)
>> >            If it's on a single host, then it might be an issue with shared
>> >            memory.
>> >
>> >            Josh
>> >            On Fri, Feb 20, 2015 at 1:51 AM, Sachin Krishnan
>> >            <sachk...@gmail.com> wrote:
>> >
>> >              Hello Josh,
>> >
>> >              The command I use to compile the code is:
>> >
>> >              mpicc bcast_loop.c
>> >
>> >              To run the code I use:
>> >
>> >              mpirun -np 2 ./a.out
>> >
>> >              The output is unpredictable. It gets stuck in different places.
>> >
>> >              I'm attaching the lstopo and ompi_info outputs. Do you need
>> >              any other info?
>> >
>> >              lstopo-no-graphics output:
>> >
>> >              Machine (3433MB)
>> >
>> >                Socket L#0 + L3 L#0 (8192KB)
>> >
>> >                  L2 L#0 (256KB) + L1d L#0 (32KB) + L1i L#0 (32KB) +
>> Core L#0
>> >
>> >                    PU L#0 (P#0)
>> >
>> >                    PU L#1 (P#4)
>> >
>> >                  L2 L#1 (256KB) + L1d L#1 (32KB) + L1i L#1 (32KB) +
>> Core L#1
>> >
>> >                    PU L#2 (P#1)
>> >
>> >                    PU L#3 (P#5)
>> >
>> >                  L2 L#2 (256KB) + L1d L#2 (32KB) + L1i L#2 (32KB) +
>> Core L#2
>> >
>> >                    PU L#4 (P#2)
>> >
>> >                    PU L#5 (P#6)
>> >
>> >                  L2 L#3 (256KB) + L1d L#3 (32KB) + L1i L#3 (32KB) +
>> Core L#3
>> >
>> >                    PU L#6 (P#3)
>> >
>> >                    PU L#7 (P#7)
>> >
>> >                HostBridge L#0
>> >
>> >                  PCI 8086:0162
>> >
>> >                    GPU L#0 "card0"
>> >
>> >                    GPU L#1 "renderD128"
>> >
>> >                    GPU L#2 "controlD64"
>> >
>> >                  PCI 8086:1502
>> >
>> >                    Net L#3 "eth0"
>> >
>> >                  PCI 8086:1e02
>> >
>> >                    Block L#4 "sda"
>> >
>> >                    Block L#5 "sr0"
>> >
>> >              ompi_info output:
>> >
>> >                               Package: Open MPI builduser@anatol
>> Distribution
>> >
>> >                              Open MPI: 1.8.4
>> >
>> >                Open MPI repo revision: v1.8.3-330-g0344f04
>> >
>> >                 Open MPI release date: Dec 19, 2014
>> >
>> >                              Open RTE: 1.8.4
>> >
>> >                Open RTE repo revision: v1.8.3-330-g0344f04
>> >
>> >                 Open RTE release date: Dec 19, 2014
>> >
>> >                                  OPAL: 1.8.4
>> >
>> >                    OPAL repo revision: v1.8.3-330-g0344f04
>> >
>> >                     OPAL release date: Dec 19, 2014
>> >
>> >                               MPI API: 3.0
>> >
>> >                          Ident string: 1.8.4
>> >
>> >                                Prefix: /usr
>> >
>> >               Configured architecture: i686-pc-linux-gnu
>> >
>> >                        Configure host: anatol
>> >
>> >                         Configured by: builduser
>> >
>> >                         Configured on: Sat Dec 20 17:00:34 PST 2014
>> >
>> >                        Configure host: anatol
>> >
>> >                              Built by: builduser
>> >
>> >                              Built on: Sat Dec 20 17:12:16 PST 2014
>> >
>> >                            Built host: anatol
>> >
>> >                            C bindings: yes
>> >
>> >                          C++ bindings: yes
>> >
>> >                           Fort mpif.h: yes (all)
>> >
>> >                          Fort use mpi: yes (full: ignore TKR)
>> >
>> >                     Fort use mpi size: deprecated-ompi-info-value
>> >
>> >                      Fort use mpi_f08: yes
>> >
>> >               Fort mpi_f08 compliance: The mpi_f08 module is available,
>> but
>> >              due to
>> >
>> >                                        limitations in the
>> /usr/bin/gfortran
>> >              compiler, does
>> >
>> >                                        not support the following: array
>> >              subsections,
>> >
>> >                                        direct passthru (where possible)
>> to
>> >              underlying Open
>> >
>> >                                        MPI's C functionality
>> >
>> >                Fort mpi_f08 subarrays: no
>> >
>> >                         Java bindings: no
>> >
>> >                Wrapper compiler rpath: runpath
>> >
>> >                            C compiler: gcc
>> >
>> >                   C compiler absolute: /usr/bin/gcc
>> >
>> >                C compiler family name: GNU
>> >
>> >                    C compiler version: 4.9.2
>> >
>> >                          C++ compiler: g++
>> >
>> >                 C++ compiler absolute: /usr/bin/g++
>> >
>> >                         Fort compiler: /usr/bin/gfortran
>> >
>> >                     Fort compiler abs:
>> >
>> >                       Fort ignore TKR: yes (!GCC$ ATTRIBUTES
>> NO_ARG_CHECK ::)
>> >
>> >                 Fort 08 assumed shape: yes
>> >
>> >                    Fort optional args: yes
>> >
>> >                        Fort INTERFACE: yes
>> >
>> >                  Fort ISO_FORTRAN_ENV: yes
>> >
>> >                     Fort STORAGE_SIZE: yes
>> >
>> >                    Fort BIND(C) (all): yes
>> >
>> >                    Fort ISO_C_BINDING: yes
>> >
>> >               Fort SUBROUTINE BIND(C): yes
>> >
>> >                     Fort TYPE,BIND(C): yes
>> >
>> >               Fort T,BIND(C,name="a"): yes
>> >
>> >                          Fort PRIVATE: yes
>> >
>> >                        Fort PROTECTED: yes
>> >
>> >                         Fort ABSTRACT: yes
>> >
>> >                     Fort ASYNCHRONOUS: yes
>> >
>> >                        Fort PROCEDURE: yes
>> >
>> >                         Fort C_FUNLOC: yes
>> >
>> >               Fort f08 using wrappers: yes
>> >
>> >                       Fort MPI_SIZEOF: yes
>> >
>> >                           C profiling: yes
>> >
>> >                         C++ profiling: yes
>> >
>> >                 Fort mpif.h profiling: yes
>> >
>> >                Fort use mpi profiling: yes
>> >
>> >                 Fort use mpi_f08 prof: yes
>> >
>> >                        C++ exceptions: no
>> >
>> >                        Thread support: posix (MPI_THREAD_MULTIPLE: no,
>> OPAL
>> >              support: yes,
>> >
>> >                                        OMPI progress: no, ORTE
>> progress: yes,
>> >              Event lib:
>> >
>> >                                        yes)
>> >
>> >                         Sparse Groups: no
>> >
>> >                Internal debug support: no
>> >
>> >                MPI interface warnings: yes
>> >
>> >                   MPI parameter check: runtime
>> >
>> >              Memory profiling support: no
>> >
>> >              Memory debugging support: no
>> >
>> >                       libltdl support: yes
>> >
>> >                 Heterogeneous support: no
>> >
>> >               mpirun default --prefix: no
>> >
>> >                       MPI I/O support: yes
>> >
>> >                     MPI_WTIME support: gettimeofday
>> >
>> >                   Symbol vis. support: yes
>> >
>> >                 Host topology support: yes
>> >
>> >                        MPI extensions:
>> >
>> >                 FT Checkpoint support: no (checkpoint thread: no)
>> >
>> >                 C/R Enabled Debugging: no
>> >
>> >                   VampirTrace support: yes
>> >
>> >                MPI_MAX_PROCESSOR_NAME: 256
>> >
>> >                  MPI_MAX_ERROR_STRING: 256
>> >
>> >                   MPI_MAX_OBJECT_NAME: 64
>> >
>> >                      MPI_MAX_INFO_KEY: 36
>> >
>> >                      MPI_MAX_INFO_VAL: 256
>> >
>> >                     MPI_MAX_PORT_NAME: 1024
>> >
>> >                MPI_MAX_DATAREP_STRING: 128
>> >
>> >                         MCA backtrace: execinfo (MCA v2.0, API v2.0,
>> >              Component v1.8.4)
>> >
>> >                          MCA compress: bzip (MCA v2.0, API v2.0,
>> Component
>> >              v1.8.4)
>> >
>> >                          MCA compress: gzip (MCA v2.0, API v2.0,
>> Component
>> >              v1.8.4)
>> >
>> >                               MCA crs: none (MCA v2.0, API v2.0,
>> Component
>> >              v1.8.4)
>> >
>> >                                MCA db: hash (MCA v2.0, API v1.0,
>> Component
>> >              v1.8.4)
>> >
>> >                                MCA db: print (MCA v2.0, API v1.0,
>> Component
>> >              v1.8.4)
>> >
>> >                             MCA event: libevent2021 (MCA v2.0, API v2.0,
>> >              Component v1.8.4)
>> >
>> >                             MCA hwloc: external (MCA v2.0, API v2.0,
>> >              Component v1.8.4)
>> >
>> >                                MCA if: posix_ipv4 (MCA v2.0, API v2.0,
>> >              Component v1.8.4)
>> >
>> >                                MCA if: linux_ipv6 (MCA v2.0, API v2.0,
>> >              Component v1.8.4)
>> >
>> >                       MCA installdirs: env (MCA v2.0, API v2.0,
>> Component
>> >              v1.8.4)
>> >
>> >                       MCA installdirs: config (MCA v2.0, API v2.0,
>> Component
>> >              v1.8.4)
>> >
>> >                        MCA memchecker: valgrind (MCA v2.0, API v2.0,
>> >              Component v1.8.4)
>> >
>> >                            MCA memory: linux (MCA v2.0, API v2.0,
>> Component
>> >              v1.8.4)
>> >
>> >                             MCA pstat: linux (MCA v2.0, API v2.0,
>> Component
>> >              v1.8.4)
>> >
>> >                               MCA sec: basic (MCA v2.0, API v1.0,
>> Component
>> >              v1.8.4)
>> >
>> >                             MCA shmem: mmap (MCA v2.0, API v2.0,
>> Component
>> >              v1.8.4)
>> >
>> >                             MCA shmem: posix (MCA v2.0, API v2.0,
>> Component
>> >              v1.8.4)
>> >
>> >                             MCA shmem: sysv (MCA v2.0, API v2.0,
>> Component
>> >              v1.8.4)
>> >
>> >                             MCA timer: linux (MCA v2.0, API v2.0,
>> Component
>> >              v1.8.4)
>> >
>> >                               MCA dfs: app (MCA v2.0, API v1.0,
>> Component
>> >              v1.8.4)
>> >
>> >                               MCA dfs: orted (MCA v2.0, API v1.0,
>> Component
>> >              v1.8.4)
>> >
>> >                               MCA dfs: test (MCA v2.0, API v1.0,
>> Component
>> >              v1.8.4)
>> >
>> >                            MCA errmgr: default_app (MCA v2.0, API v3.0,
>> >              Component v1.8.4)
>> >
>> >                            MCA errmgr: default_hnp (MCA v2.0, API v3.0,
>> >              Component v1.8.4)
>> >
>> >                            MCA errmgr: default_orted (MCA v2.0, API
>> v3.0,
>> >              Component
>> >
>> >                                        v1.8.4)
>> >
>> >                            MCA errmgr: default_tool (MCA v2.0, API v3.0,
>> >              Component v1.8.4)
>> >
>> >                               MCA ess: env (MCA v2.0, API v3.0,
>> Component
>> >              v1.8.4)
>> >
>> >                               MCA ess: hnp (MCA v2.0, API v3.0,
>> Component
>> >              v1.8.4)
>> >
>> >                               MCA ess: singleton (MCA v2.0, API v3.0,
>> >              Component v1.8.4)
>> >
>> >                               MCA ess: tool (MCA v2.0, API v3.0,
>> Component
>> >              v1.8.4)
>> >
>> >                             MCA filem: raw (MCA v2.0, API v2.0,
>> Component
>> >              v1.8.4)
>> >
>> >                           MCA grpcomm: bad (MCA v2.0, API v2.0,
>> Component
>> >              v1.8.4)
>> >
>> >                               MCA iof: hnp (MCA v2.0, API v2.0,
>> Component
>> >              v1.8.4)
>> >
>> >                               MCA iof: mr_hnp (MCA v2.0, API v2.0,
>> Component
>> >              v1.8.4)
>> >
>> >                               MCA iof: mr_orted (MCA v2.0, API v2.0,
>> >              Component v1.8.4)
>> >
>> >                               MCA iof: orted (MCA v2.0, API v2.0,
>> Component
>> >              v1.8.4)
>> >
>> >                               MCA iof: tool (MCA v2.0, API v2.0,
>> Component
>> >              v1.8.4)
>> >
>> >                              MCA odls: default (MCA v2.0, API v2.0,
>> Component
>> >              v1.8.4)
>> >
>> >                               MCA oob: tcp (MCA v2.0, API v2.0,
>> Component
>> >              v1.8.4)
>> >
>> >                               MCA plm: isolated (MCA v2.0, API v2.0,
>> >              Component v1.8.4)
>> >
>> >                               MCA plm: rsh (MCA v2.0, API v2.0,
>> Component
>> >              v1.8.4)
>> >
>> >                               MCA ras: loadleveler (MCA v2.0, API v2.0,
>> >              Component v1.8.4)
>> >
>> >                               MCA ras: simulator (MCA v2.0, API v2.0,
>> >              Component v1.8.4)
>> >
>> >                             MCA rmaps: lama (MCA v2.0, API v2.0,
>> Component
>> >              v1.8.4)
>> >
>> >                             MCA rmaps: mindist (MCA v2.0, API v2.0,
>> Component
>> >              v1.8.4)
>> >
>> >                             MCA rmaps: ppr (MCA v2.0, API v2.0,
>> Component
>> >              v1.8.4)
>> >
>> >                             MCA rmaps: rank_file (MCA v2.0, API v2.0,
>> >              Component v1.8.4)
>> >
>> >                             MCA rmaps: resilient (MCA v2.0, API v2.0,
>> >              Component v1.8.4)
>> >
>> >                             MCA rmaps: round_robin (MCA v2.0, API v2.0,
>> >              Component v1.8.4)
>> >
>> >                             MCA rmaps: seq (MCA v2.0, API v2.0,
>> Component
>> >              v1.8.4)
>> >
>> >                             MCA rmaps: staged (MCA v2.0, API v2.0,
>> Component
>> >              v1.8.4)
>> >
>> >                               MCA rml: oob (MCA v2.0, API v2.0,
>> Component
>> >              v1.8.4)
>> >
>> >                            MCA routed: binomial (MCA v2.0, API v2.0,
>> >              Component v1.8.4)
>> >
>> >                            MCA routed: debruijn (MCA v2.0, API v2.0,
>> >              Component v1.8.4)
>> >
>> >                            MCA routed: direct (MCA v2.0, API v2.0,
>> Component
>> >              v1.8.4)
>> >
>> >                            MCA routed: radix (MCA v2.0, API v2.0,
>> Component
>> >              v1.8.4)
>> >
>> >                             MCA state: app (MCA v2.0, API v1.0,
>> Component
>> >              v1.8.4)
>> >
>> >                             MCA state: hnp (MCA v2.0, API v1.0,
>> Component
>> >              v1.8.4)
>> >
>> >                             MCA state: novm (MCA v2.0, API v1.0,
>> Component
>> >              v1.8.4)
>> >
>> >                             MCA state: orted (MCA v2.0, API v1.0,
>> Component
>> >              v1.8.4)
>> >
>> >                             MCA state: staged_hnp (MCA v2.0, API v1.0,
>> >              Component v1.8.4)
>> >
>> >                             MCA state: staged_orted (MCA v2.0, API v1.0,
>> >              Component v1.8.4)
>> >
>> >                             MCA state: tool (MCA v2.0, API v1.0,
>> Component
>> >              v1.8.4)
>> >
>> >                         MCA allocator: basic (MCA v2.0, API v2.0,
>> Component
>> >              v1.8.4)
>> >
>> >                         MCA allocator: bucket (MCA v2.0, API v2.0,
>> Component
>> >              v1.8.4)
>> >
>> >                              MCA bcol: basesmuma (MCA v2.0, API v2.0,
>> >              Component v1.8.4)
>> >
>> >                              MCA bcol: ptpcoll (MCA v2.0, API v2.0,
>> Component
>> >              v1.8.4)
>> >
>> >                               MCA bml: r2 (MCA v2.0, API v2.0, Component
>> >              v1.8.4)
>> >
>> >                               MCA btl: self (MCA v2.0, API v2.0,
>> Component
>> >              v1.8.4)
>> >
>> >                               MCA btl: sm (MCA v2.0, API v2.0, Component
>> >              v1.8.4)
>> >
>> >                               MCA btl: tcp (MCA v2.0, API v2.0,
>> Component
>> >              v1.8.4)
>> >
>> >                               MCA btl: vader (MCA v2.0, API v2.0,
>> Component
>> >              v1.8.4)
>> >
>> >                              MCA coll: basic (MCA v2.0, API v2.0,
>> Component
>> >              v1.8.4)
>> >
>> >                              MCA coll: hierarch (MCA v2.0, API v2.0,
>> >              Component v1.8.4)
>> >
>> >                              MCA coll: inter (MCA v2.0, API v2.0,
>> Component
>> >              v1.8.4)
>> >
>> >                              MCA coll: libnbc (MCA v2.0, API v2.0,
>> Component
>> >              v1.8.4)
>> >
>> >                              MCA coll: ml (MCA v2.0, API v2.0, Component
>> >              v1.8.4)
>> >
>> >                              MCA coll: self (MCA v2.0, API v2.0,
>> Component
>> >              v1.8.4)
>> >
>> >                              MCA coll: sm (MCA v2.0, API v2.0, Component
>> >              v1.8.4)
>> >
>> >                              MCA coll: tuned (MCA v2.0, API v2.0,
>> Component
>> >              v1.8.4)
>> >
>> >                               MCA dpm: orte (MCA v2.0, API v2.0,
>> Component
>> >              v1.8.4)
>> >
>> >                              MCA fbtl: posix (MCA v2.0, API v2.0,
>> Component
>> >              v1.8.4)
>> >
>> >                             MCA fcoll: dynamic (MCA v2.0, API v2.0,
>> Component
>> >              v1.8.4)
>> >
>> >                             MCA fcoll: individual (MCA v2.0, API v2.0,
>> >              Component v1.8.4)
>> >
>> >                             MCA fcoll: static (MCA v2.0, API v2.0,
>> Component
>> >              v1.8.4)
>> >
>> >                             MCA fcoll: two_phase (MCA v2.0, API v2.0,
>> >              Component v1.8.4)
>> >
>> >                             MCA fcoll: ylib (MCA v2.0, API v2.0,
>> Component
>> >              v1.8.4)
>> >
>> >                                MCA fs: ufs (MCA v2.0, API v2.0,
>> Component
>> >              v1.8.4)
>> >
>> >                                MCA io: ompio (MCA v2.0, API v2.0,
>> Component
>> >              v1.8.4)
>> >
>> >                                MCA io: romio (MCA v2.0, API v2.0,
>> Component
>> >              v1.8.4)
>> >
>> >                             MCA mpool: grdma (MCA v2.0, API v2.0,
>> Component
>> >              v1.8.4)
>> >
>> >                             MCA mpool: sm (MCA v2.0, API v2.0, Component
>> >              v1.8.4)
>> >
>> >                               MCA osc: rdma (MCA v2.0, API v3.0,
>> Component
>> >              v1.8.4)
>> >
>> >                               MCA osc: sm (MCA v2.0, API v3.0, Component
>> >              v1.8.4)
>> >
>> >                               MCA pml: v (MCA v2.0, API v2.0, Component
>> >              v1.8.4)
>> >
>> >                               MCA pml: bfo (MCA v2.0, API v2.0,
>> Component
>> >              v1.8.4)
>> >
>> >                               MCA pml: cm (MCA v2.0, API v2.0, Component
>> >              v1.8.4)
>> >
>> >                               MCA pml: ob1 (MCA v2.0, API v2.0,
>> Component
>> >              v1.8.4)
>> >
>> >                            MCA pubsub: orte (MCA v2.0, API v2.0,
>> Component
>> >              v1.8.4)
>> >
>> >                            MCA rcache: vma (MCA v2.0, API v2.0,
>> Component
>> >              v1.8.4)
>> >
>> >                               MCA rte: orte (MCA v2.0, API v2.0,
>> Component
>> >              v1.8.4)
>> >
>> >                              MCA sbgp: basesmsocket (MCA v2.0, API v2.0,
>> >              Component v1.8.4)
>> >
>> >                              MCA sbgp: basesmuma (MCA v2.0, API v2.0,
>> >              Component v1.8.4)
>> >
>> >                              MCA sbgp: p2p (MCA v2.0, API v2.0,
>> Component
>> >              v1.8.4)
>> >
>> >                          MCA sharedfp: individual (MCA v2.0, API v2.0,
>> >              Component v1.8.4)
>> >
>> >                          MCA sharedfp: lockedfile (MCA v2.0, API v2.0,
>> >              Component v1.8.4)
>> >
>> >                          MCA sharedfp: sm (MCA v2.0, API v2.0, Component
>> >              v1.8.4)
>> >
>> >                              MCA topo: basic (MCA v2.0, API v2.1,
>> Component
>> >              v1.8.4)
>> >
>> >                         MCA vprotocol: pessimist (MCA v2.0, API v2.0,
>> >              Component v1.8.4)
>> >
>> >              Sachin
>> >
>> >              >Sachin,
>> >
>> >              >Can you please provide a command line? Additional information
>> >              >about your system could also be helpful.
>> >
>> >              >Josh
>> >
>> >              >>On Wed, Feb 18, 2015 at 3:43 AM, Sachin Krishnan
>> >              <sachkris_at_[hidden]> wrote:
>> >
>> >              >> Hello,
>> >              >>
>> >              >> I am new to MPI and also this list.
>> >              >> I wrote an MPI code with several MPI_Bcast calls in a loop.
>> >              >> My code was getting stuck at random points, i.e., it was not
>> >              >> systematic. After a few hours of debugging and googling, I
>> >              >> found that the issue may be with the several MPI_Bcast calls
>> >              >> in a loop.
>> >              >>
>> >              >> I stumbled on this test code which can reproduce the
>> issue:
>> >              >>
>> https://github.com/fintler/ompi/blob/master/orte/test/mpi/bcast_loop.c
>> >              >>
>> >              >> I'm using Open MPI v1.8.4, installed from the official Arch
>> >              >> Linux repo.
>> >              >>
>> >              >> Is it a known issue with Open MPI?
>> >              >> Is it some problem with the way Open MPI is configured on
>> >              >> my system?
>> >              >>
>> >              >> Thanks in advance.
>> >              >>
>> >              >> Sachin
>> >              >>
>> >              >>
>> >              >>
