
I'd be grateful if someone could explain what this error message means and whether it indicates a hardware problem or an application software issue:

[node2:11881] OOB: Connection to HNP lost
[node1:09876] OOB: Connection to HNP lost

I have a small cluster that was working fine until last week. Unfortunately, we were hit by a sudden power dip that brought the cluster down and did significant damage to other servers (it blew power supplies and a disk). Although the cluster machines and the InfiniBand link are up and running jobs, I am now getting these errors in user applications, which we have never had before.

The system messages file reports (for node2):
Jul  5 12:08:28 node1 genunix: [ID 408789 kern.notice] NOTICE: tavor0: fault cleared external to device; service available
Jul  5 12:08:28 node1 genunix: [ID 451854 kern.notice] NOTICE: tavor0: port 1 up
Jul  7 16:18:32 node1 genunix: [ID 408114 kern.info] /pci@1,0/pci1022,7450@2/pci15b3,5a46@1/pci15b3,5a44@0 (tavor0) online
Jul  7 16:18:32 node1 ib: [ID 842868 kern.info] IB device: daplt@0, daplt0
Jul  7 16:18:32 node1 genunix: [ID 936769 kern.info] daplt0 is /ib/daplt@0
Jul  7 16:18:32 node1 genunix: [ID 408114 kern.info] /ib/daplt@0 (daplt0) online
Jul  7 16:18:32 node1 genunix: [ID 834635 kern.info] /ib/daplt@0 (daplt0) multipath status: degraded, path /pci@1,0/pci1022,7450@2/pci15b3,5a46@1/pci15b3,5a44@0 (tavor0) to target address: daplt,0 is online Load balancing: round-robin

I wonder whether these messages indicate a hardware problem, possibly in the InfiniBand switch or in the host adapters on the cluster machines. The cluster software has not been altered, but there have been small changes to the application codes. Because of the power dip, though, I want to rule out hardware issues first.

Has anyone seen this message before, and do you know whether I should investigate the hardware first? I did check the archives, but that didn't help. More information is provided below.

Any help appreciated, thanks.


The cluster uses a mix of Sun X4100/X4200 machines linked with a Sun-supplied InfiniBand switch and host adapters. All machines are running Solaris 10 x86 (11/06) with the latest kernel patches.
The software is Sun ClusterTools 7.

Node2 $ ifconfig ibd1
ibd1: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 2044 index 3
        inet netmask ffffff00 broadcast

Node1 $ ifconfig ibd1
ibd1: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 2044 index 3
        inet netmask ffffff00 broadcast
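The `ompi_info` output below lists `oob: tcp` as the only OOB component, so the daemon-to-HNP connection named in the error (the HNP being the `mpirun` process) runs over ordinary TCP rather than over uDAPL. One way to separate a network fault from an application fault is therefore to test plain TCP reachability between the nodes over the `ibd1` addresses, with MPI not involved at all. The following is only a minimal sketch under stated assumptions: `tcp_probe` is a hypothetical helper (not part of Open MPI or Solaris), it assumes POSIX sockets, and it needs some listener already running on the chosen port of the remote node. A successful probe shows only that TCP over that path works; it says nothing about the InfiniBand RDMA path used by the `udapl` BTL.

```c
/* Minimal TCP reachability probe (hypothetical helper, not Open MPI code). */
#define _POSIX_C_SOURCE 200112L

#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <netdb.h>
#include <netinet/in.h>
#include <poll.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Try to complete a TCP handshake with host:port within timeout_ms
 * milliseconds. Returns 0 on success, -1 on any failure (name lookup,
 * refused connection, timeout). Only raw TCP reachability is tested:
 * nothing is sent, and Open MPI's OOB protocol is not spoken.
 */
int tcp_probe(const char *host, const char *port, int timeout_ms)
{
    struct addrinfo hints, *res, *ai;
    int ok = -1;

    assert(host != NULL && port != NULL); /* caller must name a target */

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;     /* accept IPv4 or IPv6 results */
    hints.ai_socktype = SOCK_STREAM; /* TCP */
    if (getaddrinfo(host, port, &hints, &res) != 0)
        return -1;

    /* Try each resolved address until one connects. */
    for (ai = res; ai != NULL && ok != 0; ai = ai->ai_next) {
        int fd = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol);
        if (fd < 0)
            continue;

        /* Non-blocking connect so an unreachable peer cannot hang the probe. */
        fcntl(fd, F_SETFL, fcntl(fd, F_GETFL, 0) | O_NONBLOCK);

        if (connect(fd, ai->ai_addr, ai->ai_addrlen) == 0) {
            ok = 0; /* connected immediately (common on loopback) */
        } else if (errno == EINPROGRESS) {
            struct pollfd pfd = { .fd = fd, .events = POLLOUT };
            int err = 0;
            socklen_t len = sizeof err;

            /* Writable means the handshake finished; SO_ERROR says how. */
            if (poll(&pfd, 1, timeout_ms) == 1 &&
                getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len) == 0 &&
                err == 0)
                ok = 0;
        }
        close(fd);
    }
    freeaddrinfo(res);
    return ok;
}
```

On Solaris 10 this would need linking with `-lsocket -lnsl`, the same libraries that appear in the `Build LIBS` line of the `ompi_info` output below. A call such as `tcp_probe("<node1 ibd1 address>", "<port>", 2000)` returning 0 would mean the handshake completed within two seconds; the non-blocking connect plus `poll` is used so that a dead link shows up as a timeout instead of a hang.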

ompi_info -a
                Open MPI: 1.2.1r14096-ct7b030r1838
   Open MPI SVN revision: 0
                Open RTE: 1.2.1r14096-ct7b030r1838
   Open RTE SVN revision: 0
                    OPAL: 1.2.1r14096-ct7b030r1838
       OPAL SVN revision: 0
           MCA backtrace: printstack (MCA v1.0, API v1.0, Component v1.2.1)
           MCA paffinity: solaris (MCA v1.0, API v1.0, Component v1.2.1)
           MCA maffinity: first_use (MCA v1.0, API v1.0, Component v1.2.1)
               MCA timer: solaris (MCA v1.0, API v1.0, Component v1.2.1)
           MCA allocator: basic (MCA v1.0, API v1.0, Component v1.0)
           MCA allocator: bucket (MCA v1.0, API v1.0, Component v1.0)
                MCA coll: basic (MCA v1.0, API v1.0, Component v1.2.1)
                MCA coll: self (MCA v1.0, API v1.0, Component v1.2.1)
                MCA coll: sm (MCA v1.0, API v1.0, Component v1.2.1)
                MCA coll: tuned (MCA v1.0, API v1.0, Component v1.2.1)
                  MCA io: romio (MCA v1.0, API v1.0, Component v1.2.1)
               MCA mpool: sm (MCA v1.0, API v1.0, Component v1.2.1)
               MCA mpool: udapl (MCA v1.0, API v1.0, Component v1.2.1)
                 MCA pml: cm (MCA v1.0, API v1.0, Component v1.2.1)
                 MCA pml: ob1 (MCA v1.0, API v1.0, Component v1.2.1)
                 MCA bml: r2 (MCA v1.0, API v1.0, Component v1.2.1)
              MCA rcache: rb (MCA v1.0, API v1.0, Component v1.2.1)
              MCA rcache: vma (MCA v1.0, API v1.0, Component v1.2.1)
                 MCA btl: self (MCA v1.0, API v1.0.1, Component v1.2.1)
                 MCA btl: sm (MCA v1.0, API v1.0.1, Component v1.2.1)
                 MCA btl: tcp (MCA v1.0, API v1.0.1, Component v1.0)
                 MCA btl: udapl (MCA v1.0, API v1.0, Component v1.2.1)
                MCA topo: unity (MCA v1.0, API v1.0, Component v1.2.1)
                 MCA osc: pt2pt (MCA v1.0, API v1.0, Component v1.2.1)
              MCA errmgr: hnp (MCA v1.0, API v1.3, Component v1.2.1)
              MCA errmgr: orted (MCA v1.0, API v1.3, Component v1.2.1)
              MCA errmgr: proxy (MCA v1.0, API v1.3, Component v1.2.1)
                 MCA gpr: null (MCA v1.0, API v1.0, Component v1.2.1)
                 MCA gpr: proxy (MCA v1.0, API v1.0, Component v1.2.1)
                 MCA gpr: replica (MCA v1.0, API v1.0, Component v1.2.1)
                 MCA iof: proxy (MCA v1.0, API v1.0, Component v1.2.1)
                 MCA iof: svc (MCA v1.0, API v1.0, Component v1.2.1)
                  MCA ns: proxy (MCA v1.0, API v2.0, Component v1.2.1)
                  MCA ns: replica (MCA v1.0, API v2.0, Component v1.2.1)
                 MCA oob: tcp (MCA v1.0, API v1.0, Component v1.0)
                 MCA ras: dash_host (MCA v1.0, API v1.3, Component v1.2.1)
                 MCA ras: gridengine (MCA v1.0, API v1.3, Component v1.2.1)
                 MCA ras: localhost (MCA v1.0, API v1.3, Component v1.2.1)
                 MCA ras: tm (MCA v1.0, API v1.3, Component v1.2.1)
                 MCA rds: hostfile (MCA v1.0, API v1.3, Component v1.2.1)
                 MCA rds: proxy (MCA v1.0, API v1.3, Component v1.2.1)
                 MCA rds: resfile (MCA v1.0, API v1.3, Component v1.2.1)
               MCA rmaps: round_robin (MCA v1.0, API v1.3, Component v1.2.1)
                MCA rmgr: proxy (MCA v1.0, API v2.0, Component v1.2.1)
                MCA rmgr: urm (MCA v1.0, API v2.0, Component v1.2.1)
                 MCA rml: oob (MCA v1.0, API v1.0, Component v1.2.1)
                 MCA pls: gridengine (MCA v1.0, API v1.3, Component v1.2.1)
                 MCA pls: proxy (MCA v1.0, API v1.3, Component v1.2.1)
                 MCA pls: rsh (MCA v1.0, API v1.3, Component v1.2.1)
                 MCA pls: tm (MCA v1.0, API v1.3, Component v1.2.1)
                 MCA sds: env (MCA v1.0, API v1.0, Component v1.2.1)
                 MCA sds: pipe (MCA v1.0, API v1.0, Component v1.2.1)
                 MCA sds: seed (MCA v1.0, API v1.0, Component v1.2.1)
                 MCA sds: singleton (MCA v1.0, API v1.0, Component v1.2.1)
                  Prefix: /opt/SUNWhpc/HPC7.0
                  Bindir: /opt/SUNWhpc/HPC7.0/bin
                  Libdir: /opt/SUNWhpc/HPC7.0/lib
                  Incdir: /opt/SUNWhpc/HPC7.0/include
               Pkglibdir: /opt/SUNWhpc/HPC7.0/lib/openmpi
              Sysconfdir: /opt/SUNWhpc/HPC7.0/etc
 Configured architecture: i386-pc-solaris2.10
           Configured by: root
           Configured on: Fri Mar 30 13:40:12 EDT 2007
          Configure host: burpen-csx10-0
                Built by: root
                Built on: Fri Mar 30 13:57:25 EDT 2007
              Built host: burpen-csx10-0
              C bindings: yes
            C++ bindings: yes
      Fortran77 bindings: yes (all)
      Fortran90 bindings: yes
 Fortran90 bindings size: trivial
              C compiler: cc
     C compiler absolute: /ws/ompi-tools/SUNWspro/SOS11/bin/cc
             C char size: 1
             C bool size: 1
            C short size: 2
              C int size: 4
             C long size: 4
            C float size: 4
           C double size: 8
          C pointer size: 4
            C char align: 1
            C bool align: 1
             C int align: 4
           C float align: 4
          C double align: 4
            C++ compiler: CC
   C++ compiler absolute: /ws/ompi-tools/SUNWspro/SOS11/bin/CC
      Fortran77 compiler: f77
  Fortran77 compiler abs: /ws/ompi-tools/SUNWspro/SOS11/bin/f77
      Fortran90 compiler: f95
  Fortran90 compiler abs: /ws/ompi-tools/SUNWspro/SOS11/bin/f95
       Fort integer size: 4
       Fort logical size: 4
 Fort logical value true: 1
      Fort have integer1: yes
      Fort have integer2: yes
      Fort have integer4: yes
      Fort have integer8: yes
     Fort have integer16: no
         Fort have real4: yes
         Fort have real8: yes
        Fort have real16: no
      Fort have complex8: yes
     Fort have complex16: yes
     Fort have complex32: no
      Fort integer1 size: 1
      Fort integer2 size: 2
      Fort integer4 size: 4
      Fort integer8 size: 8
     Fort integer16 size: -1
          Fort real size: 4
         Fort real4 size: 4
         Fort real8 size: 8
        Fort real16 size: -1
      Fort dbl prec size: 4
          Fort cplx size: 4
      Fort dbl cplx size: 4
         Fort cplx8 size: 8
        Fort cplx16 size: 16
        Fort cplx32 size: -1
      Fort integer align: 4
     Fort integer1 align: 1
     Fort integer2 align: 2
     Fort integer4 align: 4
     Fort integer8 align: 4
    Fort integer16 align: -1
         Fort real align: 4
        Fort real4 align: 4
        Fort real8 align: 4
       Fort real16 align: -1
     Fort dbl prec align: 4
         Fort cplx align: 4
     Fort dbl cplx align: 4
        Fort cplx8 align: 4
       Fort cplx16 align: 4
       Fort cplx32 align: -1
             C profiling: yes
           C++ profiling: yes
     Fortran77 profiling: yes
     Fortran90 profiling: yes
          C++ exceptions: yes
          Thread support: no
            Build CFLAGS: -DNDEBUG -xtarget=opteron -xarch=sse2 -xprefetch
                          -xprefetch_level=2 -xvector=simd -xdepend=yes -xbuiltin=%all
          Build CXXFLAGS: -DNDEBUG -xtarget=opteron -xarch=sse2 -xprefetch
                          -xprefetch_level=2 -xvector=simd -xdepend=yes -xbuiltin=%all
            Build FFLAGS: -xtarget=opteron -xarch=sse2 -xprefetch -xprefetch_level=2
                          -xvector=simd -stackvar -xO5
           Build FCFLAGS: -xtarget=opteron -xarch=sse2 -xprefetch -xprefetch_level=2
                          -xvector=simd -stackvar -xO5
           Build LDFLAGS: -export-dynamic -R/opt/mx/lib -R/opt/SUNWhpc/HPC7.0/lib
                          -R/opt/mx/lib/amd64 -R/opt/SUNWhpc/HPC7.0/lib/amd64
                          -R/opt/mx/lib -R/opt/SUNWhpc/HPC7.0/lib -R/opt/mx/lib/amd64
                          -R/opt/SUNWhpc/HPC7.0/lib/amd64 -R/opt/mx/lib
                          -R/opt/SUNWhpc/HPC7.0/lib -R/opt/mx/lib/amd64
              Build LIBS: -lsocket -lnsl  -lrt -lm
    Wrapper extra CFLAGS:
  Wrapper extra CXXFLAGS:
    Wrapper extra FFLAGS:
   Wrapper extra FCFLAGS:
   Wrapper extra LDFLAGS: -R/opt/mx/lib -R/opt/SUNWhpc/HPC7.0/lib -R/opt/mx/lib/amd64
      Wrapper extra LIBS:      -lsocket -lnsl -lrt -lm -ldl
  Internal debug support: no
     MPI parameter check: runtime
Memory profiling support: no
Memory debugging support: no
         libltdl support: yes
   Heterogeneous support: yes
 mpirun default --prefix: yes
                 MCA mca: parameter "mca_param_files" (current value:
                          Path for MCA configuration files containing default parameter
                 MCA mca: parameter "mca_component_path" (current value:
                          Path where to look for Open MPI and ORTE components
                 MCA mca: parameter "mca_verbose" (current value: <none>)
                          Top-level verbosity parameter
                 MCA mca: parameter "mca_component_show_load_errors" (current value: "0")
                          Whether to show errors for components that failed to load or
                 MCA mca: parameter "mca_component_disable_dlopen" (current value: "0")
                          Whether to attempt to disable opening dynamic components or not
                 MCA mpi: parameter "mpi_param_check" (current value: "1")
                          Whether you want MPI API parameters checked at run-time or not.
                          Possible values are 0 (no checking) and 1 (perform checking at
                 MCA mpi: parameter "mpi_yield_when_idle" (current value: "0")
                          Yield the processor when waiting for MPI communication (for MPI
                          processes, will default to 1 when oversubscribing nodes)
                 MCA mpi: parameter "mpi_event_tick_rate" (current value: "-1")
                          How often to progress TCP communications (0 = never, otherwise
                          specified in microseconds)
                 MCA mpi: parameter "mpi_show_handle_leaks" (current value: "0")
                          Whether MPI_FINALIZE shows all MPI handles that were not freed
                          or not
                 MCA mpi: parameter "mpi_no_free_handles" (current value: "0")
                          Whether to actually free MPI objects when their handles are
                 MCA mpi: parameter "mpi_show_mca_params" (current value: "0")
                          Whether to show all MCA parameter value during MPI_INIT or not
                          (good for reproducability of MPI jobs)
                 MCA mpi: parameter "mpi_show_mca_params_file" (current value: <none>)
                          If mpi_show_mca_params is true, setting this string to a valid
                          filename tells Open MPI to dump all the MCA parameter values
                          into a file suitable for reading via the mca_param_files
                          parameter (good for reproducability of MPI jobs)
                 MCA mpi: parameter "mpi_paffinity_alone" (current value: "0")
                          If nonzero, assume that this job is the only (set of)
                          process(es) running on each node and bind processes to
                          processors, starting with processor ID 0
                 MCA mpi: parameter "mpi_keep_peer_hostnames" (current value: "1")
                          If nonzero, save the string hostnames of all MPI peer processes
                          (mostly for error / debugging output messages). This can add
                          quite a bit of memory usage to each MPI process.
                 MCA mpi: parameter "mpi_abort_delay" (current value: "0")
                          If nonzero, print out an identifying message when MPI_ABORT is
                          invoked (hostname, PID of the process that called MPI_ABORT) and
                          delay for that many seconds before exiting (a negative delay
                          value means to never abort). This allows attaching of a
                          debugger before quitting the job.
                 MCA mpi: information "mpi_abort_print_stack" (value: "0")
                          If nonzero, print out a stack trace when MPI_ABORT is invoked
                 MCA mpi: parameter "mpi_preconnect_all" (current value: "0")
                          Whether to force MPI processes to create connections / warmup
                          with *all* peers during MPI_INIT (vs. making connections lazily
                          -- upon the first MPI traffic between each process peer pair)
                 MCA mpi: parameter "mpi_preconnect_oob" (current value: "0")
                          Whether to force MPI processes to fully wire-up the OOB system
                          between MPI processes.
                 MCA mpi: parameter "mpi_leave_pinned" (current value: "0")
                          Whether to use the "leave pinned" protocol or not. Enabling
                          this setting can help bandwidth performance when repeatedly
                          sending and receiving large messages with the same buffers over
                          RDMA-based networks.
                 MCA mpi: parameter "mpi_leave_pinned_pipeline" (current value: "0")
                          Whether to use the "leave pinned pipeline" protocol or not.
                MCA orte: parameter "orte_debug" (current value: "0")
                          Top-level ORTE debug switch
                MCA orte: parameter "orte_no_daemonize" (current value: "0")
                          Whether to properly daemonize the ORTE daemons or not
                MCA orte: parameter "orte_base_user_debugger" (current value:
                          "totalview @mpirun@ -a @mpirun_args@ : fxp @mpirun@ -a @mpirun_args@")
                          Sequence of user-level debuggers to search for in orterun
                MCA orte: parameter "orte_abort_timeout" (current value: "10")
                          Time to wait [in seconds] before giving up on aborting an ORTE
                MCA orte: parameter "orte_timing" (current value: "0")
                          Request that critical timing loops be measured
                MCA opal: parameter "opal_signal" (current value: "6,10,8,11")
                          If a signal is received, display the stack trace frame
           MCA backtrace: parameter "backtrace" (current value: <none>)
                          Default selection set of components for the backtrace framework
                          (<none> means "use all components that can be found")
           MCA backtrace: parameter "backtrace_base_verbose" (current value: "0")
                          Verbosity level for the backtrace framework (0 = no verbosity)
           MCA backtrace: parameter "backtrace_printstack_priority" (current value: "0")
              MCA memory: parameter "memory" (current value: <none>)
                          Default selection set of components for the memory framework
                          (<none> means "use all components that can be found")
              MCA memory: parameter "memory_base_verbose" (current value: "0")
                          Verbosity level for the memory framework (0 = no verbosity)
           MCA paffinity: parameter "paffinity" (current value: <none>)
                          Default selection set of components for the paffinity framework
                          (<none> means "use all components that can be found")
           MCA paffinity: parameter "paffinity_solaris_priority" (current value: "10")
                          Priority of the solaris paffinity component
           MCA maffinity: parameter "maffinity" (current value: <none>)
                          Default selection set of components for the maffinity framework
                          (<none> means "use all components that can be found")
           MCA maffinity: parameter "maffinity_first_use_priority" (current value: "10")
                          Priority of the first_use maffinity component
               MCA timer: parameter "timer" (current value: <none>)
                          Default selection set of components for the timer framework
                          (<none> means "use all components that can be found")
               MCA timer: parameter "timer_base_verbose" (current value: "0")
                          Verbosity level for the timer framework (0 = no verbosity)
               MCA timer: parameter "timer_solaris_priority" (current value: "0")
           MCA allocator: parameter "allocator" (current value: <none>)
                          Default selection set of components for the allocator framework
                          (<none> means "use all components that can be found")
           MCA allocator: parameter "allocator_base_verbose" (current value: "0")
                          Verbosity level for the allocator framework (0 = no verbosity)
           MCA allocator: parameter "allocator_basic_priority" (current value: "0")
           MCA allocator: parameter "allocator_bucket_num_buckets" (current value: "30")
           MCA allocator: parameter "allocator_bucket_priority" (current value: "0")
                MCA coll: parameter "coll" (current value: <none>)
                          Default selection set of components for the coll framework
                          (<none> means "use all components that can be found")
                MCA coll: parameter "coll_base_verbose" (current value: "0")
                          Verbosity level for the coll framework (0 = no verbosity)
                MCA coll: parameter "coll_basic_priority" (current value: "10")
                          Priority of the basic coll component
                MCA coll: parameter "coll_basic_crossover" (current value: "4")
                          Minimum number of processes in a communicator before using the
                          logarithmic algorithms
                MCA coll: parameter "coll_self_priority" (current value: "75")
                MCA coll: parameter "coll_sm_priority" (current value: "0")
                          Priority of the sm coll component
                MCA coll: parameter "coll_sm_control_size" (current value: "4096")
                          Length of the control data -- should usually be either the
                          length of a cache line on most SMPs, or the size of a page on
                          machines that support direct memory affinity page placement (in
                MCA coll: parameter "coll_sm_bootstrap_filename" (current value:
                          Filename (in the Open MPI session directory) of the coll sm
                          component bootstrap rendezvous mmap file
                MCA coll: parameter "coll_sm_bootstrap_num_segments" (current value: "8")
                          Number of segments in the bootstrap file
                MCA coll: parameter "coll_sm_fragment_size" (current value: "8192")
                          Fragment size (in bytes) used for passing data through shared
                          memory (will be rounded up to the nearest control_size size)
                MCA coll: parameter "coll_sm_mpool" (current value: "sm")
                          Name of the mpool component to use
                MCA coll: parameter "coll_sm_comm_in_use_flags" (current value: "2")
                          Number of "in use" flags, used to mark a message passing area
                          segment as currently being used or not (must be >= 2 and <=
                MCA coll: parameter "coll_sm_comm_num_segments" (current value: "8")
                          Number of segments in each communicator's shared memory message
                          passing area (must be >= 2, and must be a multiple of
                MCA coll: parameter "coll_sm_tree_degree" (current value: "4")
                          Degree of the tree for tree-based operations (must be => 1 and
                          <= min(control_size, 255))
                MCA coll: information "coll_sm_shared_mem_used_bootstrap" (value: "160")
                          Amount of shared memory used in the shared memory bootstrap area
                          (in bytes)
                MCA coll: parameter "coll_sm_info_num_procs" (current value: "4")
                          Number of processes to use for the calculation of the
                          shared_mem_size MCA information parameter (must be => 2)
                MCA coll: information "coll_sm_shared_mem_used_data" (value: "548864")
                          Amount of shared memory used in the shared memory data area for
                          info_num_procs processes (in bytes)
                MCA coll: parameter "coll_tuned_priority" (current value: "30")
                          Priority of the tuned coll component
                MCA coll: parameter "coll_tuned_pre_allocate_memory_comm_size_limit"
                          (current value: "32768")
                          Size of communicator were we stop pre-allocating memory for the
                          fixed internal buffer used for message requests etc that is hung
                          off the communicator data segment. I.e. if you have a 100'000
                          nodes you might not want to pre-allocate 200'000 request handle
                          slots per communicator instance!
                MCA coll: parameter "coll_tuned_init_tree_fanout" (current value: "4")
                          Inital fanout used in the tree topologies for each communicator.
                          This is only an initial guess, if a tuned collective needs a
                          different fanout for an operation, it build it dynamically. This
                          parameter is only for the first guess and might save a little
                MCA coll: parameter "coll_tuned_init_chain_fanout" (current value: "4")
                          Inital fanout used in the chain (fanout followed by pipeline)
                          topologies for each communicator. This is only an initial guess,
                          if a tuned collective needs a different fanout for an operation,
                          it build it dynamically. This parameter is only for the first
                          guess and might save a little time
                MCA coll: parameter "coll_tuned_use_dynamic_rules" (current value: "0")
                          Switch used to decide if we use static (compiled/if statements)
                          or dynamic (built at runtime) decision function rules
                  MCA io: parameter "io_base_freelist_initial_size" (current value: "16")
                          Initial MPI-2 IO request freelist size
                  MCA io: parameter "io_base_freelist_max_size" (current value: "64")
                          Max size of the MPI-2 IO request freelist
                  MCA io: parameter "io_base_freelist_increment" (current value: "16")
                          Increment size of the MPI-2 IO request freelist
                  MCA io: parameter "io" (current value: <none>)
                          Default selection set of components for the io framework (<none>
                          means "use all components that can be found")
                  MCA io: parameter "io_base_verbose" (current value: "0")
                          Verbosity level for the io framework (0 = no verbosity)
                  MCA io: parameter "io_romio_priority" (current value: "10")
                          Priority of the io romio component
                  MCA io: parameter "io_romio_delete_priority" (current value: "10")
                          Delete priority of the io romio component
                  MCA io: parameter "io_romio_enable_parallel_optimizations" (current
                          value: "0")
                          Enable set of Open MPI-added options to improve collective file
                          i/o performance
               MCA mpool: parameter "mpool" (current value: <none>)
                          Default selection set of components for the mpool framework
                          (<none> means "use all components that can be found")
               MCA mpool: parameter "mpool_base_verbose" (current value: "0")
                          Verbosity level for the mpool framework (0 = no verbosity)
               MCA mpool: parameter "mpool_sm_allocator" (current value: "bucket")
                          Name of allocator component to use with sm mpool
               MCA mpool: parameter "mpool_sm_max_size" (current value: "536870912")
                          Maximum size of the sm mpool shared memory file
               MCA mpool: parameter "mpool_sm_min_size" (current value: "134217728")
                          Minimum size of the sm mpool shared memory file
               MCA mpool: parameter "mpool_sm_per_peer_size" (current value: "33554432")
                          Size (in bytes) to allocate per local peer in the sm mpool
                          shared memory file, bounded by min_size and max_size
               MCA mpool: parameter "mpool_sm_priority" (current value: "0")
               MCA mpool: parameter "mpool_udapl_priority" (current value: "0")
               MCA mpool: parameter "mpool_base_use_mem_hooks" (current value: "0")
                          use memory hooks for deregistering freed memory
               MCA mpool: parameter "mpool_use_mem_hooks" (current value: "0")
                          (deprecated, use mpool_base_use_mem_hooks)
                 MCA pml: parameter "pml" (current value: <none>)
                          Default selection set of components for the pml framework
                          (<none> means "use all components that can be found")
                 MCA pml: parameter "pml_base_verbose" (current value: "0")
                          Verbosity level for the pml framework (0 = no verbosity)
                 MCA pml: parameter "pml_cm_free_list_num" (current value: "4")
                          Initial size of request free lists
                 MCA pml: parameter "pml_cm_free_list_max" (current value: "-1")
                          Maximum size of request free lists
                 MCA pml: parameter "pml_cm_free_list_inc" (current value: "64")
                          Number of elements to add when growing request free lists
                 MCA pml: parameter "pml_cm_priority" (current value: "30")
                          CM PML selection priority
                 MCA pml: parameter "pml_ob1_free_list_num" (current value: "4")
                 MCA pml: parameter "pml_ob1_free_list_max" (current value: "-1")
                 MCA pml: parameter "pml_ob1_free_list_inc" (current value: "64")
                 MCA pml: parameter "pml_ob1_priority" (current value: "20")
                 MCA pml: parameter "pml_ob1_eager_limit" (current value: "131072")
                 MCA pml: parameter "pml_ob1_send_pipeline_depth" (current value: "3")
                 MCA pml: parameter "pml_ob1_recv_pipeline_depth" (current value: "4")
                 MCA bml: parameter "bml" (current value: <none>)
                          Default selection set of components for the bml framework
                          (<none> means "use all components that can be found")
                 MCA bml: parameter "bml_base_verbose" (current value: "0")
                          Verbosity level for the bml framework (0 = no verbosity)
                 MCA bml: parameter "bml_r2_show_unreach_errors" (current value: "1")
                          Show error message when procs are unreachable
                 MCA bml: parameter "bml_r2_priority" (current value: "0")
              MCA rcache: parameter "rcache" (current value: <none>)
                          Default selection set of components for the rcache framework
                          (<none> means "use all components that can be found")
              MCA rcache: parameter "rcache_base_verbose" (current value: "0")
                          Verbosity level for the rcache framework (0 = no verbosity)
              MCA rcache: parameter "rcache_rb_priority" (current value: "0")
              MCA rcache: parameter "rcache_vma_mru_len" (current value: "256")
                          The maximum size IN ENTRIES of the MRU (most recently used)
                          rcache list
              MCA rcache: parameter "rcache_vma_mru_size" (current value: "1073741824")
                          The maximum size IN BYTES of the MRU (most recently used) rcache
              MCA rcache: parameter "rcache_vma_priority" (current value: "0")
                 MCA btl: parameter "btl_base_debug" (current value: "0")
                          If btl_base_debug is 1 standard debug is output, if > 1 verbose
                          debug is output
                 MCA btl: parameter "btl" (current value: <none>)
                          Default selection set of components for the btl framework
                          (<none> means "use all components that can be found")
                 MCA btl: parameter "btl_base_verbose" (current value: "0")
                          Verbosity level for the btl framework (0 = no verbosity)
                 MCA btl: parameter "btl_self_free_list_num" (current value: "0")
                          Number of fragments by default
                 MCA btl: parameter "btl_self_free_list_max" (current value: "-1")
                          Maximum number of fragments
                 MCA btl: parameter "btl_self_free_list_inc" (current value: "32")
                          Increment by this number of fragments
                 MCA btl: parameter "btl_self_eager_limit" (current value: "131072")
                          Eager size fragmeng (before the rendez-vous ptotocol)
                 MCA btl: parameter "btl_self_min_send_size" (current value: "262144")
                          Minimum fragment size after the rendez-vous
                 MCA btl: parameter "btl_self_max_send_size" (current value: "262144")
                          Maximum fragment size after the rendez-vous
                 MCA btl: parameter "btl_self_min_rdma_size" (current value:
                          Maximum fragment size for the RDMA transfer
                 MCA btl: parameter "btl_self_max_rdma_size" (current value:
                          Maximum fragment size for the RDMA transfer
                 MCA btl: parameter "btl_self_exclusivity" (current value: "65536")
                          Device exclusivity
                 MCA btl: parameter "btl_self_flags" (current value: "10")
                          Active behavior flags
                 MCA btl: parameter "btl_self_priority" (current value: "0")
                 MCA btl: parameter "btl_sm_free_list_num" (current value: "8")
                 MCA btl: parameter "btl_sm_free_list_max" (current value: "-1")
                 MCA btl: parameter "btl_sm_free_list_inc" (current value: "64")
                 MCA btl: parameter "btl_sm_exclusivity" (current value: "65535")
                 MCA btl: parameter "btl_sm_latency" (current value: "100")
                 MCA btl: parameter "btl_sm_max_procs" (current value: "-1")
                 MCA btl: parameter "btl_sm_sm_extra_procs" (current value: "2")
                 MCA btl: parameter "btl_sm_mpool" (current value: "sm")
                 MCA btl: parameter "btl_sm_eager_limit" (current value: "4096")
                 MCA btl: parameter "btl_sm_max_frag_size" (current value: "32768")
                 MCA btl: parameter "btl_sm_size_of_cb_queue" (current value: "128")
                 MCA btl: parameter "btl_sm_cb_lazy_free_freq" (current value: "120")
                 MCA btl: parameter "btl_sm_priority" (current value: "0")
                 MCA btl: parameter "btl_tcp_if_include" (current value: <none>)
                 MCA btl: parameter "btl_tcp_if_exclude" (current value: "lo")
                 MCA btl: parameter "btl_tcp_free_list_num" (current value: "8")
                 MCA btl: parameter "btl_tcp_free_list_max" (current value: "-1")
                 MCA btl: parameter "btl_tcp_free_list_inc" (current value: "32")
                 MCA btl: parameter "btl_tcp_sndbuf" (current value: "131072")
                 MCA btl: parameter "btl_tcp_rcvbuf" (current value: "131072")
                 MCA btl: parameter "btl_tcp_endpoint_cache" (current value: "30720")
                 MCA btl: parameter "btl_tcp_exclusivity" (current value: "0")
                 MCA btl: parameter "btl_tcp_eager_limit" (current value: "65536")
                 MCA btl: parameter "btl_tcp_min_send_size" (current value: "65536")
                 MCA btl: parameter "btl_tcp_max_send_size" (current value: "131072")
                 MCA btl: parameter "btl_tcp_min_rdma_size" (current value: "131072")
                 MCA btl: parameter "btl_tcp_max_rdma_size" (current value: "2147483647")
                 MCA btl: parameter "btl_tcp_flags" (current value: "122")
                 MCA btl: parameter "btl_tcp_priority" (current value: "0")
                 MCA btl: parameter "btl_udapl_free_list_num" (current value: "8")
                          Initial size of free lists (must be >= 1).
                 MCA btl: parameter "btl_udapl_free_list_max" (current value: "-1")
                          Maximum size of free lists (-1 = infinite, otherwise must be >=
                 MCA btl: parameter "btl_udapl_free_list_inc" (current value: "8")
                          Increment size of free lists (must be >= 1).
                 MCA btl: parameter "btl_udapl_mpool" (current value: "udapl")
                          Name of the memory pool to be used.
                 MCA btl: parameter "btl_udapl_max_modules" (current value: "8")
                          Maximum number of supported HCAs.
                 MCA btl: parameter "btl_udapl_num_recvs" (current value: "8")
                          Total number of receive buffers to keep posted per endpoint
                          (must be >= 1).
                 MCA btl: parameter "btl_udapl_num_sends" (current value: "7")
                          Maximum number of sends to post on an endpoint (must be >= 1).
                 MCA btl: parameter "btl_udapl_sr_win" (current value: "4")
                          Window size at which point an explicit credit message will be
                          generated (must be >= 1).
                 MCA btl: parameter "btl_udapl_eager_rdma_num" (current value: "32")
                          Number of RDMA buffers to allocate for small messages (must be
                          >= 1).
                 MCA btl: parameter "btl_udapl_max_eager_rdma_peers" (current value:
                          Maximum number of peers allowed to use RDMA for short messages
                          (independently RDMA will still be used for large messages, (must
                          be >= 0; if zero then RDMA will not be used for short
                 MCA btl: parameter "btl_udapl_eager_rdma_win" (current value: "28")
                          Window size at which point an explicit credit message will be
                          generated (must be >= 1).
                 MCA btl: parameter "btl_udapl_timeout" (current value: "10000000")
                          Connection timeout, in microseconds.
                 MCA btl: parameter "btl_udapl_conn_priv_data" (current value: "1")
                          Use connect private data to establish connections (not supported
                          by all uDAPL implementations).
                 MCA btl: parameter "btl_udapl_async_events" (current value: "100000000")
                          The asynchronous event queue will only be checked after entering
                          progress this number of times.
                 MCA btl: parameter "btl_udapl_buffer_alignment" (current value: "256")
                          Preferred communication buffer alignment, in bytes (must be >=
                 MCA btl: parameter "btl_udapl_async_evd_qlen" (current value: "256")
                          The asynchronous event dispatcher queue length.
                 MCA btl: parameter "btl_udapl_conn_evd_qlen" (current value: "256")
                          The connection event dispatcher queue length is a function of
                          the number of connections expected.
                 MCA btl: parameter "btl_udapl_dto_evd_qlen" (current value: "256")
                          The data transfer operation event dispatcher queue length is a
                          function of the number of connections as well as the maximum
                          number of outstanding data transfer operations.
                 MCA btl: parameter "btl_udapl_max_request_dtos" (current value: "76")
                          Maximum number of outstanding submitted sends and rdma operations
                          per endpoint, (see Section 6.6.6 of uDAPL Spec.).
                 MCA btl: parameter "btl_udapl_max_recv_dtos" (current value: "8")
                          Maximum number of outstanding submitted receive operations per
                          endpoint, (see Section 6.6.6 of uDAPL Spec.).
                 MCA btl: parameter "btl_udapl_exclusivity" (current value: "1014")
                          uDAPL BTL exclusivity (must be >= 0).
                 MCA btl: parameter "btl_udapl_eager_limit" (current value: "8192")
                          Eager send limit, in bytes (must be >= 1).
                 MCA btl: parameter "btl_udapl_min_send_size" (current value: "16384")
                          Minimum send size, in bytes (must be >= 1).
                 MCA btl: parameter "btl_udapl_max_send_size" (current value: "65536")
                          Maximum send size, in bytes (must be >= 1).
                 MCA btl: parameter "btl_udapl_min_rdma_size" (current value: "524288")
                          Minimum RDMA size, in bytes (must be >= 1).
                 MCA btl: parameter "btl_udapl_max_rdma_size" (current value: "131072")
                          Maximum RDMA size, in bytes (must be >= 1).
                 MCA btl: parameter "btl_udapl_flags" (current value: "2")
                          BTL flags, added together: PUT=2 (cannot be 0).
                 MCA btl: parameter "btl_udapl_bandwidth" (current value: "225")
                          Approximate maximum bandwidth of network (must be >= 1).
                 MCA btl: parameter "btl_udapl_priority" (current value: "0")
                 MCA btl: parameter "btl_base_include" (current value: <none>)
                 MCA btl: parameter "btl_base_exclude" (current value: <none>)
                 MCA btl: parameter "btl_base_warn_component_unused" (current value: "0")
                          This parameter is used to turn on warning messages when certain
                          NICs are not used
                 MCA mtl: parameter "mtl" (current value: <none>)
                          Default selection set of components for the mtl framework
                          (<none> means "use all components that can be found")
                 MCA mtl: parameter "mtl_base_verbose" (current value: "0")
                          Verbosity level for the mtl framework (0 = no verbosity)
                MCA topo: parameter "topo" (current value: <none>)
                          Default selection set of components for the topo framework
                          (<none> means "use all components that can be found")
                MCA topo: parameter "topo_base_verbose" (current value: "0")
                          Verbosity level for the topo framework (0 = no verbosity)
                 MCA osc: parameter "osc" (current value: <none>)
                          Default selection set of components for the osc framework
                          (<none> means "use all components that can be found")
                 MCA osc: parameter "osc_base_verbose" (current value: "0")
                          Verbosity level for the osc framework (0 = no verbosity)
                 MCA osc: parameter "osc_pt2pt_no_locks" (current value: "0")
                          Enable optimizations available only if MPI_LOCK is not used.
                 MCA osc: parameter "osc_pt2pt_eager_limit" (current value: "16384")
                          Max size of eagerly sent data
                 MCA osc: parameter "osc_pt2pt_priority" (current value: "0")
              MCA errmgr: parameter "errmgr" (current value: <none>)
                          Default selection set of components for the errmgr framework
                          (<none> means "use all components that can be found")
              MCA errmgr: parameter "errmgr_hnp_debug" (current value: "0")
              MCA errmgr: parameter "errmgr_hnp_priority" (current value: "0")
              MCA errmgr: parameter "errmgr_orted_debug" (current value: "0")
              MCA errmgr: parameter "errmgr_orted_priority" (current value: "0")
              MCA errmgr: parameter "errmgr_proxy_debug" (current value: "0")
              MCA errmgr: parameter "errmgr_proxy_priority" (current value: "0")
                 MCA gpr: parameter "gpr_base_maxsize" (current value: "2147483647")
                 MCA gpr: parameter "gpr_base_blocksize" (current value: "512")
                 MCA gpr: parameter "gpr" (current value: <none>)
                          Default selection set of components for the gpr framework
                          (<none> means "use all components that can be found")
                 MCA gpr: parameter "gpr_null_priority" (current value: "0")
                 MCA gpr: parameter "gpr_proxy_debug" (current value: "0")
                 MCA gpr: parameter "gpr_proxy_priority" (current value: "0")
                 MCA gpr: parameter "gpr_replica_debug" (current value: "0")
                 MCA gpr: parameter "gpr_replica_isolate" (current value: "0")
                 MCA gpr: parameter "gpr_replica_priority" (current value: "0")
                 MCA iof: parameter "iof_base_window_size" (current value: "4096")
                 MCA iof: parameter "iof_base_service" (current value: "0.0.0")
                 MCA iof: parameter "iof" (current value: <none>)
                          Default selection set of components for the iof framework
                          (<none> means "use all components that can be found")
                 MCA iof: parameter "iof_proxy_debug" (current value: "1")
                 MCA iof: parameter "iof_proxy_priority" (current value: "0")
                 MCA iof: parameter "iof_svc_debug" (current value: "1")
                 MCA iof: parameter "iof_svc_priority" (current value: "0")
                  MCA ns: parameter "ns" (current value: <none>)
                          Default selection set of components for the ns framework (<none>
                          means "use all components that can be found")
                  MCA ns: parameter "ns_proxy_debug" (current value: "0")
                  MCA ns: parameter "ns_proxy_maxsize" (current value: "2147483647")
                  MCA ns: parameter "ns_proxy_blocksize" (current value: "512")
                  MCA ns: parameter "ns_proxy_priority" (current value: "0")
                  MCA ns: parameter "ns_replica_debug" (current value: "0")
                  MCA ns: parameter "ns_replica_isolate" (current value: "0")
                  MCA ns: parameter "ns_replica_maxsize" (current value: "2147483647")
                  MCA ns: parameter "ns_replica_blocksize" (current value: "512")
                  MCA ns: parameter "ns_replica_priority" (current value: "0")
                 MCA oob: parameter "oob" (current value: <none>)
                          Default selection set of components for the oob framework
                          (<none> means "use all components that can be found")
                 MCA oob: parameter "oob_base_verbose" (current value: "0")
                          Verbosity level for the oob framework (0 = no verbosity)
                 MCA oob: parameter "oob_tcp_peer_limit" (current value: "-1")
                 MCA oob: parameter "oob_tcp_peer_retries" (current value: "60")
                 MCA oob: parameter "oob_tcp_debug" (current value: "0")
                 MCA oob: parameter "oob_tcp_include" (current value: <none>)
                 MCA oob: parameter "oob_tcp_exclude" (current value: <none>)
                 MCA oob: parameter "oob_tcp_sndbuf" (current value: "131072")
                 MCA oob: parameter "oob_tcp_rcvbuf" (current value: "131072")
                 MCA oob: parameter "oob_tcp_connect_timeout" (current value: "600")
                          connect() timeout in seconds, before trying next interface
                 MCA oob: parameter "oob_tcp_connect_sleep" (current value: "1")
                          Enable (1) /Disable (0) random sleep for connection wireup
                 MCA oob: parameter "oob_tcp_listen_mode" (current value: "event")
                          Mode for HNP to accept incoming connections: event,
                 MCA oob: parameter "oob_tcp_listen_thread_max_queue" (current value:
                          High water mark for queued accepted socket list size
                 MCA oob: parameter "oob_tcp_listen_thread_max_time" (current value:
                          Maximum amount of time (in milliseconds) to wait between
                          processing accepted socket list
                 MCA oob: parameter "oob_tcp_accept_spin_count" (current value: "10")
                          Number of times to let accept return EWOULDBLOCK before updating
                          accepted socket list
                 MCA oob: parameter "oob_tcp_priority" (current value: "0")
                 MCA ras: parameter "ras" (current value: <none>)
                 MCA ras: parameter "ras_dash_host_priority" (current value: "5")
                          Selection priority for the dash_host RAS component
                 MCA ras: parameter "ras_gridengine_debug" (current value: "0")
                          Enable debugging output for the gridengine ras component
                 MCA ras: parameter "ras_gridengine_priority" (current value: "100")
                          Priority of the gridengine ras component
                 MCA ras: parameter "ras_gridengine_verbose" (current value: "0")
                          Enable verbose output for the gridengine ras component
                 MCA ras: parameter "ras_gridengine_show_jobid" (current value: "0")
                          Show the JOB_ID of the Grid Engine job
                 MCA ras: parameter "ras_localhost_priority" (current value: "0")
                          Selection priority for the localhost RAS component
                 MCA ras: parameter "ras_tm_priority" (current value: "100")
                          Priority of the tm ras component
                 MCA rds: parameter "rds" (current value: <none>)
                 MCA rds: parameter "rds_hostfile_debug" (current value: "0")
                          Toggle debug output for hostfile RDS component
                 MCA rds: parameter "rds_hostfile_path" (current value:
                          ORTE Host filename
                 MCA rds: parameter "rds_hostfile_priority" (current value: "0")
                 MCA rds: parameter "rds_proxy_priority" (current value: "0")
                 MCA rds: parameter "rds_resfile_debug" (current value: "0")
                          Toggle debug output for resfile RDS component
                 MCA rds: parameter "rds_resfile_name" (current value: <none>)
                          ORTE Resource filename
                 MCA rds: parameter "rds_resfile_priority" (current value: "0")
               MCA rmaps: parameter "rmaps_base_verbose" (current value: "0")
                          Verbosity level for the rmaps framework
               MCA rmaps: parameter "rmaps_base_schedule_policy" (current value:
                          Scheduling Policy for RMAPS. [slot | node]
               MCA rmaps: parameter "rmaps_base_pernode" (current value: "0")
                          Launch one ppn as directed
               MCA rmaps: parameter "rmaps_base_n_pernode" (current value: "-1")
                          Launch n procs/node
               MCA rmaps: parameter "rmaps_base_schedule_local" (current value: "1")
                          If nonzero, allow scheduling MPI applications on the same node as
                          mpirun (default). If zero, do not schedule any MPI
                          applications on the same node as mpirun
               MCA rmaps: parameter "rmaps_base_no_oversubscribe" (current value: "0")
                          If nonzero, then do not allow oversubscription of nodes - mpirun
                          will return an error if there aren't enough nodes to launch all
                          processes without oversubscribing
               MCA rmaps: parameter "rmaps" (current value: <none>)
                          Default selection set of components for the rmaps framework
                          (<none> means "use all components that can be found")
               MCA rmaps: parameter "rmaps_round_robin_debug" (current value: "1")
                          Toggle debug output for Round Robin RMAPS component
               MCA rmaps: parameter "rmaps_round_robin_priority" (current value: "1")
                          Selection priority for Round Robin RMAPS component
                MCA rmgr: parameter "rmgr" (current value: <none>)
                          Default selection set of components for the rmgr framework
                          (<none> means "use all components that can be found")
                MCA rmgr: parameter "rmgr_proxy_priority" (current value: "0")
                MCA rmgr: parameter "rmgr_urm_priority" (current value: "0")
                 MCA rml: parameter "rml" (current value: <none>)
                          Default selection set of components for the rml framework
                          (<none> means "use all components that can be found")
                 MCA rml: parameter "rml_base_verbose" (current value: "0")
                          Verbosity level for the rml framework (0 = no verbosity)
                 MCA rml: parameter "rml_oob_priority" (current value: "0")
                 MCA pls: parameter "pls_base_reuse_daemons" (current value: "0")
                          If nonzero, reuse daemons to launch dynamically spawned
                          processes.  If zero, do not reuse daemons (default)
                 MCA pls: parameter "pls" (current value: <none>)
                          Default selection set of components for the pls framework
                          (<none> means "use all components that can be found")
                 MCA pls: parameter "pls_base_verbose" (current value: "0")
                          Verbosity level for the pls framework (0 = no verbosity)
                 MCA pls: parameter "pls_gridengine_debug" (current value: "0")
                          Enable debugging of gridengine pls component
                 MCA pls: parameter "pls_gridengine_verbose" (current value: "0")
                          Enable verbose output of the gridengine qrsh -inherit command
                 MCA pls: parameter "pls_gridengine_priority" (current value: "100")
                          Priority of the gridengine pls component
                 MCA pls: parameter "pls_gridengine_orted" (current value: "orted")
                          The command name that the gridengine pls component will invoke
                          for the ORTE daemon
                 MCA pls: parameter "pls_proxy_priority" (current value: "0")
                 MCA pls: parameter "pls_rsh_debug" (current value: "0")
                          Whether or not to enable debugging output for the rsh pls
                          component (0 or 1)
                 MCA pls: parameter "pls_rsh_num_concurrent" (current value: "128")
                          How many pls_rsh_agent instances to invoke concurrently (must be
                          > 0)
                 MCA pls: parameter "pls_rsh_force_rsh" (current value: "0")
                          Force the launcher to always use rsh, even for local daemons
                 MCA pls: parameter "pls_rsh_orted" (current value: "orted")
                          The command name that the rsh pls component will invoke for the
                          ORTE daemon
                 MCA pls: parameter "pls_rsh_priority" (current value: "10")
                          Priority of the rsh pls component
                 MCA pls: parameter "pls_rsh_delay" (current value: "1")
                          Delay (in seconds) between invocations of the remote agent, but
                          only used when the "debug" MCA parameter is true, or the
                          top-level MCA debugging is enabled (otherwise this value is
                 MCA pls: parameter "pls_rsh_reap" (current value: "1")
                          If set to 1, wait for all the processes to complete before
                          exiting. Otherwise, quit immediately -- without waiting for
                          confirmation that all other processes in the job have
                 MCA pls: parameter "pls_rsh_assume_same_shell" (current value: "1")
                          If set to 1, assume that the shell on the remote node is the
                          same as the shell on the local node. Otherwise, probe for what
                          the remote shell.
                 MCA pls: parameter "pls_rsh_agent" (current value: "ssh : rsh")
                          The command used to launch executables on remote nodes
                          (typically either "ssh" or "rsh")
                 MCA pls: parameter "pls_tm_debug" (current value: "0")
                          Enable debugging of the TM pls
                 MCA pls: parameter "pls_tm_verbose" (current value: "0")
                          Enable verbose output of the TM pls
                 MCA pls: parameter "pls_tm_priority" (current value: "75")
                          Default selection priority
                 MCA pls: parameter "pls_tm_orted" (current value: "orted")
                          Command to use to start proxy orted
                 MCA pls: parameter "pls_tm_want_path_check" (current value: "1")
                          Whether the launching process should check for the pls_tm_orted
                          executable in the PATH before launching (the TM API does not
                          give an idication of failure; this is a somewhat-lame
                          workaround; non-zero values enable this check)
                 MCA sds: parameter "sds" (current value: <none>)
                          Default selection set of components for the sds framework
                          (<none> means "use all components that can be found")
                 MCA sds: parameter "sds_base_verbose" (current value: "0")
                          Verbosity level for the sds framework (0 = no verbosity)
                 MCA sds: parameter "sds_env_priority" (current value: "0")
                 MCA sds: parameter "sds_pipe_priority" (current value: "0")
                 MCA sds: parameter "sds_seed_priority" (current value: "0")
                 MCA sds: parameter "sds_singleton_priority" (current value: "0")
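The dump above shows the out-of-band (OOB) debugging parameters at their defaults (`oob_base_verbose` = "0", `oob_tcp_debug` = "0"), so the "Connection to HNP lost" message arrives with no surrounding detail. One way to get more context on the next failing run is to raise them. This is a minimal sketch, assuming the per-user parameter file `$HOME/.openmpi/mca-params.conf` is read by this Open MPI build on every node; the verbosity level of 10 is an illustrative choice, not a value from the original post.

```
# $HOME/.openmpi/mca-params.conf  (assumed location; one "name = value" per line)

# OOB framework verbosity; reported as "0" (no verbosity) in the ompi_info dump
oob_base_verbose = 10

# Debug output from the TCP OOB component; reported as "0" in the dump
oob_tcp_debug = 1
```

The same two settings can instead be passed for a single run with `mpirun --mca oob_base_verbose 10 --mca oob_tcp_debug 1 ...`, which avoids leaving debug output switched on for every job on the cluster.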
