Brian, Ralph,

I neglected to mention in my first email that the application hasn't completed when I see the "HNP lost" messages. All processes of the application are still running on the nodes (well, consuming CPU cycles, really). I should check whether mpirun is still there.
Further investigation has revealed that the IB switch is logging a fault on both power supplies (even though all status lights are green). So I suspect I do have a flaky IB problem. I'll start there anyway.
One quick question: what would be a good TCP stress test to do? It might help me narrow the problem down to the switch or the adapter cards.
Thanks again, Glenn
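One possible TCP stress test is sketched below, assuming Python 3.8 or later is available on both nodes (an assumption; established tools such as ttcp, netperf or iperf do the same job). The sender streams data to a receiver ("sink"), times the transfer, and the sink replies with a SHA-256 digest of everything it received so the sender can confirm the bytes arrived intact. A hang, reset or timeout during a long run would point at the link. The script name `tcpstress.py`, port 5001 and the function names are hypothetical.

```python
import hashlib
import os
import socket
import sys
import time

CHUNK = 64 * 1024  # bytes per send()/recv() call; an arbitrary choice


def serve_once(server: socket.socket) -> int:
    """Accept one sender, hash everything it sends, reply with the SHA-256 hex digest."""
    conn, _ = server.accept()
    with conn:
        digest = hashlib.sha256()
        received = 0
        while True:
            data = conn.recv(CHUNK)
            if not data:  # sender called shutdown(SHUT_WR): all data is in
                break
            digest.update(data)
            received += len(data)
        conn.sendall(digest.hexdigest().encode("ascii"))
    return received


def send_and_verify(host: str, port: int, total_bytes: int) -> dict:
    """Stream total_bytes to a sink and check that the sink's digest matches what was sent."""
    block = os.urandom(CHUNK)  # reused so data generation does not limit throughput
    digest = hashlib.sha256()
    start = time.perf_counter()
    # A 60 s timeout per socket operation turns a stalled link into an exception.
    with socket.create_connection((host, port), timeout=60) as sock:
        remaining = total_bytes
        while remaining > 0:
            piece = block[: min(CHUNK, remaining)]
            sock.sendall(piece)
            digest.update(piece)
            remaining -= len(piece)
        sock.shutdown(socket.SHUT_WR)  # signal end of data but keep reading
        reply = b""
        while True:
            data = sock.recv(128)
            if not data:
                break
            reply += data
    elapsed = time.perf_counter() - start
    return {
        "bytes": total_bytes,
        "seconds": elapsed,
        "MB_per_s": total_bytes / elapsed / 1e6,
        "intact": reply.decode("ascii") == digest.hexdigest(),
    }


if __name__ == "__main__" and len(sys.argv) >= 4:
    # Hypothetical usage:
    #   node1$ python tcpstress.py serve 192.168.50.201 5001
    #   node2$ python tcpstress.py send  192.168.50.201 5001 1000000000
    mode, host, port = sys.argv[1], sys.argv[2], int(sys.argv[3])
    if mode == "serve":
        with socket.create_server((host, port)) as srv:
            while True:
                print("received", serve_once(srv), "bytes")
    else:
        print(send_and_verify(host, port, int(sys.argv[4])))
```

Binding the sink to 192.168.50.201 (node1's ibd1 address in the ifconfig output below) keeps the traffic on IPoIB. Repeating the run many times, or with several senders at once, gives the switch and adapters a sustained load. Repeating it over the Ethernet interface gives a baseline for comparison.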
What Ralph said is generally true. If your application completed, this is nothing to worry about. It means that an error occurred on the socket between mpirun and some other process. However, combined with the tavor0 errors in the log files, it could mean that your IPoIB network is acting flaky. That would have me slightly concerned, enough that I'd consider running some TCP stress tests on the network to make sure it's acting normally.

Hope this helps,

Brian

On Jul 10, 2007, at 11:32 AM, Ralph H Castain wrote:

On 7/10/07 11:08 AM, "Glenn Carver" <glenn.car...@atm.ch.cam.ac.uk> wrote:

Hi, I'd be grateful if someone could explain the meaning of this error message to me and whether it indicates a hardware problem or an application software issue:

[node2:11881] OOB: Connection to HNP lost
[node1:09876] OOB: Connection to HNP lost

This message is nothing to be concerned about; all it indicates is that mpirun exited before our daemon on your backend nodes did. It's relatively harmless and probably should be eliminated in some future version (except when developers are running in debug mode).

The message can appear when the timing changes between front-end and backend nodes. What happens is:

1. mpirun detects that your processes have all completed. It then orders the shutdown of the daemons on your backend nodes.
2. Each daemon does an orderly shutdown. Just before it terminates, it tells mpirun that it is done cleaning up and is about to exit.
3. When mpirun hears that all daemons are done cleaning up, it exits itself.

This is where the timing issue comes into play: if mpirun exits before the daemon, then you get that error message as the daemon is terminating.

So it's all a question of whether mpirun completes the last few steps to exit before the daemons do. In most cases, the daemons complete first as they have less to do. Sometimes mpirun manages to get out first, and you get the message.

I doubt it has anything to do with your hardware issues.
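The three-step shutdown can be pictured with a toy model. In the Python sketch below, a `socketpair` stands in for the OOB TCP connection between mpirun and one daemon, and `shutdown_sequence` (a hypothetical name) returns what the daemon would log. It only mimics the ordering of steps 2 and 3; it is not Open MPI's OOB code, and the message text is copied from the log lines above.

```python
import socket


def shutdown_sequence(mpirun_exits_first: bool) -> list:
    """Toy model of the final shutdown handshake for one daemon; returns the daemon's log."""
    log = []
    hnp, daemon = socket.socketpair()  # stands in for the OOB connection
    # Step 2: the daemon tells mpirun it has finished cleaning up.
    daemon.sendall(b"DONE")
    # Step 3: mpirun hears from its (only) daemon and is free to exit.
    if hnp.recv(4) != b"DONE":
        raise RuntimeError("unexpected message from daemon")
    if mpirun_exits_first:
        hnp.close()  # mpirun is gone before the daemon has finished terminating
        # The daemon is still watching the socket; EOF means the peer vanished.
        if daemon.recv(1) == b"":
            log.append("OOB: Connection to HNP lost")
        daemon.close()
    else:
        daemon.close()  # usual case: the daemon, with less to do, exits first
        hnp.recv(1)     # mpirun sees EOF, which it expects after DONE
        hnp.close()
    log.append("daemon exited")
    return log
```

In both orderings the daemon exits normally and only the extra log line differs. This is why the message is harmless once the job has actually completed.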
Personally, I would just ignore the message. I'll see that it gets removed in later releases to avoid unnecessary confusion.

Hope that helps,

Ralph

I have a small cluster which until last week was just fine. Unfortunately we were hit by a sudden power dip which brought the cluster down and did significant damage to other servers (blew power supplies and disks). Although the cluster machines and the Infiniband link are up and running jobs, I am now getting these errors in user applications, which we've never had before. The system messages file reports (for node2):

Jul 5 12:08:28 node1 genunix: [ID 408789 kern.notice] NOTICE: tavor0: fault cleared external to device; service available
Jul 5 12:08:28 node1 genunix: [ID 451854 kern.notice] NOTICE: tavor0: port 1 up
Jul 7 16:18:32 node1 genunix: [ID 408114 kern.info] /pci@1,0/pci1022,7450@2/pci15b3,5a46@1/pci15b3,5a44@0 (tavor0) online
Jul 7 16:18:32 node1 ib: [ID 842868 kern.info] IB device: daplt@0, daplt0
Jul 7 16:18:32 node1 genunix: [ID 936769 kern.info] daplt0 is /ib/daplt@0
Jul 7 16:18:32 node1 genunix: [ID 408114 kern.info] /ib/daplt@0 (daplt0) online
Jul 7 16:18:32 node1 genunix: [ID 834635 kern.info] /ib/daplt@0 (daplt0) multipath status: degraded, path /pci@1,0/pci1022,7450@2/pci15b3,5a46@1/pci15b3,5a44@0 (tavor0) to target address: daplt,0 is online Load balancing: round-robin

I wonder whether these messages indicate a hardware problem, possibly on the Infiniband switch or the host adapters on the cluster machines. The cluster software has not been altered, but there have been small changes to the application codes. Even so, I want to rule out hardware issues caused by the power dip first. Has anyone seen this message before and know whether to investigate hardware first? I did check the archives, but it didn't help. More info is provided below.

Any help appreciated, thanks.

Glenn

--
Details:

Cluster uses a mix of Sun's X4100/X4200 machines linked with Sun-supplied Infiniband and host adapters.
All machines are running Solaris 10_x86 (11/06) with the latest kernel patches. Software is Sun Clustertools 7.

Node2 $ ifconfig ibd1
ibd1: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 2044 index 3
        inet 192.168.50.202 netmask ffffff00 broadcast 192.168.50.255

Node1 $ ifconfig ibd1
ibd1: flags=1000843<UP,BROADCAST,RUNNING,MULTICAST,IPv4> mtu 2044 index 3
        inet 192.168.50.201 netmask ffffff00 broadcast 192.168.50.255

ompi_info -a
Open MPI: 1.2.1r14096-ct7b030r1838
Open MPI SVN revision: 0
Open RTE: 1.2.1r14096-ct7b030r1838
Open RTE SVN revision: 0
OPAL: 1.2.1r14096-ct7b030r1838
OPAL SVN revision: 0
MCA backtrace: printstack (MCA v1.0, API v1.0, Component v1.2.1)
MCA paffinity: solaris (MCA v1.0, API v1.0, Component v1.2.1)
MCA maffinity: first_use (MCA v1.0, API v1.0, Component v1.2.1)
MCA timer: solaris (MCA v1.0, API v1.0, Component v1.2.1)
MCA allocator: basic (MCA v1.0, API v1.0, Component v1.0)
MCA allocator: bucket (MCA v1.0, API v1.0, Component v1.0)
MCA coll: basic (MCA v1.0, API v1.0, Component v1.2.1)
MCA coll: self (MCA v1.0, API v1.0, Component v1.2.1)
MCA coll: sm (MCA v1.0, API v1.0, Component v1.2.1)
MCA coll: tuned (MCA v1.0, API v1.0, Component v1.2.1)
MCA io: romio (MCA v1.0, API v1.0, Component v1.2.1)
MCA mpool: sm (MCA v1.0, API v1.0, Component v1.2.1)
MCA mpool: udapl (MCA v1.0, API v1.0, Component v1.2.1)
MCA pml: cm (MCA v1.0, API v1.0, Component v1.2.1)
MCA pml: ob1 (MCA v1.0, API v1.0, Component v1.2.1)
MCA bml: r2 (MCA v1.0, API v1.0, Component v1.2.1)
MCA rcache: rb (MCA v1.0, API v1.0, Component v1.2.1)
MCA rcache: vma (MCA v1.0, API v1.0, Component v1.2.1)
MCA btl: self (MCA v1.0, API v1.0.1, Component v1.2.1)
MCA btl: sm (MCA v1.0, API v1.0.1, Component v1.2.1)
MCA btl: tcp (MCA v1.0, API v1.0.1, Component v1.0)
MCA btl: udapl (MCA v1.0, API v1.0, Component v1.2.1)
MCA topo: unity (MCA v1.0, API v1.0, Component v1.2.1)
MCA osc: pt2pt (MCA v1.0, API v1.0, Component v1.2.1)
MCA errmgr: hnp (MCA v1.0, API v1.3, Component v1.2.1)
MCA errmgr: orted (MCA v1.0, API v1.3, Component v1.2.1)
MCA errmgr: proxy (MCA v1.0, API v1.3, Component v1.2.1)
MCA gpr: null (MCA v1.0, API v1.0, Component v1.2.1)
MCA gpr: proxy (MCA v1.0, API v1.0, Component v1.2.1)
MCA gpr: replica (MCA v1.0, API v1.0, Component v1.2.1)
MCA iof: proxy (MCA v1.0, API v1.0, Component v1.2.1)
MCA iof: svc (MCA v1.0, API v1.0, Component v1.2.1)
MCA ns: proxy (MCA v1.0, API v2.0, Component v1.2.1)
MCA ns: replica (MCA v1.0, API v2.0, Component v1.2.1)
MCA oob: tcp (MCA v1.0, API v1.0, Component v1.0)
MCA ras: dash_host (MCA v1.0, API v1.3, Component v1.2.1)
MCA ras: gridengine (MCA v1.0, API v1.3, Component v1.2.1)
MCA ras: localhost (MCA v1.0, API v1.3, Component v1.2.1)
MCA ras: tm (MCA v1.0, API v1.3, Component v1.2.1)
MCA rds: hostfile (MCA v1.0, API v1.3, Component v1.2.1)
MCA rds: proxy (MCA v1.0, API v1.3, Component v1.2.1)
MCA rds: resfile (MCA v1.0, API v1.3, Component v1.2.1)
MCA rmaps: round_robin (MCA v1.0, API v1.3, Component v1.2.1)
MCA rmgr: proxy (MCA v1.0, API v2.0, Component v1.2.1)
MCA rmgr: urm (MCA v1.0, API v2.0, Component v1.2.1)
MCA rml: oob (MCA v1.0, API v1.0, Component v1.2.1)
MCA pls: gridengine (MCA v1.0, API v1.3, Component v1.2.1)
MCA pls: proxy (MCA v1.0, API v1.3, Component v1.2.1)
MCA pls: rsh (MCA v1.0, API v1.3, Component v1.2.1)
MCA pls: tm (MCA v1.0, API v1.3, Component v1.2.1)
MCA sds: env (MCA v1.0, API v1.0, Component v1.2.1)
MCA sds: pipe (MCA v1.0, API v1.0, Component v1.2.1)
MCA sds: seed (MCA v1.0, API v1.0, Component v1.2.1)
MCA sds: singleton (MCA v1.0, API v1.0, Component v1.2.1)
Prefix: /opt/SUNWhpc/HPC7.0
Bindir: /opt/SUNWhpc/HPC7.0/bin
Libdir: /opt/SUNWhpc/HPC7.0/lib
Incdir: /opt/SUNWhpc/HPC7.0/include
Pkglibdir: /opt/SUNWhpc/HPC7.0/lib/openmpi
Sysconfdir: /opt/SUNWhpc/HPC7.0/etc
Configured architecture: i386-pc-solaris2.10
Configured by: root
Configured on: Fri Mar 30 13:40:12 EDT 2007
Configure host: burpen-csx10-0
Built by: root
Built on: Fri Mar 30 13:57:25 EDT 2007
Built host: burpen-csx10-0
C bindings: yes
C++ bindings: yes
Fortran77 bindings: yes (all)
Fortran90 bindings: yes
Fortran90 bindings size: trivial
C compiler: cc
C compiler absolute: /ws/ompi-tools/SUNWspro/SOS11/bin/cc
C char size: 1
C bool size: 1
C short size: 2
C int size: 4
C long size: 4
C float size: 4
C double size: 8
C pointer size: 4
C char align: 1
C bool align: 1
C int align: 4
C float align: 4
C double align: 4
C++ compiler: CC
C++ compiler absolute: /ws/ompi-tools/SUNWspro/SOS11/bin/CC
Fortran77 compiler: f77
Fortran77 compiler abs: /ws/ompi-tools/SUNWspro/SOS11/bin/f77
Fortran90 compiler: f95
Fortran90 compiler abs: /ws/ompi-tools/SUNWspro/SOS11/bin/f95
Fort integer size: 4
Fort logical size: 4
Fort logical value true: 1
Fort have integer1: yes
Fort have integer2: yes
Fort have integer4: yes
Fort have integer8: yes
Fort have integer16: no
Fort have real4: yes
Fort have real8: yes
Fort have real16: no
Fort have complex8: yes
Fort have complex16: yes
Fort have complex32: no
Fort integer1 size: 1
Fort integer2 size: 2
Fort integer4 size: 4
Fort integer8 size: 8
Fort integer16 size: -1
Fort real size: 4
Fort real4 size: 4
Fort real8 size: 8
Fort real16 size: -1
Fort dbl prec size: 4
Fort cplx size: 4
Fort dbl cplx size: 4
Fort cplx8 size: 8
Fort cplx16 size: 16
Fort cplx32 size: -1
Fort integer align: 4
Fort integer1 align: 1
Fort integer2 align: 2
Fort integer4 align: 4
Fort integer8 align: 4
Fort integer16 align: -1
Fort real align: 4
Fort real4 align: 4
Fort real8 align: 4
Fort real16 align: -1
Fort dbl prec align: 4
Fort cplx align: 4
Fort dbl cplx align: 4
Fort cplx8 align: 4
Fort cplx16 align: 4
Fort cplx32 align: -1
C profiling: yes
C++ profiling: yes
Fortran77 profiling: yes
Fortran90 profiling: yes
C++ exceptions: yes
Thread support: no
Build CFLAGS: -DNDEBUG -xtarget=opteron -xarch=sse2 -xprefetch -xprefetch_level=2 -xvector=simd -xdepend=yes -xbuiltin=%all -xO5
Build CXXFLAGS: -DNDEBUG -xtarget=opteron -xarch=sse2 -xprefetch -xprefetch_level=2 -xvector=simd -xdepend=yes -xbuiltin=%all -xO5
Build FFLAGS: -xtarget=opteron -xarch=sse2 -xprefetch -xprefetch_level=2 -xvector=simd -stackvar -xO5
Build FCFLAGS: -xtarget=opteron -xarch=sse2 -xprefetch -xprefetch_level=2 -xvector=simd -stackvar -xO5
Build LDFLAGS: -export-dynamic -R/opt/mx/lib -R/opt/SUNWhpc/HPC7.0/lib -R/opt/mx/lib/amd64 -R/opt/SUNWhpc/HPC7.0/lib/amd64 -R/opt/mx/lib -R/opt/SUNWhpc/HPC7.0/lib -R/opt/mx/lib/amd64 -R/opt/SUNWhpc/HPC7.0/lib/amd64 -R/opt/mx/lib -R/opt/SUNWhpc/HPC7.0/lib -R/opt/mx/lib/amd64 -R/opt/SUNWhpc/HPC7.0/lib/amd64
Build LIBS: -lsocket -lnsl -lrt -lm
Wrapper extra CFLAGS:
Wrapper extra CXXFLAGS:
Wrapper extra FFLAGS:
Wrapper extra FCFLAGS:
Wrapper extra LDFLAGS: -R/opt/mx/lib -R/opt/SUNWhpc/HPC7.0/lib -R/opt/mx/lib/amd64 -R/opt/SUNWhpc/HPC7.0/lib/amd64
Wrapper extra LIBS: -lsocket -lnsl -lrt -lm -ldl
Internal debug support: no
MPI parameter check: runtime
Memory profiling support: no
Memory debugging support: no
libltdl support: yes
Heterogeneous support: yes
mpirun default --prefix: yes
MCA mca: parameter "mca_param_files" (current value: "/home/tomcat/.openmpi/mca-params.conf:/opt/SUNWhpc/HPC7.0/etc/openmpi-mca-params.conf")
    Path for MCA configuration files containing default parameter values
MCA mca: parameter "mca_component_path" (current value: "/opt/SUNWhpc/HPC7.0/lib/openmpi:/home/tomcat/.openmpi/components")
    Path where to look for Open MPI and ORTE components
MCA mca: parameter "mca_verbose" (current value: <none>)
    Top-level verbosity parameter
MCA mca: parameter "mca_component_show_load_errors" (current value: "0")
    Whether to show errors for components that failed to load or not
MCA mca: parameter "mca_component_disable_dlopen" (current value: "0")
    Whether to attempt to disable opening dynamic components or not
MCA mpi: parameter "mpi_param_check" (current value: "1")
    Whether you want MPI API parameters checked at run-time or not.
    Possible values are 0 (no checking) and 1 (perform checking at run-time)
MCA mpi: parameter "mpi_yield_when_idle" (current value: "0")
    Yield the processor when waiting for MPI communication (for MPI processes, will default to 1 when oversubscribing nodes)
MCA mpi: parameter "mpi_event_tick_rate" (current value: "-1")
    How often to progress TCP communications (0 = never, otherwise specified in microseconds)
MCA mpi: parameter "mpi_show_handle_leaks" (current value: "0")
    Whether MPI_FINALIZE shows all MPI handles that were not freed or not
MCA mpi: parameter "mpi_no_free_handles" (current value: "0")
    Whether to actually free MPI objects when their handles are freed
MCA mpi: parameter "mpi_show_mca_params" (current value: "0")
    Whether to show all MCA parameter value during MPI_INIT or not (good for reproducability of MPI jobs)
MCA mpi: parameter "mpi_show_mca_params_file" (current value: <none>)
    If mpi_show_mca_params is true, setting this string to a valid filename tells Open MPI to dump all the MCA parameter values into a file suitable for reading via the mca_param_files parameter (good for reproducability of MPI jobs)
MCA mpi: parameter "mpi_paffinity_alone" (current value: "0")
    If nonzero, assume that this job is the only (set of) process(es) running on each node and bind processes to processors, starting with processor ID 0
MCA mpi: parameter "mpi_keep_peer_hostnames" (current value: "1")
    If nonzero, save the string hostnames of all MPI peer processes (mostly for error / debugging output messages). This can add quite a bit of memory usage to each MPI process.
MCA mpi: parameter "mpi_abort_delay" (current value: "0")
    If nonzero, print out an identifying message when MPI_ABORT is invoked (hostname, PID of the process that called MPI_ABORT) and delay for that many seconds before exiting (a negative delay value means to never abort).
    This allows attaching of a debugger before quitting the job.
MCA mpi: information "mpi_abort_print_stack" (value: "0")
    If nonzero, print out a stack trace when MPI_ABORT is invoked
MCA mpi: parameter "mpi_preconnect_all" (current value: "0")
    Whether to force MPI processes to create connections / warmup with *all* peers during MPI_INIT (vs. making connections lazily -- upon the first MPI traffic between each process peer pair)
MCA mpi: parameter "mpi_preconnect_oob" (current value: "0")
    Whether to force MPI processes to fully wire-up the OOB system between MPI processes.
MCA mpi: parameter "mpi_leave_pinned" (current value: "0")
    Whether to use the "leave pinned" protocol or not. Enabling this setting can help bandwidth performance when repeatedly sending and receiving large messages with the same buffers over RDMA-based networks.
MCA mpi: parameter "mpi_leave_pinned_pipeline" (current value: "0")
    Whether to use the "leave pinned pipeline" protocol or not.
MCA orte: parameter "orte_debug" (current value: "0")
    Top-level ORTE debug switch
MCA orte: parameter "orte_no_daemonize" (current value: "0")
    Whether to properly daemonize the ORTE daemons or not
MCA orte: parameter "orte_base_user_debugger" (current value: "totalview @mpirun@ -a @mpirun_args@ : fxp @mpirun@ -a @mpirun_args@")
    Sequence of user-level debuggers to search for in orterun
MCA orte: parameter "orte_abort_timeout" (current value: "10")
    Time to wait [in seconds] before giving up on aborting an ORTE operation
MCA orte: parameter "orte_timing" (current value: "0")
    Request that critical timing loops be measured
MCA opal: parameter "opal_signal" (current value: "6,10,8,11")
    If a signal is received, display the stack trace frame
MCA backtrace: parameter "backtrace" (current value: <none>)
    Default selection set of components for the backtrace framework (<none> means "use all components that can be found")
MCA backtrace: parameter "backtrace_base_verbose" (current value: "0")
    Verbosity level for the backtrace framework (0 = no verbosity)
MCA backtrace: parameter "backtrace_printstack_priority" (current value: "0")
MCA memory: parameter "memory" (current value: <none>)
    Default selection set of components for the memory framework (<none> means "use all components that can be found")
MCA memory: parameter "memory_base_verbose" (current value: "0")
    Verbosity level for the memory framework (0 = no verbosity)
MCA paffinity: parameter "paffinity" (current value: <none>)
    Default selection set of components for the paffinity framework (<none> means "use all components that can be found")
MCA paffinity: parameter "paffinity_solaris_priority" (current value: "10")
    Priority of the solaris paffinity component
MCA maffinity: parameter "maffinity" (current value: <none>)
    Default selection set of components for the maffinity framework (<none> means "use all components that can be found")
MCA maffinity: parameter "maffinity_first_use_priority" (current value: "10")
    Priority of the first_use maffinity component
MCA timer: parameter "timer" (current value: <none>)
    Default selection set of components for the timer framework (<none> means "use all components that can be found")
MCA timer: parameter "timer_base_verbose" (current value: "0")
    Verbosity level for the timer framework (0 = no verbosity)
MCA timer: parameter "timer_solaris_priority" (current value: "0")
MCA allocator: parameter "allocator" (current value: <none>)
    Default selection set of components for the allocator framework (<none> means "use all components that can be found")
MCA allocator: parameter "allocator_base_verbose" (current value: "0")
    Verbosity level for the allocator framework (0 = no verbosity)
MCA allocator: parameter "allocator_basic_priority" (current value: "0")
MCA allocator: parameter "allocator_bucket_num_buckets" (current value: "30")
MCA allocator: parameter "allocator_bucket_priority" (current value: "0")
MCA coll: parameter "coll" (current value: <none>)
    Default selection set of components for the coll framework (<none> means "use all components that can be found")
MCA coll: parameter "coll_base_verbose" (current value: "0")
    Verbosity level for the coll framework (0 = no verbosity)
MCA coll: parameter "coll_basic_priority" (current value: "10")
    Priority of the basic coll component
MCA coll: parameter "coll_basic_crossover" (current value: "4")
    Minimum number of processes in a communicator before using the logarithmic algorithms
MCA coll: parameter "coll_self_priority" (current value: "75")
MCA coll: parameter "coll_sm_priority" (current value: "0")
    Priority of the sm coll component
MCA coll: parameter "coll_sm_control_size" (current value: "4096")
    Length of the control data -- should usually be either the length of a cache line on most SMPs, or the size of a page on machines that support direct memory affinity page placement (in bytes)
MCA coll: parameter "coll_sm_bootstrap_filename" (current value: "shared_mem_sm_bootstrap")
    Filename (in the Open MPI session directory) of the coll sm component bootstrap rendezvous mmap file
MCA coll: parameter "coll_sm_bootstrap_num_segments" (current value: "8")
    Number of segments in the bootstrap file
MCA coll: parameter "coll_sm_fragment_size" (current value: "8192")
    Fragment size (in bytes) used for passing data through shared memory (will be rounded up to the nearest control_size size)
MCA coll: parameter "coll_sm_mpool" (current value: "sm")
    Name of the mpool component to use
MCA coll: parameter "coll_sm_comm_in_use_flags" (current value: "2")
    Number of "in use" flags, used to mark a message passing area segment as currently being used or not (must be >= 2 and <= comm_num_segments)
MCA coll: parameter "coll_sm_comm_num_segments" (current value: "8")
    Number of segments in each communicator's shared memory message passing area (must be >= 2, and must be a multiple of comm_in_use_flags)
MCA coll: parameter "coll_sm_tree_degree" (current value: "4")
    Degree of the tree for tree-based operations (must be => 1 and <= min(control_size, 255))
MCA coll: information "coll_sm_shared_mem_used_bootstrap" (value: "160")
    Amount of shared memory used in the shared memory bootstrap area (in bytes)
MCA coll: parameter "coll_sm_info_num_procs" (current value: "4")
    Number of processes to use for the calculation of the shared_mem_size MCA information parameter (must be => 2)
MCA coll: information "coll_sm_shared_mem_used_data" (value: "548864")
    Amount of shared memory used in the shared memory data area for info_num_procs processes (in bytes)
MCA coll: parameter "coll_tuned_priority" (current value: "30")
    Priority of the tuned coll component
MCA coll: parameter "coll_tuned_pre_allocate_memory_comm_size_limit" (current value: "32768")
    Size of communicator were we stop pre-allocating memory for the fixed internal buffer used for message requests etc that is hung off the communicator data segment. I.e. if you have a 100'000 nodes you might not want to pre-allocate 200'000 request handle slots per communicator instance!
MCA coll: parameter "coll_tuned_init_tree_fanout" (current value: "4")
    Inital fanout used in the tree topologies for each communicator. This is only an initial guess, if a tuned collective needs a different fanout for an operation, it build it dynamically. This parameter is only for the first guess and might save a little time
MCA coll: parameter "coll_tuned_init_chain_fanout" (current value: "4")
    Inital fanout used in the chain (fanout followed by pipeline) topologies for each communicator. This is only an initial guess, if a tuned collective needs a different fanout for an operation, it build it dynamically.
    This parameter is only for the first guess and might save a little time
MCA coll: parameter "coll_tuned_use_dynamic_rules" (current value: "0")
    Switch used to decide if we use static (compiled/if statements) or dynamic (built at runtime) decision function rules
MCA io: parameter "io_base_freelist_initial_size" (current value: "16")
    Initial MPI-2 IO request freelist size
MCA io: parameter "io_base_freelist_max_size" (current value: "64")
    Max size of the MPI-2 IO request freelist
MCA io: parameter "io_base_freelist_increment" (current value: "16")
    Increment size of the MPI-2 IO request freelist
MCA io: parameter "io" (current value: <none>)
    Default selection set of components for the io framework (<none> means "use all components that can be found")
MCA io: parameter "io_base_verbose" (current value: "0")
    Verbosity level for the io framework (0 = no verbosity)
MCA io: parameter "io_romio_priority" (current value: "10")
    Priority of the io romio component
MCA io: parameter "io_romio_delete_priority" (current value: "10")
    Delete priority of the io romio component
MCA io: parameter "io_romio_enable_parallel_optimizations" (current value: "0")
    Enable set of Open MPI-added options to improve collective file i/o performance
MCA mpool: parameter "mpool" (current value: <none>)
    Default selection set of components for the mpool framework (<none> means "use all components that can be found")
MCA mpool: parameter "mpool_base_verbose" (current value: "0")
    Verbosity level for the mpool framework (0 = no verbosity)
MCA mpool: parameter "mpool_sm_allocator" (current value: "bucket")
    Name of allocator component to use with sm mpool
MCA mpool: parameter "mpool_sm_max_size" (current value: "536870912")
    Maximum size of the sm mpool shared memory file
MCA mpool: parameter "mpool_sm_min_size" (current value: "134217728")
    Minimum size of the sm mpool shared memory file
MCA mpool: parameter "mpool_sm_per_peer_size" (current value: "33554432")
    Size (in bytes) to allocate per local peer in the sm mpool shared memory file, bounded by min_size and max_size
MCA mpool: parameter "mpool_sm_priority" (current value: "0")
MCA mpool: parameter "mpool_udapl_priority" (current value: "0")
MCA mpool: parameter "mpool_base_use_mem_hooks" (current value: "0")
    use memory hooks for deregistering freed memory
MCA mpool: parameter "mpool_use_mem_hooks" (current value: "0")
    (deprecated, use mpool_base_use_mem_hooks)
MCA pml: parameter "pml" (current value: <none>)
    Default selection set of components for the pml framework (<none> means "use all components that can be found")
MCA pml: parameter "pml_base_verbose" (current value: "0")
    Verbosity level for the pml framework (0 = no verbosity)
MCA pml: parameter "pml_cm_free_list_num" (current value: "4")
    Initial size of request free lists
MCA pml: parameter "pml_cm_free_list_max" (current value: "-1")
    Maximum size of request free lists
MCA pml: parameter "pml_cm_free_list_inc" (current value: "64")
    Number of elements to add when growing request free lists
MCA pml: parameter "pml_cm_priority" (current value: "30")
    CM PML selection priority
MCA pml: parameter "pml_ob1_free_list_num" (current value: "4")
MCA pml: parameter "pml_ob1_free_list_max" (current value: "-1")
MCA pml: parameter "pml_ob1_free_list_inc" (current value: "64")
MCA pml: parameter "pml_ob1_priority" (current value: "20")
MCA pml: parameter "pml_ob1_eager_limit" (current value: "131072")
MCA pml: parameter "pml_ob1_send_pipeline_depth" (current value: "3")
MCA pml: parameter "pml_ob1_recv_pipeline_depth" (current value: "4")
MCA bml: parameter "bml" (current value: <none>)
    Default selection set of components for the bml framework (<none> means "use all components that can be found")
MCA bml: parameter "bml_base_verbose" (current value: "0")
    Verbosity level for the bml framework (0 = no verbosity)
MCA bml: parameter "bml_r2_show_unreach_errors" (current value: "1")
    Show error message when procs are unreachable
MCA bml: parameter "bml_r2_priority" (current value: "0")
MCA rcache: parameter "rcache" (current value: <none>)
    Default selection set of components for the rcache framework (<none> means "use all components that can be found")
MCA rcache: parameter "rcache_base_verbose" (current value: "0")
    Verbosity level for the rcache framework (0 = no verbosity)
MCA rcache: parameter "rcache_rb_priority" (current value: "0")
MCA rcache: parameter "rcache_vma_mru_len" (current value: "256")
    The maximum size IN ENTRIES of the MRU (most recently used) rcache list
MCA rcache: parameter "rcache_vma_mru_size" (current value: "1073741824")
    The maximum size IN BYTES of the MRU (most recently used) rcache list
MCA rcache: parameter "rcache_vma_priority" (current value: "0")
MCA btl: parameter "btl_base_debug" (current value: "0")
    If btl_base_debug is 1 standard debug is output, if > 1 verbose debug is output
MCA btl: parameter "btl" (current value: <none>)
    Default selection set of components for the btl framework (<none> means "use all components that can be found")
MCA btl: parameter "btl_base_verbose" (current value: "0")
    Verbosity level for the btl framework (0 = no verbosity)
MCA btl: parameter "btl_self_free_list_num" (current value: "0")
    Number of fragments by default
MCA btl: parameter "btl_self_free_list_max" (current value: "-1")
    Maximum number of fragments
MCA btl: parameter "btl_self_free_list_inc" (current value: "32")
    Increment by this number of fragments
MCA btl: parameter "btl_self_eager_limit" (current value: "131072")
    Eager size fragmeng (before the rendez-vous ptotocol)
MCA btl: parameter "btl_self_min_send_size" (current value: "262144")
    Minimum fragment size after the rendez-vous
MCA btl: parameter "btl_self_max_send_size" (current value: "262144")
    Maximum fragment size after the rendez-vous
MCA btl: parameter "btl_self_min_rdma_size" (current value: "2147483647")
    Maximum fragment size for the RDMA transfer
MCA btl: parameter "btl_self_max_rdma_size" (current value: "2147483647")
    Maximum fragment size for the RDMA transfer
MCA btl: parameter "btl_self_exclusivity" (current value: "65536")
    Device exclusivity
MCA btl: parameter "btl_self_flags" (current value: "10")
    Active behavior flags
MCA btl: parameter "btl_self_priority" (current value: "0")
MCA btl: parameter "btl_sm_free_list_num" (current value: "8")
MCA btl: parameter "btl_sm_free_list_max" (current value: "-1")
MCA btl: parameter "btl_sm_free_list_inc" (current value: "64")
MCA btl: parameter "btl_sm_exclusivity" (current value: "65535")
MCA btl: parameter "btl_sm_latency" (current value: "100")
MCA btl: parameter "btl_sm_max_procs" (current value: "-1")
MCA btl: parameter "btl_sm_sm_extra_procs" (current value: "2")
MCA btl: parameter "btl_sm_mpool" (current value: "sm")
MCA btl: parameter "btl_sm_eager_limit" (current value: "4096")
MCA btl: parameter "btl_sm_max_frag_size" (current value: "32768")
MCA btl: parameter "btl_sm_size_of_cb_queue" (current value: "128")
MCA btl: parameter "btl_sm_cb_lazy_free_freq" (current value: "120")
MCA btl: parameter "btl_sm_priority" (current value: "0")
MCA btl: parameter "btl_tcp_if_include" (current value: <none>)
MCA btl: parameter "btl_tcp_if_exclude" (current value: "lo")
MCA btl: parameter "btl_tcp_free_list_num" (current value: "8")
MCA btl: parameter "btl_tcp_free_list_max" (current value: "-1")
MCA btl: parameter "btl_tcp_free_list_inc" (current value: "32")
MCA btl: parameter "btl_tcp_sndbuf" (current value: "131072")
MCA btl: parameter "btl_tcp_rcvbuf" (current value: "131072")
MCA btl: parameter "btl_tcp_endpoint_cache" (current value: "30720")
MCA btl: parameter "btl_tcp_exclusivity" (current value: "0")
MCA btl: parameter "btl_tcp_eager_limit" (current value: "65536")
MCA btl: parameter "btl_tcp_min_send_size" (current value: "65536")
MCA btl: parameter "btl_tcp_max_send_size" (current value: "131072")
MCA btl: parameter "btl_tcp_min_rdma_size" (current value: "131072")
MCA btl: parameter "btl_tcp_max_rdma_size" (current value: "2147483647")
MCA btl: parameter "btl_tcp_flags" (current value: "122")
MCA btl: parameter "btl_tcp_priority" (current value: "0")
MCA btl: parameter "btl_udapl_free_list_num" (current value: "8")
    Initial size of free lists (must be >= 1).
MCA btl: parameter "btl_udapl_free_list_max" (current value: "-1")
    Maximum size of free lists (-1 = infinite, otherwise must be >= 1).
MCA btl: parameter "btl_udapl_free_list_inc" (current value: "8")
    Increment size of free lists (must be >= 1).
MCA btl: parameter "btl_udapl_mpool" (current value: "udapl")
    Name of the memory pool to be used.
MCA btl: parameter "btl_udapl_max_modules" (current value: "8")
    Maximum number of supported HCAs.
MCA btl: parameter "btl_udapl_num_recvs" (current value: "8")
    Total number of receive buffers to keep posted per endpoint (must be >= 1).
MCA btl: parameter "btl_udapl_num_sends" (current value: "7")
    Maximum number of sends to post on an endpoint (must be >= 1).
MCA btl: parameter "btl_udapl_sr_win" (current value: "4")
    Window size at which point an explicit credit message will be generated (must be >= 1).
MCA btl: parameter "btl_udapl_eager_rdma_num" (current value: "32")
    Number of RDMA buffers to allocate for small messages (must be >= 1).
MCA btl: parameter "btl_udapl_max_eager_rdma_peers" (current value: "16")
    Maximum number of peers allowed to use RDMA for short messages (independently RDMA will still be used for large messages, (must be >= 0; if zero then RDMA will not be used for short messages).
MCA btl: parameter "btl_udapl_eager_rdma_win" (current value: "28")
    Window size at which point an explicit credit message will be generated (must be >= 1).
MCA btl: parameter "btl_udapl_timeout" (current value: "10000000")
    Connection timeout, in microseconds.
MCA btl: parameter "btl_udapl_conn_priv_data" (current value: "1")
    Use connect private data to establish connections (not supported by all uDAPL implementations).
MCA btl: parameter "btl_udapl_async_events" (current value: "100000000")
    The asynchronous event queue will only be checked after entering progress this number of times.
MCA btl: parameter "btl_udapl_buffer_alignment" (current value: "256")
    Preferred communication buffer alignment, in bytes (must be >= 1).
MCA btl: parameter "btl_udapl_async_evd_qlen" (current value: "256")
    The asynchronous event dispatcher queue length.
MCA btl: parameter "btl_udapl_conn_evd_qlen" (current value: "256")
    The connection event dispatcher queue length is a function of the number of connections expected.
MCA btl: parameter "btl_udapl_dto_evd_qlen" (current value: "256")
    The data transfer operation event dispatcher queue length is a function of the number of connections as well as the maximum number of outstanding data transfer operations.
MCA btl: parameter "btl_udapl_max_request_dtos" (current value: "76")
    Maximum number of outstanding submitted sends and rdma operations per endpoint, (see Section 6.6.6 of uDAPL Spec.).
MCA btl: parameter "btl_udapl_max_recv_dtos" (current value: "8")
    Maximum number of outstanding submitted receive operations per endpoint, (see Section 6.6.6 of uDAPL Spec.).
MCA btl: parameter "btl_udapl_exclusivity" (current value: "1014")
    uDAPL BTL exclusivity (must be >= 0).
MCA btl: parameter "btl_udapl_eager_limit" (current value: "8192")
    Eager send limit, in bytes (must be >= 1).
MCA btl: parameter "btl_udapl_min_send_size" (current value: "16384")
    Minimum send size, in bytes (must be >= 1).
MCA btl: parameter "btl_udapl_max_send_size" (current value: "65536")
    Maximum send size, in bytes (must be >= 1).
MCA btl: parameter "btl_udapl_min_rdma_size" (current value: "524288")
    Minimum RDMA size, in bytes (must be >= 1).
MCA btl: parameter "btl_udapl_max_rdma_size" (current value: "131072")
    Maximum RDMA size, in bytes (must be >= 1).
MCA btl: parameter "btl_udapl_flags" (current value: "2") BTL flags, added together: PUT=2 (cannot be 0).
MCA btl: parameter "btl_udapl_bandwidth" (current value: "225") Approximate maximum bandwidth of network (must be >= 1).
MCA btl: parameter "btl_udapl_priority" (current value: "0")
MCA btl: parameter "btl_base_include" (current value: <none>)
MCA btl: parameter "btl_base_exclude" (current value: <none>)
MCA btl: parameter "btl_base_warn_component_unused" (current value: "0") This parameter is used to turn on warning messages when certain NICs are not used
MCA mtl: parameter "mtl" (current value: <none>) Default selection set of components for the mtl framework (<none> means "use all components that can be found")
MCA mtl: parameter "mtl_base_verbose" (current value: "0") Verbosity level for the mtl framework (0 = no verbosity)
MCA topo: parameter "topo" (current value: <none>) Default selection set of components for the topo framework (<none> means "use all components that can be found")
MCA topo: parameter "topo_base_verbose" (current value: "0") Verbosity level for the topo framework (0 = no verbosity)
MCA osc: parameter "osc" (current value: <none>) Default selection set of components for the osc framework (<none> means "use all components that can be found")
MCA osc: parameter "osc_base_verbose" (current value: "0") Verbosity level for the osc framework (0 = no verbosity)
MCA osc: parameter "osc_pt2pt_no_locks" (current value: "0") Enable optimizations available only if MPI_LOCK is not used.
MCA osc: parameter "osc_pt2pt_eager_limit" (current value: "16384") Max size of eagerly sent data
MCA osc: parameter "osc_pt2pt_priority" (current value: "0")
MCA errmgr: parameter "errmgr" (current value: <none>) Default selection set of components for the errmgr framework (<none> means "use all components that can be found")
MCA errmgr: parameter "errmgr_hnp_debug" (current value: "0")
MCA errmgr: parameter "errmgr_hnp_priority" (current value: "0")
MCA errmgr: parameter "errmgr_orted_debug" (current value: "0")
MCA errmgr: parameter "errmgr_orted_priority" (current value: "0")
MCA errmgr: parameter "errmgr_proxy_debug" (current value: "0")
MCA errmgr: parameter "errmgr_proxy_priority" (current value: "0")
MCA gpr: parameter "gpr_base_maxsize" (current value: "2147483647")
MCA gpr: parameter "gpr_base_blocksize" (current value: "512")
MCA gpr: parameter "gpr" (current value: <none>) Default selection set of components for the gpr framework (<none> means "use all components that can be found")
MCA gpr: parameter "gpr_null_priority" (current value: "0")
MCA gpr: parameter "gpr_proxy_debug" (current value: "0")
MCA gpr: parameter "gpr_proxy_priority" (current value: "0")
MCA gpr: parameter "gpr_replica_debug" (current value: "0")
MCA gpr: parameter "gpr_replica_isolate" (current value: "0")
MCA gpr: parameter "gpr_replica_priority" (current value: "0")
MCA iof: parameter "iof_base_window_size" (current value: "4096")
MCA iof: parameter "iof_base_service" (current value: "0.0.0")
MCA iof: parameter "iof" (current value: <none>) Default selection set of components for the iof framework (<none> means "use all components that can be found")
MCA iof: parameter "iof_proxy_debug" (current value: "1")
MCA iof: parameter "iof_proxy_priority" (current value: "0")
MCA iof: parameter "iof_svc_debug" (current value: "1")
MCA iof: parameter "iof_svc_priority" (current value: "0")
MCA ns: parameter "ns" (current value: <none>) Default selection set of components for the ns framework (<none> means "use all components that can be found")
MCA ns: parameter "ns_proxy_debug" (current value: "0")
MCA ns: parameter "ns_proxy_maxsize" (current value: "2147483647")
MCA ns: parameter "ns_proxy_blocksize" (current value: "512")
MCA ns: parameter "ns_proxy_priority" (current value: "0")
MCA ns: parameter "ns_replica_debug" (current value: "0")
MCA ns: parameter "ns_replica_isolate" (current value: "0")
MCA ns: parameter "ns_replica_maxsize" (current value: "2147483647")
MCA ns: parameter "ns_replica_blocksize" (current value: "512")
MCA ns: parameter "ns_replica_priority" (current value: "0")
MCA oob: parameter "oob" (current value: <none>) Default selection set of components for the oob framework (<none> means "use all components that can be found")
MCA oob: parameter "oob_base_verbose" (current value: "0") Verbosity level for the oob framework (0 = no verbosity)
MCA oob: parameter "oob_tcp_peer_limit" (current value: "-1")
MCA oob: parameter "oob_tcp_peer_retries" (current value: "60")
MCA oob: parameter "oob_tcp_debug" (current value: "0")
MCA oob: parameter "oob_tcp_include" (current value: <none>)
MCA oob: parameter "oob_tcp_exclude" (current value: <none>)
MCA oob: parameter "oob_tcp_sndbuf" (current value: "131072")
MCA oob: parameter "oob_tcp_rcvbuf" (current value: "131072")
MCA oob: parameter "oob_tcp_connect_timeout" (current value: "600") connect() timeout in seconds, before trying next interface
MCA oob: parameter "oob_tcp_connect_sleep" (current value: "1") Enable (1) / Disable (0) random sleep for connection wireup
MCA oob: parameter "oob_tcp_listen_mode" (current value: "event") Mode for HNP to accept incoming connections: event, listen_thread
MCA oob: parameter "oob_tcp_listen_thread_max_queue" (current value: "10") High water mark for queued accepted socket list size
MCA oob: parameter "oob_tcp_listen_thread_max_time" (current value: "10") Maximum amount of time (in milliseconds) to wait between processing accepted socket list
MCA oob: parameter "oob_tcp_accept_spin_count" (current value: "10") Number of times to let accept return EWOULDBLOCK before updating accepted socket list
MCA oob: parameter "oob_tcp_priority" (current value: "0")
MCA ras: parameter "ras" (current value: <none>)
MCA ras: parameter "ras_dash_host_priority" (current value: "5") Selection priority for the dash_host RAS component
MCA ras: parameter "ras_gridengine_debug" (current value: "0") Enable debugging output for the gridengine ras component
MCA ras: parameter "ras_gridengine_priority" (current value: "100") Priority of the gridengine ras component
MCA ras: parameter "ras_gridengine_verbose" (current value: "0") Enable verbose output for the gridengine ras component
MCA ras: parameter "ras_gridengine_show_jobid" (current value: "0") Show the JOB_ID of the Grid Engine job
MCA ras: parameter "ras_localhost_priority" (current value: "0") Selection priority for the localhost RAS component
MCA ras: parameter "ras_tm_priority" (current value: "100") Priority of the tm ras component
MCA rds: parameter "rds" (current value: <none>)
MCA rds: parameter "rds_hostfile_debug" (current value: "0") Toggle debug output for hostfile RDS component
MCA rds: parameter "rds_hostfile_path" (current value: "/opt/SUNWhpc/HPC7.0/etc/openmpi-default-hostfile") ORTE Host filename
MCA rds: parameter "rds_hostfile_priority" (current value: "0")
MCA rds: parameter "rds_proxy_priority" (current value: "0")
MCA rds: parameter "rds_resfile_debug" (current value: "0") Toggle debug output for resfile RDS component
MCA rds: parameter "rds_resfile_name" (current value: <none>) ORTE Resource filename
MCA rds: parameter "rds_resfile_priority" (current value: "0")
MCA rmaps: parameter "rmaps_base_verbose" (current value: "0") Verbosity level for the rmaps framework
MCA rmaps: parameter "rmaps_base_schedule_policy" (current value: "unspec") Scheduling Policy for RMAPS. [slot | node]
MCA rmaps: parameter "rmaps_base_pernode" (current value: "0") Launch one ppn as directed
MCA rmaps: parameter "rmaps_base_n_pernode" (current value: "-1") Launch n procs/node
MCA rmaps: parameter "rmaps_base_schedule_local" (current value: "1") If nonzero, allow scheduling MPI applications on the same node as mpirun (default). If zero, do not schedule any MPI applications on the same node as mpirun
MCA rmaps: parameter "rmaps_base_no_oversubscribe" (current value: "0") If nonzero, then do not allow oversubscription of nodes - mpirun will return an error if there aren't enough nodes to launch all processes without oversubscribing
MCA rmaps: parameter "rmaps" (current value: <none>) Default selection set of components for the rmaps framework (<none> means "use all components that can be found")
MCA rmaps: parameter "rmaps_round_robin_debug" (current value: "1") Toggle debug output for Round Robin RMAPS component
MCA rmaps: parameter "rmaps_round_robin_priority" (current value: "1") Selection priority for Round Robin RMAPS component
MCA rmgr: parameter "rmgr" (current value: <none>) Default selection set of components for the rmgr framework (<none> means "use all components that can be found")
MCA rmgr: parameter "rmgr_proxy_priority" (current value: "0")
MCA rmgr: parameter "rmgr_urm_priority" (current value: "0")
MCA rml: parameter "rml" (current value: <none>) Default selection set of components for the rml framework (<none> means "use all components that can be found")
MCA rml: parameter "rml_base_verbose" (current value: "0") Verbosity level for the rml framework (0 = no verbosity)
MCA rml: parameter "rml_oob_priority" (current value: "0")
MCA pls: parameter "pls_base_reuse_daemons" (current value: "0") If nonzero, reuse daemons to launch dynamically spawned processes. If zero, do not reuse daemons (default)
MCA pls: parameter "pls" (current value: <none>) Default selection set of components for the pls framework (<none> means "use all components that can be found")
MCA pls: parameter "pls_base_verbose" (current value: "0") Verbosity level for the pls framework (0 = no verbosity)
MCA pls: parameter "pls_gridengine_debug" (current value: "0") Enable debugging of gridengine pls component
MCA pls: parameter "pls_gridengine_verbose" (current value: "0") Enable verbose output of the gridengine qrsh -inherit command
MCA pls: parameter "pls_gridengine_priority" (current value: "100") Priority of the gridengine pls component
MCA pls: parameter "pls_gridengine_orted" (current value: "orted") The command name that the gridengine pls component will invoke for the ORTE daemon
MCA pls: parameter "pls_proxy_priority" (current value: "0")
MCA pls: parameter "pls_rsh_debug" (current value: "0") Whether or not to enable debugging output for the rsh pls component (0 or 1)
MCA pls: parameter "pls_rsh_num_concurrent" (current value: "128") How many pls_rsh_agent instances to invoke concurrently (must be > 0)
MCA pls: parameter "pls_rsh_force_rsh" (current value: "0") Force the launcher to always use rsh, even for local daemons
MCA pls: parameter "pls_rsh_orted" (current value: "orted") The command name that the rsh pls component will invoke for the ORTE daemon
MCA pls: parameter "pls_rsh_priority" (current value: "10") Priority of the rsh pls component
MCA pls: parameter "pls_rsh_delay" (current value: "1") Delay (in seconds) between invocations of the remote agent, but only used when the "debug" MCA parameter is true, or the top-level MCA debugging is enabled (otherwise this value is ignored)
MCA pls: parameter "pls_rsh_reap" (current value: "1") If set to 1, wait for all the processes to complete before exiting. Otherwise, quit immediately -- without waiting for confirmation that all other processes in the job have completed.
MCA pls: parameter "pls_rsh_assume_same_shell" (current value: "1") If set to 1, assume that the shell on the remote node is the same as the shell on the local node. Otherwise, probe for what the remote shell.
MCA pls: parameter "pls_rsh_agent" (current value: "ssh : rsh") The command used to launch executables on remote nodes (typically either "ssh" or "rsh")
MCA pls: parameter "pls_tm_debug" (current value: "0") Enable debugging of the TM pls
MCA pls: parameter "pls_tm_verbose" (current value: "0") Enable verbose output of the TM pls
MCA pls: parameter "pls_tm_priority" (current value: "75") Default selection priority
MCA pls: parameter "pls_tm_orted" (current value: "orted") Command to use to start proxy orted
MCA pls: parameter "pls_tm_want_path_check" (current value: "1") Whether the launching process should check for the pls_tm_orted executable in the PATH before launching (the TM API does not give an idication of failure; this is a somewhat-lame workaround; non-zero values enable this check)
MCA sds: parameter "sds" (current value: <none>) Default selection set of components for the sds framework (<none> means "use all components that can be found")
MCA sds: parameter "sds_base_verbose" (current value: "0") Verbosity level for the sds framework (0 = no verbosity)
MCA sds: parameter "sds_env_priority" (current value: "0")
MCA sds: parameter "sds_pipe_priority" (current value: "0")
MCA sds: parameter "sds_seed_priority" (current value: "0")
MCA sds: parameter "sds_singleton_priority" (current value: "0")

_______________________________________________
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users