Hmmm…certainly looks that way. I’ll investigate.
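One quick check while I investigate: orted is launched through a plain,
non-interactive ssh, so its loader only sees whatever LD_LIBRARY_PATH that
remote shell happens to have. A quick sketch (paths taken from your output
below) to compare against your interactive shell on mic1:

$ ssh mic1 'echo $LD_LIBRARY_PATH'
$ ssh mic1 'ldd /home/ariebs/mic/mpi-nightly/bin/orted | grep libimf'

If the Intel compiler directory is missing from the first command's output,
that would explain why orted can't resolve libimf.so even though your
interactive ldd can.
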
> On Apr 14, 2015, at 6:06 AM, Andy Riebs <andy.ri...@hp.com> wrote:
>
> Hi Ralph,
>
> Still no happiness... It looks like my LD_LIBRARY_PATH just isn't getting
> propagated?
>
> $ ldd /home/ariebs/mic/mpi-nightly/bin/orted
> linux-vdso.so.1 => (0x00007fffa1d3b000)
> libopen-rte.so.0 => /home/ariebs/mic/mpi-nightly/lib/libopen-rte.so.0
> (0x00002ab6ce464000)
> libopen-pal.so.0 => /home/ariebs/mic/mpi-nightly/lib/libopen-pal.so.0
> (0x00002ab6ce7d3000)
> libm.so.6 => /lib64/libm.so.6 (0x00002ab6cebbd000)
> libdl.so.2 => /lib64/libdl.so.2 (0x00002ab6ceded000)
> librt.so.1 => /lib64/librt.so.1 (0x00002ab6ceff1000)
> libutil.so.1 => /lib64/libutil.so.1 (0x00002ab6cf1f9000)
> libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00002ab6cf3fc000)
> libpthread.so.0 => /lib64/libpthread.so.0 (0x00002ab6cf60f000)
> libc.so.6 => /lib64/libc.so.6 (0x00002ab6cf82c000)
> libimf.so =>
> /opt/intel/15.0/composer_xe_2015.2.164/compiler/lib/mic/libimf.so
> (0x00002ab6cfb84000)
> libsvml.so =>
> /opt/intel/15.0/composer_xe_2015.2.164/compiler/lib/mic/libsvml.so
> (0x00002ab6cffd6000)
> libirng.so =>
> /opt/intel/15.0/composer_xe_2015.2.164/compiler/lib/mic/libirng.so
> (0x00002ab6d086f000)
> libintlc.so.5 =>
> /opt/intel/15.0/composer_xe_2015.2.164/compiler/lib/mic/libintlc.so.5
> (0x00002ab6d0a82000)
> /lib64/ld-linux-k1om.so.2 (0x00002ab6ce243000)
>
> $ echo $LD_LIBRARY_PATH
> /opt/intel/15.0/composer_xe_2015.2.164/compiler/lib/mic:/opt/intel/15.0/composer_xe_2015.2.164/compiler/lib/intel64:/opt/intel/mic/coi/host-linux-release/lib:/opt/intel/mic/myo/lib:/opt/intel/15.0/composer_xe_2015.2.164/mpirt/lib/intel64:/opt/intel/15.0/composer_xe_2015.2.164/ipp/../compiler/lib/intel64:/opt/intel/15.0/composer_xe_2015.2.164/ipp/lib/intel64:/opt/intel/15.0/composer_xe_2015.2.164/ipp/tools/intel64/perfsys:/opt/intel/mic/coi/host-linux-release/lib:/opt/intel/mic/myo/lib:/opt/intel/15.0/composer_xe_2015.2.164/compiler/lib/intel64:/opt/intel/15.0/composer_xe_2015.2.164/mkl/lib/intel64:/opt/intel/15.0/composer_xe_2015.2.164/tbb/lib/intel64/gcc4.1:/opt/intel/15.0/composer_xe_2015.2.164/debugger/ipt/ia32/lib
>
> $ shmemrun -x SHMEM_SYMMETRIC_HEAP_SIZE=1M -H mic1 -N 2 --mca spml yoda --mca
> btl sm,self,tcp --mca plm_base_verbose 5 --mca memheap_base_verbose 100
> --leave-session-attached --mca mca_component_show_load_errors 1 $PWD/mic.out
> --------------------------------------------------------------------------
> A deprecated MCA variable value was specified in the environment or
> on the command line. Deprecated MCA variables should be avoided;
> they may disappear in future releases.
>
> Deprecated variable: mca_component_show_load_errors
> New variable: mca_base_component_show_load_errors
> --------------------------------------------------------------------------
> [atl1-02-mic0:16183] mca:base:select:( plm) Querying component [rsh]
> [atl1-02-mic0:16183] [[INVALID],INVALID] plm:rsh_lookup on agent ssh : rsh
> path NULL
> [atl1-02-mic0:16183] mca:base:select:( plm) Query of component [rsh] set
> priority to 10
> [atl1-02-mic0:16183] mca:base:select:( plm) Querying component [isolated]
> [atl1-02-mic0:16183] mca:base:select:( plm) Query of component [isolated]
> set priority to 0
> [atl1-02-mic0:16183] mca:base:select:( plm) Querying component [slurm]
> [atl1-02-mic0:16183] mca:base:select:( plm) Skipping component [slurm].
> Query failed to return a module
> [atl1-02-mic0:16183] mca:base:select:( plm) Selected component [rsh]
> [atl1-02-mic0:16183] plm:base:set_hnp_name: initial bias 16183 nodename hash
> 4238360777
> [atl1-02-mic0:16183] plm:base:set_hnp_name: final jobfam 33630
> [atl1-02-mic0:16183] [[33630,0],0] plm:rsh_setup on agent ssh : rsh path NULL
> [atl1-02-mic0:16183] [[33630,0],0] plm:base:receive start comm
> [atl1-02-mic0:16183] [[33630,0],0] plm:base:setup_job
> [atl1-02-mic0:16183] [[33630,0],0] plm:base:setup_vm
> [atl1-02-mic0:16183] [[33630,0],0] plm:base:setup_vm creating map
> [atl1-02-mic0:16183] [[33630,0],0] setup:vm: working unmanaged allocation
> [atl1-02-mic0:16183] [[33630,0],0] using dash_host
> [atl1-02-mic0:16183] [[33630,0],0] checking node mic1
> [atl1-02-mic0:16183] [[33630,0],0] plm:base:setup_vm add new daemon
> [[33630,0],1]
> [atl1-02-mic0:16183] [[33630,0],0] plm:base:setup_vm assigning new daemon
> [[33630,0],1] to node mic1
> [atl1-02-mic0:16183] [[33630,0],0] plm:rsh: launching vm
> [atl1-02-mic0:16183] [[33630,0],0] plm:rsh: local shell: 0 (bash)
> [atl1-02-mic0:16183] [[33630,0],0] plm:rsh: assuming same remote shell as
> local shell
> [atl1-02-mic0:16183] [[33630,0],0] plm:rsh: remote shell: 0 (bash)
> [atl1-02-mic0:16183] [[33630,0],0] plm:rsh: final template argv:
> /usr/bin/ssh <template>
> PATH=/home/ariebs/mic/mpi-nightly/bin:$PATH ; export PATH ;
> LD_LIBRARY_PATH=/home/ariebs/mic/mpi-nightly/lib:$LD_LIBRARY_PATH ; export
> LD_LIBRARY_PATH ;
> DYLD_LIBRARY_PATH=/home/ariebs/mic/mpi-nightly/lib:$DYLD_LIBRARY_PATH ;
> export DYLD_LIBRARY_PATH ; /home/ariebs/mic/mpi-nightly/bin/orted -mca
> orte_leave_session_attached "1" --hnp-topo-sig
> 0N:1S:0L3:61L2:61L1:61C:244H:k1om -mca ess "env" -mca orte_ess_jobid
> "2203975680" -mca orte_ess_vpid "<template>" -mca orte_ess_num_procs "2" -mca
> orte_hnp_uri
> "2203975680.0;usock;tcp://16.113.180.127,192.0.0.122:34640;ud://2883658.78.1"
> --tree-spawn --mca spml "yoda" --mca btl "sm,self,tcp" --mca plm_base_verbose
> "5" --mca memheap_base_verbose "100" --mca mca_component_show_load_errors "1"
> -mca plm "rsh" -mca rmaps_ppr_n_pernode "2"
> [atl1-02-mic0:16183] [[33630,0],0] plm:rsh:launch daemon 0 not a child of mine
> [atl1-02-mic0:16183] [[33630,0],0] plm:rsh: adding node mic1 to launch list
> [atl1-02-mic0:16183] [[33630,0],0] plm:rsh: activating launch event
> [atl1-02-mic0:16183] [[33630,0],0] plm:rsh: recording launch of daemon
> [[33630,0],1]
> [atl1-02-mic0:16183] [[33630,0],0] plm:rsh: executing: (/usr/bin/ssh)
> [/usr/bin/ssh mic1 PATH=/home/ariebs/mic/mpi-nightly/bin:$PATH ; export
> PATH ; LD_LIBRARY_PATH=/home/ariebs/mic/mpi-nightly/lib:$LD_LIBRARY_PATH
> ; export LD_LIBRARY_PATH ;
> DYLD_LIBRARY_PATH=/home/ariebs/mic/mpi-nightly/lib:$DYLD_LIBRARY_PATH ;
> export DYLD_LIBRARY_PATH ; /home/ariebs/mic/mpi-nightly/bin/orted -mca
> orte_leave_session_attached "1" --hnp-topo-sig
> 0N:1S:0L3:61L2:61L1:61C:244H:k1om -mca ess "env" -mca orte_ess_jobid
> "2203975680" -mca orte_ess_vpid 1 -mca orte_ess_num_procs "2" -mca
> orte_hnp_uri
> "2203975680.0;usock;tcp://16.113.180.127,192.0.0.122:34640;ud://2883658.78.1"
> --tree-spawn --mca spml "yoda" --mca btl "sm,self,tcp" --mca plm_base_verbose
> "5" --mca memheap_base_verbose "100" --mca mca_component_show_load_errors "1"
> -mca plm "rsh" -mca rmaps_ppr_n_pernode "2"]
> /home/ariebs/mic/mpi-nightly/bin/orted: error while loading shared libraries:
> libimf.so: cannot open shared object file: No such file or directory
> [atl1-02-mic0:16183] [[33630,0],0] daemon 1 failed with status 127
> [atl1-02-mic0:16183] [[33630,0],0] plm:base:orted_cmd sending orted_exit
> commands
> --------------------------------------------------------------------------
> ORTE was unable to reliably start one or more daemons.
> This usually is caused by:
>
> * not finding the required libraries and/or binaries on
> one or more nodes. Please check your PATH and LD_LIBRARY_PATH
> settings, or configure OMPI with --enable-orterun-prefix-by-default
>
> * lack of authority to execute on one or more specified nodes.
> Please verify your allocation and authorities.
>
> * the inability to write startup files into /tmp (--tmpdir/orte_tmpdir_base).
> Please check with your sys admin to determine the correct location to use.
>
> * compilation of the orted with dynamic libraries when static are required
> (e.g., on Cray). Please check your configure cmd line and consider using
> one of the contrib/platform definitions for your system type.
>
> * an inability to create a connection back to mpirun due to a
> lack of common network interfaces and/or no route found between
> them. Please check network connectivity (including firewalls
> and network routing requirements).
> --------------------------------------------------------------------------
> [atl1-02-mic0:16183] [[33630,0],0] plm:base:receive stop comm
>
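> Aside: note that the rsh template above only prepends
> /home/ariebs/mic/mpi-nightly/lib - the trailing $LD_LIBRARY_PATH is
> expanded by the shell on mic1, so the Intel directories have to be set
> there already. A sketch for replaying the same launch environment by hand:
>
> $ ssh mic1 'LD_LIBRARY_PATH=/home/ariebs/mic/mpi-nightly/lib:$LD_LIBRARY_PATH ldd /home/ariebs/mic/mpi-nightly/bin/orted | grep libimf'
>
> If that prints "not found", the non-interactive shell on mic1 never had
> the Intel paths to begin with.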
>
> On 04/13/2015 07:47 PM, Ralph Castain wrote:
>> Weird. I’m not sure what to try at that point - IIRC, building static won’t
>> resolve this problem (but you could try and see). You could add the
>> following to the cmd line and see if it tells us anything useful:
>>
>> --leave-session-attached --mca mca_component_show_load_errors 1
>>
>> You might also do an ldd on /home/ariebs/mic/mpi-nightly/bin/orted and see
>> where it is looking for libimf since it (and not mic.out) is the one
>> complaining
>>
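>> (If you do experiment with static, the cheapest variant may be to
>> statically link just the Intel runtime rather than all of Open MPI -
>> assuming your icc supports the -static-intel linker flag, a sketch
>> against the existing configure line:
>>
>> $ ./configure --prefix=/home/ariebs/mic/mpi-nightly CC="icc -mmic" \
>>       CXX="icpc -mmic" LDFLAGS="-static-intel" ...
>>
>> which folds libimf and friends into the binaries so the remote loader
>> never has to find them.)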
>>
>>> On Apr 13, 2015, at 1:58 PM, Andy Riebs <andy.ri...@hp.com> wrote:
>>>
>>> Ralph and Nathan,
>>>
>>> The problem may be something trivial, as I don't typically use "shmemrun"
>>> to start jobs. With the following, I *think* I've demonstrated that the
>>> problem library is where it belongs on the remote system:
>>>
>>> $ ldd mic.out
>>> linux-vdso.so.1 => (0x00007fffb83ff000)
>>> liboshmem.so.0 => /home/ariebs/mic/mpi-nightly/lib/liboshmem.so.0
>>> (0x00002b059cfbb000)
>>> libmpi.so.0 => /home/ariebs/mic/mpi-nightly/lib/libmpi.so.0
>>> (0x00002b059d35a000)
>>> libopen-rte.so.0 =>
>>> /home/ariebs/mic/mpi-nightly/lib/libopen-rte.so.0 (0x00002b059d7e3000)
>>> libopen-pal.so.0 =>
>>> /home/ariebs/mic/mpi-nightly/lib/libopen-pal.so.0 (0x00002b059db53000)
>>> libm.so.6 => /lib64/libm.so.6 (0x00002b059df3d000)
>>> libdl.so.2 => /lib64/libdl.so.2 (0x00002b059e16c000)
>>> libutil.so.1 => /lib64/libutil.so.1 (0x00002b059e371000)
>>> libgcc_s.so.1 => /lib64/libgcc_s.so.1 (0x00002b059e574000)
>>> libpthread.so.0 => /lib64/libpthread.so.0 (0x00002b059e786000)
>>> libc.so.6 => /lib64/libc.so.6 (0x00002b059e9a4000)
>>> librt.so.1 => /lib64/librt.so.1 (0x00002b059ecfc000)
>>> libimf.so =>
>>> /opt/intel/15.0/composer_xe_2015.2.164/compiler/lib/mic/libimf.so
>>> (0x00002b059ef04000)
>>> libsvml.so =>
>>> /opt/intel/15.0/composer_xe_2015.2.164/compiler/lib/mic/libsvml.so
>>> (0x00002b059f356000)
>>> libirng.so =>
>>> /opt/intel/15.0/composer_xe_2015.2.164/compiler/lib/mic/libirng.so
>>> (0x00002b059fbef000)
>>> libintlc.so.5 =>
>>> /opt/intel/15.0/composer_xe_2015.2.164/compiler/lib/mic/libintlc.so.5
>>> (0x00002b059fe02000)
>>> /lib64/ld-linux-k1om.so.2 (0x00002b059cd9a000)
>>> $ echo $LD_LIBRARY_PATH
>>> /opt/intel/15.0/composer_xe_2015.2.164/compiler/lib/mic:/opt/intel/15.0/composer_xe_2015.2.164/compiler/lib/intel64:/opt/intel/mic/coi/host-linux-release/lib:/opt/intel/mic/myo/lib:/opt/intel/15.0/composer_xe_2015.2.164/mpirt/lib/intel64:/opt/intel/15.0/composer_xe_2015.2.164/ipp/../compiler/lib/intel64:/opt/intel/15.0/composer_xe_2015.2.164/ipp/lib/intel64:/opt/intel/15.0/composer_xe_2015.2.164/ipp/tools/intel64/perfsys:/opt/intel/mic/coi/host-linux-release/lib:/opt/intel/mic/myo/lib:/opt/intel/15.0/composer_xe_2015.2.164/compiler/lib/intel64:/opt/intel/15.0/composer_xe_2015.2.164/mkl/lib/intel64:/opt/intel/15.0/composer_xe_2015.2.164/tbb/lib/intel64/gcc4.1:/opt/intel/15.0/composer_xe_2015.2.164/debugger/ipt/ia32/lib
>>> $ ssh mic1 file
>>> /opt/intel/15.0/composer_xe_2015.2.164/compiler/lib/mic/libimf.so
>>> /opt/intel/15.0/composer_xe_2015.2.164/compiler/lib/mic/libimf.so: ELF
>>> 64-bit LSB shared object, Intel Xeon Phi coprocessor (k1om), version 1
>>> (SYSV), dynamically linked, not stripped
>>> $ shmemrun -H mic1 -N 2 --mca btl scif,self $PWD/mic.out
>>> /home/ariebs/mic/mpi-nightly/bin/orted: error while loading shared
>>> libraries: libimf.so: cannot open shared object file: No such file or
>>> directory
>>> ...
>>>
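>>> Aside: file(1) over ssh confirms the library exists on mic1, but the
>>> loader only searches the paths the non-interactive ssh session
>>> actually carries. If you have root on the card, one way to take
>>> LD_LIBRARY_PATH out of the picture entirely is to register the
>>> directory with the linker cache - a sketch, assuming the card's image
>>> ships ldconfig and /etc/ld.so.conf.d:
>>>
>>> $ ssh mic1 'echo /opt/intel/15.0/composer_xe_2015.2.164/compiler/lib/mic > /etc/ld.so.conf.d/intel-mic.conf && ldconfig'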
>>>
>>> On 04/13/2015 04:25 PM, Nathan Hjelm wrote:
>>>> For talking between PHIs on the same system I recommend using the scif
>>>> BTL NOT tcp.
>>>>
>>>> That said, it looks like the LD_LIBRARY_PATH is wrong on the remote
>>>> system. It looks like it can't find the intel compiler libraries.
>>>>
>>>> -Nathan Hjelm
>>>> HPC-5, LANL
>>>>
>>>> On Mon, Apr 13, 2015 at 04:06:21PM -0400, Andy Riebs wrote:
>>>>> Progress! I can run my trivial program on the local PHI, but not on
>>>>> the other PHI on the system. Here are the interesting parts:
>>>>>
>>>>> A pretty good recipe with last night's nightly master:
>>>>>
>>>>> $ ./configure --prefix=/home/ariebs/mic/mpi-nightly CC="icc -mmic"
>>>>> CXX="icpc -mmic" \
>>>>> --build=x86_64-unknown-linux-gnu --host=x86_64-k1om-linux \
>>>>> AR=x86_64-k1om-linux-ar RANLIB=x86_64-k1om-linux-ranlib
>>>>> LD=x86_64-k1om-linux-ld \
>>>>> --enable-mpirun-prefix-by-default --disable-io-romio
>>>>> --disable-mpi-fortran \
>>>>> --enable-orterun-prefix-by-default \
>>>>> --enable-debug
>>>>> $ make && make install
>>>>> $ shmemrun -x SHMEM_SYMMETRIC_HEAP_SIZE=1M -H localhost -N 2 --mca spml
>>>>> yoda --mca btl sm,self,tcp $PWD/mic.out
>>>>> Hello World from process 0 of 2
>>>>> Hello World from process 1 of 2
>>>>> $ shmemrun -x SHMEM_SYMMETRIC_HEAP_SIZE=1M -H localhost -N 2 --mca spml
>>>>> yoda --mca btl openib,sm,self $PWD/mic.out
>>>>> Hello World from process 0 of 2
>>>>> Hello World from process 1 of 2
>>>>> $
>>>>>
>>>>> However, I can't seem to cross the fabric, even though I can ssh
>>>>> freely back and forth between mic0 and mic1. Running the next two
>>>>> tests from mic0, it certainly seems like the second one should work,
>>>>> too:
>>>>>
>>>>> $ shmemrun -x SHMEM_SYMMETRIC_HEAP_SIZE=1M -H mic0 -N 2 --mca spml yoda
>>>>> --mca btl sm,self,tcp $PWD/mic.out
>>>>> Hello World from process 0 of 2
>>>>> Hello World from process 1 of 2
>>>>> $ shmemrun -x SHMEM_SYMMETRIC_HEAP_SIZE=1M -H mic1 -N 2 --mca spml yoda
>>>>> --mca btl sm,self,tcp $PWD/mic.out
>>>>> /home/ariebs/mic/mpi-nightly/bin/orted: error while loading shared
>>>>> libraries: libimf.so: cannot open shared object file: No such file or
>>>>> directory
>>>>>
>>>>> --------------------------------------------------------------------------
>>>>> ORTE was unable to reliably start one or more daemons.
>>>>> This usually is caused by:
>>>>>
>>>>> * not finding the required libraries and/or binaries on
>>>>> one or more nodes. Please check your PATH and LD_LIBRARY_PATH
>>>>> settings, or configure OMPI with --enable-orterun-prefix-by-default
>>>>>
>>>>> * lack of authority to execute on one or more specified nodes.
>>>>> Please verify your allocation and authorities.
>>>>>
>>>>> * the inability to write startup files into /tmp
>>>>> (--tmpdir/orte_tmpdir_base).
>>>>> Please check with your sys admin to determine the correct location to
>>>>> use.
>>>>>
>>>>> * compilation of the orted with dynamic libraries when static are
>>>>> required
>>>>> (e.g., on Cray). Please check your configure cmd line and consider
>>>>> using
>>>>> one of the contrib/platform definitions for your system type.
>>>>>
>>>>> * an inability to create a connection back to mpirun due to a
>>>>> lack of common network interfaces and/or no route found between
>>>>> them. Please check network connectivity (including firewalls
>>>>> and network routing requirements).
>>>>> ...
>>>>> $
>>>>>
>>>>> (Note that I get the same results with "--mca btl openib,sm,self"....)
>>>>>
>>>>> $ ssh mic1 file
>>>>> /opt/intel/15.0/composer_xe_2015.2.164/compiler/lib/mic/libimf.so
>>>>> /opt/intel/15.0/composer_xe_2015.2.164/compiler/lib/mic/libimf.so: ELF
>>>>> 64-bit LSB shared object, Intel Xeon Phi coprocessor (k1om), version 1
>>>>> (SYSV), dynamically linked, not stripped
>>>>> $ shmemrun -x
>>>>>
>>>>> LD_PRELOAD=/opt/intel/15.0/composer_xe_2015.2.164/compiler/lib/mic/libimf.so
>>>>> -H mic1 -N 2 --mca spml yoda --mca btl sm,self,tcp $PWD/mic.out
>>>>> /home/ariebs/mic/mpi-nightly/bin/orted: error while loading shared
>>>>> libraries: libimf.so: cannot open shared object file: No such file or
>>>>> directory
>>>>>
>>>>> --------------------------------------------------------------------------
>>>>> ORTE was unable to reliably start one or more daemons.
>>>>> This usually is caused by:
>>>>>
>>>>> * not finding the required libraries and/or binaries on
>>>>> one or more nodes. Please check your PATH and LD_LIBRARY_PATH
>>>>> settings, or configure OMPI with --enable-orterun-prefix-by-default
>>>>>
>>>>> * lack of authority to execute on one or more specified nodes.
>>>>> Please verify your allocation and authorities.
>>>>>
>>>>> * the inability to write startup files into /tmp
>>>>> (--tmpdir/orte_tmpdir_base).
>>>>> Please check with your sys admin to determine the correct location to
>>>>> use.
>>>>>
>>>>> * compilation of the orted with dynamic libraries when static are
>>>>> required
>>>>> (e.g., on Cray). Please check your configure cmd line and consider
>>>>> using
>>>>> one of the contrib/platform definitions for your system type.
>>>>>
>>>>> * an inability to create a connection back to mpirun due to a
>>>>> lack of common network interfaces and/or no route found between
>>>>> them. Please check network connectivity (including firewalls
>>>>> and network routing requirements).
>>>>>
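>>>>> (One likely reason the LD_PRELOAD attempt changes nothing: -x
>>>>> forwards variables to the application processes through the daemon,
>>>>> so it never reaches orted itself. Compare:
>>>>>
>>>>> $ shmemrun -x LD_PRELOAD=...            # reaches the ranks, once orted is up
>>>>> $ /usr/bin/ssh mic1 PATH=... orted ...  # how orted starts: no LD_PRELOAD
>>>>>
>>>>> as the plm:rsh "executing" line in the verbose output below shows.)
>>>>>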
>>>>> Following here is
>>>>> - IB information
>>>>> - Running the failing case with lots of debugging information. (As you
>>>>> might imagine, I've tried 17 ways from Sunday to try to ensure that
>>>>> libimf.so is found.)
>>>>>
>>>>> $ ibv_devices
>>>>> device node GUID
>>>>> ------ ----------------
>>>>> mlx4_0 24be05ffffa57160
>>>>> scif0 4c79bafffe4402b6
>>>>> $ ibv_devinfo
>>>>> hca_id: mlx4_0
>>>>> transport: InfiniBand (0)
>>>>> fw_ver: 2.11.1250
>>>>> node_guid: 24be:05ff:ffa5:7160
>>>>> sys_image_guid: 24be:05ff:ffa5:7163
>>>>> vendor_id: 0x02c9
>>>>> vendor_part_id: 4099
>>>>> hw_ver: 0x0
>>>>> phys_port_cnt: 2
>>>>> port: 1
>>>>> state: PORT_ACTIVE (4)
>>>>> max_mtu: 2048 (4)
>>>>> active_mtu: 2048 (4)
>>>>> sm_lid: 8
>>>>> port_lid: 86
>>>>> port_lmc: 0x00
>>>>> link_layer: InfiniBand
>>>>>
>>>>> port: 2
>>>>> state: PORT_DOWN (1)
>>>>> max_mtu: 2048 (4)
>>>>> active_mtu: 2048 (4)
>>>>> sm_lid: 0
>>>>> port_lid: 0
>>>>> port_lmc: 0x00
>>>>> link_layer: InfiniBand
>>>>>
>>>>> hca_id: scif0
>>>>> transport: SCIF (2)
>>>>> fw_ver: 0.0.1
>>>>> node_guid: 4c79:baff:fe44:02b6
>>>>> sys_image_guid: 4c79:baff:fe44:02b6
>>>>> vendor_id: 0x8086
>>>>> vendor_part_id: 0
>>>>> hw_ver: 0x1
>>>>> phys_port_cnt: 1
>>>>> port: 1
>>>>> state: PORT_ACTIVE (4)
>>>>> max_mtu: 4096 (5)
>>>>> active_mtu: 4096 (5)
>>>>> sm_lid: 1
>>>>> port_lid: 1001
>>>>> port_lmc: 0x00
>>>>> link_layer: SCIF
>>>>>
>>>>> $ shmemrun -x
>>>>>
>>>>> LD_PRELOAD=/opt/intel/15.0/composer_xe_2015.2.164/compiler/lib/mic/libimf.so
>>>>> -H mic1 -N 2 --mca spml yoda --mca btl sm,self,tcp --mca
>>>>> plm_base_verbose
>>>>> 5 --mca memheap_base_verbose 100 $PWD/mic.out
>>>>> [atl1-01-mic0:191024] mca:base:select:( plm) Querying component [rsh]
>>>>> [atl1-01-mic0:191024] [[INVALID],INVALID] plm:rsh_lookup on agent ssh :
>>>>> rsh path NULL
>>>>> [atl1-01-mic0:191024] mca:base:select:( plm) Query of component [rsh]
>>>>> set
>>>>> priority to 10
>>>>> [atl1-01-mic0:191024] mca:base:select:( plm) Querying component
>>>>> [isolated]
>>>>> [atl1-01-mic0:191024] mca:base:select:( plm) Query of component
>>>>> [isolated] set priority to 0
>>>>> [atl1-01-mic0:191024] mca:base:select:( plm) Querying component
>>>>> [slurm]
>>>>> [atl1-01-mic0:191024] mca:base:select:( plm) Skipping component
>>>>> [slurm].
>>>>> Query failed to return a module
>>>>> [atl1-01-mic0:191024] mca:base:select:( plm) Selected component [rsh]
>>>>> [atl1-01-mic0:191024] plm:base:set_hnp_name: initial bias 191024
>>>>> nodename
>>>>> hash 4121194178
>>>>> [atl1-01-mic0:191024] plm:base:set_hnp_name: final jobfam 29012
>>>>> [atl1-01-mic0:191024] [[29012,0],0] plm:rsh_setup on agent ssh : rsh
>>>>> path
>>>>> NULL
>>>>> [atl1-01-mic0:191024] [[29012,0],0] plm:base:receive start comm
>>>>> [atl1-01-mic0:191024] [[29012,0],0] plm:base:setup_job
>>>>> [atl1-01-mic0:191024] [[29012,0],0] plm:base:setup_vm
>>>>> [atl1-01-mic0:191024] [[29012,0],0] plm:base:setup_vm creating map
>>>>> [atl1-01-mic0:191024] [[29012,0],0] setup:vm: working unmanaged
>>>>> allocation
>>>>> [atl1-01-mic0:191024] [[29012,0],0] using dash_host
>>>>> [atl1-01-mic0:191024] [[29012,0],0] checking node mic1
>>>>> [atl1-01-mic0:191024] [[29012,0],0] plm:base:setup_vm add new daemon
>>>>> [[29012,0],1]
>>>>> [atl1-01-mic0:191024] [[29012,0],0] plm:base:setup_vm assigning new
>>>>> daemon
>>>>> [[29012,0],1] to node mic1
>>>>> [atl1-01-mic0:191024] [[29012,0],0] plm:rsh: launching vm
>>>>> [atl1-01-mic0:191024] [[29012,0],0] plm:rsh: local shell: 0 (bash)
>>>>> [atl1-01-mic0:191024] [[29012,0],0] plm:rsh: assuming same remote
>>>>> shell as
>>>>> local shell
>>>>> [atl1-01-mic0:191024] [[29012,0],0] plm:rsh: remote shell: 0 (bash)
>>>>> [atl1-01-mic0:191024] [[29012,0],0] plm:rsh: final template argv:
>>>>> /usr/bin/ssh <template>
>>>>> PATH=/home/ariebs/mic/mpi-nightly/bin:$PATH ; export PATH ;
>>>>> LD_LIBRARY_PATH=/home/ariebs/mic/mpi-nightly/lib:$LD_LIBRARY_PATH ;
>>>>> export
>>>>> LD_LIBRARY_PATH ;
>>>>> DYLD_LIBRARY_PATH=/home/ariebs/mic/mpi-nightly/lib:$DYLD_LIBRARY_PATH ;
>>>>> export DYLD_LIBRARY_PATH ; /home/ariebs/mic/mpi-nightly/bin/orted
>>>>> --hnp-topo-sig 0N:1S:0L3:61L2:61L1:61C:244H:k1om -mca ess "env" -mca
>>>>> orte_ess_jobid "1901330432" -mca orte_ess_vpid "<template>" -mca
>>>>> orte_ess_num_procs "2" -mca orte_hnp_uri
>>>>>
>>>>> "1901330432.0;usock;tcp://16.113.180.125,192.0.0.121:34249;ud://2359370.86.1
>>>>> <tcp://16.113.180.125,192.0.0.121:34249;ud://2359370.86.1>"
>>>>> --tree-spawn --mca spml "yoda" --mca btl "sm,self,tcp" --mca
>>>>> plm_base_verbose "5" --mca memheap_base_verbose "100" -mca plm "rsh"
>>>>> -mca
>>>>> rmaps_ppr_n_pernode "2"
>>>>> [atl1-01-mic0:191024] [[29012,0],0] plm:rsh:launch daemon 0 not a
>>>>> child of
>>>>> mine
>>>>> [atl1-01-mic0:191024] [[29012,0],0] plm:rsh: adding node mic1 to launch
>>>>> list
>>>>> [atl1-01-mic0:191024] [[29012,0],0] plm:rsh: activating launch event
>>>>> [atl1-01-mic0:191024] [[29012,0],0] plm:rsh: recording launch of daemon
>>>>> [[29012,0],1]
>>>>> [atl1-01-mic0:191024] [[29012,0],0] plm:rsh: executing: (/usr/bin/ssh)
>>>>> [/usr/bin/ssh mic1 PATH=/home/ariebs/mic/mpi-nightly/bin:$PATH ;
>>>>> export PATH ;
>>>>> LD_LIBRARY_PATH=/home/ariebs/mic/mpi-nightly/lib:$LD_LIBRARY_PATH ;
>>>>> export
>>>>> LD_LIBRARY_PATH ;
>>>>> DYLD_LIBRARY_PATH=/home/ariebs/mic/mpi-nightly/lib:$DYLD_LIBRARY_PATH ;
>>>>> export DYLD_LIBRARY_PATH ; /home/ariebs/mic/mpi-nightly/bin/orted
>>>>> --hnp-topo-sig 0N:1S:0L3:61L2:61L1:61C:244H:k1om -mca ess "env" -mca
>>>>> orte_ess_jobid "1901330432" -mca orte_ess_vpid 1 -mca
>>>>> orte_ess_num_procs
>>>>> "2" -mca orte_hnp_uri
>>>>>
>>>>> "1901330432.0;usock;tcp://16.113.180.125,192.0.0.121:34249;ud://2359370.86.1
>>>>> <tcp://16.113.180.125,192.0.0.121:34249;ud://2359370.86.1>"
>>>>> --tree-spawn --mca spml "yoda" --mca btl "sm,self,tcp" --mca
>>>>> plm_base_verbose "5" --mca memheap_base_verbose "100" -mca plm "rsh"
>>>>> -mca
>>>>> rmaps_ppr_n_pernode "2"]
>>>>> /home/ariebs/mic/mpi-nightly/bin/orted: error while loading shared
>>>>> libraries: libimf.so: cannot open shared object file: No such file or
>>>>> directory
>>>>> [atl1-01-mic0:191024] [[29012,0],0] daemon 1 failed with status 127
>>>>> [atl1-01-mic0:191024] [[29012,0],0] plm:base:orted_cmd sending
>>>>> orted_exit
>>>>> commands
>>>>>
>>>>> --------------------------------------------------------------------------
>>>>> ORTE was unable to reliably start one or more daemons.
>>>>> This usually is caused by:
>>>>>
>>>>> * not finding the required libraries and/or binaries on
>>>>> one or more nodes. Please check your PATH and LD_LIBRARY_PATH
>>>>> settings, or configure OMPI with --enable-orterun-prefix-by-default
>>>>>
>>>>> * lack of authority to execute on one or more specified nodes.
>>>>> Please verify your allocation and authorities.
>>>>>
>>>>> * the inability to write startup files into /tmp
>>>>> (--tmpdir/orte_tmpdir_base).
>>>>> Please check with your sys admin to determine the correct location to
>>>>> use.
>>>>>
>>>>> * compilation of the orted with dynamic libraries when static are
>>>>> required
>>>>> (e.g., on Cray). Please check your configure cmd line and consider
>>>>> using
>>>>> one of the contrib/platform definitions for your system type.
>>>>>
>>>>> * an inability to create a connection back to mpirun due to a
>>>>> lack of common network interfaces and/or no route found between
>>>>> them. Please check network connectivity (including firewalls
>>>>> and network routing requirements).
>>>>>
>>>>> --------------------------------------------------------------------------
>>>>> [atl1-01-mic0:191024] [[29012,0],0] plm:base:receive stop comm
>>>>>
>>>>> On 04/13/2015 08:50 AM, Andy Riebs wrote:
>>>>>
>>>>> Hi Ralph,
>>>>>
>>>>> Here are the results with last night's "master" nightly,
>>>>> openmpi-dev-1487-g9c6d452.tar.bz2, and adding the
>>>>> memheap_base_verbose
>>>>> option (yes, it looks like the "ERROR_LOG" problem has gone away):
>>>>>
>>>>> $ cat /proc/sys/kernel/shmmax
>>>>> 33554432
>>>>> $ cat /proc/sys/kernel/shmall
>>>>> 2097152
>>>>> $ cat /proc/sys/kernel/shmmni
>>>>> 4096
>>>>> $ export SHMEM_SYMMETRIC_HEAP=1M
>>>>> $ shmemrun -H localhost -N 2 --mca sshmem mmap --mca
>>>>> plm_base_verbose 5
>>>>> --mca memheap_base_verbose 100 $PWD/mic.out
>>>>> [atl1-01-mic0:190439] mca:base:select:( plm) Querying component
>>>>> [rsh]
>>>>> [atl1-01-mic0:190439] [[INVALID],INVALID] plm:rsh_lookup on agent
>>>>> ssh :
>>>>> rsh path NULL
>>>>> [atl1-01-mic0:190439] mca:base:select:( plm) Query of component
>>>>> [rsh]
>>>>> set priority to 10
>>>>> [atl1-01-mic0:190439] mca:base:select:( plm) Querying component
>>>>> [isolated]
>>>>> [atl1-01-mic0:190439] mca:base:select:( plm) Query of component
>>>>> [isolated] set priority to 0
>>>>> [atl1-01-mic0:190439] mca:base:select:( plm) Querying component
>>>>> [slurm]
>>>>> [atl1-01-mic0:190439] mca:base:select:( plm) Skipping component
>>>>> [slurm]. Query failed to return a module
>>>>> [atl1-01-mic0:190439] mca:base:select:( plm) Selected component
>>>>> [rsh]
>>>>> [atl1-01-mic0:190439] plm:base:set_hnp_name: initial bias 190439
>>>>> nodename hash 4121194178
>>>>> [atl1-01-mic0:190439] plm:base:set_hnp_name: final jobfam 31875
>>>>> [atl1-01-mic0:190439] [[31875,0],0] plm:rsh_setup on agent ssh : rsh
>>>>> path NULL
>>>>> [atl1-01-mic0:190439] [[31875,0],0] plm:base:receive start comm
>>>>> [atl1-01-mic0:190439] [[31875,0],0] plm:base:setup_job
>>>>> [atl1-01-mic0:190439] [[31875,0],0] plm:base:setup_vm
>>>>> [atl1-01-mic0:190439] [[31875,0],0] plm:base:setup_vm creating map
>>>>> [atl1-01-mic0:190439] [[31875,0],0] setup:vm: working unmanaged
>>>>> allocation
>>>>> [atl1-01-mic0:190439] [[31875,0],0] using dash_host
>>>>> [atl1-01-mic0:190439] [[31875,0],0] checking node atl1-01-mic0
>>>>> [atl1-01-mic0:190439] [[31875,0],0] ignoring myself
>>>>> [atl1-01-mic0:190439] [[31875,0],0] plm:base:setup_vm only HNP in
>>>>> allocation
>>>>> [atl1-01-mic0:190439] [[31875,0],0] complete_setup on job [31875,1]
>>>>> [atl1-01-mic0:190439] [[31875,0],0] plm:base:launch_apps for job
>>>>> [31875,1]
>>>>> [atl1-01-mic0:190439] [[31875,0],0] plm:base:launch wiring up iof for
>>>>> job [31875,1]
>>>>> [atl1-01-mic0:190439] [[31875,0],0] plm:base:launch [31875,1]
>>>>> registered
>>>>> [atl1-01-mic0:190439] [[31875,0],0] plm:base:launch job [31875,1] is
>>>>> not
>>>>> a dynamic spawn
>>>>> [atl1-01-mic0:190441] mca: base: components_register: registering
>>>>> memheap components
>>>>> [atl1-01-mic0:190441] mca: base: components_register: found loaded
>>>>> component buddy
>>>>> [atl1-01-mic0:190441] mca: base: components_register: component buddy
>>>>> has no register or open function
>>>>> [atl1-01-mic0:190442] mca: base: components_register: registering
>>>>> memheap components
>>>>> [atl1-01-mic0:190442] mca: base: components_register: found loaded
>>>>> component buddy
>>>>> [atl1-01-mic0:190442] mca: base: components_register: component buddy
>>>>> has no register or open function
>>>>> [atl1-01-mic0:190442] mca: base: components_register: found loaded
>>>>> component ptmalloc
>>>>> [atl1-01-mic0:190442] mca: base: components_register: component
>>>>> ptmalloc
>>>>> has no register or open function
>>>>> [atl1-01-mic0:190441] mca: base: components_register: found loaded
>>>>> component ptmalloc
>>>>> [atl1-01-mic0:190441] mca: base: components_register: component
>>>>> ptmalloc
>>>>> has no register or open function
>>>>> [atl1-01-mic0:190441] mca: base: components_open: opening memheap
>>>>> components
>>>>> [atl1-01-mic0:190441] mca: base: components_open: found loaded
>>>>> component
>>>>> buddy
>>>>> [atl1-01-mic0:190441] mca: base: components_open: component buddy
>>>>> open
>>>>> function successful
>>>>> [atl1-01-mic0:190441] mca: base: components_open: found loaded
>>>>> component
>>>>> ptmalloc
>>>>> [atl1-01-mic0:190441] mca: base: components_open: component ptmalloc
>>>>> open function successful
>>>>> [atl1-01-mic0:190442] mca: base: components_open: opening memheap
>>>>> components
>>>>> [atl1-01-mic0:190442] mca: base: components_open: found loaded
>>>>> component
>>>>> buddy
>>>>> [atl1-01-mic0:190442] mca: base: components_open: component buddy
>>>>> open
>>>>> function successful
>>>>> [atl1-01-mic0:190442] mca: base: components_open: found loaded
>>>>> component
>>>>> ptmalloc
>>>>> [atl1-01-mic0:190442] mca: base: components_open: component ptmalloc
>>>>> open function successful
>>>>> [atl1-01-mic0:190442] base/memheap_base_alloc.c:38 -
>>>>> mca_memheap_base_alloc_init() Memheap alloc memory: 270532608
>>>>> byte(s), 1
>>>>> segments by method: 1
>>>>> [atl1-01-mic0:190441] base/memheap_base_alloc.c:38 -
>>>>> mca_memheap_base_alloc_init() Memheap alloc memory: 270532608
>>>>> byte(s), 1
>>>>> segments by method: 1
>>>>> [atl1-01-mic0:190442] base/memheap_base_static.c:205 -
>>>>> _load_segments()
>>>>> add: 00600000-00601000 rw-p 00000000 00:11
>>>>> 6029314 /home/ariebs/bench/hello/mic.out
>>>>> [atl1-01-mic0:190441] base/memheap_base_static.c:205 -
>>>>> _load_segments()
>>>>> add: 00600000-00601000 rw-p 00000000 00:11
>>>>> 6029314 /home/ariebs/bench/hello/mic.out
>>>>> [atl1-01-mic0:190442] base/memheap_base_static.c:75 -
>>>>> mca_memheap_base_static_init() Memheap static memory: 3824 byte(s), 2
>>>>> segments
>>>>> [atl1-01-mic0:190442] base/memheap_base_register.c:39 -
>>>>> mca_memheap_base_reg() register seg#00: 0x0xff000000 - 0x0x10f200000
>>>>> 270532608 bytes type=0x1 id=0xFFFFFFFF
>>>>> [atl1-01-mic0:190441] base/memheap_base_static.c:75 -
>>>>> mca_memheap_base_static_init() Memheap static memory: 3824 byte(s), 2
>>>>> segments
>>>>> [atl1-01-mic0:190441] base/memheap_base_register.c:39 -
>>>>> mca_memheap_base_reg() register seg#00: 0x0xff000000 - 0x0x10f200000
>>>>> 270532608 bytes type=0x1 id=0xFFFFFFFF
>>>>> [atl1-01-mic0:190442] Error base/memheap_base_register.c:130 -
>>>>> _reg_segment() Failed to register segment
>>>>> [atl1-01-mic0:190441] Error base/memheap_base_register.c:130 -
>>>>> _reg_segment() Failed to register segment
>>>>> [atl1-01-mic0:190442] Error: pshmem_init.c:61 - shmem_init() SHMEM
>>>>> failed to initialize - aborting
>>>>> [atl1-01-mic0:190441] Error: pshmem_init.c:61 - shmem_init() SHMEM
>>>>> failed to initialize - aborting
>>>>>
>>>>> --------------------------------------------------------------------------
>>>>> It looks like SHMEM_INIT failed for some reason; your parallel
>>>>> process
>>>>> is
>>>>> likely to abort. There are many reasons that a parallel process can
>>>>> fail during SHMEM_INIT; some of which are due to configuration or
>>>>> environment
>>>>> problems. This failure appears to be an internal failure; here's
>>>>> some
>>>>> additional information (which may only be relevant to an Open SHMEM
>>>>> developer):
>>>>>
>>>>> mca_memheap_base_select() failed
>>>>> --> Returned "Error" (-1) instead of "Success" (0)
>>>>>
>>>>> --------------------------------------------------------------------------
>>>>>
>>>>> --------------------------------------------------------------------------
>>>>> SHMEM_ABORT was invoked on rank 0 (pid 190441, host=atl1-01-mic0)
>>>>> with
>>>>> errorcode -1.
>>>>>
>>>>> --------------------------------------------------------------------------
>>>>>
>>>>> --------------------------------------------------------------------------
>>>>> A SHMEM process is aborting at a time when it cannot guarantee that
>>>>> all
>>>>> of its peer processes in the job will be killed properly. You should
>>>>> double check that everything has shut down cleanly.
>>>>>
>>>>> Local host: atl1-01-mic0
>>>>> PID: 190441
>>>>>
>>>>> --------------------------------------------------------------------------
>>>>> -------------------------------------------------------
>>>>> Primary job terminated normally, but 1 process returned
>>>>> a non-zero exit code.. Per user-direction, the job has been aborted.
>>>>> -------------------------------------------------------
>>>>> [atl1-01-mic0:190439] [[31875,0],0] plm:base:orted_cmd sending
>>>>> orted_exit commands
>>>>>
>>>>> --------------------------------------------------------------------------
>>>>> shmemrun detected that one or more processes exited with non-zero
>>>>> status, thus causing
>>>>> the job to be terminated. The first process to do so was:
>>>>>
>>>>> Process name: [[31875,1],0]
>>>>> Exit code: 255
>>>>>
>>>>> --------------------------------------------------------------------------
>>>>> [atl1-01-mic0:190439] 1 more process has sent help message
>>>>> help-shmem-runtime.txt / shmem_init:startup:internal-failure
>>>>> [atl1-01-mic0:190439] Set MCA parameter "orte_base_help_aggregate"
>>>>> to 0
>>>>> to see all help / error messages
>>>>> [atl1-01-mic0:190439] 1 more process has sent help message
>>>>> help-shmem-api.txt / shmem-abort
>>>>> [atl1-01-mic0:190439] 1 more process has sent help message
>>>>> help-shmem-runtime.txt / oshmem shmem abort:cannot guarantee all
>>>>> killed
>>>>> [atl1-01-mic0:190439] [[31875,0],0] plm:base:receive stop comm
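>>>>>
>>>>> A detail that stands out in hindsight: despite the 1M request, the
>>>>> memheap log reports allocating 270532608 bytes - roughly the 256 MB
>>>>> default - and the export above sets SHMEM_SYMMETRIC_HEAP rather than
>>>>> the SHMEM_SYMMETRIC_HEAP_SIZE spelling used in the other runs. In
>>>>> case the variable name is why the request didn't take, a sketch:
>>>>>
>>>>> $ export SHMEM_SYMMETRIC_HEAP_SIZE=1M
>>>>> $ shmemrun -x SHMEM_SYMMETRIC_HEAP_SIZE -H localhost -N 2 --mca sshmem mmap $PWD/mic.out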
>>>>>
>>>>> On 04/12/2015 03:09 PM, Ralph Castain wrote:
>>>>>
>>>>> Sorry about that - I hadn't brought it over to the 1.8 branch yet.
>>>>> I've done so now, which means the ERROR_LOG shouldn't show up any
>>>>> more. It won't fix the memheap problem, though.
>>>>> You might try adding "--mca memheap_base_verbose 100" to your cmd
>>>>> line
>>>>> so we can see why none of the memheap components are being
>>>>> selected.
>>>>>
>>>>> On Apr 12, 2015, at 11:30 AM, Andy Riebs <andy.ri...@hp.com> wrote:
>>>>> Hi Ralph,
>>>>>
>>>>> Here's the output with openmpi-v1.8.4-202-gc2da6a5.tar.bz2:
>>>>>
>>>>> $ shmemrun -H localhost -N 2 --mca sshmem mmap --mca
>>>>> plm_base_verbose 5 $PWD/mic.out
>>>>> [atl1-01-mic0:190189] mca:base:select:( plm) Querying component
>>>>> [rsh]
>>>>> [atl1-01-mic0:190189] [[INVALID],INVALID] plm:rsh_lookup on agent
>>>>> ssh : rsh path NULL
>>>>> [atl1-01-mic0:190189] mca:base:select:( plm) Query of component
>>>>> [rsh] set priority to 10
>>>>> [atl1-01-mic0:190189] mca:base:select:( plm) Querying component
>>>>> [isolated]
>>>>> [atl1-01-mic0:190189] mca:base:select:( plm) Query of component
>>>>> [isolated] set priority to 0
>>>>> [atl1-01-mic0:190189] mca:base:select:( plm) Querying component
>>>>> [slurm]
>>>>> [atl1-01-mic0:190189] mca:base:select:( plm) Skipping component
>>>>> [slurm]. Query failed to return a module
>>>>> [atl1-01-mic0:190189] mca:base:select:( plm) Selected component
>>>>> [rsh]
>>>>> [atl1-01-mic0:190189] plm:base:set_hnp_name: initial bias 190189
>>>>> nodename hash 4121194178
>>>>> [atl1-01-mic0:190189] plm:base:set_hnp_name: final jobfam 32137
>>>>> [atl1-01-mic0:190189] [[32137,0],0] plm:rsh_setup on agent ssh :
>>>>> rsh
>>>>> path NULL
>>>>> [atl1-01-mic0:190189] [[32137,0],0] plm:base:receive start comm
>>>>> [atl1-01-mic0:190189] [[32137,0],0] plm:base:setup_job
>>>>> [atl1-01-mic0:190189] [[32137,0],0] plm:base:setup_vm
>>>>> [atl1-01-mic0:190189] [[32137,0],0] plm:base:setup_vm creating
>>>>> map
>>>>> [atl1-01-mic0:190189] [[32137,0],0] setup:vm: working unmanaged
>>>>> allocation
>>>>> [atl1-01-mic0:190189] [[32137,0],0] using dash_host
>>>>> [atl1-01-mic0:190189] [[32137,0],0] checking node atl1-01-mic0
>>>>> [atl1-01-mic0:190189] [[32137,0],0] ignoring myself
>>>>> [atl1-01-mic0:190189] [[32137,0],0] plm:base:setup_vm only HNP in
>>>>> allocation
>>>>> [atl1-01-mic0:190189] [[32137,0],0] complete_setup on job
>>>>> [32137,1]
>>>>> [atl1-01-mic0:190189] [[32137,0],0] ORTE_ERROR_LOG: Not found in
>>>>> file base/plm_base_launch_support.c at line 440
>>>>> [atl1-01-mic0:190189] [[32137,0],0] plm:base:launch_apps for job
>>>>> [32137,1]
>>>>> [atl1-01-mic0:190189] [[32137,0],0] plm:base:launch wiring up iof
>>>>> for job [32137,1]
>>>>> [atl1-01-mic0:190189] [[32137,0],0] plm:base:launch [32137,1]
>>>>> registered
>>>>> [atl1-01-mic0:190189] [[32137,0],0] plm:base:launch job
>>>>> [32137,1] is
>>>>> not a dynamic spawn
>>>>>
>>>>> --------------------------------------------------------------------------
>>>>> It looks like SHMEM_INIT failed for some reason; your parallel
>>>>> process is
>>>>> likely to abort. There are many reasons that a parallel process
>>>>> can
>>>>> fail during SHMEM_INIT; some of which are due to configuration or
>>>>> environment
>>>>> problems. This failure appears to be an internal failure; here's
>>>>> some
>>>>> additional information (which may only be relevant to an Open
>>>>> SHMEM
>>>>> developer):
>>>>>
>>>>> mca_memheap_base_select() failed
>>>>> --> Returned "Error" (-1) instead of "Success" (0)
>>>>>
>>>>> --------------------------------------------------------------------------
>>>>> [atl1-01-mic0:190191] Error: pshmem_init.c:61 - shmem_init()
>>>>> SHMEM
>>>>> failed to initialize - aborting
>>>>> [atl1-01-mic0:190192] Error: pshmem_init.c:61 - shmem_init()
>>>>> SHMEM
>>>>> failed to initialize - aborting
>>>>>
>>>>> --------------------------------------------------------------------------
>>>>> SHMEM_ABORT was invoked on rank 1 (pid 190192, host=atl1-01-mic0)
>>>>> with errorcode -1.
>>>>>
>>>>> --------------------------------------------------------------------------
>>>>>
>>>>> --------------------------------------------------------------------------
>>>>> A SHMEM process is aborting at a time when it cannot guarantee
>>>>> that
>>>>> all
>>>>> of its peer processes in the job will be killed properly. You
>>>>> should
>>>>> double check that everything has shut down cleanly.
>>>>>
>>>>> Local host: atl1-01-mic0
>>>>> PID: 190192
>>>>>
>>>>> --------------------------------------------------------------------------
>>>>> -------------------------------------------------------
>>>>> Primary job terminated normally, but 1 process returned
>>>>> a non-zero exit code.. Per user-direction, the job has been
>>>>> aborted.
>>>>> -------------------------------------------------------
>>>>> [atl1-01-mic0:190189] [[32137,0],0] plm:base:orted_cmd sending
>>>>> orted_exit commands
>>>>>
>>>>> --------------------------------------------------------------------------
>>>>> shmemrun detected that one or more processes exited with non-zero
>>>>> status, thus causing
>>>>> the job to be terminated. The first process to do so was:
>>>>>
>>>>> Process name: [[32137,1],0]
>>>>> Exit code: 255
>>>>>
>>>>> --------------------------------------------------------------------------
>>>>> [atl1-01-mic0:190189] 1 more process has sent help message
>>>>> help-shmem-runtime.txt / shmem_init:startup:internal-failure
>>>>> [atl1-01-mic0:190189] Set MCA parameter
>>>>> "orte_base_help_aggregate"
>>>>> to 0 to see all help / error messages
>>>>> [atl1-01-mic0:190189] 1 more process has sent help message
>>>>> help-shmem-api.txt / shmem-abort
>>>>> [atl1-01-mic0:190189] 1 more process has sent help message
>>>>> help-shmem-runtime.txt / oshmem shmem abort:cannot guarantee all
>>>>> killed
>>>>> [atl1-01-mic0:190189] [[32137,0],0] plm:base:receive stop comm
>>>>>
>>>>> On 04/11/2015 07:41 PM, Ralph Castain wrote:
>>>>>
>>>>> Got it - thanks. I fixed that ERROR_LOG issue (I think - please
>>>>> verify). I suspect the memheap issue relates to something else,
>>>>> but I probably need to let the OSHMEM folks comment on it
>>>>>
>>>>> On Apr 11, 2015, at 9:52 AM, Andy Riebs <andy.ri...@hp.com> wrote:
>>>>> Everything is built on the Xeon side, with the icc "-mmic"
>>>>> switch. I then ssh into one of the PHIs, and run shmemrun
>>>>> from
>>>>> there.
>>>>>
>>>>> On 04/11/2015 12:00 PM, Ralph Castain wrote:
>>>>>
>>>>> Let me try to understand the setup a little better. Are you
>>>>> running shmemrun on the PHI itself? Or is it running on the
>>>>> host processor, and you are trying to spawn a process onto
>>>>> the
>>>>> Phi?
>>>>>
>>>>> On Apr 11, 2015, at 7:55 AM, Andy Riebs <andy.ri...@hp.com> wrote:
>>>>> Hi Ralph,
>>>>>
>>>>> Yes, this is attempting to get OSHMEM to run on the Phi.
>>>>>
>>>>> I grabbed openmpi-dev-1484-g033418f.tar.bz2 and
>>>>> configured
>>>>> it with
>>>>>
>>>>> $ ./configure --prefix=/home/ariebs/mic/mpi-nightly
>>>>> CC=icc -mmic CXX=icpc -mmic \
>>>>> --build=x86_64-unknown-linux-gnu
>>>>> --host=x86_64-k1om-linux \
>>>>> AR=x86_64-k1om-linux-ar
>>>>> RANLIB=x86_64-k1om-linux-ranlib LD=x86_64-k1om-linux-ld
>>>>> \
>>>>> --enable-mpirun-prefix-by-default
>>>>> --disable-io-romio --disable-mpi-fortran \
>>>>> --enable-debug
>>>>>
>>>>> --enable-mca-no-build=btl-usnic,btl-openib,common-verbs,oob-ud
>>>>>
>>>>> (Note that I had to add "oob-ud" to the
>>>>> "--enable-mca-no-build" option, as the build complained
>>>>> that
>>>>> mca oob/ud needed mca common-verbs.)
>>>>>
>>>>> With that configuration, here is what I am seeing now...
>>>>>
>>>>> $ export SHMEM_SYMMETRIC_HEAP_SIZE=1G
>>>>> $ shmemrun -H localhost -N 2 --mca sshmem mmap --mca
>>>>> plm_base_verbose 5 $PWD/mic.out
>>>>> [atl1-01-mic0:189895] mca:base:select:( plm) Querying
>>>>> component [rsh]
>>>>> [atl1-01-mic0:189895] [[INVALID],INVALID] plm:rsh_lookup
>>>>> on
>>>>> agent ssh : rsh path NULL
>>>>> [atl1-01-mic0:189895] mca:base:select:( plm) Query of
>>>>> component [rsh] set priority to 10
>>>>> [atl1-01-mic0:189895] mca:base:select:( plm) Querying
>>>>> component [isolated]
>>>>> [atl1-01-mic0:189895] mca:base:select:( plm) Query of
>>>>> component [isolated] set priority to 0
>>>>> [atl1-01-mic0:189895] mca:base:select:( plm) Querying
>>>>> component [slurm]
>>>>> [atl1-01-mic0:189895] mca:base:select:( plm) Skipping
>>>>> component [slurm]. Query failed to return a module
>>>>> [atl1-01-mic0:189895] mca:base:select:( plm) Selected
>>>>> component [rsh]
>>>>> [atl1-01-mic0:189895] plm:base:set_hnp_name: initial bias
>>>>> 189895 nodename hash 4121194178
>>>>> [atl1-01-mic0:189895] plm:base:set_hnp_name: final jobfam
>>>>> 32419
>>>>> [atl1-01-mic0:189895] [[32419,0],0] plm:rsh_setup on
>>>>> agent
>>>>> ssh : rsh path NULL
>>>>> [atl1-01-mic0:189895] [[32419,0],0] plm:base:receive
>>>>> start
>>>>> comm
>>>>> [atl1-01-mic0:189895] [[32419,0],0] plm:base:setup_job
>>>>> [atl1-01-mic0:189895] [[32419,0],0] plm:base:setup_vm
>>>>> [atl1-01-mic0:189895] [[32419,0],0] plm:base:setup_vm
>>>>> creating map
>>>>> [atl1-01-mic0:189895] [[32419,0],0] setup:vm: working
>>>>> unmanaged allocation
>>>>> [atl1-01-mic0:189895] [[32419,0],0] using dash_host
>>>>> [atl1-01-mic0:189895] [[32419,0],0] checking node
>>>>> atl1-01-mic0
>>>>> [atl1-01-mic0:189895] [[32419,0],0] ignoring myself
>>>>> [atl1-01-mic0:189895] [[32419,0],0] plm:base:setup_vm
>>>>> only
>>>>> HNP in allocation
>>>>> [atl1-01-mic0:189895] [[32419,0],0] complete_setup on job
>>>>> [32419,1]
>>>>> [atl1-01-mic0:189895] [[32419,0],0] ORTE_ERROR_LOG: Not
>>>>> found in file base/plm_base_launch_support.c at line 440
>>>>> [atl1-01-mic0:189895] [[32419,0],0] plm:base:launch_apps
>>>>> for
>>>>> job [32419,1]
>>>>> [atl1-01-mic0:189895] [[32419,0],0] plm:base:launch
>>>>> wiring
>>>>> up iof for job [32419,1]
>>>>> [atl1-01-mic0:189895] [[32419,0],0] plm:base:launch
>>>>> [32419,1] registered
>>>>> [atl1-01-mic0:189895] [[32419,0],0] plm:base:launch job
>>>>> [32419,1] is not a dynamic spawn
>>>>> [atl1-01-mic0:189899] Error: pshmem_init.c:61 -
>>>>> shmem_init()
>>>>> SHMEM failed to initialize - aborting
>>>>> [atl1-01-mic0:189898] Error: pshmem_init.c:61 -
>>>>> shmem_init()
>>>>> SHMEM failed to initialize - aborting
>>>>>
>>>>> --------------------------------------------------------------------------
>>>>> It looks like SHMEM_INIT failed for some reason; your
>>>>> parallel process is
>>>>> likely to abort. There are many reasons that a parallel
>>>>> process can
>>>>> fail during SHMEM_INIT; some of which are due to
>>>>> configuration or environment
>>>>> problems. This failure appears to be an internal
>>>>> failure;
>>>>> here's some
>>>>> additional information (which may only be relevant to an
>>>>> Open SHMEM
>>>>> developer):
>>>>>
>>>>> mca_memheap_base_select() failed
>>>>> --> Returned "Error" (-1) instead of "Success" (0)
>>>>>
>>>>> --------------------------------------------------------------------------
>>>>>
>>>>> --------------------------------------------------------------------------
>>>>> SHMEM_ABORT was invoked on rank 1 (pid 189899,
>>>>> host=atl1-01-mic0) with errorcode -1.
>>>>>
>>>>> --------------------------------------------------------------------------
>>>>>
>>>>> --------------------------------------------------------------------------
>>>>> A SHMEM process is aborting at a time when it cannot
>>>>> guarantee that all
>>>>> of its peer processes in the job will be killed
>>>>> properly.
>>>>> You should
>>>>> double check that everything has shut down cleanly.
>>>>>
>>>>> Local host: atl1-01-mic0
>>>>> PID: 189899
>>>>>
>>>>> --------------------------------------------------------------------------
>>>>> -------------------------------------------------------
>>>>> Primary job terminated normally, but 1 process returned
>>>>> a non-zero exit code.. Per user-direction, the job has
>>>>> been
>>>>> aborted.
>>>>> -------------------------------------------------------
>>>>> [atl1-01-mic0:189895] [[32419,0],0] plm:base:orted_cmd
>>>>> sending orted_exit commands
>>>>>
>>>>> --------------------------------------------------------------------------
>>>>> shmemrun detected that one or more processes exited with
>>>>> non-zero status, thus causing
>>>>> the job to be terminated. The first process to do so was:
>>>>>
>>>>> Process name: [[32419,1],1]
>>>>> Exit code: 255
>>>>>
>>>>> --------------------------------------------------------------------------
>>>>> [atl1-01-mic0:189895] 1 more process has sent help
>>>>> message
>>>>> help-shmem-runtime.txt /
>>>>> shmem_init:startup:internal-failure
>>>>> [atl1-01-mic0:189895] Set MCA parameter
>>>>> "orte_base_help_aggregate" to 0 to see all help / error
>>>>> messages
>>>>> [atl1-01-mic0:189895] 1 more process has sent help
>>>>> message
>>>>> help-shmem-api.txt / shmem-abort
>>>>> [atl1-01-mic0:189895] 1 more process has sent help
>>>>> message
>>>>> help-shmem-runtime.txt / oshmem shmem abort:cannot
>>>>> guarantee
>>>>> all killed
>>>>> [atl1-01-mic0:189895] [[32419,0],0] plm:base:receive stop
>>>>> comm
>>>>>
>>>>> On 04/10/2015 06:37 PM, Ralph Castain wrote:
>>>>>
>>>>> Andy - could you please try the current 1.8.5 nightly
>>>>> tarball and see if it helps? The error log indicates that
>>>>> it is failing to get the topology from some daemon, I'm
>>>>> assuming the one on the Phi?
>>>>> You might also add --enable-debug to that configure line
>>>>> and then put -mca plm_base_verbose on the shmemrun cmd to
>>>>> get more help
>>>>>
>>>>> On Apr 10, 2015, at 11:55 AM, Andy Riebs <andy.ri...@hp.com> wrote:
>>>>> Summary: MPI jobs work fine, SHMEM jobs work just
>>>>> often
>>>>> enough to be tantalizing, on an Intel Xeon Phi/MIC
>>>>> system.
>>>>>
>>>>> Longer version
>>>>>
>>>>> Thanks to the excellent write-up last June
>>>>>
>>>>> (<https://www.open-mpi.org/community/lists/users/2014/06/24711.php>),
>>>>> I have been able to build a version of Open MPI for
>>>>> the
>>>>> Xeon Phi coprocessor that runs MPI jobs on the Phi
>>>>> coprocessor with no problem, but not SHMEM jobs.
>>>>> Just
>>>>> at the point where I was about to document the
>>>>> problems
>>>>> I was having with SHMEM, my trivial SHMEM job worked.
>>>>> And then failed when I tried to run it again,
>>>>> immediately afterwards. I have a feeling I may be in
>>>>> uncharted territory here.
>>>>>
>>>>> Environment
>>>>> * RHEL 6.5
>>>>> * Intel Composer XE 2015
>>>>> * Xeon Phi/MIC
>>>>> ----------------
>>>>>
>>>>> Configuration
>>>>>
>>>>> $ export PATH=/usr/linux-k1om-4.7/bin/:$PATH
>>>>> $ source
>>>>> /opt/intel/15.0/composer_xe_2015/bin/compilervars.sh
>>>>> intel64
>>>>> $ ./configure --prefix=/home/ariebs/mic/mpi \
>>>>> CC="icc -mmic" CXX="icpc -mmic" \
>>>>> --build=x86_64-unknown-linux-gnu
>>>>> --host=x86_64-k1om-linux \
>>>>> AR=x86_64-k1om-linux-ar
>>>>> RANLIB=x86_64-k1om-linux-ranlib \
>>>>> LD=x86_64-k1om-linux-ld \
>>>>> --enable-mpirun-prefix-by-default
>>>>> --disable-io-romio
>>>>> \
>>>>> --disable-vt --disable-mpi-fortran \
>>>>>
>>>>>
>>>>> --enable-mca-no-build=btl-usnic,btl-openib,common-verbs
>>>>> $ make
>>>>> $ make install
>>>>>
>>>>> ----------------
>>>>>
>>>>> Test program
>>>>>
>>>>> #include <stdio.h>
>>>>> #include <stdlib.h>
>>>>> #include <shmem.h>
>>>>> int main(int argc, char **argv)
>>>>> {
>>>>> int me, num_pe;
>>>>> shmem_init();
>>>>> num_pe = num_pes();
>>>>> me = my_pe();
>>>>>             printf("Hello World from process %d of %d\n",
>>>>>                    me, num_pe);
>>>>> exit(0);
>>>>> }
>>>>>
>>>>> ----------------
>>>>>
>>>>> Building the program
>>>>>
>>>>> export PATH=/home/ariebs/mic/mpi/bin:$PATH
>>>>> export PATH=/usr/linux-k1om-4.7/bin/:$PATH
>>>>> source
>>>>> /opt/intel/15.0/composer_xe_2015/bin/compilervars.sh
>>>>> intel64
>>>>> export
>>>>>
>>>>> LD_LIBRARY_PATH=/opt/intel/15.0/composer_xe_2015.2.164/compiler/lib/mic:$LD_LIBRARY_PATH
>>>>>
>>>>> icc -mmic -std=gnu99 -I/home/ariebs/mic/mpi/include
>>>>> -pthread \
>>>>> -Wl,-rpath -Wl,/home/ariebs/mic/mpi/lib
>>>>> -Wl,--enable-new-dtags \
>>>>> -L/home/ariebs/mic/mpi/lib -loshmem -lmpi
>>>>> -lopen-rte -lopen-pal \
>>>>> -lm -ldl -lutil \
>>>>> -Wl,-rpath
>>>>>
>>>>> -Wl,/opt/intel/15.0/composer_xe_2015.2.164/compiler/lib/mic
>>>>> \
>>>>>
>>>>>
>>>>> -L/opt/intel/15.0/composer_xe_2015.2.164/compiler/lib/mic
>>>>> \
>>>>> -o mic.out shmem_hello.c
>>>>>
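>>>>> Worth noting: the link line above embeds -Wl,-rpath for the Intel
>>>>> mic lib directory, which is why mic.out can resolve libimf.so
>>>>> without any LD_LIBRARY_PATH help, while anything built without that
>>>>> rpath (orted, for instance) depends entirely on the environment. A
>>>>> sketch of carrying the same rpath into the Open MPI build itself:
>>>>>
>>>>> $ ./configure ... \
>>>>>     LDFLAGS="-Wl,-rpath,/opt/intel/15.0/composer_xe_2015.2.164/compiler/lib/mic"
>>>>>
>>>>> so every installed binary, orted included, records where to find the
>>>>> Intel runtime.
>>>>>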
>>>>> ----------------
>>>>>
>>>>> Running the program
>>>>>
>>>>> (Note that the program had been consistently failing.
>>>>> Then, when I logged back into the system to capture
>>>>> the
>>>>> results, it worked once, and then immediately failed
>>>>> when I tried again, as shown below. Logging in and
>>>>> out
>>>>> isn't sufficient to correct the problem. Overall, I
>>>>> think I had 3 successful runs in 30-40 attempts.)
>>>>>
>>>>> $ shmemrun -H localhost -N 2 --mca sshmem mmap
>>>>> ./mic.out
>>>>> [atl1-01-mic0:189372] [[30936,0],0] ORTE_ERROR_LOG:
>>>>> Not
>>>>> found in file base/plm_base_launch_support.c at line
>>>>> 426
>>>>> Hello World from process 0 of 2
>>>>> Hello World from process 1 of 2
>>>>> $ shmemrun -H localhost -N 2 --mca sshmem mmap
>>>>> ./mic.out
>>>>> [atl1-01-mic0:189381] [[30881,0],0] ORTE_ERROR_LOG:
>>>>> Not
>>>>> found in file base/plm_base_launch_support.c at line
>>>>> 426
>>>>> [atl1-01-mic0:189383] Error: pshmem_init.c:61 -
>>>>> shmem_init() SHMEM failed to initialize - aborting
>>>>>
>>>>> --------------------------------------------------------------------------
>>>>> It looks like SHMEM_INIT failed for some reason; your
>>>>> parallel process is
>>>>> likely to abort. There are many reasons that a
>>>>> parallel
>>>>> process can
>>>>> fail during SHMEM_INIT; some of which are due to
>>>>> configuration or environment
>>>>> problems. This failure appears to be an internal
>>>>> failure; here's some
>>>>> additional information (which may only be relevant
>>>>> to an
>>>>> Open SHMEM
>>>>> developer):
>>>>>
>>>>> mca_memheap_base_select() failed
>>>>> --> Returned "Error" (-1) instead of "Success" (0)
>>>>>
>>>>> --------------------------------------------------------------------------
>>>>>
>>>>> --------------------------------------------------------------------------
>>>>> SHMEM_ABORT was invoked on rank 0 (pid 189383,
>>>>> host=atl1-01-mic0) with errorcode -1.
>>>>>
>>>>> --------------------------------------------------------------------------
>>>>>
>>>>> --------------------------------------------------------------------------
>>>>> A SHMEM process is aborting at a time when it cannot
>>>>> guarantee that all
>>>>> of its peer processes in the job will be killed
>>>>> properly. You should
>>>>> double check that everything has shut down cleanly.
>>>>>
>>>>> Local host: atl1-01-mic0
>>>>> PID: 189383
>>>>>
>>>>> --------------------------------------------------------------------------
>>>>>
>>>>> -------------------------------------------------------
>>>>> Primary job terminated normally, but 1 process
>>>>> returned
>>>>> a non-zero exit code.. Per user-direction, the job
>>>>> has
>>>>> been aborted.
>>>>>
>>>>> -------------------------------------------------------
>>>>>
>>>>> --------------------------------------------------------------------------
>>>>> shmemrun detected that one or more processes exited
>>>>> with
>>>>> non-zero status, thus causing
>>>>> the job to be terminated. The first process to do so
>>>>> was:
>>>>>
>>>>> Process name: [[30881,1],0]
>>>>> Exit code: 255
>>>>>
>>>>> --------------------------------------------------------------------------
>>>>>
>>>>> Any thoughts about where to go from here?
>>>>>
>>>>> Andy
>>>>>
>>>>> --
>>>>> Andy Riebs
>>>>> Hewlett-Packard Company
>>>>> High Performance Computing
>>>>> +1 404 648 9024
>>>>> My opinions are not necessarily those of HP
>>>>>
>>>>
>>>>
>>
>>
> _______________________________________________
> users mailing list
> us...@open-mpi.org
> Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> Link to this post:
> http://www.open-mpi.org/community/lists/users/2015/04/26716.php