Adding the verbose output. Please check for the failure and advise. Thank you.

[mselvam@ipv-rhel73 examples]$ mpirun -hostfile host --mca oob_base_verbose 100 --mca btl tcp,self ring_c
[ipv-rhel73:10575] mca_base_component_repository_open: unable to open mca_plm_tm: libtorque.so.2: cannot open shared object file: No such file or directory (ignored)
[ipv-rhel73:10575] mca: base: components_register: registering framework oob components
[ipv-rhel73:10575] mca: base: components_register: found loaded component tcp
[ipv-rhel73:10575] mca: base: components_register: component tcp register function successful
[ipv-rhel73:10575] mca: base: components_open: opening oob components
[ipv-rhel73:10575] mca: base: components_open: found loaded component tcp
[ipv-rhel73:10575] mca: base: components_open: component tcp open function successful
[ipv-rhel73:10575] mca:oob:select: checking available component tcp
[ipv-rhel73:10575] mca:oob:select: Querying component [tcp]
[ipv-rhel73:10575] oob:tcp: component_available called
[ipv-rhel73:10575] WORKING INTERFACE 1 KERNEL INDEX 2 FAMILY: V6
[ipv-rhel73:10575] [[20058,0],0] oob:tcp:init adding fe80::b9b:ac5d:9cf0:b858 to our list of V6 connections
[ipv-rhel73:10575] WORKING INTERFACE 2 KERNEL INDEX 1 FAMILY: V4
[ipv-rhel73:10575] [[20058,0],0] oob:tcp:init rejecting loopback interface lo
[ipv-rhel73:10575] WORKING INTERFACE 3 KERNEL INDEX 4 FAMILY: V4
[ipv-rhel73:10575] [[20058,0],0] TCP STARTUP
[ipv-rhel73:10575] [[20058,0],0] attempting to bind to IPv4 port 0
[ipv-rhel73:10575] [[20058,0],0] assigned IPv4 port 53438
[ipv-rhel73:10575] [[20058,0],0] attempting to bind to IPv6 port 0
[ipv-rhel73:10575] [[20058,0],0] assigned IPv6 port 43370
[ipv-rhel73:10575] mca:oob:select: Adding component to end
[ipv-rhel73:10575] mca:oob:select: Found 1 active transports
[ipv-rhel73:10575] [[20058,0],0]: get transports
[ipv-rhel73:10575] [[20058,0],0]:get transports for component tcp
[ipv-rhel73:10575] mca_base_component_repository_open: unable to open mca_ras_tm: libtorque.so.2: cannot open shared object file: No such file or directory (ignored)
[ipv-rhel71a.locallab.local:12299] mca: base: components_register: registering framework oob components
[ipv-rhel71a.locallab.local:12299] mca: base: components_register: found loaded component tcp
[ipv-rhel71a.locallab.local:12299] mca: base: components_register: component tcp register function successful
[ipv-rhel71a.locallab.local:12299] mca: base: components_open: opening oob components
[ipv-rhel71a.locallab.local:12299] mca: base: components_open: found loaded component tcp
[ipv-rhel71a.locallab.local:12299] mca: base: components_open: component tcp open function successful
[ipv-rhel71a.locallab.local:12299] mca:oob:select: checking available component tcp
[ipv-rhel71a.locallab.local:12299] mca:oob:select: Querying component [tcp]
[ipv-rhel71a.locallab.local:12299] oob:tcp: component_available called
[ipv-rhel71a.locallab.local:12299] WORKING INTERFACE 1 KERNEL INDEX 2 FAMILY: V6
[ipv-rhel71a.locallab.local:12299] [[20058,0],1] oob:tcp:init adding fe80::226:b9ff:fe85:6a28 to our list of V6 connections
[ipv-rhel71a.locallab.local:12299] WORKING INTERFACE 2 KERNEL INDEX 1 FAMILY: V4
[ipv-rhel71a.locallab.local:12299] [[20058,0],1] oob:tcp:init rejecting loopback interface lo
[ipv-rhel71a.locallab.local:12299] [[20058,0],1] TCP STARTUP
[ipv-rhel71a.locallab.local:12299] [[20058,0],1] attempting to bind to IPv4 port 0
[ipv-rhel71a.locallab.local:12299] [[20058,0],1] assigned IPv4 port 50782
[ipv-rhel71a.locallab.local:12299] [[20058,0],1] attempting to bind to IPv6 port 0
[ipv-rhel71a.locallab.local:12299] [[20058,0],1] assigned IPv6 port 59268
[ipv-rhel71a.locallab.local:12299] mca:oob:select: Adding component to end
[ipv-rhel71a.locallab.local:12299] mca:oob:select: Found 1 active transports
[ipv-rhel71a.locallab.local:12299] [[20058,0],1]: get transports
[ipv-rhel71a.locallab.local:12299] [[20058,0],1]:get transports for component tcp
[ipv-rhel71a.locallab.local:12299] [[20058,0],1]: set_addr to uri 1314521088.0;tcp6://[fe80::b9b:ac5d:9cf0:b858]:43370
[ipv-rhel71a.locallab.local:12299] [[20058,0],1]:set_addr checking if peer [[20058,0],0] is reachable via component tcp
[ipv-rhel71a.locallab.local:12299] [[20058,0],1] oob:tcp: working peer [[20058,0],0] address tcp6://[fe80::b9b:ac5d:9cf0:b858]:43370
[ipv-rhel71a.locallab.local:12299] [[20058,0],1] SET_PEER ADDING PEER [[20058,0],0]
[ipv-rhel71a.locallab.local:12299] [[20058,0],1] set_peer: peer [[20058,0],0] is listening on net fe80::b9b:ac5d:9cf0:b858 port 43370
[ipv-rhel71a.locallab.local:12299] [[20058,0],1]: peer [[20058,0],0] is reachable via component tcp
[ipv-rhel71a.locallab.local:12299] [[20058,0],1] OOB_SEND: rml_oob_send.c:265
[ipv-rhel71a.locallab.local:12299] [[20058,0],1] oob:base:send to target [[20058,0],0] - attempt 0
[ipv-rhel71a.locallab.local:12299] [[20058,0],1] oob:tcp:send_nb to peer [[20058,0],0]:10 seq = -1
[ipv-rhel71a.locallab.local:12299] [[20058,0],1]:[oob_tcp.c:204] processing send to peer [[20058,0],0]:10 seq_num = -1 via [[20058,0],0]
[ipv-rhel71a.locallab.local:12299] [[20058,0],1]:[oob_tcp.c:225] queue pending to [[20058,0],0]
[ipv-rhel71a.locallab.local:12299] [[20058,0],1] tcp:send_nb: initiating connection to [[20058,0],0]
[ipv-rhel71a.locallab.local:12299] [[20058,0],1]:[oob_tcp.c:239] connect to [[20058,0],0]
[ipv-rhel71a.locallab.local:12299] [[20058,0],1] orte_tcp_peer_try_connect: attempting to connect to proc [[20058,0],0]
[ipv-rhel71a.locallab.local:12299] [[20058,0],1] orte_tcp_peer_try_connect: attempting to connect to proc [[20058,0],0] on socket 20
[ipv-rhel71a.locallab.local:12299] [[20058,0],1] orte_tcp_peer_try_connect: attempting to connect to proc [[20058,0],0] on (null):-1 - 0 retries
[ipv-rhel71a.locallab.local:12299] [[20058,0],1] orte_tcp_peer_try_connect: Connection to proc [[20058,0],0] succeeded
[ipv-rhel71a.locallab.local:12299] [[20058,0],1] SEND CONNECT ACK
[ipv-rhel71a.locallab.local:12299] [[20058,0],1] send blocking of 72 bytes to socket 20
[ipv-rhel71a.locallab.local:12299] [[20058,0],1] tcp_peer_send_blocking: send() to socket 20 failed: Broken pipe (32)
[ipv-rhel71a.locallab.local:12299] [[20058,0],1] tcp_peer_close for [[20058,0],0] sd 20 state FAILED
[ipv-rhel71a.locallab.local:12299] [[20058,0],1]:[oob_tcp_connection.c:356] connect to [[20058,0],0]
[ipv-rhel71a.locallab.local:12299] [[20058,0],1] tcp:lost connection called for peer [[20058,0],0]
[ipv-rhel71a.locallab.local:12299] [[20058,0],1] orte_tcp_peer_try_connect: attempting to connect to proc [[20058,0],0]
[ipv-rhel71a.locallab.local:12299] [[20058,0],1] orte_tcp_peer_try_connect: attempting to connect to proc [[20058,0],0] on socket 20
[ipv-rhel71a.locallab.local:12299] [[20058,0],1] orte_tcp_peer_try_connect: attempting to connect to proc [[20058,0],0] on (null):-1 - 0 retries
[ipv-rhel71a.locallab.local:12299] [[20058,0],1] orte_tcp_peer_try_connect: Connection to proc [[20058,0],0] succeeded
[ipv-rhel71a.locallab.local:12299] [[20058,0],1] SEND CONNECT ACK
[ipv-rhel71a.locallab.local:12299] [[20058,0],1] send blocking of 72 bytes to socket 20
--------------------------------------------------------------------------
ORTE was unable to reliably start one or more daemons.
This usually is caused by:

* not finding the required libraries and/or binaries on one or more nodes. Please check your PATH and LD_LIBRARY_PATH settings, or configure OMPI with --enable-orterun-prefix-by-default

* lack of authority to execute on one or more specified nodes. Please verify your allocation and authorities.

* the inability to write startup files into /tmp (--tmpdir/orte_tmpdir_base). Please check with your sys admin to determine the correct location to use.

* compilation of the orted with dynamic libraries when static are required (e.g., on Cray). Please check your configure cmd line and consider using one of the contrib/platform definitions for your system type.

* an inability to create a connection back to mpirun due to a lack of common network interfaces and/or no route found between them. Please check network connectivity (including firewalls and network routing requirements).
--------------------------------------------------------------------------
[ipv-rhel73:10575] [[20058,0],0] TCP SHUTDOWN
[ipv-rhel73:10575] [[20058,0],0] TCP SHUTDOWN done
[ipv-rhel73:10575] mca: base: close: component tcp closed
[ipv-rhel73:10575] mca: base: close: unloading component tcp

Cordially,
Muku.

On Wed, Oct 18, 2017 at 11:18 AM, Mukkie <mukunthh...@gmail.com> wrote:

> Hi,
>
> I have two ipv6 only machines, I configured/built OMPI version 3.0 with
> --enable-ipv6
>
> I want to verify a simple MPI communication call through tcp ip between
> these two machines. I am using ring_c and connectivity_c examples.
>
> Issuing from one of the host machine…
>
> [mselvam@ipv-rhel73 examples]$ mpirun -hostfile host --mca btl tcp,self
> --mca oob_base_verbose 100 ring_c
>
> .
> .
>
> [ipv-rhel71a.locallab.local:10822] [[5331,0],1] tcp_peer_send_blocking:
> send() to socket 20 failed: Broken pipe (32)
>
> where “host” contains the ipv6 address of the remote machine (namely –
> ‘ipv-rhel71a’). Also I have passwordless ssh setup to the remote machine.
>
> I will attach a verbose output in the follow-up post.
>
> Thanks.
>
> Cordially,
>
> *Mukundhan Selvam*
> Development Engineer, HPC
> MSC Software <http://www.mscsoftware.com/>
> 4675 MacArthur Court, Newport Beach, CA 92660
> 714-540-8900 ext. 4166