List-Post: users@lists.open-mpi.org
Date: Fri, 31 Oct 2008 09:34:52 -0600
From: Ralph Castain <r...@lanl.gov>
Subject: Re: [OMPI users] users Digest, Vol 1052, Issue 1
To: Open MPI Users <us...@open-mpi.org>
Message-ID: <0cf28492-b13e-4f82-ac43-c1580f079...@lanl.gov>
Content-Type: text/plain; charset="us-ascii"; Format="flowed";
        DelSp="yes"

It looks like the daemon isn't seeing the other interface address on host x2. Can you ssh to x2 and send the contents of ifconfig -a?

Ralph

On Oct 31, 2008, at 9:18 AM, Allan Menezes wrote:


users-requ...@open-mpi.org wrote:

Message: 1
Date: Fri, 31 Oct 2008 02:06:09 -0400
From: Allan Menezes <amenezes...@sympatico.ca>
Subject: [OMPI users] Openmpi ver1.3beta1
To: us...@open-mpi.org
Message-ID: <blu0-smtp224b5e356302ac7aa4481088...@phx.gbl>
Content-Type: text/plain; charset=ISO-8859-1; format=flowed

Hi,
I built Open MPI version 1.3b1 with the following configure command:
./configure --prefix=/opt/openmpi13b1 --enable-mpi-threads --with-threads=posix --disable-ipv6
I have six nodes, x1 through x6.
I distributed /opt/openmpi13b1 with scp from the head node to all the other nodes.
When I run the following command:
mpirun --prefix /opt/openmpi13b1 --host x1 hostname
it works on x1 and prints the hostname of x1.
But when I type
mpirun --prefix /opt/openmpi13b1 --host x2 hostname
it hangs and does not give me any output.
I have a 6-node Intel quad-core cluster with OSCAR and PCI Express
gigabit Ethernet for eth0.
Can somebody advise?
Thank you very much.
Allan Menezes


------------------------------

Message: 2
Date: Fri, 31 Oct 2008 02:41:59 -0600
From: Ralph Castain <r...@lanl.gov>
Subject: Re: [OMPI users] Openmpi ver1.3beta1
To: Open MPI Users <us...@open-mpi.org>
Message-ID: <e8af5aaf-99cb-4efc-aa97-5385ce333...@lanl.gov>
Content-Type: text/plain; charset=US-ASCII; format=flowed; delsp=yes

When you typed the --host x1 command, were you sitting on x1?
Likewise, when you typed the --host x2 command, were you not on host x2?

If the answer to both questions is "yes", then my guess is that
something is preventing you from launching a daemon on host x2. Try
adding --leave-session-attached to your command line and see if any error
messages appear. Also check the FAQ for tips on how to set up for ssh
launch (I'm assuming that is what you are using).

http://www.open-mpi.org/faq/?category=rsh

Ralph

On Oct 31, 2008, at 12:06 AM, Allan Menezes wrote:


Hi Ralph,
Yes, that is true. I ran both commands from x1, and version 1.2.8 works on the same setup without a problem.
Here is the output with --leave-session-attached added:
[allan@x1 ~]$ mpiexec --prefix /opt/openmpi13b2 --leave-session-attached -host x2 hostname
[x2.brampton.net:02236] [[1354,0],1]-[[1354,0],0] mca_oob_tcp_peer_try_connect: connect to 192.168.0.198:0 failed: Network is unreachable (101)
[x2.brampton.net:02236] [[1354,0],1]-[[1354,0],0] mca_oob_tcp_peer_try_connect: connect to 192.168.122.1:0 failed: Network is unreachable (101)
[x2.brampton.net:02236] [[1354,0],1] routed:binomial: Connection to lifeline [[1354,0],0] lost
--------------------------------------------------------------------------
A daemon (pid 7665) died unexpectedly with status 1 while attempting
to launch so we are aborting.

There may be more information reported by the environment (see above).

This may be because the daemon was unable to find all the needed shared
libraries on the remote node. You may set your LD_LIBRARY_PATH to have the
location of the shared libraries on the remote nodes and this will
automatically be forwarded to the remote nodes.
--------------------------------------------------------------------------
--------------------------------------------------------------------------
mpiexec noticed that the job aborted, but has no info as to the process
that caused that situation.
--------------------------------------------------------------------------
mpiexec: clean termination accomplished

[allan@x1 ~]$
However, my main eth0 IP is 192.168.1.1 and the Internet gateway is 192.168.0.1.
Any solutions?
Allan Menezes
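
The two failed connection attempts in the output above name the addresses that the daemon on x2 tried to reach on the head node. As a minimal sketch, the Python below pulls those addresses out of the log lines. The regular expression and the helper name `failed_targets` are illustrative assumptions written for this message format only; they are not part of Open MPI.

```python
import re

# Log lines copied from the mpiexec output in this message.
LOG = """\
[x2.brampton.net:02236] [[1354,0],1]-[[1354,0],0] mca_oob_tcp_peer_try_connect: connect to 192.168.0.198:0 failed: Network is unreachable (101)
[x2.brampton.net:02236] [[1354,0],1]-[[1354,0],0] mca_oob_tcp_peer_try_connect: connect to 192.168.122.1:0 failed: Network is unreachable (101)
[x2.brampton.net:02236] [[1354,0],1] routed:binomial: Connection to lifeline [[1354,0],0] lost
"""

# Hypothetical pattern: IPv4 address, port, error text, errno in parentheses.
PATTERN = re.compile(r"connect to (\d+\.\d+\.\d+\.\d+):\d+ failed: (.+?) \((\d+)\)")


def failed_targets(log):
    """Return a list of (address, reason, errno) for each failed connect line."""
    return [(m.group(1), m.group(2), int(m.group(3)))
            for m in PATTERN.finditer(log)]


for addr, reason, err in failed_targets(LOG):
    print(addr, reason, err)
```

Both addresses it reports can then be looked up in the head node's interface list to see which interfaces the daemon was told to call back on.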




Hi Ralph,
It works for Open MPI version 1.2.8, so why should it not work for version 1.3?
Yes, I can ssh to x2 from x1 and to x1 from x2.
Here is the ifconfig -a output for x1:
eth0     Link encap:Ethernet  HWaddr 00:1B:21:02:89:DA
         inet addr:192.168.1.1  Bcast:192.168.1.255  Mask:255.255.255.0
         inet6 addr: fe80::21b:21ff:fe02:89da/64 Scope:Link
         UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
         RX packets:44906 errors:0 dropped:0 overruns:0 frame:0
         TX packets:77644 errors:0 dropped:0 overruns:0 carrier:0
         collisions:0 txqueuelen:1000
         RX bytes:3309896 (3.1 MiB)  TX bytes:101134505 (96.4 MiB)
         Memory:feae0000-feb00000

eth1     Link encap:Ethernet  HWaddr 00:0E:0C:BC:AB:6D
         inet addr:192.168.3.1  Bcast:192.168.3.255  Mask:255.255.255.0
         inet6 addr: fe80::20e:cff:febc:ab6d/64 Scope:Link
         UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
         RX packets:124 errors:0 dropped:0 overruns:0 frame:0
         TX packets:133 errors:0 dropped:0 overruns:0 carrier:0
         collisions:0 txqueuelen:1000
         RX bytes:7440 (7.2 KiB)  TX bytes:10027 (9.7 KiB)

eth2     Link encap:Ethernet  HWaddr 00:1B:FC:A0:A7:92
         inet addr:192.168.7.1  Bcast:192.168.7.255  Mask:255.255.255.0
         inet6 addr: fe80::21b:fcff:fea0:a792/64 Scope:Link
         UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
         RX packets:159 errors:0 dropped:0 overruns:0 frame:0
         TX packets:158 errors:0 dropped:0 overruns:0 carrier:0
         collisions:0 txqueuelen:1000
         RX bytes:10902 (10.6 KiB)  TX bytes:13691 (13.3 KiB)
         Interrupt:17

eth4     Link encap:Ethernet  HWaddr 00:0E:0C:B9:50:A3
         inet addr:192.168.0.198  Bcast:192.168.0.255  Mask:255.255.255.0
         inet6 addr: fe80::20e:cff:feb9:50a3/64 Scope:Link
         UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
         RX packets:25111 errors:0 dropped:0 overruns:0 frame:0
         TX packets:11633 errors:0 dropped:0 overruns:0 carrier:0
         collisions:0 txqueuelen:1000
         RX bytes:24133775 (23.0 MiB)  TX bytes:833868 (814.3 KiB)

lo       Link encap:Local Loopback
         inet addr:127.0.0.1  Mask:255.0.0.0
         inet6 addr: ::1/128 Scope:Host
         UP LOOPBACK RUNNING  MTU:16436  Metric:1
         RX packets:28973 errors:0 dropped:0 overruns:0 frame:0
         TX packets:28973 errors:0 dropped:0 overruns:0 carrier:0
         collisions:0 txqueuelen:0
         RX bytes:1223211 (1.1 MiB)  TX bytes:1223211 (1.1 MiB)

pan0     Link encap:Ethernet  HWaddr CA:00:CE:02:90:90
         BROADCAST MULTICAST  MTU:1500  Metric:1
         RX packets:0 errors:0 dropped:0 overruns:0 frame:0
         TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
         collisions:0 txqueuelen:0
         RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)

sit0     Link encap:IPv6-in-IPv4
         NOARP  MTU:1480  Metric:1
         RX packets:0 errors:0 dropped:0 overruns:0 frame:0
         TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
         collisions:0 txqueuelen:0
         RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)

virbr0   Link encap:Ethernet  HWaddr EA:6D:E7:85:8D:E7
         inet addr:192.168.122.1  Bcast:192.168.122.255  Mask:255.255.255.0
         inet6 addr: fe80::e86d:e7ff:fe85:8de7/64 Scope:Link
         UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
         RX packets:0 errors:0 dropped:0 overruns:0 frame:0
         TX packets:30 errors:0 dropped:0 overruns:0 carrier:0
         collisions:0 txqueuelen:0
         RX bytes:0 (0.0 b)  TX bytes:5083 (4.9 KiB)

Here is the ifconfig -a for x2:
eth0     Link encap:Ethernet  HWaddr 00:1B:21:02:DE:E9
         inet addr:192.168.1.2  Bcast:192.168.1.255  Mask:255.255.255.0
         inet6 addr: fe80::21b:21ff:fe02:dee9/64 Scope:Link
         UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
         RX packets:565 errors:0 dropped:0 overruns:0 frame:0
         TX packets:565 errors:0 dropped:0 overruns:0 carrier:0
         collisions:0 txqueuelen:1000
         RX bytes:181079 (176.8 KiB)  TX bytes:106650 (104.1 KiB)
         Memory:feae0000-feb00000

eth1     Link encap:Ethernet  HWaddr 00:0E:0C:BC:B1:7D
         inet addr:192.168.3.2  Bcast:192.168.3.255  Mask:255.255.255.0
         inet6 addr: fe80::20e:cff:febc:b17d/64 Scope:Link
         UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
         RX packets:11 errors:0 dropped:0 overruns:0 frame:0
         TX packets:20 errors:0 dropped:0 overruns:0 carrier:0
         collisions:0 txqueuelen:1000
         RX bytes:660 (660.0 b)  TX bytes:1136 (1.1 KiB)

eth2     Link encap:Ethernet  HWaddr 00:1F:C6:27:1C:79
         inet addr:192.168.7.2  Bcast:192.168.7.255  Mask:255.255.255.0
         inet6 addr: fe80::21f:c6ff:fe27:1c79/64 Scope:Link
         UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
         RX packets:11 errors:0 dropped:0 overruns:0 frame:0
         TX packets:19 errors:0 dropped:0 overruns:0 carrier:0
         collisions:0 txqueuelen:1000
         RX bytes:506 (506.0 b)  TX bytes:1094 (1.0 KiB)
         Interrupt:17

lo       Link encap:Local Loopback
         inet addr:127.0.0.1  Mask:255.0.0.0
         inet6 addr: ::1/128 Scope:Host
         UP LOOPBACK RUNNING  MTU:16436  Metric:1
         RX packets:1604 errors:0 dropped:0 overruns:0 frame:0
         TX packets:1604 errors:0 dropped:0 overruns:0 carrier:0
         collisions:0 txqueuelen:0
         RX bytes:140216 (136.9 KiB)  TX bytes:140216 (136.9 KiB)

sit0     Link encap:IPv6-in-IPv4
         NOARP  MTU:1480  Metric:1
         RX packets:0 errors:0 dropped:0 overruns:0 frame:0
         TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
         collisions:0 txqueuelen:0
         RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)

Any help would be appreciated!
Allan Menezes
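
One way to read the two ifconfig listings against the earlier "Network is unreachable" errors is to compare the head node's IPv4 subnets with those of x2. The rough Python sketch below uses the standard `ipaddress` module with addresses and netmasks copied from the listings in this message. The dictionary layout and the helper name `split_by_shared_subnet` are illustrative assumptions, and the check only looks for a directly shared subnet; it ignores any routing that might exist between the networks.

```python
import ipaddress

# interface -> "address/netmask", copied from the ifconfig output (loopback omitted)
X1 = {
    "eth0": "192.168.1.1/255.255.255.0",
    "eth1": "192.168.3.1/255.255.255.0",
    "eth2": "192.168.7.1/255.255.255.0",
    "eth4": "192.168.0.198/255.255.255.0",
    "virbr0": "192.168.122.1/255.255.255.0",
}
X2 = {
    "eth0": "192.168.1.2/255.255.255.0",
    "eth1": "192.168.3.2/255.255.255.0",
    "eth2": "192.168.7.2/255.255.255.0",
}


def networks(ifaces):
    """Return the set of IPv4 networks a host is directly attached to."""
    return {ipaddress.ip_interface(v).network for v in ifaces.values()}


def split_by_shared_subnet(local, remote):
    """Split remote interfaces by whether local has an interface on the same subnet."""
    local_nets = networks(local)
    shared, unshared = {}, {}
    for name, value in remote.items():
        iface = ipaddress.ip_interface(value)
        target = shared if iface.network in local_nets else unshared
        target[name] = str(iface.ip)
    return shared, unshared


# Which x1 addresses sit on a subnet that x2 is also attached to?
shared, unshared = split_by_shared_subnet(X2, X1)
print("on a subnet x2 shares:", shared)
print("on no subnet of x2:   ", unshared)
```

With the data above, eth4 (192.168.0.198) and virbr0 (192.168.122.1) fall into the second group, and those are the two addresses named in the failed connection attempts.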
