Re: [OMPI users] users Digest, Vol 2881, Issue 4

2014-05-07 Thread Gus Correa

On 05/06/2014 09:49 PM, Ralph Castain wrote:


On May 6, 2014, at 6:24 PM, Clay Kirkland wrote:


 Got it to work finally.  The longer line doesn't work.
But if I take off the -mca oob_tcp_if_include 192.168.0.0/16 part then
everything works from every combination of machines I have.


Interesting - I'm surprised, but glad it worked



Could it be perhaps 192.168.0.0/24 (instead of /16)?
The ifconfig output says the netmask is 255.255.255.0.
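
As a quick check of the arithmetic behind that question, here is a minimal Python sketch using only the standard-library ipaddress module. It derives the CIDR prefix from the netmask shown in the ifconfig output and tests the two addresses quoted in this thread against both candidate networks. The helper name network_from_ifconfig is made up for illustration and is not part of Open MPI.

```python
import ipaddress


def network_from_ifconfig(addr: str, mask: str) -> ipaddress.IPv4Network:
    """Return the network an interface sits on, given its address and netmask."""
    # strict=False lets us pass a host address rather than the network address.
    return ipaddress.ip_network(f"{addr}/{mask}", strict=False)


# Addresses and mask as they appear in the ifconfig output quoted in this thread.
hosts = ["192.168.0.59", "192.168.0.154"]
mask = "255.255.255.0"

local_net = network_from_ifconfig(hosts[0], mask)
print("network implied by the netmask:", local_net)

# Check whether each candidate value for the *_if_include parameters
# contains both host addresses.
for candidate in ("192.168.0.0/16", "192.168.0.0/24"):
    net = ipaddress.ip_network(candidate)
    covered = all(ipaddress.ip_address(h) in net for h in hosts)
    print(candidate, "covers both hosts:", covered)
```

The netmask corresponds to a /24, but both the /16 and the /24 contain the two addresses, so this check alone cannot say whether Open MPI would treat the two values differently.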



And as to any MPI having trouble, in my original posting I stated that
I installed lam mpi on the same hardware and it worked just fine.  Maybe
you guys should look at what they do and copy it.  Virtually every
machine I have used in the last 5 years has multiple NIC interfaces, and
almost all of them are set up to use only one interface.  It seems odd
to have a product that is designed to lash together multiple machines
fail with a default install on generic machines.


Actually, we are the "lam mpi" guys :-)

There clearly is a bug in the connection logic, but a little hint will
work it thru until we can resolve it.



But software is like that sometimes, and I want to thank you very much
for all the help.  Please take my criticism with a grain of salt.  I
love MPI, I just want to see it work.  I have been using it for 20-some
years to synchronize multiple machines for I/O testing, and it is one
slick product for that.  It has helped us find many bugs in shared file
systems.  Thanks again,


No problem!






On Tue, May 6, 2014 at 7:45 PM, > wrote:

Send users mailing list submissions to
us...@open-mpi.org 

To subscribe or unsubscribe via the World Wide Web, visit
http://www.open-mpi.org/mailman/listinfo.cgi/users
or, via email, send a message with subject or body 'help' to
users-requ...@open-mpi.org 

You can reach the person managing the list at
users-ow...@open-mpi.org 

When replying, please edit your Subject line so it is more specific
than "Re: Contents of users digest..."


Today's Topics:

   1. Re: users Digest, Vol 2881, Issue 2 (Ralph Castain)


--

Message: 1
Date: Tue, 6 May 2014 17:45:09 -0700
From: Ralph Castain
To: Open MPI Users
Subject: Re: [OMPI users] users Digest, Vol 2881, Issue 2
Message-ID: <4b207e61-952a-4744-9a7b-0704c4b0d...@open-mpi.org>
Content-Type: text/plain; charset="us-ascii"

-mca btl_tcp_if_include 192.168.0.0/16 
-mca oob_tcp_if_include 192.168.0.0/16 

should do the trick. Any MPI is going to have trouble with your
arrangement - just need a little hint to help figure it out.
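
To make it easier to try the two hints separately (the later replies in this thread report that dropping the oob_tcp_if_include part is what worked), here is a small Python sketch that only builds and prints the command line; it does not run anything. The two -mca parameter names and the 192.168.0.0/16 value come from this message. The program name ./a.out, the process count, and the helper mpirun_args are assumptions made up for illustration.

```python
import shlex


def mpirun_args(program, nprocs, btl_net=None, oob_net=None):
    """Build an mpirun argument list with optional TCP interface hints."""
    args = ["mpirun", "-np", str(nprocs)]
    if btl_net is not None:
        # Restrict the TCP BTL (MPI message traffic) to this network.
        args += ["-mca", "btl_tcp_if_include", btl_net]
    if oob_net is not None:
        # Restrict the out-of-band (runtime control) channel to this network.
        args += ["-mca", "oob_tcp_if_include", oob_net]
    args.append(program)
    return args


# Hypothetical program and process count; adjust for the real job.
both = mpirun_args("./a.out", 2, btl_net="192.168.0.0/16", oob_net="192.168.0.0/16")
btl_only = mpirun_args("./a.out", 2, btl_net="192.168.0.0/16")

print(shlex.join(both))
print(shlex.join(btl_only))
```

Printing the two variants side by side makes it clear which hint is being removed between runs; a hostfile or host list would still have to be added for a real multi-machine launch.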


On May 6, 2014, at 5:14 PM, Clay Kirkland wrote:

>  Someone suggested using some network address if all machines are on same subnet.
> They are all on the same subnet, I think.   I have no idea what to put for a param there.
> I tried the ethernet address but of course it couldn't be that simple.  Here are my ifconfig
> outputs from a couple of machines:
>
> [root@RAID MPI]# ifconfig -a
> eth0  Link encap:Ethernet  HWaddr 00:25:90:73:2A:36
>   inet addr:192.168.0.59  Bcast:192.168.0.255  Mask:255.255.255.0
>   inet6 addr: fe80::225:90ff:fe73:2a36/64 Scope:Link
>   UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>   RX packets:17983 errors:0 dropped:0 overruns:0 frame:0
>   TX packets:9952 errors:0 dropped:0 overruns:0 carrier:0
>   collisions:0 txqueuelen:1000
>   RX bytes:26309771 (25.0 MiB)  TX bytes:758940 (741.1 KiB)
>   Interrupt:16 Memory:fbde-fbe0
>
> eth1  Link encap:Ethernet  HWaddr 00:25:90:73:2A:37
>   inet6 addr: fe80::225:90ff:fe73:2a37/64 Scope:Link
>   UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>   RX packets:56 errors:0 dropped:0 overruns:0 frame:0
>   TX packets:6 errors:0 dropped:0 overruns:0 carrier:0
>   collisions:0 txqueuelen:1000
>   RX bytes:3924 (3.8 KiB)  TX bytes:468 (468.0 b)
>   Interrupt:17 Memory:fbee-fbf0
>
>  And from one that I can't get to work:
>
> [root@centos ~]# ifconfig -a
> eth0  Link encap:Ethernet  HWaddr 00:1E:4F:FB:30:34
>   inet6 addr: fe80::21e:4fff:fefb:3034/64 Scope:Link
>   UP 

Re: [OMPI users] users Digest, Vol 2881, Issue 4

2014-05-06 Thread Ralph Castain

On May 6, 2014, at 6:24 PM, Clay Kirkland  wrote:

>  Got it to work finally.  The longer line doesn't work.
> 
> But if I take off the -mca oob_tcp_if_include 192.168.0.0/16 part then
> everything works from every combination of machines I have.

Interesting - I'm surprised, but glad it worked

> 
> And as to any MPI having trouble, in my original posting I stated that
> I installed lam mpi on the same hardware and it worked just fine.  Maybe
> you guys should look at what they do and copy it.  Virtually every
> machine I have used in the last 5 years has multiple NIC interfaces, and
> almost all of them are set up to use only one interface.  It seems odd
> to have a product that is designed to lash together multiple machines
> fail with a default install on generic machines.

Actually, we are the "lam mpi" guys :-)

There clearly is a bug in the connection logic, but a little hint will work it 
thru until we can resolve it.

>  
> But software is like that sometimes, and I want to thank you very much
> for all the help.  Please take my criticism with a grain of salt.  I
> love MPI, I just want to see it work.  I have been using it for 20-some
> years to synchronize multiple machines for I/O testing, and it is one
> slick product for that.  It has helped us find many bugs in shared file
> systems.  Thanks again,

No problem!

> 
> 
> 
> 
> On Tue, May 6, 2014 at 7:45 PM,  wrote:
> 
> Today's Topics:
> 
>    1. Re: users Digest, Vol 2881, Issue 2 (Ralph Castain)
> 
> 
> --
> 
> Message: 1
> Date: Tue, 6 May 2014 17:45:09 -0700
> From: Ralph Castain 
> To: Open MPI Users 
> Subject: Re: [OMPI users] users Digest, Vol 2881, Issue 2
> Message-ID: <4b207e61-952a-4744-9a7b-0704c4b0d...@open-mpi.org>
> Content-Type: text/plain; charset="us-ascii"
> 
> -mca btl_tcp_if_include 192.168.0.0/16 -mca oob_tcp_if_include 192.168.0.0/16
> 
> should do the trick. Any MPI is going to have trouble with your arrangement - 
> just need a little hint to help figure it out.
> 
> 
> On May 6, 2014, at 5:14 PM, Clay Kirkland wrote:
> 
> >  Someone suggested using some network address if all machines are on same subnet.
> > They are all on the same subnet, I think.   I have no idea what to put for a param there.
> > I tried the ethernet address but of course it couldn't be that simple.  Here are my ifconfig
> > outputs from a couple of machines:
> >
> > [root@RAID MPI]# ifconfig -a
> > eth0  Link encap:Ethernet  HWaddr 00:25:90:73:2A:36
> >   inet addr:192.168.0.59  Bcast:192.168.0.255  Mask:255.255.255.0
> >   inet6 addr: fe80::225:90ff:fe73:2a36/64 Scope:Link
> >   UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
> >   RX packets:17983 errors:0 dropped:0 overruns:0 frame:0
> >   TX packets:9952 errors:0 dropped:0 overruns:0 carrier:0
> >   collisions:0 txqueuelen:1000
> >   RX bytes:26309771 (25.0 MiB)  TX bytes:758940 (741.1 KiB)
> >   Interrupt:16 Memory:fbde-fbe0
> >
> > eth1  Link encap:Ethernet  HWaddr 00:25:90:73:2A:37
> >   inet6 addr: fe80::225:90ff:fe73:2a37/64 Scope:Link
> >   UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
> >   RX packets:56 errors:0 dropped:0 overruns:0 frame:0
> >   TX packets:6 errors:0 dropped:0 overruns:0 carrier:0
> >   collisions:0 txqueuelen:1000
> >   RX bytes:3924 (3.8 KiB)  TX bytes:468 (468.0 b)
> >   Interrupt:17 Memory:fbee-fbf0
> >
> >  And from one that I can't get to work:
> >
> > [root@centos ~]# ifconfig -a
> > eth0  Link encap:Ethernet  HWaddr 00:1E:4F:FB:30:34
> >   inet6 addr: fe80::21e:4fff:fefb:3034/64 Scope:Link
> >   UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
> >   RX packets:45 errors:0 dropped:0 overruns:0 frame:0
> >   TX packets:6 errors:0 dropped:0 overruns:0 carrier:0
> >   collisions:0 txqueuelen:1000
> >   RX bytes:2700 (2.6 KiB)  TX bytes:468 (468.0 b)
> >   Interrupt:21 Memory:fe9e-fea0
> >
> > eth1  Link encap:Ethernet  HWaddr 00:14:D1:22:8E:50
> >   inet addr:192.168.0.154  Bcast:192.168.0.255  Mask:255.255.255.0
> >   inet6 addr: fe80::214:d1ff:fe22:8e50/64 Scope:Link
> >   UP 

Re: [OMPI users] users Digest, Vol 2881, Issue 4

2014-05-06 Thread Clay Kirkland
 Got it to work finally.  The longer line doesn't work.

But if I take off the -mca oob_tcp_if_include 192.168.0.0/16 part then
everything works from every combination of machines I have.

And as to any MPI having trouble, in my original posting I stated that
I installed lam mpi on the same hardware and it worked just fine.  Maybe
you guys should look at what they do and copy it.  Virtually every
machine I have used in the last 5 years has multiple NIC interfaces, and
almost all of them are set up to use only one interface.  It seems odd
to have a product that is designed to lash together multiple machines
fail with a default install on generic machines.

But software is like that sometimes, and I want to thank you very much
for all the help.  Please take my criticism with a grain of salt.  I
love MPI, I just want to see it work.  I have been using it for 20-some
years to synchronize multiple machines for I/O testing, and it is one
slick product for that.  It has helped us find many bugs in shared file
systems.  Thanks again,




On Tue, May 6, 2014 at 7:45 PM,  wrote:

>
> Today's Topics:
>
>    1. Re: users Digest, Vol 2881, Issue 2 (Ralph Castain)
>
>
> --
>
> Message: 1
> Date: Tue, 6 May 2014 17:45:09 -0700
> From: Ralph Castain 
> To: Open MPI Users 
> Subject: Re: [OMPI users] users Digest, Vol 2881, Issue 2
> Message-ID: <4b207e61-952a-4744-9a7b-0704c4b0d...@open-mpi.org>
> Content-Type: text/plain; charset="us-ascii"
>
> -mca btl_tcp_if_include 192.168.0.0/16 -mca oob_tcp_if_include 192.168.0.0/16
>
> should do the trick. Any MPI is going to have trouble with your
> arrangement - just need a little hint to help figure it out.
>
>
> On May 6, 2014, at 5:14 PM, Clay Kirkland wrote:
>
> >  Someone suggested using some network address if all machines are on same subnet.
> > They are all on the same subnet, I think.   I have no idea what to put for a param there.
> > I tried the ethernet address but of course it couldn't be that simple.  Here are my ifconfig
> > outputs from a couple of machines:
> >
> > [root@RAID MPI]# ifconfig -a
> > eth0  Link encap:Ethernet  HWaddr 00:25:90:73:2A:36
> >   inet addr:192.168.0.59  Bcast:192.168.0.255  Mask:255.255.255.0
> >   inet6 addr: fe80::225:90ff:fe73:2a36/64 Scope:Link
> >   UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
> >   RX packets:17983 errors:0 dropped:0 overruns:0 frame:0
> >   TX packets:9952 errors:0 dropped:0 overruns:0 carrier:0
> >   collisions:0 txqueuelen:1000
> >   RX bytes:26309771 (25.0 MiB)  TX bytes:758940 (741.1 KiB)
> >   Interrupt:16 Memory:fbde-fbe0
> >
> > eth1  Link encap:Ethernet  HWaddr 00:25:90:73:2A:37
> >   inet6 addr: fe80::225:90ff:fe73:2a37/64 Scope:Link
> >   UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
> >   RX packets:56 errors:0 dropped:0 overruns:0 frame:0
> >   TX packets:6 errors:0 dropped:0 overruns:0 carrier:0
> >   collisions:0 txqueuelen:1000
> >   RX bytes:3924 (3.8 KiB)  TX bytes:468 (468.0 b)
> >   Interrupt:17 Memory:fbee-fbf0
> >
> >  And from one that I can't get to work:
> >
> > [root@centos ~]# ifconfig -a
> > eth0  Link encap:Ethernet  HWaddr 00:1E:4F:FB:30:34
> >   inet6 addr: fe80::21e:4fff:fefb:3034/64 Scope:Link
> >   UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
> >   RX packets:45 errors:0 dropped:0 overruns:0 frame:0
> >   TX packets:6 errors:0 dropped:0 overruns:0 carrier:0
> >   collisions:0 txqueuelen:1000
> >   RX bytes:2700 (2.6 KiB)  TX bytes:468 (468.0 b)
> >   Interrupt:21 Memory:fe9e-fea0
> >
> > eth1  Link encap:Ethernet  HWaddr 00:14:D1:22:8E:50
> >   inet addr:192.168.0.154  Bcast:192.168.0.255  Mask:255.255.255.0
> >   inet6 addr: fe80::214:d1ff:fe22:8e50/64 Scope:Link
> >   UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
> >   RX packets:160 errors:0 dropped:0 overruns:0 frame:0
> >   TX packets:120 errors:0 dropped:0 overruns:0 carrier:0
> >   collisions:0 txqueuelen:1000
> >   RX bytes:31053 (30.3 KiB)  TX bytes:18897 (18.4 KiB)
> >   Interrupt:16 Base address:0x2f00
> >
> >
> >  The centos machine is using eth1 and not