Re: [OMPI users] users Digest, Vol 2881, Issue 4

2014-05-06 Thread Clay Kirkland
 Got it to work finally.   The longer command line (with both MCA
parameters) doesn't work.

But if I take off the -mca oob_tcp_if_include 192.168.0.0/16 part, then
everything works on every combination of machines I have.
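
 For the record, the invocation that works looks like this (a sketch
reconstructed from the flags discussed in this thread; the hostnames and
install path are the ones from my runs quoted further down):

    /usr/local/bin/mpirun -np 2 --host centos,RAID \
        -mca btl_tcp_if_include 192.168.0.0/16 a.out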

And as to any MPI having trouble: in my original posting I stated that I
installed LAM MPI on the same hardware and it worked just fine.   Maybe
you guys should look at what they do and copy it.   Virtually every
machine I have used in the last 5 years has multiple NIC interfaces, and
almost all of them are set up to use only one interface.   It seems odd
to have a product that is designed to lash together multiple machines and
have it fail with a default install on generic machines.

  But software is like that sometimes, and I want to thank you very much
for all the help.   Please take my criticism with a grain of salt.   I
love MPI, I just want to see it work.   I have been using it for 20-some
years to synchronize multiple machines for I/O testing, and it is one
slick product for that.   It has helped us find many bugs in shared file
systems.   Thanks again,




On Tue, May 6, 2014 at 7:45 PM, <users-requ...@open-mpi.org> wrote:

> Date: Tue, 6 May 2014 17:45:09 -0700
> From: Ralph Castain <r...@open-mpi.org>
> To: Open MPI Users <us...@open-mpi.org>
> Subject: Re: [OMPI users] users Digest, Vol 2881, Issue 2
>
> -mca btl_tcp_if_include 192.168.0.0/16 -mca oob_tcp_if_include
> 192.168.0.0/16
>
> should do the trick. Any MPI is going to have trouble with your
> arrangement - just need a little hint to help figure it out.
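>
> For example, the full command line would be along these lines (a sketch;
> adjust -np and --host to your machines):
>
>     mpirun -np 2 --host centos,RAID \
>         -mca btl_tcp_if_include 192.168.0.0/16 \
>         -mca oob_tcp_if_include 192.168.0.0/16 a.out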

Re: [OMPI users] users Digest, Vol 2881, Issue 2

2014-05-06 Thread Clay Kirkland
 Someone suggested using some network address if all machines are on the
same subnet.   They are all on the same subnet, I think.   I have no idea
what to put for a param there.   I tried the Ethernet hardware address,
but of course it couldn't be that simple.   Here are my ifconfig outputs
from a couple of machines:

[root@RAID MPI]# ifconfig -a
eth0  Link encap:Ethernet  HWaddr 00:25:90:73:2A:36
  inet addr:192.168.0.59  Bcast:192.168.0.255  Mask:255.255.255.0
  inet6 addr: fe80::225:90ff:fe73:2a36/64 Scope:Link
  UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
  RX packets:17983 errors:0 dropped:0 overruns:0 frame:0
  TX packets:9952 errors:0 dropped:0 overruns:0 carrier:0
  collisions:0 txqueuelen:1000
  RX bytes:26309771 (25.0 MiB)  TX bytes:758940 (741.1 KiB)
  Interrupt:16 Memory:fbde-fbe0

eth1  Link encap:Ethernet  HWaddr 00:25:90:73:2A:37
  inet6 addr: fe80::225:90ff:fe73:2a37/64 Scope:Link
  UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
  RX packets:56 errors:0 dropped:0 overruns:0 frame:0
  TX packets:6 errors:0 dropped:0 overruns:0 carrier:0
  collisions:0 txqueuelen:1000
  RX bytes:3924 (3.8 KiB)  TX bytes:468 (468.0 b)
  Interrupt:17 Memory:fbee-fbf0

 And from one that I can't get to work:

[root@centos ~]# ifconfig -a
eth0  Link encap:Ethernet  HWaddr 00:1E:4F:FB:30:34
  inet6 addr: fe80::21e:4fff:fefb:3034/64 Scope:Link
  UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
  RX packets:45 errors:0 dropped:0 overruns:0 frame:0
  TX packets:6 errors:0 dropped:0 overruns:0 carrier:0
  collisions:0 txqueuelen:1000
  RX bytes:2700 (2.6 KiB)  TX bytes:468 (468.0 b)
  Interrupt:21 Memory:fe9e-fea0

eth1  Link encap:Ethernet  HWaddr 00:14:D1:22:8E:50
  inet addr:192.168.0.154  Bcast:192.168.0.255  Mask:255.255.255.0
  inet6 addr: fe80::214:d1ff:fe22:8e50/64 Scope:Link
  UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
  RX packets:160 errors:0 dropped:0 overruns:0 frame:0
  TX packets:120 errors:0 dropped:0 overruns:0 carrier:0
  collisions:0 txqueuelen:1000
  RX bytes:31053 (30.3 KiB)  TX bytes:18897 (18.4 KiB)
  Interrupt:16 Base address:0x2f00


 The centos machine is using eth1 and not eth0; therein lies the problem.

 I don't really need all this optimization of using multiple Ethernet
adapters to speed things up.   I am just using MPI to synchronize I/O
tests.   Can I go back to a really old version and avoid all this painful
debugging???
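
 (What to put for the param, per the reply elsewhere in this thread, turns
out to be the subnet itself rather than a hardware address or an interface
name.   The CIDR form matches whichever interface carries a 192.168.0.x
address on each host, so mixed eth0/eth1 setups work; a sketch:)

    mpirun -np 2 --host centos,RAID -mca btl_tcp_if_include 192.168.0.0/16 a.out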




Re: [OMPI users] users Digest, Vol 2881, Issue 1

2014-05-06 Thread Clay Kirkland
 Well it turns out I can't seem to get all three of my machines on the
same page.   Two of them are using eth0 and one is using eth1.   CentOS
seems unable to bring up multiple network interfaces for some reason, and
when I use the MCA param to use eth0, it works on two machines but not
the other.   Is there some way to use only eth1 on one host and only eth0
on the other two?   Maybe environment variables, but I can't seem to get
that to work either.

 Clay
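
 (For what it's worth, a sketch of the environment-variable route: Open MPI
reads any MCA parameter from an environment variable named OMPI_MCA_<param>,
so each host can pin its own interface in its own shell startup file.   The
interface names are the ones from this thread; the exact placement is an
assumption:)

    # on the host whose 192.168.0.x address lives on eth1 (the centos box):
    export OMPI_MCA_btl_tcp_if_include=eth1

    # on the two hosts that use eth0:
    export OMPI_MCA_btl_tcp_if_include=eth0

 (Alternatively, the subnet form -mca btl_tcp_if_include 192.168.0.0/16
avoids per-host settings entirely, since it selects interfaces by address
rather than by name.)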



Re: [OMPI users] users Digest, Vol 2881, Issue 1

2014-05-06 Thread Clay Kirkland
 That last trick seems to work.   I can get it to work once in a while
with those TCP options, but it is tricky, as I have three machines and two
of them use eth0 as the primary network interface and one uses eth1.   But
by fiddling with network options and perhaps moving a cable or two, I
think I can get it all to work.   Thanks much for the tip.

 Clay



Re: [OMPI users] users Digest, Vol 2879, Issue 1

2014-05-04 Thread Clay Kirkland
 I am configuring with all defaults.   Just doing a ./configure and then
make and make install.   I have used Open MPI on several kinds of Unix
systems this way and have had no trouble before.   I believe I last had
success on a Red Hat version of Linux.
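
 (Concretely, that default sequence is just the following; /usr/local is
the default prefix, which matches the /usr/local/bin/mpirun paths in the
runs quoted below:)

    ./configure
    make
    make install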



[OMPI users] MPI_Barrier hangs on second attempt but only when multiple hosts used.

2014-05-02 Thread Clay Kirkland
I have been using MPI for many, many years, so I have very well debugged
MPI tests.   I am having trouble with either the openmpi-1.4.5 or the
openmpi-1.6.5 version, though, in getting the MPI_Barrier calls to work.
It works fine when I run all processes on one machine, but when I run
with two or more hosts, the second call to MPI_Barrier always hangs.
Not the first one, but always the second one.   I looked at FAQs and such
but found nothing except for a comment that MPI_Barrier problems were
often problems with firewalls.   Also mentioned as a problem was not
having the same version of MPI on both machines.   I turned firewalls off
and removed and reinstalled the same version on both hosts, but I still
see the same thing.   I then installed LAM MPI on two of my machines and
that works fine.   I can call the MPI_Barrier function many times with no
hangs when running on either of the two machines by itself.   It only
hangs if two or more hosts are involved.   These runs are all being done
on CentOS release 6.4.   Here is the test program I used.

#include <stdio.h>
#include <unistd.h>
#include <mpi.h>

int main (int argc, char **argv)
{
    char hoster[256];
    int myrank, np;

    MPI_Init( &argc, &argv );
    MPI_Comm_rank( MPI_COMM_WORLD, &myrank );
    MPI_Comm_size( MPI_COMM_WORLD, &np );

    gethostname(hoster, 256);

    printf(" In rank %d and host= %s  Do Barrier call 1.\n", myrank, hoster);
    MPI_Barrier(MPI_COMM_WORLD);
    printf(" In rank %d and host= %s  Do Barrier call 2.\n", myrank, hoster);
    MPI_Barrier(MPI_COMM_WORLD);
    printf(" In rank %d and host= %s  Do Barrier call 3.\n", myrank, hoster);
    MPI_Barrier(MPI_COMM_WORLD);
    MPI_Finalize();
    return 0;
}
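
  (The program builds with the Open MPI compiler wrapper; a typical
invocation, assuming the default /usr/local install and a hypothetical
source file name barrier_test.c, would be:)

    /usr/local/bin/mpicc barrier_test.c -o a.out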

  Here are three runs of the test program.   First with two processes on
one host, then with two processes on another host, and finally with one
process on each of two hosts.   The first two runs are fine, but the last
run hangs on the second MPI_Barrier.

[root@centos MPI]# /usr/local/bin/mpirun -np 2 --host centos a.out
 In rank 0 and host= centos  Do Barrier call 1.
 In rank 1 and host= centos  Do Barrier call 1.
 In rank 1 and host= centos  Do Barrier call 2.
 In rank 1 and host= centos  Do Barrier call 3.
 In rank 0 and host= centos  Do Barrier call 2.
 In rank 0 and host= centos  Do Barrier call 3.
[root@centos MPI]# /usr/local/bin/mpirun -np 2 --host RAID a.out
/root/.bashrc: line 14: unalias: ls: not found
 In rank 0 and host= RAID  Do Barrier call 1.
 In rank 0 and host= RAID  Do Barrier call 2.
 In rank 0 and host= RAID  Do Barrier call 3.
 In rank 1 and host= RAID  Do Barrier call 1.
 In rank 1 and host= RAID  Do Barrier call 2.
 In rank 1 and host= RAID  Do Barrier call 3.
[root@centos MPI]# /usr/local/bin/mpirun -np 2 --host centos,RAID a.out
/root/.bashrc: line 14: unalias: ls: not found
 In rank 0 and host= centos  Do Barrier call 1.
 In rank 0 and host= centos  Do Barrier call 2.
In rank 1 and host= RAID  Do Barrier call 1.
 In rank 1 and host= RAID  Do Barrier call 2.

  Since it is such a simple test of such a widely used MPI function, it
must obviously be an installation or configuration problem.   A pstack of
each of the hung MPI_Barrier processes on the two machines shows this:

[root@centos ~]# pstack 31666
#0  0x003baf0e8ee3 in __epoll_wait_nocancel () from /lib64/libc.so.6
#1  0x7f5de06125eb in epoll_dispatch () from /usr/local/lib/libmpi.so.1
#2  0x7f5de061475a in opal_event_base_loop () from /usr/local/lib/libmpi.so.1
#3  0x7f5de0639229 in opal_progress () from /usr/local/lib/libmpi.so.1
#4  0x7f5de0586f75 in ompi_request_default_wait_all () from /usr/local/lib/libmpi.so.1
#5  0x7f5ddc59565e in ompi_coll_tuned_sendrecv_actual () from /usr/local/lib/openmpi/mca_coll_tuned.so
#6  0x7f5ddc59d8ff in ompi_coll_tuned_barrier_intra_two_procs () from /usr/local/lib/openmpi/mca_coll_tuned.so
#7  0x7f5de05941c2 in PMPI_Barrier () from /usr/local/lib/libmpi.so.1
#8  0x00400a43 in main ()

[root@RAID openmpi-1.6.5]# pstack 22167
#0  0x0030302e8ee3 in __epoll_wait_nocancel () from /lib64/libc.so.6
#1  0x7f7ee46885eb in epoll_dispatch () from /usr/local/lib/libmpi.so.1
#2  0x7f7ee468a75a in opal_event_base_loop () from /usr/local/lib/libmpi.so.1
#3  0x7f7ee46af229 in opal_progress () from /usr/local/lib/libmpi.so.1
#4  0x7f7ee45fcf75 in ompi_request_default_wait_all () from /usr/local/lib/libmpi.so.1
#5  0x7f7ee060b65e in ompi_coll_tuned_sendrecv_actual () from /usr/local/lib/openmpi/mca_coll_tuned.so
#6  0x7f7ee06138ff in ompi_coll_tuned_barrier_intra_two_procs () from /usr/local/lib/openmpi/mca_coll_tuned.so
#7  0x7f7ee460a1c2 in PMPI_Barrier () from /usr/local/lib/libmpi.so.1
#8  0x00400a43 in main ()

 Which looks exactly the same on each machine.   Any thoughts or ideas
would be greatly appreciated, as I am stuck.

 Clay Kirkland