Re: [OMPI users] users Digest, Vol 2881, Issue 4
Got it to work finally. The longer line doesn't work, but if I take off the
"-mca oob_tcp_if_include 192.168.0.0/16" part, then everything works from
every combination of machines I have.

As for any MPI having trouble with my arrangement: in my original posting I
stated that I installed LAM MPI on the same hardware and it worked just fine.
Maybe you guys should look at what they do and copy it. Virtually every
machine I have used in the last 5 years has had multiple NIC interfaces, and
almost all of them are set up to use only one interface. It seems odd to have
a product that is designed to lash together multiple machines fail with a
default install on generic machines. But software is like that sometimes, and
I want to thank you very much for all the help. Please take my criticism with
a grain of salt. I love MPI, I just want to see it work. I have been using it
for some 20 years to synchronize multiple machines for I/O testing, and it is
one slick product for that. It has helped us find many bugs in shared file
systems.

Thanks again,

On Tue, May 6, 2014 at 7:45 PM, <users-requ...@open-mpi.org> wrote:
> Send users mailing list submissions to
>         us...@open-mpi.org
>
> To subscribe or unsubscribe via the World Wide Web, visit
>         http://www.open-mpi.org/mailman/listinfo.cgi/users
> or, via email, send a message with subject or body 'help' to
>         users-requ...@open-mpi.org
>
> You can reach the person managing the list at
>         users-ow...@open-mpi.org
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of users digest..."
>
>
> Today's Topics:
>
>    1.
Re: users Digest, Vol 2881, Issue 2 (Ralph Castain)
>
> ------------------------------
>
> Message: 1
> Date: Tue, 6 May 2014 17:45:09 -0700
> From: Ralph Castain <r...@open-mpi.org>
> To: Open MPI Users <us...@open-mpi.org>
> Subject: Re: [OMPI users] users Digest, Vol 2881, Issue 2
> Message-ID: <4b207e61-952a-4744-9a7b-0704c4b0d...@open-mpi.org>
> Content-Type: text/plain; charset="us-ascii"
>
> -mca btl_tcp_if_include 192.168.0.0/16 -mca oob_tcp_if_include
> 192.168.0.0/16
>
> should do the trick. Any MPI is going to have trouble with your
> arrangement - just need a little hint to help figure it out.
>
>
> On May 6, 2014, at 5:14 PM, Clay Kirkland <clay.kirkl...@versityinc.com>
> wrote:
>
> > Someone suggested using some network address if all machines are on the
> > same subnet. They are all on the same subnet, I think. I have no idea
> > what to put for a param there. I tried the ethernet address, but of
> > course it couldn't be that simple. Here are my ifconfig outputs from a
> > couple of machines:
> >
> > [root@RAID MPI]# ifconfig -a
> > eth0      Link encap:Ethernet  HWaddr 00:25:90:73:2A:36
> >           inet addr:192.168.0.59  Bcast:192.168.0.255  Mask:255.255.255.0
> >           inet6 addr: fe80::225:90ff:fe73:2a36/64 Scope:Link
> >           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
> >           RX packets:17983 errors:0 dropped:0 overruns:0 frame:0
> >           TX packets:9952 errors:0 dropped:0 overruns:0 carrier:0
> >           collisions:0 txqueuelen:1000
> >           RX bytes:26309771 (25.0 MiB)  TX bytes:758940 (741.1 KiB)
> >           Interrupt:16 Memory:fbde-fbe0
> >
> > eth1      Link encap:Ethernet  HWaddr 00:25:90:73:2A:37
> >           inet6 addr: fe80::225:90ff:fe73:2a37/64 Scope:Link
> >           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
> >           RX packets:56 errors:0 dropped:0 overruns:0 frame:0
> >           TX packets:6 errors:0 dropped:0 overruns:0 carrier:0
> >           collisions:0 txqueuelen:1000
> >           RX bytes:3924 (3.8 KiB)  TX bytes:468 (468.0 b)
> >           Interrupt:17 Memory:fbee-fbf0
> >
> > And from one that I can't get to work:
> >
> > [root@centos ~]# ifconfig -a
> > eth0      Link encap:Ethernet  HWaddr 00:1E:4F:FB:30:34
> >           inet6 addr: fe80::21e:4fff:fefb:3034/64 Scope:Link
> >           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
> >           RX packets:45 errors:0 dropped:0 overruns:0 frame:0
> >           TX packets:6 errors:0 dropped:0 overruns:0 carrier:0
> >           collisions:0 txqueuelen:1000
> >           RX bytes:2700 (2.6 KiB)  TX bytes:468 (468.0 b)
> >           Interrupt:21 Memory:fe9e-fea0
> >
> > eth1      Link encap:Ethernet  HWaddr 00:14:D1:22:8E:50
> >           inet addr:192.168.0.154  Bcast:192.168.0.255  Mask:255.255.255.0
> >           inet6 addr: fe80::214:d1ff:fe22:
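[A note on the 192.168.0.0/16 arguments above: this is CIDR notation, meaning "match the first 16 bits (first two octets) of the address". Both hosts in this thread (192.168.0.59 on RAID's eth0 and 192.168.0.154 on the centos box's eth1) fall inside that range, so each machine selects whichever NIC carries the matching address, regardless of the interface's name. A quick illustrative sanity check of the matching rule itself, in plain shell, not Open MPI code:]

```shell
# For dotted-quad strings, membership in 192.168.0.0/16 reduces to a
# prefix check on the first two octets. This is only a sketch of the
# matching rule; real CIDR matching works on the 32-bit address.
matches_16() {
    case "$1" in
        192.168.*) return 0 ;;   # inside 192.168.0.0/16
        *)         return 1 ;;   # outside
    esac
}

matches_16 192.168.0.59  && echo "192.168.0.59: included"
matches_16 192.168.0.154 && echo "192.168.0.154: included"
matches_16 10.0.0.1      || echo "10.0.0.1: excluded"
```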
Re: [OMPI users] users Digest, Vol 2881, Issue 2
Someone suggested using some network address if all machines are on the same
subnet. They are all on the same subnet, I think. I have no idea what to put
for a param there. I tried the ethernet address, but of course it couldn't be
that simple. Here are my ifconfig outputs from a couple of machines:

[root@RAID MPI]# ifconfig -a
eth0      Link encap:Ethernet  HWaddr 00:25:90:73:2A:36
          inet addr:192.168.0.59  Bcast:192.168.0.255  Mask:255.255.255.0
          inet6 addr: fe80::225:90ff:fe73:2a36/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:17983 errors:0 dropped:0 overruns:0 frame:0
          TX packets:9952 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:26309771 (25.0 MiB)  TX bytes:758940 (741.1 KiB)
          Interrupt:16 Memory:fbde-fbe0

eth1      Link encap:Ethernet  HWaddr 00:25:90:73:2A:37
          inet6 addr: fe80::225:90ff:fe73:2a37/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:56 errors:0 dropped:0 overruns:0 frame:0
          TX packets:6 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:3924 (3.8 KiB)  TX bytes:468 (468.0 b)
          Interrupt:17 Memory:fbee-fbf0

And from one that I can't get to work:

[root@centos ~]# ifconfig -a
eth0      Link encap:Ethernet  HWaddr 00:1E:4F:FB:30:34
          inet6 addr: fe80::21e:4fff:fefb:3034/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:45 errors:0 dropped:0 overruns:0 frame:0
          TX packets:6 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:2700 (2.6 KiB)  TX bytes:468 (468.0 b)
          Interrupt:21 Memory:fe9e-fea0

eth1      Link encap:Ethernet  HWaddr 00:14:D1:22:8E:50
          inet addr:192.168.0.154  Bcast:192.168.0.255  Mask:255.255.255.0
          inet6 addr: fe80::214:d1ff:fe22:8e50/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:160 errors:0 dropped:0 overruns:0 frame:0
          TX packets:120 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:31053 (30.3 KiB)  TX bytes:18897 (18.4 KiB)
          Interrupt:16  Base address:0x2f00

The centos
machine is using eth1 and not eth0; therein lies the problem. I don't really
need all this optimization of using multiple ethernet adaptors to speed
things up. I am just using MPI to synchronize I/O tests. Can I go back to a
really old version and avoid all this painful debugging???

On Tue, May 6, 2014 at 6:50 PM, <users-requ...@open-mpi.org> wrote:
>
> Today's Topics:
>
>    1. Re: users Digest, Vol 2881, Issue 1 (Clay Kirkland)
>    2. Re: users Digest, Vol 2881, Issue 1 (Clay Kirkland)
>
> ------------------------------
>
> Message: 1
> Date: Tue, 6 May 2014 18:28:59 -0500
> From: Clay Kirkland <clay.kirkl...@versityinc.com>
> To: us...@open-mpi.org
> Subject: Re: [OMPI users] users Digest, Vol 2881, Issue 1
> Message-ID:
>         <cajdnja90buhwu_ihssnna1a4p35+o96rrxk19xnhwo-nsd_...@mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> That last trick seems to work. I can get it to work once in a while with
> those tcp options, but it is tricky, as I have three machines and two of
> them use eth0 as the primary network interface and one uses eth1. But by
> fiddling with network options and perhaps moving a cable or two, I think I
> can get it all to work. Thanks much for the tip.
>
> Clay
>
> On Tue, May 6, 2014 at 11:00 AM, <users-requ...@open-mpi.org> wrote:
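[The "tcp options" trick referred to above amounts to restricting Open MPI's TCP transport to one named interface. A reconstruction of the kind of command line being discussed (the exact options and hostnames are my assumption, pieced together from the rest of the thread, not a quote from the poster):]

```shell
# Force the TCP BTL onto eth0 on every host. This works only while
# every host's 192.168.0.x address actually lives on eth0 -- which is
# why it breaks on the machine whose address is on eth1.
mpirun -np 2 --host centos,RAID -mca btl_tcp_if_include eth0 a.out
```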
Re: [OMPI users] users Digest, Vol 2881, Issue 1
Well, it turns out I can't seem to get all three of my machines on the same
page. Two of them are using eth0 and one is using eth1. Centos seems unable
to bring up multiple network interfaces for some reason, and when I use the
mca param to use eth0, it works on two machines but not the other. Is there
some way to use only eth1 on one host and only eth0 on the other two? Maybe
environment variables, but I can't seem to get that to work either.

Clay

On Tue, May 6, 2014 at 6:28 PM, Clay Kirkland <clay.kirkl...@versityinc.com> wrote:
> That last trick seems to work. I can get it to work once in a while with
> those tcp options, but it is tricky, as I have three machines and two of
> them use eth0 as the primary network interface and one uses eth1. But by
> fiddling with network options and perhaps moving a cable or two, I think I
> can get it all to work. Thanks much for the tip.
>
> Clay
>
> On Tue, May 6, 2014 at 11:00 AM, <users-requ...@open-mpi.org> wrote:
>>
>> Today's Topics:
>>
>>    1. Re: MPI_Barrier hangs on second attempt but only when
>>       multiple hosts used. (Daniels, Marcus G)
>>    2. ROMIO bug reading darrays (Richard Shaw)
>>    3. MPI File Open does not work (Imran Ali)
>>    4. Re: MPI File Open does not work (Jeff Squyres (jsquyres))
>>    5. Re: MPI File Open does not work (Imran Ali)
>>    6. Re: MPI File Open does not work (Jeff Squyres (jsquyres))
>>    7. Re: MPI File Open does not work (Imran Ali)
>>    8. Re: MPI File Open does not work (Jeff Squyres (jsquyres))
>>    9.
Re: users Digest, Vol 2879, Issue 1 (Jeff Squyres (jsquyres))
>>
>> ------------------------------
>>
>> Message: 1
>> Date: Mon, 5 May 2014 19:28:07 +
>> From: "Daniels, Marcus G" <mdani...@lanl.gov>
>> To: "'us...@open-mpi.org'" <us...@open-mpi.org>
>> Subject: Re: [OMPI users] MPI_Barrier hangs on second attempt but only
>>         when multiple hosts used.
>> Message-ID:
>>         <532c594b7920a549a2a91cb4312cc57640dc5...@ecs-exg-p-mb01.win.lanl.gov>
>> Content-Type: text/plain; charset="utf-8"
>>
>> From: Clay Kirkland [mailto:clay.kirkl...@versityinc.com]
>> Sent: Friday, May 02, 2014 03:24 PM
>> To: us...@open-mpi.org <us...@open-mpi.org>
>> Subject: [OMPI users] MPI_Barrier hangs on second attempt but only when
>> multiple hosts used.
>>
>> I have been using MPI for many, many years, so I have very well debugged
>> MPI tests. I am having trouble on either the openmpi-1.4.5 or
>> openmpi-1.6.5 versions, though, with getting the MPI_Barrier calls to
>> work. It works fine when I run all processes on one machine, but when I
>> run with two or more hosts, the second call to MPI_Barrier always hangs.
>> Not the first one, but always the second one. I looked at FAQs and such
>> but found nothing except for a comment that MPI_Barrier problems were
>> often problems with firewalls. Also mentioned as a problem was not
>> having the same version of MPI on both machines. I turned firewalls off
>> and removed and reinstalled the same version on both hosts, but I still
>> see the same thing. I then installed LAM MPI on two of my machines and
>> that works fine. I can call the MPI_Barrier function many times with no
>> hangs when run on one of the two machines by itself. It only hangs if
>> two or more hosts are involved. These runs are all being done on CentOS
>> release 6.4. Here is the test program I used.
>>
>> main (argc, argv)
>> int argc;
>> char **argv;
>> {
>>     char message[20];
>>     char hoster[256];
>>     char nameis[256];
>>     int fd, i, j, jnp, iret, myrank, np, ranker, recker;
>>     MPI_Comm comm;
>>     MPI_Status status;
>>
>>     MPI_Init( &argc, &argv );
>>     MPI_Comm_rank( MPI_COMM_WORLD, &myrank );
>>     MPI_Comm_size( MPI_COMM_WORLD, &np );
>>
>>     gethostname(hoster
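[To the question posed above -- using eth1 on one host and eth0 on the other two -- two approaches should work, sketched below. The subnet form follows Ralph's advice elsewhere in the thread; the per-host environment-variable form relies on Open MPI's convention that any MCA parameter can be set through an OMPI_MCA_-prefixed environment variable. Hostnames are illustrative:]

```shell
# Option 1: select interfaces by subnet, not by name. Each host then
# uses whichever NIC holds a 192.168.0.x address (eth0 on two of the
# hosts, eth1 on the third):
mpirun -np 3 --host host1,host2,host3 \
    -mca btl_tcp_if_include 192.168.0.0/24 a.out

# Option 2: set the MCA parameter per host through the environment,
# e.g. in each machine's shell startup file:
#   on the eth1 machine:   export OMPI_MCA_btl_tcp_if_include=eth1
#   on the eth0 machines:  export OMPI_MCA_btl_tcp_if_include=eth0
```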
Re: [OMPI users] users Digest, Vol 2881, Issue 1
That last trick seems to work. I can get it to work once in a while with
those tcp options, but it is tricky, as I have three machines and two of
them use eth0 as the primary network interface and one uses eth1. But by
fiddling with network options and perhaps moving a cable or two, I think I
can get it all to work. Thanks much for the tip.

Clay

On Tue, May 6, 2014 at 11:00 AM, <users-requ...@open-mpi.org> wrote:
>
> Today's Topics:
>
>    1. Re: MPI_Barrier hangs on second attempt but only when
>       multiple hosts used. (Daniels, Marcus G)
>    2. ROMIO bug reading darrays (Richard Shaw)
>    3. MPI File Open does not work (Imran Ali)
>    4. Re: MPI File Open does not work (Jeff Squyres (jsquyres))
>    5. Re: MPI File Open does not work (Imran Ali)
>    6. Re: MPI File Open does not work (Jeff Squyres (jsquyres))
>    7. Re: MPI File Open does not work (Imran Ali)
>    8. Re: MPI File Open does not work (Jeff Squyres (jsquyres))
>    9. Re: users Digest, Vol 2879, Issue 1 (Jeff Squyres (jsquyres))
>
> ------------------------------
>
> Message: 1
> Date: Mon, 5 May 2014 19:28:07 +
> From: "Daniels, Marcus G" <mdani...@lanl.gov>
> To: "'us...@open-mpi.org'" <us...@open-mpi.org>
> Subject: Re: [OMPI users] MPI_Barrier hangs on second attempt but only
>         when multiple hosts used.
> Message-ID:
>         <532c594b7920a549a2a91cb4312cc57640dc5...@ecs-exg-p-mb01.win.lanl.gov>
> Content-Type: text/plain; charset="utf-8"
>
> From: Clay Kirkland [mailto:clay.kirkl...@versityinc.com]
> Sent: Friday, May 02, 2014 03:24 PM
> To: us...@open-mpi.org <us...@open-mpi.org>
> Subject: [OMPI users] MPI_Barrier hangs on second attempt but only when
> multiple hosts used.
>
> I have been using MPI for many, many years, so I have very well debugged
> MPI tests. I am having trouble on either the openmpi-1.4.5 or
> openmpi-1.6.5 versions, though, with getting the MPI_Barrier calls to
> work. It works fine when I run all processes on one machine, but when I
> run with two or more hosts, the second call to MPI_Barrier always hangs.
> Not the first one, but always the second one. I looked at FAQs and such
> but found nothing except for a comment that MPI_Barrier problems were
> often problems with firewalls. Also mentioned as a problem was not having
> the same version of MPI on both machines. I turned firewalls off and
> removed and reinstalled the same version on both hosts, but I still see
> the same thing. I then installed LAM MPI on two of my machines and that
> works fine. I can call the MPI_Barrier function many times with no hangs
> when run on one of the two machines by itself. It only hangs if two or
> more hosts are involved. These runs are all being done on CentOS release
> 6.4. Here is the test program I used.
>
> main (argc, argv)
> int argc;
> char **argv;
> {
>     char message[20];
>     char hoster[256];
>     char nameis[256];
>     int fd, i, j, jnp, iret, myrank, np, ranker, recker;
>     MPI_Comm comm;
>     MPI_Status status;
>
>     MPI_Init( &argc, &argv );
>     MPI_Comm_rank( MPI_COMM_WORLD, &myrank );
>     MPI_Comm_size( MPI_COMM_WORLD, &np );
>
>     gethostname(hoster,256);
>
>     printf(" In rank %d and host= %s  Do Barrier call 1.\n",myrank,hoster);
>     MPI_Barrier(MPI_COMM_WORLD);
>     printf(" In rank %d and host= %s  Do Barrier call 2.\n",myrank,hoster);
>     MPI_Barrier(MPI_COMM_WORLD);
>     printf(" In rank %d and host= %s  Do Barrier call 3.\n",myrank,hoster);
>     MPI_Barrier(MPI_COMM_WORLD);
>     MPI_Finalize();
>     exit(0);
> }
>
> Here are three runs of the test program: first with two processes on one
> host, then with two processes on another host, and finally with one
> process on each of two hosts. The first two runs are fine, but the last
> run hangs on the second MPI_Barrier.
>
> [root@centos MPI]# /usr/local/bin/mpirun -np 2 --host centos a.out
>  In rank 0 and host= centos  Do Barrier call 1.
>  In rank 1 and ho
Re: [OMPI users] users Digest, Vol 2879, Issue 1
I am configuring with all defaults: just doing a ./configure and then make
and make install. I have used Open MPI on several kinds of Unix systems this
way and have had no trouble before. I believe I last had success on a Red Hat
version of Linux.

On Sat, May 3, 2014 at 11:00 AM, <users-requ...@open-mpi.org> wrote:
>
> Today's Topics:
>
>    1. MPI_Barrier hangs on second attempt but only when multiple
>       hosts used. (Clay Kirkland)
>    2. Re: MPI_Barrier hangs on second attempt but only when
>       multiple hosts used. (Ralph Castain)
>
> ------------------------------
>
> Message: 1
> Date: Fri, 2 May 2014 16:24:04 -0500
> From: Clay Kirkland <clay.kirkl...@versityinc.com>
> To: us...@open-mpi.org
> Subject: [OMPI users] MPI_Barrier hangs on second attempt but only
>         when multiple hosts used.
> Message-ID:
>
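[The default build described above, spelled out. The source directory name is illustrative; with no --prefix argument, the install prefix defaults to /usr/local, which matches the /usr/local/bin/mpirun and /usr/local/lib paths seen elsewhere in the thread:]

```shell
cd openmpi-1.6.5
./configure
make
make install    # installs under /usr/local by default
```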
[OMPI users] MPI_Barrier hangs on second attempt but only when multiple hosts used.
I have been using MPI for many, many years, so I have very well debugged MPI
tests. I am having trouble on either the openmpi-1.4.5 or openmpi-1.6.5
versions, though, with getting the MPI_Barrier calls to work. It works fine
when I run all processes on one machine, but when I run with two or more
hosts, the second call to MPI_Barrier always hangs. Not the first one, but
always the second one. I looked at FAQs and such but found nothing except
for a comment that MPI_Barrier problems were often problems with firewalls.
Also mentioned as a problem was not having the same version of MPI on both
machines. I turned firewalls off and removed and reinstalled the same
version on both hosts, but I still see the same thing. I then installed LAM
MPI on two of my machines and that works fine. I can call the MPI_Barrier
function many times with no hangs when run on one of the two machines by
itself. It only hangs if two or more hosts are involved. These runs are all
being done on CentOS release 6.4. Here is the test program I used.

#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <mpi.h>

main (argc, argv)
int argc;
char **argv;
{
    char message[20];
    char hoster[256];
    char nameis[256];
    int fd, i, j, jnp, iret, myrank, np, ranker, recker;
    MPI_Comm comm;
    MPI_Status status;

    MPI_Init( &argc, &argv );
    MPI_Comm_rank( MPI_COMM_WORLD, &myrank );
    MPI_Comm_size( MPI_COMM_WORLD, &np );

    gethostname(hoster,256);

    printf(" In rank %d and host= %s  Do Barrier call 1.\n",myrank,hoster);
    MPI_Barrier(MPI_COMM_WORLD);
    printf(" In rank %d and host= %s  Do Barrier call 2.\n",myrank,hoster);
    MPI_Barrier(MPI_COMM_WORLD);
    printf(" In rank %d and host= %s  Do Barrier call 3.\n",myrank,hoster);
    MPI_Barrier(MPI_COMM_WORLD);
    MPI_Finalize();
    exit(0);
}

Here are three runs of the test program: first with two processes on one
host, then with two processes on another host, and finally with one process
on each of two hosts. The first two runs are fine, but the last run hangs on
the second MPI_Barrier.

[root@centos MPI]# /usr/local/bin/mpirun -np 2 --host centos a.out
 In rank 0 and host= centos  Do Barrier call 1.
 In rank 1 and host= centos  Do Barrier call 1.
 In rank 1 and host= centos  Do Barrier call 2.
 In rank 1 and host= centos  Do Barrier call 3.
 In rank 0 and host= centos  Do Barrier call 2.
 In rank 0 and host= centos  Do Barrier call 3.

[root@centos MPI]# /usr/local/bin/mpirun -np 2 --host RAID a.out
/root/.bashrc: line 14: unalias: ls: not found
 In rank 0 and host= RAID  Do Barrier call 1.
 In rank 0 and host= RAID  Do Barrier call 2.
 In rank 0 and host= RAID  Do Barrier call 3.
 In rank 1 and host= RAID  Do Barrier call 1.
 In rank 1 and host= RAID  Do Barrier call 2.
 In rank 1 and host= RAID  Do Barrier call 3.

[root@centos MPI]# /usr/local/bin/mpirun -np 2 --host centos,RAID a.out
/root/.bashrc: line 14: unalias: ls: not found
 In rank 0 and host= centos  Do Barrier call 1.
 In rank 0 and host= centos  Do Barrier call 2.
 In rank 1 and host= RAID  Do Barrier call 1.
 In rank 1 and host= RAID  Do Barrier call 2.

Since it is such a simple test and problem, and such a widely used MPI
function, it must obviously be an installation or configuration problem.
A pstack for each of the hung MPI_Barrier processes on the two machines
shows this:

[root@centos ~]# pstack 31666
#0  0x003baf0e8ee3 in __epoll_wait_nocancel () from /lib64/libc.so.6
#1  0x7f5de06125eb in epoll_dispatch () from /usr/local/lib/libmpi.so.1
#2  0x7f5de061475a in opal_event_base_loop () from /usr/local/lib/libmpi.so.1
#3  0x7f5de0639229 in opal_progress () from /usr/local/lib/libmpi.so.1
#4  0x7f5de0586f75 in ompi_request_default_wait_all () from /usr/local/lib/libmpi.so.1
#5  0x7f5ddc59565e in ompi_coll_tuned_sendrecv_actual () from /usr/local/lib/openmpi/mca_coll_tuned.so
#6  0x7f5ddc59d8ff in ompi_coll_tuned_barrier_intra_two_procs () from /usr/local/lib/openmpi/mca_coll_tuned.so
#7  0x7f5de05941c2 in PMPI_Barrier () from /usr/local/lib/libmpi.so.1
#8  0x00400a43 in main ()

[root@RAID openmpi-1.6.5]# pstack 22167
#0  0x0030302e8ee3 in __epoll_wait_nocancel () from /lib64/libc.so.6
#1  0x7f7ee46885eb in epoll_dispatch () from /usr/local/lib/libmpi.so.1
#2  0x7f7ee468a75a in opal_event_base_loop () from /usr/local/lib/libmpi.so.1
#3  0x7f7ee46af229 in opal_progress () from /usr/local/lib/libmpi.so.1
#4  0x7f7ee45fcf75 in ompi_request_default_wait_all () from /usr/local/lib/libmpi.so.1
#5  0x7f7ee060b65e in ompi_coll_tuned_sendrecv_actual () from /usr/local/lib/openmpi/mca_coll_tuned.so
#6  0x7f7ee06138ff in ompi_coll_tuned_barrier_intra_two_procs () from /usr/local/lib/openmpi/mca_coll_tuned.so
#7  0x7f7ee460a1c2 in PMPI_Barrier () from /usr/local/lib/libmpi.so.1
#8  0x00400a43 in main ()

Which looks exactly the same on each machine. Any thoughts or ideas would be
greatly appreciated, as I am stuck.

Clay Kirkland