Re: [OMPI users] OpenMPI / SLURM -> Send/Recv blocking
When you run without a hostfile, you're likely only running on a single node via shared memory (unless you're running inside a SLURM job, which is unlikely, given the context of your mails). When you're running in SLURM, I'm guessing that you're running across multiple nodes. Are you using TCP as your MPI transport? If so, I would still recommend trying to stop iptables altogether -- /etc/init.d/iptables stop. It might not make a difference, but I've found iptables to be sufficiently complex that it's easier to take that variable out of the picture altogether by stopping it, to really test whether that's the problem.

On Feb 2, 2012, at 9:48 AM, adrian sabou wrote:

> Hi,
>
> I have disabled iptables on all nodes using:
>
> iptables -F
> iptables -X
> iptables -t nat -F
> iptables -t nat -X
> iptables -t mangle -F
> iptables -t mangle -X
> iptables -P INPUT ACCEPT
> iptables -P FORWARD ACCEPT
> iptables -P OUTPUT ACCEPT
>
> My problem is still there. I have re-enabled iptables. The current output of the "iptables --list" command is:
>
> Chain INPUT (policy ACCEPT)
> target prot opt source       destination
> ACCEPT udp  --  anywhere     anywhere     udp dpt:domain
> ACCEPT tcp  --  anywhere     anywhere     tcp dpt:domain
> ACCEPT udp  --  anywhere     anywhere     udp dpt:bootps
> ACCEPT tcp  --  anywhere     anywhere     tcp dpt:bootps
>
> Chain FORWARD (policy ACCEPT)
> target prot opt source           destination
> ACCEPT all  --  anywhere         192.168.122.0/24  state RELATED,ESTABLISHED
> ACCEPT all  --  192.168.122.0/24 anywhere
> ACCEPT all  --  anywhere         anywhere
> REJECT all  --  anywhere         anywhere          reject-with icmp-port-unreachable
> REJECT all  --  anywhere         anywhere          reject-with icmp-port-unreachable
>
> Chain OUTPUT (policy ACCEPT)
> target prot opt source       destination
>
> I don't think this is it.
> I have tried to run a simple ping-pong program that I found (it keeps bouncing a value between two processes) and I keep getting the same results: the first Send/Receive pairs (p1 sends to p2, p2 receives and sends back to p1, p1 receives) work, and after that the program just blocks. However, like all the other examples, it works if I launch it with "mpirun -np 2 " and bounces the value 100 times.
>
> Adrian
>
> From: Jeff Squyres
> To: adrian sabou ; Open MPI Users
> Sent: Thursday, February 2, 2012 3:09 PM
> Subject: Re: [OMPI users] OpenMPI / SLURM -> Send/Recv blocking
>
> Have you disabled iptables (firewalling) on your nodes?
>
> Or, if you want to leave iptables enabled, set it such that all nodes in your cluster are allowed to open TCP connections from any port to any other port.
>
> On Feb 2, 2012, at 4:49 AM, adrian sabou wrote:
>
> > Hi,
> >
> > The only example that works is hello_c.c. All the others (that use MPI_Send and MPI_Recv, i.e. connectivity_c.c and ring_c.c) block after the first MPI_Send / MPI_Recv (although the first Send/Receive pair works well for all processes, subsequent Send/Receive pairs block). My SLURM version is 2.1.0. It is also worth mentioning that all examples work when not using SLURM (launching with "mpirun -np 5 "). Blocking occurs only when I try to run on multiple hosts with SLURM ("salloc -N5 mpirun ").
> >
> > Adrian
> >
> > From: Jeff Squyres
> > To: adrian sabou ; Open MPI Users
> > Sent: Wednesday, February 1, 2012 10:32 PM
> > Subject: Re: [OMPI users] OpenMPI / SLURM -> Send/Recv blocking
> >
> > On Jan 31, 2012, at 11:16 AM, adrian sabou wrote:
> >
> > > Like I said, a very simple program.
> > > When launching this application with SLURM (using "salloc -N2 mpirun ./"), it hangs at the barrier.
> >
> > Are you able to run the MPI example programs in examples/ ?
> >
> > > However, it passes the barrier if I launch it without SLURM (using "mpirun -np 2 ./").
> > > I first noticed this problem when my application hung if I tried to send two successive messages from one process to another. Only the first MPI_Send would work. The second MPI_Send would block indefinitely. I was wondering whether any of you have encountered a similar problem, or may have an idea as to what is causing the Send/Receive pair to block when using SLURM. The exact output in my console is as follows:
> > >
> > > salloc: Granted job allocation 1138
> > > Process 0 - Sending...
> > > Process 1 - Receiving...
> > > Process 1 - Received.
> > > Process 1 - Barrier reached.
> > > Process 0 - Sent.
> > > Process 0 - Barrier reached.
> > > (it just hangs here)
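[Editor's note] For readers following along, a minimal ping-pong of the kind described in this thread might look like the sketch below. This is an illustration, not the poster's actual program; it assumes a working Open MPI installation and exactly two active ranks, launched with something like "mpirun -np 2 ./pingpong" or "salloc -N2 mpirun ./pingpong".

```c
/* Hypothetical ping-pong sketch: ranks 0 and 1 bounce an integer 100 times. */
#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, i, value = 0;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    for (i = 0; i < 100; i++) {
        if (rank == 0) {
            /* Send the value to rank 1, then wait for it to come back. */
            MPI_Send(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
            MPI_Recv(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
        } else if (rank == 1) {
            /* Receive, increment, and bounce the value back to rank 0. */
            MPI_Recv(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
            value++;
            MPI_Send(&value, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
        }
    }

    if (rank == 0)
        printf("final value = %d\n", value);

    MPI_Finalize();
    return 0;
}
```

If a program like this completes under plain mpirun but stalls after the first exchange under salloc across nodes, that pattern is consistent with the TCP connections between nodes being blocked: a small first MPI_Send can appear to complete eagerly, while later traffic needs the fully established connection.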
Re: [OMPI users] Using physical numbering in a rankfile
Actually, that's not quite true -- the 1.5 series technically still supports assignment to physical cpus. However, that code path is rarely tested and very unusual for someone to use, so I suspect it is broken, and I very much doubt anyone will fix it. Also, be aware that physical cpu assignments are not supported in the current developer trunk, and that will likely remain the case when it is released as the 1.7 series and going forward.

On Feb 2, 2012, at 10:17 AM, teng ma wrote:

> Just remove the p in your rankfile, like:
>
> rank 0=host1 slot=0:0
> rank 1=host1 slot=0:2
> rank 2=host1 slot=0:4
> rank 3=host1 slot=0:6
> rank 4=host1 slot=1:1
> rank 5=host1 slot=1:3
> rank 6=host1 slot=1:5
> rank 7=host1 slot=1:7
>
> Teng
>
> 2012/2/2 François Tessier:
>
> Hello,
>
> I need to use a rankfile with openMPI 1.5.4 to do some tests on a basic architecture. I'm using a node for which lstopo returns this:
>
> Machine (24GB)
>   NUMANode L#0 (P#0 12GB)
>     Socket L#0 + L3 L#0 (8192KB)
>       L2 L#0 (256KB) + L1 L#0 (32KB) + Core L#0 + PU L#0 (P#0)
>       L2 L#1 (256KB) + L1 L#1 (32KB) + Core L#1 + PU L#1 (P#2)
>       L2 L#2 (256KB) + L1 L#2 (32KB) + Core L#2 + PU L#2 (P#4)
>       L2 L#3 (256KB) + L1 L#3 (32KB) + Core L#3 + PU L#3 (P#6)
>     HostBridge L#0
>       PCIBridge
>         PCI 8086:10c9
>           Net L#0 "eth0"
>         PCI 8086:10c9
>           Net L#1 "eth1"
>       PCIBridge
>         PCI 15b3:673c
>           Net L#2 "ib0"
>           Net L#3 "ib1"
>           OpenFabrics L#4 "mlx4_0"
>       PCIBridge
>         PCI 102b:0522
>       PCI 8086:3a22
>         Block L#5 "sda"
>         Block L#6 "sdb"
>         Block L#7 "sdc"
>         Block L#8 "sdd"
>   NUMANode L#1 (P#1 12GB) + Socket L#1 + L3 L#1 (8192KB)
>     L2 L#4 (256KB) + L1 L#4 (32KB) + Core L#4 + PU L#4 (P#1)
>     L2 L#5 (256KB) + L1 L#5 (32KB) + Core L#5 + PU L#5 (P#3)
>     L2 L#6 (256KB) + L1 L#6 (32KB) + Core L#6 + PU L#6 (P#5)
>     L2 L#7 (256KB) + L1 L#7 (32KB) + Core L#7 + PU L#7 (P#7)
>
> And I would like to use the physical numbering.
> To do that, I created a rankfile like this:
>
> rank 0=host1 slot=p0:0
> rank 1=host1 slot=p0:2
> rank 2=host1 slot=p0:4
> rank 3=host1 slot=p0:6
> rank 4=host1 slot=p1:1
> rank 5=host1 slot=p1:3
> rank 6=host1 slot=p1:5
> rank 7=host1 slot=p1:7
>
> But when I run my job with "mpiexec -np 8 --rankfile rankfile ./foo", I encounter this error:
>
> Specified slot list: p0:4
> Error: Not found
>
> This could mean that a non-existent processor was specified, or that the specification had improper syntax.
>
> Do you know what I did wrong?
>
> Best regards,
>
> François
>
> --
> François TESSIER
> PhD Student at University of Bordeaux
> Tel : 0033.5.24.57.41.52
> francois.tess...@inria.fr
>
> --
> | Teng Ma          Univ. of Tennessee |
> | t...@cs.utk.edu       Knoxville, TN |
> | http://web.eecs.utk.edu/~tma/       |

___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users
Re: [OMPI users] Using physical numbering in a rankfile
I made a mistake in my previous reply. You can use either of two forms here:

rank 0=host1 slot=0
rank 1=host1 slot=2
rank 2=host1 slot=4
rank 3=host1 slot=6
rank 4=host1 slot=1
rank 5=host1 slot=3
rank 6=host1 slot=5
rank 7=host1 slot=7

or

rank 0=host1 slot=0:0
rank 1=host1 slot=0:1
rank 2=host1 slot=0:2
rank 3=host1 slot=0:3
rank 4=host1 slot=1:0
rank 5=host1 slot=1:1
rank 6=host1 slot=1:2
rank 7=host1 slot=1:3

Teng

On Thu, Feb 2, 2012 at 12:17 PM, teng ma wrote:

> Just remove the p in your rankfile, like:
>
> rank 0=host1 slot=0:0
> rank 1=host1 slot=0:2
> rank 2=host1 slot=0:4
> rank 3=host1 slot=0:6
> rank 4=host1 slot=1:1
> rank 5=host1 slot=1:3
> rank 6=host1 slot=1:5
> rank 7=host1 slot=1:7
>
> Teng

--
| Teng Ma          Univ. of Tennessee |
| t...@cs.utk.edu       Knoxville, TN |
| http://web.eecs.utk.edu/~tma/       |
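[Editor's note] As a sanity check on either form, mpirun/mpiexec can print where each rank actually landed via its --report-bindings option (an option of this era's Open MPI; "host1", "rankfile", and "./foo" are the placeholders from this thread, so treat the exact command as a sketch):

```shell
# Run with the corrected rankfile and ask Open MPI to report each rank's
# binding on stderr, so the slot-to-core mapping can be verified by eye.
mpiexec -np 8 --rankfile rankfile --report-bindings ./foo
```

Comparing the reported bindings against the lstopo output above is the quickest way to confirm whether the rankfile's logical numbering maps to the intended physical PUs.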
Re: [OMPI users] Using physical numbering in a rankfile
Just remove the p in your rankfile, like:

rank 0=host1 slot=0:0
rank 1=host1 slot=0:2
rank 2=host1 slot=0:4
rank 3=host1 slot=0:6
rank 4=host1 slot=1:1
rank 5=host1 slot=1:3
rank 6=host1 slot=1:5
rank 7=host1 slot=1:7

Teng

2012/2/2 François Tessier:

> Hello,
>
> I need to use a rankfile with openMPI 1.5.4 to do some tests on a basic architecture. I'm using a node for which lstopo returns this:
>
> Machine (24GB)
>   NUMANode L#0 (P#0 12GB)
>     Socket L#0 + L3 L#0 (8192KB)
>       L2 L#0 (256KB) + L1 L#0 (32KB) + Core L#0 + PU L#0 (P#0)
>       L2 L#1 (256KB) + L1 L#1 (32KB) + Core L#1 + PU L#1 (P#2)
>       L2 L#2 (256KB) + L1 L#2 (32KB) + Core L#2 + PU L#2 (P#4)
>       L2 L#3 (256KB) + L1 L#3 (32KB) + Core L#3 + PU L#3 (P#6)
>     HostBridge L#0
>       PCIBridge
>         PCI 8086:10c9
>           Net L#0 "eth0"
>         PCI 8086:10c9
>           Net L#1 "eth1"
>       PCIBridge
>         PCI 15b3:673c
>           Net L#2 "ib0"
>           Net L#3 "ib1"
>           OpenFabrics L#4 "mlx4_0"
>       PCIBridge
>         PCI 102b:0522
>       PCI 8086:3a22
>         Block L#5 "sda"
>         Block L#6 "sdb"
>         Block L#7 "sdc"
>         Block L#8 "sdd"
>   NUMANode L#1 (P#1 12GB) + Socket L#1 + L3 L#1 (8192KB)
>     L2 L#4 (256KB) + L1 L#4 (32KB) + Core L#4 + PU L#4 (P#1)
>     L2 L#5 (256KB) + L1 L#5 (32KB) + Core L#5 + PU L#5 (P#3)
>     L2 L#6 (256KB) + L1 L#6 (32KB) + Core L#6 + PU L#6 (P#5)
>     L2 L#7 (256KB) + L1 L#7 (32KB) + Core L#7 + PU L#7 (P#7)
>
> And I would like to use the physical numbering. To do that, I created a rankfile like this:
>
> rank 0=host1 slot=p0:0
> rank 1=host1 slot=p0:2
> rank 2=host1 slot=p0:4
> rank 3=host1 slot=p0:6
> rank 4=host1 slot=p1:1
> rank 5=host1 slot=p1:3
> rank 6=host1 slot=p1:5
> rank 7=host1 slot=p1:7
>
> But when I run my job with "mpiexec -np 8 --rankfile rankfile ./foo", I encounter this error:
>
> Specified slot list: p0:4
> Error: Not found
>
> This could mean that a non-existent processor was specified, or that the specification had improper syntax.
>
> Do you know what I did wrong?
> Best regards,
>
> François
>
> --
> François TESSIER
> PhD Student at University of Bordeaux
> Tel : 0033.5.24.57.41.52
> francois.tess...@inria.fr

--
| Teng Ma          Univ. of Tennessee |
| t...@cs.utk.edu       Knoxville, TN |
| http://web.eecs.utk.edu/~tma/       |
[OMPI users] Using physical numbering in a rankfile
Hello,

I need to use a rankfile with openMPI 1.5.4 to do some tests on a basic architecture. I'm using a node for which lstopo returns this:

Machine (24GB)
  NUMANode L#0 (P#0 12GB)
    Socket L#0 + L3 L#0 (8192KB)
      L2 L#0 (256KB) + L1 L#0 (32KB) + Core L#0 + PU L#0 (P#0)
      L2 L#1 (256KB) + L1 L#1 (32KB) + Core L#1 + PU L#1 (P#2)
      L2 L#2 (256KB) + L1 L#2 (32KB) + Core L#2 + PU L#2 (P#4)
      L2 L#3 (256KB) + L1 L#3 (32KB) + Core L#3 + PU L#3 (P#6)
    HostBridge L#0
      PCIBridge
        PCI 8086:10c9
          Net L#0 "eth0"
        PCI 8086:10c9
          Net L#1 "eth1"
      PCIBridge
        PCI 15b3:673c
          Net L#2 "ib0"
          Net L#3 "ib1"
          OpenFabrics L#4 "mlx4_0"
      PCIBridge
        PCI 102b:0522
      PCI 8086:3a22
        Block L#5 "sda"
        Block L#6 "sdb"
        Block L#7 "sdc"
        Block L#8 "sdd"
  NUMANode L#1 (P#1 12GB) + Socket L#1 + L3 L#1 (8192KB)
    L2 L#4 (256KB) + L1 L#4 (32KB) + Core L#4 + PU L#4 (P#1)
    L2 L#5 (256KB) + L1 L#5 (32KB) + Core L#5 + PU L#5 (P#3)
    L2 L#6 (256KB) + L1 L#6 (32KB) + Core L#6 + PU L#6 (P#5)
    L2 L#7 (256KB) + L1 L#7 (32KB) + Core L#7 + PU L#7 (P#7)

And I would like to use the physical numbering. To do that, I created a rankfile like this:

rank 0=host1 slot=p0:0
rank 1=host1 slot=p0:2
rank 2=host1 slot=p0:4
rank 3=host1 slot=p0:6
rank 4=host1 slot=p1:1
rank 5=host1 slot=p1:3
rank 6=host1 slot=p1:5
rank 7=host1 slot=p1:7

But when I run my job with "mpiexec -np 8 --rankfile rankfile ./foo", I encounter this error:

Specified slot list: p0:4
Error: Not found

This could mean that a non-existent processor was specified, or that the specification had improper syntax.

Do you know what I did wrong?

Best regards,

François

--
François TESSIER
PhD Student at University of Bordeaux
Tel : 0033.5.24.57.41.52
francois.tess...@inria.fr
Re: [OMPI users] OpenMPI / SLURM -> Send/Recv blocking
Hi,

I have disabled iptables on all nodes using:

iptables -F
iptables -X
iptables -t nat -F
iptables -t nat -X
iptables -t mangle -F
iptables -t mangle -X
iptables -P INPUT ACCEPT
iptables -P FORWARD ACCEPT
iptables -P OUTPUT ACCEPT

My problem is still there. I have re-enabled iptables. The current output of the "iptables --list" command is:

Chain INPUT (policy ACCEPT)
target prot opt source       destination
ACCEPT udp  --  anywhere     anywhere     udp dpt:domain
ACCEPT tcp  --  anywhere     anywhere     tcp dpt:domain
ACCEPT udp  --  anywhere     anywhere     udp dpt:bootps
ACCEPT tcp  --  anywhere     anywhere     tcp dpt:bootps

Chain FORWARD (policy ACCEPT)
target prot opt source           destination
ACCEPT all  --  anywhere         192.168.122.0/24  state RELATED,ESTABLISHED
ACCEPT all  --  192.168.122.0/24 anywhere
ACCEPT all  --  anywhere         anywhere
REJECT all  --  anywhere         anywhere          reject-with icmp-port-unreachable
REJECT all  --  anywhere         anywhere          reject-with icmp-port-unreachable

Chain OUTPUT (policy ACCEPT)
target prot opt source       destination

I don't think this is it. I have tried to run a simple ping-pong program that I found (it keeps bouncing a value between two processes) and I keep getting the same results: the first Send/Receive pairs (p1 sends to p2, p2 receives and sends back to p1, p1 receives) work, and after that the program just blocks. However, like all the other examples, it works if I launch it with "mpirun -np 2 " and bounces the value 100 times.

Adrian

From: Jeff Squyres
To: adrian sabou ; Open MPI Users
Sent: Thursday, February 2, 2012 3:09 PM
Subject: Re: [OMPI users] OpenMPI / SLURM -> Send/Recv blocking

Have you disabled iptables (firewalling) on your nodes?

Or, if you want to leave iptables enabled, set it such that all nodes in your cluster are allowed to open TCP connections from any port to any other port.

On Feb 2, 2012, at 4:49 AM, adrian sabou wrote:

> Hi,
>
> The only example that works is hello_c.c. All the others (that use MPI_Send and MPI_Recv, i.e. connectivity_c.c and ring_c.c) block after the first MPI_Send / MPI_Recv (although the first Send/Receive pair works well for all processes, subsequent Send/Receive pairs block). My SLURM version is 2.1.0. It is also worth mentioning that all examples work when not using SLURM (launching with "mpirun -np 5 "). Blocking occurs only when I try to run on multiple hosts with SLURM ("salloc -N5 mpirun ").
>
> Adrian

--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/
Re: [OMPI users] OpenMPI / SLURM -> Send/Recv blocking
Have you disabled iptables (firewalling) on your nodes?

Or, if you want to leave iptables enabled, set it such that all nodes in your cluster are allowed to open TCP connections from any port to any other port.

On Feb 2, 2012, at 4:49 AM, adrian sabou wrote:

> Hi,
>
> The only example that works is hello_c.c. All the others (that use MPI_Send and MPI_Recv, i.e. connectivity_c.c and ring_c.c) block after the first MPI_Send / MPI_Recv (although the first Send/Receive pair works well for all processes, subsequent Send/Receive pairs block). My SLURM version is 2.1.0. It is also worth mentioning that all examples work when not using SLURM (launching with "mpirun -np 5 "). Blocking occurs only when I try to run on multiple hosts with SLURM ("salloc -N5 mpirun ").
>
> Adrian
>
> From: Jeff Squyres
> To: adrian sabou ; Open MPI Users
> Sent: Wednesday, February 1, 2012 10:32 PM
> Subject: Re: [OMPI users] OpenMPI / SLURM -> Send/Recv blocking
>
> On Jan 31, 2012, at 11:16 AM, adrian sabou wrote:
>
> > Like I said, a very simple program.
> > When launching this application with SLURM (using "salloc -N2 mpirun ./"), it hangs at the barrier.
>
> Are you able to run the MPI example programs in examples/ ?
>
> > However, it passes the barrier if I launch it without SLURM (using "mpirun -np 2 ./"). I first noticed this problem when my application hung if I tried to send two successive messages from one process to another. Only the first MPI_Send would work. The second MPI_Send would block indefinitely. I was wondering whether any of you have encountered a similar problem, or may have an idea as to what is causing the Send/Receive pair to block when using SLURM. The exact output in my console is as follows:
> >
> > salloc: Granted job allocation 1138
> > Process 0 - Sending...
> > Process 1 - Receiving...
> > Process 1 - Received.
> > Process 1 - Barrier reached.
> > Process 0 - Sent.
> > Process 0 - Barrier reached.
> > (it just hangs here)
> >
> > I am new to MPI programming and to OpenMPI and would greatly appreciate any help. My OpenMPI version is 1.4.4 (although I have also tried it on 1.5.4), my SLURM version is 0.3.3-1 (slurm-llnl 2.1.0-1).
>
> I'm not sure what SLURM version that is -- my "srun --version" shows 2.2.4. 0.3.3 would be pretty ancient, no?

--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/
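[Editor's note] Concretely, the two options suggested above might look like the sketch below. The 192.168.122.0/24 subnet is taken from the iptables output earlier in the thread and may not be the subnet the nodes actually use for MPI traffic, so treat it as a placeholder; both commands require root.

```shell
# Option 1: take the firewall out of the equation entirely while testing
# (run on every node; RHEL/CentOS init-script style):
/etc/init.d/iptables stop

# Option 2: keep iptables running, but accept all TCP from cluster peers.
# 192.168.122.0/24 is a placeholder -- substitute your nodes' real subnet.
iptables -I INPUT -p tcp -s 192.168.122.0/24 -j ACCEPT
```

Option 1 is the cleaner diagnostic: if the hang disappears with iptables stopped, the rule set is the problem and a narrower rule like option 2 can then be crafted.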
Re: [OMPI users] Error building Openmpi (configure: error: C compiler cannot create executables)
Both icc and gcc seem to be broken on your system; they're not creating executables. You can look in config.log for more details about what is failing. Basically, configure tries to compile a simple "hello world"-like C program, and that is failing. You might want to try compiling such a program yourself and see what error messages come out of the compiler.

On Feb 2, 2012, at 5:19 AM, Syed Ahsan Ali wrote:

> Dear All,
>
> I have been stuck in the installation of openmpi-1.4.2 on RHEL5.2 with ifort and icc. I get the following error while configuring. Please help.
>
> [precis@precis2 openmpi-1.4.2]$ ./build.sh
> [...]
> *** C compiler and preprocessor
> checking for style of include used by make... GNU
> checking for gcc... icc
> checking for C compiler default output file name...
> configure: error: in `/home/precis/opemmpi/openmpi-1.4.2':
> configure: error: C compiler cannot create executables
> See `config.log' for more details.
> [...]
[OMPI users] Error building Openmpi (configure: error: C compiler cannot create executables)
Dear All,

I have been stuck in the installation of openmpi-1.4.2 on RHEL5.2 with ifort and icc. I get the following error while configuring. Please help.

[precis@precis2 openmpi-1.4.2]$ ./build.sh
checking for a BSD-compatible install... /usr/bin/install -c
checking whether build environment is sane... yes
checking for a thread-safe mkdir -p... /bin/mkdir -p
checking for gawk... gawk
checking whether make sets $(MAKE)... yes
checking how to create a ustar tar archive... gnutar

== Configuring Open MPI

*** Checking versions
checking Open MPI version... 1.4.2
checking Open MPI release date... May 04, 2010
checking Open MPI Subversion repository version... r23093
checking Open Run-Time Environment version... 1.4.2
checking Open Run-Time Environment release date... May 04, 2010
checking Open Run-Time Environment Subversion repository version... r23093
checking Open Portable Access Layer version... 1.4.2
checking Open Portable Access Layer release date... May 04, 2010
checking Open Portable Access Layer Subversion repository version... r23093

*** Initialization, setup
configure: builddir: /home/precis/opemmpi/openmpi-1.4.2
configure: srcdir: /home/precis/opemmpi/openmpi-1.4.2
checking build system type... i686-pc-linux-gnu
checking host system type... i686-pc-linux-gnu
installing to directory "/home/openmpi"

*** Configuration options
checking whether to run code coverage... no
checking whether to compile with branch probabilities... no
checking whether to debug memory usage... no
checking whether to profile memory usage... no
checking if want developer-level compiler pickyness... no
checking if want developer-level debugging code... no
checking if want sparse process groups... no
checking if want Fortran 77 bindings... yes
checking if want Fortran 90 bindings... yes
checking desired Fortran 90 bindings "size"... small
checking whether to enable PMPI... yes
checking if want C++ bindings... yes
checking if want MPI::SEEK_SET support... yes
checking if want to enable weak symbol support... yes
checking if want run-time MPI parameter checking... runtime
checking if want to install OMPI header files... no
checking if want pretty-print stacktrace... yes
checking if peruse support is required... no
checking max supported array dimension in F90 MPI bindings... 4
checking if pty support should be enabled... yes
checking if user wants dlopen support... yes
checking if heterogeneous support should be enabled... no
checking if want trace file debugging... no
checking if want full RTE support... yes
checking if want fault tolerance... Disabled fault tolerance
checking if want IPv6 support... yes (if underlying system supports it)
checking if want orterun "--prefix" behavior to be enabled by default... no
checking for package/brand string... Open MPI pre...@precis2.pakmet.com.pk Distribution
checking for ident string... 1.4.2
checking whether to add padding to the openib control header... no
checking whether to use an alternative checksum algo for messages... no

== Compiler and preprocessor tests

*** C compiler and preprocessor
checking for style of include used by make... GNU
checking for gcc... icc
checking for C compiler default output file name...
configure: error: in `/home/precis/opemmpi/openmpi-1.4.2':
configure: error: C compiler cannot create executables
See `config.log' for more details.
make: *** No targets specified and no makefile found. Stop.
make: *** No rule to make target `install'. Stop.
[precis@precis2 openmpi-1.4.2]$ clean all
bash: clean: command not found
[precis@precis2 openmpi-1.4.2]$ make clean
make: *** No rule to make target `clean'. Stop.
[precis@precis2 openmpi-1.4.2]$ make
make: *** No targets specified and no makefile found. Stop.
[precis@precis2 openmpi-1.4.2]$ ./configure
checking for a BSD-compatible install... /usr/bin/install -c
checking whether build environment is sane... yes
checking for a thread-safe mkdir -p... /bin/mkdir -p
checking for gawk... gawk
checking whether make sets $(MAKE)... yes
checking how to create a ustar tar archive... gnutar

== Configuring Open MPI

*** Checking versions
checking Open MPI version... 1.4.2
checking Open MPI release date... May 04, 2010
checking Open MPI Subversion repository version... r23093
checking Open Run-Time Environment version... 1.4.2
checking Open Run-Time Environment release date... May 04, 2010
checking Open Run-Time Environment Subversion repository version... r23093
checking Open Portable Access Layer version... 1.4.2
checking Open Portable Access Layer release date... May 04,
Re: [OMPI users] OpenMPI / SLURM -> Send/Recv blocking
Hi,

The only example that works is hello_c.c. All the others (that use MPI_Send and MPI_Recv, i.e. connectivity_c.c and ring_c.c) block after the first MPI_Send / MPI_Recv (although the first Send/Receive pair works well for all processes, subsequent Send/Receive pairs block). My SLURM version is 2.1.0. It is also worth mentioning that all examples work when not using SLURM (launching with "mpirun -np 5 "). Blocking occurs only when I try to run on multiple hosts with SLURM ("salloc -N5 mpirun ").

Adrian

From: Jeff Squyres
To: adrian sabou ; Open MPI Users
Sent: Wednesday, February 1, 2012 10:32 PM
Subject: Re: [OMPI users] OpenMPI / SLURM -> Send/Recv blocking

On Jan 31, 2012, at 11:16 AM, adrian sabou wrote:

> Like I said, a very simple program.
> When launching this application with SLURM (using "salloc -N2 mpirun ./"), it hangs at the barrier.

Are you able to run the MPI example programs in examples/ ?

> However, it passes the barrier if I launch it without SLURM (using "mpirun -np 2 ./"). I first noticed this problem when my application hung if I tried to send two successive messages from one process to another. Only the first MPI_Send would work. The second MPI_Send would block indefinitely. I was wondering whether any of you have encountered a similar problem, or may have an idea as to what is causing the Send/Receive pair to block when using SLURM. The exact output in my console is as follows:
>
> salloc: Granted job allocation 1138
> Process 0 - Sending...
> Process 1 - Receiving...
> Process 1 - Received.
> Process 1 - Barrier reached.
> Process 0 - Sent.
> Process 0 - Barrier reached.
> (it just hangs here)
>
> I am new to MPI programming and to OpenMPI and would greatly appreciate any help. My OpenMPI version is 1.4.4 (although I have also tried it on 1.5.4), my SLURM version is 0.3.3-1 (slurm-llnl 2.1.0-1).

I'm not sure what SLURM version that is -- my "srun --version" shows 2.2.4. 0.3.3 would be pretty ancient, no?
--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/