Re: [OMPI users] openmpi failed the hello world test

2012-09-25 Thread Ralph Castain
Try installing the latest 1.6.2 release candidate and see if it still happens. I can't replicate the behavior on my system. On Sep 24, 2012, at 7:02 PM, "Liu, Hanzhong" wrote: > Thanks for your reply. The command I used is "mpirun -np 4 ./hello" > > From:

Re: [OMPI users] Problem runing MPI on cluster

2012-09-25 Thread Ralph Castain
On Sep 24, 2012, at 6:13 PM, Mariana Vargas Magana wrote: > > > Yes, you are right, this is what it says, but in fact the weird thing is that > the error message does not always appear. I send to 20 nodes and only one > gives this message; is this normal? Yes - that is precisely the behavior y

[OMPI users] mpi job is blocked

2012-09-25 Thread Richard
I have 3 computers with the same Linux system. I have set up the MPI cluster based on ssh connections. I have tested a very simple MPI program, and it works on the cluster. To make my story clear, I name the three computers A, B and C. 1) If I run the job with 2 processes on A and B, it works. 2)

Re: [OMPI users] mpi job is blocked

2012-09-25 Thread jody
Hi Richard, when a collective call hangs, this usually means that one (or more) processes did not reach that call. Are you sure that all processes reach the allreduce statement? When something like this happens to me, I insert print statements just before the MPI call so I can see which processes

Re: [OMPI users] mpi job is blocked

2012-09-25 Thread Jeff Squyres
+1 Additionally, if you're trying to debug your machines/network/setup, you might want to use something simpler, like the ring programs in the examples/ directory. On Sep 25, 2012, at 9:43 AM, jody wrote: > Hi Richard > > When a collective call hangs, this usually means that one (or more) >

Re: [OMPI users] Algorithms used in MPI_BCast

2012-09-25 Thread Iliev, Hristo
Hello, At the moment I'm building them manually but I am also thinking of a measurement framework to do it automatically in the future. Best regards, Hristo -- Hristo Iliev, Ph.D. -- High Performance Computing RWTH Aachen University, Center for Computing and Communication Rechen- und Kom

Re: [OMPI users] openmpi failed the hello world test

2012-09-25 Thread Jeff Squyres
Also, please send all the information listed here: http://www.open-mpi.org/community/help/ On Sep 25, 2012, at 6:09 AM, Ralph Castain wrote: > Try installing the latest 1.6.2 release candidate and see if it still > happens. I can't replicate the behavior on system. > > > On Sep 24, 2012,

Re: [OMPI users] mpi job is blocked

2012-09-25 Thread Richard
Hi Jody, thanks for your suggestion, and you are right. If I use the ring example, the same thing happens. I have added a printf statement, and it seems that all three processes have reached the line calling "PMPI_Allreduce". Any further suggestions? Thanks. Richard Message: 12 List-Post: users@list

Re: [OMPI users] mpi job is blocked

2012-09-25 Thread Richard
When I tried the ring program, the first pass around the ring was fine, but the second was blocked at some node. Here is the output: Process 0 sending 10 to 1, tag 201 (3 processes in ring) Process 0 sent to 1 rank 1, message 10,start=== rank 1, message 10,end- rank 2

Re: [OMPI users] mpi job is blocked

2012-09-25 Thread Richard
Sometimes the following message appears when I run the ring program, but not always. I do not recognize the IP address 192.168.122.1; it is not in my list of hosts. [[53402,1],6][btl_tcp_endpoint.c:638:mca_btl_tcp_endpoint_complete_connect] connect() to 192.168.122.1 failed: Connection refuse

Re: [OMPI users] mpi job is blocked

2012-09-25 Thread Jeff Squyres
Have you disabled firewalls on your nodes (e.g., iptables)? On Sep 25, 2012, at 11:08 AM, Richard wrote: > sometimes the following message jumped out when I run the ring program, but > not always. > I do not know this ip address 192.168.122.1, it is not in my list of hosts. > > > [[53402,1],6]

Re: [OMPI users] mpi job is blocked

2012-09-25 Thread Jeff Squyres
Actually, I didn't read your message closely enough -- sorry. If you're getting a message about an IP address that is unknown to you, this suggests that there might be something wonky in your network setup. Can you send all the information listed here: http://www.open-mpi.org/community/hel

Re: [OMPI users] mpi job is blocked

2012-09-25 Thread Richard
I used "chkconfig --list iptables"; none of the computers is set to "on". At 2012-09-25 17:54:53,"Jeff Squyres" wrote: >Have you disabled firewalls on your nodes (e.g., iptables)? > >On Sep 25, 2012, at 11:08 AM, Richard wrote: > >> sometimes the following message jumped out when I run the ring pro

[OMPI users] mpi test program "ring" failed: blocked at MPI_Send

2012-09-25 Thread Richard
I have set up a small cluster with 3 nodes, named A, B and C respectively. I tested the ring_c.c program in the examples. For debugging purposes, I added some print statements to the original ring_c.c as follows: >> 60 printf("rank %d, message %d,start===\n", rank, message); >>

Re: [OMPI users] mpi test program "ring" failed: blocked at MPI_Send

2012-09-25 Thread Jeff Squyres
Ah, I see the problem. See this FAQ entry: http://www.open-mpi.org/faq/?category=tcp#tcp-selection You want to exclude the virbr0 interfaces on your nodes; they're local-only interfaces (that's where the 192.168.122.x addresses are coming from) that, IIRC, have something to do with virtual

Re: [OMPI users] mpi test program "ring" failed: blocked at MPI_Send

2012-09-25 Thread Jeff Squyres
On Sep 25, 2012, at 2:28 PM, Jeff Squyres wrote: > mpirun --mca btl_if_exclude virbr0 ... Gah; sorry, that should be: mpirun --mca btl_tcp_if_exclude virbr0 ... I forgot the "tcp" there in the middle. -- Jeff Squyres jsquy...@cisco.com For corporate legal information go to: http://www.ci
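For readers who script their launches, here is a minimal Python sketch of how the corrected command line might be assembled. The helper name `mpirun_argv` and the program name `./ring_c` are hypothetical; `btl_tcp_if_include` and `btl_tcp_if_exclude` are the MCA parameters named in this thread. The sketch assumes, following the Open MPI TCP FAQ, that the two parameters are mutually exclusive and that an explicit exclude list replaces the default one, so the loopback interface `lo` is added back next to `virbr0`.

```python
def mpirun_argv(np, program, include=None, exclude=None):
    """Build an mpirun argument list that restricts the TCP BTL's interfaces.

    Hypothetical helper for illustration; it only assembles strings and
    does not launch anything.
    """
    if include and exclude:
        # Assumption from the FAQ: use one list or the other, not both.
        raise ValueError("use either include or exclude, not both")
    argv = ["mpirun"]
    if include:
        argv += ["--mca", "btl_tcp_if_include", ",".join(include)]
    elif exclude:
        names = list(exclude)
        # Assumption: overriding the exclude list drops the default
        # loopback exclusion, so put "lo" back explicitly.
        if "lo" not in names:
            names.insert(0, "lo")
        argv += ["--mca", "btl_tcp_if_exclude", ",".join(names)]
    argv += ["-np", str(np), program]
    return argv


print(" ".join(mpirun_argv(3, "./ring_c", exclude=["virbr0"])))
print(" ".join(mpirun_argv(3, "./ring_c", include=["bond0"])))
```

The first call prints a command using `btl_tcp_if_exclude lo,virbr0`; the second prints one using `btl_tcp_if_include bond0`. Naming the one interface that is known to be correct is often the simpler choice when the nodes carry several virtual interfaces.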

Re: [OMPI users] mpi test program "ring" failed: blocked at MPI_Send

2012-09-25 Thread Richard
Thanks a lot! Using "--mca btl_if_exclude virbr0" does not work, but you have pointed out the problem, so I fixed it using "--mca btl_tcp_if_include bond0" because I know this is the high-speed network interface I should use on each node. At 2012-09-25 20:30:16,"Jeff Squyres" wrote: >On Sep

Re: [OMPI users] mpi test program "ring" failed: blocked at MPI_Send

2012-09-25 Thread Jeff Squyres
On Sep 25, 2012, at 2:56 PM, Richard wrote: > thanks a lot ! > using "--mca btl_if_exclude virbr0" does not work, but you have pointed out > the Ya, sorry -- see my second mail, it should be "btl_tcp_if_exclude". > problem, so i fixed it using "--mca btl_tcp_if_include bond0" because I know

Re: [OMPI users] mpi test program "ring" failed: blocked at MPI_Send

2012-09-25 Thread Tom Bryan (tombry)
On 9/25/12 9:10 AM, "Jeff Squyres (jsquyres)" wrote: >>problem, so i fixed it using "--mca btl_tcp_if_include bond0" because I >>know this is the high speed network interface I should use on each node. > >Glad it works for you! > >If you're not using those interfaces (they might be related to Xen

Re: [OMPI users] problem with 32-bit mpiJava on openmpi-1.9a1r27361

2012-09-25 Thread Siegmar Gross
Hi, > > I tried mpiJava on a 32-bit installation of openmpi-1.9a1r27361. > > Why doesn't "mpiexec" start a process on my local machine (it > > is not a matter of Java, because I have the same behaviour when > > I use "hostname")? > > > > tyr java 133 mpiexec -np 3 -host tyr,sunpc4,sunpc1 \ > > j

Re: [OMPI users] problem with 32-bit mpiJava on openmpi-1.9a1r27361

2012-09-25 Thread Ralph Castain
On Sep 25, 2012, at 6:45 AM, Siegmar Gross wrote: > Hi, > >>> I tried mpiJava on a 32-bit installation of openmpi-1.9a1r27361. >>> Why doesn't "mpiexec" start a process on my local machine (it >>> is not a matter of Java, because I have the same behaviour when >>> I use "hostname")? >>> >>> t

Re: [OMPI users] A question on MPI_Probe

2012-09-25 Thread Jeff Squyres
At the receiver, MPI ordering is only guaranteed within a specific (source_rank, tag, communicator) tuple. So if you send: M1 (source_rank=A, tag=T1, comm=foo) M2 (source_rank=A, tag=T2, comm=foo) Then those have 2 different tuples, and can be probed/received in a different order than the one in which they were sent.
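The ordering rule can be illustrated with a small pure-Python model. No MPI library is involved; the `Mailbox` class and its methods are hypothetical names for this sketch, and the third message M3 is added here only to show ordering inside one tuple. Messages are matched by (source, tag, comm): within one tuple the earliest message sent is matched first, while the receiver is free to pick tuples in any order.

```python
class Mailbox:
    """Toy model of a receiver's queue of arrived, unmatched messages."""

    def __init__(self):
        self._pending = []  # kept in arrival (send) order

    def deliver(self, source, tag, comm, payload):
        """Record a message as having arrived at the receiver."""
        self._pending.append((source, tag, comm, payload))

    def recv(self, source, tag, comm):
        """Return the oldest pending message matching the given tuple."""
        for i, (s, t, c, payload) in enumerate(self._pending):
            if (s, t, c) == (source, tag, comm):
                del self._pending[i]
                return payload
        raise LookupError("no matching message")


box = Mailbox()
box.deliver("A", "T1", "foo", "M1")
box.deliver("A", "T2", "foo", "M2")
box.deliver("A", "T1", "foo", "M3")  # same tuple as M1, sent later

first = box.recv("A", "T2", "foo")   # different tuple: may overtake M1
second = box.recv("A", "T1", "foo")  # oldest message in the T1 tuple
third = box.recv("A", "T1", "foo")
print(first, second, third)
```

This prints `M2 M1 M3`: M2 is received before M1 because its tag differs, but M1 still comes before M3 because they share the same tuple.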

Re: [OMPI users] bindings not reported and other problems in openmpi-1.7a1r27358

2012-09-25 Thread Jeff Squyres
I will also add that Oracle seems to be fading away from Open MPI; their priorities seem to be shifting, so it's quite possible that Open MPI is experiencing bit rot / lack of testing on Solaris. We already ran into the one issue where process binding is not well supported on Solaris (i.e., you

Re: [OMPI users] static, standalone mpirun

2012-09-25 Thread Jeff Squyres
Sorry for jumping in late in this thread. More below. On Sep 20, 2012, at 1:43 PM, Ilias Miroslav wrote: > I prepared my own static OpenMPI files (mpirun, mpif90...) within > openmpi-1.6.1.tar.gz > > ./configure --prefix= --without-memory-manager CXX=icpc CC=icc > F77=ifort FC=ifort LDFLAGS=

Re: [OMPI users] problem with 32-bit mpiJava on openmpi-1.9a1r27361

2012-09-25 Thread Siegmar Gross
Hi, the environment is OK now (see below). Thank you very much for your help. > >>> I tried mpiJava on a 32-bit installation of openmpi-1.9a1r27361. > >>> Why doesn't "mpiexec" start a process on my local machine (it > >>> is not a matter of Java, because I have the same behaviour when > >>> I us

[OMPI users] Memchecker failure with empty struct type

2012-09-25 Thread Jeremiah Willcock
The following C program: #include <mpi.h> int main(int argc, char** argv) { int blocklengths; MPI_Aint displacements; MPI_Datatype types, dt; int x; MPI_Init(&argc, &argv); MPI_Type_struct(0, &blocklengths, &displacements, &types, &dt); MPI_Type_commit(&dt); MPI_Send(&x, 1, dt, MPI_PROC

Re: [OMPI users] problem with 32-bit mpiJava on openmpi-1.9a1r27361

2012-09-25 Thread Jeff Squyres
On Sep 25, 2012, at 5:59 PM, Siegmar Gross wrote: > I have had "--enable-orterun-prefix-by-default" in my configure > command. I removed it and rebuilt the package and now the environment > is OK. Tomorrow I will run some tests and also try to get the > information about the topology for our M400

Re: [OMPI users] Problem runing MPI on cluster

2012-09-25 Thread mariana Vargas
Hi, in fact I found the origin of this problem: all processes have rank 0. I tested, and even the classic Hello.py gives the same result. How can I solve this? Do I have to reinstall everything again? Help please... Mariana On Sep 24, 2012, at 9:13 P

Re: [OMPI users] Problem runing MPI on cluster

2012-09-25 Thread Ralph Castain
The usual reason for this is that you aren't launching these processes correctly. How are you starting your job? Are you using mpirun? On Sep 25, 2012, at 1:43 PM, mariana Vargas wrote: > Hi > > I fact I found what is the origin of this problem and it is because all > processes have rank 0,

Re: [OMPI users] Memchecker failure with empty struct type

2012-09-25 Thread Ralph Castain
IIRC, we found a configure "bug" that allowed you to enable-memchecker without also including the required --with-valgrind. You might try again with 1.6.2, which includes the change - and be sure to add the extra configure flag. On Sep 25, 2012, at 12:04 PM, Jeremiah Willcock wrote: > The fol

Re: [OMPI users] Memchecker failure with empty struct type

2012-09-25 Thread Jeremiah Willcock
My config.log shows that it found Valgrind even though I didn't specify --with-valgrind. It looks like the issue is in the datatype creation code; looking at the data structure shows unusual values for true_ub and true_lb: {super = {super = {obj_magic_id = 16046253926196952813, obj_class = 0

Re: [OMPI users] Problem runing MPI on cluster

2012-09-25 Thread Mariana Vargas Magana
Hi, I think I am not understanding what you said; here is the hello.py, followed by the mpirun command… Thanks! #!/usr/bin/env python """ Parallel Hello World """ from mpi4py import MPI import sys size = MPI.COMM_WORLD.Get_size() rank = MPI.COMM_WORLD.Get_rank() name = MPI.Get_processor_name() sy

Re: [OMPI users] Problem runing MPI on cluster

2012-09-25 Thread Ralph Castain
You forgot to call MPI_Init at the beginning of your program. On Sep 25, 2012, at 2:08 PM, Mariana Vargas Magana wrote: > Hi > I think I'am not understanding what you said , here is the hello.py and next > the command mpirun… > > Thanks! > > #!/usr/bin/env python > """ > Parallel Hello World

Re: [OMPI users] Problem runing MPI on cluster

2012-09-25 Thread mariana Vargas
MPI_Init() is actually called when the MPI module is imported from the mpi4py package... On Sep 25, 2012, at 5:17 PM, Ralph Castain wrote: You forgot to call MPI_Init at the beginning of your program. On Sep 25, 2012, at 2:08 PM, Mariana Vargas Magana > wrote: Hi I think I'am not understanding what you sa

Re: [OMPI users] Problem runing MPI on cluster

2012-09-25 Thread Ralph Castain
I don't think that is true, but I suggest you check the mpi4py examples. I believe all import does is import function definitions - it doesn't execute anything. Sent from my iPad On Sep 25, 2012, at 2:41 PM, mariana Vargas wrote: > MPI_init() is actually called when import MPI module from MPi

Re: [OMPI users] mpi test program "ring" failed: blocked at MPI_Send

2012-09-25 Thread Richard
Jeff, it was a typo in my last post; I did use "--mca btl_tcp_if_exclude virbr0" and it did not work. At 2012-09-25 21:10:24,"Jeff Squyres" wrote: >On Sep 25, 2012, at 2:56 PM, Richard wrote: > >> thanks a lot ! >> using "--mca btl_if_exclude virbr0" does not work, but you have pointed out

Re: [OMPI users] mpi test program "ring" failed: blocked at MPI_Send

2012-09-25 Thread Richard
Tom might be correct, I checked my system. Using rpm -qa, I did not find Xen, but found libvirt. At 2012-09-25 21:38:23,"Tom Bryan (tombry)" wrote: >On 9/25/12 9:10 AM, "Jeff Squyres (jsquyres)" wrote: > >>>problem, so i fixed it using "--mca btl_tcp_if_include bond0" because I >>>know this is

Re: [OMPI users] Problem runing MPI on cluster

2012-09-25 Thread Mariana Vargas Magana
Yes, I am sure; I read it in an mpi4py guide. I already checked the examples; in fact this is an example extracted from a guide! Moreover, if I use this example with mpich2 it runs very nicely, but for the other code I need openmpi working =s Mariana On Sep 25, 2012, at 8:00 PM, Ralph Castain

[OMPI users] error from MPI_Allgather

2012-09-25 Thread Rajesh J
Hello, I get an error while using mpirun. Could anyone please help me solve this? I googled it and found some answers, but they are too technical for me since I am not very familiar with MPI programs. Is this due to an installation problem or to the program I am running? Fatal error in PMPI_Allgather: Invalid buf