Try installing the latest 1.6.2 release candidate and see if it still happens.
I can't replicate the behavior on my system.
On Sep 24, 2012, at 7:02 PM, "Liu, Hanzhong" wrote:
> Thanks for your reply. The command I used is "mpirun -np 4 ./hello"
>
On Sep 24, 2012, at 6:13 PM, Mariana Vargas Magana wrote:
>
>
> Yes, you are right, this is what it says, but in fact the weird thing is that
> the error message does not appear every time…. I send to 20 nodes and only one
> gives this message; is this normal…
Yes - that is precisely the behavior y
I have 3 computers with the same Linux system. I have set up the MPI cluster
based on SSH connections.
I have tested a very simple MPI program; it works on the cluster.
To make my story clear, I name the three computers A, B, and C.
1) If I run the job with 2 processes on A and B, it works.
2)
Hi Richard
When a collective call hangs, this usually means that one (or more)
processes did not reach this command.
Are you sure that all processes reach the allreduce statement?
If something like this happens to me, I insert print statements just
before the MPI call so I can see which processes reach it.
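A minimal self-contained sketch of that technique (the variable names and the
MPI_SUM reduction are invented for illustration, not taken from Richard's program;
any rank whose "about to call" line never appears did not reach the collective):

#include <stdio.h>
#include <mpi.h>

int main(int argc, char** argv) {
    int rank, local = 1, global = 0;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    /* Print and flush right before the collective, so the line shows up
       even if the call below hangs. */
    printf("rank %d: about to call MPI_Allreduce\n", rank);
    fflush(stdout);
    MPI_Allreduce(&local, &global, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);
    printf("rank %d: MPI_Allreduce returned, sum = %d\n", rank, global);
    MPI_Finalize();
    return 0;
}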
+1
Additionally, if you're trying to debug your machines/network/setup, you might
want to use something simpler, like the ring programs in the examples/
directory.
On Sep 25, 2012, at 9:43 AM, jody wrote:
> Hi Richard
>
> When a collective call hangs, this usually means that one (or more)
>
Hello,
At the moment I'm building them manually but I am also thinking of a
measurement framework to do it automatically in the future.
Best regards,
Hristo
--
Hristo Iliev, Ph.D. -- High Performance Computing
RWTH Aachen University, Center for Computing and Communication
Rechen- und Kommunikationszentrum
Also, please send all the information listed here:
http://www.open-mpi.org/community/help/
On Sep 25, 2012, at 6:09 AM, Ralph Castain wrote:
> Try installing the latest 1.6.2 release candidate and see if it still
> happens. I can't replicate the behavior on my system.
>
>
> On Sep 24, 2012,
Hi Jody, thanks for your suggestion, and you are right: if I use the ring
example, the same thing happens. I have put in a printf statement, and it seems
that all three processes have reached the line calling "PMPI_Allreduce". Any
further suggestions?
Thanks.
Richard
If I try the ring program, the first round of the pass is fine, but the second
round is blocked at some node.
Here is the message printed out:
Process 0 sending 10 to 1, tag 201 (3 processes in ring)
Process 0 sent to 1
rank 1, message 10,start===
rank 1, message 10,end-
rank 2
Sometimes the following message jumps out when I run the ring program, but not
always.
I do not know this IP address 192.168.122.1; it is not in my list of hosts.
[[53402,1],6][btl_tcp_endpoint.c:638:mca_btl_tcp_endpoint_complete_connect]
connect() to 192.168.122.1 failed: Connection refuse
Have you disabled firewalls on your nodes (e.g., iptables)?
On Sep 25, 2012, at 11:08 AM, Richard wrote:
> sometimes the following message jumped out when I run the ring program, but
> not always.
> I do not know this ip address 192.168.122.1, it is not in my list of hosts.
>
>
> [[53402,1],6]
Actually, I didn't read your message closely enough -- sorry.
If you're getting a message about an IP address that is unknown to you, this
suggests that there might be something wonky in your network setup.
Can you send all the information listed here:
http://www.open-mpi.org/community/help/
I used "chkconfig --list iptables ", none of computer is set as "on".
At 2012-09-25 17:54:53,"Jeff Squyres" wrote:
>Have you disabled firewalls on your nodes (e.g., iptables)?
>
>On Sep 25, 2012, at 11:08 AM, Richard wrote:
>
>> sometimes the following message jumped out when I run the ring pro
I have set up a small cluster with 3 nodes, named A, B, and C respectively.
I tested the ring_c.c program in the examples. For debugging purposes,
I have added some print statements as follows to the original ring_c.c:
>> 60 printf("rank %d, message %d,start===\n", rank, message);
>>
Ah, I see the problem. See this FAQ entry:
http://www.open-mpi.org/faq/?category=tcp#tcp-selection
You want to exclude the virbr0 interfaces on your nodes; they're local-only
interfaces (that's where the 192.168.122.x addresses are coming from) that,
IIRC, have something to do with virtualization.
On Sep 25, 2012, at 2:28 PM, Jeff Squyres wrote:
> mpirun --mca btl_if_exclude virbr0 ...
Gah; sorry, that should be:
mpirun --mca btl_tcp_if_exclude virbr0 ...
I forgot the "tcp" there in the middle.
--
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.ci
Thanks a lot!
Using "--mca btl_if_exclude virbr0" does not work, but you have pointed out the
problem, so I fixed it using "--mca btl_tcp_if_include bond0", because I know
this is the high-speed network interface I should use on each node.
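For reference, a rough sketch of the full command line this corresponds to (the
process count, hostfile name, and program name are placeholders, not Richard's
actual values):

mpirun --mca btl_tcp_if_include bond0 --hostfile myhosts -np 3 ./ring_c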
At 2012-09-25 20:30:16,"Jeff Squyres" wrote:
>On Sep
On Sep 25, 2012, at 2:56 PM, Richard wrote:
> thanks a lot !
> using "--mca btl_if_exclude virbr0" does not work, but you have pointed out
> the
Ya, sorry -- see my second mail, it should be "btl_tcp_if_exclude".
> problem, so i fixed it using "--mca btl_tcp_if_include bond0" because I know
On 9/25/12 9:10 AM, "Jeff Squyres (jsquyres)" wrote:
>>problem, so i fixed it using "--mca btl_tcp_if_include bond0" because I
>>know this is the high speed network interface I should use on each node.
>
>Glad it works for you!
>
>If you're not using those interfaces (they might be related to Xen
Hi,
> > I tried mpiJava on a 32-bit installation of openmpi-1.9a1r27361.
> > Why doesn't "mpiexec" start a process on my local machine (it
> > is not a matter of Java, because I have the same behaviour when
> > I use "hostname")?
> >
> > tyr java 133 mpiexec -np 3 -host tyr,sunpc4,sunpc1 \
> > j
On Sep 25, 2012, at 6:45 AM, Siegmar Gross wrote:
> Hi,
>
>>> I tried mpiJava on a 32-bit installation of openmpi-1.9a1r27361.
>>> Why doesn't "mpiexec" start a process on my local machine (it
>>> is not a matter of Java, because I have the same behaviour when
>>> I use "hostname")?
>>>
>>> t
At the receiver, MPI ordering is only guaranteed within a specific (source_rank,
tag, communicator) tuple.
So if you send:
M1 (source_rank=A, tag=T1, comm=foo)
M2 (source_rank=A, tag=T2, comm=foo)
Then those have 2 different tuples, and they can be probed/received in a
different order from the one in which they were sent.
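A hedged sketch of what that means in practice (the ranks and the tag values
101/102 are invented for the example): rank 0 can legally receive M2 before M1
because each receive names a different tag.

#include <stdio.h>
#include <mpi.h>

int main(int argc, char** argv) {
    int rank, m1 = 1, m2 = 2, buf;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 1) {                                        /* the sender, "rank A" */
        MPI_Send(&m1, 1, MPI_INT, 0, 101, MPI_COMM_WORLD);  /* M1, tag T1 */
        MPI_Send(&m2, 1, MPI_INT, 0, 102, MPI_COMM_WORLD);  /* M2, tag T2 */
    } else if (rank == 0) {
        /* Posting the tag-102 receive first is legal: it matches M2 even
           though M1 was sent earlier, because this receive names a tag that
           only M2 carries. */
        MPI_Recv(&buf, 1, MPI_INT, 1, 102, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("received tag 102: %d\n", buf);
        MPI_Recv(&buf, 1, MPI_INT, 1, 101, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
        printf("received tag 101: %d\n", buf);
    }
    MPI_Finalize();
    return 0;
}

(Run with at least 2 processes, e.g. "mpirun -np 2 ./a.out".)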
I will also add that Oracle seems to be fading away from Open MPI; their
priorities seem to be shifting, so it's quite possible that Open MPI is
experiencing bit rot / lack of testing on Solaris.
We already ran into the one issue where process binding is not well supported
on Solaris (i.e., you
Sorry for jumping in late in this thread. More below.
On Sep 20, 2012, at 1:43 PM, Ilias Miroslav wrote:
> I prepared my own static OpenMPI files (mpirun, mpif90...) within
> openmpi-1.6.1.tar.gz
>
> ./configure --prefix= --without-memory-manager CXX=icpc CC=icc
> F77=ifort FC=ifort LDFLAGS=
Hi,
the environment is OK now (see below). Thank you very much for your
help.
> >>> I tried mpiJava on a 32-bit installation of openmpi-1.9a1r27361.
> >>> Why doesn't "mpiexec" start a process on my local machine (it
> >>> is not a matter of Java, because I have the same behaviour when
> >>> I us
The following C program:
#include <mpi.h>

int main(int argc, char** argv) {
  int blocklengths;
  MPI_Aint displacements;
  MPI_Datatype types, dt;
  int x;
  MPI_Init(&argc, &argv);
  /* Build and commit a struct datatype with zero blocks. */
  MPI_Type_struct(0, &blocklengths, &displacements, &types, &dt);
  MPI_Type_commit(&dt);
  /* Send one element of the empty type; MPI_PROC_NULL makes the send a no-op. */
  MPI_Send(&x, 1, dt, MPI_PROC_NULL, 0, MPI_COMM_WORLD);
  MPI_Type_free(&dt);
  MPI_Finalize();
  return 0;
}
On Sep 25, 2012, at 5:59 PM, Siegmar Gross wrote:
> I have had "--enable-orterun-prefix-by-default" in my configure
> command. I removed it and rebuilt the package and now the environment
> is OK. Tomorrow I will run some tests and also try to get the
> information about the topology for our M400
Hi
In fact I found the origin of this problem: all processes have rank 0. I tested
this, and even the classical Hello.py gives the same result. How can I solve
this?? Do I have to reinstall everything again???
Help please...
Mariana
On Sep 24, 2012, at 9:13 P
The usual reason for this is that you aren't launching these processes
correctly. How are you starting your job? Are you using mpirun?
On Sep 25, 2012, at 1:43 PM, mariana Vargas wrote:
> Hi
>
> I fact I found what is the origin of this problem and it is because all
> processes have rank 0,
IIRC, we found a configure "bug" that allowed you to enable-memchecker without
also including the required --with-valgrind. You might try again with 1.6.2,
which includes the change - and be sure to add the extra configure flag.
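A sketch of the kind of configure invocation being described (the install prefix
and Valgrind location shown are placeholders):

./configure --prefix=/opt/openmpi-1.6.2 --enable-memchecker --with-valgrind=/usr/local ...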
On Sep 25, 2012, at 12:04 PM, Jeremiah Willcock wrote:
> The fol
My config.log shows that it found Valgrind even though I didn't specify
--with-valgrind. It looks like the issue is in the datatype creation
code; looking at the data structure shows unusual values for true_ub and
true_lb:
{super = {super = {obj_magic_id = 16046253926196952813, obj_class =
0
Hi
I think I'm not understanding what you said; here is the hello.py, and next
the command mpirun…
Thanks!
#!/usr/bin/env python
"""
Parallel Hello World
"""
from mpi4py import MPI
import sys
size = MPI.COMM_WORLD.Get_size()
rank = MPI.COMM_WORLD.Get_rank()
name = MPI.Get_processor_name()
sys.stdout.write(
    "Hello, World! I am process %d of %d on %s.\n" % (rank, size, name))
You forgot to call MPI_Init at the beginning of your program.
On Sep 25, 2012, at 2:08 PM, Mariana Vargas Magana wrote:
> Hi
> I think I'am not understanding what you said , here is the hello.py and next
> the command mpirun…
>
> Thanks!
>
> #!/usr/bin/env python
> """
> Parallel Hello World
MPI_Init() is actually called when you import the MPI module from the MPI package...
On Sep 25, 2012, at 5:17 PM, Ralph Castain wrote:
You forgot to call MPI_Init at the beginning of your program.
On Sep 25, 2012, at 2:08 PM, Mariana Vargas Magana wrote:
Hi
I think I'am not understanding what you sa
I don't think that is true, but I suggest you check the mpi4py examples. I
believe all import does is import function definitions - it doesn't execute
anything.
Sent from my iPad
On Sep 25, 2012, at 2:41 PM, mariana Vargas wrote:
> MPI_init() is actually called when import MPI module from MPi
Jeff,
It was a typo in my last post, I did use "--mca btl_tcp_if_exclude virbr0" and
it did not work.
At 2012-09-25 21:10:24,"Jeff Squyres" wrote:
>On Sep 25, 2012, at 2:56 PM, Richard wrote:
>
>> thanks a lot !
>> using "--mca btl_if_exclude virbr0" does not work, but you have pointed out
Tom might be correct. I checked my system using rpm -qa: I did not find Xen,
but I did find libvirt.
At 2012-09-25 21:38:23,"Tom Bryan (tombry)" wrote:
>On 9/25/12 9:10 AM, "Jeff Squyres (jsquyres)" wrote:
>
>>>problem, so i fixed it using "--mca btl_tcp_if_include bond0" because I
>>>know this is
Yes, I am sure; I read it in an mpi4py guide. I already checked the examples;
in fact this is an example extracted from a guide…!! Even more, if I run this
example with mpich2 it works very nicely, though for the other code I need
openmpi working =s
Mariana
On Sep 25, 2012, at 8:00 PM, Ralph Castain
Hello,
I have an error while using mpirun. Could anyone please help me solve it?
I googled this and found some discussion, but it was too technical for me
since I am not so familiar with MPI programs. Is this due to some installation
problem, or to the program which I run?
Fatal error in PMPI_Allgather: Invalid buf