[OMPI users] questions about tracing an OpenMPI program

2009-07-15 Thread Zou, Lin (GE, Research, Consultant)
Hi all,
I want to trace my program with Vampir. I untarred Vampir and its
license, but when I run vampir, it returns the error "can't find
libXp.so.6". The library really is in /usr/lib, and I have also updated
the ld configuration and set LD_LIBRARY_PATH, but neither helps. Has
anyone run into this before?
(I ran vampir on a PowerPC workstation.)
Lin 
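A minimal diagnostic sketch, assuming a Linux shell on the PowerPC workstation: it prints the libraries that ldd reports as "not found" for the Vampir binary and compares the architecture of the binary with that of /usr/lib/libXp.so.6. The variable VAMPIR_BIN and its ./vampir default are hypothetical. One cause worth ruling out (an assumption, not confirmed in this thread) is a 32-bit/64-bit mismatch: the dynamic loader skips a library of the wrong word size even when the file exists.

```shell
#!/bin/sh
# Sketch for checking why the loader reports "can't find libXp.so.6".
# VAMPIR_BIN is a hypothetical path; point it at the untarred binary.

# Read ldd output on stdin and print only the libraries marked "not found".
parse_missing() {
    awk '/=> not found/ { print $1 }'
}

# List the shared libraries the dynamic loader cannot resolve for binary $1,
# honoring the current LD_LIBRARY_PATH (note the spelling: LIBRARY).
missing_libs() {
    ldd "$1" 2>/dev/null | parse_missing
}

VAMPIR_BIN=${VAMPIR_BIN:-./vampir}
if [ -e "$VAMPIR_BIN" ]; then
    missing_libs "$VAMPIR_BIN"
    # Compare word size and CPU type of program and library: a 64-bit
    # program skips a 32-bit libXp.so.6 even though the file exists.
    file "$VAMPIR_BIN" /usr/lib/libXp.so.6
fi
```

For example, running VAMPIR_BIN=/path/to/vampir sh check_libs.sh (the script name is hypothetical) prints libXp.so.6 while the loader still cannot resolve it, and prints nothing once it can.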


[OMPI users] where can I get a tracing tool

2009-07-13 Thread Zou, Lin (GE, Research, Consultant)
Hi all, 
I want to trace my program, and I have already used VampirTrace to
generate the trace data. Apart from Vampir, where can I download free
tools to read and analyze these traces?
Thanks in advance.
Lin


Re: [OMPI users] Configuration problem or network problem?

2009-07-07 Thread Zou, Lin (GE, Research, Consultant)
 
Thank you for your suggestion. I tried it, but it doesn't work. In fact, 
the head node doesn't participate in the computation and communication; it only 
allocates (mallocs) a large block of memory, and when the loop on every PS3 has 
finished, the head node gathers the data from every PS3.
The strange thing is that sometimes the program works well, but after a reboot, 
without any change to the program, it stops working. So I think there must be 
some mechanism in OpenMPI that can be configured to make the program work 
reliably.
 
Lin



From: users-boun...@open-mpi.org [mailto:users-boun...@open-mpi.org] On Behalf 
Of Doug Reeder
Sent: July 7, 2009 10:49
To: Open MPI Users
Subject: Re: [OMPI users] Configuration problem or network problem?


Lin, 

Try -np 16, and don't run any process on the head node.

Doug Reeder
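Doug's suggestion can be sketched as follows, assuming a POSIX shell on the head node: it writes an Open MPI hostfile that lists only the 16 PS3s, one slot each, and launches 16 ranks from it, so no rank runs on the head node. The host names ps3-01 to ps3-16, the file name ps3_hosts and the program ./my_program are hypothetical placeholders.

```shell
#!/bin/sh
# Sketch: one rank on each of the 16 PS3s, none on the head node.
# Host names ps3-01 ... ps3-16 are hypothetical placeholders.

# Write an Open MPI hostfile listing each PS3 once with a single slot.
write_ps3_hostfile() {
    : > "$1"
    i=1
    while [ "$i" -le 16 ]; do
        printf 'ps3-%02d slots=1\n' "$i" >> "$1"
        i=$((i + 1))
    done
}

write_ps3_hostfile ps3_hosts

# Launch only if mpirun is installed; ./my_program is a hypothetical binary.
# The BTL list is written without a space: "tcp,self".
if command -v mpirun >/dev/null 2>&1 && [ -x ./my_program ]; then
    mpirun --mca btl tcp,self -np 16 --hostfile ps3_hosts ./my_program
fi
```

Giving each host slots=1 keeps Open MPI from placing two ranks on the same PS3, so with -np 16 every PS3 gets exactly one rank.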

On Jul 6, 2009, at 7:08 PM, Zou, Lin (GE, Research, Consultant) wrote:


Hi all,
The system I use is a PS3 cluster: 16 PS3s plus a PowerPC machine as the 
head node, all connected by a high-speed switch.
The program uses point-to-point communication (MPI_Send and MPI_Recv) 
with messages of about 40 KB, and runs heavy computation in a loop, taking a 
long time (about 1 s). The co-processors in each PS3 handle the computation 
while the main processor handles the point-to-point communication, so 
computation and communication can overlap. The communication functions 
should therefore return much faster than the computing functions.
My problem is that after some iterations, the time consumed by the 
communication functions on a PS3 increases sharply, and the whole cluster 
falls out of sync. When I decrease the computing time, the problem simply 
disappears. I am very confused about this.
I think some mechanism in OpenMPI causes this. Has anyone run into this 
situation before?
I use "mpirun --mca btl tcp,self -np 17 --hostfile ...". Is there 
something I should add?
Lin
___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users




[OMPI users] Configuration problem or network problem?

2009-07-06 Thread Zou, Lin (GE, Research, Consultant)
Hi all,
The system I use is a PS3 cluster: 16 PS3s plus a PowerPC machine as the
head node, all connected by a high-speed switch.
The program uses point-to-point communication (MPI_Send and MPI_Recv)
with messages of about 40 KB, and runs heavy computation in a loop, taking a
long time (about 1 s). The co-processors in each PS3 handle the computation
while the main processor handles the point-to-point communication, so
computation and communication can overlap. The communication functions
should therefore return much faster than the computing functions.
My problem is that after some iterations, the time consumed by the
communication functions on a PS3 increases sharply, and the whole cluster
falls out of sync. When I decrease the computing time, the problem simply
disappears. I am very confused about this.
I think some mechanism in OpenMPI causes this. Has anyone run into this
situation before?
I use "mpirun --mca btl tcp,self -np 17 --hostfile ...". Is there
something I should add?
Lin