Re: [OMPI users] openMPI shared with NFS, but says different version

2010-07-27 Thread Cristobal Navarro
On Tue, Jul 27, 2010 at 7:29 PM, Gus Correa wrote: > Hi Cristobal > > Does it run only on the head node alone? > (Fuego? Agua? Acatenango?) > Try to put only the head node on the hostfile and execute with mpiexec. > --> i will try only with the head node, and post results

Re: [OMPI users] MPI_Allreduce on local machine

2010-07-27 Thread Hugo Gagnon
I did and it runs now, but the result is wrong: outside is still 1.d0, 2.d0, 3.d0, 4.d0, 5.d0 How can I make sure to compile OpenMPI so that datatypes such as mpi_double_precision behave as they "should"? Are there flags during the OpenMPI building process or something? Thanks, -- Hugo Gagnon

Re: [OMPI users] openMPI shared with NFS, but says different version

2010-07-27 Thread Cristobal Navarro
i compiled with absolute path in case: fcluster@agua:~$ /opt/openmpi-1.4.2/bin/mpicc testMPI/hello.c -o testMPI/hola fcluster@agua:~$ mpirun --hostfile myhostfile -np 5 testMPI/hola [agua:03547] mca: base: component_find: unable to open /opt/openmpi-1.4.2/lib/openmpi/mca_btl_ofud: perhaps a

Re: [OMPI users] openMPI shared with NFS, but says different version

2010-07-27 Thread Cristobal Navarro
Thanks Gus, but i already had the paths fcluster@agua:~$ echo $PATH /opt/openmpi-1.4.2/bin:/opt/cfc/sge/bin/lx24-amd64:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/usr/games fcluster@agua:~$ echo $LD_LIBRARY_PATH /opt/openmpi-1.4.2/lib: fcluster@agua:~$ even weird, errors come

Re: [OMPI users] openMPI shared with NFS, but says different version

2010-07-27 Thread Gus Correa
Hi Cristobal Try using the --prefix option of mpiexec. "man mpiexec" is your friend! Alternatively, append the OpenMPI directories to your PATH *and* LD_LIBRARY_PATH in your .bashrc/.cshrc file. See this FAQ: http://www.open-mpi.org/faq/?category=running#adding-ompi-to-path I hope it helps, Gus
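
For readers following along, the two suggestions above might look roughly like this for a bash user. This is only a sketch: the prefix /opt/openmpi-1.4.2, the hostfile name, and the test program are the ones quoted elsewhere in this thread; the variable name OMPI_PREFIX is made up for illustration; and the directories are prepended rather than appended, which is an assumption made here so that this installation is found before any other MPI on the path.

```shell
# Assumed install prefix, taken from the paths quoted in this thread.
# OMPI_PREFIX is a hypothetical helper variable, not an Open MPI setting.
OMPI_PREFIX=/opt/openmpi-1.4.2

# Put the Open MPI bin and lib directories first (e.g. in ~/.bashrc).
# The ${VAR:+...} form avoids a trailing ':' when LD_LIBRARY_PATH is unset.
export PATH="$OMPI_PREFIX/bin:$PATH"
export LD_LIBRARY_PATH="$OMPI_PREFIX/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"

# Alternative that needs no change to the shell startup files:
#   mpiexec --prefix "$OMPI_PREFIX" --hostfile myhostfile -np 5 testMPI/hola
```

Either way, the point is that every node in the hostfile resolves the same Open MPI installation; "which mpirun" on each node is a quick way to confirm that.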

[OMPI users] openMPI shared with NFS, but says different version

2010-07-27 Thread Cristobal Navarro
Hi, Even when executing a hello world openmpi, i get this error, which is then ignored. fcluster@fuego:~$ mpirun --hostfile myhostfile -np 5 testMPI/hola [agua:02357] mca: base: component_find: unable to open /opt/openmpi-1.4.2/lib/openmpi/mca_btl_ofud: perhaps a missing symbol, or compiled for a

Re: [OMPI users] Processes stuck after MPI_Waitall() in 1.4.1

2010-07-27 Thread Terry Dontje
With this earlier failure do you know how many messages may have been transferred between the two processes? Is there a way to narrow this down to a small piece of code? Do you have totalview or ddt at your disposal? --td Brian Smith wrote: Also, the application I'm having trouble with

Re: [OMPI users] MPI_Allreduce on local machine

2010-07-27 Thread Gus Correa
Hi Hugo, David, Jeff, Terry, Anton, list I suppose maybe we're guessing that somehow on Hugo's iMac MPI_DOUBLE_PRECISION may not have as many bytes as dp = kind(1.d0), hence the segmentation fault on MPI_Allreduce. Question: Is there a simple way to check the number of bytes associated to

Re: [OMPI users] Processes stuck after MPI_Waitall() in 1.4.1

2010-07-27 Thread Brian Smith
Also, the application I'm having trouble with appears to work fine with MVAPICH2 1.4.1, if that is any help. -Brian On Tue, 2010-07-27 at 10:48 -0400, Terry Dontje wrote: > Can you try a simple point-to-point program? > > --td > > Brian Smith wrote: > > After running on two processors across

Re: [OMPI users] Dynamic processes connection and segfault on MPI_Comm_accept

2010-07-27 Thread Edgar Gabriel
based on your output shown here, there is absolutely nothing wrong (yet). Both processes are in the same function and do what they are supposed to do. However, I am fairly sure that the client process bt that you show is already part of current_intracomm. Could you try to create a bt of the

Re: [OMPI users] Do MPI calls ever sleep?

2010-07-27 Thread Barrett, Brian W
No, we really shouldn't. Having just fought with a program using usleep(1) which was behaving even worse, working around this particular inability of the Linux kernel development team to do something sane will only lead to more pain. There are no good options, so the best option is to not try

Re: [OMPI users] Dynamic processes connection and segfault on MPI_Comm_accept

2010-07-27 Thread Ralph Castain
This slides outside of my purview - I would suggest you post this question with a different subject line specifically mentioning failure of intercomm_merge to work so it attracts the attention of those with knowledge of that area. On Jul 27, 2010, at 9:30 AM, Grzegorz Maj wrote: > So now I

Re: [OMPI users] MPI_Allreduce on local machine

2010-07-27 Thread David Zhang
Try mpi_real8 for the type in allreduce On 7/26/10, Hugo Gagnon wrote: > Hello, > > When I compile and run this code snippet: > > 1 program test > 2 > 3 use mpi > 4 > 5 implicit none > 6 > 7 integer :: ierr, nproc,

Re: [OMPI users] Dynamic processes connection and segfault on MPI_Comm_accept

2010-07-27 Thread Grzegorz Maj
So now I have a new question. When I run my server and a lot of clients on the same machine, everything looks fine. But when I try to run the clients on several machines the most frequent scenario is: * server is started on machine A * X (= 1, 4, 10, ..) clients are started on machine B and they

Re: [OMPI users] Processes stuck after MPI_Waitall() in 1.4.1

2010-07-27 Thread Brian Smith
Both 1.4.1 and 1.4.2 exhibit the same behaviors w/ OFED 1.5. It wasn't OFED 1.4 after all (after some more digging around through our update logs). All of the ibv_*_pingpong tests appear to work correctly. I'll try running a few more tests (np=2 over two nodes, some of the OSU benchmarks, etc.)

Re: [OMPI users] MPI_Allreduce on local machine

2010-07-27 Thread Anton Shterenlikht
On Tue, Jul 27, 2010 at 08:11:39AM -0400, Jeff Squyres wrote: > On Jul 26, 2010, at 11:06 PM, Hugo Gagnon wrote: > > > 8 integer, parameter :: dp = kind(1.d0) > > 9 real(kind=dp) :: inside(5), outside(5) > > I'm not a fortran expert -- is kind(1.d0) really double precision?

Re: [OMPI users] MPI_Allreduce on local machine

2010-07-27 Thread Terry Frankcombe
On Tue, 2010-07-27 at 08:11 -0400, Jeff Squyres wrote: > On Jul 26, 2010, at 11:06 PM, Hugo Gagnon wrote: > > > 8 integer, parameter :: dp = kind(1.d0) > > 9 real(kind=dp) :: inside(5), outside(5) > > I'm not a fortran expert -- is kind(1.d0) really double precision?

Re: [OMPI users] Processes stuck after MPI_Waitall() in 1.4.1

2010-07-27 Thread Terry Dontje
A clarification from your previous email, you had your code working with OMPI 1.4.1 but an older version of OFED? Then you upgraded to OFED 1.4 and things stopped working? Sounds like your current system is set up with OMPI 1.4.2 and OFED 1.5. Anyways, I am a little confused as to when

Re: [OMPI users] MPI process dies with a route error when using dynamic process calls to connect more than 2 clients to a server with InfiniBand

2010-07-27 Thread Ralph Castain
Use what hostname returns - don't worry about IP addresses as we'll discover them. On Jul 26, 2010, at 10:45 PM, Philippe wrote: > Thanks a lot! > > now, for the ev "OMPI_MCA_orte_nodes", what do I put exactly? our > nodes have a short/long name (it's rhel 5.x, so the command hostname >

Re: [OMPI users] MPI process dies with a route error when using dynamic process calls to connect more than 2 clients to a server with InfiniBand

2010-07-27 Thread Ralph Castain
Doh - yes it should! I'll fix it right now. Thanks! On Jul 26, 2010, at 9:28 PM, Philippe wrote: > Ralph, > > i was able to test the generic module and it seems to be working. > > one question tho, the function orte_ess_generic_component_query in >

Re: [OMPI users] MPI process dies with a route error when using dynamic process calls to connect more than 2 clients to a server with InfiniBand

2010-07-27 Thread Philippe
Ralph, i was able to test the generic module and it seems to be working. one question tho, the function orte_ess_generic_component_query in "orte/mca/ess/generic/ess_generic_component.c" calls getenv with the argument "OMPI_MCA_enc", which seems to cause the module to fail to load. shouldnt it

[OMPI users] MPI_Allreduce on local machine

2010-07-27 Thread Hugo Gagnon
Hello, When I compile and run this code snippet: 1 program test 2 3 use mpi 4 5 implicit none 6 7 integer :: ierr, nproc, myrank 8 integer, parameter :: dp = kind(1.d0) 9 real(kind=dp) :: inside(5), outside(5) 10 11 call