Re: [OMPI users] Pointers for understanding failure messages on NetBSD

2009-12-09 Thread Aleksej Saushev
kevin.buck...@ecs.vuw.ac.nz writes: > Cc: to the OpenMPI list as the oftdump clash might be of interest > elsewhere. > >> I attach a patch, but it doesn't work and I don't see where the >> error lies now. It may be that I'm doing something stupid. >> It produces working OpenMPI-1.3.4 package on

[OMPI users] Hanging vs Stopping behaviour in communication failures

2009-12-09 Thread Constantinos Makassikis
Dear all, sometimes when running Open MPI jobs, the application hangs. Looking at the output, I get the following error message: [ic17][[34562,1],74][../../../../../ompi/mca/btl/tcp/btl_tcp_frag.c:216:mca_btl_tcp_frag_recv ] mca_btl_tcp_frag_recv: readv failed: No route to host (113) I
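The "(113)" at the end of that message is an operating-system errno value. On Linux, 113 is EHOSTUNREACH ("No route to host"), which suggests a network problem between the nodes rather than a bug in the MPI code. The numeric value is platform-specific. Below is a small C sketch for decoding such lines. The helper names `trailing_errno` and `is_host_unreachable` are hypothetical and are not part of Open MPI.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: extract the errno value that the TCP BTL appends
 * in parentheses, e.g. "readv failed: No route to host (113)".
 * Returns -1 if the last parenthesised group is not "(<number>)". */
static int trailing_errno(const char *line)
{
    assert(line != NULL);
    const char *open = strrchr(line, '(');
    int code;
    char close;
    if (open == NULL || sscanf(open, "(%d%c", &code, &close) != 2 || close != ')')
        return -1;
    return code;
}

/* Nonzero if the decoded value means "host unreachable" on this platform. */
static int is_host_unreachable(int code)
{
    return code == EHOSTUNREACH;
}
```

Once the code is decoded, `strerror(trailing_errno(line))` returns the local system's text for it. Comparing against the `EHOSTUNREACH` macro is more portable than hard-coding 113.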

Re: [OMPI users] mpirun only works when -np <4

2009-12-09 Thread Ashley Pittman
On Tue, 2009-12-08 at 08:30 -0800, Matthew MacManes wrote: > There are 8 physical cores, or 16 with hyperthreading enabled. That should be meaty enough. > 1st of all, let me say that when I specify that -np is less than 4 > processors (1, 2, or 3), both programs seem to work as expected. Also,

Re: [OMPI users] mpirun only works when -np <4

2009-12-09 Thread Iris Pernille Lohmann
Hi Matthew, I just had the same problem with my application when using more than 4 cores; however, the program didn't hang, it crashed, and I got an error message of 'address not mapped'. As you say, it happened in different places in the code, sometimes in the beginning, sometimes in the

[OMPI users] orte error

2009-12-09 Thread Andrew McBride
Hi, I've installed Trilinos using the openmpi 1.3.3 libraries. I'm configuring openmpi as follows: ./configure CXX=/usr/local/bin/g++ CC=/usr/local/bin/gcc F77=/usr/local/bin/gfortran --prefix=/Users/andrewmcbride/lib/openmpi-1.3.3/MAC Trilinos compiles without problem but the tests fail (see

Re: [OMPI users] orte error

2009-12-09 Thread Ralph Castain
You need to set your LD_LIBRARY_PATH to ~/lib/openmpi-1.3.3/MAC/lib, and your PATH to ~/lib/openmpi-1.3.3/MAC/bin It should then run fine. On Wed, Dec 9, 2009 at 6:29 AM, Andrew McBride wrote: > Hi > > I've installed trilinos using the openmpi 1.3.3 libraries. I'm

Re: [OMPI users] orte error

2009-12-09 Thread Andrew McBride
Thanks for your quick response, Ralph. The errors I get now are of a completely different nature and have to do with, presumably, calling delete on an unallocated pointer. Now, this probably has little to do with openmpi and more to do with the compilers used to create openmpi? I used gcc

Re: [OMPI users] orte error

2009-12-09 Thread Jeff Squyres
Can you run simple MPI applications, like sending a message around in a ring? On Dec 9, 2009, at 10:18 AM, Andrew McBride wrote: > Thanks for your quick response Ralph. > > The errors I get is now are of a completely different nature and have to do > with, presumably, calling delete on an

Re: [OMPI users] orte error

2009-12-09 Thread Andrew McBride
Seemingly. Here is the output of ring: bash-3.2$ ~/lib/openmpi-1.3.3/MAC/bin/mpicxx ring_cxx.cc bash-3.2$ ~/lib/openmpi-1.3.3/MAC/bin/mpirun -np 2 a.out Process 0 sending 10 to 1, tag 201 (2 processes in ring) Process 0 sent to 1 Process 0 decremented value: 9 Process 0 decremented value: 8
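The ring test passes a single counter around the processes. Below is a single-process C sketch of that token passing. It is not MPI code, and the function name and parameters are hypothetical. It assumes the ring example works the way its output suggests: rank 0 starts the counter at 10, each rank forwards what it receives to (rank + 1) % nprocs, and rank 0 decrements the counter on every lap. It further assumes that each rank stops after forwarding 0 and that rank 0 does one final receive. Under those assumptions, 2 processes produce the "decremented value" sequence 9, 8, ..., 0 that begins in the output above, with 22 sends in total.

```c
#include <assert.h>
#include <stddef.h>

/* Single-process simulation (no MPI calls) of ring-style token passing:
 * rank 0 sends `start` to rank 1, every rank forwards what it receives to
 * (rank + 1) % nprocs, rank 0 decrements the value on each lap, and each
 * rank stops after forwarding 0. The values rank 0 decrements to are stored
 * in `out` (up to `cap` entries, count in *nout). Returns the total sends. */
static int simulate_ring(int nprocs, int start, int *out, size_t cap, size_t *nout)
{
    assert(nprocs > 0 && start > 0 && nout != NULL);
    int token = start;
    int sends = 1;              /* rank 0's initial send */
    int rank = 1 % nprocs;      /* rank that receives the token next */
    int rank0_sent_zero = 0;    /* set once rank 0 has forwarded 0 */
    *nout = 0;

    for (;;) {
        if (rank == 0) {
            if (rank0_sent_zero)
                break;          /* rank 0's final receive absorbs the last 0 */
            token--;            /* the "Process 0 decremented value" step */
            if (*nout < cap)
                out[*nout] = token;
            (*nout)++;
        }
        sends++;                /* every receiver forwards the token */
        if (rank == 0 && token == 0)
            rank0_sent_zero = 1;
        rank = (rank + 1) % nprocs;
    }
    return sends;
}
```

In general, each of the nprocs - 1 non-zero ranks forwards 11 values and rank 0 sends 11 times including its first send, giving 1 + 10 + 11 * (nprocs - 1) sends. Every send still goes only to the next rank.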

Re: [OMPI users] orte error

2009-12-09 Thread Jeff Squyres
On Dec 9, 2009, at 10:59 AM, Andrew McBride wrote: > seemingly. here is the output of ring: > > I presume this output is correct? I guess the issue I have lies elsewhere > then? Yes -- the output looks correct. Never say "never", but it would *seem* that the error lies in your app somewhere.

Re: [OMPI users] ompi-restart using different nodes

2009-12-09 Thread Josh Hursey
So I tried to reproduce this problem today, and everything worked fine for me using the trunk. I haven't tested v1.3/v1.4 yet. I tried checkpointing with one hostfile then restarting with each of the following: - No hostfile - a hostfile with completely different machines - a hostfile

Re: [OMPI users] ompi-restart using different nodes

2009-12-09 Thread Jonathan Ferland
Hi Josh, Thanks for helping. That solved the problem!!! cheers, Jonathan Josh Hursey wrote: So I tried to reproduce this problem today, and everything worked fine for me using the trunk. I haven't tested v1.3/v1.4 yet. I tried checkpointing with one hostfile then restarting with each of

Re: [OMPI users] mpirun only works when -np <4 (Gus Correa)

2009-12-09 Thread Matthew MacManes
Hi Gus, Interestingly, the connectivity_c test works fine with -np <8. For -np >8 it works some of the time; other times it HANGS. I have got to believe that this is a big clue!! Also, when it hangs, sometimes I get the message "mpirun was unable to cleanly terminate the

Re: [OMPI users] Problem with mpirun -preload-binary option

2009-12-09 Thread Josh Hursey
I verified that the preload functionality works on the trunk. It seems to be broken on the v1.3/v1.4 branches. The version of this code has changed significantly between the v1.3/v1.4 and the trunk/v1.5 versions. I filed a bug about this so it does not get lost:

Re: [OMPI users] mpirun only works when -np <4

2009-12-09 Thread Matthew MacManes
Thanks Ashley, I'll try your tool.. I would think that this is an error in the programs I am trying to use, too, but this is a problem with 2 different programs, written by 2 different groups.. One of them might be bad, but both.. seems unlikely. Interestingly the results for the

Re: [OMPI users] checkpoint opempi-1.3.3+sge62

2009-12-09 Thread Josh Hursey
On Nov 12, 2009, at 10:54 AM, Sergio Díaz wrote: Hi Josh, You were right. The main problem was the /tmp. SGE uses a scratch directory in which the jobs have temporary files. Setting TMPDIR to /tmp, checkpoint works! However, when I try to restart it... I got the following error (see

Re: [OMPI users] Changing location where checkpoints are saved

2009-12-09 Thread Josh Hursey
I took a look at the checkpoint staging and preload functionality. It seems that the combination of the two is broken on the v1.3 and v1.4 branches. I filed a bug about it so that it would not get lost: https://svn.open-mpi.org/trac/ompi/ticket/2139 I also attached a patch to partially fix

Re: [OMPI users] Pointers for understanding failure messages on NetBSD

2009-12-09 Thread Kevin.Buckley
>> 26a27 >>> CONFIGURE_ARGS+= --enable-contrib-no-build=vt >> >> I have no idea how NetBSD go about resolving such clashes in the long >> term though? > > I've disabled it the same way for this time, my local package differs > from what's in wip: > > --- PLIST 3 Dec 2009 10:18:00 -

[OMPI users] Problem building OpenMPI with PGI compilers

2009-12-09 Thread David Turner
Hi all, My first ever attempt to build OpenMPI. Platform is Sun Fire X4600 M2 servers, running Scientific Linux version 5.3. Trying to build OpenMPI 1.4 (as of today; same problems yesterday with 1.3.4). Trying to use PGI version 10.0. As a first attempt, I set CC, CXX, F77, and FC, then

[OMPI users] OpenMPI 1.4 RPM Spec file problem

2009-12-09 Thread Jim Kusznir
Hi all: I'm trying to build openmpi-1.4 rpms using my normal (complex) rpm build commands, but it's failing. I'm running into two errors: One (on gcc only): the D_FORTIFY_SOURCE build failure. I've had to move the if test "$using_gcc" = 0; then line down to after the RPM_OPT_FLAGS= that

Re: [OMPI users] Problem building OpenMPI with PGI compilers

2009-12-09 Thread Gus Correa
Hi David Last I tried (OpenMPI 1.3.2, PGI 8.0-4), PGI was problematic, particularly for C and C++. I eventually settled on a hybrid of gcc, g++, and pgf90 (for both the OpenMPI F77 and F90 bindings). Even this required a trick to avoid having the "-pthread" flag inserted among the pgf90 flags (where

Re: [OMPI users] Problem building OpenMPI with PGI compilers

2009-12-09 Thread Gerald Creager
Fascinating. I've not had any real problems building it from scratch with PGI. We are using the PGI 9 compilers, though, for that. gerry Gus Correa wrote: Hi David Last I tried, OpenMPI 1.3.2, PGI (8.0-4) was problematic, particularly for C and C++. I eventually settled down with a hybrid

Re: [OMPI users] Problem building OpenMPI with PGI compilers

2009-12-09 Thread Jeff Squyres
Just to set the record straight: it's a Libtool problem with PGI version 10 (all PGI versions below 10 work fine). This has been reported to the GNU Libtool folks and patches have already been applied upstream. However, there hasn't been a new Libtool release yet with these patches, so we

Re: [OMPI users] Problem building OpenMPI with PGI compilers

2009-12-09 Thread Gus Correa
Hi All As I stated in my original posting, I haven't compiled OpenMPI since 1.3.2. I was just trying to be of help, based on previous, and maybe too old, experiences. The problem I referred to happened with PGI 8.0-4 and OpenMPI 1.3. Most likely the issue has already been superseded by the newer OpenMPI

Re: [OMPI users] OpenMPI 1.4 RPM Spec file problem

2009-12-09 Thread Jim Kusznir
By the way, if I set build_all_in_one_rpm to 1, it works fine... --Jim On Wed, Dec 9, 2009 at 1:47 PM, Jim Kusznir wrote: > Hi all: > > I'm trying to build openmpi-1.4 rpms using my normal (complex) rpm > build commands, but its failing. I'm running into two errors: > > One

Re: [OMPI users] Pointers for understanding failure messages on NetBSD

2009-12-09 Thread Aleksej Saushev
kevin.buck...@ecs.vuw.ac.nz writes: CONFIGURE_ARGS+= --enable-contrib-no-build=vt >>> >>> I have no idea how NetBSD go about resolving such clashes in the long >>> term though? >> >> I've disabled it the same way for this time, my local package differs >> from what's in wip: >> >> ---

Re: [OMPI users] mpirun only works when -np <4 (Gus Correa)

2009-12-09 Thread Matthew MacManes
Hi Gus and List, 1st of all Gus, I want to say thanks.. you have been a huge help, and when I get this fixed, I owe you big time! However, the problems continue... I formatted the HD, reinstalled OS to make sure that I was working from scratch. I did your step A, which seemed to go fine:

[OMPI users] OMPI 1.4: connectivity_c fails, ring_c and hello_c work

2009-12-09 Thread Matthew MacManes
What is the difference between connectivity_c and ring_c or hello_c? Under what circumstances should one fail and not the others... I am having a huge problem with openMPI, and trying to get to the bottom of it by understanding the differences between the example files, connectivity, hello, and

Re: [OMPI users] mpirun only works when -np <4 (Gus Correa)

2009-12-09 Thread Gus Correa
Hi Matthew Save any misinterpretation I may have made of the code: Hello_c has no real communication, except for a final Barrier synchronization. Each process prints "hello world" and that's it. Ring probes a little more, with processes Send(ing) and Recv(eiving) messages. Ring just passes a
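Following Gus's description, one way to see how connectivity_c differs is to count the process pairs each test exercises. hello_c exchanges no point-to-point messages, and ring only sends to the next rank. This sketch assumes that connectivity_c has every rank exchange a message with every other rank, as the Open MPI example appears to do. Under that assumption it touches far more pairs as -np grows. The C counting sketch below uses hypothetical helper names and makes no MPI calls.

```c
#include <assert.h>

/* Counting sketch (no MPI calls): how many distinct pairs of processes
 * exchange point-to-point messages in each example for a given -np. */

/* ring: each rank only talks to its neighbour (rank + 1) % np. */
static long ring_pairs(int np)
{
    assert(np > 0);
    if (np <= 2)
        return np - 1;          /* 1 process: none; 2 processes: one pair */
    return np;                  /* np neighbour links around the ring */
}

/* connectivity (assumed all-pairs): every rank exchanges a message with
 * every other rank, giving np * (np - 1) / 2 pairs. */
static long connectivity_pairs(int np)
{
    assert(np > 0);
    return (long)np * (np - 1) / 2;
}
```

At -np 8 the ring exercises 8 pairs and connectivity_c exercises 28. At -np 16 the counts are 16 and 120. Of the three examples, connectivity_c is the only one that touches every pair of processes. That may be why it is the one that hangs at higher process counts.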