kevin.buck...@ecs.vuw.ac.nz writes:
> Cc: to the OpenMPI list as the otfdump clash might be of interest
> elsewhere.
>
>> I attach a patch, but it doesn't work and I don't see where the
>> error lies now. It may be that I'm doing something stupid.
>> It produces a working OpenMPI-1.3.4 package on
Dear all,
Sometimes when running Open MPI jobs, the application hangs. Looking at the
output, I get the following error message:
[ic17][[34562,1],74][../../../../../ompi/mca/btl/tcp/btl_tcp_frag.c:216:mca_btl_tcp_frag_recv] mca_btl_tcp_frag_recv: readv failed: No route to host (113)
I
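(A general note, not specific to this thread: "No route to host" from the TCP BTL
usually means a firewall is blocking the compute network, or that Open MPI picked
an interface that cannot reach the peer node. A common first check is to restrict
the TCP BTL to an interface known to be routable between all nodes, for example:

mpirun --mca btl_tcp_if_include eth0 -np 64 ./my_app

where eth0 and ./my_app are placeholders for the real interface and application.)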
On Tue, 2009-12-08 at 08:30 -0800, Matthew MacManes wrote:
> There are 8 physical cores, or 16 with hyperthreading enabled.
That should be meaty enough.
> 1st of all, let me say that when I specify that -np is less than 4
> processors (1, 2, or 3), both programs seem to work as expected. Also,
Hi Matthew,
I just had the same problem with my application when using more than 4 cores -
however, the program didn't hang, it crashed, and I got an error message of
'address not mapped'. As you say, it happened at different places in the code,
sometimes in the beginning, sometimes in the
Hi
I've installed trilinos using the openmpi 1.3.3 libraries. I'm configuring
openmpi as follows:
./configure CXX=/usr/local/bin/g++ CC=/usr/local/bin/gcc F77=/usr/local/bin/gfortran --prefix=/Users/andrewmcbride/lib/openmpi-1.3.3/MAC
Trilinos compiles without problem but the tests fail (see
You need to set your LD_LIBRARY_PATH to ~/lib/openmpi-1.3.3/MAC/lib, and
your PATH to ~/lib/openmpi-1.3.3/MAC/bin
It should then run fine.
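For example, in a bash shell, and assuming the install prefix quoted above, a
minimal sketch would be:

export PATH=$HOME/lib/openmpi-1.3.3/MAC/bin:$PATH
export LD_LIBRARY_PATH=$HOME/lib/openmpi-1.3.3/MAC/lib:$LD_LIBRARY_PATH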
On Wed, Dec 9, 2009 at 6:29 AM, Andrew McBride wrote:
> Hi
>
> I've installed trilinos using the openmpi 1.3.3 libraries. I'm
Thanks for your quick response Ralph.
The errors I get now are of a completely different nature and have to do
with, presumably, calling delete on an unallocated pointer. Now, this probably
has little to do with openmpi and more to do with the compilers used to create
openmpi?
I used gcc
Can you run simple MPI applications, like sending a message around in a ring?
On Dec 9, 2009, at 10:18 AM, Andrew McBride wrote:
> Thanks for your quick response Ralph.
>
> The errors I get now are of a completely different nature and have to do
> with, presumably, calling delete on an
seemingly. here is the output of ring:
bash-3.2$ ~/lib/openmpi-1.3.3/MAC/bin/mpicxx ring_cxx.cc
bash-3.2$ ~/lib/openmpi-1.3.3/MAC/bin/mpirun -np 2 a.out
Process 0 sending 10 to 1, tag 201 (2 processes in ring)
Process 0 sent to 1
Process 0 decremented value: 9
Process 0 decremented value: 8
On Dec 9, 2009, at 10:59 AM, Andrew McBride wrote:
> seemingly. here is the output of ring:
>
> I presume this output is correct? I guess the issue I have lies elsewhere
> then?
Yes -- the output looks correct.
Never say "never", but it would *seem* that the error lies in your app
somewhere.
So I tried to reproduce this problem today, and everything worked fine
for me using the trunk. I haven't tested v1.3/v1.4 yet.
I tried checkpointing with one hostfile then restarting with each of
the following:
- No hostfile
- a hostfile with completely different machines
- a hostfile
Hi Josh,
Thanks for helping. That solved the problem!!!
cheers,
Jonathan
Josh Hursey wrote:
So I tried to reproduce this problem today, and everything worked fine
for me using the trunk. I haven't tested v1.3/v1.4 yet.
I tried checkpointing with one hostfile then restarting with each of
Hi Gus,
Interestingly, the connectivity_c test works fine with -np < 8. For -np > 8 it
works some of the time, other times it HANGS. I have got to
believe that this is a big clue!! Also, when it hangs, sometimes I get the
message "mpirun was unable to cleanly terminate the
I verified that the preload functionality works on the trunk. It seems
to be broken on the v1.3/v1.4 branches. The version of this code has
changed significantly between the v1.3/v1.4 and the trunk/v1.5
versions. I filed a bug about this so it does not get lost:
Thanks Ashley, I'll try your tool..
I would think that this is an error in the programs I am trying to use, too,
but this is a problem with 2 different programs, written by 2 different
groups.. One of them might be bad, but both.. seems unlikely.
Interestingly the results for the
On Nov 12, 2009, at 10:54 AM, Sergio Díaz wrote:
Hi Josh,
You were right. The main problem was the /tmp. SGE uses a scratch
directory in which the jobs have temporary files. Setting TMPDIR to /tmp,
checkpoint works!
However, when I try to restart it... I got the following error (see
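(For reference, a minimal sketch of an SGE job script along those lines; the
actual script is not shown in the thread, so the parallel environment and
application names below are assumptions:

#!/bin/sh
#$ -N ompi_cr_job
#$ -pe orte 8                      # hypothetical parallel environment name
export TMPDIR=/tmp                 # avoid SGE's per-job scratch directory
mpirun -np $NSLOTS -am ft-enable-cr ./my_app

The running job can then be checkpointed from another shell with
"ompi-checkpoint <PID of mpirun>" and later restarted with
"ompi-restart <global snapshot reference>".)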
I took a look at the checkpoint staging and preload functionality. It
seems that the combination of the two is broken on the v1.3 and v1.4
branches. I filed a bug about it so that it would not get lost:
https://svn.open-mpi.org/trac/ompi/ticket/2139
I also attached a patch to partially fix
>> 26a27
>>> CONFIGURE_ARGS+= --enable-contrib-no-build=vt
>>
>> I have no idea how NetBSD go about resolving such clashes in the long
>> term though?
>
> I've disabled it the same way for this time, my local package differs
> from what's in wip:
>
> --- PLIST 3 Dec 2009 10:18:00 -
Hi all,
My first ever attempt to build OpenMPI. Platform is Sun Sunfire x4600
M2 servers, running Scientific Linux version 5.3. Trying to build
OpenMPI 1.4 (as of today; same problems yesterday with 1.3.4).
Trying to use PGI version 10.0.
As a first attempt, I set CC, CXX, F77, and FC, then
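(The configure line itself is cut off above. Purely as an illustration of
"I set CC, CXX, F77, and FC", an invocation along those lines with the PGI
compilers would look something like:

./configure CC=pgcc CXX=pgCC F77=pgf77 FC=pgf90 --prefix=/opt/openmpi-1.4

where the prefix is a placeholder.)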
Hi all:
I'm trying to build openmpi-1.4 rpms using my normal (complex) rpm
build commands, but it's failing. I'm running into two errors:
One (on gcc only): the D_FORTIFY_SOURCE build failure. I've had to
move the if test "$using_gcc" = 0; then line down to after the
RPM_OPT_FLAGS= that
Hi David
Last I tried, OpenMPI 1.3.2, PGI (8.0-4) was problematic,
particularly for C and C++.
I eventually settled down with a hybrid gcc, g++, and pgf90
(for both OpenMPI F77 and F90 bindings).
Even this required a trick to avoid having the "-pthread" flag
inserted among the pgf90 flags (where
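(The description of the trick is cut off above. One workaround that has
circulated for this kind of problem, offered here as an assumption rather than
as Gus's exact recipe, is a small wrapper script that strips "-pthread" before
invoking the real pgf90, roughly:

#!/bin/sh
# hypothetical pgf90 wrapper: drop -pthread, which PGI 8.x does not accept
args=
for a in "$@"; do
    case "$a" in
        -pthread) ;;             # silently drop the offending flag
        *) args="$args $a" ;;
    esac
done
exec /path/to/real/pgf90 $args

F77 and FC are then pointed at the wrapper when configuring Open MPI.)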
Fascinating. I've not had any real problems building it from scratch
with PGI. We are using the PGI 9 compilers, though, for that.
gerry
Gus Correa wrote:
Hi David
Last I tried, OpenMPI 1.3.2, PGI (8.0-4) was problematic,
particularly for C and C++.
I eventually settled down with a hybrid
Just to set the record straight: it's a Libtool problem with PGI version 10
(all PGI versions below 10 work fine).
This has been reported to the GNU Libtool folks and patches have already been
applied upstream. However, there hasn't been a new Libtool release yet with
these patches, so we
Hi All
As I stated on my original posting,
I haven't compiled OpenMPI since 1.3.2.
Just trying to be of help, based on previous,
and maybe too old, experiences.
The problem I referred to happened with PGI 8.0-4 and OpenMPI 1.3.
Most likely the issue is superseded already by the newer
OpenMPI
By the way, if I set build_all_in_one_rpm to 1, it works fine...
--Jim
On Wed, Dec 9, 2009 at 1:47 PM, Jim Kusznir wrote:
> Hi all:
>
> I'm trying to build openmpi-1.4 rpms using my normal (complex) rpm
> build commands, but it's failing. I'm running into two errors:
>
> One
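(For reference, and as an assumption about the invocation since Jim's actual
command is not shown, that setting can be passed on the rpmbuild command line
with something like:

rpmbuild --rebuild --define 'build_all_in_one_rpm 1' openmpi-1.4-1.src.rpm

where the .src.rpm filename is a placeholder.)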
kevin.buck...@ecs.vuw.ac.nz writes:
CONFIGURE_ARGS+= --enable-contrib-no-build=vt
>>>
>>> I have no idea how NetBSD go about resolving such clashes in the long
>>> term though?
>>
>> I've disabled it the same way for this time, my local package differs
>> from what's in wip:
>>
>> ---
Hi Gus and List,
1st of all Gus, I want to say thanks.. you have been a huge help, and when I
get this fixed, I owe you big time!
However, the problems continue...
I formatted the HD, reinstalled OS to make sure that I was working from
scratch. I did your step A, which seemed to go fine:
What is the difference between connectivity_c and ring_c or hello_c? Under
what circumstances should one fail and not the others...
I am having a huge problem with openMPI, and trying to get to the bottom of
it by understanding the differences between the example files, connectivity,
hello, and
Hi Matthew
Save any misinterpretation I may have made of the code:
Hello_c has no real communication, except for a final Barrier
synchronization.
Each process prints "hello world" and that's it.
Ring probes a little more, with processes Send(ing) and
Recv(eiving) messages.
Ring just passes a
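(To make the comparison concrete, here is a minimal ring program in the spirit
of the ring_c example shipped with Open MPI; this is a sketch rather than the
exact distributed source. Rank 0 injects a counter, every rank forwards it to
the next rank, and rank 0 decrements it on each full pass until it reaches
zero:

#include <stdio.h>
#include "mpi.h"

int main(int argc, char *argv[])
{
    int rank, size, next, prev, message, tag = 201;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    next = (rank + 1) % size;
    prev = (rank + size - 1) % size;

    if (rank == 0) {
        message = 10;
        printf("Process 0 sending %d to %d, tag %d (%d processes in ring)\n",
               message, next, tag, size);
        MPI_Send(&message, 1, MPI_INT, next, tag, MPI_COMM_WORLD);
        printf("Process 0 sent to %d\n", next);
    }

    /* Pass the counter around the ring until it reaches zero. */
    while (1) {
        MPI_Recv(&message, 1, MPI_INT, prev, tag, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        if (rank == 0) {
            message--;
            printf("Process 0 decremented value: %d\n", message);
        }
        MPI_Send(&message, 1, MPI_INT, next, tag, MPI_COMM_WORLD);
        if (message == 0) {
            printf("Process %d exiting\n", rank);
            break;
        }
    }

    /* Rank 0 drains the last message still travelling around the ring. */
    if (rank == 0) {
        MPI_Recv(&message, 1, MPI_INT, prev, tag, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
    }

    MPI_Finalize();
    return 0;
}

Connectivity_c, if memory serves, has every pair of processes exchange
messages, which is why it exercises far more point-to-point connections than
the ring does.)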