…asm file which you put into opal/asm/generated/ and stick into libasm -
has what you generate there for whatever usage changed 1.4 -> 1.4.1 -> svn trunk?
DM
On Tue, 9 Feb 2010, Jeff Squyres wrote:
Perhaps someone with a pathscale compiler support contract can investigate this
All,
FWIW, Pathscale is dying in the new atomics in 1.4.1 (and svn trunk) - actually looping -
from gdb:
opal_progress_event_users_decrement () at
../.././opal/include/opal/sys/atomic_impl.h:61
61 } while (0 == opal_atomic_cmpset_32(addr, oldval, oldval - delta));
Current language: a
Chance your arm and include a CFLAGS with your configure including -D__GNUC__
CFLAGS=-D__GNUC__
A small test case using just those headers works this way.
BTW, PGI 9.0-1 also fails on those headers.
DM
On Tue, 29 Dec 2009, Richard Walsh wrote:
All,
Not overwhelmed with responses here ...
Simple job (standard compute pi, cpi.c), one machine with 4 cores -
OpenMPI built with gcc 4.3.2 and using gcc 4.3.2.
mpirun -x FOO --mca btl tcp,self -np 4 -machinefile hosty ./a.out
[hosty:11202] [[12796,0],0] ORTE_ERROR_LOG: Not implemented in file
../../../../.././orte/mca/rmaps/round_robin
Thank you.
DM
On Fri, 26 Jun 2009, Ralph Castain wrote:
Man, was this a PITA to chase down. Finally found it, though. Fixed on trunk
as of r21549
Thanks!
Ralph
So something else is wrong.
On Jun 25, 2009, at 3:19 PM, Mostyn Lewis wrote:
Just local machine - direct from the command line
Are you using rsh, qrsh (i.e., SGE),
SLURM, Torque, ...?
On Jun 25, 2009, at 2:54 PM, Mostyn Lewis wrote:
Something like:
#!/bin/ksh
set -x
PREFIX=$OPENMPI_GCC_SVN
export PATH=$OPENMPI_GCC_SVN/bin:$PATH
MCA="--mca btl tcp,self"
mpicc -g -O6 mpiabort.c
NPROCS=4
mpirun --prefix
Ralph Castain wrote:
Using what launch environment?
On Jun 25, 2009, at 2:29 PM, Mostyn Lewis wrote:
While using the BLACS test programs, I've seen that with recent SVN checkouts
(including todays) the MPI_Abort test left procs running. The last SVN I
have where it worked was 1.4a1r20936. By 1.4a1r21246 it fails.
Works O.K. in the standard 1.3.2 release.
A test program is below. GCC was used.
If you want to find libimf.so, which is a shared INTEL library,
pass the library path with a -x on mpirun
mpirun -x LD_LIBRARY_PATH
DM
On Fri, 10 Apr 2009, Francesco Pietra wrote:
Hi Gus:
If you feel that the observations below are not relevant to openmpi,
please disregard the message.
_idle 0 on your cmd
line to override any internal settings.
On Apr 7, 2009, at 1:32 PM, Steve Kargl wrote:
On Tue, Apr 07, 2009 at 12:00:55PM -0700, Mostyn Lewis wrote:
Steve,
Did you rebuild 1.2.9? As I see you have static libraries, maybe there's
a lurking phthread or something else that may have changed over time?
DM
On Tue, 7 Apr 2009, Steve Kargl wrote:
On Tue, Apr 07, 2009 at 09:10:21AM -0700, Eugene Loh wrote:
Steve Kargl wrote:
I can rebuild 1.2.9
What does the output of configure say when it's checking for restrict
for you?
On Mar 13, 2009, at 3:07 PM, Mostyn Lewis wrote:
Well George's syntax didn't work, either:
"../../../.././ompi/mca/op/op.h", line 263: error: expected a ")"
typedef vo
type_t **,
struct
ompi_op_base_module_1_0_0_t *);
Thanks,
george.
On Mar 11, 2009, at 15:52 , Mostyn Lewis wrote:
Compiling SVN r20757 with PGI 8.0-4 failed doing ompi_info with
"../../../.././ompi/mca/op/op.h", line 264: error: duplicate parameter name
void *restrict,
^
"../../../.././ompi/
works?
Ralph
On Mar 10, 2009, at 2:13 PM, Mostyn Lewis wrote:
Maybe I know why now but it's not pleasant, e.g. 2 machines in the same
cluster have their ethernets such as:
Machine s0157
eth2 Link encap:Ethernet HWaddr 00:1E:68:DA:74:A8
BROADCAST MULTICAST MTU:1500 Metric:1
…we'll catch an "eth0" on another node
where we need it.
Can you give it a try and see if it works?
Ralph
…so we don't endlessly loop when that happens (IIRC, I think
we are already supposed to abort, but it appears that isn't working). But the
real question is why the comm fails in the first place.
On Mar 10, 2009, at 10:50 AM, Mostyn Lewis wrote:
Latest status - 1.4a1r20757 (yesterday's)
It looks like the system doesn't know what nodes the procs are to be
placed upon. Can you run this with --display-devel-map? That will tell
us where the system thinks it is placing things.
Thanks
Ralph
On Feb 26, 2009, at 3:41 PM, Mostyn Lewis wrote:
Maybe it's my pine mailer.
…what it looks like, what is supposed to happen, etc? I can barely parse your cmd line...
Thanks
Ralph
On Feb 26, 2009, at 1:03 PM, Mostyn Lewis wrote:
Today's and yesterday's.
1.4a1r20643_svn
+ mpirun --prefix /tools/openmpi/1.4a1r20643_svn/connectx/intel64/10.1.015/openib/suse_sles_10/x86_64/opteron -np 256 --mca btl sm,openib,self -x OMPI_MCA_btl_openib_use_eager_rdma -x OMPI_MCA_btl_openib_eager_limit -x OMPI_MCA_btl_self_eager_limit -x
Sort of ditto but with SVN release at 20123 (and earlier):
e.g.
[r2250_46:30018] mca_common_sm_mmap_init: open
/tmp/45139.1.all.q/openmpi-sessions-mostyn@r2250_46_0/25682/1/shared_mem_pool.r2250_46
failed with errno=2
[r2250_63:05292] mca_common_sm_mmap_init: open
/tmp/45139.1.all.q/openmpi-s
I think the quoting counts as my own fault as I used
MCA='--mca btl_openib_verbose 1 --mca btl openib,self --mca btl_openib_if_include
"mlx4_0:1,mlx4_1:1"'
...
mpirun ... $MCA ...
DM
On Sun, 19 Oct 2008, Jeff Squyres wrote:
On Oct 18, 2008, at 9:17 PM, Mostyn Lewis wrote:
On Sun, 19 Oct 2008, Jeff Squyres wrote:
On Oct 18, 2008, at 9:19 PM, Mostyn Lewis wrote:
Can OpenMPI do like Scali and MVAPICH2 and utilize 2 IB HCAs per machine
to approach double the bandwidth on simple tests such as IMB PingPong?
Yes. OMPI will automatically (and aggressively) use as many
Regards,
DM
Jeff,
I traced this and it was the quote marks in "mlx4_0:1,mlx4_1:1" - they were
passed in and caused the mismatch :-(
Sorry about that.
Regards,
DM
On Sat, 18 Oct 2008, Jeff Squyres wrote:
On Oct 16, 2008, at 9:10 PM, Mostyn Lewis wrote:
OpenMPI says for a:
mpirun --pref
Hello,
Using today's SVN 1.4a1r19757
with
MCA='--mca btl_openib_verbose 1 --mca btl openib,self --mca btl_openib_if_include
"mlx4_0:1,mlx4_1:1"'
ibstatus (OFED 1.3.1) gives:
Infiniband device 'mlx4_0' port 1 status:
default gid: fe80::::0003:ba00:0100:71a1
base
Jeff,
You broke my ksh (and I expect something else)
Today's SVN 1.4a1r19757
orte/mca/plm/rsh/plm_rsh_module.c
line 471:
tmp = opal_argv_split("( test ! -r ./.profile || . ./.profile;", ' ');
^
ARGHH
No (
tmp = opal_arg
…the SM BTL. Please upgrade to at least 19315
and [hopefully] your application will run to completion.
Thanks,
george.
On Jul 24, 2008, at 3:39 AM, Mostyn Lewis wrote:
Hello,
Using a very recent svn version (1.4a1r18899) I'm getting a non-terminating
condition if I use the sm btl with tcp,self or with openib,self.
The program is not finishing a reduce operation. It works if the sm btl
is left out.
Using 2 4 core nodes.
Program is:
-
Yes, Intel 10.0 (and some say 10.1) have this problem with gcc 4.2.X (which
it is using) - it works with 4.1.2 as in SuSE SLES 10.
A workaround is to include in your flags (CXXFLAGS, presumably), the following:
-D
"__sync_fetch_and_add(ptr,addend)=_InterlockedExchangeAdd(const_cast(reinterpret_
Using today's SVN (1.3a1r17234), built in the context of an installed OFED 1.2.5.4, and it works!
Regards,
Mostyn
On Thu, 24 Jan 2008, Mostyn Lewis wrote:
Hello,
I have a very simple MPI program hanging in MPI_Reduce using the openmpi-1.2.4-1
as supplied with OFED 1.2.5.4 (running this too).
It works with same hardware using the supplied mvapich (mvapich-0.9.9).
The hardware is a Mellanox Technologies MT25418 [ConnectX IB DDR] (rev a0) HCA
(SUN/v
…so we have a real mix to handle.
Regarding the lack of mvapi support in OpenMPI there's just udapl left for
such as SilverStorm :-(
Thanks for looking,
Mostyn
On Tue, 6 Nov 2007, Andrew Friedley wrote:
Mostyn Lewis wrote:
Andrew,
Failure looks like:
+ mpirun --prefix
+
/tools/op
MCA filem: rsh (MCA v1.0, API v1.0, Component v1.3)
Regards,
Mostyn
On Tue, 6 Nov 2007, Andrew Friedley wrote:
All thread support is disabled by default in Open MPI; the uDAPL BTL is
neither thread safe nor makes use of a threaded uDAPL implementation.
For completeness, the thread support is con
I'm trying to build a udapl OpenMPI from last Friday's SVN and using
Qlogic/QuickSilver/SilverStorm 4.1.0.0.1 software. I can get it
made and it works on one machine. With IB between 2 machines it fails
near termination of a job. Qlogic says they don't have a threaded
udapl (libpthread is in the trace
leton (MCA v1.0, API v1.0, Component v1.3)
MCA sds: slurm (MCA v1.0, API v1.0, Component v1.3)
MCA filem: rsh (MCA v1.0, API v1.0, Component v1.3)
mostyn@s0120:/ctmp8/mostyn/glamex/pi> exit
Script done on Mon 01 Oct 2007 04:35:03 PM PDT
On Sun, 30 Sep 2007, Mostyn Lewis wrote:
Any ideas about this? One dual core opteron box talking to another using
Infinicon/SilverStorm/QLogic hardware and mvapi (actually it's the same
just using ethernet and tcp):
$OPENMPI_INFINICON_GCC_MVAPI/bin/mpicc cpi.c
$OPENMPI_INFINICON_GCC_MVAPI/bin/mpirun -np 4 -machinefile j ./a.out
[s0121:07450] [1
Brian,
Thanks for the reply.
The combination of:
libtool 1.5.23b
automake 1.10
autoconf 2.61
was O.K., so it seems that libtool 2.1a from CVS on 092407 caused my
hiccup.
Regards,
Mostyn
On Fri, 28 Sep 2007, Brian Barrett wrote:
On Sep 27, 2007, at 6:44 PM, Mostyn Lewis wrote:
Today
# LTDL_CONVENIENCE
#
...
GNU tools used:
autoconf 2.61
automake 1.10
libtool 2.1a_CVS.092407 (libtool from CVS 3 days ago)
Regards,
Mostyn Lewis
-openib,btl-gm,btl-mx,mtl-psm
would parse.
So, which is it? The docs or the last above?
From a SVN of today.
Regards,
Mostyn Lewis
Why not edit libtool to see what it is doing (it's just a script)
- you will get a lot of output:
Add a "set -x" as the second line and stand well back :-)
#! /bin/sh
set -x
Mostyn
On Wed, 20 Jun 2007, Andrew Friedley wrote:
I'm not seeing anything particularly relevant in the libtool
documentation
e to bind and in that case need to
switch off any supplied binding. I really wish the default was no
binding like OpenMPI with docs that point out the variables but it's
not always the case.
Sorry again for any trouble,
Mostyn
On Tue, 24 Apr 2007, Jeff Squyres wrote:
On Apr 23, 2007, at 9:22 P
the end) and a taskset call gets back the
mask to show if you are bound or not.
Regards,
Mostyn
On Mon, 23 Apr 2007, Jeff Squyres wrote:
On Apr 22, 2007, at 8:46 PM, Mostyn Lewis wrote:
More information.
--mca mpi_paffinity_alone 0
Can you describe how you're verifying that the proce
After 1.3a1r14155 (not sure how much after but certainly currently) you
get a SEGV if you use an unknown shell (I use something called ksh93).
Error is at lines 576->580
if ( i == ORTE_PLS_RSH_SHELL_UNKNOWN ) {
opal_output(0, "WARNING: local probe returned unhandled shell:%s
In the mca_base_param_reg_int_name call in ompi_mpi_params.c you don't check the
return code, so there may be junk in value? However, I don't see why explicitly
setting --mca mpi_paffinity_alone 0 would fail?
Regards,
Mostyn
P.S. I hope this doesn't seem too presumptuous.
On Sun, 22 Apr 2
Using a lateish SVN, 1.3a1r14155 (circa April 1st), on a SuSE SLES10
opteron system which has 2 dual core opterons per node, I can't seem
to disable processor affinity? I have a small test program which calls
something else to detect CPU binding and whatever I've done it's still
bound?
tried:
--m
I believe this is "too many open files".
ulimit -n some_number
Regards,
Mostyn
On Wed, 22 Nov 2006, Lydia Heck wrote:
I have - again - successfully built and installed
mx and openmpi and I can run 64 and 128 cpus jobs on a 256 CPU cluster
version of openmpi is 1.2b1
compiler used: studio11
I get this on rh9 ONLY if I leave out a -hostfile option
on mpirun, otherwise it works fine.
This is an old Red Hat.
Regards,
Mostyn
On Wed, 16 Nov 2005, Jeff Squyres wrote:
Clement --
Sorry for the delay in replying. We're running around crazy here at
SC, which pretty much keeps us away fr