Re: [OMPI users] Problems building Open MPI 1.4.1 with Pathscale

2010-02-09 Thread Mostyn Lewis
asm file which you put into opal/asm/generated/ and stick into libasm - what you generate there for whatever usage hasn't changed 1.4->1.4.1->svn trunk? DM On Tue, 9 Feb 2010, Jeff Squyres wrote: Perhaps someone with a pathscale compiler support contract can investigate this

Re: [OMPI users] Problems building Open MPI 1.4.1 with Pathscale

2010-02-09 Thread Mostyn Lewis
All, FWIW, Pathscale is dying in the new atomics in 1.4.1 (and svn trunk) - actually looping - from gdb: opal_progress_event_users_decrement () at ../.././opal/include/opal/sys/atomic_impl.h:61 61 } while (0 == opal_atomic_cmpset_32(addr, oldval, oldval - delta)); Current language: a

Re: [OMPI users] Problem compiling 1.4.0 snap with PGI 10.0-1 and openib flags turned on ...

2009-12-29 Thread Mostyn Lewis
Chance your arm and include a CFLAGS with your configure including -D__GNUC__ CFLAGS=-D__GNUC__ A small test case just using those headers works, thus wise. BTW, PGI 9.0-1 also fails on those headers. DM On Tue, 29 Dec 2009, Richard Walsh wrote: All, Not overwhelmed with responses here ...

[OMPI users] Today's SVN 1.7a1r22089_svn simple job failure

2009-10-11 Thread Mostyn Lewis
Simple job (standard compute pi, cpi.c), one machine with 4 cores - OpenMPI built with gcc 4.3.2 and using gcc 4.3.2. mpirun -x FOO --mca btl tcp,self -np 4 -machinefile hosty ./a.out [hosty:11202] [[12796,0],0] ORTE_ERROR_LOG: Not implemented in file ../../../../.././orte/mca/rmaps/round_robin

Re: [OMPI users] Did you break MPI_Abort recently?

2009-06-27 Thread Mostyn Lewis
Thank you. DM On Fri, 26 Jun 2009, Ralph Castain wrote: Man, was this a PITA to chase down. Finally found it, though. Fixed on trunk as of r21549 Thanks! Ralph So something else is wrong. On Jun 25, 2009, at 3:19 PM, Mostyn Lewis wrote: Just local machine - direct from the command line

Re: [OMPI users] Did you break MPI_Abort recently?

2009-06-25 Thread Mostyn Lewis
. Are you using rsh, qrsh (i.e., SGE), SLURM, Torque, ? On Jun 25, 2009, at 2:54 PM, Mostyn Lewis wrote: Something like: #!/bin/ksh set -x PREFIX=$OPENMPI_GCC_SVN export PATH=$OPENMPI_GCC_SVN/bin:$PATH MCA="--mca btl tcp,self" mpicc -g -O6 mpiabort.c NPROCS=4 mpirun --prefix

Re: [OMPI users] Did you break MPI_Abort recently?

2009-06-25 Thread Mostyn Lewis
tain wrote: Using what launch environment? On Jun 25, 2009, at 2:29 PM, Mostyn Lewis wrote: While using the BLACS test programs, I've seen that with recent SVN checkouts (including todays) the MPI_Abort test left procs running. The last SVN I have where it worked was 1.4a1r20936. By 1.4a1r21246 i

[OMPI users] Did you break MPI_Abort recently?

2009-06-25 Thread Mostyn Lewis
While using the BLACS test programs, I've seen that with recent SVN checkouts (including todays) the MPI_Abort test left procs running. The last SVN I have where it worked was 1.4a1r20936. By 1.4a1r21246 it fails. Works O.K. in the standard 1.3.2 release. A test program is below. GCC was used.

Re: [OMPI users] shared libraries issue compiling 1.3.1/intel 10.1.022

2009-04-10 Thread Mostyn Lewis
If you want to find libimf.so, which is a shared INTEL library, pass the library path with a -x on mpirun mpirun -x LD_LIBRARY_PATH DM On Fri, 10 Apr 2009, Francesco Pietra wrote: Hi Gus: If you feel that the observations below are not relevant to openmpi, please disregard the mes

Re: [OMPI users] Factor of 10 loss in performance with 1.3.x

2009-04-07 Thread Mostyn Lewis
_idle 0 on your cmd line to override any internal settings. On Apr 7, 2009, at 1:32 PM, Steve Kargl wrote: On Tue, Apr 07, 2009 at 12:00:55PM -0700, Mostyn Lewis wrote: Steve, Did you rebuild 1.2.9? As I see you have static libraries, maybe there's a lurking pthread or something els

Re: [OMPI users] Factor of 10 loss in performance with 1.3.x

2009-04-07 Thread Mostyn Lewis
Steve, Did you rebuild 1.2.9? As I see you have static libraries, maybe there's a lurking pthread or something else that may have changed over time? DM On Tue, 7 Apr 2009, Steve Kargl wrote: On Tue, Apr 07, 2009 at 09:10:21AM -0700, Eugene Loh wrote: Steve Kargl wrote: I can rebuild 1.2.9

Re: [OMPI users] PGI 8.0-4 doesn't like ompi/mca/op/op.h

2009-03-13 Thread Mostyn Lewis
ut of configure say for you when it's checking for restrict for you? On Mar 13, 2009, at 3:07 PM, Mostyn Lewis wrote: Well George's syntax didn't work, either: "../../../.././ompi/mca/op/op.h", line 263: error: expected a ")" typedef vo

Re: [OMPI users] PGI 8.0-4 doesn't like ompi/mca/op/op.h

2009-03-13 Thread Mostyn Lewis
type_t **, struct ompi_op_base_module_1_0_0_t *); Thanks, george. On Mar 11, 2009, at 15:52 , Mostyn Lewis wrote: Compiling SVN r20757 with PGI 8.0-4 failed doing ompi_info with "

[OMPI users] PGI 8.0-4 doesn't like ompi/mca/op/op.h

2009-03-11 Thread Mostyn Lewis
Compiling SVN r20757 with PGI 8.0-4 failed doing ompi_info with "../../../.././ompi/mca/op/op.h", line 264: error: duplicate parameter name void *restrict, ^ "../../../.././ompi/

Re: [OMPI users] Latest SVN failures

2009-03-11 Thread Mostyn Lewis
works? Ralph On Mar 10, 2009, at 2:13 PM, Mostyn Lewis wrote: Maybe I know why now but it's not pleasant, e.g. 2 machines in the same cluster have their ethernets such as: Machine s0157 eth2 Link encap:Ethernet HWaddr 00:1E:68:DA:74:A8 BROADCAST MULTICAST MTU:1500 Metric:1

Re: [OMPI users] Latest SVN failures

2009-03-10 Thread Mostyn Lewis
l catch an "eth0" on another node where we need it. Can you give it a try and see if it works? Ralph On Mar 10, 2009, at 2:13 PM, Mostyn Lewis wrote: Maybe I know why now but it's not pleasant, e.g. 2 machines in the same cluster have their ethernets such as: Machine s0157 eth2

Re: [OMPI users] Latest SVN failures

2009-03-10 Thread Mostyn Lewis
ngs so we don't endlessly loop when that happens (IIRC, I think we are already supposed to abort, but it appears that isn't working). But the real question is why the comm fails in the first place. On Mar 10, 2009, at 10:50 AM, Mostyn Lewis wrote: Latest status - 1.4a1r20757 (yeste

Re: [OMPI users] Latest SVN failures

2009-03-10 Thread Mostyn Lewis
It looks like the system doesn't know what nodes the procs are to be placed upon. Can you run this with --display-devel-map? That will tell us where the system thinks it is placing things. Thanks Ralph On Feb 26, 2009, at 3:41 PM, Mostyn Lewis wrote: Maybe it's my pine mailer. Thi

Re: [OMPI users] Latest SVN failures

2009-02-26 Thread Mostyn Lewis
ks like, what is supposed to happen, etc? I can barely parse your cmd line... Thanks Ralph On Feb 26, 2009, at 1:03 PM, Mostyn Lewis wrote: Today's and yesterdays. 1.4a1r20643_svn + mpirun --prefix /tools/openmpi/1.4a1r20643_svn/connectx/intel64/10.1.015/openib/suse_sles_10/x86_64/opter

[OMPI users] Latest SVN failures

2009-02-26 Thread Mostyn Lewis
Today's and yesterdays. 1.4a1r20643_svn + mpirun --prefix /tools/openmpi/1.4a1r20643_svn/connectx/intel64/10.1.015/openib/suse_sles_10/x86_64/opteron -np 256 --mca btl sm,openib,self -x OMPI_MCA_btl_openib_use_eager_rdma -x OMPI_MCA_btl_openib_eager_limit -x OMPI_MCA_btl_self_eager_limit -x

Re: [OMPI users] v1.3: mca_common_sm_mmap_init error

2009-01-27 Thread Mostyn Lewis
Sort of ditto but with SVN release at 20123 (and earlier): e.g. [r2250_46:30018] mca_common_sm_mmap_init: open /tmp/45139.1.all.q/openmpi-sessions-mostyn@r2250_46_0/25682/1/shared_mem_pool.r2250_46 failed with errno=2 [r2250_63:05292] mca_common_sm_mmap_init: open /tmp/45139.1.all.q/openmpi-s

Re: [OMPI users] --mca btl_openib_if_include

2008-10-19 Thread Mostyn Lewis
I think the quoting counts as my own fault as I used MCA='--mca btl_openib_verbose 1 --mca btl openib,self --mca btl_openib_if_include "mlx4_0:1,mlx4_1:1"' ... mpirun ... $MCA ... DM On Sun, 19 Oct 2008, Jeff Squyres wrote: On Oct 18, 2008, at 9:17 PM, Mostyn Lewis wrote:

Re: [OMPI users] Can 2 IB HCAs give twice the bandwidth?

2008-10-19 Thread Mostyn Lewis
On Sun, 19 Oct 2008, Jeff Squyres wrote: On Oct 18, 2008, at 9:19 PM, Mostyn Lewis wrote: Can OpenMPI do like Scali and MVAPICH2 and utilize 2 IB HCAs per machine to approach double the bandwidth on simple tests such as IMB PingPong? Yes. OMPI will automatically (and aggressively) use as many

[OMPI users] Can 2 IB HCAs give twice the bandwidth?

2008-10-18 Thread Mostyn Lewis
Can OpenMPI do like Scali and MVAPICH2 and utilize 2 IB HCAs per machine to approach double the bandwidth on simple tests such as IMB PingPong? Regards, DM

Re: [OMPI users] --mca btl_openib_if_include

2008-10-18 Thread Mostyn Lewis
Jeff, I traced this and it was the quote marks in "mlx4_0:1,mlx4_1:1" - they were passed in and caused the mismatch :-( Sorry about that. Regards, DM On Sat, 18 Oct 2008, Jeff Squyres wrote: On Oct 16, 2008, at 9:10 PM, Mostyn Lewis wrote: OpenMPI says for a: mpirun --pref
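
The mismatch traced in this message comes from how the shell treats quote characters stored inside a variable: they are data, not syntax, so they survive word splitting and reach the program as part of the argument. The sketch below assumes a POSIX sh and reuses only the MCA string from the thread. The names bad_value, good_value and IF_INCLUDE are illustrative, and mpirun itself is not invoked.

```shell
#!/bin/sh
# Quotes stored inside a variable are literal characters, not shell syntax.
MCA='--mca btl_openib_if_include "mlx4_0:1,mlx4_1:1"'

# Unquoted expansion splits on whitespace but keeps the embedded quotes,
# so the third word still carries the two double-quote characters.
set -- $MCA
bad_value=$3
echo "with embedded quotes: $bad_value"

# The value contains no spaces, so the inner quotes can simply be dropped.
IF_INCLUDE='mlx4_0:1,mlx4_1:1'
MCA="--mca btl_openib_if_include $IF_INCLUDE"
set -- $MCA
good_value=$3
echo "without quotes:       $good_value"
```

The first form hands the program a device list that begins and ends with a double-quote character, which would not compare equal to a plain device name such as mlx4_0.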

[OMPI users] --mca btl_openib_if_include

2008-10-16 Thread Mostyn Lewis
Hello, Using today's SVN 1.4a1r19757 with MCA='--mca btl_openib_verbose 1 --mca btl openib,self --mca btl_openib_if_include "mlx4_0:1,mlx4_1:1"' ibstatus (OFED 1.3.1) gives: Infiniband device 'mlx4_0' port 1 status: default gid: fe80::::0003:ba00:0100:71a1 base

Re: [OMPI users] Problem launching onto Bourne shell

2008-10-16 Thread Mostyn Lewis
Jeff, You broke my ksh (and I expect something else) Today's SVN 1.4a1r19757 orte/mca/plm/rsh/plm_rsh_module.c line 471: tmp = opal_argv_split("( test ! -r ./.profile || . ./.profile;", ' '); ^ ARGHH No ( tmp = opal_arg

Re: [OMPI users] Continuous poll/select using btl sm (svn 1.4a1r18899)

2008-08-18 Thread Mostyn Lewis
he SM BTL. Please upgrade to at least 19315 and [hopefully] your application will run to completion. Thanks, george. On Jul 24, 2008, at 3:39 AM, Mostyn Lewis wrote: Hello, Using a very recent svn version (1.4a1r18899) I'm getting a non-terminating condition if I use the sm bt

[OMPI users] Continuous poll/select using btl sm (svn 1.4a1r18899)

2008-07-23 Thread Mostyn Lewis
Hello, Using a very recent svn version (1.4a1r18899) I'm getting a non-terminating condition if I use the sm btl with tcp,self or with openib,self. The program is not finishing a reduce operation. It works if the sm btl is left out. Using 2 4 core nodes. Program is: -

Re: [OMPI users] compilation with intel fortran compiller problem

2008-01-26 Thread Mostyn Lewis
Yes, Intel 10.0 (and some say 10.1) have this problem with gcc 4.2.X (which it is using) - it works with 4.1.2 as in SuSE SLES 10. A workaround is to include in your flags (CXXFLAGS, presumably), the following: -D "__sync_fetch_and_add(ptr,addend)=_InterlockedExchangeAdd(const_cast(reinterpret_

Re: [OMPI users] openmpi-1.2.4-1/OFED 1.2.5.4 ConnectX MPI_Reduce hang

2008-01-25 Thread Mostyn Lewis
Using todays SVN (1.3a1r17234) and building in the context of OFED 1.2.5.4 installed and it works! Regards, Mostyn On Thu, 24 Jan 2008, Mostyn Lewis wrote: Hello, I have a very simple MPI program hanging in MPI_Reduce using the openmpi-1.2.4-1 as supplied with OFED 1.2.5.4 (running this too

[OMPI users] openmpi-1.2.4-1/OFED 1.2.5.4 ConnectX MPI_Reduce hang

2008-01-24 Thread Mostyn Lewis
Hello, I have a very simple MPI program hanging in MPI_Reduce using the openmpi-1.2.4-1 as supplied with OFED 1.2.5.4 (running this too). It works with same hardware using the supplied mvapich (mvapich-0.9.9). The hardware is a Mellanox Technologies MT25418 [ConnectX IB DDR] (rev a0) HCA (SUN/v

Re: [OMPI users] OpenMPI - can you switch off threads?

2007-11-06 Thread Mostyn Lewis
- so we have a real mix to handle. Regarding the lack of mvapi support in OpenMPI there's just udapl left for such as SilverStorm :-( Thanks for looking, Mostyn On Tue, 6 Nov 2007, Andrew Friedley wrote: Mostyn Lewis wrote: Andrew, Failure looks like: + mpirun --prefix + /tools/op

Re: [OMPI users] OpenMPI - can you switch off threads?

2007-11-06 Thread Mostyn Lewis
MCA filem: rsh (MCA v1.0, API v1.0, Component v1.3) Regards, Mostyn On Tue, 6 Nov 2007, Andrew Friedley wrote: All thread support is disabled by default in Open MPI; the uDAPL BTL is neither thread safe nor makes use of a threaded uDAPL implementation. For completeness, the thread support is con

[OMPI users] OpenMPI - can you switch off threads?

2007-11-06 Thread Mostyn Lewis
I'm trying to build a udapl OpenMPI from last Friday's SVN and using Qlogic/QuickSilver/SilverStorm 4.1.0.0.1 software. I can get it made and it works in machine. With IB between 2 machines it fails near termination of a job. Qlogic says they don't have a threaded udapl (libpthread is in the trace

Re: [OMPI users] mca_oob_tcp_peer_try_connect: messages

2007-10-01 Thread Mostyn Lewis
leton (MCA v1.0, API v1.0, Component v1.3) MCA sds: slurm (MCA v1.0, API v1.0, Component v1.3) MCA filem: rsh (MCA v1.0, API v1.0, Component v1.3) mostyn@s0120:/ctmp8/mostyn/glamex/pi> exit Script done on Mon 01 Oct 2007 04:35:03 PM PDT On Sun, 30 Sep 2007, Mostyn Lewis wrote:

[OMPI users] mca_oob_tcp_peer_try_connect: messages

2007-09-30 Thread Mostyn Lewis
Any ideas about this? One dual core Opteron box talking to another using Infinicon/SilverStorm/QLogic hardware and mvapi (actually it's the same just using ethernet and tcp): $OPENMPI_INFINICON_GCC_MVAPI/bin/mpicc cpi.c $OPENMPI_INFINICON_GCC_MVAPI/bin/-np 4 -machinefile j ./a.out [s0121:07450] [1

Re: [OMPI users] aclocal.m4 booboo?

2007-09-28 Thread Mostyn Lewis
Brian, Thanks for the reply. The combination of: libtool 1.5.23b automake 1.10 autoconf 2.61 was O.K., so it seems that libtool 2.1a from CVS on 092407 caused my hiccup. Regards, Mostyn On Fri, 28 Sep 2007, Brian Barrett wrote: On Sep 27, 2007, at 6:44 PM, Mostyn Lewis wrote: Today

[OMPI users] aclocal.m4 booboo?

2007-09-27 Thread Mostyn Lewis
# LTDL_CONVENIENCE # ... GNU tools used: autoconf 2.61 automake 1.10 libtool 2.1a_CVS.092407 (libtool from CVS 3 days ago) Regards, Mostyn Lewis

[OMPI users] --enable-mca-no-build broken or bad docs?

2007-09-27 Thread Mostyn Lewis
-openib,btl-gm,btl-mx,mtl-psm would parse. So, which is it? The docs or the last above? From a SVN of today. Regards, Mostyn Lewis

Re: [OMPI users] error -- libtool unsupported hardcode properties

2007-06-20 Thread Mostyn Lewis
Why not edit libtool to see what it is doing (it's just a script) - you will get a lot of output: Add a "set -x" as the second line and stand well back :-) #! /bin/sh set -x Mostyn On Wed, 20 Jun 2007, Andrew Friedley wrote: I'm not seeing anything particularly relevant in the libtool docume
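
The tracing trick suggested here can be tried on a small script first. The sketch below is only an illustration: the temporary file is a hypothetical stand-in for libtool, and the variable names demo and trace are made up. It writes a short script with "set -x" as its second line, runs it, and captures the trace that the shell prints on stderr.

```shell
#!/bin/sh
# Hypothetical stand-in for a large shell script such as libtool.
demo=$(mktemp)
cat > "$demo" <<'EOF'
#! /bin/sh
set -x
mode=link
echo "mode is $mode"
EOF

# With set -x the shell writes each command to stderr before running it.
# Here stderr is captured and the script's normal output is discarded.
trace=$(sh "$demo" 2>&1 >/dev/null)
printf '%s\n' "$trace"
rm -f "$demo"
```

Running a script as "sh -x script" should give the same trace without editing the file, provided the script is a plain sh script.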

Re: [OMPI users] How do you switch off paffinity?

2007-04-24 Thread Mostyn Lewis
e to bind and in that case need to switch off any supplied binding. I really wish the default was no binding like OpenMPI with docs that point out the variables but it's not always the case. Sorry again for any trub, Mostyn On Tue, 24 Apr 2007, Jeff Squyres wrote: On Apr 23, 2007, at 9:22 P

Re: [OMPI users] How do you switch off paffinity?

2007-04-23 Thread Mostyn Lewis
the end) and a taskset call gets back the mask to show if you are bound or not. Regards, Mostyn On Mon, 23 Apr 2007, Jeff Squyres wrote: On Apr 22, 2007, at 8:46 PM, Mostyn Lewis wrote: More information. --mca mpi_paffinity_alone 0 Can you describe how you're verifying that the proce

[OMPI users] Bug in orte/mca/pls/rsh/pls_rsh_module.c

2007-04-23 Thread Mostyn Lewis
After 1.3a1r14155 (not sure how much after but certainly currently) you get a SEGV if you use an unknown shell (I use something called ksh93). Error is at lines 576->580 if ( i == ORTE_PLS_RSH_SHELL_UNKNOWN ) { opal_output(0, "WARNING: local probe returned unhandled shell:%s

Re: [OMPI users] How do you switch off paffinity?

2007-04-22 Thread Mostyn Lewis
mca_base_param_reg_int_name call in ompi_mpi_params.c you don't check the return code, so there may be junk in value? However, I don't see that explicitly setting --mca mpi_paffinity_alone 0 would fail? Regards, Mostyn P.S. I hope this doesn't seem too presumptious. On Sun, 22 Apr 2

[OMPI users] How do you switch off paffinity?

2007-04-22 Thread Mostyn Lewis
Using a lateish SVN, 1.3a1r14155 (circa April 1st), on a SuSE SLES10 opteron system which has 2 dual core opterons per node, I can't seem to disable processor affinity? I have a small test program which call something else to detect CPU binding and whatever I've done it's still bound? tried: --m

Re: [OMPI users] openmpi, mx

2006-11-23 Thread Mostyn Lewis
I believe this is "too many open files". ulimit -n some_number Regards, Mostyn On Wed, 22 Nov 2006, Lydia Heck wrote: I have - again - successfully built and installed mx and openmpi and I can run 64 and 128 cpus jobs on a 256 CPU cluster version of openmpi is 1.2b1 compiler used: studio11
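
As a rough sketch of the "ulimit -n" suggestion, the script below inspects the soft and hard open-file limits and raises the soft limit toward a target without exceeding the hard limit. The function name raise_nofile and the target of 1024 are illustrative assumptions; the thread leaves the actual number open. It assumes a shell whose ulimit builtin accepts -S, -H and -n (bash, ksh and dash do).

```shell
#!/bin/sh
# Raise the soft open-file limit to at most the requested value.
# raise_nofile is a hypothetical helper, not part of Open MPI or MX.
raise_nofile() {
    nf_want=$1
    nf_soft=$(ulimit -Sn)
    nf_hard=$(ulimit -Hn)

    # Nothing to do if the soft limit is already unlimited or high enough.
    if [ "$nf_soft" = unlimited ]; then
        return 0
    fi
    if [ "$nf_soft" -ge "$nf_want" ]; then
        return 0
    fi

    # An unprivileged process cannot go above the hard limit, so clamp to it.
    if [ "$nf_hard" != unlimited ] && [ "$nf_hard" -lt "$nf_want" ]; then
        nf_want=$nf_hard
    fi
    ulimit -Sn "$nf_want"
}

echo "before: soft=$(ulimit -Sn) hard=$(ulimit -Hn)"
raise_nofile 1024   # illustrative target only
echo "after:  soft=$(ulimit -Sn) hard=$(ulimit -Hn)"
```

The limit is per process and inherited by child processes, so it has to be in effect in the shell that launches the job. Whether it also applies on remote nodes depends on how the processes are started there.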

Re: [O-MPI users] Error on mpirun in Redhat Fedora Core 4

2005-11-16 Thread Mostyn Lewis
I get this on rh9 ONLY if I leave out a -hostfile option on mpirun, otherwise it works fine. This is an old Red Hat. Regards, Mostyn On Wed, 16 Nov 2005, Jeff Squyres wrote: Clement -- Sorry for the delay in replying. We're running around crazy here at SC, which pretty much keeps us away fr