Re: [OMPI devel] trunk build failure on Altix [w/ WORK AROUND]

2012-02-17 Thread Paul H. Hargrove
I've poked enough at the ompi configure magic to *think* I understand the source of the problem I've seen w/ both trunk and 1.5.x on the Altix. The problem appears to be that both timer/altix/configure.m4 and timer/linux/configure.m4 are setting the value of $timer_base_include and the LAST on

Re: [OMPI devel] OPAL_ENABLE_FT_CR build broken in 1.5 branch

2012-02-17 Thread Jeff Squyres
Thanks -- I (think I) fixed in https://svn.open-mpi.org/trac/ompi/ticket/3020. I threw it to Josh and Samuel to review. On Feb 17, 2012, at 12:33 AM, Paul H. Hargrove wrote: > I've tried to build from both the 1.5 and trunk nightly tarballs configured > with "--enable-ft=cr --with-blcr=" .

Re: [OMPI devel] btl/openib: get_ib_dev_distance doesn't see processes as bound if the job has been launched by srun

2012-02-17 Thread Jeff Squyres
On Feb 17, 2012, at 11:54 AM, Ralph Castain wrote: > All that said, I think using the WHOLE_SYSTEM flag is actually incorrect. I think we want to continue using WHOLE_SYSTEM. There are definite uses for it (being able to look around the machine beyond where you may or may not be bound, such a

[OMPI devel] excessive warnings on some BSDs [w/ PATCH]

2012-02-17 Thread Paul H. Hargrove
When building trunk or 1.5.x on OpenBSD-5.0 (and maybe others), I get *LOTS* of the following: /usr/include/arpa/inet.h:74: warning: 'struct in_addr' declared inside parameter list /usr/include/arpa/inet.h:74: warning: its scope is only this definition or declaration, which is probably not what

[OMPI devel] Solaris/SOS build failure in trunk

2012-02-17 Thread Paul H. Hargrove
Building last night's trunk tarball (1.7a1r25944) on Solaris10/SPARC w/ Solaris Studio compilers is failing in "make check". This same problem is present with the 12.2 and 12.3 compilers and both v8plus and v9 ABIs: Making check in util make opal_bit_ops opal_path_nfs opal_sos CC opal_

Re: [OMPI devel] btl/openib: get_ib_dev_distance doesn't see processes as bound if the job has been launched by srun

2012-02-17 Thread Ralph Castain
On Fri, Feb 17, 2012 at 8:47 AM, Brice Goglin wrote: > On 17/02/2012 14:59, Jeff Squyres wrote: > > On Feb 17, 2012, at 8:21 AM, Ralph Castain wrote: > > > >>> I didn't follow this entire thread in detail, but I am feeling that > something is wrong here. The flag fixes your problem indeed, bu

Re: [OMPI devel] btl/openib: get_ib_dev_distance doesn't see processes as bound if the job has been launched by srun

2012-02-17 Thread Brice Goglin
On 17/02/2012 14:59, Jeff Squyres wrote: > On Feb 17, 2012, at 8:21 AM, Ralph Castain wrote: > >>> I didn't follow this entire thread in detail, but I am feeling that >>> something is wrong here. The flag fixes your problem indeed, but I think it >>> may break binding too. It's basically maki

Re: [OMPI devel] trunk build failure on NetBSD-5.0

2012-02-17 Thread Ralph Castain
Done - thanks! On Fri, Feb 17, 2012 at 1:29 AM, Paul Hargrove wrote: > I've confirmed that NO similar problem is present in the 1.5 branch. > > -Paul > > > On Fri, Feb 17, 2012 at 12:49 AM, Paul Hargrove wrote: > >> The following small patch was required to build the ompi trunk on >> NetBSD-5.0.

Re: [OMPI devel] Building otfcompress with binutils-gold fails

2012-02-17 Thread Dmitri Gribenko
Hi Jeff, On Fri, Feb 17, 2012 at 1:14 PM, Jeff Squyres wrote: > Are you compiling a nightly 1.7/trunk tarball, or an SVN checkout where you > ran autogen.pl? I was compiling an SVN checkout. > The 1.7 nightly tarballs are built with the latest Libtool (2.4.2).  If that > isn't doing the Right

Re: [OMPI devel] btl/openib: get_ib_dev_distance doesn't see processes as bound if the job has been launched by srun

2012-02-17 Thread Ralph Castain
I took a closer look at this, and I think we're getting ourselves confused by the rather large differences between what is on the trunk vs the 1.5 branch. The trunk is doing the "am I bound" calculation correctly - it gets the cpubind bitmask and compares it to the allowed/available cpus. The 1.5

Re: [OMPI devel] btl/openib: get_ib_dev_distance doesn't see processes as bound if the job has been launched by srun

2012-02-17 Thread Jeff Squyres
On Feb 16, 2012, at 8:16 AM, nadia.der...@bull.net wrote: > Could you please move it to v1.5 (do I need to fill a CMR)? Just to clarify - you're asking for the patch to set WHOLE_SYSTEM when we load the hwloc topology, right? If so, please file a CMR. Note that there's some differences betwee

Re: [OMPI devel] btl/openib: get_ib_dev_distance doesn't see processes as bound if the job has been launched by srun

2012-02-17 Thread Jeff Squyres
On Feb 17, 2012, at 6:18 AM, nadia.der...@bull.net wrote: > But I have to look for the proper option in slurm: I don't know if slurm > allows for such a fine grained allocation. I have to look for the option > that enables to allocate socket X (X!=0). FWIW, you might just want to run a job tha

Re: [OMPI devel] btl/openib: get_ib_dev_distance doesn't see processes as bound if the job has been launched by srun

2012-02-17 Thread Jeff Squyres
On Feb 17, 2012, at 8:21 AM, Ralph Castain wrote: >> I didn't follow this entire thread in detail, but I am feeling that >> something is wrong here. The flag fixes your problem indeed, but I think it >> may break binding too. It's basically making all "unavailable resources" >> available. So t

Re: [OMPI devel] btl/openib: get_ib_dev_distance doesn't see processes as bound if the job has been launched by srun

2012-02-17 Thread Ralph Castain
On Thu, Feb 16, 2012 at 11:36 PM, Brice Goglin wrote: > On 16/02/2012 14:16, nadia.der...@bull.net wrote: > > Hi Jeff, > > Sorry for the delay, but my victim with 2 ib devices had been stolen ;-) > > So, I ported the patch on the v1.5 branch and finally could test it. > > Actually, there i

Re: [OMPI devel] btl/openib: get_ib_dev_distance doesn't see processes as bound if the job has been launched by srun

2012-02-17 Thread nadia . derbey
devel-boun...@open-mpi.org wrote on 02/17/2012 08:36:54 AM: > From: Brice Goglin > To: de...@open-mpi.org > Date: 02/17/2012 08:37 AM > Subject: Re: [OMPI devel] btl/openib: get_ib_dev_distance doesn't see > processes as bound if the job has been launched by srun > Sent by: devel-boun...@open-

Re: [OMPI devel] Building otfcompress with binutils-gold fails

2012-02-17 Thread Jeff Squyres
Are you compiling a nightly 1.7/trunk tarball, or an SVN checkout where you ran autogen.pl? It strikes me that if binutils-gold is different enough, then Libtool must have been updated to match it. The 1.7 nightly tarballs are built with the latest Libtool (2.4.2). If that isn't doing the Rig

Re: [OMPI devel] trunk build failure on NetBSD-5.0

2012-02-17 Thread Paul Hargrove
I've confirmed that NO similar problem is present in the 1.5 branch. -Paul On Fri, Feb 17, 2012 at 12:49 AM, Paul Hargrove wrote: > The following small patch was required to build the ompi trunk on > NetBSD-5.0. > I am not sure if this was the proper header in which to add this include, > but it

[OMPI devel] trunk build failure on OpenBSD-5.0

2012-02-17 Thread Paul Hargrove
OpenBSD lacks an aio.h header. configure knows this: > $ grep aio.h configure.log > checking aio.h usability... no > checking aio.h presence... no > checking for aio.h... no Yet fbtl/posix is enabled, despite needing aio.h: > checking if MCA component fbtl:posix can compile... yes I am guessi

[OMPI devel] trunk build failure on NetBSD-5.0

2012-02-17 Thread Paul Hargrove
The following small patch was required to build the ompi trunk on NetBSD-5.0. I am not sure if this was the proper header in which to add this include, but it is the first one that needs "struct timeval". -Paul --- openmpi-1.7a1r25944/opal/dss/dss_types.h~ 2012-02-17 00:30:09.0 -0800 +++

Re: [OMPI devel] btl/openib: get_ib_dev_distance doesn't see processes as bound if the job has been launched by srun

2012-02-17 Thread Brice Goglin
On 16/02/2012 14:16, nadia.der...@bull.net wrote: > Hi Jeff, > > Sorry for the delay, but my victim with 2 ib devices had been stolen ;-) > > So, I ported the patch on the v1.5 branch and finally could test it. > > Actually, there is no opal_hwloc_base_get_topology() in v1.5 so I had > to set >

[OMPI devel] OPAL_ENABLE_FT_CR build broken in 1.5 branch

2012-02-17 Thread Paul H. Hargrove
I've tried to build from both the 1.5 and trunk nightly tarballs configured with "--enable-ft=cr --with-blcr=" . I am using Intel compilers on Linux/x86. The trunk was fine, but on the 1.5 branch I see the build fail with: Making all in mca/btl/sm make[2]: Entering directory `/home/pcp1/ph