[OMPI devel] How do you generate your FAQ pages?

2022-12-20 Thread Paul H. Hargrove via devel
Sorry for the somewhat off-topic question:
What tool(s) are you using to generate web pages for your wonderfully
organized FAQ?

-Paul

-- 
Paul H. Hargrove 
Pronouns: he, him, his
Computer Languages & Systems Software (CLaSS) Group
Computer Science Department
Lawrence Berkeley National Laboratory


Re: [hwloc-devel] [PATCH] Use plain "inline" in C++

2012-05-10 Thread Paul H. Hargrove

FWIW:
GASNet makes the assumption that every C++ compiler groks "inline" and 
has never encountered any counter-examples.
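
As a rough illustration of that assumption (not code from GASNet or hwloc; MY_INLINE and my_add are hypothetical names), a header could use the plain keyword whenever a C++ compiler is in use and only fall back for pre-C99 C compilers:

```c
/* Hypothetical portability macro; MY_INLINE is an illustrative name,
 * not an identifier taken from hwloc or GASNet. */
#if defined(__cplusplus)
/* C++: "inline" has been a keyword since the first standard. */
#  define MY_INLINE inline
#elif defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L
/* C99 and later also provide "inline". */
#  define MY_INLINE inline
#elif defined(__GNUC__)
/* Pre-C99 GNU C spells it __inline__. */
#  define MY_INLINE __inline__
#else
/* Unknown pre-C99 compiler: fall back to a plain static function. */
#  define MY_INLINE
#endif

/* Small function using the macro, to show it in context. */
static MY_INLINE int my_add(int a, int b)
{
    return a + b;
}
```

The point of the sketch is that the C++ branch needs no configure-time test at all, which matches the observation that autoconf only ships an inline check for C.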


-Paul

On 5/9/2012 8:54 PM, Christopher Samuel wrote:


On 10/05/12 07:40, Jeff Squyres wrote:


Huh -- really?  I always thought that the C++ language itself
included the keyword "inline".

I asked via Twitter and got these responses..

# Inline was part of C++98 - the first c++ standard, and
# the inline kwd is in the cfront 1.0 ('86) source. So
# functionally, yes.

...and...

# This may be a different question than "have all C++
# compilers always accepted inline?"


I note that autoconf has an inline test for C:

http://www.gnu.org/software/autoconf/manual/autoconf-2.67/html_node/C-Compiler.html

But not for C++:

http://www.gnu.org/savannah-checkouts/gnu/autoconf/manual/autoconf-2.69/html_node/C_002b_002b-Compiler.html

So perhaps the fact that they've never needed to implement
such a test is in itself a good guide ?

cheers,
Chris
--
 Christopher Samuel - Senior Systems Administrator

  VLSCI - Victorian Life Sciences Computation Initiative
  Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
  http://www.vlsci.unimelb.edu.au/

___
hwloc-devel mailing list
hwloc-de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/hwloc-devel


--
Paul H. Hargrove  phhargr...@lbl.gov
Future Technologies Group
HPC Research Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900



Re: [hwloc-devel] lstopo-nox strikes back

2012-04-27 Thread Paul H. Hargrove



On 4/27/2012 10:39 AM, Brice Goglin wrote:

On 27/04/2012 19:22, Samuel Thibault wrote:

Brice Goglin, on Fri 27 Apr 2012 19:09:47 +0200, wrote:

On 25/04/2012 15:42, Jiri Hladky wrote:

I would vote to make lstopo ASCII only and introduce new GUI binary
"lstopo-gui" in the version 1.5

I'll commit that during the weekend unless somebody comes with a better
solution.

Of course, distros are free to add symlinks as Xlstopo then :)

Xfoo is kinda reserved for X servers, not for X applications :)


Ok let's put an X server inside hwloc then.



No, Xlstopo should be for showing me the logical->physical layout of 
screens on a multi-headed X server, right?


-Paul

--
Paul H. Hargrove  phhargr...@lbl.gov
Future Technologies Group
HPC Research Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900



Re: [hwloc-devel] BGQ empty topology with MPI

2012-03-22 Thread Paul H. Hargrove

From the same machine that Dan is using:

{hargrove@cetuslac1 ~}$ mpicc -v
mpicc for MPICH2 version 1.4.1p1
[...hairy details omitted...]
gcc version 4.4.6 (BGQ-dev-120305)

-Paul

On 3/22/2012 7:43 PM, Christopher Samuel wrote:


On 22/03/12 20:58, Brice Goglin wrote:


So there's something strange going on when MPI is added. Which MPI
are you using? Is this a derivative of MPICH that embeds hwloc? (MPICH
>= 1.2.1 if I remember correctly)

Not sure about BG/Q, but BG/P uses code derived from MPICH2 according
to: http://wiki.bg.anl-external.org/index.php/Main_Page

Our BG/P seems to claim it's from MPICH2 1.1:

samuel@tambo:~>  mpicc -v
mpicc for 1.1

cheers,
Chris
--
 Christopher Samuel - Senior Systems Administrator

  VLSCI - Victorian Life Sciences Computation Initiative
  Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
  http://www.vlsci.unimelb.edu.au/



--
Paul H. Hargrove  phhargr...@lbl.gov
Future Technologies Group
HPC Research Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900



Re: [OMPI devel] Open MPI nightly tarballs suspended / 1.5.5rc3

2012-02-28 Thread Paul H. Hargrove



On 2/28/2012 5:09 PM, Christopher Samuel wrote:

On 29/02/12 07:44, Jeffrey Squyres wrote:


>  - BlueGene fixes

rc3 fixes the builds on our front end node, thanks!


And on a BG/L (not a typo) front-end too, where the same problem existed 
in prior versions.


-Paul

--
Paul H. Hargrove  phhargr...@lbl.gov
Future Technologies Group
HPC Research Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900



[OMPI devel] typo in a copyright message

2012-02-28 Thread Paul H. Hargrove

By chance I noticed the following in the trunk:

Index: ompi-trunk/orte/mca/rml/oob/rml_oob_component.c
===
--- ompi-trunk/orte/mca/rml/oob/rml_oob_component.c (revision 26069)
+++ ompi-trunk/orte/mca/rml/oob/rml_oob_component.c (working copy)
@@ -11,7 +11,7 @@
  * Copyright (c) 2004-2005 The Regents of the University of California.
  * All rights reserved.
  * Copyright (c) 2007  Cisco Systems, Inc.  All rights reserved.
- * Copyright (c) 2011  Los Alamos Nation Security, LLC.
+ * Copyright (c) 2011  Los Alamos National Security, LLC.
  * All rights reserved.
  * $COPYRIGHT$
  *


--
Paul H. Hargrove  phhargr...@lbl.gov
Future Technologies Group
HPC Research Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900



Re: [OMPI devel] Open MPI nightly tarballs suspended / 1.5.5rc3

2012-02-28 Thread Paul H. Hargrove
Testing 1.5.5rc3 on a "representative sampling" of my many platforms 
looks good.
In particular, I've retested various platforms that showed any 
significant problems previously and found them to be fixed.


Though minor, I do see that the following patches I've posted are not
applied:

+ Add a Mellanox PCI vendor ID to the device params file
http://www.open-mpi.org/community/lists/devel/2012/02/10615.php
Posted 13 hours ago and not yet on trunk
+ Fix show_help_lex.l to avoid undefined behavior (and silence
associated warning from flex)
http://www.open-mpi.org/community/lists/devel/2012/02/10521.php
Was applied to trunk as r25983
+ Reorder includes to avoid "'struct in_addr' declared inside parameter
list" warnings
http://www.open-mpi.org/community/lists/devel/2012/02/10484.php
Was applied to trunk as r25984

Sorry if I've missed an existing CMR for those last two.
No big deal if these are held back for v1.6, but I figured I'd mention them
in case their exclusion was unintended.


I am assuming there is no interest in the MIPS atomics fixes, or the 
PPC64 atomics work-around for an XLC bug.

MIPS 1of2: http://www.open-mpi.org/community/lists/devel/2012/02/10416.php
MIPS 2of2: http://www.open-mpi.org/community/lists/devel/2012/02/10417.php
PPC64/XLC: http://www.open-mpi.org/community/lists/devel/2012/02/10603.php
If there *is* interest in these, let me know if there is any assistance 
I can lend.


-Paul

On 2/28/2012 12:44 PM, Jeffrey Squyres wrote:

There is a serious chilled water issue at IU right now; all non-essential servers 
(including Open MPI's nightly build server) have been turned off.  So we have no new 
"official" 1.5.5 RC, and no new nightlies will be produced until further notice.

However, to keep the 1.5.5 release train going, I've made an "unofficial" 
1.5.5rc3 and posted it in the usual location:

 http://www.open-mpi.org/software/ompi/v1.5/

Note that since there are no nightly tarballs, this rc will be farther along 
than the latest 1.5 nightly until the nightlies are resumed.

Changes since 1.5.5rc2:

- Removed the ofud BTL
- Updates to README and some copyright notices
- Fix the lt_dladvise search that caused VPATH weirdness
- Removed the pcie mpool
- Bring in some upstream hwloc v1.3 fixes
- VT updates:
   - non-GNU compiler _FORTIFY_SOURCE fixes
   - VT-specific CXXFLAGS
   - BlueGene fixes
- Fix processor affinity for some old/weird platforms



--
Paul H. Hargrove  phhargr...@lbl.gov
Future Technologies Group
HPC Research Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900



[OMPI devel] 1.5.5rc2 missing a Mellanox PCI vendor ID

2012-02-27 Thread Paul H. Hargrove
Testing 1.5.5rc2, I see warnings about an unknown IB HCA unless I make 
the following simple addition:


--- ompi-v1.5/ompi/mca/btl/openib/mca-btl-openib-device-params.ini  (revision 26056)
+++ ompi-v1.5/ompi/mca/btl/openib/mca-btl-openib-device-params.ini  (working copy)
@@ -127,7 +127,7 @@
 

 [Mellanox Tavor Infinihost]
-vendor_id = 0x2c9,0x5ad,0x66a,0x8f1,0x1708,0x03ba
+vendor_id = 0x2c9,0x5ad,0x66a,0x8f1,0x1708,0x03ba,0x15b3
 vendor_part_id = 23108
 use_eager_rdma = 1
 mtu = 1024


This one-line patch applies equally to v1.5 and to the trunk.
I suspect that this vendor ID should be added to the Arbel and Sinai HCA 
entries as well.

It is already listed for Hermon.
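
To illustrate why a missing ID produces the "unknown HCA" warning, here is a minimal sketch of matching a device's vendor ID against a comma-separated vendor_id list like the one above. It is not Open MPI's actual parser; vendor_id_listed is a hypothetical helper.

```c
#include <stdlib.h>

/* Hypothetical helper: return 1 if vendor_id appears in a comma-separated
 * list such as "0x2c9,0x5ad,0x15b3", else 0.  Not Open MPI's parser. */
static int vendor_id_listed(const char *list, unsigned long vendor_id)
{
    const char *cursor = list;

    while (*cursor != '\0') {
        char *end = NULL;
        /* Base 0 lets strtoul accept the 0x-prefixed hex values. */
        unsigned long value = strtoul(cursor, &end, 0);

        if (end == cursor) {
            return 0;               /* malformed entry: give up */
        }
        if (value == vendor_id) {
            return 1;
        }
        cursor = end;
        /* Skip the separator(s) before the next entry. */
        while (*cursor == ',' || *cursor == ' ') {
            cursor++;
        }
    }
    return 0;
}
```

With the original Tavor list, a lookup of 0x15b3 fails; with the patched list it succeeds.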

-Paul

On 2/23/2012 5:17 AM, Jeffrey Squyres wrote:

We finally have 1.5.5rc2:

 http://www.open-mpi.org/software/ompi/v1.5/

Given the amount of testing we've had, this rc might actually be pretty close.  
Lots and lots of changes since rc1; I'm not even going to bother to list them 
all.

Please test!



--
Paul H. Hargrove  phhargr...@lbl.gov
Future Technologies Group
HPC Research Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900



Re: [OMPI devel] 1.5.5rc2 atomics fail w/ xlc-9

2012-02-24 Thread Paul H. Hargrove

OK, this is NOT an issue for v1.5.5, IMHO.
I was mistaken about the ppc atomics having an error that could impact 
builds with gcc.

The problems I've seen with xlc-9.0 turn out to be just a plain xlc bug.

When the asm takes as an argument the address of a signed 32-bit int, 
the compiler is incorrectly sign-extending the address (probably under 
the mistaken belief that it is manipulating the pointed-to type).  For 
the ILP32 ABI that is not a problem.  For the LP64 ABI, pointers get 
trashed by this incorrect operation.  The attached patch works around
this bug by conditionally inserting a cast, and I believe it should
apply cleanly to both the v1.5 branch (for v1.6) and the trunk.
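
To make the LP64 failure mode concrete, here is a small sketch. It assumes the faulty code behaves like a 32-to-64-bit sign extension of the register holding the address; the instruction xlc actually emitted is not shown in this thread. sign_extend_low32 and address_survives are hypothetical helper names.

```c
#include <stdint.h>

/* Hypothetical helper: reproduce the effect of sign-extending the low
 * 32 bits of a 64-bit value, which is what would happen to an address
 * that is mistaken for a signed 32-bit int. */
static uint64_t sign_extend_low32(uint64_t value)
{
    uint64_t low = value & 0xFFFFFFFFULL;

    if (low & 0x80000000ULL) {
        low |= 0xFFFFFFFF00000000ULL;   /* replicate the sign bit upward */
    }
    return low;
}

/* Returns 1 if the address is unchanged by the bogus extension. */
static int address_survives(uint64_t address)
{
    return sign_extend_low32(address) == address;
}
```

Under this model, addresses below 2 GiB pass through unchanged, while a typical 64-bit address with higher bits set does not. With 32-bit registers the extension has nothing to change, which fits the ILP32 builds being unaffected.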


-Paul

On 2/24/2012 5:46 PM, Paul H. Hargrove wrote:

Hmm, I was certain I knew what was wrong, but the tests still fail.
Nobody should hold their breath waiting for my patches, but I am still 
investigating.


*IF* I can determine that I am right about the asm allowing gcc to 
generate bad code then I think this is important for 1.5.5.

Otherwise, I think this is a 1.6 issue.

-Paul

On 2/24/2012 5:19 PM, Paul H. Hargrove wrote:

I see now why I get "check" failures from the opal atomics w/ XLC-9.0.
The inline asm is mildly incorrect and I am actually surprised gcc 
didn't produce bad code.


Patch(es) will be sent ASAP as I think this should be fixed for 1.5.5.

-Paul

On 2/23/2012 8:24 PM, Paul H. Hargrove wrote:
This is consistent with my findings w/ XLC (mostly on BG/L and BG/P 
front end nodes).
None of the 7.0,  8.0, 9.0 or 11.1 versions of XLC I tested could 
generate correct atomics.

They either failed at build time, or failed the tests in test/asm/.

-Paul


On 2/23/2012 8:17 PM, Christopher Samuel wrote:


On 24/02/12 15:12, Christopher Samuel wrote:


I suspect this is irrelevant, but I got a build failure trying to
compile it on our BG/P front end node (login node) with the IBM XL
compilers.

Oops, forgot how I built it..

export PATH=/opt/ibmcmp/vac/bg/9.0/bin/:/opt/ibmcmp/vacpp/bg/9.0/bin:/opt/ibmcmp/xlf/bg/11.1/bin:$PATH
CC=xlc CXX=xlC F77=xlf ./configure && make

--
 Christopher Samuel - Senior Systems Administrator
  VLSCI - Victorian Life Sciences Computation Initiative
  Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
  http://www.vlsci.unimelb.edu.au/

___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel








--
Paul H. Hargrove  phhargr...@lbl.gov
Future Technologies Group
HPC Research Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900

--- openmpi-1.5.5rc3r26035/opal/include/opal/sys/powerpc/atomic.h~  2012-02-25 01:15:24.550922758 +
+++ openmpi-1.5.5rc3r26035/opal/include/opal/sys/powerpc/atomic.h   2012-02-25 02:37:39.229857857 +
@@ -117,6 +117,14 @@
  */
 #if OMPI_GCC_INLINE_ASSEMBLY

+#ifdef __xlC__
+/* work-around bizarre xlc bug in which it sign-extends
+   a pointer to a 32-bit signed integer */
+#define OPAL_ASM_ADDR(a) ((uintptr_t)(a))
+#else
+#define OPAL_ASM_ADDR(a) (a)
+#endif
+
 static inline int opal_atomic_cmpset_32(volatile int32_t *addr,
 int32_t oldval, int32_t newval)
 {
@@ -130,7 +138,7 @@
  "   bne-    1b \n\t"
  "2:"
  : "=&r" (ret), "=m" (*addr)
- : "r" (addr), "r" (oldval), "r" (newval), "m" (*addr)
+ : "r" OPAL_ASM_ADDR(addr), "r" (oldval), "r" (newval), "m" (*addr)
  : "cc", "memory");

return (ret == oldval);
@@ -249,7 +257,7 @@
  "subfic r9,r5,0\n\t"
  "adde %0,r9,r5 \n\t"
  : "=&r" (ret)
- : "r"(addr), 
+ : "r"OPAL_ASM_ADDR(addr), 
    "m"(oldval), "m"(newval)
  : "r4", "r5", "r9", "cc", "memory");

@@ -297,7 +305,7 @@
 " stwcx.  %0, 0, %3\n\t"
 " bne-    1b   \n\t"
 : "=&r" (t), "=m" (*v)
-: &quo

Re: [OMPI devel] 1.5.5rc2 atomics fail w/ xlc-9

2012-02-24 Thread Paul H. Hargrove

Hmm, I was certain I knew what was wrong, but the tests still fail.
Nobody should hold their breath waiting for my patches, but I am still 
investigating.


*IF* I can determine that I am right about the asm allowing gcc to 
generate bad code then I think this is important for 1.5.5.

Otherwise, I think this is a 1.6 issue.

-Paul

On 2/24/2012 5:19 PM, Paul H. Hargrove wrote:

I see now why I get "check" failures from the opal atomics w/ XLC-9.0.
The inline asm is mildly incorrect and I am actually surprised gcc 
didn't produce bad code.


Patch(es) will be sent ASAP as I think this should be fixed for 1.5.5.

-Paul

On 2/23/2012 8:24 PM, Paul H. Hargrove wrote:
This is consistent with my findings w/ XLC (mostly on BG/L and BG/P 
front end nodes).
None of the 7.0,  8.0, 9.0 or 11.1 versions of XLC I tested could 
generate correct atomics.

They either failed at build time, or failed the tests in test/asm/.

-Paul


On 2/23/2012 8:17 PM, Christopher Samuel wrote:


On 24/02/12 15:12, Christopher Samuel wrote:


I suspect this is irrelevant, but I got a build failure trying to
compile it on our BG/P front end node (login node) with the IBM XL
compilers.

Oops, forgot how I built it..

export PATH=/opt/ibmcmp/vac/bg/9.0/bin/:/opt/ibmcmp/vacpp/bg/9.0/bin:/opt/ibmcmp/xlf/bg/11.1/bin:$PATH
CC=xlc CXX=xlC F77=xlf ./configure && make

--
 Christopher Samuel - Senior Systems Administrator
  VLSCI - Victorian Life Sciences Computation Initiative
  Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
  http://www.vlsci.unimelb.edu.au/







--
Paul H. Hargrove  phhargr...@lbl.gov
Future Technologies Group
HPC Research Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900



[OMPI devel] 1.5.5rc2 atomics fail w/ xlc-9

2012-02-24 Thread Paul H. Hargrove

I see now why I get "check" failures from the opal atomics w/ XLC-9.0.
The inline asm is mildly incorrect and I am actually surprised gcc 
didn't produce bad code.


Patch(es) will be sent ASAP as I think this should be fixed for 1.5.5.

-Paul

On 2/23/2012 8:24 PM, Paul H. Hargrove wrote:
This is consistent with my findings w/ XLC (mostly on BG/L and BG/P 
front end nodes).
None of the 7.0,  8.0, 9.0 or 11.1 versions of XLC I tested could 
generate correct atomics.

They either failed at build time, or failed the tests in test/asm/.

-Paul


On 2/23/2012 8:17 PM, Christopher Samuel wrote:


On 24/02/12 15:12, Christopher Samuel wrote:


I suspect this is irrelevant, but I got a build failure trying to
compile it on our BG/P front end node (login node) with the IBM XL
compilers.

Oops, forgot how I built it..

export PATH=/opt/ibmcmp/vac/bg/9.0/bin/:/opt/ibmcmp/vacpp/bg/9.0/bin:/opt/ibmcmp/xlf/bg/11.1/bin:$PATH
CC=xlc CXX=xlC F77=xlf ./configure && make

--
 Christopher Samuel - Senior Systems Administrator
  VLSCI - Victorian Life Sciences Computation Initiative
  Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
  http://www.vlsci.unimelb.edu.au/





--
Paul H. Hargrove  phhargr...@lbl.gov
Future Technologies Group
HPC Research Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900



Re: [OMPI devel] 1.5.5rc2

2012-02-24 Thread Paul H. Hargrove
Sorry, I just realized there was a fair amount of context missing from my
previous post:


The fix that Mattias committed as r26042 on the trunk is intended to 
correct the improper auto-detection of BG/P (or /L) when one is building 
for the front-end.  My suggested --with-platform=linux is a WORK-AROUND 
to allow testing w/o waiting for the CMR to be processed.


-Paul

On 2/24/2012 1:14 PM, Paul H. Hargrove wrote:

Christopher,

Just wanted to note that when you build like this on the BG/P front 
end, VT is detecting the BG/P environment and so trying to build for 
the BG/P compute node, meanwhile OMPI is building for the front-end 
node.  (Somebody correct me if I've misunderstood).


So, you may want to configure with
--with-contrib-vt-flags="--with-platform=linux"
to test a VT build for the Linux front-end.

-Paul

On 2/23/2012 8:17 PM, Christopher Samuel wrote:


On 24/02/12 15:12, Christopher Samuel wrote:


I suspect this is irrelevant, but I got a build failure trying to
compile it on our BG/P front end node (login node) with the IBM XL
compilers.

Oops, forgot how I built it..

export PATH=/opt/ibmcmp/vac/bg/9.0/bin/:/opt/ibmcmp/vacpp/bg/9.0/bin:/opt/ibmcmp/xlf/bg/11.1/bin:$PATH
CC=xlc CXX=xlC F77=xlf ./configure && make

--
 Christopher Samuel - Senior Systems Administrator
  VLSCI - Victorian Life Sciences Computation Initiative
  Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
  http://www.vlsci.unimelb.edu.au/





--
Paul H. Hargrove  phhargr...@lbl.gov
Future Technologies Group
HPC Research Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900



Re: [OMPI devel] 1.5.5rc2

2012-02-24 Thread Paul H. Hargrove

Christopher,

Just wanted to note that when you build like this on the BG/P front end, 
VT is detecting the BG/P environment and so trying to build for the BG/P 
compute node, meanwhile OMPI is building for the front-end node.  
(Somebody correct me if I've misunderstood).


So, you may want to configure with
--with-contrib-vt-flags="--with-platform=linux"
to test a VT build for the Linux front-end.

-Paul

On 2/23/2012 8:17 PM, Christopher Samuel wrote:


On 24/02/12 15:12, Christopher Samuel wrote:


I suspect this is irrelevant, but I got a build failure trying to
compile it on our BG/P front end node (login node) with the IBM XL
compilers.

Oops, forgot how I built it..

export PATH=/opt/ibmcmp/vac/bg/9.0/bin/:/opt/ibmcmp/vacpp/bg/9.0/bin:/opt/ibmcmp/xlf/bg/11.1/bin:$PATH
CC=xlc CXX=xlC F77=xlf ./configure && make

--
 Christopher Samuel - Senior Systems Administrator

  VLSCI - Victorian Life Sciences Computation Initiative
  Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
  http://www.vlsci.unimelb.edu.au/



--
Paul H. Hargrove  phhargr...@lbl.gov
Future Technologies Group
HPC Research Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900



Re: [OMPI devel] 1.5.5rc2

2012-02-23 Thread Paul H. Hargrove
This is consistent with my findings w/ XLC (mostly on BG/L and BG/P 
front end nodes).
None of the 7.0,  8.0, 9.0 or 11.1 versions of XLC I tested could 
generate correct atomics.

They either failed at build time, or failed the tests in test/asm/.

-Paul


On 2/23/2012 8:17 PM, Christopher Samuel wrote:


On 24/02/12 15:12, Christopher Samuel wrote:


I suspect this is irrelevant, but I got a build failure trying to
compile it on our BG/P front end node (login node) with the IBM XL
compilers.

Oops, forgot how I built it..

export PATH=/opt/ibmcmp/vac/bg/9.0/bin/:/opt/ibmcmp/vacpp/bg/9.0/bin:/opt/ibmcmp/xlf/bg/11.1/bin:$PATH
CC=xlc CXX=xlC F77=xlf ./configure && make

--
 Christopher Samuel - Senior Systems Administrator

  VLSCI - Victorian Life Sciences Computation Initiative
  Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
  http://www.vlsci.unimelb.edu.au/



--
Paul H. Hargrove  phhargr...@lbl.gov
Future Technologies Group
HPC Research Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900



Re: [OMPI devel] v1.5 build failure w/ Solaris Studio 12.2 on Linux

2012-02-23 Thread Paul H. Hargrove
Sorry folks.  That was intended just for Jeff's eyes, but my fingers 
moved faster than my brain.

No offense was intended.

-Paul

On 2/23/2012 10:01 AM, Paul H. Hargrove wrote:



I think the VT folks get blamed often enough for build issues w/o 
attributing one more problem to them.


On 2/23/2012 9:47 AM, Jeffrey Squyres wrote:

Cool; thanks for setting this straight.




--
Paul H. Hargrove  phhargr...@lbl.gov
Future Technologies Group
HPC Research Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900



Re: [OMPI devel] v1.5 build failure w/ Solaris Studio 12.2 on Linux

2012-02-23 Thread Paul H. Hargrove



I think the VT folks get blamed often enough for build issues w/o 
attributing one more problem to them.


On 2/23/2012 9:47 AM, Jeffrey Squyres wrote:

Cool; thanks for setting this straight.


--
Paul H. Hargrove  phhargr...@lbl.gov
Future Technologies Group
HPC Research Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900



Re: [OMPI devel] v1.5 build failure w/ Solaris Studio 12.2 on Linux

2012-02-23 Thread Paul H. Hargrove

Just a minor correction.  Instead of:

- The C++ part of the build (VT) is deep within the OMPI build; it works 
fine with the C compiler all the way up until that point


The correct facts are:

- The C++ part of the VT build requires CXXFLAGS=-library=stlport4 when 
using the SS12 compilers.
- Addition of that flag leads to the reported error when compiling 
ompi/mpi/cxx/file.cc (NOT in VT)


-Paul

On 2/23/2012 7:23 AM, Jeffrey Squyres wrote:

Terry and I talked about this on the phone.  Supporting facts (some of these 
are repeated from Paul's prior posts):

- This happens with the C++ SS 12.2 compiler on supported Linux platforms
- The C++ part of the build (VT) is deep within the OMPI build; it works fine 
with the C compiler all the way up until that point
- /usr/include/sys/types.h typedefs u_char, and is directly included in event.h
- So SS 12.2/C++ is somehow mucking up <sys/types.h> to make that typedef not
be available
- The upgrade from 12.2 to 12.3 is a free download

This feels like a SS 12.2 C++ compiler bug to me.  And it's free to upgrade to 
a version that does not have this problem.  Hence, this has become a README 
note.

The road to v1.5.5 just got a little shorter.



On Feb 22, 2012, at 3:16 PM, Paul H. Hargrove wrote:


I think I have the beginning of a fix for this issue.

I had not even noticed earlier that the error in event.h is from the C++
compiler, when compiling file.cc in the C++ bindings.  That makes the
vendor-specific addition of "-library=stlport4" to CXXFLAGS quite
relevant to the problem/solution.

It eventually occurred to me that when VT's sub-configure told me to add
configure arguments, I could have used --with-contrib-vt-flags to pass
that ONLY to VT and perhaps NOT mess with whatever karma was providing
the definition of u_char.  However, when I tried that I was disappointed
to find that the bit of configure logic that suggests/requires
CXXFLAGS=-library=stlport4 (from ompi/contrib/vt/configure.m4) runs
BEFORE the processing of --with-contrib-vt-flags.  So, that was a dead end.

So, the next idea was to look for a fix specific to stlport.  I tried
adding near the top of opal/event/event.h (after the WINDOWS equivalent):

#ifdef STLPORT
typedef unsigned char u_char;
#endif

That managed to clear up the original problem w/ SS12.2.
With SS12.3, things also built fine.
This suggests the typedef is not "conflicting" with whatever other defn
was present.
I think the "safety" of this needs to be examined more widely before
this can be adopted.
My concern is that some system could "typedef char u_char" if it has
char unsigned by default, leading to a conflict.
Now that would, I suppose, only be a problem if STLPORT is also defined.
So, maybe I am over thinking this.
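
On the concern about a system that might "typedef char u_char": in C and C++, char, signed char and unsigned char are three distinct types even when plain char happens to be unsigned, so such a typedef would indeed clash with "typedef unsigned char u_char". A small C11 sketch of that distinction follows. my_u_char, CHAR_KIND and the helper names are hypothetical, chosen so the example does not collide with a system u_char.

```c
/* Kinds reported by CHAR_KIND below (hypothetical names). */
enum char_kind { KIND_PLAIN_CHAR, KIND_SIGNED_CHAR, KIND_UNSIGNED_CHAR };

/* C11 _Generic selects a value by the static type of the expression.
 * Listing all three character types as separate associations only
 * compiles because they are three distinct types. */
#define CHAR_KIND(x) _Generic((x),           \
    char:          KIND_PLAIN_CHAR,          \
    signed char:   KIND_SIGNED_CHAR,         \
    unsigned char: KIND_UNSIGNED_CHAR)

typedef unsigned char my_u_char;   /* stand-in for the u_char typedef */

static enum char_kind kind_of_my_u_char(void)
{
    my_u_char value = 0;
    return CHAR_KIND(value);
}

static enum char_kind kind_of_plain_char(void)
{
    char value = 0;
    return CHAR_KIND(value);
}
```

Repeating a typedef with the same underlying type is accepted by C++ and by C11, which would be consistent with the SS12.3 build staying clean; only a typedef to a different type would conflict.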

-Paul

On 2/21/2012 11:10 PM, Paul H. Hargrove wrote:

More notes:

I've tested ompi-1.5.4 and it has the same problem.  So, this is NOT a
regression.

Terry D. has observed that Ubuntu is NOT a supported platform for the
Solaris Studio compilers.
So, I've reproduced on a Scientific Linux 5.5 platform (Red Hat
Enterprise Linux 5.5 clone, like CentOS) to be sure that was NOT the
cause.

When I configure for the SS12.x compilers, I've been passing
CXXFLAGS="-library=stlport4" as the VT sub-configure has informed me I
should, due to something wrong with the default STL.  I tried dropping
that from configure, and THE BUILD WAS SUCCESSFUL.

So, one has 2 choices:
+ build w/ SS12.2 without VT
+ update to SS12.3 and have VT

I don't think there is sufficient reason to delay 1.5.5 for this.

-Paul

On 2/21/2012 4:39 PM, Paul H. Hargrove wrote:

A few things to note:

1) This is NOT a problem w/ the SS12.3 compilers on the same machine.
So, one could say "upgrade your compiler" (a free download) and not
delay 1.5.5 for this issue.

2) This is ONLY a problem on Linux, and not on Solaris (both SS12.2
and SS12.3 tested for x86, x86-64, Sparc/v9 and Sparc/v8plus)

3) Testing the trunk I DON'T see the problem with either SS12.2 or
SS12.3.
This is interesting, because it probably means that a u_char
definition is SOMEWHERE in the headers (because libevent *is* getting
built).

Whatever else may be done, I think this should be fixed "properly"
(whatever that may equate to) for 1.6.
The way I see it now, it feels like OMPI is getting a definition of
u_char only "by accident".

-Paul

On 2/21/2012 12:16 PM, Paul H. Hargrove wrote:

Building the v1.5 branch on Linux with the Solaris Studio 12.2
compilers I see the following failure:

"[srcdir]/opal/event/event.h", line 797: Error: Type name expected
instead of "u_char".
"[srcdir]/opal/event/event.h", line 798: Error: Type name expected
instead of "u_char".
"[srcdir]/opal/event/event.h", line 1184: Error: "," expected
instead of "*".

Where line 1184 is a prototype containing "u_char *".

As far as I can find, only sever

Re: [OMPI devel] 1.5 supported systems

2012-02-22 Thread Paul H. Hargrove
I can get exact info from my MacOS 10.7 machine later, but its gcc is 
llvm-gcc-4.2 IIRC.

Here are my 10.5 and 10.6:

ProductName:    Mac OS X
ProductVersion: 10.5.8
BuildVersion:   9L31a
powerpc
lrwxr-xr-x  1 root  wheel   7 Nov  1  2008 /usr/bin/gcc -> gcc-4.0
-r-xr-xr-x  1 root  wheel  258368 Feb 19  2008 /usr/bin/gcc-3.3
-rwxr-xr-x  1 root  wheel   93088 Jul 17  2008 /usr/bin/gcc-4.0
-rwxr-xr-x  1 root  wheel  105680 May 18  2008 /usr/bin/gcc-4.2

ProductName:    Mac OS X
ProductVersion: 10.5.8
BuildVersion:   9L30
i386
lrwxr-xr-x  1 root  wheel  7 Nov  8  2007 /usr/bin/gcc -> gcc-4.0
-rwxr-xr-x  1 root  wheel  93072 Sep 23  2007 /usr/bin/gcc-4.0

ProductName:    Mac OS X
ProductVersion: 10.6.8
BuildVersion:   10K549
i386
lrwxr-xr-x  1 root  wheel   7 Sep 29  2009 /usr/bin/gcc -> gcc-4.2
-rwxr-xr-x  1 root  wheel   97392 May 18  2009 /usr/bin/gcc-4.0
-rwxr-xr-x  1 root  wheel  166128 May 18  2009 /usr/bin/gcc-4.2


On 2/22/2012 6:13 PM, Larry Baker wrote:

Paul,

Haven't you been running Intel compilers on OS X?

Also, do we have specifics about which gcc's on Mac OS X?  I have (OS 
X 10.5.8):



savaii:~ baker$ ls -l /usr/bin/gcc*
lrwxr-xr-x  1 root  wheel   7 Oct  2  2009 /usr/bin/gcc -> gcc-4.0
-r-xr-xr-x  1 root  wheel  258368 Feb 19  2008 /usr/bin/gcc-3.3
-rwxr-xr-x  1 root  wheel   93088 Feb  5  2009 /usr/bin/gcc-4.0
-rwxr-xr-x  1 root  wheel  105680 Apr 27  2009 /usr/bin/gcc-4.2



savaii:~ baker$ ls -l /usr/bin/cc*
lrwxr-xr-x  1 root  wheel  7 Oct  2  2009 /usr/bin/cc -> gcc-4.0



savaii:~ baker$ ls /Developer/usr/llvm-gcc-4.2/bin/*cc*
/Developer/usr/llvm-gcc-4.2/bin/i686-apple-darwin9-llvm-gcc-4.2
/Developer/usr/llvm-gcc-4.2/bin/llvm-gcc-4.2
/Developer/usr/llvm-gcc-4.2/bin/powerpc-apple-darwin9-llvm-gcc-4.2


Larry Baker
US Geological Survey
650-329-5608
ba...@usgs.gov

On 22 Feb 2012, at 5:55 PM, Paul H. Hargrove wrote:

Folks at Oracle should decide, but I suspect "Solaris 10" should be 
updated to "Solaris 10 and 11", or just "11".


-Paul

On 2/22/2012 2:44 PM, Jeffrey Squyres wrote:

Please verify this list of supported systems for the v1.5.5 release:

- The run-time systems that are currently supported are:
  - rsh / ssh
  - LoadLeveler
  - PBS Pro, Open PBS, Torque
  - Platform LSF (v7.0.2 and later)
  - SLURM
  - Cray XT-3, XT-4, and XT-5
  - Oracle Grid Engine (OGE) 6.1, 6.2 and open source Grid Engine
  - Microsoft Windows CCP (Microsoft Windows server 2003 and 2008)

- Systems that have been tested are:
  - Linux (various flavors/distros), 32 bit, with gcc, and Oracle
Solaris Studio 12
  - Linux (various flavors/distros), 64 bit (x86), with gcc, Absoft,
Intel, Portland, and Oracle Solaris Studio 12 compilers (*)
  - OS X (10.5, 10.6, 10.7), 32 and 64 bit (x86_64), with gcc and
Absoft compilers (*)
  - Oracle Solaris 10, 32 and 64 bit (SPARC, i386, x86_64), with
Oracle Solaris Studio 12

  (*) Be sure to read the Compiler Notes, below.

- Other systems have been lightly (but not fully) tested:
  - Other 64 bit platforms (e.g., Linux on PPC64)
  - Microsoft Windows CCP (Microsoft Windows server 2003 and 2008);
see the README.WINDOWS file.



--
Paul H. Hargrove phhargr...@lbl.gov <mailto:phhargr...@lbl.gov>
Future Technologies Group
HPC Research Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900

___
devel mailing list
de...@open-mpi.org <mailto:de...@open-mpi.org>
http://www.open-mpi.org/mailman/listinfo.cgi/devel






--
Paul H. Hargrove  phhargr...@lbl.gov
Future Technologies Group
HPC Research Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900



Re: [OMPI devel] 1.5 supported systems

2012-02-22 Thread Paul H. Hargrove

I have NOT been running Intel's compilers on Macs, only on Linux.
I *tried* PGI's compilers on MacOS, but that was a flop.
I have used Clang (comes w/ XCode 4.2) on MacOS, and that works for me 
but is not extensively tested.


-Paul

On 2/22/2012 6:13 PM, Larry Baker wrote:

Paul,

Haven't you been running Intel compilers on OS X?

Also, do we have specifics about which gcc's on Mac OS X?  I have (OS 
X 10.5.8):



savaii:~ baker$ ls -l /usr/bin/gcc*
lrwxr-xr-x  1 root  wheel   7 Oct  2  2009 /usr/bin/gcc -> gcc-4.0
-r-xr-xr-x  1 root  wheel  258368 Feb 19  2008 /usr/bin/gcc-3.3
-rwxr-xr-x  1 root  wheel   93088 Feb  5  2009 /usr/bin/gcc-4.0
-rwxr-xr-x  1 root  wheel  105680 Apr 27  2009 /usr/bin/gcc-4.2



savaii:~ baker$ ls -l /usr/bin/cc*
lrwxr-xr-x  1 root  wheel  7 Oct  2  2009 /usr/bin/cc -> gcc-4.0



savaii:~ baker$ ls /Developer/usr/llvm-gcc-4.2/bin/*cc*
/Developer/usr/llvm-gcc-4.2/bin/i686-apple-darwin9-llvm-gcc-4.2
/Developer/usr/llvm-gcc-4.2/bin/llvm-gcc-4.2
/Developer/usr/llvm-gcc-4.2/bin/powerpc-apple-darwin9-llvm-gcc-4.2


Larry Baker
US Geological Survey
650-329-5608
ba...@usgs.gov <mailto:ba...@usgs.gov>

On 22 Feb 2012, at 5:55 PM, Paul H. Hargrove wrote:

Folks at Oracle should decide, but I suspect "Solaris 10" should be 
updated to "Solaris 10 and 11", or just "11".


-Paul

On 2/22/2012 2:44 PM, Jeffrey Squyres wrote:

Please verify this list of supported systems for the v1.5.5 release:

- The run-time systems that are currently supported are:
  - rsh / ssh
  - LoadLeveler
  - PBS Pro, Open PBS, Torque
  - Platform LSF (v7.0.2 and later)
  - SLURM
  - Cray XT-3, XT-4, and XT-5
  - Oracle Grid Engine (OGE) 6.1, 6.2 and open source Grid Engine
  - Microsoft Windows CCP (Microsoft Windows server 2003 and 2008)

- Systems that have been tested are:
  - Linux (various flavors/distros), 32 bit, with gcc, and Oracle
Solaris Studio 12
  - Linux (various flavors/distros), 64 bit (x86), with gcc, Absoft,
Intel, Portland, and Oracle Solaris Studio 12 compilers (*)
  - OS X (10.5, 10.6, 10.7), 32 and 64 bit (x86_64), with gcc and
Absoft compilers (*)
  - Oracle Solaris 10, 32 and 64 bit (SPARC, i386, x86_64), with
Oracle Solaris Studio 12

  (*) Be sure to read the Compiler Notes, below.

- Other systems have been lightly (but not fully) tested:
  - Other 64 bit platforms (e.g., Linux on PPC64)
  - Microsoft Windows CCP (Microsoft Windows server 2003 and 2008);
    see the README.WINDOWS file.



--
Paul H. Hargrove phhargr...@lbl.gov <mailto:phhargr...@lbl.gov>
Future Technologies Group
HPC Research Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900







--
Paul H. Hargrove  phhargr...@lbl.gov
Future Technologies Group
HPC Research Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900



Re: [OMPI devel] 1.5 supported systems

2012-02-22 Thread Paul H. Hargrove
Folks at Oracle should decide, but I suspect "Solaris 10" should be 
updated to "Solaris 10 and 11", or just "11".


-Paul

On 2/22/2012 2:44 PM, Jeffrey Squyres wrote:

Please verify this list of supported systems for the v1.5.5 release:

- The run-time systems that are currently supported are:
   - rsh / ssh
   - LoadLeveler
   - PBS Pro, Open PBS, Torque
   - Platform LSF (v7.0.2 and later)
   - SLURM
   - Cray XT-3, XT-4, and XT-5
   - Oracle Grid Engine (OGE) 6.1, 6.2 and open source Grid Engine
   - Microsoft Windows CCP (Microsoft Windows server 2003 and 2008)

- Systems that have been tested are:
   - Linux (various flavors/distros), 32 bit, with gcc, and Oracle
 Solaris Studio 12
   - Linux (various flavors/distros), 64 bit (x86), with gcc, Absoft,
 Intel, Portland, and Oracle Solaris Studio 12 compilers (*)
   - OS X (10.5, 10.6, 10.7), 32 and 64 bit (x86_64), with gcc and
 Absoft compilers (*)
   - Oracle Solaris 10, 32 and 64 bit (SPARC, i386, x86_64), with
 Oracle Solaris Studio 12

   (*) Be sure to read the Compiler Notes, below.

- Other systems have been lightly (but not fully) tested:
   - Other 64 bit platforms (e.g., Linux on PPC64)
   - Microsoft Windows CCP (Microsoft Windows server 2003 and 2008);
 see the README.WINDOWS file.



--
Paul H. Hargrove  phhargr...@lbl.gov
Future Technologies Group
HPC Research Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900



Re: [OMPI devel] v1.5 build failure w/ Solaris Studio 12.2 on Linux

2012-02-22 Thread Paul H. Hargrove

I think I have the beginning of a fix for this issue.

I had not even noticed earlier that the error in event.h is from the C++ 
compiler, when compiling file.cxx in the c++ bindings.  That makes the 
vendor-specific addition of "-library=stlport4" to CXXFLAGS quite 
relevant to the problem/solution.


It eventually occurred to me that when VT's sub-configure told me to add 
configure arguments, I could have used --with-contrib-vt-flags to pass 
that ONLY to VT and perhaps NOT mess with whatever karma was providing 
the definition of u_char.  However, when I tried that I was disappointed 
to find that the bit of configure logic that suggests/requires 
CXXFLAGS=-library=stlport4 (from ompi/contrib/vt/configure.m4) runs 
BEFORE the processing of --with-contrib-vt-flags.  So, that was a dead end.


So, the next idea was to look for a fix specific to stlport.  I tried 
adding near the top of opal/event/event.h (after the WINDOWS equivalent):

#ifdef STLPORT
typedef unsigned char u_char;
#endif


That managed to clear up the original problem w/ SS12.2.
With SS12.3, things also built fine.
This suggests the typedef is not "conflicting" with whatever other defn 
was present.
I think the "safety" of this needs to be examined more widely before 
this can be adopted.
My concern is that some system could "typedef char u_char" if it has 
char unsigned by default, leading to a conflict.

Now that would, I suppose, only be a problem if STLPORT is also defined.
So, maybe I am over thinking this.
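
For what it is worth, one way to sidestep the conflict worry entirely might be to stop defining the unprefixed name at all and use a project-private alias instead. The following is only a sketch under that assumption: the names opal_u_char_t and byte_sum are hypothetical and do not come from the Open MPI or libevent sources.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical project-private alias (not an existing Open MPI name).
 * Because the name is private, it cannot clash with a system header
 * that happens to define u_char, whatever that definition is. */
typedef unsigned char opal_u_char_t;

/* Illustrative helper: sum the bytes of a buffer as unsigned values. */
static unsigned long byte_sum(const opal_u_char_t *buf, size_t len)
{
    unsigned long sum = 0;
    size_t i;
    for (i = 0; i < len; i++) {
        sum += buf[i];
    }
    return sum;
}
```

The trade-off is that every use of u_char under opal/event/ would have to be renamed, which is a larger change than the three-line #ifdef above.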

-Paul

On 2/21/2012 11:10 PM, Paul H. Hargrove wrote:

More notes:

I've tested ompi-1.5.4 and it has the same problem.  So, this is NOT a 
regression.


Terry D. has observed that Ubuntu is NOT a supported platform for the 
Solaris Studio compilers.
So, I've reproduced on a Scientific Linux 5.5 platform (Red Hat 
Enterprise Linux 5.5 clone, like CentOS) to be sure that was NOT the 
cause.


When I configure for the SS12.x compilers, I've been passing  
CXXFLAGS="-library=stlport4" as the VT sub-configure has informed me I 
should, due to something wrong with the default STL.  I tried dropping 
that from configure, and THE BUILD WAS SUCCESSFUL.


So, one has 2 choices:
+ build w/ SS12.2 without VT
+ update to SS12.3 and have VT

I don't think there is sufficient reason to delay 1.5.5 for this.

-Paul

On 2/21/2012 4:39 PM, Paul H. Hargrove wrote:

A few things to note:

1) This is NOT a problem w/ the SS12.3 compilers on the same machine.
So, one could say "upgrade your compiler" (a free download) and not 
delay 1.5.5 for this issue.


2) This is ONLY a problem on Linux, and not on Solaris (both SS12.2 
and SS12.3 tested for x86, x86-64, Sparc/v9 and Sparc/v8plus)


3) Testing the trunk I DON'T see the problem with either SS12.2 or 
SS12.3.
This is interesting, because it probably means that a u_char 
definition is SOMEWHERE in the headers (because libevent *is* getting 
built).


Whatever else may be done, I think this should be fixed "properly" 
(whatever that may equate to) for 1.6.
The way I see it now, it feels like OMPI is getting a definition of 
u_char only "by accident".


-Paul

On 2/21/2012 12:16 PM, Paul H. Hargrove wrote:
Building the v1.5 branch on Linux with the Solaris Studio 12.2 
compilers I see the following failure:
"[srcdir]/opal/event/event.h", line 797: Error: Type name expected 
instead of "u_char".
"[srcdir]/opal/event/event.h", line 798: Error: Type name expected 
instead of "u_char".
"[srcdir]/opal/event/event.h", line 1184: Error: "," expected 
instead of "*".

Where line 1184 is a prototype containing "u_char *".

As far as I can find, only several files below opal/event/ contain 
any use of "u_char".

There is a typedef for u_char in hwloc, but no use that I could see.

To the best of my knowledge u_char is NOT defined by any standard, 
and thus there is no particular header one can reliably find it in.
The alternatives, of course are "unsigned char" or "uint8_t" 
(defined in stdint.h).


I had a look at the trunk and VISUALLY it appears the same problem 
exists in:

   opal/event/event.h
   opal/mca/event/libevent2013/libevent/event.h
However, my testing is currently confined to the v1.5 branch in the 
hopes of finally getting the next 1.5.5rc out the door.


-Paul







--
Paul H. Hargrove  phhargr...@lbl.gov
Future Technologies Group
HPC Research Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900



Re: [OMPI devel] v1.5 build failure w/ Solaris Studio 12.2 on Linux

2012-02-22 Thread Paul H. Hargrove

More notes:

I've tested ompi-1.5.4 and it has the same problem.  So, this is NOT a 
regression.


Terry D. has observed that Ubuntu is NOT a supported platform for the 
Solaris Studio compilers.
So, I've reproduced on a Scientific Linux 5.5 platform (Red Hat 
Enterprise Linux 5.5 clone, like CentOS) to be sure that was NOT the cause.


When I configure for the SS12.x compilers, I've been passing  
CXXFLAGS="-library=stlport4" as the VT sub-configure has informed me I 
should, due to something wrong with the default STL.  I tried dropping 
that from configure, and THE BUILD WAS SUCCESSFUL.


So, one has 2 choices:
+ build w/ SS12.2 without VT
+ update to SS12.3 and have VT

I don't think there is sufficient reason to delay 1.5.5 for this.

-Paul

On 2/21/2012 4:39 PM, Paul H. Hargrove wrote:

A few things to note:

1) This is NOT a problem w/ the SS12.3 compilers on the same machine.
So, one could say "upgrade your compiler" (a free download) and not 
delay 1.5.5 for this issue.


2) This is ONLY a problem on Linux, and not on Solaris (both SS12.2 
and SS12.3 tested for x86, x86-64, Sparc/v9 and Sparc/v8plus)


3) Testing the trunk I DON'T see the problem with either SS12.2 or 
SS12.3.
This is interesting, because it probably means that a u_char 
definition is SOMEWHERE in the headers (because libevent *is* getting 
built).


Whatever else may be done, I think this should be fixed "properly" 
(whatever that may equate to) for 1.6.
The way I see it now, it feels like OMPI is getting a definition of 
u_char only "by accident".


-Paul

On 2/21/2012 12:16 PM, Paul H. Hargrove wrote:
Building the v1.5 branch on Linux with the Solaris Studio 12.2 
compilers I see the following failure:
"[srcdir]/opal/event/event.h", line 797: Error: Type name expected 
instead of "u_char".
"[srcdir]/opal/event/event.h", line 798: Error: Type name expected 
instead of "u_char".
"[srcdir]/opal/event/event.h", line 1184: Error: "," expected 
instead of "*".

Where line 1184 is a prototype containing "u_char *".

As far as I can find, only several files below opal/event/ contain 
any use of "u_char".

There is a typedef for u_char in hwloc, but no use that I could see.

To the best of my knowledge u_char is NOT defined by any standard, 
and thus there is no particular header one can reliably find it in.
The alternatives, of course are "unsigned char" or "uint8_t" (defined 
in stdint.h).


I had a look at the trunk and VISUALLY it appears the same problem 
exists in:

   opal/event/event.h
   opal/mca/event/libevent2013/libevent/event.h
However, my testing is currently confined to the v1.5 branch in the 
hopes of finally getting the next 1.5.5rc out the door.


-Paul





--
Paul H. Hargrove  phhargr...@lbl.gov
Future Technologies Group
HPC Research Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900



Re: [OMPI devel] v1.5 r25914 DOA

2012-02-21 Thread Paul H. Hargrove
My build with the "2011_sp1.8.273" Intel compilers passes the same tests 
as I detailed below for "2011_sp1.7.256".
I don't suspect any longer that the compiler is at fault, but am willing 
to try additional/alternate tests to help confirm.


-Paul

On 2/21/2012 5:40 PM, Paul H. Hargrove wrote:

Here are the first of the results of the testing I promised.
I am not 100% sure how to reach the code that Eugene reported as 
problematic, so I tried just running the ring test with various 
-bind-to-* options.   I am quite willing to run additional test 
cases.  All runs are w/ OMPI_MCA_btl=sm,self.


+ 2011.5.220
  FAIL: "make check" fails opal_datatype_test
  OK: mpirun -np 2 ./ring_c
  OK: mpirun -np 2 -bind-to-none ./ring_c
  OK: mpirun -np 2 -bind-to-core ./ring_c
  OK: mpirun -np 2 -bind-to-socket ./ring_c

+ 2011_sp1.7.256
  OK: "make check"
  OK: mpirun -np 2 -bind-to-none ./ring_c
  OK: mpirun -np 2 -bind-to-core ./ring_c
  OK: mpirun -np 2 -bind-to-socket ./ring_c

So, I don't think the "2011_sp1.7.256" compilers are broken (and are 
"better" than the ones I've been using).
I have a build with "2011_sp1.8.273" churning away right now (est. 
45 minutes to complete - should have disabled the Fortran bindings)


If there is something other than the -bind-to-* flags I should be 
using to reach the problematic code, let me know.
But based on what I've seen so far, I think we can probably rule out 
the compiler as the problem.


-Paul


On 2/21/2012 4:37 PM, Paul H. Hargrove wrote:
I have been testing v1.5 with slightly older Intel 
"composerxe-2011.5.220" compilers.
I see a "make check" failure in opal_datatype_test which is not 
present with any other compiler (such as gcc on the same node).
This has been seen most recently on the 1.5.5rc2r25990 tarball 
generated earlier today.
With "make check -k" I can confirm that opal_datatype_test is the 
ONLY failure I see with this compiler.
So, I have just assumed this was a buggy compiler and thought nothing 
more of it.


I have not yet tested them, but also have the same 
"composer_xe_2011_sp1.7.256" compiler and a more recent 
"composer_xe_2011_sp1.8.273".  I will test both ASAP and report back 
with my findings.


-Paul


On 2/21/2012 4:20 PM, Eugene Loh wrote:
We have some amount of MTT testing going on every night and on ONE 
of our systems v1.5 has been dead since r25914.  The system is


Linux burl-ct-v20z-10 2.6.9-67.ELsmp #1 SMP Wed Nov 7 13:56:44 EST 
2007 x86_64 x86_64 x86_64 GNU/Linux


and I'm encountering the problem with Intel 
(composer_xe_2011_sp1.7.256) compilers.  I haven't poked around 
enough yet to figure out what the problematic characteristic of this 
configuration is.


In r25914, orte/mca/odls/base/odls_base_open.c, we get

222     /* get the number of local sockets unless we were given a number */
223     if (0 == orte_default_num_sockets_per_board) {
224         opal_paffinity_base_get_socket_info(&orte_odls_globals.num_sockets);
225     }
226     /* get the number of local processors */
227     opal_paffinity_base_get_processor_info(&orte_odls_globals.num_processors);
228     /* compute the base number of cores/socket, if not given */
229     if (0 == orte_default_num_cores_per_socket) {
230         orte_odls_globals.num_cores_per_socket =
                orte_odls_globals.num_processors / orte_odls_globals.num_sockets;
231     }

Well, we execute the branch at line 224, but num_sockets remains 0.  
This leads to the divide-by-0 at line 230.  Digging deeper, the call 
at line 224 led us to 
opal/mca/paffinity/hwloc/paffinity_hwloc_module.c (lots of stuff 
left out):


static int module_get_socket_info(int *num_sockets) {
hwloc_topology_t *t = _hwloc_topology;
*num_sockets = (int) hwloc_get_nbobjs_by_type(*t, 
HWLOC_OBJ_SOCKET);

return OPAL_SUCCESS;
}

Anyhow, SOCKET is somehow an unknown layer, so num_sockets is 
returning 0.
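
Independent of why the socket count comes back as 0, the division itself could be guarded. The following is a minimal sketch, not the actual fix: cores_per_socket is a hypothetical helper, and falling back to a single socket when the count is unknown is an assumption made purely for illustration.

```c
#include <assert.h>

/* Hypothetical helper, not part of the Open MPI source: compute cores per
 * socket while tolerating a topology query that reported zero sockets. */
static int cores_per_socket(int num_processors, int num_sockets)
{
    if (num_sockets <= 0) {
        /* Assumption for this sketch: treat the node as a single socket
         * when the socket level is unknown. */
        num_sockets = 1;
    }
    return num_processors / num_sockets;
}
```

A guard like this would only avoid the crash at line 230; it would not explain why hwloc reports no SOCKET objects on this system.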


I can poke around more, but does someone want to advise?






--
Paul H. Hargrove  phhargr...@lbl.gov
Future Technologies Group
HPC Research Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900



Re: [OMPI devel] v1.5 r25914 DOA

2012-02-21 Thread Paul H. Hargrove

Here are the first of the results of the testing I promised.
I am not 100% sure how to reach the code that Eugene reported as 
problematic, so I tried just running the ring test with various 
-bind-to-* options.   I am quite willing to run additional test cases.  
All runs are w/ OMPI_MCA_btl=sm,self.


+ 2011.5.220
  FAIL: "make check" fails opal_datatype_test
  OK: mpirun -np 2 ./ring_c
  OK: mpirun -np 2 -bind-to-none ./ring_c
  OK: mpirun -np 2 -bind-to-core ./ring_c
  OK: mpirun -np 2 -bind-to-socket ./ring_c

+ 2011_sp1.7.256
  OK: "make check"
  OK: mpirun -np 2 -bind-to-none ./ring_c
  OK: mpirun -np 2 -bind-to-core ./ring_c
  OK: mpirun -np 2 -bind-to-socket ./ring_c

So, I don't think the "2011_sp1.7.256" compilers are broken (and are 
"better" than the ones I've been using).
I have a build with "2011_sp1.8.273" churning away right now (est. 
45 minutes to complete - should have disabled the Fortran bindings)


If there is something other than the -bind-to-* flags I should be using 
to reach the problematic code, let me know.
But based on what I've seen so far, I think we can probably rule out the 
compiler as the problem.


-Paul


On 2/21/2012 4:37 PM, Paul H. Hargrove wrote:
I have been testing v1.5 with slightly older Intel 
"composerxe-2011.5.220" compilers.
I see a "make check" failure in opal_datatype_test which is not 
present with any other compiler (such as gcc on the same node).
This has been seen most recently on the 1.5.5rc2r25990 tarball 
generated earlier today.
With "make check -k" I can confirm that opal_datatype_test is the ONLY 
failure I see with this compiler.
So, I have just assumed this was a buggy compiler and thought nothing 
more of it.


I have not yet tested them, but also have the same 
"composer_xe_2011_sp1.7.256" compiler and a more recent 
"composer_xe_2011_sp1.8.273".  I will test both ASAP and report back 
with my findings.


-Paul


On 2/21/2012 4:20 PM, Eugene Loh wrote:
We have some amount of MTT testing going on every night and on ONE of 
our systems v1.5 has been dead since r25914.  The system is


Linux burl-ct-v20z-10 2.6.9-67.ELsmp #1 SMP Wed Nov 7 13:56:44 EST 
2007 x86_64 x86_64 x86_64 GNU/Linux


and I'm encountering the problem with Intel 
(composer_xe_2011_sp1.7.256) compilers.  I haven't poked around 
enough yet to figure out what the problematic characteristic of this 
configuration is.


In r25914, orte/mca/odls/base/odls_base_open.c, we get

222     /* get the number of local sockets unless we were given a number */
223     if (0 == orte_default_num_sockets_per_board) {
224         opal_paffinity_base_get_socket_info(&orte_odls_globals.num_sockets);
225     }
226     /* get the number of local processors */
227     opal_paffinity_base_get_processor_info(&orte_odls_globals.num_processors);
228     /* compute the base number of cores/socket, if not given */
229     if (0 == orte_default_num_cores_per_socket) {
230         orte_odls_globals.num_cores_per_socket =
                orte_odls_globals.num_processors / orte_odls_globals.num_sockets;
231     }

Well, we execute the branch at line 224, but num_sockets remains 0.  
This leads to the divide-by-0 at line 230.  Digging deeper, the call 
at line 224 led us to 
opal/mca/paffinity/hwloc/paffinity_hwloc_module.c (lots of stuff left 
out):


static int module_get_socket_info(int *num_sockets) {
hwloc_topology_t *t = _hwloc_topology;
*num_sockets = (int) hwloc_get_nbobjs_by_type(*t, HWLOC_OBJ_SOCKET);
return OPAL_SUCCESS;
}

Anyhow, SOCKET is somehow an unknown layer, so num_sockets is 
returning 0.


I can poke around more, but does someone want to advise?




--
Paul H. Hargrove  phhargr...@lbl.gov
Future Technologies Group
HPC Research Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900



Re: [OMPI devel] v1.5 build failure w/ Solaris Studio 12.2 on Linux

2012-02-21 Thread Paul H. Hargrove

A few things to note:

1) This is NOT a problem w/ the SS12.3 compilers on the same machine.
So, one could say "upgrade your compiler" (a free download) and not 
delay 1.5.5 for this issue.


2) This is ONLY a problem on Linux, and not on Solaris (both SS12.2 and 
SS12.3 tested for x86, x86-64, Sparc/v9 and Sparc/v8plus)


3) Testing the trunk I DON'T see the problem with either SS12.2 or SS12.3.
This is interesting, because it probably means that a u_char definition 
is SOMEWHERE in the headers (because libevent *is* getting built).


Whatever else may be done, I think this should be fixed "properly" 
(whatever that may equate to) for 1.6.
The way I see it now, it feels like OMPI is getting a definition of 
u_char only "by accident".


-Paul

On 2/21/2012 12:16 PM, Paul H. Hargrove wrote:
Building the v1.5 branch on Linux with the Solaris Studio 12.2 
compilers I see the following failure:
"[srcdir]/opal/event/event.h", line 797: Error: Type name expected 
instead of "u_char".
"[srcdir]/opal/event/event.h", line 798: Error: Type name expected 
instead of "u_char".
"[srcdir]/opal/event/event.h", line 1184: Error: "," expected instead 
of "*".

Where line 1184 is a prototype containing "u_char *".

As far as I can find, only several files below opal/event/ contain any 
use of "u_char".

There is a typedef for u_char in hwloc, but no use that I could see.

To the best of my knowledge u_char is NOT defined by any standard, and 
thus there is no particular header one can reliably find it in.
The alternatives, of course are "unsigned char" or "uint8_t" (defined 
in stdint.h).


I had a look at the trunk and VISUALLY it appears the same problem 
exists in:

   opal/event/event.h
   opal/mca/event/libevent2013/libevent/event.h
However, my testing is currently confined to the v1.5 branch in the 
hopes of finally getting the next 1.5.5rc out the door.


-Paul



--
Paul H. Hargrove  phhargr...@lbl.gov
Future Technologies Group
HPC Research Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900



Re: [OMPI devel] v1.5 r25914 DOA

2012-02-21 Thread Paul H. Hargrove
I have been testing v1.5 with slightly older Intel 
"composerxe-2011.5.220" compilers.
I see a "make check" failure in opal_datatype_test which is not present 
with any other compiler (such as gcc on the same node).
This has been seen most recently on the 1.5.5rc2r25990 tarball generated 
earlier today.
With "make check -k" I can confirm that opal_datatype_test is the ONLY 
failure I see with this compiler.
So, I have just assumed this was a buggy compiler and thought nothing 
more of it.


I have not yet tested them, but also have the same 
"composer_xe_2011_sp1.7.256" compiler and a more recent 
"composer_xe_2011_sp1.8.273".  I will test both ASAP and report back 
with my findings.


-Paul


On 2/21/2012 4:20 PM, Eugene Loh wrote:
We have some amount of MTT testing going on every night and on ONE of 
our systems v1.5 has been dead since r25914.  The system is


Linux burl-ct-v20z-10 2.6.9-67.ELsmp #1 SMP Wed Nov 7 13:56:44 EST 
2007 x86_64 x86_64 x86_64 GNU/Linux


and I'm encountering the problem with Intel 
(composer_xe_2011_sp1.7.256) compilers.  I haven't poked around enough 
yet to figure out what the problematic characteristic of this 
configuration is.


In r25914, orte/mca/odls/base/odls_base_open.c, we get

222     /* get the number of local sockets unless we were given a number */
223     if (0 == orte_default_num_sockets_per_board) {
224         opal_paffinity_base_get_socket_info(&orte_odls_globals.num_sockets);
225     }
226     /* get the number of local processors */
227     opal_paffinity_base_get_processor_info(&orte_odls_globals.num_processors);
228     /* compute the base number of cores/socket, if not given */
229     if (0 == orte_default_num_cores_per_socket) {
230         orte_odls_globals.num_cores_per_socket =
                orte_odls_globals.num_processors / orte_odls_globals.num_sockets;
231     }

Well, we execute the branch at line 224, but num_sockets remains 0.  
This leads to the divide-by-0 at line 230.  Digging deeper, the call 
at line 224 led us to 
opal/mca/paffinity/hwloc/paffinity_hwloc_module.c (lots of stuff left 
out):


static int module_get_socket_info(int *num_sockets) {
hwloc_topology_t *t = _hwloc_topology;
*num_sockets = (int) hwloc_get_nbobjs_by_type(*t, HWLOC_OBJ_SOCKET);
return OPAL_SUCCESS;
}

Anyhow, SOCKET is somehow an unknown layer, so num_sockets is 
returning 0.


I can poke around more, but does someone want to advise?


--
Paul H. Hargrove  phhargr...@lbl.gov
Future Technologies Group
HPC Research Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900



[OMPI devel] v1.5 build failure w/ Solaris Studio 12.2 on Linux

2012-02-21 Thread Paul H. Hargrove
Building the v1.5 branch on Linux with the Solaris Studio 12.2 compilers 
I see the following failure:
"[srcdir]/opal/event/event.h", line 797: Error: Type name expected 
instead of "u_char".
"[srcdir]/opal/event/event.h", line 798: Error: Type name expected 
instead of "u_char".
"[srcdir]/opal/event/event.h", line 1184: Error: "," expected instead 
of "*".

Where line 1184 is a prototype containing "u_char *".

As far as I can find, only several files below opal/event/ contain any 
use of "u_char".

There is a typedef for u_char in hwloc, but no use that I could see.

To the best of my knowledge u_char is NOT defined by any standard, and 
thus there is no particular header one can reliably find it in.
The alternatives, of course are "unsigned char" or "uint8_t" (defined in 
stdint.h).
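
To make the uint8_t alternative concrete, here is a small sketch of what a prototype could look like with the standard type in place of u_char. The function count_nonzero is hypothetical and exists only to show the parameter type; it is not taken from opal/event/.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative prototype shape only: a buffer parameter declared with the
 * standard uint8_t instead of the non-standard u_char. The function name
 * is hypothetical. */
static size_t count_nonzero(const uint8_t *data, size_t len)
{
    size_t n = 0;
    size_t i;
    for (i = 0; i < len; i++) {
        if (data[i] != 0) {
            n++;
        }
    }
    return n;
}
```

Since stdint.h is a C99 header, this form depends only on the compiler and not on which system header happens to define u_char.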


I had a look at the trunk and VISUALLY it appears the same problem 
exists in:

   opal/event/event.h
   opal/mca/event/libevent2013/libevent/event.h
However, my testing is currently confined to the v1.5 branch in the 
hopes of finally getting the next 1.5.5rc out the door.


-Paul

--
Paul H. Hargrove  phhargr...@lbl.gov
Future Technologies Group
HPC Research Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900



Re: [OMPI devel] non-portable code in examples/Makefile

2012-02-21 Thread Paul H. Hargrove

Thanks, Ralph.
Excellent point about not needing to use the "FC" name with its special 
(absurd?) behavior.


-Paul

On 2/21/2012 1:52 AM, Ralph Castain wrote:
I went ahead and applied this, with a tweak. There is no reason to 
call our flag "FC" as all we use it for is to call the right wrapper. 
So I renamed it to something less problematic.


On Feb 21, 2012, at 1:20 AM, Paul H. Hargrove wrote:

And while we are looking at examples/Makefile on Solaris-10, why are 
the F77 examples getting built w/ mpif90?
Because w/ the Solaris make setting FC also silently sets F77 (yes, I 
am NOT kidding)!
So, reordering the F77= and FC= lines in Makefile resolves that 
mis-behavior.


Attached is my patch to fix both F77/FC and the "better" ompi_info 
queries mentioned in my previous post.

This REPLACES the patch in the previous post.

-Paul

On 2/20/2012 11:36 PM, Paul H. Hargrove wrote:
The addition on Monday of the Java cases to examples/Makefile has 
shown that the default "make" in Solaris-10 will stop on the first 
failed grep command in the "all" rule:

$ make
mpicc -g   -o hello_c hello_c.c
mpicc -g   -o ring_c ring_c.c
mpicc -g   -o connectivity_c connectivity_c.c
mpic++ -g   -o hello_cxx hello_cxx.cc
mpic++ -g   -o ring_cxx ring_cxx.cc
mpif90 -g hello_f77.f -o hello_f77
mpif90 -g ring_f77.f -o ring_f77
mpif90 -g hello_f90.f90 -o hello_f90
mpif90 -g ring_f90.f90 -o ring_f90
*** Error code 1
The following command caused the error:
if test "`ompi_info --parsable | grep bindings:java:yes`" != ""; then \
make Hello.class; \
fi
make: Fatal error: Command failed for target `all'


The addition of java did NOT break anything, but exposed a 
pre-existing problem which  was not evident in my prior testing 
because all language bindings were being built prior to adding java.


The attached patch resolves the problem in my (admittedly minimal) 
testing with the smallest possible change.
However, an entirely different approach avoids both "test" and "true" and 
simply looks like:

@ if ompi_info --parsable | grep bindings:cxx:yes >/dev/null; then

I have *also* tested that approach, and it works fine too.

I *did* warn that the introduction of the java bindings would bring 
collateral damage.

I just didn't anticipate encountering it personally.

-Paul





--
Paul H. Hargrove  phhargr...@lbl.gov
Future Technologies Group
HPC Research Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900






--
Paul H. Hargrove  phhargr...@lbl.gov
Future Technologies Group
HPC Research Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900



Re: [OMPI devel] non-portable code in examples/Makefile

2012-02-21 Thread Paul H. Hargrove
And while we are looking at examples/Makefile on Solaris-10, why are the 
F77 examples getting built w/ mpif90?
Because w/ the Solaris make setting FC also silently sets F77 (yes, I am 
NOT kidding)!
So, reordering the F77= and FC= lines in Makefile resolves that 
mis-behavior.


Attached is my patch to fix both F77/FC and the "better" ompi_info 
queries mentioned in my previous post.

This REPLACES the patch in the previous post.

-Paul

On 2/20/2012 11:36 PM, Paul H. Hargrove wrote:
The addition on Monday of the Java cases to examples/Makefile has 
shown that the default "make" in Solaris-10 will stop on the first 
failed grep command in the "all" rule:

$ make
mpicc -g   -o hello_c hello_c.c
mpicc -g   -o ring_c ring_c.c
mpicc -g   -o connectivity_c connectivity_c.c
mpic++ -g   -o hello_cxx hello_cxx.cc
mpic++ -g   -o ring_cxx ring_cxx.cc
mpif90 -g hello_f77.f -o hello_f77
mpif90 -g ring_f77.f -o ring_f77
mpif90 -g hello_f90.f90 -o hello_f90
mpif90 -g ring_f90.f90 -o ring_f90
*** Error code 1
The following command caused the error:
if test "`ompi_info --parsable | grep bindings:java:yes`" != ""; then \
make Hello.class; \
fi
make: Fatal error: Command failed for target `all'


The addition of java did NOT break anything, but exposed a 
pre-existing problem which was not evident in my prior testing 
because all language bindings were being built prior to adding java.


The attached patch resolves the problem in my (admittedly minimal) 
testing with the smallest possible change.
However, an entirely different approach avoids both "test" and "true" 
and simply looks like:

@ if ompi_info --parsable | grep bindings:cxx:yes >/dev/null; then

I have *also* tested that approach, and it works fine too.

I *did* warn that the introduction of the java bindings would bring 
collateral damage.

I just didn't anticipate encountering it personally.

-Paul




--
Paul H. Hargrove  phhargr...@lbl.gov
Future Technologies Group
HPC Research Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900

Index: Makefile
===
--- Makefile	(revision 25980)
+++ Makefile	(working copy)
@@ -25,8 +25,8 @@
 CC = mpicc
 CXX = mpic++
 CCC = mpic++
+FC = mpif90
 F77 = mpif77
-FC = mpif90
 JAVAC = mpijavac

 # Using -g is not necessary, but it is helpful for example programs,
@@ -49,19 +49,19 @@
 # if Open MPI was build with the relevant language bindings.

 all: hello_c ring_c connectivity_c
-   @ if test "`ompi_info --parsable | grep bindings:cxx:yes`" != ""; then \
+   @ if ompi_info --parsable | grep bindings:cxx:yes >/dev/null; then \
$(MAKE) hello_cxx ring_cxx; \
fi
-   @ if test "`ompi_info --parsable | grep bindings:f77:yes`" != ""; then \
+   @ if ompi_info --parsable | grep bindings:f77:yes >/dev/null; then \
$(MAKE) hello_f77 ring_f77; \
fi
-   @ if test "`ompi_info --parsable | grep bindings:f90:yes`" != ""; then \
+   @ if ompi_info --parsable | grep bindings:f90:yes >/dev/null; then \
$(MAKE) hello_f90 ring_f90; \
fi
-   @ if test "`ompi_info --parsable | grep bindings:java:yes`" != ""; then \
+   @ if ompi_info --parsable | grep bindings:java:yes >/dev/null; then \
$(MAKE) Hello.class; \
fi
-   @ if test "`ompi_info --parsable | grep bindings:java:yes`" != ""; then \
+   @ if ompi_info --parsable | grep bindings:java:yes >/dev/null ; then \
$(MAKE) Ring.class; \
fi



[OMPI devel] non-portable code in examples/Makefile

2012-02-21 Thread Paul H. Hargrove
The addition on Monday of the Java cases to examples/Makefile has shown 
that the default "make" in Solaris-10 will stop on the first failed grep 
command in the "all" rule:

$ make
mpicc -g   -o hello_c hello_c.c
mpicc -g   -o ring_c ring_c.c
mpicc -g   -o connectivity_c connectivity_c.c
mpic++ -g   -o hello_cxx hello_cxx.cc
mpic++ -g   -o ring_cxx ring_cxx.cc
mpif90 -g hello_f77.f -o hello_f77
mpif90 -g ring_f77.f -o ring_f77
mpif90 -g hello_f90.f90 -o hello_f90
mpif90 -g ring_f90.f90 -o ring_f90
*** Error code 1
The following command caused the error:
if test "`ompi_info --parsable | grep bindings:java:yes`" != ""; then \
make Hello.class; \
fi
make: Fatal error: Command failed for target `all'


The addition of java did NOT break anything, but exposed a pre-existing 
problem which was not evident in my prior testing because all language 
bindings were being built prior to adding java.


The attached patch resolves the problem in my (admittedly minimal) 
testing with the smallest possible change.
However, an entirely different approach avoids both "test" and "true" 
and simply looks like:

@ if ompi_info --parsable | grep bindings:cxx:yes >/dev/null; then

I have *also* tested that approach, and it works fine too.
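
For anyone who wants to compare the two forms outside of make, here is a 
minimal sketch.  It assumes the trouble is grep's nonzero exit status 
when nothing matches.  The function fake_ompi_info is a hypothetical 
stand-in for "ompi_info --parsable", and its output lines are made up 
for illustration.

```shell
#!/bin/sh
# Hypothetical stand-in for `ompi_info --parsable`; these lines are made up.
fake_ompi_info() {
    echo "bindings:c:yes"
    echo "bindings:cxx:yes"
    echo "bindings:java:no"
}

# Form 1: let `if` look at grep's exit status directly.
# A non-match simply selects the else branch.
has_binding() {
    if fake_ompi_info | grep "bindings:$1:yes" >/dev/null; then
        echo yes
    else
        echo no
    fi
}

# Form 2: keep the string comparison, but append `|| true` so the
# command substitution itself never reports a failure.
has_binding_true() {
    found=`fake_ompi_info | grep "bindings:$1:yes" || true`
    if test "$found" != ""; then
        echo yes
    else
        echo no
    fi
}

has_binding cxx
has_binding java
has_binding_true cxx
has_binding_true java
```

Both forms should agree for every binding.  The first one is shorter 
and does not need to capture grep's output at all.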

I *did* warn that the introduction of the java bindings would bring 
collateral damage.

I just didn't anticipate encountering it personally.

-Paul

--
Paul H. Hargrove  phhargr...@lbl.gov
Future Technologies Group
HPC Research Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900

Index: Makefile
===
--- Makefile	(revision 25980)
+++ Makefile	(working copy)
@@ -49,19 +49,19 @@
 # if Open MPI was build with the relevant language bindings.

 all: hello_c ring_c connectivity_c
-	@ if test "`ompi_info --parsable | grep bindings:cxx:yes`" != ""; then \
+	@ if test "`ompi_info --parsable | grep bindings:cxx:yes || true`" != ""; then \
 	$(MAKE) hello_cxx ring_cxx; \
 	fi
-	@ if test "`ompi_info --parsable | grep bindings:f77:yes`" != ""; then \
+	@ if test "`ompi_info --parsable | grep bindings:f77:yes || true`" != ""; then \
 	$(MAKE) hello_f77 ring_f77; \
 	fi
-	@ if test "`ompi_info --parsable | grep bindings:f90:yes`" != ""; then \
+	@ if test "`ompi_info --parsable | grep bindings:f90:yes || true`" != ""; then \
 	$(MAKE) hello_f90 ring_f90; \
 	fi
-	@ if test "`ompi_info --parsable | grep bindings:java:yes`" != ""; then \
+	@ if test "`ompi_info --parsable | grep bindings:java:yes || true`" != ""; then \
 	$(MAKE) Hello.class; \
 	fi
-	@ if test "`ompi_info --parsable | grep bindings:java:yes`" != ""; then \
+	@ if test "`ompi_info --parsable | grep bindings:java:yes || true`" != ""; then \
 	$(MAKE) Ring.class; \
 	fi



Re: [OMPI devel] [EXTERNAL] Re: trunk build failure on Altix [w/ WORK AROUND]

2012-02-21 Thread Paul H. Hargrove
Testing tonight's trunk tarball on the Altix system I have access to 
looks fine now.


Thanks, Brian.
-Paul

On 2/20/2012 11:49 AM, Paul H. Hargrove wrote:

Brian,

Thanks for looking into this.
I'll plan to take a look at the trunk tarball tonight and report back.

-Paul

On 2/20/2012 8:49 AM, Barrett, Brian W wrote:

Hi Paul -

Thanks for noticing this.  I guess we don't have many Altix developers.  I
think I've fixed it on the trunk with r25968, plus r25967 to make sure the
Altix component gets selected over the Linux component on Altix systems.
I don't have an Altix to test on; can you give it a go and let me know if
it works?  In the trunk right now, and should be in the trunk nightly
tarball tomorrow morning.

The problem cropped up when we started running the configure macros for
components that couldn't possibly succeed (which we needed to make
Automake happy in a couple of situations) sometime late in the 1.5 series.
Before that, a component could never think it succeeded and then later be
told it didn't.  We added yet another macro to handle issues like this, so
it was a fairly easy fix.

Thanks,

Brian

On 2/17/12 4:26 PM, "Paul H. Hargrove"<phhargr...@lbl.gov>  wrote:




I've poked enough at the ompi configure magic to *think* I
understand the source of the problem I've seen w/ both trunk and
1.5.x on the Altix.

The problem appears to be that both timer/altix/configure.m4 and
timer/linux/configure.m4 are setting the value of
$timer_base_include and the LAST one "wins".  Meanwhile, only the
FIRST one is getting added to $static_components ("there can be only
one").  So, I suspect the difference I saw between trunk and 1.5 was
just a matter of which configure probe ran first.

The result of having FIRST and LAST "win" in different settings is a
mismatch.


$ grep -e timer:linux -e timer:altix configure.out
--- MCA component timer:linux (m4 configuration macro, priority 30)
checking for MCA component timer:linux compile mode... static
checking if MCA component timer:linux can compile... yes
--- MCA component timer:altix (m4 configuration macro, priority 30)
checking for MCA component timer:altix compile mode... static
checking if MCA component timer:altix can compile... no


which picks timer:linux and rejects timer:altix, as compared to:


$ grep -e '"MCA_opal_timer_[SD]' -e MCA_timer_ config.status
S["MCA_opal_timer_DSO_SUBDIRS"]=""
S["MCA_opal_timer_STATIC_SUBDIRS"]=" mca/timer/linux"
S["MCA_opal_timer_STATIC_LTLIBS"]="mca/timer/linux/libmca_timer_linux.la "
S["MCA_opal_timer_DSO_COMPONENTS"]=""
S["MCA_opal_timer_STATIC_COMPONENTS"]=" linux"
D["MCA_timer_IMPLEMENTATION_HEADER"]=" \"opal/mca/timer/altix/timer_altix.h\""


Which will build timer:linux but has improperly picked up the
timer:altix HEADER!


For the present, an explicit --with-timer=altix works-around the
problem in either branch.
However, the setting of the header variable by a NON-selected
component is ERRONEOUS and should get fixed.
In trunk, it may also make sense to raise the priority of
timer:altix above that of timer:linux.
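
The FIRST-wins / LAST-wins mismatch described above can be reproduced 
with a toy model.  This is only a sketch: probe_buggy, probe_guarded and 
run_probes are hypothetical names, not the real configure.m4 macros, and 
the guarded variant is just one possible way to avoid the mismatch, not 
necessarily what the actual fix does.

```shell
#!/bin/sh
# Toy model only: these functions are NOT the real configure.m4 macros.

# Buggy pattern: every probe sets the header, only the first success is kept.
probe_buggy() {   # $1 = component name, $2 = "yes" if it can compile
    timer_base_include="timer/$1/timer_$1.h"     # unconditional: LAST probe wins
    if test "$2" = yes && test -z "$static_components"; then
        static_components=$1                     # FIRST success wins
    fi
}

# Guarded pattern: only the selected component may set the header.
probe_guarded() { # same arguments as probe_buggy
    if test "$2" = yes && test -z "$static_components"; then
        static_components=$1
        timer_base_include="timer/$1/timer_$1.h"
    fi
}

# Run the linux probe and then the altix probe with the given function,
# mirroring the "can compile... yes/no" results in the configure output.
run_probes() {    # $1 = probe function to use
    static_components=
    timer_base_include=
    $1 linux yes
    $1 altix no
    echo "$static_components $timer_base_include"
}

run_probes probe_buggy
run_probes probe_guarded
```

With the buggy pattern the selected component is linux while the header 
path names altix, which is the same shape of mismatch seen in 
config.status.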

-Paul

On 2/15/2012 12:41 AM, Paul Hargrove wrote:

  I've configured the ompi trunk (nightly tarball 1.7a1r25927)
on an SGI Altix.
  I used no special arguments indicating that this is an Altix,
and there does not appear to be an altix-specific file in
contrib/platform.


  My build fails as follows:




make: Entering directory
`/mnt/home/c_phargrov/OMPI/openmpi-trunk-linux-ia64/BLD/opal/tools/wrappers'
  CC     opal_wrapper.o
  CCLD   opal_wrapper
../../../opal/.libs/libopen-pal.so: undefined reference to `opal_timer_altix_mmdev_timer_addr'
../../../opal/.libs/libopen-pal.so: undefined reference to `opal_timer_altix_freq'
collect2: ld returned 1 exit status






  The configure-generated opal_config.h contains
  #define MCA_timer_IMPLEMENTATION_HEADER "opal/mca/timer/altix/timer_altix.h"


  Nothing appears to have been built in BUILDDIR/opal/mca/timer/altix.
  However, BUILDDIR/opal/mca/timer/linux has been built.


  -Paul


--
Paul H. Hargrove  phhargr...@lbl.gov
Future Technologies Group
HPC Research Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900




-- Paul H. Hargrove  phhargr...@lbl.gov
Future Technologies Group
HPC Research Dep

[OMPI devel] Invalid format strings in ROMIO

2012-02-20 Thread Paul H. Hargrove
Both the v1.5 branch and trunk are getting lots of warnings from Clang 
like the following:

  CC ad_coll_exch_new.lo
../../../../../../../../ompi/mca/io/romio/romio/adio/common/ad_coll_exch_new.c:51:28: 
warning: length modifier 'L' results in undefined behavior or no effect 
with 'd' conversion specifier [-Wformat]
    fprintf(stderr, "%d=(%Ld,%Ld)\n", i, flatlist_node_p->indices[i],


Manpages from both Linux (glibc) and FreeBSD (NOT glibc) agree that "L" 
is only a valid length modifier for the floating-point conversion 
specifiers.


Grepping both v1.5 and trunk shows instances of "%Ld" in:

ompi/mca/io/romio/romio/adio/common/ad_write_nolock.c
ompi/mca/io/romio/romio/adio/common/ad_coll_build_req_new.c
ompi/mca/io/romio/romio/adio/common/ad_coll_exch_new.c
ompi/mca/io/romio/romio/adio/ad_gridftp/ad_gridftp_write.c
ompi/mca/io/romio/romio/adio/ad_pvfs2/ad_pvfs2_io_dtype.c
ompi/mca/io/romio/romio/adio/ad_pvfs2/ad_pvfs2_io_list.c
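
A list like the one above can be regenerated with something along these 
lines.  This is a sketch, not necessarily the exact command that was 
used; list_Ld_files is a hypothetical helper name, and it only looks at 
files ending in ".c".

```shell
#!/bin/sh
# List C sources under a directory that contain the "%Ld" conversion.
# $1 is the directory to search; "{} +" batches many files per grep call.
list_Ld_files() {
    find "$1" -type f -name '*.c' -exec grep -l '%Ld' {} +
}

# Example call (the path is taken from the list above):
#   list_Ld_files ompi/mca/io/romio/romio/adio
```

Choosing a replacement (for instance "%lld" together with a cast to 
long long) is a separate decision and is not something this sketch 
attempts.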

Not sure how much one cares, but I am reporting on the off chance 
somebody does want to fix this.


-Paul

--
Paul H. Hargrove  phhargr...@lbl.gov
Future Technologies Group
HPC Research Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900



Re: [OMPI devel] flex warning from flex-2.5.4

2012-02-20 Thread Paul H. Hargrove


On 2/20/2012 3:36 PM, Paul H. Hargrove wrote:
NOTE: I've not yet actually tested the resulting show_help utility 
[but soon]. 


An "instrumented" version of test/opal_sos.c is getting the same string 
back from opal_show_help_string() both with and without my patch.  So, I 
believe it to be correct/safe.  Hopefully, if anything "deeper" needs to 
be tested, Ralph can see to it.


-Paul

--
Paul H. Hargrove  phhargr...@lbl.gov
Future Technologies Group
HPC Research Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900



Re: [OMPI devel] [OMPI svn] svn:open-mpi r25966

2012-02-20 Thread Paul H. Hargrove
The original problem of a missing aio.h was seen on OpenBSD-5.0 (which 
was released Nov 1, 2011)

See http://www.open-mpi.org/community/lists/devel/2012/02/10470.php


-Paul

On 2/20/2012 4:03 PM, Edgar Gabriel wrote:

just out of curiosity, what platform did not have support for the aio
operations?

Also, the proper solution will be to not compile the section using the
aio functions, but still compile the rest of the module. I will try to
implement that properly ASAP. The POSIX is the most basic module that
shall be used if everything else breaks, so disabling it basically means
that we should not compile OMPIO at all.

Thanks
Edgar

On 2/20/2012 4:36 PM, Ralph Castain wrote:

I'm afraid this commit breaks the ability to build from a tarball. I created a tarball from the 
trunk and then did a configure followed by "make clean". The make command failed to 
execute because it could not "make clean" in the mca/fbtl/posix directory as there is no 
Makefile in it.

I checked and the Makefile -is- being created when built in an svn checkout, 
but is -not- being created when built from tarball. This was done on a Mac.


On Feb 20, 2012, at 8:53 AM, jsquy...@osl.iu.edu wrote:


Author: jsquyres
Date: 2012-02-20 10:53:20 EST (Mon, 20 Feb 2012)
New Revision: 25966
URL: https://svn.open-mpi.org/trac/ompi/changeset/25966

Log:
Ensure that we have aio.h before trying to compile this component.

Added:
   trunk/ompi/mca/fbtl/posix/configure.m4

Added: trunk/ompi/mca/fbtl/posix/configure.m4
==
--- (empty file)
+++ trunk/ompi/mca/fbtl/posix/configure.m4  2012-02-20 10:53:20 EST (Mon, 20 Feb 2012)
@@ -0,0 +1,33 @@
+# -*- shell-script -*-
+#
+# Copyright (c) 2004-2005 The Trustees of Indiana University and Indiana
+# University Research and Technology
+# Corporation.  All rights reserved.
+# Copyright (c) 2004-2005 The University of Tennessee and The University
+# of Tennessee Research Foundation.  All rights
+# reserved.
+# Copyright (c) 2004-2005 High Performance Computing Center Stuttgart,
+# University of Stuttgart.  All rights reserved.
+# Copyright (c) 2004-2012 The Regents of the University of California.
+# All rights reserved.
+# Copyright (c) 2010  Cisco Systems, Inc.  All rights reserved.
+# Copyright (c) 2008-2011 University of Houston. All rights reserved.
+# $COPYRIGHT$
+#
+# Additional copyrights may follow
+#
+# $HEADER$
+#
+
+# MCA_fbtl_posix_CONFIG(action-if-can-compile,
+#[action-if-cant-compile])
+# 
+AC_DEFUN([MCA_ompi_fbtl_posix_CONFIG],[
+AC_CHECK_HEADER([aio.h],
+[fbtl_posix_happy="yes"],
+[fbtl_posix_happy="no"])
+
+AS_IF([test "$fbtl_posix_happy" = "yes"],
+  [$1],
+  [$2])
+])dnl
___
svn mailing list
s...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/svn



--
Paul H. Hargrove  phhargr...@lbl.gov
Future Technologies Group
HPC Research Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900



Re: [OMPI devel] flex warning from flex-2.5.4

2012-02-20 Thread Paul H. Hargrove

Ralph,

The change below removes the warning, but very slightly changes the 
syntax that is parsed.
In the original, anything following the "[tag]" was considered trailing 
context.
However that made inputs like "[tag]foo]" ambiguous to the parser (hence 
the warning).

With the change below, both "]" will be in the matched string.
I am pretty sure that shouldn't ever happen in valid inputs anyway.
NOTE: I've not yet actually tested the resulting show_help utility [but 
soon].


-Paul

Index: opal/util/show_help_lex.l
===
--- opal/util/show_help_lex.l   (revision 25974)
+++ opal/util/show_help_lex.l   (working copy)
@@ -62,7 +62,7 @@

 #.*\n   ; /* comment line */

-^\[.+\]/.*\n { BEGIN(CHOMP); return OPAL_SHOW_HELP_PARSE_TOPIC; }
+^\[.+\]/[^\]\n]*\n { BEGIN(CHOMP); return OPAL_SHOW_HELP_PARSE_TOPIC; }

.*\n { BEGIN(INITIAL); }



On 2/20/2012 3:26 PM, Ralph Castain wrote:

My bad - didn't look closely enough. I'll take a look at it and see if there is 
anything we can do.

On Feb 20, 2012, at 4:12 PM, Paul H. Hargrove wrote:


Ralph,

Are you sure this is a flex-generated file?
I am looking at opal/util/show_help_lex.l in the svn trunk and it certainly 
looks human-generated to me.
Please clue me in if I am missing something.

The warning is from flex when processing the .l file, NOT from the compilation 
of the flex-generated .c file.

-Paul

On 2/19/2012 7:55 PM, Ralph Castain wrote:

We get that everywhere, unfortunately - it comes from flex and is outside our 
control as the file it complains about is actually generated by flex itself. 
Unfortunately, flex is no longer maintained, and so nothing has been done to 
correct it.


On Feb 19, 2012, at 8:47 PM, Paul H. Hargrove wrote:


I've not checked any other systems, but building the trunk on OpenBSD and 
FreeBSD (w/ flex-2.5.4) I see the following:

  LEX show_help_lex.c
"[srcdir]/opal/util/show_help_lex.l", line 65: warning, dangerous trailing context

I found this message in the flex documentation, and it mentions that the POSIX 
draft for LEX leaves such cases undefined.
http://flex.sourceforge.net/manual/Limitations.html

-Paul

--
Paul H. Hargrove  phhargr...@lbl.gov
Future Technologies Group
HPC Research Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900


--
Paul H. Hargrove  phhargr...@lbl.gov
Future Technologies Group
HPC Research Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900



Re: [OMPI devel] flex warning from flex-2.5.4

2012-02-20 Thread Paul H. Hargrove

Ralph,

Are you sure this is a flex-generated file?
I am looking at opal/util/show_help_lex.l in the svn trunk and it 
certainly looks human-generated to me.

Please clue me in if I am missing something.

The warning is from flex when processing the .l file, NOT from the 
compilation of the flex-generated .c file.


-Paul

On 2/19/2012 7:55 PM, Ralph Castain wrote:

We get that everywhere, unfortunately - it comes from flex and is outside our 
control as the file it complains about is actually generated by flex itself. 
Unfortunately, flex is no longer maintained, and so nothing has been done to 
correct it.


On Feb 19, 2012, at 8:47 PM, Paul H. Hargrove wrote:


I've not checked any other systems, but building the trunk on OpenBSD and 
FreeBSD (w/ flex-2.5.4) I see the following:

  LEX show_help_lex.c
"[srcdir]/opal/util/show_help_lex.l", line 65: warning, dangerous trailing context

I found this message in the flex documentation, and it mentions that the POSIX 
draft for LEX leaves such cases undefined.
http://flex.sourceforge.net/manual/Limitations.html

-Paul

--
Paul H. Hargrove  phhargr...@lbl.gov
Future Technologies Group
HPC Research Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900


--
Paul H. Hargrove  phhargr...@lbl.gov
Future Technologies Group
HPC Research Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900



Re: [OMPI devel] [EXTERNAL] Re: [OMPI svn] svn:open-mpi r25966

2012-02-20 Thread Paul H. Hargrove

s/Jeff/Paul/

Jeff's only fault was trusting me too much.
-Paul

On 2/20/2012 2:41 PM, Barrett, Brian W wrote:

That's because Jeff forgot to copy the line:

   AC_CONFIG_FILES([ompi/mca/fbtl/posix/Makefile])

From whatever configure.m4 script he used as the base for his new macro :).

Brian


--
Paul H. Hargrove  phhargr...@lbl.gov
Future Technologies Group
HPC Research Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900



Re: [OMPI devel] non-portable test operator in configure

2012-02-20 Thread Paul H. Hargrove
For those keeping score at home, that should have said "/usr/ucb" 
instead of "/usr/ucb/bin".
I make mistakes too (as Ralph's observation of breakage w/ r25966 shows 
quite clearly).


-Paul

On 2/20/2012 2:37 PM, Paul H. Hargrove wrote:

Short version:
The "expr: Paren problem" comes from having /usr/ucb/bin ahead of 
/usr/bin in one's $PATH.
So, I needed to fix my $PATH. 


--
Paul H. Hargrove  phhargr...@lbl.gov
Future Technologies Group
HPC Research Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900



Re: [OMPI devel] non-portable test operator in configure

2012-02-20 Thread Paul H. Hargrove

Short version:
The "expr: Paren problem" comes from having /usr/ucb/bin ahead of 
/usr/bin in one's $PATH.

So, I needed to fix my $PATH.

Long version:

This error is coming from configure's own argument parsing logic when 
the ROMIO sub-configure is invoked.
The issue appears to be that this expr's implementation of parens (for 
match capture) doesn't like the length of the match.

This works:

$ expr 'XCPPFLAGS=-D_REENTRANT -I/foo/bar' : '[^=]*=\(.*\)'
-D_REENTRANT -I/foo/bar


This (from config.log) does not:
$ expr 'XCPPFLAGS= -D_REENTRANT 
-I/export/home/phargrov/OMPI/openmpi-trunk-solaris10-x86-ss12u3/openmpi-1.7a1r25959/opal/mca/hwloc/hwloc132/hwloc/include 
-I/export/home/phargrov/OMPI/openmpi-trunk-solaris10-x86-ss12u3/BB/opal/mca/hwloc/hwloc132/hwloc/include  
-I/export/home/phargrov/OMPI/openmpi-trunk-solaris10-x86-ss12u3/openmpi-1.7a1r25959/opal/mca/event/libevent2013/libevent 
-I/export/home/phargrov/OMPI/openmpi-trunk-solaris10-x86-ss12u3/openmpi-1.7a1r25959/opal/mca/event/libevent2013/libevent/include 
-I/export/home/phargrov/OMPI/openmpi-trunk-solaris10-x86-ss12u3/BB/opal/mca/event/libevent2013/libevent/include   
-I/usr/include/infiniband -I/usr/include/infiniband' : '[^=]*=\(.*\)'

expr: Paren problem


This one works:
expr 'XCPPFLAGS=-D_REENTRANT 
-I/foo/bar/baz/1234567890/acdefghijklmnopqrstuvwxyz/this-is-getting-too-long-for-solaris-expr-to-handle-correctly/' 
: '[^=]*=\(.*\)'
-D_REENTRANT 
-I/foo/bar/baz/1234567890/acdefghijklmnopqrstuvwxyz/this-is-getting-too-long-for-solaris-expr-to-handle-correctly/


While 1 more character breaks:
{phargrov@cloon BB}$ expr 'XCPPFLAGS=-D_REENTRANT 
-I/foo/bar/baz/1234567890/acdefghijklmnopqrstuvwxyz/this-is-getting-too-long-for-solaris-expr-to-handle-correctly/1' 
: '[^=]*=\(.*\)'

expr: Paren problem


The work-around appears to be to ensure /usr/bin is before /usr/ucb/bin 
in PATH, since /usr/bin/expr doesn't display this problem.
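
For reference, here is a small sketch of what the expr capture in these 
examples computes, next to an alternative that never runs expr.  The 
function names are hypothetical.  The second form assumes a POSIX shell; 
the legacy Solaris /bin/sh may not support that expansion, which may be 
why configure relies on expr in the first place.

```shell
#!/bin/sh
# Extract everything after the first "=" in an argument such as
# "XCPPFLAGS=-I/foo".

# The form used in the examples above: expr with a \( \) capture.
value_via_expr() {
    expr "$1" : '[^=]*=\(.*\)'
}

# Alternative using POSIX parameter expansion: strip the shortest prefix
# ending in "=".  No external expr is run, so the result does not depend
# on which expr comes first in PATH.
value_via_expansion() {
    printf '%s\n' "${1#*=}"
}

arg='XCPPFLAGS=-D_REENTRANT -I/foo/bar'
value_via_expr "$arg"
value_via_expansion "$arg"
```

With a well-behaved expr both calls should print the same value, 
"-D_REENTRANT -I/foo/bar".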

I've fixed my own PATH accordingly for my future Solaris testing.

Even using /bin/sh, I saw no other "odd" behaviors with configure on 
Solaris-10.


-Paul

On 2/20/2012 1:16 PM, Paul H. Hargrove wrote:

Argh!!
I am now trying to track down "expr: Paren problem" on Solaris.
The dash shell on Linux doesn't reproduce this one, unfortunately.

-Paul

On 2/20/2012 1:12 PM, Paul H. Hargrove wrote:
I'll report back ASAP on my slowlaris10 results. 




--
Paul H. Hargrove  phhargr...@lbl.gov
Future Technologies Group
HPC Research Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900



Re: [hwloc-devel] 1.3.2rc1 has escaped

2012-02-20 Thread Paul H. Hargrove



On 2/20/2012 1:26 PM, Brice Goglin wrote:

Le 08/02/2012 22:33, Paul H. Hargrove a écrit :

Tests on the virtual node I have access to where that problem report
originated is still not quite right.
There is now a different assertion failing than I had seen before:

lt-linux-libnuma:
/users/phh1/OMPI/hwloc-1.3.2rc1-linux-ppc64-gcc//hwloc-1.3.2rc1/tests/linux-libnuma.c:83:
main: Assertion `!memcmp(,_all_nodes,
sizeof(nodemask_t))' failed.
/bin/sh: line 5: 19416 Aborted ${dir}$tst
FAIL: linux-libnuma

I don't have any clue if that represents forward or backward progress.

Can you try the attached patch?

It removes nodemask checks (this deprecated interface is too
buggy/strange in libnuma, no way to assert its behavior reliably).
Then, it fixes the libnuma helpers to properly use os_index instead of
logical_index (important on your machine because node ids are sparse).
And finally it makes sure the test actually checks what we want
(shouldn't matter in your case).

I've tested this on your topology, a 8-node machine with out-of-order
numa node ids, and some basic nodes, with a recent and a less recent
libnuma release.

My current plan is to apply all these in all branches and then remove
the nodemask conversion helpers from trunk.

Brice



I applied to the svn trunk and can now PASS "make check" on my odd 
virtual node.

Thanks, Brice.

-Paul

--
Paul H. Hargrove  phhargr...@lbl.gov
Future Technologies Group
HPC Research Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900



Re: [OMPI devel] non-portable test operator in configure

2012-02-20 Thread Paul H. Hargrove

Argh!!
I am now trying to track down "expr: Paren problem" on Solaris.
The dash shell on Linux doesn't reproduce this one, unfortunately.

-Paul

On 2/20/2012 1:12 PM, Paul H. Hargrove wrote:
I'll report back ASAP on my slowlaris10 results. 


--
Paul H. Hargrove  phhargr...@lbl.gov
Future Technologies Group
HPC Research Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900



Re: [OMPI devel] non-portable test operator in configure

2012-02-20 Thread Paul H. Hargrove
Not what I had expected to find, but a pretty simple fix (missing line 
continuation):


Index: orte/mca/ess/alps/configure.m4
===
--- orte/mca/ess/alps/configure.m4  (revision 25970)
+++ orte/mca/ess/alps/configure.m4  (working copy)
@@ -53,7 +53,7 @@
 [orte_mca_ess_alps_happy="yes"],
 [orte_mca_ess_alps_happy="no"])

-AS_IF([test "$orte_mca_ess_alps_happy" = "yes" -a "$orte_without_full_support" = 0 -a
+AS_IF([test "$orte_mca_ess_alps_happy" = "yes" -a "$orte_without_full_support" = 0 -a \
 "$orte_mca_ess_alps_have_cnos" = 1],
   [$1],
   [$2])


That is sufficient to let "dash" on an Ubuntu system make it through 
configure.

I'll report back ASAP on my slowlaris10 results.

NOTE: this is NOT present in the v1.5 branch (no cmr is required).

-Paul

On 2/20/2012 12:46 PM, Jeffrey Squyres wrote:

Ah, ok.

On Feb 20, 2012, at 3:45 PM, Paul H. Hargrove wrote:


Jeff,

The one in config/ompi_load_platform.m4 was on my original hit-list.
Getting PAST that one shows a new problem that appears NOT to be a "==".
The autoconf manual warns about use of "-a" and "-o" together with variables 
that may expand to the empty string, and I suspect that is the new problem I am hitting.   I hope 
to know soon.

-Paul


On 2/20/2012 12:41 PM, Jeffrey Squyres wrote:

grep == configure | grep test

only shows one more.  I found it in config/ompi_load_platform.m4 and fixed it 
on the trunk.


On Feb 20, 2012, at 3:38 PM, Paul H. Hargrove wrote:


I am afraid that with the $with_platform instance fixed, configure on Solaris 
10 gets far enough to find another problem.
I'll provide a patch once I've tracked this down. Sigh.

FYI:
One can root out bashisms by using the "dash" shell on a Debian or Ubuntu 
system:
$ env CONFIG_SHELL=dash dash [path_to]/configure [options]

-Paul



On 2/20/2012 5:42 AM, Jeffrey Squyres wrote:

Fixed -- thanks!

On Feb 20, 2012, at 4:11 AM, Paul H. Hargrove wrote:


Please note that "==" is NOT a portable binary operator for the "test" utility.
It is supported only by the bash built-in version of "test".
The correct operator is a simple "=".

The following appear in the svn trunk

./orte/config/orte_check_alps.m4:   AS_IF([test "$orte_check_alps_pmi_happy" == 
"yes" -a "$orte_without_full_support" = 0],
./config/ompi_load_platform.m4:if test "$with_platform" == "" ; then

The $with_platform test breaks configure fairly early on at least Solaris 10.

-Paul

--
Paul H. Hargrove  phhargr...@lbl.gov
Future Technologies Group
HPC Research Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900

--
Paul H. Hargrove  phhargr...@lbl.gov
Future Technologies Group
HPC Research Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900



Re: [OMPI devel] non-portable test operator in configure

2012-02-20 Thread Paul H. Hargrove

Jeff,

The one in config/ompi_load_platform.m4 was on my original hit-list.
Getting PAST that one shows a new problem that appears NOT to be a "==".
The autoconf manual warns about use of "-a" and "-o" together with 
variables that may expand to the empty string, and I suspect that is the 
new problem I am hitting.   I hope to know soon.
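
For illustration, the form the autoconf manual suggests in place of "-a" 
is a chain of separate test commands joined with "&&".  A minimal 
sketch follows; alps_enabled and its arguments are shortened 
placeholders, not the real configure variables.

```shell
#!/bin/sh
# alps_enabled HAPPY FULL_SUPPORT HAVE_CNOS
# The arguments are placeholders standing in for configure variables,
# any of which might expand to the empty string.
alps_enabled() {
    # Instead of:  test "$1" = yes -a "$2" = 0 -a "$3" = 1
    # run one test per comparison and chain them with &&.
    if test "$1" = yes && test "$2" = 0 && test "$3" = 1; then
        echo enabled
    else
        echo disabled
    fi
}

alps_enabled yes 0 1
alps_enabled yes 0 ""
```

Each test sees only one comparison, so an empty operand cannot shift the 
meaning of the expression as a whole.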


-Paul


On 2/20/2012 12:41 PM, Jeffrey Squyres wrote:

grep == configure | grep test

only shows one more.  I found it in config/ompi_load_platform.m4 and fixed it 
on the trunk.


On Feb 20, 2012, at 3:38 PM, Paul H. Hargrove wrote:


I am afraid that with the $with_platform instance fixed, configure on Solaris 
10 gets far enough to find another problem.
I'll provide a patch once I've tracked this down. Sigh.

FYI:
One can root out bashisms by using the "dash" shell on a Debian or Ubuntu 
system:
$ env CONFIG_SHELL=dash dash [path_to]/configure [options]

-Paul



On 2/20/2012 5:42 AM, Jeffrey Squyres wrote:

Fixed -- thanks!

On Feb 20, 2012, at 4:11 AM, Paul H. Hargrove wrote:


Please note that "==" is NOT a portable binary operator for the "test" utility.
It is supported only by the bash built-in version of "test".
The correct operator is a simple "=".

The following appear in the svn trunk

./orte/config/orte_check_alps.m4:   AS_IF([test "$orte_check_alps_pmi_happy" == 
"yes" -a "$orte_without_full_support" = 0],
./config/ompi_load_platform.m4:if test "$with_platform" == "" ; then

The $with_platform test breaks configure fairly early on at least Solaris 10.

-Paul

--
Paul H. Hargrove  phhargr...@lbl.gov
Future Technologies Group
HPC Research Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900

--
Paul H. Hargrove  phhargr...@lbl.gov
Future Technologies Group
HPC Research Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900



Re: [OMPI devel] non-portable test operator in configure

2012-02-20 Thread Paul H. Hargrove
I am afraid that with the $with_platform instance fixed, configure on 
Solaris 10 gets far enough to find another problem.

I'll provide a patch once I've tracked this down. Sigh.

FYI:
One can root out bashisms by using the "dash" shell on a Debian or 
Ubuntu system:

$ env CONFIG_SHELL=dash dash [path_to]/configure [options]
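
As a tiny illustration of the portable spelling, here is a sketch of an 
emptiness check written with a single "=".  The function name is_empty 
is hypothetical; saving this to a file and running it with dash (if 
installed) is one way to check it outside of bash.

```shell
#!/bin/sh
# Portable emptiness check: a single "=" is the string-equality
# operator that POSIX specifies for test.
is_empty() {
    if test "$1" = ""; then
        echo empty
    else
        echo set
    fi
}

is_empty ""
is_empty "some/platform/file"
```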

-Paul



On 2/20/2012 5:42 AM, Jeffrey Squyres wrote:

Fixed -- thanks!

On Feb 20, 2012, at 4:11 AM, Paul H. Hargrove wrote:


Please note that "==" is NOT a portable binary operator for the "test" utility.
It is supported only by the bash built-in version of "test".
The correct operator is a simple "=".

The following appear in the svn trunk

./orte/config/orte_check_alps.m4:   AS_IF([test "$orte_check_alps_pmi_happy" == 
"yes" -a "$orte_without_full_support" = 0],
./config/ompi_load_platform.m4:if test "$with_platform" == "" ; then

The $with_platform test breaks configure fairly early on at least Solaris 10.

-Paul

--
Paul H. Hargrove  phhargr...@lbl.gov
Future Technologies Group
HPC Research Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900

___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel


--
Paul H. Hargrove  phhargr...@lbl.gov
Future Technologies Group
HPC Research Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900



Re: [OMPI devel] [EXTERNAL] Re: trunk build failure on Altix [w/ WORK AROUND]

2012-02-20 Thread Paul H. Hargrove

Brian,

Thanks for looking into this.
I'll plan to take a look at the trunk tarball tonight and report back.

-Paul

On 2/20/2012 8:49 AM, Barrett, Brian W wrote:

Hi Paul -

Thanks for noticing this.  I guess we don't have many Altix developers.  I
think I've fixed it on the trunk with r25968, plus r25967 to make sure the
Altix component gets selected over the Linux component on Altix systems.
I don't have an Altix to test on; can you give it a go and let me know if
it works?  In the trunk right now, and should be in the trunk nightly
tarball tomorrow morning.

The problem cropped up when we started running the configure macros for
components that couldn't possibly succeed (which we needed to make
Automake happy in a couple of situations) sometime late in the 1.5 series.
  Before that, a component could never think it succeeded and then later be
told it didn't.  We added yet another macro to handle issues like this, so
it was a fairly easy fix.

Thanks,

Brian

On 2/17/12 4:26 PM, "Paul H. Hargrove"<phhargr...@lbl.gov>  wrote:




I've poked enough at the ompi configure magic to *think* I
understand the source of the problem I've seen w/ both trunk and
1.5.x on the Altix.

The problem appears to be that both timer/altix/configure.m4 and
timer/linux/configure.m4 are setting the value of
$timer_base_include and the LAST one "wins".  Meanwhile, only the
FIRST one is getting added to $static_components ("there can be only
one").  So, I suspect the difference I saw between trunk and 1.5 was
just a matter of which configure probe ran first.

The result of having FIRST and LAST "win" in different settings is a
mismatch.


$ grep -e timer:linux -e timer:altix
  configure.out
  --- MCA component timer:linux (m4 configuration macro, priority
  30)
  checking for MCA component timer:linux compile mode... static
  checking if MCA component timer:linux can compile... yes
  --- MCA component timer:altix (m4 configuration macro, priority
  30)
  checking for MCA component timer:altix compile mode... static
  checking if MCA component timer:altix can compile... no


which picks timer:linux and rejects timer:altix, as compared to:


$ grep -e '"MCA_opal_timer_[SD]' -e
  MCA_timer_ config.status
  S["MCA_opal_timer_DSO_SUBDIRS"]=""
  S["MCA_opal_timer_STATIC_SUBDIRS"]=" mca/timer/linux"

S["MCA_opal_timer_STATIC_LTLIBS"]="mca/timer/linux/libmca_timer_linux.la
  "
  S["MCA_opal_timer_DSO_COMPONENTS"]=""
  S["MCA_opal_timer_STATIC_COMPONENTS"]=" linux"
  D["MCA_timer_IMPLEMENTATION_HEADER"]="
  \"opal/mca/timer/altix/timer_altix.h\""


Which will build timer:linux but has improperly picked up the
timer:altix HEADER!


For the present, an explicit --with-timer=altix works around the
problem in either branch.
However, the setting of the header variable by a NON-selected
component is ERRONEOUS and should get fixed.
In trunk, it may also make sense to raise the priority of
timer:altix above that of timer:linux.

-Paul

On 2/15/2012 12:41 AM, Paul Hargrove wrote:

  I've configured the ompi trunk (nightly tarball 1.7a1r25927)
on an SGI Altix.
  I used no special arguments indicating that this is an Altix,
and there does not appear to be an altix-specific file in
contrib/platform.


  My build fails as follows:




make:
  Entering directory
`/mnt/home/c_phargrov/OMPI/openmpi-trunk-linux-ia64/BLD/opal/tools/wrapper
s'
CC opal_wrapper.o
CCLD   opal_wrapper
  ../../../opal/.libs/libopen-pal.so: undefined reference to
  `opal_timer_altix_mmdev_timer_addr'
  ../../../opal/.libs/libopen-pal.so: undefined reference to
  `opal_timer_altix_freq'
  collect2: ld returned 1 exit status






  The configure-generated opal_config.h contains
  #define MCA_timer_IMPLEMENTATION_HEADER
"opal/mca/timer/altix/timer_altix.h"


  Nothing appears to have been built in
BUILDDIR/opal/mca/timer/altix.
  However, BUILDDIR/opal/mca/timer/linux has been built.


  -Paul


--
Paul H. Hargrove  phhargr...@lbl.gov
Future Technologies Group
HPC Research Department   Tel: +1-510-495-2352
<tel:%2B1-510-495-2352>
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
<tel:%2B1-510-486-6900>




-- 
Paul H. Hargrove  phhargr...@lbl.gov

Future Technologies Group
HPC Research Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900




Re: [OMPI devel] Fwd: [OMPI users] WEXITSTATUS: OpenMPI 1.5.5 RC1 doesn't build on NetBSD (and perhaps other BSDs)

2012-02-20 Thread Paul H. Hargrove

That was on my list of defects back in December.
This change is ALREADY present in the v1.5 branch in svn.

-Paul

On 2/20/2012 8:48 AM, Jeffrey Squyres wrote:

VT guys --

Can you have a look at this?  I don't know if <sys/wait.h> needs to be 
protected or not, but it looks like it's needed.

Begin forwarded message:


From: Aleksej Saushev<a...@inbox.ru>
Subject: [OMPI users] WEXITSTATUS: OpenMPI 1.5.5 RC1 doesn't build on NetBSD 
(and perhaps other BSDs)
Date: February 18, 2012 3:11:49 PM EST
To: us...@open-mpi.org
Reply-To: a...@inbox.ru, Open MPI Users<us...@open-mpi.org>

  Hello!

WEXITSTATUS is defined in <sys/wait.h>, see the patch attached.

(Sorry, I couldn't find simple mail interface for bug reports.)




--
HE CE3OH...
___
users mailing list
us...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users




___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel


--
Paul H. Hargrove  phhargr...@lbl.gov
Future Technologies Group
HPC Research Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900



[OMPI devel] non-portable test operator in configure

2012-02-20 Thread Paul H. Hargrove
Please note that "==" is NOT a portable binary operator for the "test" 
utility.

It is supported only by the bash built-in version of "test".
The correct operator is a simple "=".

The following appear in the svn trunk

./orte/config/orte_check_alps.m4:   AS_IF([test 
"$orte_check_alps_pmi_happy" == "yes" -a "$orte_without_full_support" = 0],

./config/ompi_load_platform.m4:if test "$with_platform" == "" ; then

The $with_platform test breaks configure fairly early on at least 
Solaris 10.
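
A minimal sketch of the portable spelling, for illustration only. The helper names platform_requested and check_both are hypothetical and not part of the Open MPI tree. The first shows "=" in place of "==". The second shows two plain tests joined with "&&", which also sidesteps the "-a" operator that POSIX marks as obsolescent.

```shell
# Hypothetical helper, not from the Open MPI tree.
# Prints "yes" when a platform file was requested, "no" otherwise.
platform_requested() {
    # Portable: "=" is the string-equality operator defined by POSIX.
    # Writing "==" here works only in shells whose built-in test
    # happens to accept it (bash, for example).
    if test "$1" = "" ; then
        echo no
    else
        echo yes
    fi
}

# Hypothetical helper: succeeds only when $1 is "yes" and $2 is 0,
# using two test commands instead of a single test with -a.
check_both() {
    test "$1" = "yes" && test "$2" = 0
}

platform_requested ""             # prints "no"
platform_requested "my_platform"  # prints "yes"
```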


-Paul

--
Paul H. Hargrove  phhargr...@lbl.gov
Future Technologies Group
HPC Research Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900



Re: [OMPI devel] trunk build failure on OpenBSD-5.0 [SOLVED/PATCH]

2012-02-20 Thread Paul H. Hargrove
The attachment adds the necessary (cached) check for aio.h before 
enabling fbtl:posix.

-Paul

On 2/17/2012 12:55 AM, Paul Hargrove wrote:

OpenBSD lacks an aio.h header.
configure knows this:

$ grep aio.h configure.log
checking aio.h usability... no
checking aio.h presence... no
checking for aio.h... no


Yet fbtl/posix is enabled, despite needing aio.h:

checking if MCA component fbtl:posix can compile... yes


I am guessing this problem will appear on any platform w/o aio.h.

I think it is just a simple matter of requiring OPAL_HAVE_AIO_H when 
"checking if component fbtl:posix can compile".


-Paul

--
Paul H. Hargrove phhargr...@lbl.gov <mailto:phhargr...@lbl.gov>
Future Technologies Group
HPC Research Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900


--
Paul H. Hargrove  phhargr...@lbl.gov
Future Technologies Group
HPC Research Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900

Index: ompi/mca/fbtl/posix/configure.m4
===
--- ompi/mca/fbtl/posix/configure.m4	(revision 0)
+++ ompi/mca/fbtl/posix/configure.m4	(revision 0)
@@ -0,0 +1,34 @@
+# -*- shell-script -*-
+#
+# Copyright (c) 2004-2005 The Trustees of Indiana University and Indiana
+# University Research and Technology
+# Corporation.  All rights reserved.
+# Copyright (c) 2004-2005 The University of Tennessee and The University
+# of Tennessee Research Foundation.  All rights
+# reserved.
+# Copyright (c) 2004-2005 High Performance Computing Center Stuttgart, 
+# University of Stuttgart.  All rights reserved.
+# Copyright (c) 2004-2012 The Regents of the University of California.
+# All rights reserved.
+# Copyright (c) 2010  Cisco Systems, Inc.  All rights reserved.
+# Copyright (c) 2008-2011 University of Houston. All rights reserved.
+# $COPYRIGHT$
+# 
+# Additional copyrights may follow
+# 
+# $HEADER$
+#
+
+
+# MCA_fbtl_posix_CONFIG(action-if-can-compile, 
+#[action-if-cant-compile])
+# 
+AC_DEFUN([MCA_ompi_fbtl_posix_CONFIG],[
+AC_CHECK_HEADER([aio.h],
+[fbtl_posix_happy="yes"],
+[fbtl_posix_happy="no"])
+
+AS_IF([test "$fbtl_posix_happy" = "yes"],
+  [$1],
+  [$2])
+])dnl
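
For readers without Autoconf at hand, the effect of the AC_CHECK_HEADER gate in the patch above can be approximated in plain shell. This is a rough sketch under stated assumptions: have_header is a hypothetical stand-in, it assumes a C compiler reachable as $CC or cc, and it omits things the real macro does, such as caching the result and honouring CPPFLAGS.

```shell
# Hypothetical stand-in for AC_CHECK_HEADER: prints "yes" if the C compiler
# accepts a file that includes the named header, "no" otherwise.
have_header() {
    hh_dir=$(mktemp -d) || { echo no; return; }
    printf '#include <%s>\nint main(void) { return 0; }\n' "$1" \
        > "$hh_dir/conftest.c"
    if ${CC:-cc} -c "$hh_dir/conftest.c" -o "$hh_dir/conftest.o" \
        >/dev/null 2>&1; then
        echo yes
    else
        echo no
    fi
    rm -rf "$hh_dir"
}

# Same gating as in the patch: the component is enabled only if aio.h exists.
fbtl_posix_happy=$(have_header aio.h)
if test "$fbtl_posix_happy" = "yes"; then
    echo "fbtl:posix can be built"
else
    echo "fbtl:posix must be skipped"
fi
```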


[OMPI devel] flex warning from flex-2.5.4

2012-02-19 Thread Paul H. Hargrove
I've not checked any other systems, but building the trunk on OpenBSD 
and FreeBSD (w/ flex-2.5.4) I see the following:

  LEX    show_help_lex.c
"[srcdir]/opal/util/show_help_lex.l", line 65: warning, dangerous 
trailing context


I found this message in the flex documentation, and it mentions that the 
POSIX draft for LEX leaves such cases undefined.

http://flex.sourceforge.net/manual/Limitations.html

-Paul

--
Paul H. Hargrove  phhargr...@lbl.gov
Future Technologies Group
HPC Research Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900



Re: [hwloc-devel] excessive warnings from xlc w/ hwloc-trunk

2012-02-19 Thread Paul H. Hargrove



On 2/19/2012 12:45 PM, Samuel Thibault wrote:
[snip]

Ah, right, it's an inline, so we need to declare it first with the
attribute, and then define it:

static __hwloc_inline const char *
hwloc_obj_get_info_by_name(hwloc_obj_t obj, const char *name)  
__hwloc_attribute_pure;
static __hwloc_inline const char *
hwloc_obj_get_info_by_name(hwloc_obj_t obj, const char *name)
{
...
}

does it work that way?

Samuel


Yes.  That worked!
-Paul

--
Paul H. Hargrove  phhargr...@lbl.gov
Future Technologies Group
HPC Research Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900



Re: [hwloc-devel] excessive warnings from xlc w/ hwloc-trunk

2012-02-19 Thread Paul H. Hargrove



On 2/19/2012 10:54 AM, Samuel Thibault wrote:
[snip]

Does it still complain if using the following?

static __hwloc_inline const char *
  hwloc_obj_get_info_by_name(hwloc_obj_t obj, const char *name) 
__hwloc_attribute_pure

That'd be safer to make sure that the attribute is applied to the
function, not something else.

[snip]

I should have mentioned that I had tried Samuel's suggested form first.
Yes, it complains, but worse, it considers this form to be a syntax error 
rather than just warning about it:

  CC topology.lo
"/users/phh1/hwloc-trunk/include/hwloc.h", line 1247.1: 1506-277 (S) 
Syntax error: possible missing ';' or ','?

make[1]: *** [topology.lo] Error 1


So, we are safer sticking with the current form and ignoring the warning.

-Paul

--
Paul H. Hargrove  phhargr...@lbl.gov
Future Technologies Group
HPC Research Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900



[hwloc-devel] excessive warnings from xlc w/ hwloc-trunk

2012-02-17 Thread Paul H. Hargrove
The patch below is required to keep xlc-11.1 on Linux/ppc64 from 
issuing lots of warnings:


"[srcdir]/hwloc-trunk/include/hwloc.h", line 1245.34: 1506-1385 (W) The 
attribute "pure" is not a valid type attribute.


The problem appears to be that when the function returns a pointer type, 
XLC thinks the attribute is being applied to the return type rather than 
the function.  That is why no other instances of __hwloc_attribute_pure 
need to be reordered.


I am not sure of the risk/reward on applying this change, however.
Gcc seems to be happy enough either way as far as I could tell.

-Paul

--- hwloc-1.5a1r4308/include/hwloc.h~   2012-02-17 17:45:45.0 -0600
+++ hwloc-1.5a1r4308/include/hwloc.h    2012-02-17 17:52:20.0 -0600
@@ -1242,7 +1242,7 @@
  *
  * \return \c NULL if no such key exists.
  */
-static __hwloc_inline const char * __hwloc_attribute_pure
+static __hwloc_inline __hwloc_attribute_pure const char *
 hwloc_obj_get_info_by_name(hwloc_obj_t obj, const char *name)
 {
   unsigned i;

--
Paul H. Hargrove  phhargr...@lbl.gov
Future Technologies Group
HPC Research Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900



Re: [OMPI devel] trunk build failure on Altix [w/ WORK AROUND]

2012-02-17 Thread Paul H. Hargrove
I've poked enough at the ompi configure magic to *think* I understand 
the source of the problem I've seen w/ both trunk and 1.5.x on the Altix.


The problem appears to be that both timer/altix/configure.m4 and 
timer/linux/configure.m4 are setting the value of $timer_base_include 
and the LAST one "wins".  Meanwhile, only the FIRST one is getting added 
to $static_components ("there can be only one").  So, I suspect the 
difference I saw between trunk and 1.5 was just a matter of which 
configure probe ran first.


The result of having FIRST and LAST "win" in different settings is a 
mismatch.



$ grep -e timer:linux -e timer:altix configure.out
--- MCA component timer:linux (m4 configuration macro, priority 30)
checking for MCA component timer:linux compile mode... static
checking if MCA component timer:linux can compile... yes
--- MCA component timer:altix (m4 configuration macro, priority 30)
checking for MCA component timer:altix compile mode... static
checking if MCA component timer:altix can compile... no


which picks timer:linux and rejects timer:altix, as compared to:


$ grep -e '"MCA_opal_timer_[SD]' -e MCA_timer_ config.status
S["MCA_opal_timer_DSO_SUBDIRS"]=""
S["MCA_opal_timer_STATIC_SUBDIRS"]=" mca/timer/linux"
S["MCA_opal_timer_STATIC_LTLIBS"]="mca/timer/linux/libmca_timer_linux.la "
S["MCA_opal_timer_DSO_COMPONENTS"]=""
S["MCA_opal_timer_STATIC_COMPONENTS"]=" linux"
D["MCA_timer_IMPLEMENTATION_HEADER"]=" 
\"opal/mca/timer/altix/timer_altix.h\""


Which will build timer:linux but has improperly picked up the 
timer:altix HEADER!
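
The mismatch can be reproduced with a toy model in plain shell. This is a sketch of the behaviour described above, not the actual MCA configure machinery: only the variable names timer_base_include and static_components come from the real probes, while probe and probe_fixed are hypothetical. The repair shown is one possible approach and not necessarily what the eventual fix does.

```shell
# Toy model of the mismatch; probe and probe_fixed are hypothetical names.
timer_base_include=
static_components=

# probe NAME CAN_COMPILE: mimics one component's configure macro.
probe() {
    # Behaviour being modeled: the header is recorded unconditionally,
    # so the LAST probe to run wins.
    timer_base_include="opal/mca/timer/$1/timer_$1.h"
    # Only the FIRST component that can compile is selected.
    if test "$2" = "yes" && test -z "$static_components"; then
        static_components="$1"
    fi
}

# One possible repair: record the header only for the selected component.
probe_fixed() {
    if test "$2" = "yes" && test -z "$static_components"; then
        static_components="$1"
        timer_base_include="opal/mca/timer/$1/timer_$1.h"
    fi
}

probe linux yes
probe altix no
echo "component: $static_components"   # prints "component: linux"
echo "header: $timer_base_include"     # prints "header: opal/mca/timer/altix/timer_altix.h"
```

Resetting both variables and running probe_fixed in the same order leaves the component and the header in agreement.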



For the present, an explicit --with-timer=altix works around the problem 
in either branch.
However, the setting of the header variable by a NON-selected component 
is ERRONEOUS and should get fixed.
In trunk, it may also make sense to raise the priority of timer:altix 
above that of timer:linux.


-Paul

On 2/15/2012 12:41 AM, Paul Hargrove wrote:
I've configured the ompi trunk (nightly tarball 1.7a1r25927) on an SGI 
Altix.
I used no special arguments indicating that this is an Altix, and 
there does not appear to be an altix-specific file in contrib/platform.


My build fails as follows:

make: Entering directory
`/mnt/home/c_phargrov/OMPI/openmpi-trunk-linux-ia64/BLD/opal/tools/wrappers'
  CC opal_wrapper.o
  CCLD   opal_wrapper
../../../opal/.libs/libopen-pal.so: undefined reference to
`opal_timer_altix_mmdev_timer_addr'
../../../opal/.libs/libopen-pal.so: undefined reference to
`opal_timer_altix_freq'
collect2: ld returned 1 exit status



The configure-generated opal_config.h contains
#define MCA_timer_IMPLEMENTATION_HEADER 
"opal/mca/timer/altix/timer_altix.h"


Nothing appears to have been built in BUILDDIR/opal/mca/timer/altix.
However, BUILDDIR/opal/mca/timer/linux has been built.

-Paul

--
Paul H. Hargrove phhargr...@lbl.gov <mailto:phhargr...@lbl.gov>
Future Technologies Group
HPC Research Department   Tel: +1-510-495-2352 
<tel:%2B1-510-495-2352>
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900 
<tel:%2B1-510-486-6900>


--
Paul H. Hargrove  phhargr...@lbl.gov
Future Technologies Group
HPC Research Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900



[OMPI devel] excessive warnings on some BSDs [w/ PATCH]

2012-02-17 Thread Paul H. Hargrove
When building trunk or 1.5.x on OpenBSD-5.0 (and maybe others), I get 
*LOTS* of the following:
/usr/include/arpa/inet.h:74: warning: 'struct in_addr' declared inside 
parameter list
/usr/include/arpa/inet.h:74: warning: its scope is only this 
definition or declaration, which is probably not what you want
/usr/include/arpa/inet.h:75: warning: 'struct in_addr' declared inside 
parameter list


This is trivial to fix by including netinet/in.h before arpa/inet.h (see 
attached patch).
The patch applies cleanly to both the trunk and the 1.5 branch (perhaps 
to hold back until 1.6)


-Paul

--
Paul H. Hargrove  phhargr...@lbl.gov
Future Technologies Group
HPC Research Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900

--- openmpi-1.7a1r25944/opal/include/opal/types.h~	Fri Feb 17 12:01:41 2012
+++ openmpi-1.7a1r25944/opal/include/opal/types.h	Fri Feb 17 11:58:46 2012
@@ -33,6 +33,9 @@
 #ifdef HAVE_SYS_SELECT_H
 #include <sys/select.h>
 #endif
+#ifdef HAVE_NETINET_IN_H
+#include <netinet/in.h>
+#endif
 #ifdef HAVE_ARPA_INET_H
 #include <arpa/inet.h>
 #endif
--- openmpi-1.7a1r25944/orte/mca/rml/oob/rml_oob_component.c~	Fri Feb 17 12:11:58 2012
+++ openmpi-1.7a1r25944/orte/mca/rml/oob/rml_oob_component.c	Fri Feb 17 12:12:08 2012
@@ -23,6 +23,9 @@
 #include "orte_config.h"
 #include "orte/constants.h"

+#ifdef HAVE_NETINET_IN_H
+#include <netinet/in.h>
+#endif
 #ifdef HAVE_ARPA_INET_H
 #include <arpa/inet.h>
 #endif
--- openmpi-1.7a1r25944/ompi/mca/btl/tcp/btl_tcp_proc.c~	Fri Feb 17 12:11:06 2012
+++ openmpi-1.7a1r25944/ompi/mca/btl/tcp/btl_tcp_proc.c	Fri Feb 17 12:11:21 2012
@@ -19,11 +19,11 @@

 #include "ompi_config.h"

-#ifdef HAVE_ARPA_INET_H
-#include <arpa/inet.h>
-#endif
 #ifdef HAVE_NETINET_IN_H
 #include <netinet/in.h>
+#endif
+#ifdef HAVE_ARPA_INET_H
+#include <arpa/inet.h>
 #endif

 #include "opal/class/opal_hash_table.h"


[OMPI devel] OPAL_ENABLE_FT_CR build broken in 1.5 branch

2012-02-17 Thread Paul H. Hargrove
I've tried to build from both the 1.5 and trunk nightly tarballs 
configured with "--enable-ft=cr --with-blcr=" .

I am using Intel compilers on Linux/x86.

The trunk was fine, but on the 1.5 branch I see the build fail with:

Making all in mca/btl/sm
make[2]: Entering directory 
`/home/pcp1/phargrov/OMPI/openmpi-1.5-latest-linux-x86-gm2-icc-8.1/BLD/ompi/mca/btl/sm'

  CC mca_btl_sm_la-btl_sm.lo
/home/pcp1/phargrov/OMPI/openmpi-1.5-latest-linux-x86-gm2-icc-8.1//openmpi-1.5-latest/ompi/mca/btl/sm/btl_sm.c(1107): 
error: struct "mca_btl_sm_component_t" has no field "mmap_file"

  if( NULL != mca_btl_sm_component.mmap_file ) {
   ^

/home/pcp1/phargrov/OMPI/openmpi-1.5-latest-linux-x86-gm2-icc-8.1//openmpi-1.5-latest/ompi/mca/btl/sm/btl_sm.c(1113): 
error: struct "mca_btl_sm_component_t" has no field "mmap_file"
  opal_crs_base_metadata_write_token(NULL, 
CRS_METADATA_TOUCH, mca_btl_sm_component.mmap_file->map_path);

^


/home/pcp1/phargrov/OMPI/openmpi-1.5-latest-linux-x86-gm2-icc-8.1//openmpi-1.5-latest/ompi/mca/btl/sm/btl_sm.c(1121): 
error: struct "mca_btl_sm_component_t" has no field "mmap_file"

  if( NULL != mca_btl_sm_component.mmap_file ) {
   ^

/home/pcp1/phargrov/OMPI/openmpi-1.5-latest-linux-x86-gm2-icc-8.1//openmpi-1.5-latest/ompi/mca/btl/sm/btl_sm.c(1125): 
error: struct "mca_btl_sm_component_t" has no field "mmap_file"
  
opal_crs_base_cleanup_append(mca_btl_sm_component.mmap_file->map_path, 
false);

^

/home/pcp1/phargrov/OMPI/openmpi-1.5-latest-linux-x86-gm2-icc-8.1//openmpi-1.5-latest/ompi/mca/btl/sm/btl_sm.c(1134): 
error: struct "mca_btl_sm_component_t" has no field "mmap_file"

  if( NULL != mca_btl_sm_component.mmap_file ) {
   ^

/home/pcp1/phargrov/OMPI/openmpi-1.5-latest-linux-x86-gm2-icc-8.1//openmpi-1.5-latest/ompi/mca/btl/sm/btl_sm.c(1144): 
error: struct "mca_btl_sm_component_t" has no field "mmap_file"
  
opal_crs_base_cleanup_append(mca_btl_sm_component.mmap_file->map_path, 
false);

^

compilation aborted for 
/home/pcp1/phargrov/OMPI/openmpi-1.5-latest-linux-x86-gm2-icc-8.1//openmpi-1.5-latest/ompi/mca/btl/sm/btl_sm.c 
(code 2)


Pushing past that error with "make -k" yields a similar problem in 
mpool/sm as well:



Making all in mca/mpool/sm
make[2]: Entering directory 
`/home/pcp1/phargrov/OMPI/openmpi-1.5-latest-linux-x86-gm2-icc-8.1/BLD/ompi/mca/mpool/sm'

  CC mpool_sm_module.lo
/home/pcp1/phargrov/OMPI/openmpi-1.5-latest-linux-x86-gm2-icc-8.1//openmpi-1.5-latest/ompi/mca/mpool/sm/mpool_sm_module.c(146): 
error: struct "mca_mpool_sm_module_t" has no field "sm_common_mmap"

  unlink(sm_module->sm_common_mmap->map_path);
^

/home/pcp1/phargrov/OMPI/openmpi-1.5-latest-linux-x86-gm2-icc-8.1//openmpi-1.5-latest/ompi/mca/mpool/sm/mpool_sm_module.c(183): 
error: struct "mca_mpool_sm_module_t" has no field "sm_common_mmap"

  if (NULL != self_sm_module->sm_common_mmap) {
  ^

/home/pcp1/phargrov/OMPI/openmpi-1.5-latest-linux-x86-gm2-icc-8.1//openmpi-1.5-latest/ompi/mca/mpool/sm/mpool_sm_module.c(184): 
error: struct "mca_mpool_sm_module_t" has no field "sm_common_mmap"
  
opal_crs_base_cleanup_append(self_sm_module->sm_common_mmap->map_path, 
false);

   ^

/home/pcp1/phargrov/OMPI/openmpi-1.5-latest-linux-x86-gm2-icc-8.1//openmpi-1.5-latest/ompi/mca/mpool/sm/mpool_sm_module.c(198): 
error: struct "mca_mpool_sm_module_t" has no field "sm_common_mmap"

  if (NULL != self_sm_module->sm_common_mmap) {
  ^

/home/pcp1/phargrov/OMPI/openmpi-1.5-latest-linux-x86-gm2-icc-8.1//openmpi-1.5-latest/ompi/mca/mpool/sm/mpool_sm_module.c(199): 
error: struct "mca_mpool_sm_module_t" has no field "sm_common_mmap"
  
opal_crs_base_cleanup_append(self_sm_module->sm_common_mmap->map_path, 
false);

   ^

compilation aborted for 
/home/pcp1/phargrov/OMPI/openmpi-1.5-latest-linux-x86-gm2-icc-8.1//openmpi-1.5-latest/ompi/mca/mpool/sm/mpool_sm_module.c 
(code 2)



-Paul

--
Paul H. Hargrove  phhargr...@lbl.gov
Future Technologies Group
HPC Research Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900



[OMPI devel] btl/gm build broken on trunk

2012-02-16 Thread Paul H. Hargrove
I just tried to build yesterday's ompi trunk tarball (1.7a1r25937) with 
the Intel compilers.

Sorry if this was fixed in the past 23 hours or so.


My system has GM-2.1.30 installed, and icc wasn't happy with 
btl_gm_component.c:
/home/pcp1/phargrov/OMPI/openmpi-trunk-linux-x86-gm2-icc-8.1//openmpi-trunk/ompi/mca/btl/gm/btl_gm_component.c(606): 
error #165: too few arguments in function call
  btl->error_cb(>super, 
MCA_BTL_ERROR_FLAGS_FATAL);
  
^


/home/pcp1/phargrov/OMPI/openmpi-trunk-linux-x86-gm2-icc-8.1//openmpi-trunk/ompi/mca/btl/gm/btl_gm_component.c(632): 
error #165: too few arguments in function call
  btl->error_cb(>super, 
MCA_BTL_ERROR_FLAGS_FATAL);
  
^




Usage of btl->error_cb() appears correct on the 1.5 branch (just a 
visual inspection).


-Paul

--
Paul H. Hargrove  phhargr...@lbl.gov
Future Technologies Group
HPC Research Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900



[OMPI devel] -fvisibility probe broken [w/ 3-line PATCH]

2012-02-16 Thread Paul H. Hargrove


After seeing some odd behaviors in a build of the trunk last night, I 
took a closer look at the configure probe for -fvisibility support and 
found that recent changes were applied incompletely/incorrectly.  What 
is currently in opal/config/opal_check_visibility.m4:


AC_LINK_IFELSE([AC_LANG_PROGRAM([[
__attribute__((visibility("default"))) int foo;
]],[[fprintf(stderr, "Hello, world\n");]])],
[],
[AS_IF([test -s conftest.err],
   [$GREP -iq visibility conftest.err
# If we find "visibility" in the stderr, then
# assume it doesn't work
AS_IF([test "$?" = "0"], [opal_add=])])
])

Here is a dissection of the args to AC_LINK_IFELSE:
arg1: AC_LANG_PROGRAM
arg2: action-on-success is EMPTY
arg3: action-on-failure is an AS_IF

Unfortunately that is incorrect in 3 ways:

Error #1:
The AS_IF belongs as arg2 (where there is an empty "[]" now).
That is because the intent of that logic is to "double check" a 
successful link to check the stderr for "visibility".
The idea there is that warnings like "unknown attribute visibility 
ignored" will be treated as a fail.

That was the case in the "original" (r22138) version of the logic as well.

However, it appears that this logic has been broken since r23923 when 
Jeff recoded AC_TRY_LINK to AC_LINK_IFELSE in Oct 2010.
Those changes failed to account for the fact that LINK_IFELSE takes 1 
argument for the PROGRAM where TRY_LINK has separate INCLUDE and BODY 
arguments.  That led to the unintended movement of the 
AS_IF[...grep...] logic from the on-success to the on-failure slots, and 
nothing has been the same since.


Error #2:
action-on-failure should be another instance of "[opal_add=]", to avoid 
using visibility if the link test failed.
This had survived r23923 as an "extra" 4th argument to AC_LINK_IFELSE, 
and was later removed.
This error leads to use of -fvisibility on compilers that totally failed 
to compile or link the test!


Error #3:
Missing include of stdio.h leads some compilers to fail the test 
unnecessarily.
Unlike the other 2 problems, this leads to REJECTING visibility even 
though it is working (except that error #2 currently hides this).


These problems, which I previously detailed in off-list emails to Jeff, 
apparently got lost in the rush to get hwloc-1.3.2 out the door.

Here is the relatively simple correction:

--- ompi-trunk/opal/config/opal_check_visibility.m4 (revision 25941)
+++ ompi-trunk/opal/config/opal_check_visibility.m4 (working copy)
@@ -56,15 +56,15 @@

 AC_MSG_CHECKING([if $CC supports $opal_add])
 AC_LINK_IFELSE([AC_LANG_PROGRAM([[
+#include <stdio.h>
 __attribute__((visibility("default"))) int foo;
 ]],[[fprintf(stderr, "Hello, world\n");]])],
-[],
 [AS_IF([test -s conftest.err],
[$GREP -iq visibility conftest.err
 # If we find "visibility" in the stderr, then
 # assume it doesn't work
 AS_IF([test "$?" = "0"], [opal_add=])])
-])
+], [opal_add=])
 AS_IF([test "$opal_add" = ""],
   [AC_MSG_RESULT([no])],
   [AC_MSG_RESULT([yes])])
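
The decision the corrected probe is meant to make can be restated in plain shell. This is a sketch only: decide_opal_add is a hypothetical function, and the flag value -fvisibility=hidden in the usage lines is an assumption about what $opal_add holds. The three outcomes mirror the three cases discussed above: link failure, link success with "visibility" in stderr, and clean link success.

```shell
# Hypothetical restatement of the intended probe logic.
# decide_opal_add LINK_STATUS ERR_FILE FLAG
# Prints FLAG if it should be used, or an empty line if it must be dropped.
decide_opal_add() {
    da_status=$1
    da_err=$2
    da_flag=$3
    if test "$da_status" -ne 0; then
        # Link failed outright: never use the flag.
        da_flag=
    elif test -s "$da_err" && grep -iq visibility "$da_err"; then
        # Link succeeded, but the compiler complained about visibility.
        da_flag=
    fi
    printf '%s\n' "$da_flag"
}

# Usage sketch with a fabricated stderr capture.
probe_dir=$(mktemp -d)
: > "$probe_dir/clean.err"
echo 'warning: attribute visibility ignored' > "$probe_dir/warn.err"
decide_opal_add 0 "$probe_dir/clean.err" -fvisibility=hidden  # prints the flag
decide_opal_add 0 "$probe_dir/warn.err"  -fvisibility=hidden  # prints an empty line
decide_opal_add 1 "$probe_dir/clean.err" -fvisibility=hidden  # prints an empty line
rm -rf "$probe_dir"
```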


Just to confuse things, the instance of OPAL_CHECK_VISIBILITY in the 
libevent2013 configure is getting the result right "by accident".
In that case the CFLAGS give more verbose warnings and the LINK fails 
(due to missing stdio.h), while yielding "visibility" in the warning 
message:
conftest.c(87): remark #1418: external definition with no prior 
declaration

  __attribute__((visibility("default"))) int foo;



Unfortunately, the incorrect logic made it into hwloc-1.3.2.
So, I'd suggest fixing this in OMPI's embedded hwloc and hwloc's trunk also.

My sincere apologies for not having caught this in the hwloc-1.3.2 
testing where Jeff and I thought we had this issue fixed.
I don't know for sure how I missed re-testing the final cut, but can 
only guess that I left --disable-visibility in my testing scripts.


-Paul "thorough testing is a double-edged sword" Hargrove

--
Paul H. Hargrove  phhargr...@lbl.gov
Future Technologies Group
HPC Research Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900



[OMPI devel] More on OMPI on MacOS 10.4/ppc [WORK AROUND]

2012-02-16 Thread Paul H. Hargrove
As I already discovered (see 
http://www.open-mpi.org/community/lists/devel/2012/02/10444.php), MacOS 
10.4 is NOT listed as a supported platform any longer.  So, this message 
is really just for the archives.


From "man ld" on a MacOS 10.4 system (x86 or ppc):

   MACOSX_DEPLOYMENT_TARGET
          This is set to indicate the oldest Mac OS X version that that
          the output is to be used on.  When this is set to a release
          that is older than the current release features that are
          incompatible with that release will be disabled.  If a
          feature is seen in the input that can't be in the output due
          to this setting a warning is issued.  The current allowable
          values for this are 10.1, 10.2 10.3, and 10.4 with the
          default being 10.4 for the i386 architecture and 10.1 for
          all other architectures.


The last sentence of that seems like a good starting point for why the 
behaviors I see on ppc and x86 differ.

So, before configuring OMPI (tarball 1.7a1r25937 or 1.5.5rc2r25933) I did

$ export MACOSX_DEPLOYMENT_TARGET=10.4


And, everything worked!
Both branches had the previously described errors w/o this env var, but 
now both work fine.


So, anybody in need of OMPI on MacOS 10.4/ppc now has a work-around.

-Paul


--
Paul H. Hargrove  phhargr...@lbl.gov
Future Technologies Group
HPC Research Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900



Re: [OMPI devel] 1.5.5rc1 tested: MacOS/ppc (w/ 1 failure and a "CMR")

2012-02-15 Thread Paul H. Hargrove
Ah, I see that my problems on MacOS 10.4 are already a moot point, as my 
"option (c)" has already been implemented.


From README in 1.5 branch:

  - OS X (10.5, 10.6), 32 and 64 bit (x86_64), with gcc and Absoft
compilers (*)

and from trunk:

  - OS X (10.5, 10.6, 10.7), 32 and 64 bit (x86_64), with gcc
and Absoft compilers (*).


-Paul

On 2/15/2012 7:24 PM, Paul H. Hargrove wrote:

As a point for discussion, I am going to offer a simple solution:

c) Ignore this for 1.5.5 and raise the minimum MacOS version from 10.4 
to 10.5 for ompi 1.6.x and 1.7.x


Any strong opinions?

-Paul

On 2/15/2012 10:29 AM, Paul H. Hargrove wrote:
I wanted to note that MacOS 10.4 on *X86* has the same "WEAK" 
definitions in opal_config.h.

Yet it can build ompi-1.5.x with only a WARNING about duplicate symbols.
I just tried, and the test code Matthias sent worked too:

$ ./bin/mpicc pmpi_test.c
/usr/bin/ld: warning multiple definitions of symbol _MPI_Finalize
/var/tmp//cc7tP6zp.o definition of _MPI_Finalize in section 
(__TEXT,__text)
/Users/phhargrove/OMPI/openmpi-1.5.5rc1/INST/lib/libmpi.dylib(single 
module) definition of _MPI_Finalize


$ ./a.out
inside MPI_Finalize() wrapper

So, I don't think (A) is an appropriate solution.
I am also wondering if there is some compiler/linker flag we 
could/should pass to "fix" the PPC.



Going back to the 10.4/PPC I see now that despite the warnings, a 
working executable WAS generated:

$ ./a.out
inside MPI_Finalize() wrapper

So, I don't think we have managed to reproduce the source of the 
build problem.


-Paul

On 2/15/2012 9:25 AM, Matthias Jurenz wrote:

Thanks for testing, Paul.

I think we need an additional configure test which disables VT if

a) weak symbol support is disabled/not available

- or more precisely -

b) configuring on PPC/Mac10.4 and the used GNU compiler version is older
than or equal to 4.0.1

I prefer option b) because VT (i.e. PMPI) should also work without weak
symbol support (at least it does on my laptop with gcc 4.4.3 and
'--disable-weak-symbols'). On the other hand, in most cases the compiler
supports weak symbols, so option a) would also work, unless weak symbol
support is disabled by the configure option '--disable-weak-symbols'.
Jeff, what's your opinion?

Matthias

On Wednesday 15 February 2012 10:33:30 Paul Hargrove wrote:

See responses mixed in below.

On Wed, Feb 15, 2012 at 1:02 AM, Matthias Jurenz
<matthias.jur...@tu-dresden.de> wrote:

Unfortunately, we don't have access to a PPC system with MacOS 10.4 to
try to reproduce the error.

Not too surprising.  I'll see what I can do to help resolve the problem.



Paul, could you please check for the definition of the macro
OPAL_HAVE_WEAK_SYMBOLS in ompi_config.h?

ompi_config.h doesn't contain that macro.
However, opal_config.h shows no weak symbol support:
#define HWLOC_HAVE_ATTRIBUTE_WEAK_ALIAS 0
#define OPAL_HAVE_ATTRIBUTE_WEAK_ALIAS 0
#define OPAL_HAVE_WEAK_SYMBOLS 0


I assume that the ancient GNU compiler
on PPC/MacOS10.4 does not support weak-symbols which cause the 
multiply

definitions.

Does that mean I should simply not expect to get VT built there?


Furthermore, could you please try to build the following code to test
whether
the PMPI interface of Open MPI works in general?

#include <stdio.h>
#include <mpi.h>

int MPI_Finalize() {

  printf( "inside MPI_Finalize() wrapper\n" );
  return PMPI_Finalize();

}

int main(int argc, char** argv) {

   MPI_Init(&argc, &argv);
   MPI_Finalize();

}

I am assuming I am supposed to build that with VT disabled in my OMPI
build.
Doing so, I see that PMPI is apparently not working:
$ ./bin/mpicc pmpi_test.c
/usr/bin/ld: warning multiple definitions of symbol _MPI_Finalize
/var/tmp//ccHZvZ3B.o definition of _MPI_Finalize in section 
(__TEXT,__text)
/Users/phargrov/OMPI/openmpi-1.5.5rc1/INST/lib/libmpi.dylib(single 
module)

definition of _MPI_Finalize


Maybe the error occurs only if this code is in a shared library which depends
on the MPI library (as libvt-mpi does). Therefore, run the following:


$ gcc -fPIC -shared pmpi_test.c -I  -o 
libpmpi_test.dylib

-L  -lmpi
I assume this check might be redundant given that the previous one 
failed.

However, here it is anyway:
$ gcc -fPIC -shared pmpi_test.c -Iinclude -o libpmpi_test.dylib -Llib
powerpc-apple-darwin8-gcc-4.0.1: unrecognized option '-shared'
/usr/bin/ld: Undefined symbols:
_MPI_Init
_PMPI_Finalize
collect2: ld returned 1 exit status


-Paul


Thanks!

Matthias


On 12/14/2011 2:51 PM, Paul H. Hargrove wrote:
I've attempted to reproduce the failure reported below for MacOS 
10.4

for PPC on an X86-64 system.
First, I've realized that while I reported "make check" as the 
source

of the problem, it occurs at "make".

Regardless of that mistake in my reporting, I was unable to 
reproduce
the problem, making this a PPC-specific problem as far as I can 
tell.
Instead of 255 instances of "ld: multiple definitions of symbol 

Re: [OMPI devel] 1.5.5rc1 tested: MacOS/ppc (w/ 1 failure and a "CMR")

2012-02-15 Thread Paul H. Hargrove

As a point for discussion, I am going to offer a simple solution:

c) Ignore this for 1.5.5 and raise the minimum MacOS version from 10.4 
to 10.5 for ompi 1.6.x and 1.7.x


Any strong opinions?

-Paul

On 2/15/2012 10:29 AM, Paul H. Hargrove wrote:
I wanted to note that MacOS 10.4 on *X86* has the same "WEAK" 
definitions in opal_config.h.

Yet it can build ompi-1.5.x with only WARNING about duplicate symbols.
I just tried, and the test code Matthias sent worked too:

$ ./bin/mpicc pmpi_test.c
/usr/bin/ld: warning multiple definitions of symbol _MPI_Finalize
/var/tmp//cc7tP6zp.o definition of _MPI_Finalize in section 
(__TEXT,__text)
/Users/phhargrove/OMPI/openmpi-1.5.5rc1/INST/lib/libmpi.dylib(single 
module) definition of _MPI_Finalize


$ ./a.out
inside MPI_Finalize() wrapper

So, I don't think (A) is an appropriate solution.
I am also wondering if there is some compiler/linker flag we 
could/should pass to "fix" the PPC.



Going back to the 10.4/PPC I see now that despite the warnings, a 
working executable WAS generated:

$ ./a.out
inside MPI_Finalize() wrapper

So, I don't think we have managed to reproduce the source of the build 
problem.


-Paul

On 2/15/2012 9:25 AM, Matthias Jurenz wrote:

Thanks for testing, Paul.

I think we need an additional configure test which disables VT if

a) weak symbol support is disabled/not available

- or, more precisely -

b) configuring on PPC/Mac10.4 and the GNU compiler version in use is 4.0.1 or
older

I prefer option b) because VT (i.e. PMPI) should also work without weak
symbol support (at least it does on my laptop with gcc 4.4.3 and
'--disable-weak-symbols'). On the other hand, in most cases the compiler
supports weak symbols, so option a) would also work, unless weak symbol
support is disabled by the configure option '--disable-weak-symbols'.
Jeff, what's your opinion?

Matthias

On Wednesday 15 February 2012 10:33:30 Paul Hargrove wrote:

See responses mixed in below.

On Wed, Feb 15, 2012 at 1:02 AM, Matthias Jurenz<

matthias.jur...@tu-dresden.de>  wrote:

Unfortunately, we don't have access to a PPC system with MacOS 10.4 to
try to
reproduce the error.
Not too surprising.  I'll see what I can do to help resolve the 
problem.



Paul, could you please check for the definition of the macro
OPAL_HAVE_WEAK_SYMBOLS in ompi_config.h?

ompi_config.h doesn't contain that macro.
However, opal_config.h shows no weak symbol support:
#define HWLOC_HAVE_ATTRIBUTE_WEAK_ALIAS 0
#define OPAL_HAVE_ATTRIBUTE_WEAK_ALIAS 0
#define OPAL_HAVE_WEAK_SYMBOLS 0


I assume that the ancient GNU compiler
on PPC/MacOS10.4 does not support weak-symbols which cause the 
multiply

definitions.

Does that mean I should simply not expect to get VT built there?


Furthermore, could you please try to build the following code to test
whether
the PMPI interface of Open MPI works in general?

#include <stdio.h>
#include <mpi.h>

int MPI_Finalize() {

  printf( "inside MPI_Finalize() wrapper\n" );
  return PMPI_Finalize();

}

int main(int argc, char** argv) {

   MPI_Init(&argc, &argv);
   MPI_Finalize();

}

I am assuming I am supposed to build that with VT disabled in my OMPI
build.
Doing so, I see that PMPI is apparently not working:
$ ./bin/mpicc pmpi_test.c
/usr/bin/ld: warning multiple definitions of symbol _MPI_Finalize
/var/tmp//ccHZvZ3B.o definition of _MPI_Finalize in section 
(__TEXT,__text)
/Users/phargrov/OMPI/openmpi-1.5.5rc1/INST/lib/libmpi.dylib(single 
module)

definition of _MPI_Finalize


Maybe the error occurs only if this code is in a shared library which depends
on the MPI library (as libvt-mpi does). Therefore, run the following:


$ gcc -fPIC -shared pmpi_test.c -I  -o libpmpi_test.dylib
-L  -lmpi
I assume this check might be redundant given that the previous one 
failed.

However, here it is anyway:
$ gcc -fPIC -shared pmpi_test.c -Iinclude -o libpmpi_test.dylib -Llib
powerpc-apple-darwin8-gcc-4.0.1: unrecognized option '-shared'
/usr/bin/ld: Undefined symbols:
_MPI_Init
_PMPI_Finalize
collect2: ld returned 1 exit status


-Paul


Thanks!

Matthias


On 12/14/2011 2:51 PM, Paul H. Hargrove wrote:
I've attempted to reproduce the failure reported below for MacOS 10.4
for PPC on an X86-64 system.
First, I've realized that while I reported "make check" as the source
of the problem, it occurs at "make".

Regardless of that mistake in my reporting, I was unable to reproduce
the problem, making this a PPC-specific problem as far as I can tell.
Instead of 255 instances of "ld: multiple definitions of symbol 
_MPI_*"
I get instances of "ld: warning multiple definitions of symbol 
_MPI*",

where the only difference is the addition of the word "warning".
However, this is apparently non-fatal on the x86-64 but fatal by
default on PPC.

-Paul

On 12/13/2011 9:30 PM, Paul H. Hargrove wrote:

Using the 1.5.5rc1 tarball, I've repeated tests on the following
platforms for which I recently reporte

Re: [OMPI devel] VT build failure with Clang++

2012-02-15 Thread Paul H. Hargrove

Dmitri,

Since I have not seen any error like this from gcc, pgi, pathcc, xlc, 
icc, open64 or suncc, I am pretty sure the problem is Clang-specific 
even if not a true "bug" in Clang.


I just test everything I can get my hands on and report what I find.
If there is not a simple fix for this then it is not a big deal YET.
However, it is widely expected that Apple will move to a Clang-only (no 
gcc/g++) release of Xcode as soon as they are able.

So, it *might* become a concern in the near future.

So, how should we proceed on this?

-Paul

On 2/15/2012 8:38 AM, Dmitri Gribenko wrote:

I don't know if it is a Clang bug, but here's my understanding of the problem.

[...excellent description removed...]


I'm not sure if this is a bug in Clang because I don't know if Clang
should have tried to instantiate create().

Dmitri


--
Paul H. Hargrove  phhargr...@lbl.gov
Future Technologies Group
HPC Research Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900



Re: [OMPI devel] libevent build fails when configured with --disable-hwloc

2012-02-15 Thread Paul H. Hargrove

Thanks, Ralph.

Your commit missed the nightly tarball, but the configure logic to 
exclude the rank_file component was in there.
So, I dropped the new libevent2013_module.c into tonight's tarball 
(1.7a1r25937).
My build configured --without-hwloc now PASSes "make all install check 
clean".


And thanks for the nfs4 fix too, BTW:
$ svn praise test/util/opal_path_nfs.c | grep nfs4
 25939 rhc 0 == strcasecmp (fs, "nfs4") ||

-Paul

On 2/15/2012 6:52 PM, Ralph Castain wrote:
Thanks Paul. I modified the patch a bit to silence some warnings, but 
added it to the trunk.



On Wed, Feb 15, 2012 at 2:17 PM, Paul H. Hargrove <phhargr...@lbl.gov> wrote:


The following 1-line change resolves the problem for me, and I see
no potential down-side to it:

--- openmpi-1.7a1r25927/opal/mca/event/libevent2013/libevent2013_module.c~  2012-02-15 14:11:22.274734667 -0800
+++ openmpi-1.7a1r25927/opal/mca/event/libevent2013/libevent2013_module.c   2012-02-15 14:11:25.183478598 -0800
@@ -4,7 +4,7 @@
 */
 #include "opal_config.h"
 #include "opal/constants.h"
-#include "config.h"
+#include "libevent/config.h"

 #ifdef HAVE_SYS_TYPES_H
 #include <sys/types.h>


-Paul


On 2/15/2012 1:58 PM, Paul H. Hargrove wrote:

Here is a bit more on this.
When I configure w/ only a --prefix and CFLAGS=-save-temps, I
can examine libevent2013_module.i which contains the following:

# 7 "../../../../../opal/mca/event/libevent2013/libevent2013_module.c" 2
# 1 "../../../../opal/mca/hwloc/hwloc132/hwloc/include/private/autogen/config.h" 1
# 8 "../../../../../opal/mca/event/libevent2013/libevent2013_module.c" 2

What that says is that the '#include "config.h"' on line 7 of
libevent2013_module.c has included hwloc's config.h, as I had
claimed earlier (and this was much easier than manually
traversing the -I list as I had done before).


-- 
Paul H. Hargrove  phhargr...@lbl.gov
Future Technologies Group
HPC Research Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900

___
devel mailing list
de...@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/devel






--
Paul H. Hargrove  phhargr...@lbl.gov
Future Technologies Group
HPC Research Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900



[OMPI devel] opal_path_nfs test failure on NFS4 [w/ PATCH]

2012-02-15 Thread Paul H. Hargrove
Testing a trunk tarball (1.7a1r25927) I am seeing an opal_path_nfs 
failure from "make check":

 Failure : Mismatch: input "/opt/cluster", expected:0 got:1

SUPPORT: OMPI Test failed: opal_path_nfs() (1 of 20 failed)
FAIL: opal_path_nfs


The "mount" command reports /opt/cluster as "nfs4" which appears to be 
distinct from "nfs" (which is reported for other mount points):
X:/cluster on /opt/cluster type nfs4 
(rw,intr,nolock,addr=,clientaddr=XXX)


Notice that the failure was "expected:0 got:1".
That means opal_path_nfs() is "smarter" than the test in this case.

The 1-line addition below fixes this for me, and should apply cleanly 
to 1.5.x as well (or hold for 1.6 if desired).


-Paul

--- openmpi-1.7a1r25927/test/util/opal_path_nfs.c   2012-02-15 03:27:46.0 +0100
+++ openmpi-1.7a1r25927m/test/util/opal_path_nfs.c  2012-02-16 01:49:18.882418827 +0100
@@ -154,6 +154,7 @@
 nfs_tmp[mount_known] = false;
 if (0 == strcasecmp (fs, "nfs") ||
+0 == strcasecmp (fs, "nfs4") ||
 0 == strcasecmp (fs, "lustre") ||
     0 == strcasecmp (fs, "panfs") ||
 0 == strcasecmp (fs, "gpfs"))
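For readers who want to try the comparison outside the test harness, here is a minimal sketch of the file-system-type check that the patched condition performs. The function name is_network_fs() and the table layout are assumptions made for illustration; only the five type strings and the use of strcasecmp() come from the patch above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <strings.h>   /* strcasecmp (POSIX) */

/* Hypothetical helper, not the actual Open MPI test code: returns true when
 * the file system type string reported by "mount" is one of the network
 * file systems listed in the patched condition. */
bool is_network_fs(const char *fs)
{
    /* Same set as the patched test: nfs, nfs4, lustre, panfs, gpfs. */
    static const char *const known[] = {
        "nfs", "nfs4", "lustre", "panfs", "gpfs"
    };
    size_t i;

    assert(fs != NULL);
    for (i = 0; i < sizeof(known) / sizeof(known[0]); i++) {
        /* Case-insensitive whole-string match, so "nfs4" needs its own entry. */
        if (0 == strcasecmp(fs, known[i])) {
            return true;
        }
    }
    return false;
}
```

Because the match is on the whole string, an entry for "nfs" alone does not cover "nfs4", which is why the test needed the extra line.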


--
Paul H. Hargrove  phhargr...@lbl.gov
Future Technologies Group
HPC Research Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900



Re: [OMPI devel] libevent build fails when configured with --disable-hwloc

2012-02-15 Thread Paul H. Hargrove
The following 1-line change resolves the problem for me, and I see no 
potential down-side to it:


--- openmpi-1.7a1r25927/opal/mca/event/libevent2013/libevent2013_module.c~  2012-02-15 14:11:22.274734667 -0800
+++ openmpi-1.7a1r25927/opal/mca/event/libevent2013/libevent2013_module.c   2012-02-15 14:11:25.183478598 -0800
@@ -4,7 +4,7 @@
  */
 #include "opal_config.h"
 #include "opal/constants.h"
-#include "config.h"
+#include "libevent/config.h"

 #ifdef HAVE_SYS_TYPES_H
 #include <sys/types.h>


-Paul

On 2/15/2012 1:58 PM, Paul H. Hargrove wrote:

Here is a bit more on this.
When I configure w/ only a --prefix and CFLAGS=-save-temps, I can 
examine libevent2013_module.i which contains the following:


# 7 "../../../../../opal/mca/event/libevent2013/libevent2013_module.c" 2
# 1 "../../../../opal/mca/hwloc/hwloc132/hwloc/include/private/autogen/config.h" 1
# 8 "../../../../../opal/mca/event/libevent2013/libevent2013_module.c" 2

What that says is that the '#include "config.h"' on line 7 of 
libevent2013_module.c has included hwloc's config.h, as I had claimed 
earlier (and this was much easier than manually traversing the -I list 
as I had done before).


--
Paul H. Hargrove  phhargr...@lbl.gov
Future Technologies Group
HPC Research Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900



[OMPI devel] libevent build fails when configured with --disable-hwloc

2012-02-15 Thread Paul H. Hargrove

Here is a bit more on this.
When I configure w/ only a --prefix and CFLAGS=-save-temps, I can 
examine libevent2013_module.i which contains the following:


# 7 "../../../../../opal/mca/event/libevent2013/libevent2013_module.c" 2
# 1 "../../../../opal/mca/hwloc/hwloc132/hwloc/include/private/autogen/config.h" 1
# 8 "../../../../../opal/mca/event/libevent2013/libevent2013_module.c" 2

What that says is that the '#include "config.h"' on line 7 of 
libevent2013_module.c has included hwloc's config.h, as I had claimed 
earlier (and this was much easier than manually traversing the -I list 
as I had done before).
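The lookup behaviour described above can be modeled in a few lines. This is only a sketch under stated assumptions: the struct, the function name resolve_config_h(), and the idea of passing in a flag per directory are all hypothetical, and the real compiler also searches the directory of the including file before the -I list for a quoted include. The point is simply that the first directory containing a file named config.h wins, whichever package generated it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of one -I entry: its path and whether a file named
 * config.h exists in that directory. */
struct include_dir {
    const char *path;
    int has_config_h;
};

/* Returns the path of the first directory, in command-line order, that
 * contains config.h, or NULL when no directory does. */
const char *resolve_config_h(const struct include_dir *dirs, size_t n)
{
    size_t i;

    assert(dirs != NULL || n == 0);
    for (i = 0; i < n; i++) {
        if (dirs[i].has_config_h) {
            return dirs[i].path;   /* first match wins */
        }
    }
    return NULL;                   /* "config.h: No such file or directory" */
}
```

With hwloc's autogen directory in the list, the model returns that directory; with it removed and the libevent build directory absent, the lookup fails, which matches the two behaviours reported in this thread.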


This is a VPATH build from a trunk tarball (1.7a1r25927).
Perhaps Ralph could not reproduce because of a difference between svn 
and tarball, such as autotools versions, or use of a non-VPATH build?


For me there is a generated
   BLDDIR/opal/mca/event/libevent2013/libevent/config.h
but that directory does NOT appear in the -I's, though the $(srcdir) 
version does.
So, I suspect a non-VPATH build would work when configured with 
--without-hwloc


Below is a reformatted version of the compile command from "make V=1".
I've marked two things:
1 = the hwloc directory from whence config.h is being included
2 = two instances of $(srcdir)/libevent (key: 5*"../"  = srcdir, 4*"../" 
= blddir)



gcc -DHAVE_CONFIG_H
-I.
-I../../../../../opal/mca/event/libevent2013
-I../../../../opal/include
-I../../../../orte/include
-I../../../../ompi/include
1-> -I../../../../opal/mca/hwloc/hwloc132/hwloc/include/private/autogen
-I../../../../opal/mca/hwloc/hwloc132/hwloc/include/hwloc/autogen
2-> -I../../../../../opal/mca/event/libevent2013/libevent
-I../../../../../opal/mca/event/libevent2013/libevent/include
-I./libevent/include
-I../../../../../opal/mca/event/libevent2013/libevent/compat
-I../../../../..
-I../../../..
-I../../../../../opal/include
-I../../../../../orte/include
-I../../../../../ompi/include
-I/home/phargrov/OMPI/openmpi-1.7a1r25927/opal/mca/hwloc/hwloc132/hwloc/include
-I/home/phargrov/OMPI/openmpi-1.7a1r25927/BLD-with/opal/mca/hwloc/hwloc132/hwloc/include
2-> -I/home/phargrov/OMPI/openmpi-1.7a1r25927/opal/mca/event/libevent2013/libevent
-I/home/phargrov/OMPI/openmpi-1.7a1r25927/opal/mca/event/libevent2013/libevent/include
-I/home/phargrov/OMPI/openmpi-1.7a1r25927/BLD-with/opal/mca/event/libevent2013/libevent/include
-I/usr/include/infiniband
-I/usr/include/infiniband
-O3 -DNDEBUG -save-temps -finline-functions -fno-strict-aliasing
-pthread
-I/home/phargrov/OMPI/openmpi-1.7a1r25927/opal/mca/hwloc/hwloc132/hwloc/include
-MT libevent2013_module.lo -MD -MP -MF .deps/libevent2013_module.Tpo
-c ../../../../../opal/mca/event/libevent2013/libevent2013_module.c
-fPIC -DPIC -o .libs/libevent2013_module.o



-Paul

On 2/15/2012 1:16 PM, Paul H. Hargrove wrote:

Thanks, Ralph.

I am a little deficient in the autotools department.
So, I will probably only be able to retest after a new trunk tarball 
is generated tonight.


In the meantime I might be able to get more info on the config.h problem.
If I add -save-temps to CFLAGS I should be able to examine the .i file 
w/o and w/ --disable-hwloc.

That will either prove or disprove what I've claimed is happening.

-Paul

On 2/15/2012 5:47 AM, Ralph Castain wrote:

Built on Linux --without-hwloc as well, with the fix.

On Wed, Feb 15, 2012 at 3:13 AM, Ralph Castain <r...@open-mpi.org> wrote:


Hi Paul

The rank_file component should not attempt to build if
--without-hwloc is given - I've fixed that now. Thanks for
reporting it.

With that fix, I was able to build the trunk on Mac - testing
Linux now. I haven't checked for the config.h confusion you
report, though - just noting that it built.



--
Paul H. Hargrove  phhargr...@lbl.gov
Future Technologies Group
HPC Research Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900




--
Paul H. Hargrove  phhargr...@lbl.gov
Future Technologies Group
HPC Research Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900



Re: [OMPI devel] trunk build failed when configured with --disable-hwloc

2012-02-15 Thread Paul H. Hargrove

Thanks, Ralph.

I am a little deficient in the autotools department.
So, I will probably only be able to retest after a new trunk tarball is 
generated tonight.


In the meantime I might be able to get more info on the config.h problem.
If I add -save-temps to CFLAGS I should be able to examine the .i file 
w/o and w/ --disable-hwloc.

That will either prove or disprove what I've claimed is happening.

-Paul

On 2/15/2012 5:47 AM, Ralph Castain wrote:

Built on Linux --without-hwloc as well, with the fix.

On Wed, Feb 15, 2012 at 3:13 AM, Ralph Castain <r...@open-mpi.org> wrote:


Hi Paul

The rank_file component should not attempt to build if
--without-hwloc is given - I've fixed that now. Thanks for
reporting it.

With that fix, I was able to build the trunk on Mac - testing
Linux now. I haven't checked for the config.h confusion you
report, though - just noting that it built.



--
Paul H. Hargrove  phhargr...@lbl.gov
Future Technologies Group
HPC Research Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900



Re: [OMPI devel] 1.5.5rc1 tested: MacOS/ppc (w/ 1 failure and a "CMR")

2012-02-15 Thread Paul H. Hargrove
I wanted to note that MacOS 10.4 on *X86* has the same "WEAK" 
definitions in opal_config.h.

Yet it can build ompi-1.5.x with only WARNING about duplicate symbols.
I just tried, and the test code Matthias sent worked too:

$ ./bin/mpicc pmpi_test.c
/usr/bin/ld: warning multiple definitions of symbol _MPI_Finalize
/var/tmp//cc7tP6zp.o definition of _MPI_Finalize in section (__TEXT,__text)
/Users/phhargrove/OMPI/openmpi-1.5.5rc1/INST/lib/libmpi.dylib(single 
module) definition of _MPI_Finalize


$ ./a.out
inside MPI_Finalize() wrapper

So, I don't think (A) is an appropriate solution.
I am also wondering if there is some compiler/linker flag we 
could/should pass to "fix" the PPC.



Going back to the 10.4/PPC I see now that despite the warnings, a 
working executable WAS generated:

$ ./a.out
inside MPI_Finalize() wrapper

So, I don't think we have managed to reproduce the source of the build 
problem.


-Paul

On 2/15/2012 9:25 AM, Matthias Jurenz wrote:

Thanks for testing, Paul.

I think we need an additional configure test which disables VT if

a) weak symbol support is disabled/not available

- or, more precisely -

b) configuring on PPC/Mac10.4 and the GNU compiler version in use is 4.0.1 or
older

I prefer option b) because VT (i.e. PMPI) should also work without weak
symbol support (at least it does on my laptop with gcc 4.4.3 and
'--disable-weak-symbols'). On the other hand, in most cases the compiler
supports weak symbols, so option a) would also work, unless weak symbol
support is disabled by the configure option '--disable-weak-symbols'.
Jeff, what's your opinion?

Matthias

On Wednesday 15 February 2012 10:33:30 Paul Hargrove wrote:

See responses mixed in below.

On Wed, Feb 15, 2012 at 1:02 AM, Matthias Jurenz<

matthias.jur...@tu-dresden.de>  wrote:

Unfortunately, we don't have access to a PPC system with MacOS 10.4 to
try to
reproduce the error.

Not too surprising.  I'll see what I can do to help resolve the problem.


Paul, could you please check for the definition of the macro
OPAL_HAVE_WEAK_SYMBOLS in ompi_config.h?

ompi_config.h doesn't contain that macro.
However, opal_config.h shows no weak symbol support:
#define HWLOC_HAVE_ATTRIBUTE_WEAK_ALIAS 0
#define OPAL_HAVE_ATTRIBUTE_WEAK_ALIAS 0
#define OPAL_HAVE_WEAK_SYMBOLS 0


I assume that the ancient GNU compiler
on PPC/MacOS10.4 does not support weak-symbols which cause the multiply
definitions.

Does that mean I should simply not expect to get VT built there?


Furthermore, could you please try to build the following code to test
whether
the PMPI interface of Open MPI works in general?

#include <stdio.h>
#include <mpi.h>

int MPI_Finalize() {

  printf( "inside MPI_Finalize() wrapper\n" );
  return PMPI_Finalize();

}

int main(int argc, char** argv) {

   MPI_Init(&argc, &argv);
   MPI_Finalize();

}

I am assuming I am supposed to build that with VT disabled in my OMPI
build.
Doing so, I see that PMPI is apparently not working:
$ ./bin/mpicc pmpi_test.c
/usr/bin/ld: warning multiple definitions of symbol _MPI_Finalize
/var/tmp//ccHZvZ3B.o definition of _MPI_Finalize in section (__TEXT,__text)
/Users/phargrov/OMPI/openmpi-1.5.5rc1/INST/lib/libmpi.dylib(single module)
definition of _MPI_Finalize
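The interposition pattern that pmpi_test.c exercises can be sketched without an MPI installation. Everything named below (MY_Finalize, PMY_Finalize, wrapper_call_count) is a hypothetical stand-in, not part of MPI or Open MPI: the name-shifted function plays the role of PMPI_Finalize, and the wrapper plays the role of the user's or tool's MPI_Finalize. In a real MPI library the unshifted name is also defined inside the library, which is why the linker reports two definitions of _MPI_Finalize in the output above.

```c
#include <assert.h>
#include <stdio.h>

/* Counts how often the wrapper ran; used only to observe the interposition. */
static int wrapper_calls = 0;

/* Hypothetical stand-in for the name-shifted entry point (PMPI_Finalize).
 * It always reaches the library's real implementation. */
int PMY_Finalize(void)
{
    return 0;   /* stub: the "real" work would happen here */
}

/* Hypothetical stand-in for the tool-provided wrapper (MPI_Finalize).
 * It does its own bookkeeping, then forwards to the name-shifted function. */
int MY_Finalize(void)
{
    int rc;

    wrapper_calls++;
    printf("inside MY_Finalize() wrapper\n");
    rc = PMY_Finalize();
    assert(rc == 0);   /* the stub above never fails */
    return rc;
}

/* Accessor so callers can check that the wrapper, not the stub, was entered. */
int wrapper_call_count(void)
{
    return wrapper_calls;
}
```

Whether the application's definition of the unshifted name wins over the library's is decided at link time, which is the part that differs between the platforms discussed in this thread.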


Maybe the error occurs only if this code is in a shared library which depends
on the MPI library (as libvt-mpi does). Therefore, run the following:

$ gcc -fPIC -shared pmpi_test.c -I  -o libpmpi_test.dylib
-L  -lmpi

I assume this check might be redundant given that the previous one failed.
However, here it is anyway:
$ gcc -fPIC -shared pmpi_test.c -Iinclude -o libpmpi_test.dylib -Llib
powerpc-apple-darwin8-gcc-4.0.1: unrecognized option '-shared'
/usr/bin/ld: Undefined symbols:
_MPI_Init
_PMPI_Finalize
collect2: ld returned 1 exit status


-Paul


Thanks!

Matthias


On 12/14/2011 2:51 PM, Paul H. Hargrove wrote:
I've attempted to reproduce the failure reported below for MacOS 10.4
for PPC on an X86-64 system.
First, I've realized that while I reported "make check" as the source
of the problem, it occurs at "make".

Regardless of that mistake in my reporting, I was unable to reproduce
the problem, making this a PPC-specific problem as far as I can tell.
Instead of 255 instances of "ld: multiple definitions of symbol _MPI_*"
I get instances of "ld: warning multiple definitions of symbol _MPI*",
where the only difference is the addition of the word "warning".
However, this is apparently non-fatal on the x86-64 but fatal by
default on PPC.

-Paul

On 12/13/2011 9:30 PM, Paul H. Hargrove wrote:

Using the 1.5.5rc1 tarball, I've repeated tests on the following
platforms for which I recently reported 1.4.5rc1 results:

MacOS 10.5 (Leopard) on PPC:
powerpc-apple-darwin9-gcc-4.0.1 (GCC) 4.0.1 (Apple Inc. build 5488)
MacOS 10.4 (Tiger) on PPC:
powerpc-apple-darwin8-gcc-4.0.1 (GCC) 4.0.1 (Apple Computer, Inc.
build 5341)
MacOS 10.3 (Panther) on PPC:
gcc (GCC) 3.3 20030304

[OMPI devel] More MIPS asm patches

2012-02-15 Thread Paul H. Hargrove



On 2/14/2012 10:13 PM, Paul H. Hargrove wrote:
On the linux/mips64el platform I also tried the PathScale 3.3a 
compilers on both branches.
On both branches the atomic_*_noinline tests all PASS, which validates 
these patches.

On trunk all the tests in test/asm are PASSing.
However, the versions NOT suffixed with _noinline are FAILing on the 
1.5 branch.
Since those failures DO NOT use the files touched by these patches, 
they are irrelevant. 


Oops - I was looking at the wrong output when I stated pathcc/trunk was 
PASSing all tests.

The *inline* atomics tests SIGBUS w/ the pathcc compilers on BOTH branches.

I know from previous encounters with pathcc on MIPS that the problem is 
due to the explicit use of "$1" (aka "AT", the "Assembler Temporary" 
register).  Unlike gcc, pathcc schedules this as a normal register.  
Indeed the attached patch (which should apply cleanly to both branches) 
resolves the problem simply by conditionally adding "at" to the clobbers 
for the inline asm.


This is independent of the patches in my previous posting.

-Paul

--
Paul H. Hargrove  phhargr...@lbl.gov
Future Technologies Group
HPC Research Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900

--- openmpi-1.7a1r25913/opal/include/opal/sys/mips/atomic.h 2012-02-13 20:00:06.0 -0600
+++ openmpi-1.7a1r25913m/opal/include/opal/sys/mips/atomic.h    2012-02-15 00:23:44.648085811 -0600
@@ -119,7 +119,11 @@
  ".set reorder  \n"
  : "=&r"(ret), "=m"(*addr)
  : "m"(*addr), "r"(oldval), "r"(newval)
- : "cc", "memory");
+ : "cc", "memory"
+#ifdef __PATHCC__
+   , "at"
+#endif
+);
return (ret == oldval);
 }

@@ -168,7 +172,11 @@
  ".set reorder  \n"
  : "=&r" (ret), "=m" (*addr)
  : "m" (*addr), "r" (oldval), "r" (newval)
- : "cc", "memory");
+ : "cc", "memory"
+#ifdef __PATHCC__
+   , "at"
+#endif
+);

return (ret == oldval);
 }


[OMPI devel] Fixes for MIPS assembly [w/ PATCHES]

2012-02-15 Thread Paul H. Hargrove
The attached patches fix three problems with the non-inline ASM for MIPS 
(and MIPS64EL):

1) ".set reorder" was placed too early.
   This was causing loss of the SLTU instruction in the jump delay
   slot which follows the return instruction.  Since that SLTU is
   used to set the return value, this was fatal to most tests in test/asm.
2) The 64-bit cmpset code was performing the XOR (to compare the read
   value to 'oldval') using 'addr' as the destination register.  Since
   XOR is in the delay slot of the retry branch instruction (except in
   the acq variant) any retry would load from an invalid 'addr' (SEGV).
3) The 64-bit cmpset code was using the wrong destination register for
   the SLTU and thus not setting the return value (even after the
   ".set reorder" was placed correctly).

There is one patch each for the 1.5 branch and trunk.
Both have been tested on:
linux/mips32 w/ -march=4kc in the *FLAGS (gcc-4.4.5)
linux/mips64 w/ -mabi=n32 in the *FLAGS (gcc-4.3.2)
linux/mips64 w/ -mabi=64 in the *FLAGS (gcc-4.3.2)
linux/mips64el (gcc-4.2.3)

Of those 8 builds, the mips32/ompi-1.5 build is the only one that fails.
That is because, unlike trunk, it tries to build the 64-bit atomics 
which the assembler then rejects.

I have not attempted to backport the fix(es) for that from trunk to 1.5.
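As a reading aid for problems 2 and 3 in the list above, the intended return-value logic of the 64-bit cmpset can be modeled in plain C. This is a non-atomic, single-threaded sketch and not the Open MPI implementation: the function name cmpset64_model() is hypothetical, the store is assumed to succeed on the first attempt (so the retry loop is omitted), and the register comments assume the usual MIPS calling convention in which $4 and $5 carry the first two arguments and $2 the return value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical single-threaded model of compare-and-set: store newval in
 * *addr only if *addr equals oldval, and return 1 on a match, 0 otherwise.
 * Comments show the instruction each step corresponds to in the patched asm. */
int cmpset64_model(volatile int64_t *addr, int64_t oldval, int64_t newval)
{
    int64_t loaded;
    uint64_t diff;

    assert(addr != NULL);

    loaded = *addr;               /* lld  $3, 0($4)                      */
    if (loaded == oldval) {       /* bne  $3, $5, done                   */
        *addr = newval;           /* scd  $2, 0($4), assumed to succeed  */
    }

    /* The comparison result must not overwrite the address register,
     * because a retry would then load from a bogus address. */
    diff = (uint64_t)loaded ^ (uint64_t)oldval;   /* xor  $3, $3, $5 */

    /* Unsigned "less than 1" is true only for 0, i.e. when the values
     * matched; the result has to land in the return register. */
    return diff < 1u;                             /* sltu $2, $3, 1  */
}
```

The xor/sltu pair is just a branch-free way of computing (loaded == oldval), which is why writing the sltu result to the wrong register left the return value unset.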

On the linux/mips64el platform I also tried the PathScale 3.3a compilers 
on both branches.
On both branches the atomic_*_noinline tests all PASS, which validates 
these patches.

On trunk all the tests in test/asm are PASSing.
However, the versions NOT suffixed with _noinline are FAILing on the 1.5 
branch.
Since those failures DO NOT use the files touched by these patches, they 
are irrelevant.


If/when these patches have been committed, I may consider returning to 
the 1.5 branch to backport/CMR

+ support for MIPS32 (should not be trying to build the 64-bit atomics)
+ fix for the inline atomics (the FAILures on the inline tests) w/ pathcc

-Paul

--
Paul H. Hargrove  phhargr...@lbl.gov
Future Technologies Group
HPC Research Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900

diff -ur openmpi-1.5.5rc2r25906/opal/asm/base/MIPS.asm openmpi-1.5.5rc2r25906m/opal/asm/base/MIPS.asm
--- openmpi-1.5.5rc2r25906/opal/asm/base/MIPS.asm   2012-02-10 21:16:29.0 -0600
+++ openmpi-1.5.5rc2r25906m/opal/asm/base/MIPS.asm  2012-02-14 16:16:26.948085714 -0600
@@ -34,11 +34,10 @@
sc $2, 0($4) 
beqz   $2, retry1
 done1: 
-   .set reorder  
-
xor $3,$3,$5
j   ra
sltu$2,$3,1
+   .set reorder  
 END(opal_atomic_cmpset_32)


@@ -52,11 +51,10 @@
beqz   $2, retry2   
 done2: 
sync
-   .set reorder  
-
xor $3,$3,$5
j   ra
sltu$2,$3,1
+   .set reorder  
 END(opal_atomic_cmpset_acq_32)


@@ -70,16 +68,15 @@
sc $2, 0($4) 
beqz   $2, retry3   
 done3: 
-   .set reorder  
-
xor $3,$3,$5
j   ra
sltu$2,$3,1
+   .set reorder  
 END(opal_atomic_cmpset_rel_32)


 LEAF(opal_atomic_cmpset_64)
-   .set noreorder
+   .set noreorder
 retry4:
lld$3, 0($4) 
bne$3, $5, done4   
@@ -87,11 +84,10 @@
scd$2, 0($4) 
beqz   $2, retry4   
 done4: 
-   .set reorder  
-
-   xor $4,$3,$5
+   xor $3,$3,$5
j   ra
-   sltu$3,$4,1
+   sltu$2,$3,1
+   .set reorder  
 END(opal_atomic_cmpset_64)


@@ -104,11 +100,11 @@
scd$2, 0($4) 
beqz   $2, retry5   
 done5: 
-   .set reorder  
sync
-   xor $4,$3,$5
+   xor $3,$3,$5
j   ra
-   sltu$3,$4,1
+   sltu$2,$3,1
+   .set reorder  
 END(opal_atomic_cmpset_acq_64)


@@ -122,9 +118,8 @@
scd$2, 0($4) 
beqz   $2, retry6   
 done6: 
-   .set reorder  
-
-   xor $4,$3,$5
+   xor $3,$3,$5
j   ra
-   sltu$3,$4,1
+   sltu$2,$3,1
+   .set reorder  
 END(opal_atomic_cmpset_rel_64)
diff -ur openmpi-1.5.5rc2r25906/opal/asm/generated/atomic-mips-irix.s openmpi-1.5.5rc2r25906m/opal/asm/generated/atomic-mips-irix.s
--- openmpi-1.5.5rc2r25906/opal/asm/generated/atomic-mips-irix.s    2012-02-10 21:25:44.0 -0600
+++ openmpi-1.5.5rc2r25906m/opal/asm/generated/atomic-mips-irix.s   2012-02-14 16:29:55.140085838 -0600
@@ -33,11 +33,10 @@
sc $2, 0($4) 
beqz   $2, retry1
 done1: 
-   .set reorder  
-
xor $3,$3,$5
j   ra
sltu$2,$3,1
+ 

Re: [OMPI devel] trunk build failed when configured with --disable-hwloc

2012-02-14 Thread Paul H. Hargrove



On 2/14/2012 5:10 PM, Paul H. Hargrove wrote:
I have configured the ompi-trunk (from last night's tarball: 
1.7a1r25913) with --without-hwloc.

Having done so, I see the following failure at build time:


  CC rmaps_rank_file_component.lo
/home/hargrove/OMPI/openmpi-trunk-linux-mips64el//openmpi-trunk/orte/mca/rmaps/rank_file/rmaps_rank_file_component.c: In function 'orte_rmaps_rank_file_open':
/home/hargrove/OMPI/openmpi-trunk-linux-mips64el//openmpi-trunk/orte/mca/rmaps/rank_file/rmaps_rank_file_component.c:111: error: 'opal_hwloc_binding_policy' undeclared (first use in this function)
/home/hargrove/OMPI/openmpi-trunk-linux-mips64el//openmpi-trunk/orte/mca/rmaps/rank_file/rmaps_rank_file_component.c:111: error: (Each undeclared identifier is reported only once
/home/hargrove/OMPI/openmpi-trunk-linux-mips64el//openmpi-trunk/orte/mca/rmaps/rank_file/rmaps_rank_file_component.c:111: error: for each function it appears in.)
/home/hargrove/OMPI/openmpi-trunk-linux-mips64el//openmpi-trunk/orte/mca/rmaps/rank_file/rmaps_rank_file_component.c:111: error: 'OPAL_BIND_TO_CPUSET' undeclared (first use in this function)


Looks like this code is not "aware" that hwloc has been configured out.
This is not present in the 1.5 branch configured with identical 
arguments.


-Paul



The following appears to "fix" that, but I am uncertain if this is the 
desired fix.
--- orte/mca/rmaps/rank_file/rmaps_rank_file_component.c~   2012-02-14 17:25:07.653483222 -0800
+++ orte/mca/rmaps/rank_file/rmaps_rank_file_component.c    2012-02-14 17:25:28.803483261 -0800
@@ -107,8 +107,10 @@
 }
 ORTE_SET_MAPPING_POLICY(orte_rmaps_base.mapping, ORTE_MAPPING_BYUSER);
 ORTE_SET_MAPPING_DIRECTIVE(orte_rmaps_base.mapping, ORTE_MAPPING_GIVEN);
+#if OPAL_HAVE_HWLOC
 /* we are going to bind to cpuset since the user is specifying the cpus */
 OPAL_SET_BINDING_POLICY(opal_hwloc_binding_policy, OPAL_BIND_TO_CPUSET);
+#endif
 /* make us first */
 my_priority = 1;
 }



HOWEVER, I am now also seeing the following occurring ONLY when 
configured with --disable-hwloc:
make[1]: Entering directory 
`/home/phargrov/openmpi-1.7a1r25913/BLD2/opal/mca/event/libevent2013'

  CC libevent2013_module.lo
../../../../../opal/mca/event/libevent2013/libevent2013_module.c:7:20: 
error: config.h: No such file or directory
../../../../../opal/mca/event/libevent2013/libevent2013_module.c: In 
function 'opal_event_init':
../../../../../opal/mca/event/libevent2013/libevent2013_module.c:243: 
warning: ignoring return value of 'asprintf', declared with attribute 
warn_unused_result

make[1]: *** [libevent2013_module.lo] Error 1


It seems VERY odd to me that disabling hwloc should have that effect.
Looking deeper it appears that '#include "config.h"' in 
libevent2013_module.c has been including the config.h from HWLOC, 
instead of the one from libevent2013.  If one examines the -I options 
carefully, you will see that $(builddir)/libevent is NOT in the include 
path, but that is the location of the config.h generated by libevent's 
configure script!
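
One way to see which config.h wins is to walk the -I directories in the order they appear on the compile line and report the first one that holds the file. What follows is only a minimal sketch: the function name and the directory arguments are hypothetical, and it models plain -I search order only (a quoted #include also looks in the including file's own directory first).

```sh
# Print the first directory (in the order given) that contains the named header.
# Usage: first_dir_with_header HEADER DIR...
# Hypothetical helper for illustration; not part of the OMPI build system.
first_dir_with_header() {
    header=$1
    shift
    for dir in "$@"; do
        if [ -f "$dir/$header" ]; then
            printf '%s\n' "$dir"
            return 0
        fi
    done
    return 1    # no directory on the list provides the header
}
```

Feeding it the -I directories from the failing compile line, in order, should show whether the hwloc or the libevent build directory supplies config.h.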


-Paul

--
Paul H. Hargrove  phhargr...@lbl.gov
Future Technologies Group
HPC Research Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900



[OMPI devel] trunk build failed when configured with --disable-hwloc

2012-02-14 Thread Paul H. Hargrove
I have configured the ompi-trunk (from last night's tarball: 
1.7a1r25913) with --without-hwloc.

Having done so, I see the following failure at build time:


  CC rmaps_rank_file_component.lo
/home/hargrove/OMPI/openmpi-trunk-linux-mips64el//openmpi-trunk/orte/mca/rmaps/rank_file/rmaps_rank_file_component.c: In function 'orte_rmaps_rank_file_open':
/home/hargrove/OMPI/openmpi-trunk-linux-mips64el//openmpi-trunk/orte/mca/rmaps/rank_file/rmaps_rank_file_component.c:111: error: 'opal_hwloc_binding_policy' undeclared (first use in this function)
/home/hargrove/OMPI/openmpi-trunk-linux-mips64el//openmpi-trunk/orte/mca/rmaps/rank_file/rmaps_rank_file_component.c:111: error: (Each undeclared identifier is reported only once
/home/hargrove/OMPI/openmpi-trunk-linux-mips64el//openmpi-trunk/orte/mca/rmaps/rank_file/rmaps_rank_file_component.c:111: error: for each function it appears in.)
/home/hargrove/OMPI/openmpi-trunk-linux-mips64el//openmpi-trunk/orte/mca/rmaps/rank_file/rmaps_rank_file_component.c:111: error: 'OPAL_BIND_TO_CPUSET' undeclared (first use in this function)


Looks like this code is not "aware" that hwloc has been configured out.
This is not present in the 1.5 branch configured with identical arguments.

-Paul




[OMPI devel] the dangers of configure probing argument counts

2012-02-14 Thread Paul H. Hargrove
There was recently a fair amount of work done in hwloc to get configure 
to work correctly for a probe that was intended to determine how many 
arguments appear in a specific function prototype.  The "issue" was that 
the C spec doesn't require that the C compiler issue an error for either 
too-many or too-few arguments.  While gcc and most other compilers make 
both cases an error, there are two compilers of non-trivial importance 
which do NOT:
+  By default the IBM (xlc) C compiler only warns for the case of too few 
arguments.
+  By default the Intel (icc) C compiler only warns for the case of too many 
arguments.


This renders configure-time tests that want to check argument counts 
unreliable unless one takes special care to add something "special" to 
CFLAGS.  While hacking on hwloc we determined that it was NOT safe for 
configure to add to CFLAGS in general, nor to ask the user to do so.  It 
was only safe to /temporarily/ add to CFLAGS for the duration of the 
argument count probe.
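
The "temporarily add to CFLAGS" idea can be sketched in plain shell, which is what configure ultimately runs. This is a minimal illustration and not the hwloc macro itself: the helper name is hypothetical, and the probe is assumed to be shell code that reads $CFLAGS, as generated configure tests do.

```sh
# Run a probe with extra flags appended to CFLAGS, then restore CFLAGS
# whether or not the probe succeeded.
# Usage: with_temp_cflags EXTRA_FLAGS COMMAND [ARGS...]
# Hypothetical helper for illustration only.
with_temp_cflags() {
    extra=$1
    shift
    saved_cflags=${CFLAGS-}
    CFLAGS="$saved_cflags $extra"
    if "$@"; then
        status=0
    else
        status=$?
    fi
    CFLAGS=$saved_cflags    # the strict flag never leaks into the real build
    return $status
}
```

For xlc the extra flag would be -qhalt=e, applied only around the argument-count probe, so the rest of configure and the later make still see the user's original CFLAGS.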


So, WHY am I telling you all this?
Because of the following in 
openmpi-1.7a1r25865/ompi/config/ompi_check_openib.m4:

  [AC_CACHE_CHECK(
  [number of arguments to ibv_create_cq],

which performs exactly the sort of test I am warning against.

So, I would encourage somebody to make the effort to reuse the configure 
logic Jeff and I developed for hwloc.
In particular look for setting and use of HWLOC_STRICT_ARGS_CFLAGS in 
config/hwloc.m4


-Paul




Re: [hwloc-devel] hwloc 1.3.2rc2 released

2012-02-14 Thread Paul H. Hargrove


On 2/13/2012 1:30 PM, Jeff Squyres wrote:

Due to the volume of off-list emails, I'm kinda expecting this rc to be good / 
final.  However, please do at least some cursory testing so that we can be sure.


I disregarded the "cursory" and ran on 61 arch/os/compiler combinations.
I can see only 2 problems at this point:
+ known libnuma issues on a "weird" virtual node - NOT expected to fix 
in 1.3.x
+ "make check" failure w/ icc-8.0 on x86/Linux - BUT icc-9.0 and gcc are 
both fine on the same node (so probably a compiler bug).


So, I agree this looks "final" to me.

-Paul




[OMPI devel] 1.5.5rc2r25906 test results

2012-02-12 Thread Paul H. Hargrove



On 2/10/2012 6:04 PM, Jeff Squyres wrote:

On Feb 10, 2012, at 8:57 PM, Jeff Squyres wrote:


1.5.5rc2 coming soon.

I should qualify that statement: many things have been resolved, but there's a 
few more things to go.  A new rc will come when they have been resolved:

   https://svn.open-mpi.org/trac/ompi/report/15




I just tried tonight's nightly tarball for the 1.5 branch (1.5.5rc2r25906).
I found the following issues, which I had previously reported against 
1.5.5rc1, for which I did NOT find a corresponding ticket in 
"report/15".  My apologies if I've missed a ticket, or if any of these 
were deferred to 1.6.x (as was Lion+PGI, for instance).


+ GNU Make required for "make clean" due to use of non-standard $(RM)
Reported in http://www.open-mpi.org/community/lists/devel/2011/12/10184.php

+ MacOS 10.4 on ppc fails linking libvt-mpi.la (multiply defined symbols)
Reported in http://www.open-mpi.org/community/lists/devel/2011/12/10090.php
My MacOS 10.4/x86 machine is down, but I don't believe it had this 
problem w/ rc1.


+ ROMIO uses explicit MAKE=make, causing problems if one builds ompi w/ 
gmake

Reported in http://www.open-mpi.org/community/lists/devel/2012/01/10300.php

-Paul




Re: [OMPI devel] v1.4.5rc6 released

2012-02-11 Thread Paul H. Hargrove

No new problems to report w/ 60+ platforms tested.

I was unable to retest MacOS 10.4 because both my x86 and ppc hosts are 
down.
Relative to the last time I reported my list of platforms, I have added 
FreeBSD9 on i386 and amd64.
There have also been some additional compilers added on Linux x86 and/or 
Linux x86-64.


I believe that all my odd platform/compiler issues have been addressed 
in README.
Several platforms were documented as not supported, and some of these 
configure is now smart enough to reject.

Others that required work-arounds have been documented as well.

This looks ready to go from my point of view (wide portability).
If there are other things I might help test to speed the release, let me 
know.


-Paul

On 2/10/2012 6:11 PM, Jeff Squyres wrote:

Usual location:

 http://www.open-mpi.org/software/ompi/v1.4/

Changes since rc5:

- document LD_LIBRARY_PATH for -m32 with Ubuntu/Sun compilers
- refuse to configure with gccfss
- LANL TLCC2 platform files






Re: [OMPI devel] RFC: Add "virbr0" to [btl|oob]_tcp_if_exclude?

2012-02-10 Thread Paul H. Hargrove
That's probably a reflection of the status of the "Open MPI User 
Documentation" sub-project :-)


On 2/10/2012 5:12 PM, Jeff Squyres wrote:

FWIW: google analytics indicates that the FAQ and the mailing list archives are 
among the most heavily used sections of the web site.  :-)





Re: [OMPI devel] RFC: Add "virbr0" to [btl|oob]_tcp_if_exclude?

2012-02-10 Thread Paul H. Hargrove

Much better - at least to the extent that users actually read FAQs :-)
-Paul

On 2/10/2012 5:01 PM, Jeff Squyres (jsquyres) wrote:

Check out #220 now; I updated it.





Re: [OMPI devel] RFC: Add "virbr0" to [btl|oob]_tcp_if_exclude?

2012-02-10 Thread Paul H. Hargrove



On 2/10/2012 12:21 PM, Jeff Squyres wrote:

On Feb 10, 2012, at 3:14 PM, Paul H. Hargrove wrote:


+ User knows nothing about xen, and thus nothing about virbr0
+ User has a local-only interface (eth8 in my made up example)
+ User reads FAQ entry "220. How do I tell Open MPI which TCP networks to use?"
+ User follows instructions given in said FAQ, yielding my example command line.

Do you mean that eth8 is the only non-loopback interface on their laptop, and 
it's disconnected?  (e.g., sitting on a train with no wifi and no wired 
ethernet)

Then OMPI would have disqualified that interface, anyway (because it wasn't up).

I think I'm missing the zen of your question... :-\



The point of the question isn't related to WHY eth8 is useless - just 
assume it is.
Assume it is UP, but useless for whatever reasons motivated writing FAQ 
#220.

It could be Terry's example of a port connected to the service processor.

The concern is what happens in this situation when the user, following 
the advice in the FAQ, passes an explicit setting for 
btl_tcp_if_exclude, which DOES NOT include virbr0?
They don't know it was there before, or that it needs to be there (the 
FAQ states that lo MUST be included).

So, by following the FAQ they don't resolve their problem.
OMPI ceases any attempt to use eth8 (or whatever), but loss of the 
implicit virbr0 from the exclude list results in their system attempting 
to use virbr0 (and thus continuing to fail).  Right?
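
To make the concern concrete: an explicit value replaces the default list, so the user has to repeat the defaults and then add their own interface. A small sketch of that merge follows. The helper name is hypothetical, and the default list "lo,virbr0" is an assumption about what the default would be if this RFC were adopted.

```sh
# Join a default exclude list with extra interfaces, skipping duplicates.
# Usage: build_if_exclude DEFAULTS EXTRA   (both comma-separated)
# Hypothetical helper; Open MPI itself does not provide this.
build_if_exclude() {
    result=$1
    old_ifs=$IFS
    IFS=,
    for iface in $2; do
        case ",$result," in
            *",$iface,"*) ;;                               # already listed
            *) result="${result:+$result,}$iface" ;;       # append new name
        esac
    done
    IFS=$old_ifs
    printf '%s\n' "$result"
}
```

The user in my example would then run something like: mpirun --mca btl_tcp_if_exclude "$(build_if_exclude lo,virbr0 eth8)" ...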


Maybe the FAQ just needs an update to address my concern.

-Paul




Re: [OMPI devel] RFC: Add "virbr0" to [btl|oob]_tcp_if_exclude?

2012-02-10 Thread Paul H. Hargrove



On 2/10/2012 12:03 PM, Jeff Squyres wrote:

On Feb 10, 2012, at 1:44 PM, Paul H. Hargrove wrote:


Since the situation described is one where the user didn't know they 
could/should disable xen, it is reasonable to think they ALSO don't know they 
need to exclude virbr0.

That's what I'm thinking.


So, I read the question as meaning the following:
 What happens when a user who doesn't know anything about virbr0 does
  mpirun --mca btl_tcp_if_exclude lo,eth8

I'm not sure I understand your question -- the above will exclude loopback and 
eth8.

(where did eth8 come from?)



Sorry, if I wasn't clear.
I'll try again:

+ User knows nothing about xen, and thus nothing about virbr0
+ User has a local-only interface (eth8 in my made up example)
+ User reads FAQ entry "220. How do I tell Open MPI which TCP networks 
to use?"
+ User follows instructions given in said FAQ, yielding my example 
command line.


-Paul




Re: [hwloc-devel] 1.3.2rc1 failures w/ icc on x86 (visibility?)

2012-02-10 Thread Paul H. Hargrove



On 2/10/2012 11:19 AM, Jeff Squyres wrote:

I'll go compare.


I already did...


HWLOC (1.3.2rc1) tries:
AC_LINK_IFELSE([AC_LANG_PROGRAM([[
__attribute__((visibility("default"))) int foo;
]],[[int i;]])],
[],
[hwloc_add=])

While OMPI (1.4.5rc5) tries:
AC_TRY_LINK([
#include <stdio.h>
__attribute__((visibility("default"))) int foo;
void bar(void) { fprintf(stderr, "bar\n"); };
],[],
[if test -s conftest.err ; then
$GREP -iq "visibility" conftest.err
if test "$?" = "0" ; then
ompi_cv_cc_fvisibility="no"
else
ompi_cv_cc_fvisibility="yes"
fi
 else
ompi_cv_cc_fvisibility="yes"
 fi],
    [ompi_cv_cc_fvisibility="no"])
])

-Paul




Re: [hwloc-devel] 1.3.2rc1 has escaped

2012-02-10 Thread Paul H. Hargrove



On 2/10/2012 11:00 AM, Jeff Squyres wrote:

Here's the final logic -- is it what you intended?


Yes, that works for me.
I pasted your version into config/hwloc.m4 on 1.3.2rc1 and faked the 
$hwloc_c_vendor setting.

The results were the same as with my version.
(Yes, I did autoreconf to make sure I was testing the right version.)

-Paul




Re: [hwloc-devel] 1.3.2rc1 failures w/ icc on x86 (visibility?)

2012-02-10 Thread Paul H. Hargrove



On 2/10/2012 11:04 AM, Jeff Squyres wrote:

It's kinda weird that icc supported the visibility stuff but gcc did not...


See my post that crossed yours in flight.
The configure logic in ompi thinks icc does NOT support visibility on 
this platform.

I think ompi is a touch smarter than hwloc in this respect.

-Paul




Re: [hwloc-devel] 1.3.2rc1 failures w/ icc on x86 (visibility?)

2012-02-10 Thread Paul H. Hargrove



On 2/10/2012 9:27 AM, Paul H. Hargrove wrote:


I have versions 8.1.032, 9.0.024 and 9.1.042 of the Intel compilers on 
a Linux/x86 (32-bit) host.
All three can configure and build hwloc-1.3.2rc1, but all are failing 
"make check" in the same way.

What I see is ton(ne)s of linker messages and every executable SEGVs.

The linker messages look like:

  CC hwloc_synthetic.o
  CCLD   hwloc_synthetic
ld: hwloc_synthetic.o(.text+0x1c): unresolvable relocation against 
symbol `hwloc_topology_init'
ld: hwloc_synthetic.o(.text+0x2a): unresolvable relocation against 
symbol `hwloc_topology_set_synthetic'
ld: hwloc_synthetic.o(.text+0x33): unresolvable relocation against 
symbol `hwloc_topology_load'
ld: hwloc_synthetic.o(.text+0x3c): unresolvable relocation against 
symbol `hwloc_topology_check'
ld: hwloc_synthetic.o(.text+0x46): unresolvable relocation against 
symbol `hwloc_topology_get_depth'
ld: hwloc_synthetic.o(.text+0x64): unresolvable relocation against 
symbol `hwloc_get_nbobjs_by_depth'
ld: hwloc_synthetic.o(.text+0x8a): unresolvable relocation against 
symbol `hwloc_get_obj_by_depth'
ld: hwloc_synthetic.o(.text+0xc6): unresolvable relocation against 
symbol `hwloc_topology_destroy'

Where most tests have far more of these.

For the moment, I am going to assume the SEGVs are a result of the 
linker problems.


As compared to gcc on the same system, the only difference in 
include/private/autogen/config.h is:

 /* Whether C compiler supports symbol visibility or not */
-#define HWLOC_C_HAVE_VISIBILITY 1
+#define HWLOC_C_HAVE_VISIBILITY 0

Where the '1' is the build with the Intel compiler.
So, my current suspicion falls on the visibility crud.
I can confirm that "HWLOC_CFLAGS = -fvisibility=hidden" in Makefile.
Other than that, I don't know where to begin looking at this problem.

-Paul



For comparison, I tried building OMPI 1.4.5rc5 with these Intel compilers.
icc-9.1.042: caused an assertion failure in ld - let's not consider this one
icc-9.0.024: PASSed "make all install check clean"
icc-8.1.032: PASSed "make all install check clean"

So, I believe that the two PASS results show that the correct 
visibility logic is "known" in ompi.
The key difference appears to be that ompi has decided NOT to use 
-fvisibility with these compilers:


== Symbol Visibility Feature

checking if icc supports -fvisibility... no
checking enable symbol visibility... no

And from the ompi-1.4.5rc5 config.log:

configure:164594: checking if icc supports -fvisibility
configure:164624: icc -o conftest -O3 -DNDEBUG -finline-functions 
-fno-strict-aliasing -restrict -pthread -fvisibility=hidden 
conftest.c -lnsl -lutil >&5

/tmp/iccFBKDBg.o: In function `bar':
conftest.c:(.text+0x26): undefined reference to `fputs'
ld: conftest: hidden symbol `fputs' isn't defined
ld: final link failed: Nonrepresentable section on output
configure:164631: $? = 1


As compared to hwloc-1.3.2rc1:

configure:8253: checking if icc supports -fvisibility
configure:8268: icc -o conftest  -fvisibility=hidden -Werror   
conftest.c >&5

configure:8268: $? = 0
configure:8279: result: yes


So, my educated guess is that one needs to (back)port the configure 
logic for visibility support.
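
As a rough shell rendering of the decision the OMPI check makes (a sketch only, with a hypothetical function name; the real logic lives in OMPI's m4 and works on conftest.err): the answer is "no" when the link fails, as in the config.log above, and also "no" when the link succeeds but the compiler's stderr mentions visibility.

```sh
# Decide whether -fvisibility looks usable.
# Usage: visibility_usable LINK_EXIT_STATUS STDERR_FILE
# Hypothetical helper for illustration; prints "yes" or "no".
visibility_usable() {
    link_status=$1
    err_file=$2
    if [ "$link_status" -ne 0 ]; then
        echo no      # link failed outright
    elif [ -s "$err_file" ] && grep -i visibility "$err_file" >/dev/null; then
        echo no      # linked, but the compiler complained about visibility
    else
        echo yes
    fi
}
```

The hwloc-1.3.2rc1 probe only compiles a variable declaration, so it never exercises the link against libc that trips up these icc versions.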


-Paul




Re: [OMPI devel] RFC: Add "virbr0" to [btl|oob]_tcp_if_exclude?

2012-02-10 Thread Paul H. Hargrove



On 2/10/2012 10:38 AM, Jeff Squyres wrote:

On Feb 10, 2012, at 1:00 PM, TERRY DONTJE wrote:


>>  Should we add "virbr0" to the default value for [btl|oob]_tcp_if_exclude?
>>  

>  What happens to that value if you then set btl_tcp_if_exclude to some value 
on the mpirun command line?

It works just fine.  I.e., if you

 mpirun --mca btl_tcp_if_exclude lo,virbr0 ...

That works like a champ.


Since the situation described is one where the user didn't know they 
could/should disable xen, it is reasonable to think they ALSO don't know 
they need to exclude virbr0.  So, I read the question as meaning the 
following:

 What happens when a user who doesn't know anything about virbr0 does
  mpirun --mca btl_tcp_if_exclude lo,eth8
And my guess is "nothing good happens".

-Paul




Re: [hwloc-devel] 1.3.2rc1 has escaped

2012-02-09 Thread Paul H. Hargrove



On 2/9/2012 2:26 PM, Paul H. Hargrove wrote:

We then test if *either* set the variable.
Sort of a double-negative. 


One of De Morgan's Laws:
   NOT (A AND B) = (NOT A) OR (NOT B)

Applied to give:
   NOT (TEST1_FAIL AND TEST2_FAIL)
  = (NOT TEST1_FAIL) OR (NOT TEST2_FAIL)
  = TEST1_PASS OR TEST2_PASS
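
The same identity in shell terms, as a small sketch with hypothetical function names. Arguments are exit statuses, so 0 means the test passed:

```sh
# "NOT (both failed)" form: succeed unless test 1 AND test 2 both failed.
either_passed() {
    if [ "$1" -ne 0 ] && [ "$2" -ne 0 ]; then
        return 1    # both failed
    fi
    return 0
}

# Equivalent "OR" form: succeed if test 1 passed OR test 2 passed.
either_passed_or_form() {
    [ "$1" -eq 0 ] || [ "$2" -eq 0 ]
}
```

The two functions agree for all four combinations of pass and fail, which is all the "double-negative" in the configure logic amounts to.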

-Paul




Re: [OMPI devel] btl/openib: get_ib_dev_distance doesn't see processes as bound if the job has been launched by srun

2012-02-09 Thread Paul H. Hargrove



On 2/9/2012 1:19 PM, Brice Goglin wrote:

So you can find out that you are "bound" by a Linux cgroup (I am not
saying Linux "cpuset" to avoid confusion) by comparing root->cpuset and
root->online_cpuset.


If I understood the problem as stated earlier in this thread the current 
code was looping over a (singleton) cpuset and not finding the 
current process to be bound to any of the cpus in the set.  For that 
case the fact that the cpuset is a singleton should already have been 
enough information to know that one is effectively bound.  Is there 
really more to this than a need for special-casing the singleton?
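
To illustrate what I mean by special-casing the singleton, here is a sketch that is neither hwloc nor OMPI code. It assumes the allowed CPUs are available as a Linux-style list such as the Cpus_allowed_list field in /proc/self/status, and the function names are hypothetical.

```sh
# Count the CPUs in a Linux-style CPU list such as "0-3,8".
cpulist_weight() {
    count=0
    old_ifs=$IFS
    IFS=,
    for range in $1; do
        first=${range%-*}    # "0-3" -> 0, "8" -> 8
        last=${range#*-}     # "0-3" -> 3, "8" -> 8
        count=$((count + last - first + 1))
    done
    IFS=$old_ifs
    echo "$count"
}

# A set holding exactly one CPU leaves the process nowhere else to run,
# so it can be treated as bound without any further checks.
is_effectively_bound() {
    [ "$(cpulist_weight "$1")" -eq 1 ]
}
```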


-Paul




Re: [hwloc-devel] 1.3.2rc1 has escaped

2012-02-09 Thread Paul H. Hargrove

Jeff,

What you have for the "Make sure..." is wrong in the same way as the one 
that was in rc1.
The problem is that the AC_COMPILE_IFELSE code tests too-few and 
too-many args together.
Since xlc makes too many an error by default, we don't notice its 
MISbehavior when given too few.
So, one needs to split the too-many and too-few tests as I did in the 
patch I sent.


I don't think we should drop that AC_COMPILE_IFELSE entirely (or rather 
we shouldn't drop the TWO once split).
If we were to encounter another Linux compiler that didn't STOP on 
too-few arguments the binding code would get silently broken again.


I was also partial to the "structure" of my patch which needed to test 
$hwloc_c_vendor only once.
This would allow adding compiler-specific logic in exactly one place if 
other cases arise.


I *do* like the way you've run the AC_COMPILE_IFELSE test AFTER adding 
the compiler-specific flag (thus confirming that it actually resolved 
the problem).  However, as noted above you will need to split the 
too-few and too-many arg tests for that to be effective.
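
The split I am asking for can be sketched in shell as follows. The function names and file names are hypothetical; the two source files are assumed to hold one bad call each (one_arg(1, 2) in the first, two_arg(3) in the second), and the compiler command is passed in by the caller.

```sh
# Succeed if the compiler command rejects (exits non-zero on) the source file.
# Usage: rejects SOURCE_FILE COMPILER [ARGS...]
rejects() {
    src=$1
    shift
    if "$@" "$src" >/dev/null 2>&1; then
        return 1    # it compiled: the bad call was NOT treated as an error
    fi
    return 0
}

# Run the too-many and too-few probes separately, so that an error on one
# cannot hide silent acceptance of the other.
# Usage: strict_arg_counts TOO_MANY_SRC TOO_FEW_SRC COMPILER [ARGS...]
strict_arg_counts() {
    too_many=$1
    too_few=$2
    shift 2
    rejects "$too_many" "$@" && rejects "$too_few" "$@"
}
```

With a single combined source file, a compiler that errors on only one of the two mistakes still fails the compile and therefore looks strict, which is exactly how the xlc behavior slipped through.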


And regarding the "older, buggy" comment:
This is a recent XLC compiler, and this behavior is NOT a bug because 
the C spec doesn't require a fatal error here.
That is why I commented (with  delimiters) on the evils of 
configure probes that try to determine how many arguments appear in a 
prototype.


-Paul

On 2/9/2012 5:08 AM, Jeff Squyres wrote:

How's this patch (against v1.3, assuming
https://svn.open-mpi.org/trac/hwloc/changeset/4285)?

Is the test that checks to see if compilers error when the wrong number of 
params are passed now moot?

Index: config/hwloc.m4
===
--- config/hwloc.m4 (revision 4285)
+++ config/hwloc.m4 (working copy)
@@ -268,22 +268,24 @@
  AS_IF([test "$HWLOC_VISIBILITY_CFLAGS" != ""],
[AC_MSG_WARN(["$HWLOC_VISIBILITY_CFLAGS" has been added to the hwloc CFLAGS])])

-# make sure the compiler returns an error code when function arg count is wrong,
-# otherwise sched_setaffinity checks may fail
+# Make sure the compiler returns an error code when function arg
+# count is wrong, otherwise sched_setaffinity checks may fail.
+# For older, buggy versions of the xlc compilers, we need to set
+# an additional compiler flag to catch these situations.
+AS_IF([test "$hwloc_c_vendor" = "ibm"],
+  [HWLOC_CFLAGS_save=$CFLAGS
+   CFLAGS="$CFLAGS -qhalt=e"])
  AC_COMPILE_IFELSE([AC_LANG_PROGRAM([[
  extern int one_arg(int x);
  extern int two_arg(int x, int y);
  int foo(void) { return one_arg(1, 2) + two_arg(3); }
  ]])], [
  AC_MSG_WARN([Your C compiler does not consider incorrect argument counts to be a fatal error.])
-if test "$hwloc_check_compiler_vendor_result" = "ibm"; then
-AC_MSG_WARN([For XLC you may try appending '-qhalt=-e' to the value of CFLAGS.])
-AC_MSG_WARN([Alternatively you may configure with a different compiler.])
-else
-AC_MSG_WARN([Please report this failure, and configure using a different C compiler if possible.])
-fi
  AC_MSG_ERROR([Cannot continue.])
  ])
+# Restore the CFLAGS if we modified them above
+AS_IF([test "$hwloc_c_vendor" = "ibm"],
+  [CFLAGS=$HWLOC_CFLAGS_save])

  #
  # Now detect support
@@ -387,6 +389,12 @@
AC_DEFINE_UNQUOTED(hwloc_thread_t, $hwloc_thread_t, [Define this to the thread ID type])
  fi

+# For older, buggy versions of the xlc compilers, we need to set
+# an additional compiler flag to catch cases where the wrong
+# number of parameters are passed.
+AS_IF([test "$hwloc_c_vendor" = "ibm"],
+  [HWLOC_CFLAGS_save=$CFLAGS
+   CFLAGS="$CFLAGS -qhalt=e"])
  _HWLOC_CHECK_DECL([sched_setaffinity], [
AC_DEFINE([HWLOC_HAVE_SCHED_SETAFFINITY], [1], [Define to 1 if glibc provides a prototype of sched_setaffinity()])
AC_MSG_CHECKING([for old prototype of sched_setaffinity])
@@ -403,6 +411,9 @@
  #define _GNU_SOURCE
  #include <sched.h>
  ]])
+# Restore the CFLAGS if we modified them above
+AS_IF([test "$hwloc_c_vendor" = "ibm"],
+      [CFLAGS=$HWLOC_CFLAGS_save])

  AC_MSG_CHECKING([for working CPU_SET])
  AC_LINK_IFELSE([




On Feb 8, 2012, at 7:47 PM, Paul H. Hargrove wrote:



On 2/8/2012 4:41 PM, Paul H. Hargrove wrote:

I do agree w/ Samuel that the BEST solution is to apply "-qhalt=e" ONLY to the test(s) 
where one expects the compiler to throw errors (rather than warnings) for function calls with 
argument counts which don't match the prototypes.  At the moment, I am 90% certain that the 
"old sched_setaffinity()" probe is the only one fit

Re: [hwloc-devel] 1.3.2rc1 has escaped

2012-02-09 Thread Paul H. Hargrove



On 2/9/2012 4:48 AM, Jeff Squyres wrote:

On Feb 8, 2012, at 6:02 PM, Paul H. Hargrove wrote:


The file config/hwloc_check_vendor.m4 that is present in trunk, is ABSENT in 
the 1.3.2rc1 tarball.
There is, correspondingly, no call to _HWLOC_C_COMPILER_VENDOR in hwloc.m4.

Correct -- we hadn't used $hwloc_c_vendor anywhere else in the 1.3 configury.


Are you sure?  It looks like my grep turned up several reads, mostly 
related to the visibility CFLAGS.

Perhaps that was just dead code from a backport?
Anyway, it looks suspicious to me.

-Paul




Re: [hwloc-devel] 1.3.2rc1 has escaped

2012-02-08 Thread Paul H. Hargrove



On 2/8/2012 4:41 PM, Paul H. Hargrove wrote:


I do agree w/ Samuel that the BEST solution is to apply "-qhalt=e" 
ONLY to the test(s) where one expects the compiler to throw errors 
(rather than warnings) for function calls with argument counts which 
don't match the prototypes.  At the moment, I am 90% certain that the 
"old sched_setaffinity()" probe is the only one fitting that description. 


I am hoping to be able to contribute a patch for this soon.
-Paul




Re: [hwloc-devel] 1.3.2rc1 has escaped

2012-02-08 Thread Paul H. Hargrove



On 2/8/2012 4:43 PM, Samuel Thibault wrote:

Paul H. Hargrove, le Thu 09 Feb 2012 01:41:47 +0100, a écrit :

On 2/8/2012 4:31 PM, Samuel Thibault wrote:

Paul H. Hargrove, le Thu 09 Feb 2012 01:28:53 +0100, a écrit :

Option #4:
CFLAGS='-qhalt=e -qsuppress=1506-077'
Appears to work for me for xlc-8.0 and xlc-9.0.

That still looks dangerous to me: we don't know whatever warning
might be added in the future. I'd rather add -qhalt=e only for the
sched_setaffinity test.

I don't recommend adding -qsuppress automatically, just documenting it for
users that need xlc-8 or xlc-9.

I'm not actually talking about the -qsuppress, but about still using
-qhalt=e, which might make a lot more other warnings fatal.



Right.
I realized that about 10 seconds after hitting SEND and was composing a 
retraction when the post above arrived.


-Paul




Re: [hwloc-devel] 1.3.2rc1 has escaped

2012-02-08 Thread Paul H. Hargrove



On 2/8/2012 4:31 PM, Samuel Thibault wrote:

Paul H. Hargrove, le Thu 09 Feb 2012 01:28:53 +0100, a écrit :

Option #4:
CFLAGS='-qhalt=e -qsuppress=1506-077'
Appears to work for me for xlc-8.0 and xlc-9.0.

That still looks dangerous to me: we don't know whatever warning
might be added in the future. I'd rather add -qhalt=e only for the
sched_setaffinity test.


I don't recommend adding -qsuppress automatically, just documenting it 
for users that need xlc-8 or xlc-9.
If nothing else, this "work-around" is now in the hwloc-devel archives 
for the search engines.

Sorry that I wasn't clear on what I meant to do with those CFLAGS.

Regarding "we don't know whatever warning might be added in the future.":
"1506-077" is the number for this particular warning about invalid 
wchar_t constants.

So this suppresses ONLY the one message and should be pretty safe.
Based on looking at the constants, this message is being issued 
ERRONEOUSLY by these compilers.


I do agree w/ Samuel that the BEST solution is to apply "-qhalt=e" ONLY 
to the test(s) where one expects the compiler to throw errors (rather 
than warnings) for function calls with argument counts which don't match 
the prototypes.  At the moment, I am 90% certain that the "old 
sched_setaffinity()" probe is the only one fitting that description.


-Paul




Re: [hwloc-devel] 1.3.2rc1 has escaped

2012-02-08 Thread Paul H. Hargrove



On 2/8/2012 8:58 AM, Jeff Squyres wrote:

Please test!

 http://www.open-mpi.org/software/hwloc/v1.3/


I have access to BG/L, BG/P, Cray-XT and Cray-XE systems.
Are there any tests I could/should consider running on those?

-Paul




Re: [hwloc-devel] 1.3.2rc1 has escaped

2012-02-08 Thread Paul H. Hargrove



On 2/8/2012 1:44 PM, Brice Goglin wrote:


Ah, we need to use $hwloc_c_vendor instead. That's where
$hwloc_check_compiler_vendor_result ends up before being cleared.



It looks like something is very wrong here:
Examining the 1.3.2rc1 tarball I seem to see $hwloc_c_vendor is read but 
NOT written!

$ grep hwloc_c_vendor configure
case "$hwloc_c_vendor" in
case "$hwloc_c_vendor" in
case "$hwloc_c_vendor" in
case "$hwloc_c_vendor" in
case "$hwloc_c_vendor" in


If this is really the case, then I can imagine visibility and other 
things going quite wrong with various compilers.
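
A quick way to separate writes from reads is to count only the lines that assign to the variable. This is a rough sketch with a hypothetical function name; the pattern catches simple NAME= assignments and would miss anything set indirectly (via eval, for example).

```sh
# Count lines in FILE that assign to shell variable NAME (NAME=...).
# Usage: count_assignments NAME FILE
count_assignments() {
    # NAME= at line start, or preceded by a character that cannot be part
    # of a longer identifier or a "$" expansion.
    grep -E -c "(^|[^A-Za-z0-9_\$])$1=" "$2"
}
```

Running count_assignments hwloc_c_vendor configure and getting 0 would match what the plain grep output above suggests.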


The file config/hwloc_check_vendor.m4 that is present in trunk, is 
ABSENT in the 1.3.2rc1 tarball.

There is, correspondingly, no call to _HWLOC_C_COMPILER_VENDOR in hwloc.m4.
Am I correct here, or have I missed something?

-Paul




Re: [hwloc-devel] 1.3.2rc1 has escaped

2012-02-08 Thread Paul H. Hargrove



On 2/8/2012 1:10 PM, Paul H. Hargrove wrote:



On 2/8/2012 8:58 AM, Jeff Squyres wrote:

* Detect when a compiler such as xlc may not report compile errors
   properly, causing some configure checks to be wrong. Thanks to
   Paul H. Hargrove for reporting the problem and providing a patch.


Looks like I botched this one!

I have added two Linux/ppc64 machines with xlc-7.0, xlc-8.0 and xlc-9.0 
to my testing.
These are NOT running on the odd virtual node that caused assertion 
failures when testing xlc-11.1. 


ARGH!!!

I've applied the patches I included, and tested on the xlc-11.1 system 
where autotools are new enough.

Everything looked fine.

Now I've had a chance to retest earlier xlc (8 and 9, which are on 2 
different machines), with the explicit CFLAGS=-qhalt=e.

The result was NOT good.

It seems that xlc dislikes some wchar constants (see below).
In a build w/ default CFLAGS they produce an "(E)" level message, but 
compilation continues to completion.

With the recommended CFLAGS=-qhalt=e these become fatal:

  CC lstopo-lstopo-text.o
"/home/hargrove/OMPI/hwloc-1.3.2rc1-linux-ppc64-xlc-9.0/hwloc-1.3.2rc1/include/hwloc.h", 
line 1203.34: 1506-1385 (W) The attribute "pure" is not a valid type 
attribute.
"/home/hargrove/OMPI/hwloc-1.3.2rc1-linux-ppc64-xlc-9.0//hwloc-1.3.2rc1/utils/lstopo-text.c", 
line 450.12: 1506-077 (E) The wchar_t value 0x250c is not valid.
"/home/hargrove/OMPI/hwloc-1.3.2rc1-linux-ppc64-xlc-9.0//hwloc-1.3.2rc1/utils/lstopo-text.c", 
line 451.12: 1506-077 (E) The wchar_t value 0x2510 is not valid.
"/home/hargrove/OMPI/hwloc-1.3.2rc1-linux-ppc64-xlc-9.0//hwloc-1.3.2rc1/utils/lstopo-text.c", 
line 452.12: 1506-077 (E) The wchar_t value 0x2514 is not valid.

[followed by another error for each case in the switch statement].

So, now I am not sure what to recommend.  Options include:
+ Don't worry about old xlc (which OMPI doesn't support, since they can't 
build the opal atomics).
+ Rig things to use -qhalt=e ONLY for configure, but not for make?
+ Punt on 1.3 and revisit later.
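
For the second option, one possible approach (a sketch, not something tested on these machines) relies on the fact that a variable given on the make command line overrides the value recorded by configure in Automake-generated Makefiles. The helper name strip_flag is hypothetical; it just removes one flag from a flag string.

```shell
#!/bin/sh
# Hypothetical helper: print the flag string FLAGS with every exact
# occurrence of FLAG removed.
# Usage: strip_flag FLAG "FLAGS"
strip_flag() {
    flag=$1
    out=
    # $2 is left unquoted on purpose so it splits into individual flags.
    for f in $2; do
        [ "$f" = "$flag" ] || out="${out:+$out }$f"
    done
    printf '%s\n' "$out"
}

# Illustrative use (commented out; the flag values are assumptions):
#   cfg_cflags="-O2 -qhalt=e"
#   ./configure CC=xlc CFLAGS="$cfg_cflags"
#   make CFLAGS="$(strip_flag -qhalt=e "$cfg_cflags")"
```

With this arrangement the configure probes would see -qhalt=e, while the compile of lstopo-text.c would not.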

By the way:
xlc-11.1 on Linux doesn't make these complaints on lstopo-lstopo-text.
Nor does xlc-6.0 on MacOS-10.3 (honest, I am not making this up).
[And, YES, both platforms define HAVE_PUTWC]


-Paul


--
Paul H. Hargrove  phhargr...@lbl.gov
Future Technologies Group
HPC Research Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900



Re: [hwloc-devel] 1.3.2rc1 has escaped

2012-02-08 Thread Paul H. Hargrove



On 2/8/2012 1:37 PM, Brice Goglin wrote:

Let's ignore this for 1.3.2. libnuma sucks, we're wasting way too much
time trying to make it sane. I'll look later if I find an easy way to
reproduce.


OK, fine by me.
I've verified that if I "disarm" that test, then the remaining tests PASS.

-Paul

--
Paul H. Hargrove  phhargr...@lbl.gov
Future Technologies Group
HPC Research Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900



Re: [hwloc-devel] 1.3.2rc1 has escaped

2012-02-08 Thread Paul H. Hargrove



On 2/8/2012 8:58 AM, Jeff Squyres wrote:

* Fix conversion from/to Linux libnuma when some NUMA nodes have no memory.


Testing on the virtual node I have access to, where that problem report 
originated, is still not quite right.

There is now a different assertion failing than I had seen before:
lt-linux-libnuma: 
/users/phh1/OMPI/hwloc-1.3.2rc1-linux-ppc64-gcc//hwloc-1.3.2rc1/tests/linux-libnuma.c:83: 
main: Assertion `!memcmp(, _all_nodes, 
sizeof(nodemask_t))' failed.

/bin/sh: line 5: 19416 Aborted ${dir}$tst
FAIL: linux-libnuma


I don't have any clue if that represents forward or backward progress.
Maybe the sanity check is just different between 1.3 and trunk?
So, I figured I had better report it.

-Paul

--
Paul H. Hargrove  phhargr...@lbl.gov
Future Technologies Group
HPC Research Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900



Re: [hwloc-devel] 1.3.2rc1 has escaped

2012-02-08 Thread Paul H. Hargrove



On 2/8/2012 8:58 AM, Jeff Squyres wrote:

* Detect when a compiler such as xlc may not report compile errors
   properly, causing some configure checks to be wrong. Thanks to
   Paul H. Hargrove for reporting the problem and providing a patch.


Looks like I botched this one!

I have added two Linux/ppc64 machines with xlc-7.0, xlc-8.0 and xlc-9.0 
to my testing.
These are NOT running on the odd virtual node that caused assertion 
failures when testing xlc-11.1.


With these new xlc versions AND the original xlc-11.1 compiler (4 
compilers on 3 different nodes) I am STILL seeing the following 
INCORRECT result:

checking for old prototype of sched_setaffinity... yes

Where gcc on the same machines correctly gives a "no" result.

Looking in config.log, I do NOT see the -qhalt=E that was discussed as 
the solution to this problem:

configure:9065: checking for old prototype of sched_setaffinity
configure:9083: xlc -c  conftest.c >&5


And, of course, I didn't see the fatal error that should have occurred 
at configure time.

So, I poked around some more in config.log and found:

configure:8338: xlc -c -q32  conftest.c >&5
"conftest.c", line 62.43: 1506-099 (S) Unexpected argument.
"conftest.c", line 62.55: 1506-098 (E) Missing argument(s).


So, what this means is that the probe I wrote for "xlc needs -qhalt=E" 
is WRONG.


The following patch tests too many arguments and too few arguments as 
distinct cases, and appears to resolve the problem for me:

--- config/hwloc.m4~2012-02-08 20:55:03.188903698 +
+++ config/hwloc.m4 2012-02-08 20:57:29.987668761 +
@@ -269,11 +269,16 @@

 # make sure the compiler returns an error code when function arg count is wrong,
 # otherwise sched_setaffinity checks may fail
+hwloc_args_check_ok=yes
 AC_COMPILE_IFELSE([AC_LANG_PROGRAM([[
 extern int one_arg(int x);
+int foo(void) { return one_arg(1, 2); }
+]])], [ hwloc_args_check_ok=no ])
+AC_COMPILE_IFELSE([AC_LANG_PROGRAM([[
 extern int two_arg(int x, int y);
-int foo(void) { return one_arg(1, 2) + two_arg(3); }
-]])], [
+int foo(void) { return two_arg(3); }
+]])], [ hwloc_args_check_ok=no ])
+AS_IF([test "$hwloc_args_check_ok" != "yes"],[
 AC_MSG_WARN([Your C compiler does not consider incorrect argument counts to be a fatal error.])
 if test "$hwloc_check_compiler_vendor_result" = "ibm"; then
 AC_MSG_WARN([For XLC you may try appending '-qhalt=-e' to the value of CFLAGS.])


With that change in place, configure stops as desired:
configure: WARNING: Your C compiler does not consider incorrect argument counts to be a fatal error.
configure: WARNING: Please report this failure, and configure using a different C compiler if possible.

configure: error: Cannot continue.


EXCEPT, that I am not seeing the "set CFLAGS..." message.
Is it possible that this check is running before 
hwloc_check_compiler_vendor_result has been set?


ALSO, the text of the (missing) message is incorrect:
284c284
< AC_MSG_WARN([For XLC you may try appending '-qhalt=-e' to the value of CFLAGS.])
---
> AC_MSG_WARN([For XLC you may try appending '-qhalt=e' to the value of CFLAGS.])


That is probably my fault, too.
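
To spell out why the original probe was wrong: with both bad calls in one file, xlc's (S) message for the extra argument already fails the compile, so the merely (E)-level "Missing argument(s)" case is never exercised on its own. The patch therefore compiles each bad call separately and requires BOTH to fail. The same logic can be sketched outside of autoconf as below; the function name and the assumption that the compiler command accepts "-c FILE" are mine, not part of hwloc.

```shell
#!/bin/sh
# Sketch of the split probe: each bad call is compiled on its own, and the
# compiler passes only if BOTH compilations fail.
# Usage: arg_count_errors_are_fatal "COMPILER COMMAND"
# Returns 0 if wrong argument counts are fatal, 1 if either probe compiled.
arg_count_errors_are_fatal() {
    cc_cmd=$1
    tmpdir=$(mktemp -d) || return 2
    result=0
    # Probe 1: too many arguments.
    printf '%s\n' 'extern int one_arg(int x);' \
        'int foo(void) { return one_arg(1, 2); }' > "$tmpdir/many.c"
    # Probe 2: too few arguments.
    printf '%s\n' 'extern int two_arg(int x, int y);' \
        'int foo(void) { return two_arg(3); }' > "$tmpdir/few.c"
    for src in many few; do
        # $cc_cmd is unquoted on purpose so "xlc -q32" splits into words.
        if (cd "$tmpdir" && $cc_cmd -c "$src.c") >/dev/null 2>&1; then
            result=1   # the compiler accepted a wrong argument count
        fi
    done
    rm -rf "$tmpdir"
    return $result
}
```

For example, "arg_count_errors_are_fatal xlc || echo 'try CFLAGS=-qhalt=e'" would mirror the warning that the patched configure is meant to print.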


-Paul

--
Paul H. Hargrove  phhargr...@lbl.gov
Future Technologies Group
HPC Research Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900



Re: [OMPI devel] 1.4.5rc5 has been released

2012-02-08 Thread Paul H. Hargrove



On 2/8/2012 11:14 AM, Paul H. Hargrove wrote:



On 2/8/2012 3:25 AM, TERRY DONTJE wrote:
+ Building w/ Solaris Studio 12.2 or 12.3 on Linux x86-64, with 
"-m32" required setting LD_LIBRARY_PATH.
Can the LD_LIBRARY_PATH be substituted with a rpath change in LDFLAGS 
of the build?


Terry sent more specific instructions for that offlist, and I am 
testing now.




I can confirm that both Solaris Studio 12.2 and 12.3 work with 
{C,CXX,F,FC}FLAGS=-m32 with the addition of LDFLAGS="-L/lib32 -R/lib32" 
on the configure line, as suggested by Terry.


I went to try 12 and 12.1 for good measure, but found that their C++ 
compilers choke on /usr/include/stdlib.h.
So, since the original error was a c++ one, I didn't pursue them any 
further.


-Paul

--
Paul H. Hargrove  phhargr...@lbl.gov
Future Technologies Group
HPC Research Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900



Re: [hwloc-devel] 1.3.2rc1 has escaped

2012-02-08 Thread Paul H. Hargrove



On 2/8/2012 9:18 AM, Samuel Thibault wrote:

Jeff Squyres, le Wed 08 Feb 2012 17:59:04 +0100, a écrit :

Please test!

 http://www.open-mpi.org/software/hwloc/v1.3/

Could somebody test it on AIX, and with xlc?

Thanks,
Samuel


No AIX, but I will hit xlc on Linux again today.
Do we care about xlc on MacOS 10.3?

-Paul

--
Paul H. Hargrove  phhargr...@lbl.gov
Future Technologies Group
HPC Research Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900



Re: [OMPI devel] 1.4.5rc5 has been released

2012-02-08 Thread Paul H. Hargrove



On 2/8/2012 3:25 AM, TERRY DONTJE wrote:
+ Building w/ Solaris Studio 12.2 or 12.3 on Linux x86-64, with 
"-m32" required setting LD_LIBRARY_PATH.
Can the LD_LIBRARY_PATH be substituted with a rpath change in LDFLAGS 
of the build?


Terry sent more specific instructions for that offlist, and I am testing 
now.


This could be either Oracle's bug in the compiler or a libtool 
problem.
My report was: 
http://www.open-mpi.org/community/lists/devel/2012/01/10272.php



I thought I responded to the above issue.


You did respond, but I didn't see any "resolution".  I apologize if I 
missed something in the past emails.


  I think this may be an OS distribution (Solaris Studio assumption) 
issue.  On my RH system /lib contains the 32-bit libraries and /lib64 has 
the 64-bit libs.  I assume your system may have it the other way 
around (/lib = 64 bit libs and /lib32 has 32 bit).  Can you confirm 
that your /lib contains 64-bit libs?  Also, can you do a "cc -### -m32" 
compile and link of a simple program and confirm that the compiler is 
pulling in /lib (I am 99% certain it is)?


YES to "/lib = 64 bit libs and /lib32 has 32 bit".  There is also a 
/lib64->/lib symlink.
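
One way to confirm which word size a given library directory holds, without relying on the "file" utility, is to read the EI_CLASS byte of the ELF header (offset 4: 1 means 32-bit, 2 means 64-bit). The sketch below is illustrative; elf_class is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: report whether FILE is a 32- or 64-bit ELF object by
# reading the EI_CLASS byte (offset 4) of the ELF header.
# Usage: elf_class FILE
elf_class() {
    # The first four bytes of any ELF file are 0x7f 'E' 'L' 'F'.
    magic=$(od -An -tx1 -N4 "$1" | tr -d ' \n')
    [ "$magic" = "7f454c46" ] || { echo "not-elf"; return 1; }
    # Byte 4 is EI_CLASS: 1 = ELFCLASS32, 2 = ELFCLASS64.
    case $(od -An -tu1 -j4 -N1 "$1" | tr -d ' \n') in
        1) echo 32 ;;
        2) echo 64 ;;
        *) echo unknown ;;
    esac
}
```

Pointing it at a shared object under /lib and at its counterpart under /lib32 (the exact file names vary by distribution) should show which directory holds which word size.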


Here is the requested verbose output:
$ cc -### -m32 hello.c
### cc: Note: NLSPATH = 
/opt/SS12u3/solarisstudio12.3/prod/bin/../lib/locale/%L/LC_MESSAGES/%N.cat:/opt/SS12u3/solarisstudio12.3/prod/bin/../../lib/locale/%L/LC_MESSAGES/%N.cat

### command line files and options (expanded):
### -# -m32 hello.c
/opt/SS12u3/solarisstudio12.3/prod/bin/acomp -Qy -Xa -xc99=%all -i 
hello.c -D__SUNPRO_C=0x5120 -D__unix -D__unix__ -Dlinux -D__linux 
-D__linux__ -D__gnu__linux__ "-D__builtin_expect(e,x)=e" -D__i386 
-D__i386__ -D__BUILTIN_VA_ARG_INCR -D__C99FEATURES__ 
-D__PRAGMA_REDEFINE_EXTNAME -Dunix -Di386 -D__RESTRICT 
-D__FLT_EVAL_METHOD__=-1 -D__SUN_PREFETCH -D__NOVECTORSIZE__ -I-xbuiltin 
-I/opt/SS12u3/solarisstudio12.3/prod/lib/compilers/rtlibs/usr/include 
-I/opt/SS12u3/solarisstudio12.3/prod/include/cc -xbuiltin=%none 
-fsimple=0 -m32 -fparam_ir -xF=%none -xdbggen=no%stabs+dwarf2+usedonly 
-xdbggen=incl -xldscope=global -xivdep=loop -c99OS 
"-g/opt/SS12u3/solarisstudio12.3/prod/bin/cc -m32 " -destination_ir=yabe 
-y-fbe -y/opt/SS12u3/solarisstudio12.3/prod/bin/fbe -y-verbose -y-comdat 
-y-xarch=generic -y-comdat -y-xthreadvar=no%dynamic -y-xannotate=no -y-o 
-yhello.o -y-s

### cc: Note: LD_LIBRARY_PATH = (null)
### cc: Note: LD_RUN_PATH = (null)
### cc: Note: LD_OPTIONS  = (null)
/usr/bin/ld -m elf_i386 -dynamic-linker /lib/ld-linux.so.2 
--enable-new-dtags /opt/SS12u3/solarisstudio12.3/prod/lib/crti.o 
/opt/SS12u3/solarisstudio12.3/prod/lib/crt1.o 
/opt/SS12u3/solarisstudio12.3/prod/lib/values-xa.o hello.o -o a.out -Y 
"/opt/SS12u3/solarisstudio12.3/prod/lib:/lib32:/usr/lib32" -Qy -lc 
/opt/SS12u3/solarisstudio12.3/prod/lib/libc_supp.a 
/opt/SS12u3/solarisstudio12.3/prod/lib/crtn.o

rm hello.o

HOWEVER, in the failing build there was the following bit of output 
showing that the system linker is NOT being used:
CC: Warning: failed to detect system linker version, falling back to 
custom linker usage



Also, is having the 64-bit libraries in /lib a common thing?  None of my 
Linux systems are set up this way.
This appears to be the default on Ubuntu (checked 3 hosts with 2 
different releases).


-Paul


--
Paul H. Hargrove  phhargr...@lbl.gov
Future Technologies Group
HPC Research Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900


