[OMPI devel] How do you generate your FAQ pages?
Sorry for the somewhat off-topic question:
What tool(s) are you using to generate web pages for your wonderfully organized FAQ?

-Paul

-- 
Paul H. Hargrove
Pronouns: he, him, his
Computer Languages & Systems Software (CLaSS) Group
Computer Science Department
Lawrence Berkeley National Laboratory
Re: [hwloc-devel] [PATCH] Use plain "inline" in C++
FWIW: GASNet makes the assumption that every C++ compiler groks "inline" and has never encountered any counter-examples.

-Paul

On 5/9/2012 8:54 PM, Christopher Samuel wrote:
> On 10/05/12 07:40, Jeff Squyres wrote:
>> Huh -- really? I always thought that the C++ language itself included the keyword "inline".
>
> I asked via Twitter and got these responses..
>
> # Inline was part of C++98 - the first c++ standard, and
> # the inline kwd is in the cfront 1.0 ('86) source. So
> # functionally, yes.
>
> ...and...
>
> # This may be a different question than "have all C++
> # compilers always accepted inline?"
>
> I note that autoconf has an inline test for C:
>
> http://www.gnu.org/software/autoconf/manual/autoconf-2.67/html_node/C-Compiler.html
>
> But not for C++:
>
> http://www.gnu.org/savannah-checkouts/gnu/autoconf/manual/autoconf-2.69/html_node/C_002b_002b-Compiler.html
>
> So perhaps the fact that they've never needed to implement such a test is in itself a good guide?
>
> cheers,
> Chris
> --
> Christopher Samuel - Senior Systems Administrator
> VLSCI - Victorian Life Sciences Computation Initiative
> Email: sam...@unimelb.edu.au Phone: +61 (0)3 903 55545
> http://www.vlsci.unimelb.edu.au/
>
> ___
> hwloc-devel mailing list
> hwloc-de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/hwloc-devel

-- 
Paul H. Hargrove                          phhargr...@lbl.gov
Future Technologies Group
HPC Research Department                   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900
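For readers who have not seen this kind of portability shim, here is a minimal sketch in C of how a header might pick an inline keyword, using the plain keyword whenever the translation unit is C++ as the patch subject suggests. DEMO_INLINE and demo_div_round_up are hypothetical names made up for this illustration; this is not hwloc's actual macro or configure logic, which is not shown in the thread.

```c
#include <assert.h>

/* Pick an inline keyword. In C++ the plain keyword is part of the
 * language, so no compiler probing is attempted in that branch. */
#if defined(__cplusplus)
#  define DEMO_INLINE inline
#elif defined(__STDC_VERSION__) && (__STDC_VERSION__ >= 199901L)
#  define DEMO_INLINE inline          /* C99 and later */
#elif defined(__GNUC__)
#  define DEMO_INLINE __inline__      /* GNU spelling for pre-C99 modes */
#else
#  define DEMO_INLINE                 /* give up: plain static function */
#endif

/* Hypothetical helper, present only to show the macro in use. */
static DEMO_INLINE int demo_div_round_up(int n, int d)
{
    assert(d > 0);
    return (n + d - 1) / d;
}
```

The point of the C++ branch is that it needs no configure-time test at all, which matches the observation in the thread that autoconf ships an inline test for C but none for C++.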
Re: [hwloc-devel] lstopo-nox strikes back
On 4/27/2012 10:39 AM, Brice Goglin wrote:
> On 27/04/2012 19:22, Samuel Thibault wrote:
>> Brice Goglin, on Fri 27 Apr 2012 19:09:47 +0200, wrote:
>>> On 25/04/2012 15:42, Jiri Hladky wrote:
>>>> I would vote to make lstopo ASCII only and introduce new GUI binary "lstopo-gui" in the version 1.5
>>>
>>> I'll commit that during the weekend unless somebody comes with a better solution. Of course, distros are free to add symlinks as Xlstopo then :)
>>
>> Xfoo is kinda reserved for X servers, not for X applications :)
>
> Ok let's put a X server inside hwloc then.

No, Xlstopo should be for showing me the logical->physical layout of screens on a multi-headed X server, right?

-Paul
Re: [hwloc-devel] BGQ empty topology with MPI
From the same machine that Dan is using:

{hargrove@cetuslac1 ~}$ mpicc -v
mpicc for MPICH2 version 1.4.1p1
[...hairy details omitted...]
gcc version 4.4.6 (BGQ-dev-120305)

-Paul

On 3/22/2012 7:43 PM, Christopher Samuel wrote:
> On 22/03/12 20:58, Brice Goglin wrote:
>> So there's something strange going on when MPI is added. Which MPI are you using? Is this a derivative of MPICH that embeds hwloc? (MPICH = 1.2.1 if I remember correctly)
>
> Not sure about BG/Q, but BG/P uses code derived from MPICH2 according to:
>
> http://wiki.bg.anl-external.org/index.php/Main_Page
>
> Our BG/P seems to claim it's from MPICH2 1.1:
>
> samuel@tambo:~> mpicc -v
> mpicc for 1.1
>
> cheers,
> Chris
Re: [OMPI devel] Open MPI nightly tarballs suspended / 1.5.5rc3
On 2/28/2012 5:09 PM, Christopher Samuel wrote:
> On 29/02/12 07:44, Jeffrey Squyres wrote:
>> - BlueGene fixes
>
> rc3 fixes the builds on our front end node, thanks!

And on a BG/L (not a typo) front-end too, where the same problem existed in prior versions.

-Paul
[OMPI devel] typo in a copyright message
By chance I noticed the following in the trunk:

Index: ompi-trunk/orte/mca/rml/oob/rml_oob_component.c
===
--- ompi-trunk/orte/mca/rml/oob/rml_oob_component.c (revision 26069)
+++ ompi-trunk/orte/mca/rml/oob/rml_oob_component.c (working copy)
@@ -11,7 +11,7 @@
  * Copyright (c) 2004-2005 The Regents of the University of California.
  *                         All rights reserved.
  * Copyright (c) 2007      Cisco Systems, Inc.  All rights reserved.
- * Copyright (c) 2011      Los Alamos Nation Security, LLC.
+ * Copyright (c) 2011      Los Alamos National Security, LLC.
  *                         All rights reserved.
  * $COPYRIGHT$
  *
Re: [OMPI devel] Open MPI nightly tarballs suspended / 1.5.5rc3
Testing 1.5.5rc3 on a "representative sampling" of my many platforms looks good. In particular, I've retested various platforms that showed any significant problems previously and found them to be fixed.

Though minor, I do see that the following patches I've posted are not applied:

+ Add a Mellanox PCI vendor ID to the device params file
  http://www.open-mpi.org/community/lists/devel/2012/02/10615.php
  Posted 13 hours ago and not yet on trunk
+ Fix show_help_lex.l to avoid undefined behavior (and silence associated warning from flex)
  http://www.open-mpi.org/community/lists/devel/2012/02/10521.php
  Was applied to trunk as r25983
+ Reorder includes to avoid "'struct in_addr' declared inside parameter list" warnings
  http://www.open-mpi.org/community/lists/devel/2012/02/10484.php
  Was applied to trunk as r25984

Sorry if I've missed an existing CMR for those last two. No big deal if these are held back for v1.6, but I figured I'd mention them in case their exclusion was unintended.

I am assuming there is no interest in the MIPS atomics fixes, or the PPC64 atomics work-around for an XLC bug.
  MIPS 1of2: http://www.open-mpi.org/community/lists/devel/2012/02/10416.php
  MIPS 2of2: http://www.open-mpi.org/community/lists/devel/2012/02/10417.php
  PPC64/XLC: http://www.open-mpi.org/community/lists/devel/2012/02/10603.php
If there *is* interest in these, let me know if there is any assistance I can lend.

-Paul

On 2/28/2012 12:44 PM, Jeffrey Squyres wrote:
> There is a serious chilled water issue at IU right now; all non-essential servers (including Open MPI's nightly build server) have been turned off. So we have no new "official" 1.5.5 RC, and no new nightlies will be produced until further notice.
>
> However, to keep the 1.5.5 release train going, I've made an "unofficial" 1.5.5rc3 and posted it in the usual location:
>
> http://www.open-mpi.org/software/ompi/v1.5/
>
> Note that since there are no nightly tarballs, this rc will be farther along than the latest 1.5 nightly until the nightlies are resumed.
>
> Changes since 1.5.5rc2:
>
> - Removed the ofud BTL
> - Updates to README and some copyright notices
> - Fix the lt_dladvise search that caused VPATH weirdness
> - Removed the pcie mpool
> - Bring in some upstream hwloc v1.3 fixes
> - VT updates:
>   - non-GNU compiler _FORTIFY_SOURCE fixes
>   - VT-specific CXXFLAGS
> - BlueGene fixes
> - Fix processor affinity for some old/weird platforms
[OMPI devel] 1.5.5rc2 missing a Mellanox PCI vendor ID
Testing 1.5.5rc2, I see warnings about an unknown IB HCA unless I make the following simple addition:

--- ompi-v1.5/ompi/mca/btl/openib/mca-btl-openib-device-params.ini (revision 26056)
+++ ompi-v1.5/ompi/mca/btl/openib/mca-btl-openib-device-params.ini (working copy)
@@ -127,7 +127,7 @@
 [Mellanox Tavor Infinihost]
-vendor_id = 0x2c9,0x5ad,0x66a,0x8f1,0x1708,0x03ba
+vendor_id = 0x2c9,0x5ad,0x66a,0x8f1,0x1708,0x03ba,0x15b3
 vendor_part_id = 23108
 use_eager_rdma = 1
 mtu = 1024

This one-line patch applies equally to v1.5 and to the trunk.

I suspect that this vendor ID should be added to the Arbel and Sinai HCA entries as well. It is already listed for Hermon.

-Paul

On 2/23/2012 5:17 AM, Jeffrey Squyres wrote:
> We finally have 1.5.5rc2:
>
> http://www.open-mpi.org/software/ompi/v1.5/
>
> Given the amount of testing we've had, this rc might actually be pretty close. Lots and lots of changes since rc1; I'm not even going to bother to list them all.
>
> Please test!
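To make the effect of the one-line change concrete, here is a small sketch in C of a membership test over a comma-separated vendor_id value. It is only an illustration under assumptions: demo_vendor_list_contains is a hypothetical helper, and nothing here is taken from how the openib BTL actually parses its device params file.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: returns 1 if `id` appears in a comma-separated
 * list of integers such as the vendor_id value above, 0 otherwise. */
static int demo_vendor_list_contains(const char *list, unsigned long id)
{
    const char *p;

    assert(list != NULL);
    p = list;
    while (*p != '\0') {
        char *end;
        /* Base 0 lets strtoul accept the 0x prefix used in the file. */
        unsigned long value = strtoul(p, &end, 0);

        if (end == p) {
            return 0;   /* malformed entry: stop rather than guess */
        }
        if (value == id) {
            return 1;
        }
        p = end;
        while (*p == ',' || *p == ' ') {
            ++p;        /* step over the separator before the next entry */
        }
    }
    return 0;
}
```

With the list as shipped in 1.5.5rc2 a lookup for 0x15b3 fails, and with the patched list it succeeds, which is the difference between the "unknown IB HCA" warning and a matched device entry.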
Re: [OMPI devel] 1.5.5rc2 atomics fail w/ xlc-9
OK, this is NOT an issue for v1.5.5, IMHO.

I was mistaken about the ppc atomics having an error that could impact builds with gcc. The problems I've seen with xlc-9.0 turn out to be just a plain xlc bug. When the asm takes as an argument the address of a signed 32-bit int, the compiler is incorrectly sign-extending the address (probably under the mistaken belief that it is manipulating the pointed-to type). For the ILP32 ABI that is not a problem. For the LP64 ABI, pointers get trashed by this incorrect operation.

The attached patch works around this bug by conditionally inserting a cast, and I believe it should apply cleanly to both the v1.5 branch (for v1.6) and to the trunk.

-Paul

On 2/24/2012 5:46 PM, Paul H. Hargrove wrote:
> Hmm, I was certain I knew what was wrong, but the tests still fail. Nobody should hold their breath waiting for my patches, but I am still investigating.
>
> *IF* I can determine that I am right about the asm allowing gcc to generate bad code then I think this is important for 1.5.5. Otherwise, I think this is a 1.6 issue.
>
> -Paul
>
> On 2/24/2012 5:19 PM, Paul H. Hargrove wrote:
>> I see now why I get "check" failures from the opal atomics w/ XLC-9.0. The inline asm is mildly incorrect and I am actually surprised gcc didn't produce bad code. Patch(es) will be sent ASAP as I think this should be fixed for 1.5.5.
>>
>> -Paul
>>
>> On 2/23/2012 8:24 PM, Paul H. Hargrove wrote:
>>> This is consistent with my findings w/ XLC (mostly on BG/L and BG/P front end nodes). None of the 7.0, 8.0, 9.0 or 11.1 versions of XLC I tested could generate correct atomics. They either failed at build time, or failed the tests in test/asm/.
>>>
>>> -Paul
>>>
>>> On 2/23/2012 8:17 PM, Christopher Samuel wrote:
>>>> On 24/02/12 15:12, Christopher Samuel wrote:
>>>>> I suspect this is irrelevant, but I got a build failure trying to compile it on our BG/P front end node (login node) with the IBM XL compilers.
>>>>
>>>> Oops, forgot how I built it..
>>>>
>>>> export PATH=/opt/ibmcmp/vac/bg/9.0/bin/:/opt/ibmcmp/vacpp/bg/9.0/bin:/opt/ibmcmp/xlf/bg/11.1/bin:$PATH
>>>> CC=xlc CXX=xlC F77=xlf ./configure && make

--- openmpi-1.5.5rc3r26035/opal/include/opal/sys/powerpc/atomic.h~ 2012-02-25 01:15:24.550922758 +
+++ openmpi-1.5.5rc3r26035/opal/include/opal/sys/powerpc/atomic.h 2012-02-25 02:37:39.229857857 +
@@ -117,6 +117,14 @@
  */
 #if OMPI_GCC_INLINE_ASSEMBLY
 
+#ifdef __xlC__
+/* work-around bizarre xlc bug in which it sign-extends
+   a pointer to a 32-bit signed integer */
+#define OPAL_ASM_ADDR(a) ((uintptr_t)a)
+#else
+#define OPAL_ASM_ADDR(a) (a)
+#endif
+
 static inline int opal_atomic_cmpset_32(volatile int32_t *addr,
                                         int32_t oldval, int32_t newval)
 {
@@ -130,7 +138,7 @@
                          "   bne-    1b         \n\t"
                          "2:"
                          : "=&r" (ret), "=m" (*addr)
-                         : "r" (addr), "r" (oldval), "r" (newval), "m" (*addr)
+                         : "r" OPAL_ASM_ADDR(addr), "r" (oldval), "r" (newval), "m" (*addr)
                          : "cc", "memory");
 
    return (ret == oldval);
@@ -249,7 +257,7 @@
                          "subfic r9,r5,0        \n\t"
                          "adde %0,r9,r5         \n\t"
                          : "=&r" (ret)
-                         : "r"(addr),
+                         : "r"OPAL_ASM_ADDR(addr),
                            "m"(oldval), "m"(newval)
                          : "r4", "r5", "r9", "cc", "memory");
 
@@ -297,7 +305,7 @@
                         "     stwcx.  %0, 0, %3    \n\t"
                         "     bne-    1b           \n\t"
                         : "=&r" (t), "=m" (*v)
-: &quo
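The failure mode described in this message can be modelled in plain C without any inline assembly. The sketch below is an assumption about the effect of the bug (the low 32 bits of an address being sign-extended to 64 bits), not a reproduction of what xlc actually emits. All demo_ and DEMO_ names are hypothetical; DEMO_ASM_ADDR only mirrors the shape of the OPAL_ASM_ADDR macro in the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Model of the suspected miscompilation: keep the low 32 bits of an
 * address and sign-extend them to 64 bits. Done with masks so that the
 * arithmetic is fully defined in portable C. */
static uint64_t demo_sign_extend_low32(uint64_t addr)
{
    uint64_t low = addr & UINT64_C(0xFFFFFFFF);

    if (low & UINT64_C(0x80000000)) {
        low |= UINT64_C(0xFFFFFFFF00000000);
    }
    return low;
}

/* Same idea as the work-around: hand the operand over as an unsigned
 * integer of pointer width, so no "pointer to signed int" type is left
 * for a compiler to misread. */
#define DEMO_ASM_ADDR(a) ((uintptr_t)(a))

/* Returns 1 when the cast preserves the address on this platform. */
static int demo_roundtrip_ok(void)
{
    int32_t value = 0;
    uintptr_t operand = DEMO_ASM_ADDR(&value);

    assert(operand != 0);
    return (void *)operand == (void *)&value;
}
```

An address that fits in 31 bits survives the modelled sign extension unchanged, which is consistent with ILP32 builds being unaffected, while a 64-bit address with bit 31 set or with any bits above bit 31 comes back as a different value.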
Re: [OMPI devel] 1.5.5rc2 atomics fail w/ xlc-9
Hmm, I was certain I knew what was wrong, but the tests still fail. Nobody should hold their breath waiting for my patches, but I am still investigating.

*IF* I can determine that I am right about the asm allowing gcc to generate bad code then I think this is important for 1.5.5. Otherwise, I think this is a 1.6 issue.

-Paul

On 2/24/2012 5:19 PM, Paul H. Hargrove wrote:
> I see now why I get "check" failures from the opal atomics w/ XLC-9.0. The inline asm is mildly incorrect and I am actually surprised gcc didn't produce bad code. Patch(es) will be sent ASAP as I think this should be fixed for 1.5.5.
>
> -Paul
>
> On 2/23/2012 8:24 PM, Paul H. Hargrove wrote:
>> This is consistent with my findings w/ XLC (mostly on BG/L and BG/P front end nodes). None of the 7.0, 8.0, 9.0 or 11.1 versions of XLC I tested could generate correct atomics. They either failed at build time, or failed the tests in test/asm/.
>>
>> -Paul
>>
>> On 2/23/2012 8:17 PM, Christopher Samuel wrote:
>>> On 24/02/12 15:12, Christopher Samuel wrote:
>>>> I suspect this is irrelevant, but I got a build failure trying to compile it on our BG/P front end node (login node) with the IBM XL compilers.
>>>
>>> Oops, forgot how I built it..
>>>
>>> export PATH=/opt/ibmcmp/vac/bg/9.0/bin/:/opt/ibmcmp/vacpp/bg/9.0/bin:/opt/ibmcmp/xlf/bg/11.1/bin:$PATH
>>> CC=xlc CXX=xlC F77=xlf ./configure && make
[OMPI devel] 1.5.5rc2 atomics fail w/ xlc-9
I see now why I get "check" failures from the opal atomics w/ XLC-9.0. The inline asm is mildly incorrect and I am actually surprised gcc didn't produce bad code. Patch(es) will be sent ASAP as I think this should be fixed for 1.5.5.

-Paul

On 2/23/2012 8:24 PM, Paul H. Hargrove wrote:
> This is consistent with my findings w/ XLC (mostly on BG/L and BG/P front end nodes). None of the 7.0, 8.0, 9.0 or 11.1 versions of XLC I tested could generate correct atomics. They either failed at build time, or failed the tests in test/asm/.
>
> -Paul
>
> On 2/23/2012 8:17 PM, Christopher Samuel wrote:
>> On 24/02/12 15:12, Christopher Samuel wrote:
>>> I suspect this is irrelevant, but I got a build failure trying to compile it on our BG/P front end node (login node) with the IBM XL compilers.
>>
>> Oops, forgot how I built it..
>>
>> export PATH=/opt/ibmcmp/vac/bg/9.0/bin/:/opt/ibmcmp/vacpp/bg/9.0/bin:/opt/ibmcmp/xlf/bg/11.1/bin:$PATH
>> CC=xlc CXX=xlC F77=xlf ./configure && make
Re: [OMPI devel] 1.5.5rc2
Sorry, I just realized there was a fair amount of context missing from my previous post:

The fix that Mattias committed as r26042 on the trunk is intended to correct the improper auto-detection of BG/P (or /L) when one is building for the front-end. My suggested --with-platform=linux is a WORK-AROUND to allow testing w/o waiting for the CMR to be processed.

-Paul

On 2/24/2012 1:14 PM, Paul H. Hargrove wrote:
> Christopher,
>
> Just wanted to note that when you build like this on the BG/P front end, VT is detecting the BG/P environment and so trying to build for the BG/P compute node, meanwhile OMPI is building for the front-end node. (Somebody correct me if I've misunderstood).
>
> So, you may want to configure with
>   --with-contrib-vt-flags="--with-platform=linux"
> to test a VT build for the Linux front-end.
>
> -Paul
>
> On 2/23/2012 8:17 PM, Christopher Samuel wrote:
>> On 24/02/12 15:12, Christopher Samuel wrote:
>>> I suspect this is irrelevant, but I got a build failure trying to compile it on our BG/P front end node (login node) with the IBM XL compilers.
>>
>> Oops, forgot how I built it..
>>
>> export PATH=/opt/ibmcmp/vac/bg/9.0/bin/:/opt/ibmcmp/vacpp/bg/9.0/bin:/opt/ibmcmp/xlf/bg/11.1/bin:$PATH
>> CC=xlc CXX=xlC F77=xlf ./configure && make
Re: [OMPI devel] 1.5.5rc2
Christopher,

Just wanted to note that when you build like this on the BG/P front end, VT is detecting the BG/P environment and so trying to build for the BG/P compute node, meanwhile OMPI is building for the front-end node. (Somebody correct me if I've misunderstood).

So, you may want to configure with
  --with-contrib-vt-flags="--with-platform=linux"
to test a VT build for the Linux front-end.

-Paul

On 2/23/2012 8:17 PM, Christopher Samuel wrote:
> On 24/02/12 15:12, Christopher Samuel wrote:
>> I suspect this is irrelevant, but I got a build failure trying to compile it on our BG/P front end node (login node) with the IBM XL compilers.
>
> Oops, forgot how I built it..
>
> export PATH=/opt/ibmcmp/vac/bg/9.0/bin/:/opt/ibmcmp/vacpp/bg/9.0/bin:/opt/ibmcmp/xlf/bg/11.1/bin:$PATH
> CC=xlc CXX=xlC F77=xlf ./configure && make
Re: [OMPI devel] 1.5.5rc2
This is consistent with my findings w/ XLC (mostly on BG/L and BG/P front end nodes). None of the 7.0, 8.0, 9.0 or 11.1 versions of XLC I tested could generate correct atomics. They either failed at build time, or failed the tests in test/asm/.

-Paul

On 2/23/2012 8:17 PM, Christopher Samuel wrote:
> On 24/02/12 15:12, Christopher Samuel wrote:
>> I suspect this is irrelevant, but I got a build failure trying to compile it on our BG/P front end node (login node) with the IBM XL compilers.
>
> Oops, forgot how I built it..
>
> export PATH=/opt/ibmcmp/vac/bg/9.0/bin/:/opt/ibmcmp/vacpp/bg/9.0/bin:/opt/ibmcmp/xlf/bg/11.1/bin:$PATH
> CC=xlc CXX=xlC F77=xlf ./configure && make
Re: [OMPI devel] v1.5 build failure w/ Solaris Studio 12.2 on Linux
Sorry folks. That was intended just for Jeff's eyes, but my fingers moved faster than my brain. No offense was intended.

-Paul

On 2/23/2012 10:01 AM, Paul H. Hargrove wrote:
> I think the VT folks get blamed often enough for build issues w/o attributing one more problem to them.
>
> On 2/23/2012 9:47 AM, Jeffrey Squyres wrote:
>> Cool; thanks for setting this straight.
Re: [OMPI devel] v1.5 build failure w/ Solaris Studio 12.2 on Linux
I think the VT folks get blamed often enough for build issues w/o attributing one more problem to them.

On 2/23/2012 9:47 AM, Jeffrey Squyres wrote:
> Cool; thanks for setting this straight.
Re: [OMPI devel] v1.5 build failure w/ Solaris Studio 12.2 on Linux
Just a minor correction. Instead of:

- The C++ part of the build (VT) is deep within the OMPI build; it works fine with the C compiler all the way up until that point

The correct facts are:

- The C++ part of the VT build requires CXXFLAGS=-library=stlport4 when using the SS12 compilers.
- Addition of that flag leads to the reported error when compiling ompi/mpi/cxx/file.cc (NOT in VT)

-Paul

On 2/23/2012 7:23 AM, Jeffrey Squyres wrote:
> Terry and I talked about this on the phone. Supporting facts (some of these are repeated from Paul's prior posts):
>
> - This happens with the C++ SS 12.2 compiler on supported Linux platforms
> - The C++ part of the build (VT) is deep within the OMPI build; it works fine with the C compiler all the way up until that point
> - /usr/include/sys/types.h typedefs u_char, and is directly included in event.h
> - So SS 12.2/C++ is somehow mucking up to make that typedef not be available
> - The upgrade from 12.2 to 12.3 is a free download
>
> This feels like a SS 12.2 C++ compiler bug to me. And it's free to upgrade to a version that does not have this problem. Hence, this has become a README note. The road to v1.5.5 just got a little shorter.
>
> On Feb 22, 2012, at 3:16 PM, Paul H. Hargrove wrote:
>> I think I have the beginning of a fix for this issue.
>>
>> I had not even noticed earlier that the error in event.h is from the C++ compiler, when compiling file.cxx in the c++ bindings. That makes the vendor-specific addition of "-library=stlport4" to CXXFLAGS quite relevant to the problem/solution.
>>
>> It eventually occurred to me that when VT's sub-configure told me to add configure arguments, I could have used --with-contrib-vt-flags to pass that ONLY to VT and perhaps NOT mess with whatever karma was providing the definition of u_char. However, when I tried that I was disappointed to find that the bit of configure logic that suggests/requires CXXFLAGS=-library=stlport4 (from ompi/contrib/vt/configure.m4) runs BEFORE the processing of --with-contrib-vt-flags. So, that was a dead end.
>>
>> So, the next idea was to look for a fix specific to stlport. I tried adding near the top of opal/event/event.h (after the WINDOWS equivalent):
>>
>> #ifdef STLPORT
>> typedef unsigned char u_char;
>> #endif
>>
>> That managed to clear up the original problem w/ SS12.2. With SS12.3, things also built fine. This suggests the typedef is not "conflicting" with whatever other defn was present.
>>
>> I think the "safety" of this needs to be examined more widely before this can be adopted. My concern is that some system could "typedef char u_char" if it has char unsigned by default, leading to a conflict. Now that would, I suppose, only be a problem if STLPORT is also defined. So, maybe I am overthinking this.
>>
>> -Paul
>>
>> On 2/21/2012 11:10 PM, Paul H. Hargrove wrote:
>>> More notes:
>>>
>>> I've tested ompi-1.5.4 and it has the same problem. So, this is NOT a regression.
>>>
>>> Terry D. has observed that Ubuntu is NOT a supported platform for the Solaris Studio compilers. So, I've reproduced on a Scientific Linux 5.5 platform (Red Hat Enterprise Linux 5.5 clone, like CentOS) to be sure that was NOT the cause.
>>>
>>> When I configure for the SS12.x compilers, I've been passing CXXFLAGS="-library=stlport4" as the VT sub-configure has informed me I should, due to something wrong with the default STL. I tried dropping that from configure, and THE BUILD WAS SUCCESSFUL.
>>>
>>> So, one has 2 choices:
>>> + build w/ SS12.2 without VT
>>> + update to SS12.3 and have VT
>>>
>>> I don't think there is sufficient reason to delay 1.5.5 for this.
>>>
>>> -Paul
>>>
>>> On 2/21/2012 4:39 PM, Paul H. Hargrove wrote:
>>>> A few things to note:
>>>>
>>>> 1) This is NOT a problem w/ the SS12.3 compilers on the same machine. So, one could say "upgrade your compiler" (a free download) and not delay 1.5.5 for this issue.
>>>>
>>>> 2) This is ONLY a problem on Linux, and not on Solaris (both SS12.2 and SS12.3 tested for x86, x86-64, Sparc/v9 and Sparc/v8plus)
>>>>
>>>> 3) Testing the trunk I DON'T see the problem with either SS12.2 or SS12.3. This is interesting, because it probably means that a u_char definition is SOMEWHERE in the headers (because libevent *is* getting built).
>>>>
>>>> Whatever else may be done, I think this should be fixed "properly" (whatever that may equate to) for 1.6. The way I see it now, it feels like OMPI is getting a definition of u_char only "by accident".
>>>>
>>>> -Paul
>>>>
>>>> On 2/21/2012 12:16 PM, Paul H. Hargrove wrote:
>>>>> Building the v1.5 branch on Linux with the Solaris Studio 12.2 compilers I see the following failure:
>>>>>
>>>>> "[srcdir]/opal/event/event.h", line 797: Error: Type name expected instead of "u_char".
>>>>> "[srcdir]/opal/event/event.h", line 798: Error: Type name expected instead of "u_char".
>>>>> "[srcdir]/opal/event/event.h", line 1184: Error: "," expected instead of "*".
>>>>>
>>>>> Where line 1184 is a prototype containing "u_char *". As far as I can find, only sever
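The guarded typedef proposed in the thread can be sketched in isolation. This is a hedged illustration only: DEMO_STLPORT stands in for whatever feature macro the real event.h change would test, and demo_u_char is used instead of u_char so the sketch cannot collide with a system header. Neither name comes from Open MPI.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical stand-in for the STLport feature macro; defined here
 * only so that the fallback branch below is actually compiled. */
#define DEMO_STLPORT 1

#ifdef DEMO_STLPORT
/* Fallback spelled as plain `unsigned char`. A second typedef of the
 * same name to the same type is accepted by C++ and C11; one to a
 * different type (for example plain `char`) is a conflict, which is
 * the concern raised in the thread. */
typedef unsigned char demo_u_char;
#endif

/* Returns 1 when the fallback type is a one-byte unsigned type. */
static int demo_u_char_is_unsigned(void)
{
    demo_u_char top = (demo_u_char)UCHAR_MAX;

    assert(sizeof(demo_u_char) == 1);
    return top > 0;
}
```

Keying the fallback on a library-specific macro keeps it out of builds that never use that library, which is why the thread concludes that a conflicting system definition would matter only when both conditions hold at once.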
Re: [OMPI devel] 1.5 supported systems
I can get exact info from my MacOS 10.7 machine later, but its gcc is llvm-gcc-4.2 IIRC. Here are my 10.5 and 10.6:

ProductName:    Mac OS X
ProductVersion: 10.5.8
BuildVersion:   9L31a
powerpc
lrwxr-xr-x 1 root wheel      7 Nov  1 2008 /usr/bin/gcc -> gcc-4.0
-r-xr-xr-x 1 root wheel 258368 Feb 19 2008 /usr/bin/gcc-3.3
-rwxr-xr-x 1 root wheel  93088 Jul 17 2008 /usr/bin/gcc-4.0
-rwxr-xr-x 1 root wheel 105680 May 18 2008 /usr/bin/gcc-4.2

ProductName:    Mac OS X
ProductVersion: 10.5.8
BuildVersion:   9L30
i386
lrwxr-xr-x 1 root wheel      7 Nov  8 2007 /usr/bin/gcc -> gcc-4.0
-rwxr-xr-x 1 root wheel  93072 Sep 23 2007 /usr/bin/gcc-4.0

ProductName:    Mac OS X
ProductVersion: 10.6.8
BuildVersion:   10K549
i386
lrwxr-xr-x 1 root wheel      7 Sep 29 2009 /usr/bin/gcc -> gcc-4.2
-rwxr-xr-x 1 root wheel  97392 May 18 2009 /usr/bin/gcc-4.0
-rwxr-xr-x 1 root wheel 166128 May 18 2009 /usr/bin/gcc-4.2

On 2/22/2012 6:13 PM, Larry Baker wrote:
> Paul,
>
> Haven't you been running Intel compilers on OS X?
>
> Also, do we have specifics about which gcc's on Mac OS X? I have (OS X 10.5.8):
>
> savaii:~ baker$ ls -l /usr/bin/gcc*
> lrwxr-xr-x 1 root wheel      7 Oct  2 2009 /usr/bin/gcc -> gcc-4.0
> -r-xr-xr-x 1 root wheel 258368 Feb 19 2008 /usr/bin/gcc-3.3
> -rwxr-xr-x 1 root wheel  93088 Feb  5 2009 /usr/bin/gcc-4.0
> -rwxr-xr-x 1 root wheel 105680 Apr 27 2009 /usr/bin/gcc-4.2
> savaii:~ baker$ ls -l /usr/bin/cc*
> lrwxr-xr-x 1 root wheel      7 Oct  2 2009 /usr/bin/cc -> gcc-4.0
> savaii:~ baker$ ls /Developer/usr/llvm-gcc-4.2/bin/*cc*
> /Developer/usr/llvm-gcc-4.2/bin/i686-apple-darwin9-llvm-gcc-4.2
> /Developer/usr/llvm-gcc-4.2/bin/llvm-gcc-4.2
> /Developer/usr/llvm-gcc-4.2/bin/powerpc-apple-darwin9-llvm-gcc-4.2
>
> Larry Baker
> US Geological Survey
> 650-329-5608
> ba...@usgs.gov
>
> On 22 Feb 2012, at 5:55 PM, Paul H. Hargrove wrote:
>> Folks at Oracle should decide, but I suspect "Solaris 10" should be updated to "Solaris 10 and 11", or just "11".
>>
>> -Paul
>>
>> On 2/22/2012 2:44 PM, Jeffrey Squyres wrote:
>>> Please verify this list of supported systems for the v1.5.5 release:
>>>
>>> - The run-time systems that are currently supported are:
>>>   - rsh / ssh
>>>   - LoadLeveler
>>>   - PBS Pro, Open PBS, Torque
>>>   - Platform LSF (v7.0.2 and later)
>>>   - SLURM
>>>   - Cray XT-3, XT-4, and XT-5
>>>   - Oracle Grid Engine (OGE) 6.1, 6.2 and open source Grid Engine
>>>   - Microsoft Windows CCP (Microsoft Windows server 2003 and 2008)
>>>
>>> - Systems that have been tested are:
>>>   - Linux (various flavors/distros), 32 bit, with gcc, and Oracle Solaris Studio 12
>>>   - Linux (various flavors/distros), 64 bit (x86), with gcc, Absoft, Intel, Portland, and Oracle Solaris Studio 12 compilers (*)
>>>   - OS X (10.5, 10.6, 10.7), 32 and 64 bit (x86_64), with gcc and Absoft compilers (*)
>>>   - Oracle Solaris 10, 32 and 64 bit (SPARC, i386, x86_64), with Oracle Solaris Studio 12
>>>
>>>   (*) Be sure to read the Compiler Notes, below.
>>>
>>> - Other systems have been lightly (but not fully tested):
>>>   - Other 64 bit platforms (e.g., Linux on PPC64)
>>>   - Microsoft Windows CCP (Microsoft Windows server 2003 and 2008); see the README.WINDOWS file.
Re: [OMPI devel] 1.5 supported systems
I have NOT been running Intel's compilers on Macs, only on Linux.
I *tried* PGI's compilers on MacOS, but that was a flop.
I have used Clang (comes w/ XCode 4.2) on MacOS, and that works for me but is not extensively tested.

-Paul

On 2/22/2012 6:13 PM, Larry Baker wrote:
> Paul,
>
> Haven't you been running Intel compilers on OS X?
>
> Also, do we have specifics about which gcc's on Mac OS X? I have (OS X 10.5.8):
>
> savaii:~ baker$ ls -l /usr/bin/gcc*
> lrwxr-xr-x 1 root wheel      7 Oct  2 2009 /usr/bin/gcc -> gcc-4.0
> -r-xr-xr-x 1 root wheel 258368 Feb 19 2008 /usr/bin/gcc-3.3
> -rwxr-xr-x 1 root wheel  93088 Feb  5 2009 /usr/bin/gcc-4.0
> -rwxr-xr-x 1 root wheel 105680 Apr 27 2009 /usr/bin/gcc-4.2
> savaii:~ baker$ ls -l /usr/bin/cc*
> lrwxr-xr-x 1 root wheel      7 Oct  2 2009 /usr/bin/cc -> gcc-4.0
> savaii:~ baker$ ls /Developer/usr/llvm-gcc-4.2/bin/*cc*
> /Developer/usr/llvm-gcc-4.2/bin/i686-apple-darwin9-llvm-gcc-4.2
> /Developer/usr/llvm-gcc-4.2/bin/llvm-gcc-4.2
> /Developer/usr/llvm-gcc-4.2/bin/powerpc-apple-darwin9-llvm-gcc-4.2
>
> Larry Baker
> US Geological Survey
> 650-329-5608
> ba...@usgs.gov
>
> On 22 Feb 2012, at 5:55 PM, Paul H. Hargrove wrote:
>> Folks at Oracle should decide, but I suspect "Solaris 10" should be updated to "Solaris 10 and 11", or just "11".
>>
>> -Paul
>>
>> On 2/22/2012 2:44 PM, Jeffrey Squyres wrote:
>>> Please verify this list of supported systems for the v1.5.5 release:
>>>
>>> - The run-time systems that are currently supported are:
>>>   - rsh / ssh
>>>   - LoadLeveler
>>>   - PBS Pro, Open PBS, Torque
>>>   - Platform LSF (v7.0.2 and later)
>>>   - SLURM
>>>   - Cray XT-3, XT-4, and XT-5
>>>   - Oracle Grid Engine (OGE) 6.1, 6.2 and open source Grid Engine
>>>   - Microsoft Windows CCP (Microsoft Windows server 2003 and 2008)
>>>
>>> - Systems that have been tested are:
>>>   - Linux (various flavors/distros), 32 bit, with gcc, and Oracle Solaris Studio 12
>>>   - Linux (various flavors/distros), 64 bit (x86), with gcc, Absoft, Intel, Portland, and Oracle Solaris Studio 12 compilers (*)
>>>   - OS X (10.5, 10.6, 10.7), 32 and 64 bit (x86_64), with gcc and Absoft compilers (*)
>>>   - Oracle Solaris 10, 32 and 64 bit (SPARC, i386, x86_64), with Oracle Solaris Studio 12
>>>
>>>   (*) Be sure to read the Compiler Notes, below.
>>>
>>> - Other systems have been lightly (but not fully tested):
>>>   - Other 64 bit platforms (e.g., Linux on PPC64)
>>>   - Microsoft Windows CCP (Microsoft Windows server 2003 and 2008); see the README.WINDOWS file.
Re: [OMPI devel] 1.5 supported systems
Folks at Oracle should decide, but I suspect "Solaris 10" should be updated to "Solaris 10 and 11", or just "11". -Paul On 2/22/2012 2:44 PM, Jeffrey Squyres wrote: Please verify this list of supported systems for the v1.5.5 release: - The run-time systems that are currently supported are: - rsh / ssh - LoadLeveler - PBS Pro, Open PBS, Torque - Platform LSF (v7.0.2 and later) - SLURM - Cray XT-3, XT-4, and XT-5 - Oracle Grid Engine (OGE) 6.1, 6.2 and open source Grid Engine - Microsoft Windows CCP (Microsoft Windows server 2003 and 2008) - Systems that have been tested are: - Linux (various flavors/distros), 32 bit, with gcc, and Oracle Solaris Studio 12 - Linux (various flavors/distros), 64 bit (x86), with gcc, Absoft, Intel, Portland, and Oracle Solaris Studio 12 compilers (*) - OS X (10.5, 10.6, 10.7), 32 and 64 bit (x86_64), with gcc and Absoft compilers (*) - Oracle Solaris 10, 32 and 64 bit (SPARC, i386, x86_64), with Oracle Solaris Studio 12 (*) Be sure to read the Compiler Notes, below. - Other systems have been lightly (but not fully tested): - Other 64 bit platforms (e.g., Linux on PPC64) - Microsoft Windows CCP (Microsoft Windows server 2003 and 2008); see the README.WINDOWS file. -- Paul H. Hargrove phhargr...@lbl.gov Future Technologies Group HPC Research Department Tel: +1-510-495-2352 Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
Re: [OMPI devel] v1.5 build failure w/ Solaris Studio 12.2 on Linux
I think I have the beginning of a fix for this issue. I had not even noticed earlier that the error in event.h is from the C++ compiler, when compiling file.cxx in the c++ bindings. That makes the vendor-specific addition of "-library=stlport4" to CXXFLAGS quite relevant to the problem/solution. It eventually occurred to me that when VT's sub-configure told me to add configure arguments, I could have used --with-contrib-vt-flags to pass that ONLY to VT and perhaps NOT mess with whatever karma was providing the definition of u_char. However, when I tried that I was disappointed to find that the bit of configure logic that suggests/requires CXXFLAGS=-library=stlport4 (from ompi/contrib/vt/configure.m4) runs BEFORE the processing of --with-contrib-vt-flags. So, that was a dead end. So, the next idea was to look for a fix specific to stlport. I tried adding near the top of opal/event/event.h (after the WINDOWS equivalent):
    #ifdef STLPORT
    typedef unsigned char u_char;
    #endif
That managed to clear up the original problem w/ SS12.2. With SS12.3, things also built fine. This suggests the typedef is not "conflicting" with whatever other defn was present. I think the "safety" of this needs to be examined more widely before this can be adopted. My concern is that some system could "typedef char u_char" if it has char unsigned by default, leading to a conflict. Now that would, I suppose, only be a problem if STLPORT is also defined. So, maybe I am overthinking this. -Paul On 2/21/2012 11:10 PM, Paul H. Hargrove wrote: More notes: I've tested ompi-1.5.4 and it has the same problem. So, this is NOT a regression. Terry D. has observed that Ubuntu is NOT a supported platform for the Solaris Studio compilers. So, I've reproduced on a Scientific Linux 5.5 platform (Red Hat Enterprise Linux 5.5 clone, like CentOS) to be sure that was NOT the cause. 
When I configure for the SS12.x compilers, I've been passing CXXFLAGS="-library=stlport4" as the VT sub-configure has informed me I should, due to something wrong with the default STL. I tried dropping that from configure, and THE BUILD WAS SUCCESSFUL. So, one has 2 choices: + build w/ SS12.2 without VT + update to SS12.3 and have VT I don't think there is sufficient reason to delay 1.5.5 for this. -Paul On 2/21/2012 4:39 PM, Paul H. Hargrove wrote: A few things to note: 1) This is NOT a problem w/ the SS12.3 compilers on the same machine. So, one could say "upgrade your compiler" (a free download) and not delay 1.5.5 for this issue. 2) This is ONLY a problem on Linux, and not on Solaris (both SS12.2 and SS12.3 tested for x86, x86-64, Sparc/v9 and Sparc/v8plus) 3) Testing the trunk I DON'T see the problem with either SS12.2 or SS12.3. This is interesting, because it probably means that a u_char definition is SOMEWHERE in the headers (because libevent *is* getting built). Whatever else may be done, I think this should be fixed "properly" (whatever that may equate to) for 1.6. The way I see it now, it feels like OMPI is getting a definition of u_char only "by accident". -Paul On 2/21/2012 12:16 PM, Paul H. Hargrove wrote: Building the v1.5 branch on Linux with the Solaris Studio 12.2 compilers I see the following failure: "[srcdir]/opal/event/event.h", line 797: Error: Type name expected instead of "u_char". "[srcdir]/opal/event/event.h", line 798: Error: Type name expected instead of "u_char". "[srcdir]/opal/event/event.h", line 1184: Error: "," expected instead of "*". Where line 1184 is a prototype containing "u_char *". As far as I can find, only several files below opal/event/ contain any use of "u_char". There is a typedef for u_char in hwloc, but no use that I could see. To the best of my knowledge u_char is NOT defined by any standard, and thus there is no particular header one can reliably find it in. 
The alternatives, of course, are "unsigned char" or "uint8_t" (defined in stdint.h). I had a look at the trunk and VISUALLY it appears the same problem exists in: opal/event/event.h opal/mca/event/libevent2013/libevent/event.h However, my testing is currently confined to the v1.5 branch in the hopes of finally getting the next 1.5.5rc out the door. -Paul -- Paul H. Hargrove phhargr...@lbl.gov Future Technologies Group HPC Research Department Tel: +1-510-495-2352 Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
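For illustration, the standard-type alternative mentioned in the original report might look roughly like the following in C. This is only a sketch under stated assumptions: `ev_byte` and `ev_byte_sum` are hypothetical names made up for the example, not identifiers from libevent or Open MPI, and nothing here is the change that was actually committed.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical alias: uint8_t comes from <stdint.h> (C99), so the code does
 * not depend on some system header happening to define the non-standard
 * u_char type. */
typedef uint8_t ev_byte;

/* Hypothetical function of the kind whose prototype used "u_char *":
 * it adds up the bytes in a buffer. */
unsigned int ev_byte_sum(const ev_byte *buf, size_t len)
{
    unsigned int sum = 0;
    size_t i;

    for (i = 0; i < len; i++) {
        sum += buf[i];
    }
    return sum;
}
```

Because the alias is private to the code that uses it, it should not collide with a u_char definition supplied (or not supplied) by a particular compiler or C++ library configuration.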
Re: [OMPI devel] v1.5 build failure w/ Solaris Studio 12.2 on Linux
More notes: I've tested ompi-1.5.4 and it has the same problem. So, this is NOT a regression. Terry D. has observed that Ubuntu is NOT a supported platform for the Solaris Studio compilers. So, I've reproduced on a Scientific Linux 5.5 platform (Red Hat Enterprise Linux 5.5 clone, like CentOS) to be sure that was NOT the cause. When I configure for the SS12.x compilers, I've been passing CXXFLAGS="-library=stlport4" as the VT sub-configure has informed me I should, due to something wrong with the default STL. I tried dropping that from configure, and THE BUILD WAS SUCCESSFUL. So, one has 2 choices: + build w/ SS12.2 without VT + update to SS12.3 and have VT I don't think there is sufficient reason to delay 1.5.5 for this. -Paul On 2/21/2012 4:39 PM, Paul H. Hargrove wrote: A few things to note: 1) This is NOT a problem w/ the SS12.3 compilers on the same machine. So, one could say "upgrade your compiler" (a free download) and not delay 1.5.5 for this issue. 2) This is ONLY a problem on Linux, and not on Solaris (both SS12.2 and SS12.3 tested for x86, x86-64, Sparc/v9 and Sparc/v8plus) 3) Testing the trunk I DON'T see the problem with either SS12.2 or SS12.3. This is interesting, because it probably means that a u_char definition is SOMEWHERE in the headers (because libevent *is* getting built). Whatever else may be done, I think this should be fixed "properly" (whatever that may equate to) for 1.6. The way I see it now, it feels like OMPI is getting a definition of u_char only "by accident". -Paul On 2/21/2012 12:16 PM, Paul H. Hargrove wrote: Building the v1.5 branch on Linux with the Solaris Studio 12.2 compilers I see the following failure: "[srcdir]/opal/event/event.h", line 797: Error: Type name expected instead of "u_char". "[srcdir]/opal/event/event.h", line 798: Error: Type name expected instead of "u_char". "[srcdir]/opal/event/event.h", line 1184: Error: "," expected instead of "*". Where line 1184 is a prototype containing "u_char *". 
As far as I can find, only several files below opal/event/ contain any use of "u_char". There is a typedef for u_char in hwloc, but no use that I could see. To the best of my knowledge u_char is NOT defined by any standard, and thus there is no particular header one can reliably find it in. The alternatives, of course, are "unsigned char" or "uint8_t" (defined in stdint.h). I had a look at the trunk and VISUALLY it appears the same problem exists in: opal/event/event.h opal/mca/event/libevent2013/libevent/event.h However, my testing is currently confined to the v1.5 branch in the hopes of finally getting the next 1.5.5rc out the door. -Paul -- Paul H. Hargrove phhargr...@lbl.gov Future Technologies Group HPC Research Department Tel: +1-510-495-2352 Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
Re: [OMPI devel] v1.5 r25914 DOA
My build with the "2011_sp1.8.273" Intel compilers passes the same tests as I detailed below for "2011_sp1.7.256". I don't suspect any longer that the compiler is at fault, but am willing to try additional/alternate tests to help confirm. -Paul On 2/21/2012 5:40 PM, Paul H. Hargrove wrote: Here are the first of the results of the testing I promised. I am not 100% sure how to reach the code that Eugene reported as problematic, so I tried just running the ring test with various -bind-to-* options. I am quite willing to run additional test cases. All runs are w/ OMPI_MCA_btl=sm,self. + 2011.5.220 FAIL: "make check" fails opal_datatype_test OK: mpirun -np 2 ./ring_c OK: mpirun -np 2 -bind-to-none ./ring_c OK: mpirun -np 2 -bind-to-core ./ring_c OK: mpirun -np 2 -bind-to-socket ./ring_c + 2011_sp1.7.256 OK: "make check" OK: mpirun -np 2 -bind-to-none ./ring_c OK: mpirun -np 2 -bind-to-core ./ring_c OK: mpirun -np 2 -bind-to-socket ./ring_c So, I don't think the "2011_sp1.7.256" compilers are broken (and are "better" than the ones I've been using). I have a build with "2011_sp1.8.273" churning away right now (est. 45 minutes to complete - should have disabled the Fortran bindings) If there is something other than the -bind-to-* flags I should be using to reach the problematic code, let me know. But based on what I've seen so far, I think we can probably rule out the compiler as the problem. -Paul On 2/21/2012 4:37 PM, Paul H. Hargrove wrote: I have been testing v1.5 with slightly older Intel "composerxe-2011.5.220" compilers. I see a "make check" failure in opal_datatype_test which is not present with any other compiler (such as gcc on the same node). This has been seen most recently on the 1.5.5rc2r25990 tarball generated earlier today. With "make check -k" I can confirm that opal_datatype_test is the ONLY failure I see with this compiler. So, I have just assumed this was a buggy compiler and thought nothing more of it. 
I have not yet tested them, but also have the same "composer_xe_2011_sp1.7.256" compiler and a more recent "composer_xe_2011_sp1.8.273". I will test both ASAP and report back with my findings. -Paul On 2/21/2012 4:20 PM, Eugene Loh wrote: We have some amount of MTT testing going on every night and on ONE of our systems v1.5 has been dead since r25914. The system is Linux burl-ct-v20z-10 2.6.9-67.ELsmp #1 SMP Wed Nov 7 13:56:44 EST 2007 x86_64 x86_64 x86_64 GNU/Linux and I'm encountering the problem with Intel (composer_xe_2011_sp1.7.256) compilers. I haven't poked around enough yet to figure out what the problematic characteristic of this configuration is. In r25914, orte/mca/odls/base/odls_base_open.c, we get
222     /* get the number of local sockets unless we were given a number */
223     if (0 == orte_default_num_sockets_per_board) {
224         opal_paffinity_base_get_socket_info(&orte_odls_globals.num_sockets);
225     }
226     /* get the number of local processors */
227     opal_paffinity_base_get_processor_info(&orte_odls_globals.num_processors);
228     /* compute the base number of cores/socket, if not given */
229     if (0 == orte_default_num_cores_per_socket) {
230         orte_odls_globals.num_cores_per_socket = orte_odls_globals.num_processors / orte_odls_globals.num_sockets;
231     }
Well, we execute the branch at line 224, but num_sockets remains 0. This leads to the divide-by-0 at line 230. Digging deeper, the call at line 224 led us to opal/mca/paffinity/hwloc/paffinity_hwloc_module.c (lots of stuff left out):
static int module_get_socket_info(int *num_sockets)
{
    hwloc_topology_t *t = _hwloc_topology;
    *num_sockets = (int) hwloc_get_nbobjs_by_type(*t, HWLOC_OBJ_SOCKET);
    return OPAL_SUCCESS;
}
Anyhow, SOCKET is somehow an unknown layer, so num_sockets is returning 0. I can poke around more, but does someone want to advise? _______ devel mailing list de...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/devel -- Paul H. 
Hargrove phhargr...@lbl.gov Future Technologies Group HPC Research Department Tel: +1-510-495-2352 Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
Re: [OMPI devel] v1.5 r25914 DOA
Here are the first of the results of the testing I promised. I am not 100% sure how to reach the code that Eugene reported as problematic, so I tried just running the ring test with various -bind-to-* options. I am quite willing to run additional test cases. All runs are w/ OMPI_MCA_btl=sm,self. + 2011.5.220 FAIL: "make check" fails opal_datatype_test OK: mpirun -np 2 ./ring_c OK: mpirun -np 2 -bind-to-none ./ring_c OK: mpirun -np 2 -bind-to-core ./ring_c OK: mpirun -np 2 -bind-to-socket ./ring_c + 2011_sp1.7.256 OK: "make check" OK: mpirun -np 2 -bind-to-none ./ring_c OK: mpirun -np 2 -bind-to-core ./ring_c OK: mpirun -np 2 -bind-to-socket ./ring_c So, I don't think the "2011_sp1.7.256" compilers are broken (and are "better" than the ones I've been using). I have a build with "2011_sp1.8.273" churning away right now (est. 45 minutes to complete - should have disabled the Fortran bindings) If there is something other than the -bind-to-* flags I should be using to reach the problematic code, let me know. But based on what I've seen so far, I think we can probably rule out the compiler as the problem. -Paul On 2/21/2012 4:37 PM, Paul H. Hargrove wrote: I have been testing v1.5 with slightly older Intel "composerxe-2011.5.220" compilers. I see a "make check" failure in opal_datatype_test which is not present with any other compiler (such as gcc on the same node). This has been seen most recently on the 1.5.5rc2r25990 tarball generated earlier today. With "make check -k" I can confirm that opal_datatype_test is the ONLY failure I see with this compiler. So, I have just assumed this was a buggy compiler and thought nothing more of it. I have not yet tested them, but also have the same "composer_xe_2011_sp1.7.256" compiler and a more recent "composer_xe_2011_sp1.8.273". I will test both ASAP and report back with my findings. 
-Paul On 2/21/2012 4:20 PM, Eugene Loh wrote: We have some amount of MTT testing going on every night and on ONE of our systems v1.5 has been dead since r25914. The system is Linux burl-ct-v20z-10 2.6.9-67.ELsmp #1 SMP Wed Nov 7 13:56:44 EST 2007 x86_64 x86_64 x86_64 GNU/Linux and I'm encountering the problem with Intel (composer_xe_2011_sp1.7.256) compilers. I haven't poked around enough yet to figure out what the problematic characteristic of this configuration is. In r25914, orte/mca/odls/base/odls_base_open.c, we get
222     /* get the number of local sockets unless we were given a number */
223     if (0 == orte_default_num_sockets_per_board) {
224         opal_paffinity_base_get_socket_info(&orte_odls_globals.num_sockets);
225     }
226     /* get the number of local processors */
227     opal_paffinity_base_get_processor_info(&orte_odls_globals.num_processors);
228     /* compute the base number of cores/socket, if not given */
229     if (0 == orte_default_num_cores_per_socket) {
230         orte_odls_globals.num_cores_per_socket = orte_odls_globals.num_processors / orte_odls_globals.num_sockets;
231     }
Well, we execute the branch at line 224, but num_sockets remains 0. This leads to the divide-by-0 at line 230. Digging deeper, the call at line 224 led us to opal/mca/paffinity/hwloc/paffinity_hwloc_module.c (lots of stuff left out):
static int module_get_socket_info(int *num_sockets)
{
    hwloc_topology_t *t = _hwloc_topology;
    *num_sockets = (int) hwloc_get_nbobjs_by_type(*t, HWLOC_OBJ_SOCKET);
    return OPAL_SUCCESS;
}
Anyhow, SOCKET is somehow an unknown layer, so num_sockets is returning 0. I can poke around more, but does someone want to advise? ___ devel mailing list de...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/devel -- Paul H. Hargrove phhargr...@lbl.gov Future Technologies Group HPC Research Department Tel: +1-510-495-2352 Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
Re: [OMPI devel] v1.5 build failure w/ Solaris Studio 12.2 on Linux
A few things to note: 1) This is NOT a problem w/ the SS12.3 compilers on the same machine. So, one could say "upgrade your compiler" (a free download) and not delay 1.5.5 for this issue. 2) This is ONLY a problem on Linux, and not on Solaris (both SS12.2 and SS12.3 tested for x86, x86-64, Sparc/v9 and Sparc/v8plus) 3) Testing the trunk I DON'T see the problem with either SS12.2 or SS12.3. This is interesting, because it probably means that a u_char definition is SOMEWHERE in the headers (because libevent *is* getting built). Whatever else may be done, I think this should be fixed "properly" (whatever that may equate to) for 1.6. The way I see it now, it feels like OMPI is getting a definition of u_char only "by accident". -Paul On 2/21/2012 12:16 PM, Paul H. Hargrove wrote: Building the v1.5 branch on Linux with the Solaris Studio 12.2 compilers I see the following failure: "[srcdir]/opal/event/event.h", line 797: Error: Type name expected instead of "u_char". "[srcdir]/opal/event/event.h", line 798: Error: Type name expected instead of "u_char". "[srcdir]/opal/event/event.h", line 1184: Error: "," expected instead of "*". Where line 1184 is a prototype containing "u_char *". As far as I can find, only several files below opal/event/ contain any use of "u_char". There is a typedef for u_char in hwloc, but no use that I could see. To the best of my knowledge u_char is NOT defined by any standard, and thus there is no particular header one can reliably find it in. The alternatives, of course, are "unsigned char" or "uint8_t" (defined in stdint.h). I had a look at the trunk and VISUALLY it appears the same problem exists in: opal/event/event.h opal/mca/event/libevent2013/libevent/event.h However, my testing is currently confined to the v1.5 branch in the hopes of finally getting the next 1.5.5rc out the door. -Paul -- Paul H. 
Hargrove phhargr...@lbl.gov Future Technologies Group HPC Research Department Tel: +1-510-495-2352 Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
Re: [OMPI devel] v1.5 r25914 DOA
I have been testing v1.5 with slightly older Intel "composerxe-2011.5.220" compilers. I see a "make check" failure in opal_datatype_test which is not present with any other compiler (such as gcc on the same node). This has been seen most recently on the 1.5.5rc2r25990 tarball generated earlier today. With "make check -k" I can confirm that opal_datatype_test is the ONLY failure I see with this compiler. So, I have just assumed this was a buggy compiler and thought nothing more of it. I have not yet tested them, but also have the same "composer_xe_2011_sp1.7.256" compiler and a more recent "composer_xe_2011_sp1.8.273". I will test both ASAP and report back with my findings. -Paul On 2/21/2012 4:20 PM, Eugene Loh wrote: We have some amount of MTT testing going on every night and on ONE of our systems v1.5 has been dead since r25914. The system is Linux burl-ct-v20z-10 2.6.9-67.ELsmp #1 SMP Wed Nov 7 13:56:44 EST 2007 x86_64 x86_64 x86_64 GNU/Linux and I'm encountering the problem with Intel (composer_xe_2011_sp1.7.256) compilers. I haven't poked around enough yet to figure out what the problematic characteristic of this configuration is. In r25914, orte/mca/odls/base/odls_base_open.c, we get
222     /* get the number of local sockets unless we were given a number */
223     if (0 == orte_default_num_sockets_per_board) {
224         opal_paffinity_base_get_socket_info(&orte_odls_globals.num_sockets);
225     }
226     /* get the number of local processors */
227     opal_paffinity_base_get_processor_info(&orte_odls_globals.num_processors);
228     /* compute the base number of cores/socket, if not given */
229     if (0 == orte_default_num_cores_per_socket) {
230         orte_odls_globals.num_cores_per_socket = orte_odls_globals.num_processors / orte_odls_globals.num_sockets;
231     }
Well, we execute the branch at line 224, but num_sockets remains 0. This leads to the divide-by-0 at line 230. 
Digging deeper, the call at line 224 led us to opal/mca/paffinity/hwloc/paffinity_hwloc_module.c (lots of stuff left out):
static int module_get_socket_info(int *num_sockets)
{
    hwloc_topology_t *t = _hwloc_topology;
    *num_sockets = (int) hwloc_get_nbobjs_by_type(*t, HWLOC_OBJ_SOCKET);
    return OPAL_SUCCESS;
}
Anyhow, SOCKET is somehow an unknown layer, so num_sockets is returning 0. I can poke around more, but does someone want to advise? ___ devel mailing list de...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/devel -- Paul H. Hargrove phhargr...@lbl.gov Future Technologies Group HPC Research Department Tel: +1-510-495-2352 Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
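Independent of why the socket level is missing, the divide-by-0 in Eugene's report could be avoided with a guard along these lines. This is a minimal sketch, not the fix that went into Open MPI: `cores_per_socket` is a hypothetical helper, and treating an unknown socket count as a single socket is an assumption about what a reasonable fallback would be.

```c
/* Hypothetical helper (not part of ORTE): compute cores per socket while
 * tolerating a topology query that reports no sockets. As far as I know,
 * hwloc_get_nbobjs_by_type() can return 0 when no level of the requested
 * type exists, so a zero (or negative) count has to be handled here. */
int cores_per_socket(int num_processors, int num_sockets)
{
    if (num_sockets <= 0) {
        /* Assumed fallback: treat the whole node as one socket. */
        num_sockets = 1;
    }
    return num_processors / num_sockets;
}
```

With a guard like this, a node reporting 8 processors and 0 sockets would yield 8 cores per socket instead of crashing, at the cost of hiding the underlying topology-detection problem.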
[OMPI devel] v1.5 build failure w/ Solaris Studio 12.2 on Linux
Building the v1.5 branch on Linux with the Solaris Studio 12.2 compilers I see the following failure: "[srcdir]/opal/event/event.h", line 797: Error: Type name expected instead of "u_char". "[srcdir]/opal/event/event.h", line 798: Error: Type name expected instead of "u_char". "[srcdir]/opal/event/event.h", line 1184: Error: "," expected instead of "*". Where line 1184 is a prototype containing "u_char *". As far as I can find, only several files below opal/event/ contain any use of "u_char". There is a typedef for u_char in hwloc, but no use that I could see. To the best of my knowledge u_char is NOT defined by any standard, and thus there is no particular header one can reliably find it in. The alternatives, of course, are "unsigned char" or "uint8_t" (defined in stdint.h). I had a look at the trunk and VISUALLY it appears the same problem exists in: opal/event/event.h opal/mca/event/libevent2013/libevent/event.h However, my testing is currently confined to the v1.5 branch in the hopes of finally getting the next 1.5.5rc out the door. -Paul -- Paul H. Hargrove phhargr...@lbl.gov Future Technologies Group HPC Research Department Tel: +1-510-495-2352 Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
Re: [OMPI devel] non-portable code in examples/Makefile
Thanks, Ralph. Excellent point about not needing to use the "FC" name with its special (absurd?) behavior. -Paul On 2/21/2012 1:52 AM, Ralph Castain wrote: I went ahead and applied this, with a tweak. There is no reason to call our flag "FC" as all we use it for is to call the right wrapper. So I renamed it to something less problematic. On Feb 21, 2012, at 1:20 AM, Paul H. Hargrove wrote: And while we are looking at examples/Makefile on Solaris-10, why are the F77 examples getting built w/ mpif90? Because w/ the Solaris make setting FC also silently sets F77 (yes, I am NOT kidding)! So, reordering the F77= and FC= lines in Makefile resolves that mis-behavior. Attached is my patch to fix both F77/FC and the "better" ompi_info queries mentioned in my previous post. This REPLACES the patch in the previous post. -Paul On 2/20/2012 11:36 PM, Paul H. Hargrove wrote: The addition on Monday of the Java cases to examples/Makefile has shown that the default "make" in Solaris-10 will stop on the first failed grep command in the "all" rule:
$ make
mpicc -g -o hello_c hello_c.c
mpicc -g -o ring_c ring_c.c
mpicc -g -o connectivity_c connectivity_c.c
mpic++ -g -o hello_cxx hello_cxx.cc
mpic++ -g -o ring_cxx ring_cxx.cc
mpif90 -g hello_f77.f -o hello_f77
mpif90 -g ring_f77.f -o ring_f77
mpif90 -g hello_f90.f90 -o hello_f90
mpif90 -g ring_f90.f90 -o ring_f90
*** Error code 1
The following command caused the error:
if test "`ompi_info --parsable | grep bindings:java:yes`" != ""; then \
make Hello.class; \
fi
make: Fatal error: Command failed for target `all'
The addition of java did NOT break anything, but exposed a pre-existing problem which was not evident in my prior testing because all language bindings were being built prior to adding java. The attached patch resolves the problem in my (admittedly minimal) testing with the smallest possible change. 
However, an entirely different approach avoids both "test" and "true" and simply looks like:
@ if ompi_info --parsable | grep bindings:cxx:yes >/dev/null; then
I have *also* tested that approach, and it works fine too. I *did* warn that the introduction of the java bindings would bring collateral damage. I just didn't anticipate encountering it personally. -Paul ___ devel mailing list de...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/devel -- Paul H. Hargrove phhargr...@lbl.gov Future Technologies Group HPC Research Department Tel: +1-510-495-2352 Lawrence Berkeley National Laboratory Fax: +1-510-486-6900 ___ devel mailing list de...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/devel ___ devel mailing list de...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/devel -- Paul H. Hargrove phhargr...@lbl.gov Future Technologies Group HPC Research Department Tel: +1-510-495-2352 Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
Re: [OMPI devel] non-portable code in examples/Makefile
And while we are looking at examples/Makefile on Solaris-10, why are the F77 examples getting built w/ mpif90? Because w/ the Solaris make setting FC also silently sets F77 (yes, I am NOT kidding)! So, reordering the F77= and FC= lines in Makefile resolves that mis-behavior. Attached is my patch to fix both F77/FC and the "better" ompi_info queries mentioned in my previous post. This REPLACES the patch in the previous post. -Paul On 2/20/2012 11:36 PM, Paul H. Hargrove wrote: The addition on Monday of the Java cases to examples/Makefile has shown that the default "make" in Solaris-10 will stop on the first failed grep command in the "all" rule:
$ make
mpicc -g -o hello_c hello_c.c
mpicc -g -o ring_c ring_c.c
mpicc -g -o connectivity_c connectivity_c.c
mpic++ -g -o hello_cxx hello_cxx.cc
mpic++ -g -o ring_cxx ring_cxx.cc
mpif90 -g hello_f77.f -o hello_f77
mpif90 -g ring_f77.f -o ring_f77
mpif90 -g hello_f90.f90 -o hello_f90
mpif90 -g ring_f90.f90 -o ring_f90
*** Error code 1
The following command caused the error:
if test "`ompi_info --parsable | grep bindings:java:yes`" != ""; then \
make Hello.class; \
fi
make: Fatal error: Command failed for target `all'
The addition of java did NOT break anything, but exposed a pre-existing problem which was not evident in my prior testing because all language bindings were being built prior to adding java. The attached patch resolves the problem in my (admittedly minimal) testing with the smallest possible change. However, an entirely different approach avoids both "test" and "true" and simply looks like:
@ if ompi_info --parsable | grep bindings:cxx:yes >/dev/null; then
I have *also* tested that approach, and it works fine too. I *did* warn that the introduction of the java bindings would bring collateral damage. I just didn't anticipate encountering it personally. -Paul ___ devel mailing list de...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/devel -- Paul H. 
Hargrove phhargr...@lbl.gov Future Technologies Group HPC Research Department Tel: +1-510-495-2352 Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
Index: Makefile
===
--- Makefile (revision 25980)
+++ Makefile (working copy)
@@ -25,8 +25,8 @@
 CC = mpicc
 CXX = mpic++
 CCC = mpic++
+FC = mpif90
 F77 = mpif77
-FC = mpif90
 JAVAC = mpijavac
 # Using -g is not necessary, but it is helpful for example programs,
@@ -49,19 +49,19 @@
 # if Open MPI was build with the relevant language bindings.
 all: hello_c ring_c connectivity_c
-	@ if test "`ompi_info --parsable | grep bindings:cxx:yes`" != ""; then \
+	@ if ompi_info --parsable | grep bindings:cxx:yes >/dev/null; then \
 	    $(MAKE) hello_cxx ring_cxx; \
 	fi
-	@ if test "`ompi_info --parsable | grep bindings:f77:yes`" != ""; then \
+	@ if ompi_info --parsable | grep bindings:f77:yes >/dev/null; then \
 	    $(MAKE) hello_f77 ring_f77; \
 	fi
-	@ if test "`ompi_info --parsable | grep bindings:f90:yes`" != ""; then \
+	@ if ompi_info --parsable | grep bindings:f90:yes >/dev/null; then \
 	    $(MAKE) hello_f90 ring_f90; \
 	fi
-	@ if test "`ompi_info --parsable | grep bindings:java:yes`" != ""; then \
+	@ if ompi_info --parsable | grep bindings:java:yes >/dev/null; then \
 	    $(MAKE) Hello.class; \
 	fi
-	@ if test "`ompi_info --parsable | grep bindings:java:yes`" != ""; then \
+	@ if ompi_info --parsable | grep bindings:java:yes >/dev/null ; then \
 	    $(MAKE) Ring.class; \
 	fi
[OMPI devel] non-portable code in examples/Makefile
The addition on Monday of the Java cases to examples/Makefile has shown that the default "make" in Solaris-10 will stop on the first failed grep command in the "all" rule: $ make mpicc -g -o hello_c hello_c.c mpicc -g -o ring_c ring_c.c mpicc -g -o connectivity_c connectivity_c.c mpic++ -g -o hello_cxx hello_cxx.cc mpic++ -g -o ring_cxx ring_cxx.cc mpif90 -g hello_f77.f -o hello_f77 mpif90 -g ring_f77.f -o ring_f77 mpif90 -g hello_f90.f90 -o hello_f90 mpif90 -g ring_f90.f90 -o ring_f90 *** Error code 1 The following command caused the error: if test "`ompi_info --parsable | grep bindings:java:yes`" != ""; then \ make Hello.class; \ fi make: Fatal error: Command failed for target `all' The addition of java did NOT break anything, but exposed a pre-existing problem which was not evident in my prior testing because all language bindings were being build prior to adding java. The attached patch resolves the problem in my (admittedly minimal) testing with the smallest possible change. However an entirely different avoids both "test" and "true" and simply looks like: @ if ompi_info --parsable | grep bindings:cxx:yes >/dev/null; then I have *also* tested that approach, and it works fine too. I *did* warn that the introduction of the java bindings would bring collateral damage. I just didn't anticipate encountering it personally. -Paul -- Paul H. Hargrove phhargr...@lbl.gov Future Technologies Group HPC Research Department Tel: +1-510-495-2352 Lawrence Berkeley National Laboratory Fax: +1-510-486-6900 Index: Makefile === --- Makefile (revision 25980) +++ Makefile (working copy) @@ -49,19 +49,19 @@ # if Open MPI was build with the relevant language bindings. 
all: hello_c ring_c connectivity_c - @ if test "`ompi_info --parsable | grep bindings:cxx:yes`" != ""; then \ + @ if test "`ompi_info --parsable | grep bindings:cxx:yes || true`" != ""; then \ $(MAKE) hello_cxx ring_cxx; \ fi - @ if test "`ompi_info --parsable | grep bindings:f77:yes`" != ""; then \ + @ if test "`ompi_info --parsable | grep bindings:f77:yes || true`" != ""; then \ $(MAKE) hello_f77 ring_f77; \ fi - @ if test "`ompi_info --parsable | grep bindings:f90:yes`" != ""; then \ + @ if test "`ompi_info --parsable | grep bindings:f90:yes || true`" != ""; then \ $(MAKE) hello_f90 ring_f90; \ fi - @ if test "`ompi_info --parsable | grep bindings:java:yes`" != ""; then \ + @ if test "`ompi_info --parsable | grep bindings:java:yes || true`" != ""; then \ $(MAKE) Hello.class; \ fi - @ if test "`ompi_info --parsable | grep bindings:java:yes`" != ""; then \ + @ if test "`ompi_info --parsable | grep bindings:java:yes || true`" != ""; then \ $(MAKE) Ring.class; \ fi
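The alternative form mentioned in this message can be tried outside of make. What follows is only a minimal sketch: `fake_ompi_info` is a hypothetical stand-in for `ompi_info --parsable`, and the lines it prints are illustrative, not real ompi_info output. It shows the pipeline's exit status used directly as the `if` condition, so neither `test` nor `|| true` is needed.

```shell
#!/bin/sh
# Hypothetical stand-in for `ompi_info --parsable`; the output lines are
# illustrative only, not real ompi_info output.
fake_ompi_info() {
    echo "bindings:cxx:yes"
    echo "bindings:java:no"
}

# Succeeds (status 0) when the named binding is reported as built.
# The pipeline's exit status is the `if` condition, so a non-matching
# grep simply selects the else branch.
has_binding() {
    if fake_ompi_info | grep "bindings:$1:yes" >/dev/null; then
        return 0
    else
        return 1
    fi
}

for lang in cxx java; do
    if has_binding "$lang"; then
        echo "$lang: yes"
    else
        echo "$lang: no"
    fi
done
```

Redirecting to /dev/null, as in the message, is used here instead of a grep option for quiet operation, since not every grep is guaranteed to offer one.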
Re: [OMPI devel] [EXTERNAL] Re: trunk build failure on Altix [w/ WORK AROUND]
Testing tonight's trunk tarball on the Altix system I have access to looks fine now. Thanks, Brian. -Paul On 2/20/2012 11:49 AM, Paul H. Hargrove wrote: Brian, Thanks for looking into this. I'll plan to take a look at the trunk tarball tonight and report back. -Paul On 2/20/2012 8:49 AM, Barrett, Brian W wrote: Hi Paul - Thanks for noticing this. I guess we don't have many Altix developers. I think I've fixed it on the trunk with r25968, plus r25967 to make sure the Altix component gets selected over the Linux component on Altix systems. I don't have an Altix to test on; can you give it a go and let me know if it works? In the trunk right now, and should be in the trunk nightly tarball tomorrow morning. The problem cropped up when we started running the configure macros for components that couldn't possibly succeed (which we needed to make Automake happy in a couple of situations) sometime late in the 1.5 series. Before that, a component could never think it succeeded and then later be told it didn't. We added yet another macro to handle issues like this, so it was a fairly easy fix. Thanks, Brian On 2/17/12 4:26 PM, "Paul H. Hargrove"<phhargr...@lbl.gov> wrote: I've poked enough at the ompi configure magic to *think* I understand the source of the problem I've seen w/ both trunk and 1.5.x on the Altix. The problem appears to be that both timer/altix/configure.m4 and timer/linux/configure.m4 are setting the value of $timer_base_include and the LAST one "wins". Meanwhile, only the FIRST one is getting added to $static_components ("there can be only one"). So, I suspect the difference I saw between trunk and 1.5 was just a matter of which configure probe ran first. The result of having FIRST and LAST "win" in different settings is a mismatch. $ grep -e timer:linux -e timer:altix configure.out --- MCA component timer:linux (m4 configuration macro, priority 30) checking for MCA component timer:linux compile mode... 
static checking if MCA component timer:linux can compile... yes --- MCA component timer:altix (m4 configuration macro, priority 30) checking for MCA component timer:altix compile mode... static checking if MCA component timer:altix can compile... no which picks timer:linux and rejects timer:altix, as compared to: $ grep -e '"MCA_opal_timer_[SD]' -e MCA_timer_ config.status S["MCA_opal_timer_DSO_SUBDIRS"]="" S["MCA_opal_timer_STATIC_SUBDIRS"]=" mca/timer/linux" S["MCA_opal_timer_STATIC_LTLIBS"]="mca/timer/linux/libmca_timer_linux.la " S["MCA_opal_timer_DSO_COMPONENTS"]="" S["MCA_opal_timer_STATIC_COMPONENTS"]=" linux" D["MCA_timer_IMPLEMENTATION_HEADER"]=" \"opal/mca/timer/altix/timer_altix.h\"" Which will build timer:linux but has improperly picked up the timer:altix HEADER! For the present, an explicit --with-timer=altix works-around the problem in either branch. However, the setting of the header variable by a NON-selected component is ERRONEOUS and should get fixed. In trunk, it may also make sense to raise the priority of timer:altix above that of timer:linux. -Paul On 2/15/2012 12:41 AM, Paul Hargrove wrote: I've configured the ompi trunk (nightly tarball 1.7a1r25927) on an SGI Altix. I used no special arguments indicating that this is an Altix, and there does not appear to be an altix-specific file in contrib/platform. My build fails as follows: make: Entering directory `/mnt/home/c_phargrov/OMPI/openmpi-trunk-linux-ia64/BLD/opal/tools/wrapper s' CC opal_wrapper.o CCLD opal_wrapper ../../../opal/.libs/libopen-pal.so: undefined reference to `opal_timer_altix_mmdev_timer_addr' ../../../opal/.libs/libopen-pal.so: undefined reference to `opal_timer_altix_freq' collect2: ld returned 1 exit status The configure-generated opal_config.h contains #define MCA_timer_IMPLEMENTATION_HEADER "opal/mca/timer/altix/timer_altix.h" Nothing appears to have been built in BUILDDIR/opal/mca/timer/altix. However, BUILDDIR/opal/mca/timer/linux has been built. -Paul -- Paul H. 
Hargrove phhargr...@lbl.gov Future Technologies Group HPC Research Department Tel: +1-510-495-2352 <tel:%2B1-510-495-2352> Lawrence Berkeley National Laboratory Fax: +1-510-486-6900 <tel:%2B1-510-486-6900> -- Paul H. Hargrove phhargr...@lbl.gov Future Technologies Group HPC Research Dep
[OMPI devel] Invalid format strings in ROMIO
Both the v1.5 branch and trunk are getting lots of warnings from Clang like the following: CC ad_coll_exch_new.lo ../../../../../../../../ompi/mca/io/romio/romio/adio/common/ad_coll_exch_new.c:51:28: warning: length modifier 'L' results in undefined behavior or no effect with 'd' conversion specifier [-Wformat] fprintf(stderr, "%d=(%Ld,%Ld)\n", i, flatlist_node_p->indices[i], Manpages from both Linux (glibc) and FreeBSD (NOT glibc) agree that "L" is only a valid length modifier for the floating-point conversion specifiers. Grepping both v1.5 and trunk show instances of "%Ld" in: ompi/mca/io/romio/romio/adio/common/ad_write_nolock.c ompi/mca/io/romio/romio/adio/common/ad_coll_build_req_new.c ompi/mca/io/romio/romio/adio/common/ad_coll_exch_new.c ompi/mca/io/romio/romio/adio/ad_gridftp/ad_gridftp_write.c ompi/mca/io/romio/romio/adio/ad_pvfs2/ad_pvfs2_io_dtype.c ompi/mca/io/romio/romio/adio/ad_pvfs2/ad_pvfs2_io_list.c Not sure how much one cares, but I am reporting on the off chance somebody does want to fix this. -Paul -- Paul H. Hargrove phhargr...@lbl.gov Future Technologies Group HPC Research Department Tel: +1-510-495-2352 Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
Re: [OMPI devel] flex warning from flex-2.5.4
On 2/20/2012 3:36 PM, Paul H. Hargrove wrote: NOTE: I've not yet actually tested the resulting show_help utility [but soon]. An "instrumented" version of test/opal_sos.c is getting the same string back from opal_show_help_string() both with and without my patch. So, I believe it to be correct/safe. Hopefully, if anything "deeper" needs to be tested, then Ralph can see to it. -Paul -- Paul H. Hargrove phhargr...@lbl.gov Future Technologies Group HPC Research Department Tel: +1-510-495-2352 Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
Re: [OMPI devel] [OMPI svn] svn:open-mpi r25966
The original problem of a missing aio.h was seen on OpenBSD-5.0 (which was released Nov 1, 2011) See http://www.open-mpi.org/community/lists/devel/2012/02/10470.php -Paul On 2/20/2012 4:03 PM, Edgar Gabriel wrote: just out of curiosity, what platform did not have support for the aio operations? Also, the proper solution will be to not compile the section using the aio functions, but still compile the rest of the module. I will try to implement that properly ASAP. The POSIX is the most basic module that shall be used if everything else breaks, so disabling it basically means that we should not compile OMPIO at all. Thanks Edgar On 2/20/2012 4:36 PM, Ralph Castain wrote: I'm afraid this commit breaks the ability to build from a tarball. I created a tarball from the trunk and then did a configure followed by "make clean". The make command failed to execute because it could not "make clean" in the mca/fbtl/posix directory as there is no Makefile in it. I checked and the Makefile -is- being created when built in an svn checkout, but is -not- being created when built from tarball. This was done on a Mac. On Feb 20, 2012, at 8:53 AM, jsquy...@osl.iu.edu wrote: Author: jsquyres Date: 2012-02-20 10:53:20 EST (Mon, 20 Feb 2012) New Revision: 25966 URL: https://svn.open-mpi.org/trac/ompi/changeset/25966 Log: Ensure that we have aio.h before trying to compile this component. Added: trunk/ompi/mca/fbtl/posix/configure.m4 Added: trunk/ompi/mca/fbtl/posix/configure.m4 == --- (empty file) +++ trunk/ompi/mca/fbtl/posix/configure.m4 2012-02-20 10:53:20 EST (Mon, 20 Feb 2012) @@ -0,0 +1,33 @@ +# -*- shell-script -*- +# +# Copyright (c) 2004-2005 The Trustees of Indiana University and Indiana +# University Research and Technology +# Corporation. All rights reserved. +# Copyright (c) 2004-2005 The University of Tennessee and The University +# of Tennessee Research Foundation. All rights +# reserved. 
+# Copyright (c) 2004-2005 High Performance Computing Center Stuttgart, +# University of Stuttgart. All rights reserved. +# Copyright (c) 2004-2012 The Regents of the University of California. +# All rights reserved. +# Copyright (c) 2010 Cisco Systems, Inc. All rights reserved. +# Copyright (c) 2008-2011 University of Houston. All rights reserved. +# $COPYRIGHT$ +# +# Additional copyrights may follow +# +# $HEADER$ +# + +# MCA_fbtl_posix_CONFIG(action-if-can-compile, +#[action-if-cant-compile]) +# +AC_DEFUN([MCA_ompi_fbtl_posix_CONFIG],[ +AC_CHECK_HEADER([aio.h], +[fbtl_posix_happy="yes"], +[fbtl_posix_happy="no"]) + +AS_IF([test "$fbtl_posix_happy" = "yes"], + [$1], + [$2]) +])dnl ___ svn mailing list s...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/svn ___ devel mailing list de...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/devel ___ devel mailing list de...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/devel -- Paul H. Hargrove phhargr...@lbl.gov Future Technologies Group HPC Research Department Tel: +1-510-495-2352 Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
Re: [OMPI devel] flex warning from flex-2.5.4
Ralph, The change below removes the warning, but very slightly changes the syntax that is parsed. In the original, anything following the "[tag]" was considered trailing context. However that made inputs like "[tag]foo]" ambiguous to the parser (hence the warning). With the change below, both "]" will be in the matched string. I am pretty sure that shouldn't ever happen in valid inputs anyway. NOTE: I've not yet actually tested the resulting show_help utility [but soon]. -Paul Index: opal/util/show_help_lex.l === --- opal/util/show_help_lex.l (revision 25974) +++ opal/util/show_help_lex.l (working copy) @@ -62,7 +62,7 @@ #.*\n ; /* comment line */ -^\[.+\]/.*\n { BEGIN(CHOMP); return OPAL_SHOW_HELP_PARSE_TOPIC; } +^\[.+\]/[^\]\n]*\n { BEGIN(CHOMP); return OPAL_SHOW_HELP_PARSE_TOPIC; } .*\n { BEGIN(INITIAL); } On 2/20/2012 3:26 PM, Ralph Castain wrote: My bad - didn't look closely enough. I'll take a look at it and see if there is anything we can do. On Feb 20, 2012, at 4:12 PM, Paul H. Hargrove wrote: Ralph, Are you sure this is a flex-generated file? I am looking at opal/util/show_help_lex.l in the svn trunk and it certainly looks human-generated to me. Please clue me in if I am missing something. The warning is from flex when processing the .l file, NOT from the compilation of the flex-generated .c file. -Paul On 2/19/2012 7:55 PM, Ralph Castain wrote: We get that everywhere, unfortunately - it comes from flex and is outside our control as the file it complains about is actually generated by flex itself. Unfortunately, flex is no longer maintained, and so nothing has been done to correct it. On Feb 19, 2012, at 8:47 PM, Paul H. 
Hargrove wrote: I've not checked any other systems, but building the trunk on OpenBSD and FreeBSD (w/ flex-2.5.4) I see the following: LEXshow_help_lex.c "[srcdir]/opal/util/show_help_lex.l", line 65: warning, dangerous trailing context I found this message in the flex documentation, and it mentions that the POSIX draft for LEX leaves such cases undefined. http://flex.sourceforge.net/manual/Limitations.html -Paul -- Paul H. Hargrove phhargr...@lbl.gov Future Technologies Group HPC Research Department Tel: +1-510-495-2352 Lawrence Berkeley National Laboratory Fax: +1-510-486-6900 ___ devel mailing list de...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/devel ___ devel mailing list de...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/devel -- Paul H. Hargrove phhargr...@lbl.gov Future Technologies Group HPC Research Department Tel: +1-510-495-2352 Lawrence Berkeley National Laboratory Fax: +1-510-486-6900 ___ devel mailing list de...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/devel ___ devel mailing list de...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/devel -- Paul H. Hargrove phhargr...@lbl.gov Future Technologies Group HPC Research Department Tel: +1-510-495-2352 Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
Re: [OMPI devel] flex warning from flex-2.5.4
Ralph, Are you sure this is a flex-generated file? I am looking at opal/util/show_help_lex.l in the svn trunk and it certainly looks human-generated to me. Please clue me in if I am missing something. The warning is from flex when processing the .l file, NOT from the compilation of the flex-generated .c file. -Paul On 2/19/2012 7:55 PM, Ralph Castain wrote: We get that everywhere, unfortunately - it comes from flex and is outside our control as the file it complains about is actually generated by flex itself. Unfortunately, flex is no longer maintained, and so nothing has been done to correct it. On Feb 19, 2012, at 8:47 PM, Paul H. Hargrove wrote: I've not checked any other systems, but building the trunk on OpenBSD and FreeBSD (w/ flex-2.5.4) I see the following: LEXshow_help_lex.c "[srcdir]/opal/util/show_help_lex.l", line 65: warning, dangerous trailing context I found this message in the flex documentation, and it mentions that the POSIX draft for LEX leaves such cases undefined. http://flex.sourceforge.net/manual/Limitations.html -Paul -- Paul H. Hargrove phhargr...@lbl.gov Future Technologies Group HPC Research Department Tel: +1-510-495-2352 Lawrence Berkeley National Laboratory Fax: +1-510-486-6900 ___ devel mailing list de...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/devel ___ devel mailing list de...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/devel -- Paul H. Hargrove phhargr...@lbl.gov Future Technologies Group HPC Research Department Tel: +1-510-495-2352 Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
Re: [OMPI devel] [EXTERNAL] Re: [OMPI svn] svn:open-mpi r25966
s/Jeff/Paul/ Jeff's only fault was trusting me too much. -Paul On 2/20/2012 2:41 PM, Barrett, Brian W wrote: That's because Jeff forgot to copy the line: AC_CONFIG_FILES([ompi/mca/fbtl/posix/Makefile]) from whatever configure.m4 script he used as the base for his new macro :). Brian -- Paul H. Hargrove phhargr...@lbl.gov Future Technologies Group HPC Research Department Tel: +1-510-495-2352 Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
Re: [OMPI devel] non-portable test operator in configure
For those keeping score at home, that should have said "/usr/ucb" instead of "/usr/ucb/bin". I make mistakes too (as Ralph's observation of breakage w/ r25966 shows quite clearly). -Paul On 2/20/2012 2:37 PM, Paul H. Hargrove wrote: Short version: The "expr: Paren problem" comes from having /usr/ucb/bin ahead of /usr/bin in one's $PATH. So, I needed to fix my $PATH. -- Paul H. Hargrove phhargr...@lbl.gov Future Technologies Group HPC Research Department Tel: +1-510-495-2352 Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
Re: [OMPI devel] non-portable test operator in configure
Short version: The "expr: Paren problem" comes from having /usr/ucb/bin ahead of /usr/bin in one's $PATH. So, I needed to fix my $PATH. Long version: This error is coming from configure's own argument parsing logic when the ROMIO sub-configure is invoked. The issue appears to be that the expr implementation of parens (for match capture), doesn't like the length of the match. This works: $ expr 'XCPPFLAGS=-D_REENTRANT -I/foo/bar' : '[^=]*=\(.*\)' -D_REENTRANT -I/foo/bar This (from config.log) does not: $ expr 'XCPPFLAGS= -D_REENTRANT -I/export/home/phargrov/OMPI/openmpi-trunk-solaris10-x86-ss12u3/openmpi-1.7a1r25959/opal/mca/hwloc/hwloc132/hwloc/include -I/export/home/phargrov/OMPI/openmpi-trunk-solaris10-x86-ss12u3/BB/opal/mca/hwloc/hwloc132/hwloc/include -I/export/home/phargrov/OMPI/openmpi-trunk-solaris10-x86-ss12u3/openmpi-1.7a1r25959/opal/mca/event/libevent2013/libevent -I/export/home/phargrov/OMPI/openmpi-trunk-solaris10-x86-ss12u3/openmpi-1.7a1r25959/opal/mca/event/libevent2013/libevent/include -I/export/home/phargrov/OMPI/openmpi-trunk-solaris10-x86-ss12u3/BB/opal/mca/event/libevent2013/libevent/include -I/usr/include/infiniband -I/usr/include/infiniband' : '[^=]*=\(.*\)' expr: Paren problem This one works: expr 'XCPPFLAGS=-D_REENTRANT -I/foo/bar/baz/1234567890/acdefghijklmnopqrstuvwxyz/this-is-getting-too-long-for-solaris-expr-to-handle-correctly/' : '[^=]*=\(.*\)' -D_REENTRANT -I/foo/bar/baz/1234567890/acdefghijklmnopqrstuvwxyz/this-is-getting-too-long-for-solaris-expr-to-handle-correctly/ While 1 more character breaks: {phargrov@cloon BB}$ expr 'XCPPFLAGS=-D_REENTRANT -I/foo/bar/baz/1234567890/acdefghijklmnopqrstuvwxyz/this-is-getting-too-long-for-solaris-expr-to-handle-correctly/1' : '[^=]*=\(.*\)' expr: Paren problem The work-around appears to be to ensure /usr/bin is before /usr/ucb/bin in PATH, since /usr/bin/expr doesn't display this problem. I've fixed my own PATH accordingly for my future Solaris testing. 
Even using /bin/sh, I saw no other "odd" behaviors with configure on Solaris-10. -Paul On 2/20/2012 1:16 PM, Paul H. Hargrove wrote: Argh!! I am now trying to track down "expr: Paren problem" on Solaris. The dash shell on Linux doesn't reproduce this one, unfortunately. -Paul On 2/20/2012 1:12 PM, Paul H. Hargrove wrote: I'll report back ASAP on my slowlaris10 results. -- Paul H. Hargrove phhargr...@lbl.gov Future Technologies Group HPC Research Department Tel: +1-510-495-2352 Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
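For illustration only, the same "everything after the first =" extraction can be written without an external expr at all. This is a sketch, not what configure does: the function name `value_after_equals` is hypothetical, and the `${var#pattern}` expansion it relies on is POSIX but may be missing from older Bourne shells, which is presumably why generated configure scripts stick with expr.

```shell
#!/bin/sh
# Hypothetical helper: prints the text after the first "=" in its argument,
# roughly like   expr "X$arg" : '[^=]*=\(.*\)'
# but using POSIX parameter expansion, so no expr binary from $PATH is
# involved. Note one difference: if the argument contains no "=", the whole
# argument is printed, whereas the expr form would print an empty string.
value_after_equals() {
    printf '%s\n' "${1#*=}"
}

value_after_equals 'CPPFLAGS=-D_REENTRANT -I/foo/bar'
```

Because the work happens inside the shell, the length of the argument is not passed through any particular expr implementation.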
Re: [hwloc-devel] 1.3.2rc1 has escaped
On 2/20/2012 1:26 PM, Brice Goglin wrote: Le 08/02/2012 22:33, Paul H. Hargrove a écrit : Tests on the virtual node I have access to where that problem report originated is still not quite right. There is now a different assertion failing than I had seen before: lt-linux-libnuma: /users/phh1/OMPI/hwloc-1.3.2rc1-linux-ppc64-gcc//hwloc-1.3.2rc1/tests/linux-libnuma.c:83: main: Assertion `!memcmp(,_all_nodes, sizeof(nodemask_t))' failed. /bin/sh: line 5: 19416 Aborted ${dir}$tst FAIL: linux-libnuma I don't have any clue if that represents forward or backward progress. Can you try the attached patch? It removes nodemask checks (this deprecated interface is too buggy/strange in libnuma, no way to assert its behavior reliably). Then, it fixes the libnuma helpers to properly use os_index instead logical_index (important on your machine because node ids are sparse). And finally it makes sure the test actually checks what we want (shouldn't matter in your case). I've tested this on your topology, a 8-node machine with out-of-order numa node ids, and some basic nodes, with a recent and a less recent libnuma release. My current plan is to apply all these in all branches and then remove the nodemask conversion helpers from trunk. Brice I applied to the svn trunk and can now PASS "make check" on my odd virtual node. Thanks, Brice. -Paul -- Paul H. Hargrove phhargr...@lbl.gov Future Technologies Group HPC Research Department Tel: +1-510-495-2352 Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
Re: [OMPI devel] non-portable test operator in configure
Argh!! I am now trying to track down "expr: Paren problem" on Solaris. The dash shell on Linux doesn't reproduce this one, unfortunately. -Paul On 2/20/2012 1:12 PM, Paul H. Hargrove wrote: I'll report back ASAP on my slowlaris10 results. -- Paul H. Hargrove phhargr...@lbl.gov Future Technologies Group HPC Research Department Tel: +1-510-495-2352 Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
Re: [OMPI devel] non-portable test operator in configure
Not what I had expected to find, but a pretty simple fix (missing line continuation): Index: orte/mca/ess/alps/configure.m4 === --- orte/mca/ess/alps/configure.m4 (revision 25970) +++ orte/mca/ess/alps/configure.m4 (working copy) @@ -53,7 +53,7 @@ [orte_mca_ess_alps_happy="yes"], [orte_mca_ess_alps_happy="no"]) -AS_IF([test "$orte_mca_ess_alps_happy" = "yes" -a "$orte_without_full_support" = 0 -a +AS_IF([test "$orte_mca_ess_alps_happy" = "yes" -a "$orte_without_full_support" = 0 -a \ "$orte_mca_ess_alps_have_cnos" = 1], [$1], [$2]) That is sufficient to let "dash" on an Ubuntu system make it through configure. I'll report back ASAP on my slowlaris10 results. NOTE: this is NOT present in the v1.5 branch (no cmr is required). -Paul On 2/20/2012 12:46 PM, Jeffrey Squyres wrote: Ah, ok. On Feb 20, 2012, at 3:45 PM, Paul H. Hargrove wrote: Jeff, The one in config/ompi_load_platform.m4 was on my original hit-list. Getting PAST that one shows a new problem that appears NOT to be a "==". The autoconf manual warns about use of "-a" and "-o" together with variables that may expand to the empty string, and I suspect that is the new problem I am hitting. I hope to know soon. -Paul On 2/20/2012 12:41 PM, Jeffrey Squyres wrote: grep == configure | grep test only shows one more. I found it in config/ompi_load_platform.m4 and fixed it on the trunk. On Feb 20, 2012, at 3:38 PM, Paul H. Hargrove wrote: I am afraid that with the $with_platform instance fixed, configure on Solaris 10 gets far enough to find another problem. I'll provide a patch once I've tracked this down. Sigh. FYI: One can root out bashisms by using the "dash" shell on a Debian or Ubuntu system: $ env CONFIG_SHELL=dash dash [path_to]/configure [options] -Paul On 2/20/2012 5:42 AM, Jeffrey Squyres wrote: Fixed -- thanks! On Feb 20, 2012, at 4:11 AM, Paul H. Hargrove wrote: Please note that "==" is NOT a portable binary operator for the "test" utility. It is supported only by the bash built-in version of "test". 
The correct operator is a simple "=". The following appear in the svn trunk ./orte/config/orte_check_alps.m4: AS_IF([test "$orte_check_alps_pmi_happy" == "yes" -a "$orte_without_full_support" = 0], ./config/ompi_load_platform.m4:if test "$with_platform" == "" ; then The $with_platform test breaks configure fairly early on at least Solaris 10. -Paul -- Paul H. Hargrove phhargr...@lbl.gov Future Technologies Group HPC Research Department Tel: +1-510-495-2352 Lawrence Berkeley National Laboratory Fax: +1-510-486-6900 ___ devel mailing list de...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/devel -- Paul H. Hargrove phhargr...@lbl.gov Future Technologies Group HPC Research Department Tel: +1-510-495-2352 Lawrence Berkeley National Laboratory Fax: +1-510-486-6900 ___ devel mailing list de...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/devel -- Paul H. Hargrove phhargr...@lbl.gov Future Technologies Group HPC Research Department Tel: +1-510-495-2352 Lawrence Berkeley National Laboratory Fax: +1-510-486-6900 ___ devel mailing list de...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/devel -- Paul H. Hargrove phhargr...@lbl.gov Future Technologies Group HPC Research Department Tel: +1-510-495-2352 Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
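One way to sidestep both the `-a` caveat from the autoconf manual and the missing line continuation in the patch above is to run one `test` per condition and join them with the shell's `&&`. The following is a sketch under assumptions: the variable names are shortened, hypothetical versions of those in the patch, and this is not the change that was committed.

```shell
#!/bin/sh
# Hypothetical variables, loosely named after those in the patch above.
happy="yes"
without_full_support=0
have_cnos=1

# One test per condition, joined with the shell's && operator rather than
# test's -a. A trailing && also lets the command continue on the next line
# without a backslash.
all_conditions_hold() {
    test "$happy" = "yes" &&
    test "$without_full_support" = 0 &&
    test "$have_cnos" = 1
}

if all_conditions_hold; then
    echo "component enabled"
else
    echo "component disabled"
fi
```

Each `test` here sees exactly three arguments, so an empty variable cannot shift the meaning of the expression.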
Re: [OMPI devel] non-portable test operator in configure
Jeff, The one in config/ompi_load_platform.m4 was on my original hit-list. Getting PAST that one shows a new problem that appears NOT to be a "==". The autoconf manual warns about use of "-a" and "-o" together with variables that may expand to the empty string, and I suspect that is the new problem I am hitting. I hope to know soon. -Paul On 2/20/2012 12:41 PM, Jeffrey Squyres wrote: grep == configure | grep test only shows one more. I found it in config/ompi_load_platform.m4 and fixed it on the trunk. On Feb 20, 2012, at 3:38 PM, Paul H. Hargrove wrote: I am afraid that with the $with_platform instance fixed, configure on Solaris 10 gets far enough to find another problem. I'll provide a patch once I've tracked this down. Sigh. FYI: One can root out bashisms by using the "dash" shell on a Debian or Ubuntu system: $ env CONFIG_SHELL=dash dash [path_to]/configure [options] -Paul On 2/20/2012 5:42 AM, Jeffrey Squyres wrote: Fixed -- thanks! On Feb 20, 2012, at 4:11 AM, Paul H. Hargrove wrote: Please note that "==" is NOT a portable binary operator for the "test" utility. It is supported only by the bash built-in version of "test". The correct operator is a simple "=". The following appear in the svn trunk ./orte/config/orte_check_alps.m4: AS_IF([test "$orte_check_alps_pmi_happy" == "yes" -a "$orte_without_full_support" = 0], ./config/ompi_load_platform.m4:if test "$with_platform" == "" ; then The $with_platform test breaks configure fairly early on at least Solaris 10. -Paul -- Paul H. Hargrove phhargr...@lbl.gov Future Technologies Group HPC Research Department Tel: +1-510-495-2352 Lawrence Berkeley National Laboratory Fax: +1-510-486-6900 _______ devel mailing list de...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/devel -- Paul H. 
Hargrove phhargr...@lbl.gov Future Technologies Group HPC Research Department Tel: +1-510-495-2352 Lawrence Berkeley National Laboratory Fax: +1-510-486-6900 _______ devel mailing list de...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/devel -- Paul H. Hargrove phhargr...@lbl.gov Future Technologies Group HPC Research Department Tel: +1-510-495-2352 Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
Re: [OMPI devel] non-portable test operator in configure
I am afraid that with the $with_platform instance fixed, configure on Solaris 10 gets far enough to find another problem. I'll provide a patch once I've tracked this down. Sigh. FYI: One can root out bashisms by using the "dash" shell on a Debian or Ubuntu system: $ env CONFIG_SHELL=dash dash [path_to]/configure [options] -Paul On 2/20/2012 5:42 AM, Jeffrey Squyres wrote: Fixed -- thanks! On Feb 20, 2012, at 4:11 AM, Paul H. Hargrove wrote: Please note that "==" is NOT a portable binary operator for the "test" utility. It is supported only by the bash built-in version of "test". The correct operator is a simple "=". The following appear in the svn trunk ./orte/config/orte_check_alps.m4: AS_IF([test "$orte_check_alps_pmi_happy" == "yes" -a "$orte_without_full_support" = 0], ./config/ompi_load_platform.m4:if test "$with_platform" == "" ; then The $with_platform test breaks configure fairly early on at least Solaris 10. -Paul -- Paul H. Hargrove phhargr...@lbl.gov Future Technologies Group HPC Research Department Tel: +1-510-495-2352 Lawrence Berkeley National Laboratory Fax: +1-510-486-6900 ___ devel mailing list de...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/devel -- Paul H. Hargrove phhargr...@lbl.gov Future Technologies Group HPC Research Department Tel: +1-510-495-2352 Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
Re: [OMPI devel] [EXTERNAL] Re: trunk build failure on Altix [w/ WORK AROUND]
Brian, Thanks for looking into this. I'll plan to take a look at the trunk tarball tonight and report back. -Paul On 2/20/2012 8:49 AM, Barrett, Brian W wrote: Hi Paul - Thanks for noticing this. I guess we don't have many Altix developers. I think I've fixed it on the trunk with r25968, plus r25967 to make sure the Altix component gets selected over the Linux component on Altix systems. I don't have an Altix to test on; can you give it a go and let me know if it works? In the trunk right now, and should be in the trunk nightly tarball tomorrow morning. The problem cropped up when we started running the configure macros for components that couldn't possibly succeed (which we needed to make Automake happy in a couple of situations) sometime late in the 1.5 series. Before that, a component could never think it succeeded and then later be told it didn't. We added yet another macro to handle issues like this, so it was a fairly easy fix. Thanks, Brian On 2/17/12 4:26 PM, "Paul H. Hargrove"<phhargr...@lbl.gov> wrote: I've poked enough at the ompi configure magic to *think* I understand the source of the problem I've seen w/ both trunk and 1.5.x on the Altix. The problem appears to be that both timer/altix/configure.m4 and timer/linux/configure.m4 are setting the value of $timer_base_include and the LAST one "wins". Meanwhile, only the FIRST one is getting added to $static_components ("there can be only one"). So, I suspect the difference I saw between trunk and 1.5 was just a matter of which configure probe ran first. The result of having FIRST and LAST "win" in different settings is a mismatch. $ grep -e timer:linux -e timer:altix configure.out --- MCA component timer:linux (m4 configuration macro, priority 30) checking for MCA component timer:linux compile mode... static checking if MCA component timer:linux can compile... yes --- MCA component timer:altix (m4 configuration macro, priority 30) checking for MCA component timer:altix compile mode... 
static checking if MCA component timer:altix can compile... no which picks timer:linux and rejects timer:altix, as compared to: $ grep -e '"MCA_opal_timer_[SD]' -e MCA_timer_ config.status S["MCA_opal_timer_DSO_SUBDIRS"]="" S["MCA_opal_timer_STATIC_SUBDIRS"]=" mca/timer/linux" S["MCA_opal_timer_STATIC_LTLIBS"]="mca/timer/linux/libmca_timer_linux.la " S["MCA_opal_timer_DSO_COMPONENTS"]="" S["MCA_opal_timer_STATIC_COMPONENTS"]=" linux" D["MCA_timer_IMPLEMENTATION_HEADER"]=" \"opal/mca/timer/altix/timer_altix.h\"" Which will build timer:linux but has improperly picked up the timer:altix HEADER! For the present, an explicit --with-timer=altix works-around the problem in either branch. However, the setting of the header variable by a NON-selected component is ERRONEOUS and should get fixed. In trunk, it may also make sense to raise the priority of timer:altix above that of timer:linux. -Paul On 2/15/2012 12:41 AM, Paul Hargrove wrote: I've configured the ompi trunk (nightly tarball 1.7a1r25927) on an SGI Altix. I used no special arguments indicating that this is an Altix, and there does not appear to be an altix-specific file in contrib/platform. My build fails as follows: make: Entering directory `/mnt/home/c_phargrov/OMPI/openmpi-trunk-linux-ia64/BLD/opal/tools/wrapper s' CC opal_wrapper.o CCLD opal_wrapper ../../../opal/.libs/libopen-pal.so: undefined reference to `opal_timer_altix_mmdev_timer_addr' ../../../opal/.libs/libopen-pal.so: undefined reference to `opal_timer_altix_freq' collect2: ld returned 1 exit status The configure-generated opal_config.h contains #define MCA_timer_IMPLEMENTATION_HEADER "opal/mca/timer/altix/timer_altix.h" Nothing appears to have been built in BUILDDIR/opal/mca/timer/altix. However, BUILDDIR/opal/mca/timer/linux has been built. -Paul -- Paul H. 
Hargrove phhargr...@lbl.gov Future Technologies Group HPC Research Department Tel: +1-510-495-2352 <tel:%2B1-510-495-2352> Lawrence Berkeley National Laboratory Fax: +1-510-486-6900 <tel:%2B1-510-486-6900> -- Paul H. Hargrove phhargr...@lbl.gov Future Technologies Group HPC Research Department Tel: +1-510-495-2352 Lawrence Berkeley National Laboratory Fax: +1-510-486-6900 ___ dev
Re: [OMPI devel] Fwd: [OMPI users] WEXITSTATUS: OpenMPI 1.5.5 RC1 doesn't build on NetBSD (and perhaps other BSDs)
That was on my list of defects back in December. This change is ALREADY present in the v1.5 branch in svn. -Paul On 2/20/2012 8:48 AM, Jeffrey Squyres wrote: VT guys -- Can you have a look at this? I don't know if needs to be protected or not, but it looks like it's needed. Begin forwarded message: From: Aleksej Saushev<a...@inbox.ru> Subject: [OMPI users] WEXITSTATUS: OpenMPI 1.5.5 RC1 doesn't build on NetBSD (and perhaps other BSDs) Date: February 18, 2012 3:11:49 PM EST To: us...@open-mpi.org Reply-To: a...@inbox.ru, Open MPI Users<us...@open-mpi.org> Hello! WEXITSTATUS is defined in, see the patch attached. (Sorry, I couldn't find simple mail interface for bug reports.) -- HE CE3OH... ___ users mailing list us...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/users ___ devel mailing list de...@open-mpi.org http://www.open-mpi.org/mailman/listinfo.cgi/devel -- Paul H. Hargrove phhargr...@lbl.gov Future Technologies Group HPC Research Department Tel: +1-510-495-2352 Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
[OMPI devel] non-portable test operator in configure
Please note that "==" is NOT a portable binary operator for the "test" utility. POSIX does not specify it; it is an extension accepted by some implementations, such as the bash built-in version of "test". The correct operator is a simple "=".

The following appear in the svn trunk:

./orte/config/orte_check_alps.m4: AS_IF([test "$orte_check_alps_pmi_happy" == "yes" -a "$orte_without_full_support" = 0],
./config/ompi_load_platform.m4:if test "$with_platform" == "" ; then

The $with_platform test breaks configure fairly early on at least Solaris 10.

-Paul
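A minimal sketch of the portable form, assuming nothing beyond a POSIX sh. The helper name is_empty is made up for illustration and does not exist in the Open MPI tree; the second half simply mirrors the $with_platform check with "=" in place of "==".

```sh
#!/bin/sh
# Hypothetical helper (not from the Open MPI tree): reports whether its
# argument is the empty string, using only the POSIX "=" operator.
is_empty() {
    if test "$1" = ""; then
        echo yes
    else
        echo no
    fi
}

# Same shape as the ompi_load_platform.m4 check, written portably.
with_platform=""
if test "$with_platform" = "" ; then
    echo "no platform file requested"
fi
```

Quoting both operands keeps the comparison well-formed even when the variable is empty or unset.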
Re: [OMPI devel] trunk build failure on OpenBSD-5.0 [SOLVED/PATCH]
The attachment adds the necessary (cached) check for aio.h before enabling fbtl:posix.

-Paul

On 2/17/2012 12:55 AM, Paul Hargrove wrote:
> OpenBSD lacks an aio.h header. configure knows this:
>
> $ grep aio.h configure.log
> checking aio.h usability... no
> checking aio.h presence... no
> checking for aio.h... no
>
> Yet fbtl/posix is enabled, despite needing aio.h:
>
> checking if MCA component fbtl:posix can compile... yes
>
> I am guessing this problem will appear on any platform w/o aio.h. I think it is just a simple matter of requiring OPAL_HAVE_AIO_H when "checking if component fbtl:posix can compile".
>
> -Paul

Index: ompi/mca/fbtl/posix/configure.m4
===================================================================
--- ompi/mca/fbtl/posix/configure.m4 (revision 0)
+++ ompi/mca/fbtl/posix/configure.m4 (revision 0)
@@ -0,0 +1,34 @@
+# -*- shell-script -*-
+#
+# Copyright (c) 2004-2005 The Trustees of Indiana University and Indiana
+#                         University Research and Technology
+#                         Corporation. All rights reserved.
+# Copyright (c) 2004-2005 The University of Tennessee and The University
+#                         of Tennessee Research Foundation. All rights
+#                         reserved.
+# Copyright (c) 2004-2005 High Performance Computing Center Stuttgart,
+#                         University of Stuttgart. All rights reserved.
+# Copyright (c) 2004-2012 The Regents of the University of California.
+#                         All rights reserved.
+# Copyright (c) 2010 Cisco Systems, Inc. All rights reserved.
+# Copyright (c) 2008-2011 University of Houston. All rights reserved.
+# $COPYRIGHT$
+#
+# Additional copyrights may follow
+#
+# $HEADER$
+#
+
+
+# MCA_fbtl_posix_CONFIG(action-if-can-compile,
+#[action-if-cant-compile])
+#
+AC_DEFUN([MCA_ompi_fbtl_posix_CONFIG],[
+AC_CHECK_HEADER([aio.h],
+[fbtl_posix_happy="yes"],
+[fbtl_posix_happy="no"])
+
+AS_IF([test "$fbtl_posix_happy" = "yes"],
+ [$1],
+ [$2])
+])dnl
[OMPI devel] flex warning from flex-2.5.4
I've not checked any other systems, but building the trunk on OpenBSD and FreeBSD (w/ flex-2.5.4) I see the following:

  LEX    show_help_lex.c
"[srcdir]/opal/util/show_help_lex.l", line 65: warning, dangerous trailing context

I found this message in the flex documentation, and it mentions that the POSIX draft for LEX leaves such cases undefined.
http://flex.sourceforge.net/manual/Limitations.html

-Paul
Re: [hwloc-devel] excessive warnings from xlc w/ hwloc-trunk
On 2/19/2012 12:45 PM, Samuel Thibault wrote:
> [snip]
> Ah, right, it's an inline, so we need to declare it first with the attribute, and then define it:
>
> static __hwloc_inline const char *
> hwloc_obj_get_info_by_name(hwloc_obj_t obj, const char *name) __hwloc_attribute_pure;
> static __hwloc_inline const char *
> hwloc_obj_get_info_by_name(hwloc_obj_t obj, const char *name)
> {
>   ...
> }
>
> does it work that way?
>
> Samuel

Yes. That worked!

-Paul
Re: [hwloc-devel] excessive warnings from xlc w/ hwloc-trunk
On 2/19/2012 10:54 AM, Samuel Thibault wrote:
> [snip]
> Does it still complain if using the following?
>
> static __hwloc_inline const char *
> hwloc_obj_get_info_by_name(hwloc_obj_t obj, const char *name) __hwloc_attribute_pure
>
> That'd be safer to make sure that the attribute is applied to the function, not something else.
> [snip]

I should have mentioned that I had tried Samuel's suggested form first. Yes, it complains, and worse, it considers this form to be a syntax error rather than just warning about it:

  CC     topology.lo
"/users/phh1/hwloc-trunk/include/hwloc.h", line 1247.1: 1506-277 (S) Syntax error: possible missing ';' or ','?
make[1]: *** [topology.lo] Error 1

So, we are safer sticking with the current form and ignoring the warning.

-Paul
[hwloc-devel] excessive warnings from xlc w/ hwloc-trunk
The patch below is required to keep xlc-11.1 on Linux/ppc64 from issuing lots of warnings:

"[srcdir]/hwloc-trunk/include/hwloc.h", line 1245.34: 1506-1385 (W) The attribute "pure" is not a valid type attribute.

The problem appears to be that when the function returns a pointer type, XLC thinks the attribute is being applied to the return type rather than the function. That is why no other instances of __hwloc_attribute_pure need to be reordered.

I am not sure of the risk/reward on applying this change, however. Gcc seems to be happy enough either way as far as I could tell.

-Paul

--- hwloc-1.5a1r4308/include/hwloc.h~ 2012-02-17 17:45:45.0 -0600
+++ hwloc-1.5a1r4308/include/hwloc.h 2012-02-17 17:52:20.0 -0600
@@ -1242,7 +1242,7 @@
  *
  * \return \c NULL if no such key exists.
  */
-static __hwloc_inline const char * __hwloc_attribute_pure
+static __hwloc_inline __hwloc_attribute_pure const char *
 hwloc_obj_get_info_by_name(hwloc_obj_t obj, const char *name)
 {
   unsigned i;
Re: [OMPI devel] trunk build failure on Altix [w/ WORK AROUND]
I've poked enough at the ompi configure magic to *think* I understand the source of the problem I've seen w/ both trunk and 1.5.x on the Altix.

The problem appears to be that both timer/altix/configure.m4 and timer/linux/configure.m4 are setting the value of $timer_base_include and the LAST one "wins". Meanwhile, only the FIRST one is getting added to $static_components ("there can be only one"). So, I suspect the difference I saw between trunk and 1.5 was just a matter of which configure probe ran first. The result of having FIRST and LAST "win" in different settings is a mismatch.

$ grep -e timer:linux -e timer:altix configure.out
--- MCA component timer:linux (m4 configuration macro, priority 30)
checking for MCA component timer:linux compile mode... static
checking if MCA component timer:linux can compile... yes
--- MCA component timer:altix (m4 configuration macro, priority 30)
checking for MCA component timer:altix compile mode... static
checking if MCA component timer:altix can compile... no

which picks timer:linux and rejects timer:altix, as compared to:

$ grep -e '"MCA_opal_timer_[SD]' -e MCA_timer_ config.status
S["MCA_opal_timer_DSO_SUBDIRS"]=""
S["MCA_opal_timer_STATIC_SUBDIRS"]=" mca/timer/linux"
S["MCA_opal_timer_STATIC_LTLIBS"]="mca/timer/linux/libmca_timer_linux.la "
S["MCA_opal_timer_DSO_COMPONENTS"]=""
S["MCA_opal_timer_STATIC_COMPONENTS"]=" linux"
D["MCA_timer_IMPLEMENTATION_HEADER"]=" \"opal/mca/timer/altix/timer_altix.h\""

Which will build timer:linux but has improperly picked up the timer:altix HEADER!

For the present, an explicit --with-timer=altix works around the problem in either branch. However, the setting of the header variable by a NON-selected component is ERRONEOUS and should get fixed. In trunk, it may also make sense to raise the priority of timer:altix above that of timer:linux.

-Paul

On 2/15/2012 12:41 AM, Paul Hargrove wrote:
> I've configured the ompi trunk (nightly tarball 1.7a1r25927) on an SGI Altix. I used no special arguments indicating that this is an Altix, and there does not appear to be an altix-specific file in contrib/platform.
>
> My build fails as follows:
>
> make: Entering directory `/mnt/home/c_phargrov/OMPI/openmpi-trunk-linux-ia64/BLD/opal/tools/wrappers'
>   CC     opal_wrapper.o
>   CCLD   opal_wrapper
> ../../../opal/.libs/libopen-pal.so: undefined reference to `opal_timer_altix_mmdev_timer_addr'
> ../../../opal/.libs/libopen-pal.so: undefined reference to `opal_timer_altix_freq'
> collect2: ld returned 1 exit status
>
> The configure-generated opal_config.h contains
>
> #define MCA_timer_IMPLEMENTATION_HEADER "opal/mca/timer/altix/timer_altix.h"
>
> Nothing appears to have been built in BUILDDIR/opal/mca/timer/altix. However, BUILDDIR/opal/mca/timer/linux has been built.
>
> -Paul
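The FIRST/LAST mismatch described above can be modelled in a few lines of plain sh. This is only an illustrative sketch, not the actual configure.m4 logic: the probe function and its arguments are hypothetical, and the variable names are borrowed from the message purely for readability. The idea shown is that the header variable is assigned at the same point, and under the same condition, as the component selection, so the two cannot disagree.

```sh
#!/bin/sh
# Illustrative model only; "probe" is a hypothetical stand-in for a
# component's configure macro.
static_components=""
timer_base_include=""

# probe NAME CAN_COMPILE
# A component sets the header only if it compiles AND is the one that
# actually gets selected (the first usable component).
probe() {
    name=$1
    can_compile=$2
    if test "$can_compile" = "yes" && test -z "$static_components"; then
        static_components=$name
        timer_base_include="opal/mca/timer/$name/timer_$name.h"
    fi
}

probe linux yes   # usable, selected
probe altix no    # rejected, must not touch the header variable

echo "component: $static_components"
echo "header:    $timer_base_include"
```

With the probes run in this order, both variables end up naming the linux component; swapping the order of the two probe calls gives the same result because the rejected component never assigns anything.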
[OMPI devel] excessive warnings on some BSDs [w/ PATCH]
When building trunk or 1.5.x on OpenBSD-5.0 (and maybe others), I get *LOTS* of the following:

/usr/include/arpa/inet.h:74: warning: 'struct in_addr' declared inside parameter list
/usr/include/arpa/inet.h:74: warning: its scope is only this definition or declaration, which is probably not what you want
/usr/include/arpa/inet.h:75: warning: 'struct in_addr' declared inside parameter list

This is trivial to fix by including netinet/in.h before arpa/inet.h (see attached patch). The patch applies cleanly to both the trunk and the 1.5 branch (perhaps to hold back until 1.6).

-Paul

--- openmpi-1.7a1r25944/opal/include/opal/types.h~ Fri Feb 17 12:01:41 2012
+++ openmpi-1.7a1r25944/opal/include/opal/types.h Fri Feb 17 11:58:46 2012
@@ -33,6 +33,9 @@
 #ifdef HAVE_SYS_SELECT_H
 #include <sys/select.h>
 #endif
+#ifdef HAVE_NETINET_IN_H
+#include <netinet/in.h>
+#endif
 #ifdef HAVE_ARPA_INET_H
 #include <arpa/inet.h>
 #endif
--- openmpi-1.7a1r25944/orte/mca/rml/oob/rml_oob_component.c~ Fri Feb 17 12:11:58 2012
+++ openmpi-1.7a1r25944/orte/mca/rml/oob/rml_oob_component.c Fri Feb 17 12:12:08 2012
@@ -23,6 +23,9 @@
 #include "orte_config.h"
 #include "orte/constants.h"
 
+#ifdef HAVE_NETINET_IN_H
+#include <netinet/in.h>
+#endif
 #ifdef HAVE_ARPA_INET_H
 #include <arpa/inet.h>
 #endif
--- openmpi-1.7a1r25944/ompi/mca/btl/tcp/btl_tcp_proc.c~ Fri Feb 17 12:11:06 2012
+++ openmpi-1.7a1r25944/ompi/mca/btl/tcp/btl_tcp_proc.c Fri Feb 17 12:11:21 2012
@@ -19,11 +19,11 @@
 #include "ompi_config.h"
 
-#ifdef HAVE_ARPA_INET_H
-#include <arpa/inet.h>
-#endif
 #ifdef HAVE_NETINET_IN_H
 #include <netinet/in.h>
+#endif
+#ifdef HAVE_ARPA_INET_H
+#include <arpa/inet.h>
 #endif
 
 #include "opal/class/opal_hash_table.h"
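For anyone wanting to look for other files with the same ordering problem, a rough audit could look like the sketch below. The function check_inet_order is hypothetical (it is not part of the Open MPI tree or of the patch above), and it only looks at the first textual occurrence of each include, so it ignores conditional compilation and headers pulled in indirectly.

```sh
#!/bin/sh
# Hypothetical audit helper: prints "ok" when <netinet/in.h> is included
# before <arpa/inet.h> (or <arpa/inet.h> is not included at all), and
# "bad" otherwise.
check_inet_order() {
    awk '
        /#[ \t]*include[ \t]*<netinet\/in\.h>/ { if (!in_line)   in_line = NR }
        /#[ \t]*include[ \t]*<arpa\/inet\.h>/  { if (!arpa_line) arpa_line = NR }
        END {
            if (arpa_line && (!in_line || in_line > arpa_line))
                print "bad"
            else
                print "ok"
        }
    ' "$1"
}
```

It could be run over a source tree with something like `find . -name '*.[ch]'` and a loop, keeping in mind that a "bad" result is only a hint worth checking by hand.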
[OMPI devel] OPAL_ENABLE_FT_CR build broken in 1.5 branch
I've tried to build from both the 1.5 and trunk nightly tarballs configured with "--enable-ft=cr --with-blcr=" . I am using Intel compilers on Linux/x86.

The trunk was fine, but on the 1.5 branch I see the build fail with:

Making all in mca/btl/sm
make[2]: Entering directory `/home/pcp1/phargrov/OMPI/openmpi-1.5-latest-linux-x86-gm2-icc-8.1/BLD/ompi/mca/btl/sm'
  CC     mca_btl_sm_la-btl_sm.lo
/home/pcp1/phargrov/OMPI/openmpi-1.5-latest-linux-x86-gm2-icc-8.1//openmpi-1.5-latest/ompi/mca/btl/sm/btl_sm.c(1107): error: struct "mca_btl_sm_component_t" has no field "mmap_file"
      if( NULL != mca_btl_sm_component.mmap_file ) {
      ^
/home/pcp1/phargrov/OMPI/openmpi-1.5-latest-linux-x86-gm2-icc-8.1//openmpi-1.5-latest/ompi/mca/btl/sm/btl_sm.c(1113): error: struct "mca_btl_sm_component_t" has no field "mmap_file"
      opal_crs_base_metadata_write_token(NULL, CRS_METADATA_TOUCH, mca_btl_sm_component.mmap_file->map_path);
      ^
/home/pcp1/phargrov/OMPI/openmpi-1.5-latest-linux-x86-gm2-icc-8.1//openmpi-1.5-latest/ompi/mca/btl/sm/btl_sm.c(1121): error: struct "mca_btl_sm_component_t" has no field "mmap_file"
      if( NULL != mca_btl_sm_component.mmap_file ) {
      ^
/home/pcp1/phargrov/OMPI/openmpi-1.5-latest-linux-x86-gm2-icc-8.1//openmpi-1.5-latest/ompi/mca/btl/sm/btl_sm.c(1125): error: struct "mca_btl_sm_component_t" has no field "mmap_file"
      opal_crs_base_cleanup_append(mca_btl_sm_component.mmap_file->map_path, false);
      ^
/home/pcp1/phargrov/OMPI/openmpi-1.5-latest-linux-x86-gm2-icc-8.1//openmpi-1.5-latest/ompi/mca/btl/sm/btl_sm.c(1134): error: struct "mca_btl_sm_component_t" has no field "mmap_file"
      if( NULL != mca_btl_sm_component.mmap_file ) {
      ^
/home/pcp1/phargrov/OMPI/openmpi-1.5-latest-linux-x86-gm2-icc-8.1//openmpi-1.5-latest/ompi/mca/btl/sm/btl_sm.c(1144): error: struct "mca_btl_sm_component_t" has no field "mmap_file"
      opal_crs_base_cleanup_append(mca_btl_sm_component.mmap_file->map_path, false);
      ^
compilation aborted for /home/pcp1/phargrov/OMPI/openmpi-1.5-latest-linux-x86-gm2-icc-8.1//openmpi-1.5-latest/ompi/mca/btl/sm/btl_sm.c (code 2)

Pushing past that error with "make -k" yields a similar problem in mpool/sm as well:

Making all in mca/mpool/sm
make[2]: Entering directory `/home/pcp1/phargrov/OMPI/openmpi-1.5-latest-linux-x86-gm2-icc-8.1/BLD/ompi/mca/mpool/sm'
  CC     mpool_sm_module.lo
/home/pcp1/phargrov/OMPI/openmpi-1.5-latest-linux-x86-gm2-icc-8.1//openmpi-1.5-latest/ompi/mca/mpool/sm/mpool_sm_module.c(146): error: struct "mca_mpool_sm_module_t" has no field "sm_common_mmap"
      unlink(sm_module->sm_common_mmap->map_path);
      ^
/home/pcp1/phargrov/OMPI/openmpi-1.5-latest-linux-x86-gm2-icc-8.1//openmpi-1.5-latest/ompi/mca/mpool/sm/mpool_sm_module.c(183): error: struct "mca_mpool_sm_module_t" has no field "sm_common_mmap"
      if (NULL != self_sm_module->sm_common_mmap) {
      ^
/home/pcp1/phargrov/OMPI/openmpi-1.5-latest-linux-x86-gm2-icc-8.1//openmpi-1.5-latest/ompi/mca/mpool/sm/mpool_sm_module.c(184): error: struct "mca_mpool_sm_module_t" has no field "sm_common_mmap"
      opal_crs_base_cleanup_append(self_sm_module->sm_common_mmap->map_path, false);
      ^
/home/pcp1/phargrov/OMPI/openmpi-1.5-latest-linux-x86-gm2-icc-8.1//openmpi-1.5-latest/ompi/mca/mpool/sm/mpool_sm_module.c(198): error: struct "mca_mpool_sm_module_t" has no field "sm_common_mmap"
      if (NULL != self_sm_module->sm_common_mmap) {
      ^
/home/pcp1/phargrov/OMPI/openmpi-1.5-latest-linux-x86-gm2-icc-8.1//openmpi-1.5-latest/ompi/mca/mpool/sm/mpool_sm_module.c(199): error: struct "mca_mpool_sm_module_t" has no field "sm_common_mmap"
      opal_crs_base_cleanup_append(self_sm_module->sm_common_mmap->map_path, false);
      ^
compilation aborted for /home/pcp1/phargrov/OMPI/openmpi-1.5-latest-linux-x86-gm2-icc-8.1//openmpi-1.5-latest/ompi/mca/mpool/sm/mpool_sm_module.c (code 2)

-Paul
[OMPI devel] btl/gm build broken on trunk
I just tried to build yesterday's ompi trunk tarball (1.7a1r25937) with the Intel compilers. Sorry if this was fixed in the past 23 hours or so.

My system has GM-2.1.30 installed, and icc wasn't happy with btl_gm_component.c:

/home/pcp1/phargrov/OMPI/openmpi-trunk-linux-x86-gm2-icc-8.1//openmpi-trunk/ompi/mca/btl/gm/btl_gm_component.c(606): error #165: too few arguments in function call
      btl->error_cb(>super, MCA_BTL_ERROR_FLAGS_FATAL);
      ^
/home/pcp1/phargrov/OMPI/openmpi-trunk-linux-x86-gm2-icc-8.1//openmpi-trunk/ompi/mca/btl/gm/btl_gm_component.c(632): error #165: too few arguments in function call
      btl->error_cb(>super, MCA_BTL_ERROR_FLAGS_FATAL);
      ^

Usage of btl->error_cb() appears correct on the 1.5 branch (just a visual inspection).

-Paul
[OMPI devel] -fvisibility probe broken [w/ 3-line PATCH]
After seeing some odd behaviors in a build of the trunk last night, I took a closer look at the configure probe for -fvisibility support and found that recent changes were applied incompletely/incorrectly.

What is currently in opal/config/opal_check_visibility.m4:

    AC_LINK_IFELSE([AC_LANG_PROGRAM([[
        __attribute__((visibility("default"))) int foo;
        ]],[[fprintf(stderr, "Hello, world\n");]])],
        [],
        [AS_IF([test -s conftest.err],
               [$GREP -iq visibility conftest.err
                # If we find "visibility" in the stderr, then
                # assume it doesn't work
                AS_IF([test "$?" = "0"], [opal_add=])])
        ])

Here is a dissection of the args to AC_LINK_IFELSE:
  arg1: AC_LANG_PROGRAM
  arg2: action-on-success is EMPTY
  arg3: action-on-failure is an AS_IF

Unfortunately that is incorrect in 3 ways:

Error #1: The AS_IF belongs as arg2 (where there is an empty "[]" now). That is because the intent of that logic is to "double check" a successful link by checking the stderr for "visibility". The idea there is that warnings like "unknown attribute visibility ignored" will be treated as a fail. That was the case in the "original" (r22138) version of the logic as well. However, it appears that this logic has been broken since r23923, when Jeff recoded AC_TRY_LINK to AC_LINK_IFELSE in Oct 2010. Those changes failed to account for the fact that LINK_IFELSE takes 1 argument for the PROGRAM where TRY_LINK has separate INCLUDE and BODY arguments. That led to the unintended movement of the AS_IF[...grep...] logic from the on-success to the on-failure slot, and nothing has been the same since.

Error #2: action-on-failure should be another instance of "[opal_add=]", to avoid using visibility if the link test failed. This had survived r23923 as an "extra" 4th argument to AC_LINK_IFELSE, and was later removed. This error leads to use of -fvisibility on compilers that totally failed to compile or link the test!

Error #3: Missing include of stdio.h leads some compilers to fail the test unnecessarily. Unlike the other 2 problems, this leads to REJECTING visibility even though it is working (except that error #2 currently hides this).

These problems, which I previously detailed in off-list emails to Jeff, apparently got lost in the rush to get hwloc-1.3.2 out the door. Here is the relatively simple correction:

--- ompi-trunk/opal/config/opal_check_visibility.m4 (revision 25941)
+++ ompi-trunk/opal/config/opal_check_visibility.m4 (working copy)
@@ -56,15 +56,15 @@
 AC_MSG_CHECKING([if $CC supports $opal_add])
 AC_LINK_IFELSE([AC_LANG_PROGRAM([[
+#include <stdio.h>
 __attribute__((visibility("default"))) int foo;
 ]],[[fprintf(stderr, "Hello, world\n");]])],
-[],
 [AS_IF([test -s conftest.err],
 [$GREP -iq visibility conftest.err
 # If we find "visibility" in the stderr, then
 # assume it doesn't work
 AS_IF([test "$?" = "0"], [opal_add=])])
-])
+], [opal_add=])
 AS_IF([test "$opal_add" = ""],
 [AC_MSG_RESULT([no])],
 [AC_MSG_RESULT([yes])])

Just to confuse things, the instance of OPAL_CHECK_VISIBILITY in the libevent2013 configure is getting the result right "by accident". In that case the CFLAGS give more verbose warnings and the LINK fails (due to missing stdio.h), while yielding "visibility" in the warning message:

conftest.c(87): remark #1418: external definition with no prior declaration
  __attribute__((visibility("default"))) int foo;

Unfortunately, the incorrect logic made it into hwloc-1.3.2. So, I'd suggest fixing this in OMPI's embedded hwloc and hwloc's trunk also.

My sincere apologies for not having caught this in the hwloc-1.3.2 testing where Jeff and I thought we had this issue fixed. I don't know for sure how I missed re-testing the final cut, but can only guess that I left --disable-visibility in my testing scripts.

-Paul "thorough testing is a double-edged sword" Hargrove
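The intended decision, stripped of the m4 wrapping, can be sketched in plain sh as below. This is a hedged model rather than the real macro: the function visibility_ok and its two arguments (the link exit status and a file holding the compiler's stderr) are hypothetical names introduced only for illustration. It accepts the flag only when the link succeeded and stderr does not mention "visibility", which covers both Error #1 and Error #2.

```sh
#!/bin/sh
# Hypothetical model of the intended logic (not the real m4 macro).
# visibility_ok LINK_STATUS STDERR_FILE
#   prints "yes" only if the link succeeded and its stderr does not
#   mention "visibility"; prints "no" otherwise.
visibility_ok() {
    link_status=$1
    stderr_file=$2
    if test "$link_status" -ne 0; then
        # Link failed outright: never use the flag.
        echo no
    elif test -s "$stderr_file" &&
         grep -i visibility "$stderr_file" >/dev/null 2>&1; then
        # Link "succeeded" but the compiler warned about the attribute.
        echo no
    else
        echo yes
    fi
}
```

The grep output is redirected to /dev/null instead of relying on a quiet option, on the assumption that not every vendor grep accepts one; the real macro uses whatever $GREP configure has found.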
[OMPI devel] More on OMPI on MacOS 10.4/ppc [WORK AROUND]
As I already discovered (see http://www.open-mpi.org/community/lists/devel/2012/02/10444.php), MacOS 10.4 is NOT listed as a supported platform any longer. So, this message is really just for the archives.

From "man ld" on a MacOS 10.4 system (x86 or ppc):

    MACOSX_DEPLOYMENT_TARGET
        This is set to indicate the oldest Mac OS X version that that the output is to be used on. When this is set to a release that is older than the current release features that are incompatible with that release will be disabled. If a feature is seen in the input that can't be in the output due to this setting a warning is issued. The current allowable values for this are 10.1, 10.2 10.3, and 10.4 with the default being 10.4 for the i386 architecture and 10.1 for all other architectures.

The last sentence of that seems like a good starting point for why the behaviors I see on ppc and x86 differ. So, before configuring OMPI (tarball 1.7a1r25937 or 1.5.5rc2r25933) I did

$ export MACOSX_DEPLOYMENT_TARGET=10.4

And, everything worked! Both branches had the previously described errors w/o this env var, but now both work fine.

So, anybody in need of OMPI on MacOS 10.4/ppc now has a work-around.

-Paul
Re: [OMPI devel] 1.5.5rc1 tested: MacOS/ppc (w/ 1 failure and a "CMR")
Ah, I see that my problems on MacOS 10.4 are already a moot point, as my "option (c)" has already been implemented.

From README in the 1.5 branch:
  - OS X (10.5, 10.6), 32 and 64 bit (x86_64), with gcc and Absoft compilers (*)
and from trunk:
  - OS X (10.5, 10.6, 10.7), 32 and 64 bit (x86_64), with gcc and Absoft compilers (*).

-Paul

On 2/15/2012 7:24 PM, Paul H. Hargrove wrote:
> As a point for discussion, I am going to offer a simple solution:
>   c) Ignore this for 1.5.5 and raise the minimum MacOS version from 10.4 to 10.5 for ompi 1.6.x and 1.7.x
> Any strong opinions?
>
> -Paul
>
> [snip]
Re: [OMPI devel] 1.5.5rc1 tested: MacOS/ppc (w/ 1 failure and a "CMR")
As a point for discussion, I am going to offer a simple solution:
  c) Ignore this for 1.5.5 and raise the minimum MacOS version from 10.4 to 10.5 for ompi 1.6.x and 1.7.x
Any strong opinions?

-Paul

On 2/15/2012 10:29 AM, Paul H. Hargrove wrote:
> I wanted to note that MacOS 10.4 on *X86* has the same "WEAK" definitions in opal_config.h. Yet it can build ompi-1.5.x with only WARNING about duplicate symbols. I just tried, and the test code Matthias sent worked too:
>
> $ ./bin/mpicc pmpi_test.c
> /usr/bin/ld: warning multiple definitions of symbol _MPI_Finalize
> /var/tmp//cc7tP6zp.o definition of _MPI_Finalize in section (__TEXT,__text)
> /Users/phhargrove/OMPI/openmpi-1.5.5rc1/INST/lib/libmpi.dylib(single module) definition of _MPI_Finalize
> $ ./a.out
> inside MPI_Finalize() wrapper
>
> So, I don't think (A) is an appropriate solution. I am also wondering if there is some compiler/linker flag we could/should pass to "fix" the PPC.
>
> Going back to the 10.4/PPC I see now that despite the warnings, a working executable WAS generated:
>
> $ ./a.out
> inside MPI_Finalize() wrapper
>
> So, I don't think we have managed to reproduce the source of the build problem.
>
> -Paul
>
> On 2/15/2012 9:25 AM, Matthias Jurenz wrote:
>> Thanks for testing, Paul.
>>
>> I think we need an additional configure test which disables VT if
>>   a) weak symbol support is disabled/not available
>> - or more precise -
>>   b) configuring on PPC/Mac10.4 and the used GNU compiler version is older or equal to 4.0.1
>>
>> I prefer option b) because VT (i.e. PMPI) should also work without weak symbol support (at least it does on my laptop with gcc 4.4.3 and '--disable-weak-symbols'). On the other hand, in most cases the compiler supports weak symbols, so option a) would also work, unless weak symbol support is disabled by the configure option '--disable-weak-symbols'.
>>
>> Jeff, what's your opinion?
>>
>> Matthias
>>
>> On Wednesday 15 February 2012 10:33:30 Paul Hargrove wrote:
>>> See responses mixed in below.
>>>
>>> On Wed, Feb 15, 2012 at 1:02 AM, Matthias Jurenz <matthias.jur...@tu-dresden.de> wrote:
>>>> Unfortunately, we don't have access to a PPC system with MacOS 10.4 to try to reproduce the error.
>>>
>>> Not too surprising. I'll see what I can do to help resolve the problem.
>>>
>>>> Paul, could you please check for the definition of the macro OPAL_HAVE_WEAK_SYMBOLS in ompi_config.h?
>>>
>>> ompi_config.h doesn't contain that macro. However, opal_config.h shows no weak symbol support:
>>>
>>> #define HWLOC_HAVE_ATTRIBUTE_WEAK_ALIAS 0
>>> #define OPAL_HAVE_ATTRIBUTE_WEAK_ALIAS 0
>>> #define OPAL_HAVE_WEAK_SYMBOLS 0
>>>
>>>> I assume that the ancient GNU compiler on PPC/MacOS10.4 does not support weak-symbols which cause the multiple definitions.
>>>
>>> Does that mean I should simply not expect to get VT built there?
>>>
>>>> Furthermore, could you please try to build the following code to test whether the PMPI interface of Open MPI works in general?
>>>>
>>>> #include <stdio.h>
>>>> #include <mpi.h>
>>>> int MPI_Finalize()
>>>> {
>>>>   printf( "inside MPI_Finalize() wrapper\n" );
>>>>   return PMPI_Finalize();
>>>> }
>>>> int main(int argc, char** argv)
>>>> {
>>>>   MPI_Init(&argc, &argv);
>>>>   MPI_Finalize();
>>>> }
>>>
>>> I am assuming I am supposed to build that with VT disabled in my OMPI build. Doing so, I see that PMPI is apparently not working:
>>>
>>> $ ./bin/mpicc pmpi_test.c
>>> /usr/bin/ld: warning multiple definitions of symbol _MPI_Finalize
>>> /var/tmp//ccHZvZ3B.o definition of _MPI_Finalize in section (__TEXT,__text)
>>> /Users/phargrov/OMPI/openmpi-1.5.5rc1/INST/lib/libmpi.dylib(single module) definition of _MPI_Finalize
>>>
>>>> Maybe the error occurs only if this code is in a shared library which depends on the MPI library (as does the libvt-mpi). Therefore, run the following:
>>>>
>>>> $ gcc -fPIC -shared pmpi_test.c -I -o libpmpi_test.dylib -L -lmpi
>>>
>>> I assume this check might be redundant given that the previous one failed. However, here it is anyway:
>>>
>>> $ gcc -fPIC -shared pmpi_test.c -Iinclude -o libpmpi_test.dylib -Llib
>>> powerpc-apple-darwin8-gcc-4.0.1: unrecognized option '-shared'
>>> /usr/bin/ld: Undefined symbols:
>>> _MPI_Init
>>> _PMPI_Finalize
>>> collect2: ld returned 1 exit status
>>>
>>> -Paul
>>>
>>>> Thanks!
>>>>
>>>> Matthias
>>>>
>>>> On 12/14/2011 2:51 PM, Paul H. Hargrove wrote:
>>>>> I've attempted to reproduce the failure reported below for MacOS 10.4 for PPC on an X86-64 system.
>>>>>
>>>>> First, I've realized that while I reported "make check" as the source of the problem, it occurs at "make". Regardless of that mistake in my reporting, I was unable to reproduce the problem, making this a PPC-specific problem as far as I can tell.
>>>>>
>>>>> Instead of 255 instances of "ld: multiple definitions of symbol _MPI_*" I get instances of "ld: warning multiple definitions of symbol _MPI*", where the only difference is the addition of the word "warning". However, this is apparently non-fatal on the x86-64 but fatal by default on PPC.
>>>>>
>>>>> -Paul
>>>>>
>>>>> On 12/13/2011 9:30 PM, Paul H. Hargrove wrote:
>>>>>> Using the 1.5.5rc1 tarball, I've repeated tests on the following platforms for which I recently reporte
Re: [OMPI devel] VT build failure with Clang++
Dmitri,

Since I have not seen any error like this from gcc, pgi, pathcc, xlc, icc, open64 or suncc, I am pretty sure the problem is Clang-specific even if not a true "bug" in Clang. I just test everything I can get my hands on and report what I find.

If there is not a simple fix for this then it is not a big deal YET. However, it is widely expected that Apple will move to a Clang-only (no gcc/g++) release of Xcode as soon as they are able. So, it *might* become a concern in the near future.

So, how should we proceed on this?

-Paul

On 2/15/2012 8:38 AM, Dmitri Gribenko wrote:
> I don't know if it is a Clang bug, but here's my understanding of the problem.
> [...excellent description removed...]
> I'm not sure if this is a bug in Clang because I don't know if Clang should have tried to instantiate create().
>
> Dmitri
Re: [OMPI devel] libevent build fails when configured with --disable-hwloc
Thanks, Ralph. Your commit missed the nightly tarball, but the configure logic to exclude the rank_file component was in there. So, I dropped the new libevent2013_module.c into tonight's tarball (1.7a1r25937). My build configured --without-hwloc now PASSes "make all install check clean". And thanks for the nfs4 fix too, BTW: $ svn praise test/util/opal_path_nfs.c | grep nfs4 25939rhc 0 == strcasecmp (fs, "nfs4") || -Paul On 2/15/2012 6:52 PM, Ralph Castain wrote: Thanks Paul. I modified the patch a bit to silence some warnings, but added it to the trunk. On Wed, Feb 15, 2012 at 2:17 PM, Paul H. Hargrove <phhargr...@lbl.gov <mailto:phhargr...@lbl.gov>> wrote: The following 1-line change resolves the problem for me, and I see no potential down-side to it: --- openmpi-1.7a1r25927/opal/mca/event/libevent2013/libevent2013_module.c~ 2012-02-15 14:11:22.274734667 -0800 +++ openmpi-1.7a1r25927/opal/mca/event/libevent2013/libevent2013_module.c 2012-02-15 14:11:25.183478598 -0800 @@ -4,7 +4,7 @@ */ #include "opal_config.h" #include "opal/constants.h" -#include "config.h" +#include "libevent/config.h" #ifdef HAVE_SYS_TYPES_H #include -Paul On 2/15/2012 1:58 PM, Paul H. Hargrove wrote: Here is a bit more on this. When I configure w/ only a --prefix and CFLAGS=-save-temps, I can examine libevent2013_module.i which contains the following: # 7 "../../../../../opal/mca/event/libevent2013/libevent2013_module.c" 2 # 1 "../../../../opal/mca/hwloc/hwloc132/hwloc/include/private/autogen/config.h" 1 # 8 "../../../../../opal/mca/event/libevent2013/libevent2013_module.c" 2 What that says is that the '#include "config.h"' on line 7 of libevent2013_module.c has included hwloc's config.h, as I had claimed earlier (and this was much easier than manually traversing the -I list as I had done before). -- Paul H. 
Hargrove phhargr...@lbl.gov Future Technologies Group HPC Research Department Tel: +1-510-495-2352 Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
[OMPI devel] opal_path_nfs test failure on NFS4 [w/ PATCH]
Testing a trunk tarball (1.7a1r25927), I am seeing an opal_path_nfs failure from "make check":

Failure : Mismatch: input "/opt/cluster", expected:0 got:1
SUPPORT: OMPI Test failed: opal_path_nfs() (1 of 20 failed)
FAIL: opal_path_nfs

The "mount" command reports /opt/cluster as "nfs4", which appears to be distinct from "nfs" (which is reported for other mount points):

X:/cluster on /opt/cluster type nfs4 (rw,intr,nolock,addr=,clientaddr=XXX)

Notice that the failure was "expected:0 got:1". That means opal_path_nfs() is "smarter" than the test in this case. The 1-line addition below fixes this for me, and should apply cleanly to 1.5.x as well (or hold for 1.6 if desired).
-Paul

--- openmpi-1.7a1r25927/test/util/opal_path_nfs.c 2012-02-15 03:27:46.0 +0100
+++ openmpi-1.7a1r25927m/test/util/opal_path_nfs.c 2012-02-16 01:49:18.882418827 +0100
@@ -154,6 +154,7 @@
 nfs_tmp[mount_known] = false;
 if (0 == strcasecmp (fs, "nfs") ||
+0 == strcasecmp (fs, "nfs4") ||
 0 == strcasecmp (fs, "lustre") ||
 0 == strcasecmp (fs, "panfs") ||
 0 == strcasecmp (fs, "gpfs"))

-- Paul H. Hargrove phhargr...@lbl.gov Future Technologies Group HPC Research Department Tel: +1-510-495-2352 Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
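The check touched by this patch can be sketched in isolation roughly as follows. This is only an illustration and not the code from test/util/opal_path_nfs.c: the helper name fs_is_network and the table layout are made up here, and only the file-system names are taken from the patch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <strings.h>   /* strcasecmp (POSIX) */

/* File-system type strings treated as network file systems. The names come
 * from the patch above, including the added "nfs4". */
static const char *const network_fs_types[] = {
    "nfs", "nfs4", "lustre", "panfs", "gpfs"
};

/* Hypothetical helper: returns true if 'fs' (as reported by "mount") names
 * one of the types above, compared case-insensitively. */
static bool fs_is_network(const char *fs)
{
    size_t i;
    size_t count = sizeof(network_fs_types) / sizeof(network_fs_types[0]);

    if (fs == NULL) {
        return false;
    }
    for (i = 0; i < count; i++) {
        if (0 == strcasecmp(fs, network_fs_types[i])) {
            return true;
        }
    }
    return false;
}
```

Because the comparison is an exact (if case-insensitive) match, "nfs4" has to be listed explicitly; matching "nfs" alone does not cover it.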
Re: [OMPI devel] libevent build fails when configured with --disable-hwloc
The following 1-line change resolves the problem for me, and I see no potential down-side to it:

--- openmpi-1.7a1r25927/opal/mca/event/libevent2013/libevent2013_module.c~ 2012-02-15 14:11:22.274734667 -0800
+++ openmpi-1.7a1r25927/opal/mca/event/libevent2013/libevent2013_module.c 2012-02-15 14:11:25.183478598 -0800
@@ -4,7 +4,7 @@
 */
 #include "opal_config.h"
 #include "opal/constants.h"
-#include "config.h"
+#include "libevent/config.h"
 #ifdef HAVE_SYS_TYPES_H
 #include <sys/types.h>

-Paul

On 2/15/2012 1:58 PM, Paul H. Hargrove wrote:
Here is a bit more on this. When I configure w/ only a --prefix and CFLAGS=-save-temps, I can examine libevent2013_module.i which contains the following:

# 7 "../../../../../opal/mca/event/libevent2013/libevent2013_module.c" 2
# 1 "../../../../opal/mca/hwloc/hwloc132/hwloc/include/private/autogen/config.h" 1
# 8 "../../../../../opal/mca/event/libevent2013/libevent2013_module.c" 2

What that says is that the '#include "config.h"' on line 7 of libevent2013_module.c has included hwloc's config.h, as I had claimed earlier (and this was much easier than manually traversing the -I list as I had done before).

-- Paul H. Hargrove phhargr...@lbl.gov Future Technologies Group HPC Research Department Tel: +1-510-495-2352 Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
[OMPI devel] libevent build fails when configured with --disable-hwloc
Here is a bit more on this. When I configure w/ only a --prefix and CFLAGS=-save-temps, I can examine libevent2013_module.i which contains the following: # 7 "../../../../../opal/mca/event/libevent2013/libevent2013_module.c" 2 # 1 "../../../../opal/mca/hwloc/hwloc132/hwloc/include/private/autogen/config.h" 1 # 8 "../../../../../opal/mca/event/libevent2013/libevent2013_module.c" 2 What that says is that the '#include "config.h"' on line 7 of libevent2013_module.c has included hwloc's config.h, as I had claimed earlier (and this was much easier than manually traversing the -I list as I had done before). This is a VPATH build from a trunk tarball (1.7a1r25927). Perhaps Ralph could not reproduce because of a difference between svn and tarball, such as autotools versions, or use of a non-VPATH build? For me there is a generated BLDDIR/opal/mca/event/libevent2013/libevent/config.h but that directory does NOT appear in the -I's, though the $(srcdir) version does. So, I suspect a non-VPATH build would work when configured with --without-hwloc Below is a reformatted version of the compile command from "make V=1". I've marked two things: 1 = the hwloc directory from whence config.h is being included 2 = two instances of $(srcdir)/libevent (key: 5*"../" = srcdir, 4*"../" = blddir) gcc -DHAVE_CONFIG_H -I. -I../../../../../opal/mca/event/libevent2013 -I../../../../opal/include -I../../../../orte/include -I../../../../ompi/include 1-> -I../../../../opal/mca/hwloc/hwloc132/hwloc/include/private/autogen -I../../../../opal/mca/hwloc/hwloc132/hwloc/include/hwloc/autogen 2-> -I../../../../../opal/mca/event/libevent2013/libevent -I../../../../../opal/mca/event/libevent2013/libevent/include -I./libevent/include -I../../../../../opal/mca/event/libevent2013/libevent/compat -I../../../../.. -I../../../.. 
-I../../../../../opal/include -I../../../../../orte/include -I../../../../../ompi/include -I/home/phargrov/OMPI/openmpi-1.7a1r25927/opal/mca/hwloc/hwloc132/hwloc/include -I/home/phargrov/OMPI/openmpi-1.7a1r25927/BLD-with/opal/mca/hwloc/hwloc132/hwloc/include 2-> -I/home/phargrov/OMPI/openmpi-1.7a1r25927/opal/mca/event/libevent2013/libevent -I/home/phargrov/OMPI/openmpi-1.7a1r25927/opal/mca/event/libevent2013/libevent/include -I/home/phargrov/OMPI/openmpi-1.7a1r25927/BLD-with/opal/mca/event/libevent2013/libevent/include -I/usr/include/infiniband -I/usr/include/infiniband -O3 -DNDEBUG -save-temps -finline-functions -fno-strict-aliasing -pthread -I/home/phargrov/OMPI/openmpi-1.7a1r25927/opal/mca/hwloc/hwloc132/hwloc/include -MT libevent2013_module.lo -MD -MP -MF .deps/libevent2013_module.Tpo -c ../../../../../opal/mca/event/libevent2013/libevent2013_module.c -fPIC -DPIC -o .libs/libevent2013_module.o -Paul On 2/15/2012 1:16 PM, Paul H. Hargrove wrote: Thanks, Ralph. I am a little deficient in the autotools department. So, I will probably only be able to retest after a new trunk tarball is generated tonight. In the meantime I might be able to get more info on the config.h problem. If I add -save-temps to CFLAGS I should be able to examine the .i file w/o and w/ --disable-hwloc. That will either prove or disprove what I've claimed is happening. -Paul On 2/15/2012 5:47 AM, Ralph Castain wrote: Built on Linux --without-hwloc as well, with the fix. On Wed, Feb 15, 2012 at 3:13 AM, Ralph Castain <r...@open-mpi.org <mailto:r...@open-mpi.org>> wrote: Hi Paul The rank_file component should not attempt to build if --without-hwloc is given - I've fixed that now. Thanks for reporting it. With that fix, I was able to build the trunk on Mac - testing Linux now. I haven't checked for the config.h confusion you report, though - just noting that it built. -- Paul H. 
Hargrove phhargr...@lbl.gov Future Technologies Group HPC Research Department Tel: +1-510-495-2352 Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
Re: [OMPI devel] trunk build failed when configured with --disable-hwloc
Thanks, Ralph. I am a little deficient in the autotools department. So, I will probably only be able to retest after a new trunk tarball is generated tonight. In the meantime I might be able to get more info on the config.h problem. If I add -save-temps to CFLAGS I should be able to examine the .i file w/o and w/ --disable-hwloc. That will either prove or disprove what I've claimed is happening. -Paul On 2/15/2012 5:47 AM, Ralph Castain wrote: Built on Linux --without-hwloc as well, with the fix. On Wed, Feb 15, 2012 at 3:13 AM, Ralph Castain <r...@open-mpi.org <mailto:r...@open-mpi.org>> wrote: Hi Paul The rank_file component should not attempt to build if --without-hwloc is given - I've fixed that now. Thanks for reporting it. With that fix, I was able to build the trunk on Mac - testing Linux now. I haven't checked for the config.h confusion you report, though - just noting that it built. -- Paul H. Hargrove phhargr...@lbl.gov Future Technologies Group HPC Research Department Tel: +1-510-495-2352 Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
Re: [OMPI devel] 1.5.5rc1 tested: MacOS/ppc (w/ 1 failure and a "CMR")
I wanted to note that MacOS 10.4 on *X86* has the same "WEAK" definitions in opal_config.h. Yet it can build ompi-1.5.x with only a WARNING about duplicate symbols. I just tried, and the test code Matthias sent worked too:

$ ./bin/mpicc pmpi_test.c
/usr/bin/ld: warning multiple definitions of symbol _MPI_Finalize
/var/tmp//cc7tP6zp.o definition of _MPI_Finalize in section (__TEXT,__text)
/Users/phhargrove/OMPI/openmpi-1.5.5rc1/INST/lib/libmpi.dylib(single module) definition of _MPI_Finalize
$ ./a.out
inside MPI_Finalize() wrapper

So, I don't think (A) is an appropriate solution. I am also wondering if there is some compiler/linker flag we could/should pass to "fix" the PPC. Going back to the 10.4/PPC I see now that despite the warnings, a working executable WAS generated:

$ ./a.out
inside MPI_Finalize() wrapper

So, I don't think we have managed to reproduce the source of the build problem.
-Paul

On 2/15/2012 9:25 AM, Matthias Jurenz wrote:
Thanks for testing, Paul. I think we need an additional configure test which disables VT if a) weak symbol support is disabled/not available or, more precisely, b) configuring on PPC/Mac10.4 and the GNU compiler version used is older than or equal to 4.0.1. I prefer option b) because VT (i.e. PMPI) should also work without weak symbol support (at least it does on my laptop with gcc 4.4.3 and '--disable-weak-symbols'). On the other hand, in most cases the compiler supports weak symbols, so option a) would also work, unless weak symbol support is disabled by the configure option '--disable-weak-symbols'. Jeff, what's your opinion? Matthias

On Wednesday 15 February 2012 10:33:30 Paul Hargrove wrote:
See responses mixed in below.

On Wed, Feb 15, 2012 at 1:02 AM, Matthias Jurenz <matthias.jur...@tu-dresden.de> wrote:
Unfortunately, we don't have access to a PPC system with MacOS 10.4 to try to reproduce the error.

Not too surprising. I'll see what I can do to help resolve the problem.
Paul, could you please check for the definition of the macro OPAL_HAVE_WEAK_SYMBOLS in ompi_config.h?

ompi_config.h doesn't contain that macro. However, opal_config.h shows no weak symbol support:

#define HWLOC_HAVE_ATTRIBUTE_WEAK_ALIAS 0
#define OPAL_HAVE_ATTRIBUTE_WEAK_ALIAS 0
#define OPAL_HAVE_WEAK_SYMBOLS 0

I assume that the ancient GNU compiler on PPC/MacOS10.4 does not support weak symbols, which causes the multiple definitions.

Does that mean I should simply not expect to get VT built there?

Furthermore, could you please try to build the following code to test whether the PMPI interface of Open MPI works in general?

#include <stdio.h>
#include <mpi.h>
int MPI_Finalize() { printf( "inside MPI_Finalize() wrapper\n" ); return PMPI_Finalize(); }
int main(int argc, char** argv) { MPI_Init(&argc, &argv); MPI_Finalize(); }

I am assuming I am supposed to build that with VT disabled in my OMPI build. Doing so, I see that PMPI is apparently not working:

$ ./bin/mpicc pmpi_test.c
/usr/bin/ld: warning multiple definitions of symbol _MPI_Finalize
/var/tmp//ccHZvZ3B.o definition of _MPI_Finalize in section (__TEXT,__text)
/Users/phargrov/OMPI/openmpi-1.5.5rc1/INST/lib/libmpi.dylib(single module) definition of _MPI_Finalize

Maybe the error occurs only if this code is in a shared library which depends on the MPI library (as does the libvt-mpi). Therefore, run the following:

$ gcc -fPIC -shared pmpi_test.c -I -o libpmpi_test.dylib -L -lmpi

I assume this check might be redundant given that the previous one failed. However, here it is anyway:

$ gcc -fPIC -shared pmpi_test.c -Iinclude -o libpmpi_test.dylib -Llib
powerpc-apple-darwin8-gcc-4.0.1: unrecognized option '-shared'
/usr/bin/ld: Undefined symbols:
_MPI_Init
_PMPI_Finalize
collect2: ld returned 1 exit status

-Paul

Thanks! Matthias

On 12/14/2011 2:51 PM, Paul H. Hargrove wrote:
I've attempted to reproduce the failure reported below for MacOS 10.4 for PPC on an X86-64 system.
First, I've realized that while I reported "make check" as the source of the problem, it occurs at "make". Regardless of that mistake in my reporting, I was unable to reproduce the problem, making this a PPC-specific problem as far as I can tell. Instead of 255 instances of "ld: multiple definitions of symbol _MPI_*" I get instances of "ld: warning multiple definitions of symbol _MPI*", where the only difference is the addition of the word "warning". However, this is apparently non-fatal on the x86-64 but fatal by default on PPC. -Paul On 12/13/2011 9:30 PM, Paul H. Hargrove wrote: Using the 1.5.5rc1 tarball, I've repeated tests on the following platforms for which I recently reported 1.4.5rc1 results: MacOS 10.5 (Leopard) on PPC: powerpc-apple-darwin9-gcc-4.0.1 (GCC) 4.0.1 (Apple Inc. build 5488) MacOS 10.4 (Tiger) on PPC: powerpc-apple-darwin8-gcc-4.0.1 (GCC) 4.0.1 (Apple Computer, Inc. build 5341) MacOS 10.3 (Panther) on PPC: gcc (GCC) 3.3 20030304
[OMPI devel] More MIPS asm patches
On 2/14/2012 10:13 PM, Paul H. Hargrove wrote:
On the linux/mips64el platform I also tried the PathScale 3.3a compilers on both branches. On both branches the atomic_*_noinline tests all PASS, which validates these patches. On trunk all the tests in test/asm are PASSing. However, the versions NOT suffixed with _noinline are FAILing on the 1.5 branch. Since those failures DO NOT use the files touched by these patches, they are irrelevant.

Oops - I was looking at the wrong output when I stated pathcc/trunk was PASSing all tests. The *inline* atomics tests SIGBUS w/ the pathcc compilers on BOTH branches. I know from previous encounters with pathcc on MIPS that the problem is due to the explicit use of "$1" (aka "AT", the "Assembler Temporary" register). Unlike gcc, pathcc schedules this as a normal register. Indeed the attached patch (which should apply cleanly to both branches) resolves the problem simply by conditionally adding "at" to the clobbers for the inline asm. This is independent of the patches in my previous posting.
-Paul

-- Paul H. Hargrove phhargr...@lbl.gov Future Technologies Group HPC Research Department Tel: +1-510-495-2352 Lawrence Berkeley National Laboratory Fax: +1-510-486-6900

--- openmpi-1.7a1r25913/opal/include/opal/sys/mips/atomic.h 2012-02-13 20:00:06.0 -0600
+++ openmpi-1.7a1r25913m/opal/include/opal/sys/mips/atomic.h 2012-02-15 00:23:44.648085811 -0600
@@ -119,7 +119,11 @@
 ".set reorder \n"
 : "="(ret), "=m"(*addr)
 : "m"(*addr), "r"(oldval), "r"(newval)
- : "cc", "memory");
+ : "cc", "memory"
+#ifdef __PATHCC__
+ , "at"
+#endif
+);
 return (ret == oldval);
 }
@@ -168,7 +172,11 @@
 ".set reorder \n"
 : "=" (ret), "=m" (*addr)
 : "m" (*addr), "r" (oldval), "r" (newval)
- : "cc", "memory");
+ : "cc", "memory"
+#ifdef __PATHCC__
+ , "at"
+#endif
+);
 return (ret == oldval);
 }
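For readers unfamiliar with the idiom, the conditional clobber can be sketched on its own as below. This is a minimal sketch under stated assumptions, not the Open MPI atomics: the macro DEMO_EXTRA_CLOBBERS and the function demo_touch are invented for illustration, the asm body is deliberately empty so the file builds with any GCC-compatible compiler, and the extra __mips__ test is an addition here because "at" is only a valid register name on MIPS.

```c
/* Extra clobber for a compiler that may allocate $1 ("at") as an ordinary
 * register. __PATHCC__ is the macro tested in the patch above; the __mips__
 * check is an assumption added for this sketch. */
#if defined(__PATHCC__) && defined(__mips__)
#  define DEMO_EXTRA_CLOBBERS , "at"
#else
#  define DEMO_EXTRA_CLOBBERS
#endif

/* Hypothetical stand-in for the real atomic operation. The asm template is
 * empty, so only the shape of the clobber list is of interest. */
static int demo_touch(int value)
{
    __asm__ __volatile__(""
                         : "+r"(value)
                         :
                         : "cc", "memory" DEMO_EXTRA_CLOBBERS);
    return value;
}
```

Keeping the extra clobber in a macro leaves the asm statement itself identical for every compiler, which is the same effect the #ifdef inside the clobber list achieves in the patch.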
[OMPI devel] Fixes for MIPS assembly [w/ PATCHES]
The attached patches fix three problems with the non-inline ASM for MIPS (and MIPS64EL):

1) ".set reorder" was placed too early. This was causing loss of the SLTU instruction in the jump delay slot which follows the return instruction. Since that SLTU is used to set the return value, this was fatal to most tests in test/asm.
2) The 64-bit cmpset code was performing the XOR (to compare the read value to 'oldval') using 'addr' as the destination register. Since XOR is in the delay slot of the retry branch instruction (except in the acq variant) any retry would load from an invalid 'addr' (SEGV).
3) The 64-bit cmpset code was using the wrong destination register for the SLTU and thus not setting the return value (even after the ".set reorder" was placed correctly).

There is one patch each for the 1.5 branch and trunk. Both have been tested on:

linux/mips32 w/ -march=4kc in the *FLAGS (gcc-4.4.5)
linux/mips64 w/ -mabi=n32 in the *FLAGS (gcc-4.3.2)
linux/mips64 w/ -mabi=64 in the *FLAGS (gcc-4.3.2)
linux/mips64el (gcc-4.2.3)

Of those 8 builds, the mips32/ompi-1.5 build is the only one that fails. That is because, unlike trunk, it tries to build the 64-bit atomics which the assembler then rejects. I have not attempted to backport the fix(es) for that from trunk to 1.5.

On the linux/mips64el platform I also tried the PathScale 3.3a compilers on both branches. On both branches the atomic_*_noinline tests all PASS, which validates these patches. On trunk all the tests in test/asm are PASSing. However, the versions NOT suffixed with _noinline are FAILing on the 1.5 branch. Since those failures DO NOT use the files touched by these patches, they are irrelevant.

If/when these patches have been committed, I may consider returning to the 1.5 branch to backport/CMR
+ support for MIPS32 (should not be trying to build the 64-bit atomics)
+ fix for the inline atomics (the FAILures on the inline tests) w/ pathcc
-Paul

-- Paul H.
Hargrove phhargr...@lbl.gov Future Technologies Group HPC Research Department Tel: +1-510-495-2352 Lawrence Berkeley National Laboratory Fax: +1-510-486-6900

diff -ur openmpi-1.5.5rc2r25906/opal/asm/base/MIPS.asm openmpi-1.5.5rc2r25906m/opal/asm/base/MIPS.asm
--- openmpi-1.5.5rc2r25906/opal/asm/base/MIPS.asm 2012-02-10 21:16:29.0 -0600
+++ openmpi-1.5.5rc2r25906m/opal/asm/base/MIPS.asm 2012-02-14 16:16:26.948085714 -0600
@@ -34,11 +34,10 @@
 sc $2, 0($4)
 beqz $2, retry1
 done1:
- .set reorder
-
 xor $3,$3,$5
 j ra
 sltu $2,$3,1
+ .set reorder
 END(opal_atomic_cmpset_32)
@@ -52,11 +51,10 @@
 beqz $2, retry2
 done2:
 sync
- .set reorder
-
 xor $3,$3,$5
 j ra
 sltu $2,$3,1
+ .set reorder
 END(opal_atomic_cmpset_acq_32)
@@ -70,16 +68,15 @@
 sc $2, 0($4)
 beqz $2, retry3
 done3:
- .set reorder
-
 xor $3,$3,$5
 j ra
 sltu $2,$3,1
+ .set reorder
 END(opal_atomic_cmpset_rel_32)
 LEAF(opal_atomic_cmpset_64)
- .set noreorder
+ .set noreorder
 retry4:
 lld $3, 0($4)
 bne $3, $5, done4
@@ -87,11 +84,10 @@
 scd $2, 0($4)
 beqz $2, retry4
 done4:
- .set reorder
-
- xor $4,$3,$5
+ xor $3,$3,$5
 j ra
- sltu $3,$4,1
+ sltu $2,$3,1
+ .set reorder
 END(opal_atomic_cmpset_64)
@@ -104,11 +100,11 @@
 scd $2, 0($4)
 beqz $2, retry5
 done5:
- .set reorder
 sync
- xor $4,$3,$5
+ xor $3,$3,$5
 j ra
- sltu $3,$4,1
+ sltu $2,$3,1
+ .set reorder
 END(opal_atomic_cmpset_acq_64)
@@ -122,9 +118,8 @@
 scd $2, 0($4)
 beqz $2, retry6
 done6:
- .set reorder
-
- xor $4,$3,$5
+ xor $3,$3,$5
 j ra
- sltu $3,$4,1
+ sltu $2,$3,1
+ .set reorder
 END(opal_atomic_cmpset_rel_64)
diff -ur openmpi-1.5.5rc2r25906/opal/asm/generated/atomic-mips-irix.s openmpi-1.5.5rc2r25906m/opal/asm/generated/atomic-mips-irix.s
--- openmpi-1.5.5rc2r25906/opal/asm/generated/atomic-mips-irix.s 2012-02-10 21:25:44.0 -0600
+++ openmpi-1.5.5rc2r25906m/opal/asm/generated/atomic-mips-irix.s 2012-02-14 16:29:55.140085838 -0600
@@ -33,11 +33,10 @@
 sc $2, 0($4)
 beqz $2, retry1
 done1:
- .set reorder
-
 xor $3,$3,$5
 j ra
 sltu $2,$3,1
+
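The XOR/SLTU pair that these fixes revolve around computes the boolean result of the compare-and-set. A small C model of just that tail may make the register mix-up easier to follow. It is a sketch only: demo_cmpset_result is a made-up name, and the comments map each C expression to the corrected instruction from the patch ($2 being the MIPS return-value register v0).

```c
#include <stdint.h>

/* C model of the corrected tail "xor $3,$3,$5 ; sltu $2,$3,1".
 * The xor is zero only when the loaded value equals oldval; the unsigned
 * "less than 1" test then maps zero to 1 and anything else to 0. */
static int demo_cmpset_result(uint64_t loaded, uint64_t oldval)
{
    uint64_t diff = loaded ^ oldval;   /* xor  $3,$3,$5 */
    return diff < 1u;                  /* sltu $2,$3,1  */
}
```

In this model, writing the comparison into any variable other than the one returned leaves the return value unset, which corresponds to problem 3) in the list above.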
Re: [OMPI devel] trunk build failed when configured with --disable-hwloc
On 2/14/2012 5:10 PM, Paul H. Hargrove wrote:
I have configured the ompi-trunk (from last night's tarball: 1.7a1r25913) with --without-hwloc. Having done so, I see the following failure at build time:

CC rmaps_rank_file_component.lo
/home/hargrove/OMPI/openmpi-trunk-linux-mips64el//openmpi-trunk/orte/mca/rmaps/rank_file/rmaps_rank_file_component.c: In function 'orte_rmaps_rank_file_open':
/home/hargrove/OMPI/openmpi-trunk-linux-mips64el//openmpi-trunk/orte/mca/rmaps/rank_file/rmaps_rank_file_component.c:111: error: 'opal_hwloc_binding_policy' undeclared (first use in this function)
/home/hargrove/OMPI/openmpi-trunk-linux-mips64el//openmpi-trunk/orte/mca/rmaps/rank_file/rmaps_rank_file_component.c:111: error: (Each undeclared identifier is reported only once
/home/hargrove/OMPI/openmpi-trunk-linux-mips64el//openmpi-trunk/orte/mca/rmaps/rank_file/rmaps_rank_file_component.c:111: error: for each function it appears in.)
/home/hargrove/OMPI/openmpi-trunk-linux-mips64el//openmpi-trunk/orte/mca/rmaps/rank_file/rmaps_rank_file_component.c:111: error: 'OPAL_BIND_TO_CPUSET' undeclared (first use in this function)

Looks like this code is not "aware" that hwloc has been configured out. This is not present in the 1.5 branch configured with identical arguments.
-Paul

The following appears to "fix" that, but I am uncertain if this is the desired fix.

--- orte/mca/rmaps/rank_file/rmaps_rank_file_component.c~ 2012-02-14 17:25:07.653483222 -0800
+++ orte/mca/rmaps/rank_file/rmaps_rank_file_component.c 2012-02-14 17:25:28.803483261 -0800
@@ -107,8 +107,10 @@
 }
 ORTE_SET_MAPPING_POLICY(orte_rmaps_base.mapping, ORTE_MAPPING_BYUSER);
 ORTE_SET_MAPPING_DIRECTIVE(orte_rmaps_base.mapping, ORTE_MAPPING_GIVEN);
+#if OPAL_HAVE_HWLOC
 /* we are going to bind to cpuset since the user is specifying the cpus */
 OPAL_SET_BINDING_POLICY(opal_hwloc_binding_policy, OPAL_BIND_TO_CPUSET);
+#endif
 /* make us first */
 my_priority = 1;
 }

HOWEVER, I am now also seeing the following occurring ONLY when configured with --disable-hwloc:

make[1]: Entering directory `/home/phargrov/openmpi-1.7a1r25913/BLD2/opal/mca/event/libevent2013'
CC libevent2013_module.lo
../../../../../opal/mca/event/libevent2013/libevent2013_module.c:7:20: error: config.h: No such file or directory
../../../../../opal/mca/event/libevent2013/libevent2013_module.c: In function 'opal_event_init':
../../../../../opal/mca/event/libevent2013/libevent2013_module.c:243: warning: ignoring return value of 'asprintf', declared with attribute warn_unused_result
make[1]: *** [libevent2013_module.lo] Error 1

It seems VERY odd to me that disabling hwloc should have that effect. Looking deeper, it appears that '#include "config.h"' in libevent2013_module.c has been including the config.h from HWLOC, instead of the one from libevent2013. If one examines the -I options carefully, one will see that $(builddir)/libevent is NOT in the include path, but that is the location of the config.h generated by libevent's configure script!
-Paul

-- Paul H. Hargrove phhargr...@lbl.gov Future Technologies Group HPC Research Department Tel: +1-510-495-2352 Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
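The shape of that guard, stripped of the Open MPI specifics, looks roughly like the sketch below. All DEMO_ and demo_ names are hypothetical stand-ins (DEMO_HAVE_HWLOC plays the role of OPAL_HAVE_HWLOC); the point is only that symbols which exist solely when a feature is configured in must be referenced solely under the same condition.

```c
/* DEMO_HAVE_HWLOC is a hypothetical stand-in for OPAL_HAVE_HWLOC; build with
 * -DDEMO_HAVE_HWLOC=1 to enable the guarded code. */
#ifndef DEMO_HAVE_HWLOC
#  define DEMO_HAVE_HWLOC 0
#endif

#if DEMO_HAVE_HWLOC
/* These exist only when the feature is configured in, like the two
 * identifiers reported as undeclared in the build failure. */
enum { DEMO_BIND_TO_CPUSET = 1 };
static int demo_binding_policy = 0;
#endif

/* Returns the component priority; touches the binding policy only when the
 * feature is available. */
static int demo_component_open(void)
{
    int my_priority;

#if DEMO_HAVE_HWLOC
    /* the user specifies the cpus, so bind to the cpuset */
    demo_binding_policy = DEMO_BIND_TO_CPUSET;
#endif
    /* make us first */
    my_priority = 1;
    return my_priority;
}
```

With the macro left at 0 the function still compiles and returns the same priority, which matches the behavior the patch aims for when hwloc is configured out.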
[OMPI devel] trunk build failed when configured with --disable-hwloc
I have configured the ompi-trunk (from last night's tarball: 1.7a1r25913) with --without-hwloc. Having done so, I see the following failure at build time:

CC rmaps_rank_file_component.lo
/home/hargrove/OMPI/openmpi-trunk-linux-mips64el//openmpi-trunk/orte/mca/rmaps/rank_file/rmaps_rank_file_component.c: In function 'orte_rmaps_rank_file_open':
/home/hargrove/OMPI/openmpi-trunk-linux-mips64el//openmpi-trunk/orte/mca/rmaps/rank_file/rmaps_rank_file_component.c:111: error: 'opal_hwloc_binding_policy' undeclared (first use in this function)
/home/hargrove/OMPI/openmpi-trunk-linux-mips64el//openmpi-trunk/orte/mca/rmaps/rank_file/rmaps_rank_file_component.c:111: error: (Each undeclared identifier is reported only once
/home/hargrove/OMPI/openmpi-trunk-linux-mips64el//openmpi-trunk/orte/mca/rmaps/rank_file/rmaps_rank_file_component.c:111: error: for each function it appears in.)
/home/hargrove/OMPI/openmpi-trunk-linux-mips64el//openmpi-trunk/orte/mca/rmaps/rank_file/rmaps_rank_file_component.c:111: error: 'OPAL_BIND_TO_CPUSET' undeclared (first use in this function)

Looks like this code is not "aware" that hwloc has been configured out. This is not present in the 1.5 branch configured with identical arguments.
-Paul

-- Paul H. Hargrove phhargr...@lbl.gov Future Technologies Group HPC Research Department Tel: +1-510-495-2352 Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
[OMPI devel] the dangers of configure probing argument counts
There was recently a fair amount of work done in hwloc to get configure to work correctly for a probe that was intended to determine how many arguments appear in a specific function prototype. The "issue" was that the C spec doesn't require that the C compiler issue an error for either too-many or too-few arguments. While gcc and most other compilers make both cases an error, there are two compilers of non-trivial importance which do NOT:
+ By default the IBM (xlc) C compiler only warns for the case of too many arguments.
+ By default the Intel (icc) C compiler only warns for the case of too few arguments.
This renders configure-time tests that want to check argument counts unreliable unless one takes special care to add something "special" to CFLAGS. While hacking on hwloc we determined that it was NOT safe for configure to add to CFLAGS in general, nor to ask the user to do so. It was only safe to /temporarily/ add to CFLAGS for the duration of the argument count probe.

So, WHY am I telling you all this? Because of the following in openmpi-1.7a1r25865/ompi/config/ompi_check_openib.m4:

[AC_CACHE_CHECK( [number of arguments to ibv_create_cq],

which performs exactly the sort of test I am warning against. So, I would encourage somebody to make the effort to reuse the configure logic Jeff and I developed for hwloc. In particular, look for the setting and use of HWLOC_STRICT_ARGS_CFLAGS in config/hwloc.m4.
-Paul

-- Paul H. Hargrove phhargr...@lbl.gov Future Technologies Group HPC Research Department Tel: +1-510-495-2352 Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
Re: [hwloc-devel] hwloc 1.3.2rc2 released
On 2/13/2012 1:30 PM, Jeff Squyres wrote:
Due to the volume of off-list emails, I'm kinda expecting this rc to be good / final. However, please do at least some cursory testing so that we can be sure.

I disregarded the "cursory" and ran on 61 arch/os/compiler combinations. I can see only 2 problems at this point:
+ known libnuma issues on a "weird" virtual node - NOT expected to be fixed in 1.3.x
+ "make check" failure w/ icc-8.0 on x86/Linux - BUT icc-9.0 and gcc are both fine on the same node (so probably a compiler bug).
So, I agree this looks "final" to me.
-Paul

-- Paul H. Hargrove phhargr...@lbl.gov Future Technologies Group HPC Research Department Tel: +1-510-495-2352 Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
[OMPI devel] 1.5.5rc2r25906 test results
On 2/10/2012 6:04 PM, Jeff Squyres wrote:
On Feb 10, 2012, at 8:57 PM, Jeff Squyres wrote:
1.5.5rc2 coming soon.
I should qualify that statement: many things have been resolved, but there's a few more things to go. A new rc will come when they have been resolved: https://svn.open-mpi.org/trac/ompi/report/15

I just tried tonight's nightly tarball for the 1.5 branch (1.5.5rc2r25906). I found the following issues, which I had previously reported against 1.5.5rc1, for which I did NOT find a corresponding ticket in "report/15". My apologies if I've missed a ticket, or if any of these were deferred to 1.6.x (as was Lion+PGI, for instance).

+ GNU Make required for "make clean" due to use of non-standard $(RM)
Reported in http://www.open-mpi.org/community/lists/devel/2011/12/10184.php
+ MacOS 10.4 on ppc fails linking libvt-mpi.la (multiply defined symbols)
Reported in http://www.open-mpi.org/community/lists/devel/2011/12/10090.php
My MacOS 10.4/x86 machine is down, but I don't believe it had this problem w/ rc1.
+ ROMIO uses explicit MAKE=make, causing problems if one builds ompi w/ gmake
Reported in http://www.open-mpi.org/community/lists/devel/2012/01/10300.php
-Paul

-- Paul H. Hargrove phhargr...@lbl.gov Future Technologies Group HPC Research Department Tel: +1-510-495-2352 Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
Re: [OMPI devel] v1.4.5rc6 released
No new problems to report w/ 60+ platforms tested. I was unable to retest MacOS 10.4 because both my x86 and ppc hosts are down. Relative to the last time I reported my list of platforms, I have added FreeBSD9 on i386 and amd64. There have also been some additional compilers added on Linux x86 and/or Linux x86-64. I believe that all my odd platform/compiler issues have been addressed in README. Several platforms were documented as not supported, and configure is now smart enough to reject some of these. Others that required work-arounds have been documented as well. This looks ready to go from my point of view (wide portability). If there are other things I might help test to speed the release, let me know.
-Paul

On 2/10/2012 6:11 PM, Jeff Squyres wrote:
Usual location: http://www.open-mpi.org/software/ompi/v1.4/
Changes since rc5:
- document LD_LIBRARY_PATH for -m32 with Ubuntu/Sun compilers
- refuse to configure with gccffs
- LANL TLCC2 platform files

-- Paul H. Hargrove phhargr...@lbl.gov Future Technologies Group HPC Research Department Tel: +1-510-495-2352 Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
Re: [OMPI devel] RFC: Add "virbr0" to [btl|oob]_tcp_if_exclude?
That's probably a reflection of the status of the "Open MPI User Documentation" sub-project :-) On 2/10/2012 5:12 PM, Jeff Squyres wrote: FWIW: google analytics indicates that the FAQ and the mailing list archives are among the most heavily used sections of the web site. :-) -- Paul H. Hargrove phhargr...@lbl.gov Future Technologies Group HPC Research Department Tel: +1-510-495-2352 Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
Re: [OMPI devel] RFC: Add "virbr0" to [btl|oob]_tcp_if_exclude?
Much better - at least to the extent that users actually read FAQs :-) -Paul On 2/10/2012 5:01 PM, Jeff Squyres (jsquyres) wrote: Check out #220 now; I updated it. -- Paul H. Hargrove phhargr...@lbl.gov Future Technologies Group HPC Research Department Tel: +1-510-495-2352 Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
Re: [OMPI devel] RFC: Add "virbr0" to [btl|oob]_tcp_if_exclude?
On 2/10/2012 12:21 PM, Jeff Squyres wrote:
On Feb 10, 2012, at 3:14 PM, Paul H. Hargrove wrote:
+ User knows nothing about xen, and thus nothing about virbr0
+ User has a local-only interface (eth8 in my made up example)
+ User reads FAQ entry "220. How do I tell Open MPI which TCP networks to use?"
+ User follows instructions given in said FAQ, yielding my example command line.
Do you mean that eth8 is the only non-loopback interface on their laptop, and it's disconnected? (e.g., sitting on a train with no wifi and no wired ethernet) Then OMPI would have disqualified that interface, anyway (because it wasn't up). I think I'm missing the zen of your question... :-\

The point of the question isn't related to WHY eth8 is useless - just assume it is. Assume it is UP, but useless for whatever reasons motivated writing FAQ #220. It could be Terry's example of a port connected to the service processor. The concern is what happens in this situation when the user, following the advice in the FAQ, passes an explicit setting for btl_tcp_if_exclude, which DOES NOT include virbr0? They don't know it was there before, or that it needs to be there (the FAQ states that lo MUST be included). So, by following the FAQ they don't resolve their problem. OMPI ceases any attempt to use eth8 (or whatever), but loss of the implicit virbr0 from the exclude list results in their system attempting to use virbr0 (and thus continuing to fail). Right? Maybe the FAQ just needs an update to address my concern.
-Paul

-- Paul H. Hargrove phhargr...@lbl.gov Future Technologies Group HPC Research Department Tel: +1-510-495-2352 Lawrence Berkeley National Laboratory Fax: +1-510-486-6900
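The concern can be restated as a tiny model of how an explicit exclude list interacts with a default one. This is not Open MPI code: the function names are invented, and the default "lo,virbr0" is only what the default would look like if the RFC were adopted. The one behavior taken from the discussion is that an explicit setting replaces the default rather than being merged with it.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* True if 'name' appears as a whole entry in the comma-separated 'list'. */
static bool demo_list_contains(const char *list, const char *name)
{
    size_t name_len = strlen(name);
    const char *p = list;

    while (*p != '\0') {
        const char *end = strchr(p, ',');
        size_t len = (end != NULL) ? (size_t)(end - p) : strlen(p);

        if (len == name_len && 0 == strncmp(p, name, len)) {
            return true;
        }
        if (end == NULL) {
            break;
        }
        p = end + 1;
    }
    return false;
}

/* Hypothetical model: an explicit user setting replaces the default list
 * outright, so anything the default carried implicitly is lost. */
static const char *demo_effective_exclude(const char *default_list,
                                          const char *user_list)
{
    return (user_list != NULL) ? user_list : default_list;
}
```

With a default of "lo,virbr0" and a user setting of "lo,eth8", the effective list contains eth8 but no longer virbr0, which is exactly the situation described above.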
Re: [OMPI devel] RFC: Add "virbr0" to [btl|oob]_tcp_if_exclude?
On 2/10/2012 12:03 PM, Jeff Squyres wrote:
> On Feb 10, 2012, at 1:44 PM, Paul H. Hargrove wrote:
>> Since the situation described is one where the user didn't know they could/should disable xen, it is reasonable to think they ALSO don't know they need to exclude virbr0.
>
> That's what I'm thinking.
>
>> So, I read the question as meaning the following:
>> What happens when a user who doesn't know anything about virbr0 does
>>     mpirun --mca btl_tcp_if_exclude lo,eth8
>
> I'm not sure I understand your question -- the above will exclude loopback and eth8. (where did eth8 come from?)

Sorry if I wasn't clear. I'll try again:
+ User knows nothing about xen, and thus nothing about virbr0
+ User has a local-only interface (eth8 in my made up example)
+ User reads FAQ entry "220. How do I tell Open MPI which TCP networks to use?"
+ User follows instructions given in said FAQ, yielding my example command line.

-Paul
Re: [hwloc-devel] 1.3.2rc1 failures w/ icc on x86 (visibility?)
On 2/10/2012 11:19 AM, Jeff Squyres wrote:
> I'll go compare.

I already did...

HWLOC (1.3.2rc1) tries:

    AC_LINK_IFELSE([AC_LANG_PROGRAM([[
        __attribute__((visibility("default"))) int foo;
        ]],[[int i;]])],
        [],
        [hwloc_add=])

While OMPI (1.4.5rc5) tries:

    AC_TRY_LINK([
        #include <stdio.h>
        __attribute__((visibility("default"))) int foo;
        void bar(void) { fprintf(stderr, "bar\n"); };
        ],[],
        [if test -s conftest.err ; then
            $GREP -iq "visibility" conftest.err
            if test "$?" = "0" ; then
                ompi_cv_cc_fvisibility="no"
            else
                ompi_cv_cc_fvisibility="yes"
            fi
        else
            ompi_cv_cc_fvisibility="yes"
        fi],
        [ompi_cv_cc_fvisibility="no"])
    ])

-Paul
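The decision made by the OMPI probe quoted above can be restated as a small function. This is a sketch of the logic only, not part of either configure script; the name `supports_visibility` and the sample warning text are hypothetical.

```python
def supports_visibility(link_succeeded, conftest_err):
    """Restate the OMPI probe's yes/no decision.

    link_succeeded: whether the test program compiled AND linked.
    conftest_err:   text the compiler/linker wrote to conftest.err.
    """
    if not link_succeeded:
        # Fourth argument of AC_TRY_LINK: a failed link means "no".
        return False
    # Mirrors: test -s conftest.err && $GREP -iq "visibility" conftest.err
    if conftest_err and "visibility" in conftest_err.lower():
        return False
    return True


# Link failure like the one reported for icc in this thread.
icc_link_failure = supports_visibility(
    False, "ld: conftest: hidden symbol `fputs' isn't defined")
# Hypothetical compiler that links but warns that the attribute is ignored.
warning_only = supports_visibility(True, "warning: Visibility attribute ignored")
# Clean link with empty conftest.err.
clean_link = supports_visibility(True, "")

print(icc_link_failure, warning_only, clean_link)
```

The hwloc probe shown in the same message only checks the link status and never calls a libc function, so it has no equivalent of the first two cases.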
Re: [hwloc-devel] 1.3.2rc1 has escaped
On 2/10/2012 11:00 AM, Jeff Squyres wrote:
> Here's the final logic -- is it what you intended?

Yes, that works for me. I pasted your version into config/hwloc.m4 on 1.3.2rc1 and faked the $hwloc_c_vendor setting. The results were the same as with my version. (Yes, I did autoreconf to make sure I was testing the right version.)

-Paul
Re: [hwloc-devel] 1.3.2rc1 failures w/ icc on x86 (visibility?)
On 2/10/2012 11:04 AM, Jeff Squyres wrote:
> It's kinda weird that icc supported the visibility stuff but gcc did not...

See my post that crossed yours in flight. The configure logic in ompi thinks icc does NOT support visibility on this platform. I think ompi is a touch smarter than hwloc in this respect.

-Paul
Re: [hwloc-devel] 1.3.2rc1 failures w/ icc on x86 (visibility?)
On 2/10/2012 9:27 AM, Paul H. Hargrove wrote:
> I have versions 8.1.032, 9.0.024 and 9.1.042 of the Intel compilers on a Linux/x86 (32-bit) host. All three can configure and build hwloc-1.3.2rc1, but all are failing "make check" in the same way. What I see is ton(ne)s of linker messages and every executable SEGVs.
>
> The linker messages look like:
>   CC     hwloc_synthetic.o
>   CCLD   hwloc_synthetic
> ld: hwloc_synthetic.o(.text+0x1c): unresolvable relocation against symbol `hwloc_topology_init'
> ld: hwloc_synthetic.o(.text+0x2a): unresolvable relocation against symbol `hwloc_topology_set_synthetic'
> ld: hwloc_synthetic.o(.text+0x33): unresolvable relocation against symbol `hwloc_topology_load'
> ld: hwloc_synthetic.o(.text+0x3c): unresolvable relocation against symbol `hwloc_topology_check'
> ld: hwloc_synthetic.o(.text+0x46): unresolvable relocation against symbol `hwloc_topology_get_depth'
> ld: hwloc_synthetic.o(.text+0x64): unresolvable relocation against symbol `hwloc_get_nbobjs_by_depth'
> ld: hwloc_synthetic.o(.text+0x8a): unresolvable relocation against symbol `hwloc_get_obj_by_depth'
> ld: hwloc_synthetic.o(.text+0xc6): unresolvable relocation against symbol `hwloc_topology_destroy'
> Where most tests have far more of these.
>
> For the moment, I am going to assume the SEGVs are a result of the linker problems.
>
> As compared to gcc on the same system, the only difference in include/private/autogen/config.h is:
>  /* Whether C compiler supports symbol visibility or not */
> -#define HWLOC_C_HAVE_VISIBILITY 1
> +#define HWLOC_C_HAVE_VISIBILITY 0
> Where the '1' is the build with the Intel compiler.
>
> So, my current suspicion falls on the visibility crud. I can confirm that "HWLOC_CFLAGS = -fvisibility=hidden" in Makefile. Other than that, I don't know where to begin looking at this problem.
>
> -Paul

For comparison, I tried building OMPI 1.4.5rc5 with these Intel compilers.

icc-9.1.042: caused assertion failure in ld - let's not consider this one
icc-9.0.024: PASSed "make all install check clean"
icc-8.1.032: PASSed "make all install check clean"

So, I believe that the two PASS results show that the correct visibility logic is "known" in ompi. The key difference appears to be that ompi has decided NOT to use -fvisibility with these compilers:

    == Symbol Visibility Feature
    checking if icc supports -fvisibility... no
    checking enable symbol visibility... no

And from the ompi-1.4.5rc5 config.log:

    configure:164594: checking if icc supports -fvisibility
    configure:164624: icc -o conftest -O3 -DNDEBUG -finline-functions -fno-strict-aliasing -restrict -pthread -fvisibility=hidden conftest.c -lnsl -lutil >&5
    /tmp/iccFBKDBg.o: In function `bar':
    conftest.c:(.text+0x26): undefined reference to `fputs'
    ld: conftest: hidden symbol `fputs' isn't defined
    ld: final link failed: Nonrepresentable section on output
    configure:164631: $? = 1

As compared to hwloc-1.3.2rc1:

    configure:8253: checking if icc supports -fvisibility
    configure:8268: icc -o conftest -fvisibility=hidden -Werror conftest.c >&5
    configure:8268: $? = 0
    configure:8279: result: yes

So, my educated guess is that one needs to (back)port the configure logic for visibility support.

-Paul
Re: [OMPI devel] RFC: Add "virbr0" to [btl|oob]_tcp_if_exclude?
On 2/10/2012 10:38 AM, Jeff Squyres wrote:
> On Feb 10, 2012, at 1:00 PM, TERRY DONTJE wrote:
>>> Should we add "virbr0" to the default value for [btl|oob]_tcp_if_exclude?
>> What happens to that value if you then set btl_tcp_if_exclude to some value on the mpirun command line?
>
> It works just fine. I.e., if you
>
>     mpirun --mca btl_tcp_if_exclude lo,virbr0 ...
>
> That works like a champ.

Since the situation described is one where the user didn't know they could/should disable xen, it is reasonable to think they ALSO don't know they need to exclude virbr0.

So, I read the question as meaning the following:
What happens when a user who doesn't know anything about virbr0 does

    mpirun --mca btl_tcp_if_exclude lo,eth8

And my guess is "nothing good happens".

-Paul
Re: [hwloc-devel] 1.3.2rc1 has escaped
On 2/9/2012 2:26 PM, Paul H. Hargrove wrote:
> We then test if *either* set the variable. Sort of a double-negative.

One of De Morgan's Laws:
    NOT (A AND B) = (NOT A) OR (NOT B)
Applied to give:
    NOT (TEST1_FAIL AND TEST2_FAIL)
      = (NOT TEST1_FAIL) OR (NOT TEST2_FAIL)
      = TEST1_PASS OR TEST2_PASS

-Paul
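The identity above can be checked exhaustively, since there are only four combinations. A minimal sketch; the function name `either_test_passed` is hypothetical and simply encodes the left-hand side of the law.

```python
from itertools import product


def either_test_passed(test1_fail, test2_fail):
    """Left-hand side: NOT (TEST1_FAIL AND TEST2_FAIL)."""
    return not (test1_fail and test2_fail)


table = []
for test1_fail, test2_fail in product([False, True], repeat=2):
    lhs = either_test_passed(test1_fail, test2_fail)
    # Right-hand side: TEST1_PASS OR TEST2_PASS
    rhs = (not test1_fail) or (not test2_fail)
    table.append((test1_fail, test2_fail, lhs, rhs))
    print(test1_fail, test2_fail, lhs, rhs)

# True when both sides agree on every row of the truth table.
laws_hold = all(lhs == rhs for _, _, lhs, rhs in table)
```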
Re: [OMPI devel] btl/openib: get_ib_dev_distance doesn't see processes as bound if the job has been launched by srun
On 2/9/2012 1:19 PM, Brice Goglin wrote:
> So you can find out that you are "bound" by a Linux cgroup (I am not saying Linux "cpuset" to avoid confusion) by comparing root->cpuset and root->online_cpuset.

If I understood the problem as stated earlier in this thread, the current code was looping over a (singleton) cpuset and not finding the current process to be bound to any of the cpus in the set. For that case, the fact that the cpuset is a singleton should already have been enough information to know that one is effectively bound.

Is there really more to this than a need for special-casing the singleton?

-Paul
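The singleton special case suggested here might look like the following. This is a hedged sketch, not hwloc or Open MPI code: `effectively_bound` is a hypothetical helper, CPU sets are modeled as plain Python sets, and treating a strict subset as "bound" is an assumption made for illustration.

```python
def effectively_bound(available_cpus, bound_cpus=None):
    """Decide whether a process should be treated as bound.

    available_cpus: CPUs the process may run on (e.g. after a cgroup restriction).
    bound_cpus:     CPUs reported by an explicit binding query, or None if the
                    query found no binding.
    """
    available = set(available_cpus)
    # Special case from the thread: with a single allowed CPU the process
    # cannot migrate, so it is effectively bound regardless of the query.
    if len(available) == 1:
        return True
    if bound_cpus is None:
        return False
    # Assumption: "bound" means restricted to a strict subset of what is allowed.
    return set(bound_cpus) < available


singleton_case = effectively_bound({3})
unbound_case = effectively_bound({0, 1, 2, 3})
explicit_case = effectively_bound({0, 1, 2, 3}, {1})
print(singleton_case, unbound_case, explicit_case)
```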
Re: [hwloc-devel] 1.3.2rc1 has escaped
Jeff,

What you have for the "Make sure..." is wrong in the same way as the one that was in rc1. The problem is that the AC_COMPILE_IFELSE code tests too-few and too-many args together. Since xlc makes too many an error by default, we don't notice its MISbehavior when given too few. So, one needs to split the too-many and too-few tests as I did in the patch I sent.

I don't think we should drop that AC_COMPILE_IFELSE entirely (or rather we shouldn't drop the TWO once split). If we were to encounter another Linux compiler that didn't STOP on too-few arguments the binding code would get silently broken again.

I was also partial to the "structure" of my patch which needed to test $hwloc_c_vendor only once. This would allow adding compiler-specific logic in exactly one place if other cases arise.

I *do* like the way you've run the AC_COMPILE_IFELSE test AFTER adding the compiler-specific flag (thus confirming that it actually resolved the problem). However, as noted above you will need to split the too-few and too-many arg tests for that to be effective.

And regarding the "older, buggy" comment: This is a recent XLC compiler, and this behavior is NOT a bug because the C spec doesn't require a fatal error here. That is why I commented (with delimiters) on the evils of configure probes that try to determine how many arguments appear in a prototype.

-Paul

On 2/9/2012 5:08 AM, Jeff Squyres wrote:
> How's this patch (against v1.3, assuming https://svn.open-mpi.org/trac/hwloc/changeset/4285)?
>
> Is the test that checks to see if compilers error when the wrong number of params are passed now moot?
>
> Index: config/hwloc.m4
> ===
> --- config/hwloc.m4 (revision 4285)
> +++ config/hwloc.m4 (working copy)
> @@ -268,22 +268,24 @@
>  AS_IF([test "$HWLOC_VISIBILITY_CFLAGS" != ""],
>        [AC_MSG_WARN(["$HWLOC_VISIBILITY_CFLAGS" has been added to the hwloc CFLAGS])])
>
> -# make sure the compiler returns an error code when function arg count is wrong,
> -# otherwise sched_setaffinity checks may fail
> +# Make sure the compiler returns an error code when function arg
> +# count is wrong, otherwise sched_setaffinity checks may fail.
> +# For older, buggy versions of the xlc compilers, we need to set
> +# an additional compiler flag to catch these situations.
> +AS_IF([test "$hwloc_c_vendor" = "ibm"],
> +      [HWLOC_CFLAGS_save=$CFLAGS
> +       CFLAGS="$CFLAGS -qhalt=e"])
>  AC_COMPILE_IFELSE([AC_LANG_PROGRAM([[
>  extern int one_arg(int x);
>  extern int two_arg(int x, int y);
>  int foo(void) { return one_arg(1, 2) + two_arg(3); }
>  ]])], [
>  AC_MSG_WARN([Your C compiler does not consider incorrect argument counts to be a fatal error.])
> -if test "$hwloc_check_compiler_vendor_result" = "ibm"; then
> -AC_MSG_WARN([For XLC you may try appending '-qhalt=-e' to the value of CFLAGS.])
> -AC_MSG_WARN([Alternatively you may configure with a different compiler.])
> -else
> -AC_MSG_WARN([Please report this failure, and configure using a different C compiler if possible.])
> -fi
>  AC_MSG_ERROR([Cannot continue.])
>  ])
> +# Restore the CFLAGS if we modified them above
> +AS_IF([test "$hwloc_c_vendor" = "ibm"],
> +      [CFLAGS=HWLOC_CFLAGS])
>
>  #
>  # Now detect support
> @@ -387,6 +389,12 @@
>  AC_DEFINE_UNQUOTED(hwloc_thread_t, $hwloc_thread_t, [Define this to the thread ID type])
>  fi
>
> +# For older, buggy versions of the xlc compilers, we need to set
> +# an additional compiler flag to catch cases where the wrong
> +# number of parameters are passed.
> +AS_IF([test "$hwloc_c_vendor" = "ibm"],
> +      [HWLOC_CFLAGS_save=$CFLAGS
> +       CFLAGS="$CFLAGS -qhalt=e"])
>  _HWLOC_CHECK_DECL([sched_setaffinity], [
>  AC_DEFINE([HWLOC_HAVE_SCHED_SETAFFINITY], [1], [Define to 1 if glibc provides a prototype of sched_setaffinity()])
>  AC_MSG_CHECKING([for old prototype of sched_setaffinity])
> @@ -403,6 +411,9 @@
>  #define _GNU_SOURCE
>  #include <sched.h>
>  ]])
> +# Restore the CFLAGS if we modified them above
> +AS_IF([test "$hwloc_c_vendor" = "ibm"],
> +      [CFLAGS=HWLOC_CFLAGS])
>  AC_MSG_CHECKING([for working CPU_SET])
>  AC_LINK_IFELSE([
>
> On Feb 8, 2012, at 7:47 PM, Paul H. Hargrove wrote:
>> On 2/8/2012 4:41 PM, Paul H. Hargrove wrote:
>>> I do agree w/ Samuel that the BEST solution is to apply "-qhalt=e" ONLY to the test(s) where one expects the compiler to throw errors (rather than warnings) for function calls with argument counts which don't match the prototypes. At the moment, I am 90% certain that the "old sched_setaffinity()" probe is the only one fit
Re: [hwloc-devel] 1.3.2rc1 has escaped
On 2/9/2012 4:48 AM, Jeff Squyres wrote:
> On Feb 8, 2012, at 6:02 PM, Paul H. Hargrove wrote:
>> The file config/hwloc_check_vendor.m4 that is present in trunk, is ABSENT in the 1.3.2rc1 tarball. There is, correspondingly, no call to _HWLOC_C_COMPILER_VENDOR in hwloc.m4.
>
> Correct -- we hadn't used $hwloc_c_vendor anywhere else in the 1.3 configury.

Are you sure? It looks like my grep turned up several reads, mostly related to the visibility CFLAGS. Perhaps that was just dead code from a backport? Anyway, it looks suspicious to me.

-Paul
Re: [hwloc-devel] 1.3.2rc1 has escaped
On 2/8/2012 4:41 PM, Paul H. Hargrove wrote:
> I do agree w/ Samuel that the BEST solution is to apply "-qhalt=e" ONLY to the test(s) where one expects the compiler to throw errors (rather than warnings) for function calls with argument counts which don't match the prototypes. At the moment, I am 90% certain that the "old sched_setaffinity()" probe is the only one fitting that description.

I am hoping to be able to contribute a patch for this soon.

-Paul
Re: [hwloc-devel] 1.3.2rc1 has escaped
On 2/8/2012 4:43 PM, Samuel Thibault wrote:
> Paul H. Hargrove, on Thu 09 Feb 2012 01:41:47 +0100, wrote:
>> On 2/8/2012 4:31 PM, Samuel Thibault wrote:
>>> Paul H. Hargrove, on Thu 09 Feb 2012 01:28:53 +0100, wrote:
>>>> Option #4: CFLAGS='-qhalt=e -qsuppress=1506-077'
>>>> Appears to work for me for xlc-8.0 and xlc-9.0.
>>>
>>> That still looks dangerous to me: we don't know whatever warning might be added in the future. I'd rather add -qhalt=e only for the sched_setaffinity test.
>>
>> I don't recommend adding -qsuppress automatically, just documenting it for users that need xlc-8 or xlc-9.
>
> I'm not actually talking about the -qsuppress, but about still using -qhalt=e, which might make a lot more other warnings fatal.

Right. I realized that about 10 seconds after hitting SEND and was composing a retraction when the post above arrived.

-Paul
Re: [hwloc-devel] 1.3.2rc1 has escaped
On 2/8/2012 4:31 PM, Samuel Thibault wrote:
> Paul H. Hargrove, on Thu 09 Feb 2012 01:28:53 +0100, wrote:
>> Option #4: CFLAGS='-qhalt=e -qsuppress=1506-077'
>> Appears to work for me for xlc-8.0 and xlc-9.0.
>
> That still looks dangerous to me: we don't know whatever warning might be added in the future. I'd rather add -qhalt=e only for the sched_setaffinity test.

I don't recommend adding -qsuppress automatically, just documenting it for users that need xlc-8 or xlc-9. If nothing else, this "work-around" is now in the hwloc-devel archives for the search engines. Sorry that I wasn't clear on what I meant to do with those CFLAGS.

Regarding "we don't know whatever warning might be added in the future.": "1506-077" is the number for this particular warning about invalid wchar_t constants. So this suppresses ONLY the one message and should be pretty safe. Based on looking at the constants, this message is being issued ERRONEOUSLY by these compilers.

I do agree w/ Samuel that the BEST solution is to apply "-qhalt=e" ONLY to the test(s) where one expects the compiler to throw errors (rather than warnings) for function calls with argument counts which don't match the prototypes. At the moment, I am 90% certain that the "old sched_setaffinity()" probe is the only one fitting that description.

-Paul
Re: [hwloc-devel] 1.3.2rc1 has escaped
On 2/8/2012 8:58 AM, Jeff Squyres wrote:
> Please test!
> http://www.open-mpi.org/software/hwloc/v1.3/

I have access to BG/L, BG/P, Cray-XT and Cray-XE systems. Are there any tests I could/should consider running on those?

-Paul
Re: [hwloc-devel] 1.3.2rc1 has escaped
On 2/8/2012 1:44 PM, Brice Goglin wrote:
> Ah, we need to use $hwloc_c_vendor instead. That's where $hwloc_check_compiler_vendor_result ends up before being cleared.

It looks like something is very wrong here: Examining the 1.3.2rc1 tarball I seem to see $hwloc_c_vendor is read but NOT written!

    $ grep hwloc_c_vendor configure
    case "$hwloc_c_vendor" in
    case "$hwloc_c_vendor" in
    case "$hwloc_c_vendor" in
    case "$hwloc_c_vendor" in
    case "$hwloc_c_vendor" in

If this is really the case, then I can imagine visibility and other things going quite wrong with various compilers.

The file config/hwloc_check_vendor.m4 that is present in trunk, is ABSENT in the 1.3.2rc1 tarball. There is, correspondingly, no call to _HWLOC_C_COMPILER_VENDOR in hwloc.m4.

Am I correct here, or have I missed something?

-Paul
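The "read but NOT written" observation came from a plain grep. A slightly more targeted check could count reads and assignments separately, as in this sketch. It is illustrative only: the helper `variable_usage` is hypothetical, the regular expressions cover only the simple `$name`, `${name}` and `name=` forms, and the sample text is made up to resemble the grep output, not copied from the real configure script.

```python
import re


def variable_usage(script_text, name):
    """Return (reads, writes) for a shell variable in a script's text.

    Only simple forms are recognized: $name / ${name} as reads and
    name=... as writes. Real shell syntax has more cases than this.
    """
    escaped = re.escape(name)
    reads = len(re.findall(r"\$\{?" + escaped + r"\b", script_text))
    writes = len(re.findall(r"(?<![\w$])" + escaped + r"=", script_text))
    return reads, writes


# Made-up fragment resembling what the grep above found.
sample = '''case "$hwloc_c_vendor" in
  ibm) CFLAGS="$CFLAGS -qhalt=e" ;;
esac
case "$hwloc_c_vendor" in
  intel) ;;
esac
'''

broken = variable_usage(sample, "hwloc_c_vendor")
# Same fragment with an assignment added in front, as a vendor check would do.
repaired = variable_usage("hwloc_c_vendor=ibm\n" + sample, "hwloc_c_vendor")
print(broken, repaired)
```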
Re: [hwloc-devel] 1.3.2rc1 has escaped
On 2/8/2012 1:10 PM, Paul H. Hargrove wrote:
> On 2/8/2012 8:58 AM, Jeff Squyres wrote:
>> * Detect when a compiler such as xlc may not report compile errors properly, causing some configure checks to be wrong. Thanks to Paul H. Hargrove for reporting the problem and providing a patch.
>
> Looks like I botched this one!
>
> I have added two Linux/ppc64 machines with xlc-7.0, xlc-8.0 and xlc-9.0 to my testing. These are NOT running on the odd virtual node that caused assertion failures when testing xlc-11.1.

ARGH!!!

I've applied the patches I included, and tested on the xlc-11.1 system where auto tools are new enough. Everything looked fine.

Now I've had a chance to retest earlier xlc (8 and 9, which are on 2 different machines), with the explicit CFLAGS=-qhalt=e. The result was NOT good. It seems that xlc dislikes some wchar constants (see below). In a build w/ default CFLAGS they produce an "(E)" level message, but compilation continues to completion. With the recommended CFLAGS=-qhalt=e these become fatal:

    CC lstopo-lstopo-text.o
    "/home/hargrove/OMPI/hwloc-1.3.2rc1-linux-ppc64-xlc-9.0/hwloc-1.3.2rc1/include/hwloc.h", line 1203.34: 1506-1385 (W) The attribute "pure" is not a valid type attribute.
    "/home/hargrove/OMPI/hwloc-1.3.2rc1-linux-ppc64-xlc-9.0//hwloc-1.3.2rc1/utils/lstopo-text.c", line 450.12: 1506-077 (E) The wchar_t value 0x250c is not valid.
    "/home/hargrove/OMPI/hwloc-1.3.2rc1-linux-ppc64-xlc-9.0//hwloc-1.3.2rc1/utils/lstopo-text.c", line 451.12: 1506-077 (E) The wchar_t value 0x2510 is not valid.
    "/home/hargrove/OMPI/hwloc-1.3.2rc1-linux-ppc64-xlc-9.0//hwloc-1.3.2rc1/utils/lstopo-text.c", line 452.12: 1506-077 (E) The wchar_t value 0x2514 is not valid.
    [followed by another error for each case in the switch statement].

So, now I am not sure what to recommend. Options include:
+ Don't worry about old xlc (which OMPI doesn't support since they can't build the opal atomics).
+ Rig things to use -qhalt=e ONLY for configure, but not for make?
+ Punt on 1.3 and revisit later

By the way: xlc-11.1 on Linux doesn't make these complaints on lstopo-lstopo-text. Nor does xlc-6.0 on MacOS-10.3 (honest, I am not making this up). [And, YES, both platforms define HAVE_PUTWC]

-Paul
Re: [hwloc-devel] 1.3.2rc1 has escaped
On 2/8/2012 1:37 PM, Brice Goglin wrote:
> Let's ignore this for 1.3.2. libnuma sucks, we're wasting way too much time trying to make it sane. I'll look later if I find an easy way to reproduce.

OK, fine by me. I've verified that if I "disarm" that test, then the remaining tests PASS.

-Paul
Re: [hwloc-devel] 1.3.2rc1 has escaped
On 2/8/2012 8:58 AM, Jeff Squyres wrote:
> * Fix conversion from/to Linux libnuma when some NUMA nodes have no memory.

Testing on the virtual node I have access to, where that problem report originated, is still not quite right. There is now a different assertion failing than I had seen before:

    lt-linux-libnuma: /users/phh1/OMPI/hwloc-1.3.2rc1-linux-ppc64-gcc//hwloc-1.3.2rc1/tests/linux-libnuma.c:83: main: Assertion `!memcmp(, _all_nodes, sizeof(nodemask_t))' failed.
    /bin/sh: line 5: 19416 Aborted ${dir}$tst
    FAIL: linux-libnuma

I don't have any clue if that represents forward or backward progress. Maybe the sanity check is just different between 1.3 and trunk? So, I figured I had better report it.

-Paul
Re: [hwloc-devel] 1.3.2rc1 has escaped
On 2/8/2012 8:58 AM, Jeff Squyres wrote:
> * Detect when a compiler such as xlc may not report compile errors properly, causing some configure checks to be wrong. Thanks to Paul H. Hargrove for reporting the problem and providing a patch.

Looks like I botched this one!

I have added two Linux/ppc64 machines with xlc-7.0, xlc-8.0 and xlc-9.0 to my testing. These are NOT running on the odd virtual node that caused assertion failures when testing xlc-11.1.

With these new xlc versions AND the original xlc-11.1 compiler (4 compilers on 3 different nodes) I am STILL seeing the following INCORRECT result:

    checking for old prototype of sched_setaffinity... yes

Where gcc on the same machines correctly gives a "no" result.

Looking in config.log, I do NOT see the -qhalt=E that was discussed as the solution to this problem:

    configure:9065: checking for old prototype of sched_setaffinity
    configure:9083: xlc -c conftest.c >&5

And, of course, I didn't see the fatal error that should have occurred at configure time. So, I poked around some more in config.log and found:

    configure:8338: xlc -c -q32 conftest.c >&5
    "conftest.c", line 62.43: 1506-099 (S) Unexpected argument.
    "conftest.c", line 62.55: 1506-098 (E) Missing argument(s).

So, what this means is that the probe I wrote for "xlc needs -qhalt=E" is WRONG.

The following tests too many and too few as distinct cases, and appears to resolve the problem for me:

    --- config/hwloc.m4~ 2012-02-08 20:55:03.188903698 +
    +++ config/hwloc.m4 2012-02-08 20:57:29.987668761 +
    @@ -269,11 +269,16 @@

     # make sure the compiler returns an error code when function arg count is wrong,
     # otherwise sched_setaffinity checks may fail
    +hwloc_args_check_ok=yes
     AC_COMPILE_IFELSE([AC_LANG_PROGRAM([[
     extern int one_arg(int x);
    +int foo(void) { return one_arg(1, 2); }
    +]])], [ hwloc_args_check_ok=no ])
    +AC_COMPILE_IFELSE([AC_LANG_PROGRAM([[
     extern int two_arg(int x, int y);
    -int foo(void) { return one_arg(1, 2) + two_arg(3); }
    -]])], [
    +int foo(void) { return two_arg(3); }
    +]])], [ hwloc_args_check_ok=no ])
    +AS_IF([test "$hwloc_args_check_ok" != "yes"],[
     AC_MSG_WARN([Your C compiler does not consider incorrect argument counts to be a fatal error.])
     if test "$hwloc_check_compiler_vendor_result" = "ibm"; then
     AC_MSG_WARN([For XLC you may try appending '-qhalt=-e' to the value of CFLAGS.])

With that change in place, configure stops as desired:

    configure: WARNING: Your C compiler does not consider incorrect argument counts to be a fatal error.
    configure: WARNING: Please report this failure, and configure using a different C compiler if possible.
    configure: error: Cannot continue.

EXCEPT, that I am not seeing the "set CFLAGS..." message. Is it possible that this check is running before hwloc_check_compiler_vendor_result has been set?

ALSO, the text of the (missing) message is incorrect:

    284c284
    < AC_MSG_WARN([For XLC you may try appending '-qhalt=-e' to the value of CFLAGS.])
    ---
    > AC_MSG_WARN([For XLC you may try appending '-qhalt=e' to the value of CFLAGS.])

That is probably my fault, too.

-Paul
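Why the combined probe hid the problem can be shown with a toy model of diagnostic severities. This is a sketch under assumptions: the severity ordering (i, w, e, s, u) and the default halt level "s" are taken as the xlc behavior implied by the config.log excerpt above, and `compile_fails` is a hypothetical stand-in for running the compiler, not a real tool interface.

```python
# Assumed severity ordering, lowest to highest.
SEVERITY_ORDER = ["i", "w", "e", "s", "u"]


def compile_fails(diagnostics, halt_level="s"):
    """Return True if any diagnostic is at or above the halt level."""
    threshold = SEVERITY_ORDER.index(halt_level)
    return any(SEVERITY_ORDER.index(d) >= threshold for d in diagnostics)


# From the log: too many arguments is (S), too few arguments is only (E).
TOO_MANY = ["s"]
TOO_FEW = ["e"]

# Original probe: both mistakes in one program, so the (S) stops the compile
# and the probe wrongly concludes that argument-count errors are fatal.
combined_rejected = compile_fails(TOO_MANY + TOO_FEW)

# Split probes: each mistake is compiled on its own.
too_many_rejected = compile_fails(TOO_MANY)
too_few_rejected = compile_fails(TOO_FEW)

# With -qhalt=e the (E) diagnostic becomes fatal as well.
too_few_rejected_with_halt_e = compile_fails(TOO_FEW, halt_level="e")

print(combined_rejected, too_many_rejected,
      too_few_rejected, too_few_rejected_with_halt_e)
```

Only the split form exposes that the too-few case compiles, which is the case the sched_setaffinity prototype probe depends on.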
Re: [OMPI devel] 1.4.5rc5 has been released
On 2/8/2012 11:14 AM, Paul H. Hargrove wrote:
> On 2/8/2012 3:25 AM, TERRY DONTJE wrote:
>>> + Building w/ Solaris Studio 12.2 or 12.3 on Linux x86-64, with "-m32" required setting LD_LIBRARY_PATH.
>> Can the LD_LIBRARY_PATH be substituted with a rpath change in LDFLAGS of the build?
>
> Terry sent more specific instructions for that offlist, and I am testing now.

I can confirm that both Solaris Studio 12.2 and 12.3 work with {C,CXX,F,FC}FLAGS=-m32 with the addition of LDFLAGS="-L/lib32 -R/lib32" on the configure line, as suggested by Terry.

I went to try 12 and 12.1 for good measure, but found that their C++ compilers choke on /usr/include/stdlib.h. So, since the original error was a C++ one, I didn't pursue them any further.

-Paul
Re: [hwloc-devel] 1.3.2rc1 has escaped
On 2/8/2012 9:18 AM, Samuel Thibault wrote:
> Jeff Squyres, on Wed 08 Feb 2012 17:59:04 +0100, wrote:
>> Please test!
>> http://www.open-mpi.org/software/hwloc/v1.3/
>
> Could somebody test it on AIX, and with xlc?
>
> Thanks,
> Samuel

No AIX, but I will hit xlc on Linux again today. Do we care about xlc on MacOS 10.3?

-Paul
Re: [OMPI devel] 1.4.5rc5 has been released
On 2/8/2012 3:25 AM, TERRY DONTJE wrote:
>> + Building w/ Solaris Studio 12.2 or 12.3 on Linux x86-64, with "-m32" required setting LD_LIBRARY_PATH.
> Can the LD_LIBRARY_PATH be substituted with a rpath change in LDFLAGS of the build?

Terry sent more specific instructions for that offlist, and I am testing now.

>> This could either be Oracle's bug in the compiler, or a libtool problem. My report was:
>> http://www.open-mpi.org/community/lists/devel/2012/01/10272.php
> I thought I responded to the above issue.

You did respond, but I didn't see any "resolution". I apologize if I missed something in the past emails.

> I think this may be a OS distribution (Solaris Studio assumption) issue. On my RH system /lib contains the 32 libraries and /lib64 has the 64 bit libs. I assume your system may have it the other way around (/lib = 64 bit libs and /lib32 has 32 bit). Can you confirm that your /lib contains 64 bit libs. Also can you do a "cc -### -m32" compile and link of a simple program and confirm that the compiler is pulling in /lib (I am 99% certain it is).

YES to "/lib = 64 bit libs and /lib32 has 32 bit". There is also a /lib64->/lib symlink.

Here is the requested verbose output:

    $ cc -### -m32 hello.c
    ### cc: Note: NLSPATH = /opt/SS12u3/solarisstudio12.3/prod/bin/../lib/locale/%L/LC_MESSAGES/%N.cat:/opt/SS12u3/solarisstudio12.3/prod/bin/../../lib/locale/%L/LC_MESSAGES/%N.cat
    ### command line files and options (expanded):
    ### -# -m32 hello.c
    /opt/SS12u3/solarisstudio12.3/prod/bin/acomp -Qy -Xa -xc99=%all -i hello.c -D__SUNPRO_C=0x5120 -D__unix -D__unix__ -Dlinux -D__linux -D__linux__ -D__gnu__linux__ "-D__builtin_expect(e,x)=e" -D__i386 -D__i386__ -D__BUILTIN_VA_ARG_INCR -D__C99FEATURES__ -D__PRAGMA_REDEFINE_EXTNAME -Dunix -Di386 -D__RESTRICT -D__FLT_EVAL_METHOD__=-1 -D__SUN_PREFETCH -D__NOVECTORSIZE__ -I-xbuiltin -I/opt/SS12u3/solarisstudio12.3/prod/lib/compilers/rtlibs/usr/include -I/opt/SS12u3/solarisstudio12.3/prod/include/cc -xbuiltin=%none -fsimple=0 -m32 -fparam_ir -xF=%none -xdbggen=no%stabs+dwarf2+usedonly -xdbggen=incl -xldscope=global -xivdep=loop -c99OS "-g/opt/SS12u3/solarisstudio12.3/prod/bin/cc -m32 " -destination_ir=yabe -y-fbe -y/opt/SS12u3/solarisstudio12.3/prod/bin/fbe -y-verbose -y-comdat -y-xarch=generic -y-comdat -y-xthreadvar=no%dynamic -y-xannotate=no -y-o -yhello.o -y-s
    ### cc: Note: LD_LIBRARY_PATH = (null)
    ### cc: Note: LD_RUN_PATH = (null)
    ### cc: Note: LD_OPTIONS = (null)
    /usr/bin/ld -m elf_i386 -dynamic-linker /lib/ld-linux.so.2 --enable-new-dtags /opt/SS12u3/solarisstudio12.3/prod/lib/crti.o /opt/SS12u3/solarisstudio12.3/prod/lib/crt1.o /opt/SS12u3/solarisstudio12.3/prod/lib/values-xa.o hello.o -o a.out -Y "/opt/SS12u3/solarisstudio12.3/prod/lib:/lib32:/usr/lib32" -Qy -lc /opt/SS12u3/solarisstudio12.3/prod/lib/libc_supp.a /opt/SS12u3/solarisstudio12.3/prod/lib/crtn.o
    rm hello.o

HOWEVER, in the failing build there was the following bit of output showing that the system linker is NOT being used:

    CC: Warning: failed to detect system linker version, falling back to custom linker usage

> Also, is this "/lib is 64 bit libraries" layout a common thing? None of my Linux systems are set up this way.

This appears to be the default on Ubuntu (checked 3 hosts with 2 different releases).

-Paul