Re: [hwloc-devel] hwloc-1.4 "gmake check" failure on Solaris-10/SPARC/gccfss [PATCH]
My $0.02 is that we should disable ffs() on all versions of that compiler. It's not like this is performance-critical code. I'd rather it be "slow" and guaranteed correct than fast and maybe wrong.

Meaning: I'm good with Paul's patch. I'll commit, since no one has posted any alternatives.

On Feb 2, 2012, at 6:00 AM, Samuel Thibault wrote:

> On Thu 02 Feb 2012 02:29:08 +0100, Paul H. Hargrove wrote:
>> + The configure-time logic is NOT trying to determine the version number, as
>> I don't have a way (yet?) to pinpoint which version(s) work correctly, and
>> the Oracle Forums thread on the subject doesn't say. So, it is
>> conservatively assuming all "gccfss" versions are broken.
>
> We don't necessarily need to be precise; it suffices to know that from
> some version onward the bug was fixed, and to accept spuriously using the
> generic code with old non-broken gccfss.
>
> Samuel

--
Jeff Squyres
jsquy...@cisco.com
Re: [hwloc-devel] hwloc-1.4 "gmake check" failure on Solaris-10/SPARC/gccfss [PATCH]
On Thu 02 Feb 2012 02:29:08 +0100, Paul H. Hargrove wrote:
> + The configure-time logic is NOT trying to determine the version number, as
> I don't have a way (yet?) to pinpoint which version(s) work correctly, and
> the Oracle Forums thread on the subject doesn't say. So, it is
> conservatively assuming all "gccfss" versions are broken.

We don't necessarily need to be precise; it suffices to know that from some version onward the bug was fixed, and to accept spuriously using the generic code with old non-broken gccfss.

Samuel
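
The identification step discussed here keys on the "gccfss" tag in the compiler's version string, with no attempt to parse a version number. As a rough C illustration of that all-versions check (the thread's patch does it in configure with `$CC --version | grep gccfss`, not in C), the sketch below tests a version string for the tag. The helper names are hypothetical, and it is an assumption that `__VERSION__` carries the same "gccfss" tag that `gcc --version` prints.

```c
#include <string.h>

/* Hypothetical helper: returns 1 if a compiler version string carries the
 * "gccfss" tag, 0 otherwise. No version number is parsed, which mirrors the
 * conservative "assume every gccfss is broken" policy. */
static int version_is_gccfss(const char *version)
{
    return version != NULL && strstr(version, "gccfss") != NULL;
}

/* Hypothetical helper: applies the check to the compiler building this file.
 * __VERSION__ is predefined by GCC-compatible compilers; any compiler that
 * does not define it is treated as "not gccfss". */
static int built_with_gccfss(void)
{
#ifdef __VERSION__
    return version_is_gccfss(__VERSION__);
#else
    return 0;
#endif
}
```

Because `__VERSION__` is a string literal, a check like this cannot be done in a preprocessor `#if`; it has to run somewhere, which is why the thread settles on a configure-time test.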
Re: [hwloc-devel] hwloc-1.4 "gmake check" failure on Solaris-10/SPARC/gccfss [PATCH]
> Would that work?

Nope. I tried to address that question in a comment in the patch. The link succeeds and the problem only occurs when the executable is RUN. So one would need AC_TRY_RUN, and then one has opened the cross-compilation can of worms.

-Paul

On 2/1/2012 9:51 PM, Brice Goglin wrote:
> We could also AC_TRY_LINK a program that uses ffsfoo (the one that
> actually breaks here).
> If it fails, we AC_TRY_LINK a program that uses ffsfoo with the
> __ffssi2() definition.
> If it fails, we define NEED_FFS_FIX.
> And we just add the fix under #ifdef NEED_FFS_FIX in private/misc.h.
>
> Would that work?
>
> thanks
> Brice
>
> On 02/02/2012 02:28, Paul H. Hargrove wrote:
>> On 2/1/2012 11:46 AM, Paul H. Hargrove wrote:
>> [snip]
>>> So, in short: when building w/ this compiler, hwloc needs to behave
>>> as if the system lacks ffs().
>>>
>>> Making that happen is non-trivial because there are no preprocessor
>>> symbols defined by gccfss that would allow compile-time #if checks to
>>> distinguish gccfss from "vanilla" gcc. The only difference is in the
>>> string value of __VERSION__, which one could check at configure time.
>>
>> Attached is a patch, relative to the svn trunk, which fixes this
>> problem in my testing.
>> As I outlined above, the approach is two-fold:
>> 1) Add configure-time logic to ID the buggy compiler
>> 2) Restructure include/private/misc.h to include a
>> HWLOC_HAVE_BROKEN_FFS case.
>>
>> Two things I'd like to note about the approach:
>>
>> + The configure-time logic is NOT trying to determine the version
>> number, as I don't have a way (yet?) to pinpoint which version(s) work
>> correctly, and the Oracle Forums thread on the subject doesn't say.
>> So, it is conservatively assuming all "gccfss" versions are broken.
>>
>> + The misc.h changes are intentionally "generic" so one could add
>> other configure time checks to define HWLOC_HAVE_BROKEN_FFS based on
>> problems we've not yet discovered.
>>
>> -Paul

--
Paul H. Hargrove                          phhargr...@lbl.gov
Future Technologies Group
HPC Research Department                   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900
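
To make the AC_TRY_RUN point concrete, here is a minimal sketch of what the body of a run-time probe might look like. It is an illustration only and not part of the patch; `probe_ffs` is a hypothetical name, and the test values are arbitrary. A configure run test would return this value from `main()`. A failure to load or start the program would presumably show up as a failed run as well, and that can only be observed by executing the binary on the target, which a cross-compiling build cannot do.

```c
#include <stddef.h>
#include <strings.h>  /* POSIX ffs() */

/* Hypothetical probe: returns 0 if ffs() gives the expected answers and
 * non-zero otherwise. The inputs are volatile so the compiler cannot fold
 * the calls away at compile time, which would hide a missing runtime symbol. */
static int probe_ffs(void)
{
    volatile int inputs[] = { 0, 1, 2, 0x10, 0x18 };
    const int expected[]  = { 0, 1, 2, 5,    4    };
    size_t i;

    for (i = 0; i < sizeof(expected) / sizeof(expected[0]); i++) {
        if (ffs(inputs[i]) != expected[i])
            return 1;  /* wrong result: treat ffs() as broken */
    }
    return 0;          /* ffs() behaved as POSIX describes */
}
```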
Re: [hwloc-devel] hwloc-1.4 "gmake check" failure on Solaris-10/SPARC/gccfss [PATCH]
We could also AC_TRY_LINK a program that uses ffsfoo (the one that actually breaks here).
If it fails, we AC_TRY_LINK a program that uses ffsfoo with the __ffssi2() definition.
If it fails, we define NEED_FFS_FIX.
And we just add the fix under #ifdef NEED_FFS_FIX in private/misc.h.

Would that work?

thanks
Brice

On 02/02/2012 02:28, Paul H. Hargrove wrote:
>
> On 2/1/2012 11:46 AM, Paul H. Hargrove wrote:
> [snip]
>> So, in short: when building w/ this compiler, hwloc needs to behave
>> as if the system lacks ffs().
>>
>> Making that happen is non-trivial because there are no preprocessor
>> symbols defined by gccfss that would allow compile-time #if checks to
>> distinguish gccfss from "vanilla" gcc. The only difference is in the
>> string value of __VERSION__, which one could check at configure time.
>
> Attached is a patch, relative to the svn trunk, which fixes this
> problem in my testing.
> As I outlined above, the approach is two-fold:
> 1) Add configure-time logic to ID the buggy compiler
> 2) Restructure include/private/misc.h to include a
> HWLOC_HAVE_BROKEN_FFS case.
>
> Two things I'd like to note about the approach:
>
> + The configure-time logic is NOT trying to determine the version
> number, as I don't have a way (yet?) to pinpoint which version(s) work
> correctly, and the Oracle Forums thread on the subject doesn't say.
> So, it is conservatively assuming all "gccfss" versions are broken.
>
> + The misc.h changes are intentionally "generic" so one could add
> other configure time checks to define HWLOC_HAVE_BROKEN_FFS based on
> problems we've not yet discovered.
>
> -Paul
Re: [hwloc-devel] hwloc-1.4 "gmake check" failure on Solaris-10/SPARC/gccfss [PATCH]
On 2/1/2012 11:46 AM, Paul H. Hargrove wrote:
[snip]
> So, in short: when building w/ this compiler, hwloc needs to behave
> as if the system lacks ffs().
>
> Making that happen is non-trivial because there are no preprocessor
> symbols defined by gccfss that would allow compile-time #if checks to
> distinguish gccfss from "vanilla" gcc. The only difference is in the
> string value of __VERSION__, which one could check at configure time.

Attached is a patch, relative to the svn trunk, which fixes this problem in my testing.
As I outlined above, the approach is two-fold:
1) Add configure-time logic to ID the buggy compiler
2) Restructure include/private/misc.h to include a HWLOC_HAVE_BROKEN_FFS case.

Two things I'd like to note about the approach:

+ The configure-time logic is NOT trying to determine the version number, as I don't have a way (yet?) to pinpoint which version(s) work correctly, and the Oracle Forums thread on the subject doesn't say. So, it is conservatively assuming all "gccfss" versions are broken.

+ The misc.h changes are intentionally "generic" so one could add other configure time checks to define HWLOC_HAVE_BROKEN_FFS based on problems we've not yet discovered.

-Paul

--
Paul H. Hargrove                          phhargr...@lbl.gov
Future Technologies Group
HPC Research Department                   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900

Index: include/private/misc.h
===
--- include/private/misc.h (revision 4239)
+++ include/private/misc.h (working copy)
@@ -46,8 +46,15 @@
  * ffsl helpers.
  */
 
-#ifdef __GNUC__
+#if defined(HWLOC_HAVE_BROKEN_FFS)
 
+/* System has a broken ffs().
+ * We must check the before __GNUC__ or HWLOC_HAVE_FFSL
+ */
+#define HWLOC_NO_FFS
+
+#elif defined(__GNUC__)
+
 #  if (__GNUC__ >= 4) || ((__GNUC__ == 3) && (__GNUC_MINOR__ >= 4))
      /* Starting from 3.4, gcc has a long variant.  */
 #    define hwloc_ffsl(x) __builtin_ffsl(x)
@@ -75,6 +82,13 @@
 
 #else /* no ffs implementation */
 
+#define HWLOC_NO_FFS
+
+#endif
+
+#ifdef HWLOC_NO_FFS
+
+/* no ffs or it is known to be broken */
 static __hwloc_inline int __hwloc_attribute_const
 hwloc_ffsl(unsigned long x)
 {
@@ -114,10 +128,8 @@
   return i;
 }
 
-#endif
+#elif defined(HWLOC_NEED_FFSL)
 
-#ifdef HWLOC_NEED_FFSL
-
 /* We only have an int ffs(int) implementation, build a long one.  */
 
 /* First make it 32 bits if it was only 16.  */
Index: config/hwloc.m4
===
--- config/hwloc.m4 (revision 4239)
+++ config/hwloc.m4 (working copy)
@@ -446,6 +446,15 @@
         AC_DEFINE([HWLOC_HAVE_DECL_FFS], [1], [Define to 1 if function `ffs' is declared by system headers])
       ])
       AC_DEFINE([HWLOC_HAVE_FFS], [1], [Define to 1 if you have the `ffs' function.])
+      if ( $CC --version | grep gccfss ) >/dev/null 2>&1 ; then
+        dnl May be broken due to
+        dnl   https://forums.oracle.com/forums/thread.jspa?threadID=1997328
+        dnl TODO: a more selective test, since bug may be version dependent.
+        dnl We can't use AC_TRY_LINK because the failure does not appear until
+        dnl run/load time and there is currently no precedent for AC_TRY_RUN
+        dnl use in hwloc. --PHH
+        AC_DEFINE([HWLOC_HAVE_BROKEN_FFS], [1], [Define to 1 if your `ffs' function is known to be broken.])
+      fi
     ])
     AC_CHECK_FUNCS([ffsl], [
       _HWLOC_CHECK_DECL([ffsl],[
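
The misc.h restructuring in the patch boils down to a priority order: a "known broken" flag is tested first and forces the generic code, even when the compiler looks like GCC. The sketch below illustrates that ordering in isolation. All `sketch_`/`SKETCH_` names are hypothetical stand-ins, and the generic routine is a plain bit-scan loop written for this note, not hwloc's actual fallback (whose body is elided from the diff context).

```c
/* Generic fallback: 1-based index of the least significant set bit, or 0
 * if x is 0. This is an illustrative loop, not the code from misc.h. */
static int sketch_generic_ffsl(unsigned long x)
{
    int i = 1;

    if (x == 0)
        return 0;
    while ((x & 1UL) == 0) {
        x >>= 1;
        i++;
    }
    return i;
}

/* Selection order mirrors the patch: the "broken" flag (hypothetical name
 * here, set by configure in the real patch) is checked before __GNUC__, so
 * a GCC-like compiler with a broken ffs never reaches the builtin. */
#if defined(SKETCH_HAVE_BROKEN_FFS)
#  define sketch_ffsl(x) sketch_generic_ffsl(x)
#elif defined(__GNUC__)
#  define sketch_ffsl(x) __builtin_ffsl(x)
#else
#  define sketch_ffsl(x) sketch_generic_ffsl(x)
#endif
```

Building with `-DSKETCH_HAVE_BROKEN_FFS` would route every `sketch_ffsl()` call through the generic loop, which is the "slow but guaranteed correct" trade-off the thread agrees on.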
[hwloc-devel] hwloc-1.4 "gmake check" failure on Solaris-10/SPARC/gccfss
The problem I described below is ALSO present in hwloc-1.4

-Paul

On 1/31/2012 4:57 PM, Paul H. Hargrove wrote:
> This report is the flip-side of the problem w/ Solaris Studio compilers
> on x86-64.
> With Solaris-10 on SPARC, I find I *can* build hwloc-1.3.1 w/ SS12.x,
> but instead am failing w/ gcc.
>
> Keep in mind that /usr/bin/gcc on this system is one from Sun, not the FSF:
>
> -bash-3.00$ which gcc
> /usr/bin/gcc
> -bash-3.00$ gcc --version
> sparc-sun-solaris2.10-gcc (GCC) 4.0.4 (gccfss)
> Copyright (C) 2006 Free Software Foundation, Inc.
> This is free software; see the source for copying conditions. There is NO
> warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
>
> The key bit there is "(gccfss)" = "GCC for SPARC Systems"
>
> The problem is a load-time missing symbol when I "gmake check":
>
> $ gmake check V=1
> Making check in src
> [...]
> gmake[2]: Entering directory `/home/hargrove/OMPI/hwloc-1.3.1-solaris10-sparcT2-gccfss404/BLD/utils'
> ld.so.1: hwloc-calc: fatal: relocation error: file /home/hargrove/OMPI/hwloc-1.3.1-solaris10-sparcT2-gccfss404/BLD/src/.libs/libhwloc.so.4: symbol __ffssi2: referenced symbol not found
> FAIL: test-hwloc-calc.sh
> ld.so.1: hwloc-distrib: fatal: relocation error: file /home/hargrove/OMPI/hwloc-1.3.1-solaris10-sparcT2-gccfss404/BLD/src/.libs/libhwloc.so.4: symbol __ffssi2: referenced symbol not found
> FAIL: test-hwloc-distrib.sh
> 2 of 2 tests failed
> Please report to http://www.open-mpi.org/community/help/
>
> Again I am sorry I didn't get a chance to discover this in 1.3.1rc2.
>
> -Paul

--
Paul H. Hargrove                          phhargr...@lbl.gov
Future Technologies Group
HPC Research Department                   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory     Fax: +1-510-486-6900