Re: [hwloc-devel] hwloc-1.4 "gmake check" failure on Solaris-10/SPARC/gccfss [PATCH]

2012-02-02 Thread Jeff Squyres
My $0.02 is that we should disable FFS on all versions of that compiler.  It's 
not like this is performance-critical code.  I'd rather it be "slow" and 
guaranteed correct than fast and maybe wrong.

Meaning: I'm good with Paul's patch.  I'll commit, since no one has posted any 
alternatives.


On Feb 2, 2012, at 6:00 AM, Samuel Thibault wrote:

> Paul H. Hargrove, on Thu 02 Feb 2012 02:29:08 +0100, wrote:
>> + The configure-time logic is NOT trying to determine the version number, as
>> I don't have a way (yet?) to pinpoint which version(s) work correctly, and
>> the Oracle Forums thread on the subject doesn't say.  So, it is
>> conservatively assuming all "gccfss" versions are broken.
> 
> We don't necessarily need to be precise: it suffices to know that the bug
> was fixed as of some version, and to accept spuriously using the generic
> code with old non-broken gccfss versions.
> 
> Samuel
> ___
> hwloc-devel mailing list
> hwloc-de...@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/hwloc-devel


-- 
Jeff Squyres
jsquy...@cisco.com
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/




Re: [hwloc-devel] hwloc-1.4 "gmake check" failure on Solaris-10/SPARC/gccfss [PATCH]

2012-02-02 Thread Samuel Thibault
Paul H. Hargrove, on Thu 02 Feb 2012 02:29:08 +0100, wrote:
> + The configure-time logic is NOT trying to determine the version number, as
> I don't have a way (yet?) to pinpoint which version(s) work correctly, and
> the Oracle Forums thread on the subject doesn't say.  So, it is
> conservatively assuming all "gccfss" versions are broken.

We don't necessarily need to be precise: it suffices to know that the bug
was fixed as of some version, and to accept spuriously using the generic
code with old non-broken gccfss versions.

Samuel


Re: [hwloc-devel] hwloc-1.4 "gmake check" failure on Solaris-10/SPARC/gccfss [PATCH]

2012-02-02 Thread Paul H. Hargrove

>Would that work?

Nope, I tried to address that question in a comment in the patch.
The link succeeds and the problem only occurs when the executable is RUN.
So one would need AC_TRY_RUN, and then one has opened the
cross-compilation can of worms.
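
For illustration only, here is a minimal C sketch of the kind of check a
run-time test would have to execute. It is not part of the patch, and the
name ffs_selftest is hypothetical. It assumes a POSIX ffs() declared in
<strings.h>. With the failure reported in this thread, a program like this
would not even start, since the missing symbol is reported at load time,
which is why a link-only test cannot catch it.

```c
#include <stddef.h>
#include <strings.h>  /* POSIX ffs() */

/* Hypothetical self-test: returns 0 if ffs() gives the expected answers,
 * 1 otherwise.  A run-time configure test would return this value from
 * main(), so it only helps when the test binary can actually be executed
 * on the build machine (i.e. not when cross-compiling). */
static int ffs_selftest(void)
{
    /* volatile keeps the compiler from folding the calls away at
     * compile time, so the real ffs() path is exercised */
    volatile int inputs[] = { 0, 1, 2, 0x10, 0x30 };
    const int expected[]  = { 0, 1, 2, 5,    5    };
    size_t i;

    for (i = 0; i < sizeof(inputs) / sizeof(inputs[0]); i++) {
        if (ffs(inputs[i]) != expected[i])
            return 1;
    }
    return 0;
}
```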


-Paul

On 2/1/2012 9:51 PM, Brice Goglin wrote:
> We could also AC_TRY_LINK a program that uses ffsfoo (the one that
> actually breaks here).
> If it fails, we AC_TRY_LINK a program that uses ffsfoo with the
> __ffssi2() definition.
> If it fails, we define NEED_FFS_FIX
> And we just add the fix under #ifdef NEED_FFS_FIX in private/misc.h.
> Would that work?
> thanks
> Brice
>
> On 02/02/2012 02:28, Paul H. Hargrove wrote:
>>
>> On 2/1/2012 11:46 AM, Paul H. Hargrove wrote:
>> [snip]
>>> So, in short: when building w/ this compiler, hwloc needs to behave
>>> as if the system lacks ffs().
>>>
>>> Making that happen is non-trivial because there are no preprocessor
>>> symbols defined by gccfss that would allow compile-time #if checks
>>> to distinguish gccfss from "vanilla" gcc.  The only difference is in
>>> the string value of __VERSION__, which one could check at configure
>>> time.
>>
>> Attached is a patch, relative to the svn trunk, which fixes this
>> problem in my testing.
>> As I outlined above, the approach is two-fold:
>> 1) Add configure-time logic to ID the buggy compiler
>> 2) Restructure include/private/misc.h to include a
>> HWLOC_HAVE_BROKEN_FFS case.
>>
>> Two things I'd like to note about the approach:
>>
>> + The configure-time logic is NOT trying to determine the version
>> number, as I don't have a way (yet?) to pinpoint which version(s)
>> work correctly, and the Oracle Forums thread on the subject doesn't
>> say.  So, it is conservatively assuming all "gccfss" versions are
>> broken.
>>
>> + The misc.h changes are intentionally "generic" so one could add
>> other configure time checks to define HWLOC_HAVE_BROKEN_FFS based on
>> problems we've not yet discovered.
>>
>> -Paul



--
Paul H. Hargrove  phhargr...@lbl.gov
Future Technologies Group
HPC Research Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900



Re: [hwloc-devel] hwloc-1.4 "gmake check" failure on Solaris-10/SPARC/gccfss [PATCH]

2012-02-02 Thread Brice Goglin
We could also AC_TRY_LINK a program that uses ffsfoo (the one that
actually breaks here).
If it fails, we AC_TRY_LINK a program that uses ffsfoo with the
__ffssi2() definition.
If it fails, we define NEED_FFS_FIX
And we just add the fix under #ifdef NEED_FFS_FIX in private/misc.h.
Would that work?
thanks
Brice



On 02/02/2012 02:28, Paul H. Hargrove wrote:
>
> On 2/1/2012 11:46 AM, Paul H. Hargrove wrote:
> [snip]
>> So, in short: when building w/ this compiler, hwloc needs to behave
>> as if the system lacks ffs().
>>
>> Making that happen is non-trivial because there are no preprocessor
>> symbols defined by gccfss that would allow compile-time #if checks to
>> distinguish gccfss from "vanilla" gcc.  The only difference is in the
>> string value of __VERSION__, which one could check at configure time.
>
> Attached is a patch, relative to the svn trunk, which fixes this
> problem in my testing.
> As I outlined above, the approach is two-fold:
> 1) Add configure-time logic to ID the buggy compiler
> 2) Restructure include/private/misc.h to include a
> HWLOC_HAVE_BROKEN_FFS case.
>
> Two things I'd like to note about the approach:
>
> + The configure-time logic is NOT trying to determine the version
> number, as I don't have a way (yet?) to pinpoint which version(s) work
> correctly, and the Oracle Forums thread on the subject doesn't say. 
> So, it is conservatively assuming all "gccfss" versions are broken.
>
> + The misc.h changes are intentionally "generic" so one could add
> other configure time checks to define HWLOC_HAVE_BROKEN_FFS based on
> problems we've not yet discovered.
>
> -Paul



Re: [hwloc-devel] hwloc-1.4 "gmake check" failure on Solaris-10/SPARC/gccfss [PATCH]

2012-02-01 Thread Paul H. Hargrove


On 2/1/2012 11:46 AM, Paul H. Hargrove wrote:
[snip]
> So, in short: when building w/ this compiler, hwloc needs to behave as
> if the system lacks ffs().
>
> Making that happen is non-trivial because there are no preprocessor
> symbols defined by gccfss that would allow compile-time #if checks to
> distinguish gccfss from "vanilla" gcc.  The only difference is in the
> string value of __VERSION__, which one could check at configure time.


Attached is a patch, relative to the svn trunk, which fixes this problem 
in my testing.

As I outlined above, the approach is two-fold:
1) Add configure-time logic to ID the buggy compiler
2) Restructure include/private/misc.h to include a HWLOC_HAVE_BROKEN_FFS 
case.
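
To illustrate step 1, here is a minimal C sketch of the detection idea. It
is not what the patch does: the patch greps the output of "$CC --version"
in config/hwloc.m4. The helper names below are hypothetical, and whether
__VERSION__ contains the "gccfss" tag in the same form as the --version
output is an assumption. Because this is a string comparison, it cannot be
done in an #if, which is why the decision has to be made at configure time.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: returns 1 if a compiler version string carries
 * the "gccfss" tag, 0 otherwise (including for a NULL string). */
static int version_is_gccfss(const char *version)
{
    return version != NULL && strstr(version, "gccfss") != NULL;
}

/* Hypothetical helper: the version string of the compiler building this
 * file, or "" if the compiler does not define __VERSION__. */
static const char *compiler_version_string(void)
{
#ifdef __VERSION__
    return __VERSION__;
#else
    return "";
#endif
}
```

A configure-style probe would print or test
version_is_gccfss(compiler_version_string()) and define
HWLOC_HAVE_BROKEN_FFS when it is true.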


Two things I'd like to note about the approach:

+ The configure-time logic is NOT trying to determine the version 
number, as I don't have a way (yet?) to pinpoint which version(s) work 
correctly, and the Oracle Forums thread on the subject doesn't say.  So, 
it is conservatively assuming all "gccfss" versions are broken.


+ The misc.h changes are intentionally "generic" so one could add other 
configure time checks to define HWLOC_HAVE_BROKEN_FFS based on problems 
we've not yet discovered.
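
As a rough picture of the intended effect of HWLOC_HAVE_BROKEN_FFS, here is
a simplified C sketch. It is not the code in include/private/misc.h: the
names sketch_generic_ffsl and sketch_ffsl are hypothetical, and the loop is
a plain stand-in for hwloc's own generic implementation. The point is only
that the "broken" case must be tested before __GNUC__, so that the builtin
is never selected for an affected compiler.

```c
/* Hypothetical generic fallback: 1-based index of the lowest set bit,
 * or 0 if x is 0.  Uses no compiler builtin and no libc ffs(). */
static int sketch_generic_ffsl(unsigned long x)
{
    int i = 1;

    if (x == 0)
        return 0;
    while ((x & 1UL) == 0) {
        x >>= 1;
        i++;
    }
    return i;
}

/* Selection order mirrors the idea of the patch: a known-broken ffs
 * wins over the __GNUC__ builtin path. */
#if defined(HWLOC_HAVE_BROKEN_FFS) || !defined(__GNUC__)
#  define sketch_ffsl(x) sketch_generic_ffsl(x)
#else
#  define sketch_ffsl(x) __builtin_ffsl((long)(x))
#endif
```

Any future configure check that defines HWLOC_HAVE_BROKEN_FFS would take
the first branch without further changes to the header.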


-Paul

--
Paul H. Hargrove  phhargr...@lbl.gov
Future Technologies Group
HPC Research Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900

Index: include/private/misc.h
===================================================================
--- include/private/misc.h  (revision 4239)
+++ include/private/misc.h  (working copy)
@@ -46,8 +46,15 @@
  * ffsl helpers.
  */

-#ifdef __GNUC__
+#if defined(HWLOC_HAVE_BROKEN_FFS)

+/* System has a broken ffs().
+ * We must check this before __GNUC__ or HWLOC_HAVE_FFSL
+ */
+#define HWLOC_NO_FFS
+
+#elif defined(__GNUC__)
+
 #  if (__GNUC__ >= 4) || ((__GNUC__ == 3) && (__GNUC_MINOR__ >= 4))
  /* Starting from 3.4, gcc has a long variant.  */
 #define hwloc_ffsl(x) __builtin_ffsl(x)
@@ -75,6 +82,13 @@

 #else /* no ffs implementation */

+#define HWLOC_NO_FFS
+
+#endif
+
+#ifdef HWLOC_NO_FFS
+
+/* no ffs or it is known to be broken */
 static __hwloc_inline int __hwloc_attribute_const
 hwloc_ffsl(unsigned long x)
 {
@@ -114,10 +128,8 @@
return i;
 }

-#endif
+#elif defined(HWLOC_NEED_FFSL)

-#ifdef HWLOC_NEED_FFSL
-
 /* We only have an int ffs(int) implementation, build a long one.  */

 /* First make it 32 bits if it was only 16.  */
Index: config/hwloc.m4
===================================================================
--- config/hwloc.m4 (revision 4239)
+++ config/hwloc.m4 (working copy)
@@ -446,6 +446,15 @@
 AC_DEFINE([HWLOC_HAVE_DECL_FFS], [1], [Define to 1 if function `ffs' is declared by system headers])
   ])
   AC_DEFINE([HWLOC_HAVE_FFS], [1], [Define to 1 if you have the `ffs' function.])
+  if ( $CC --version | grep gccfss ) >/dev/null 2>&1 ; then
+    dnl May be broken due to
+    dnl    https://forums.oracle.com/forums/thread.jspa?threadID=1997328
+    dnl TODO: a more selective test, since bug may be version dependent.
+    dnl We can't use AC_TRY_LINK because the failure does not appear until
+    dnl run/load time and there is currently no precedent for AC_TRY_RUN
+    dnl use in hwloc.  --PHH
+    AC_DEFINE([HWLOC_HAVE_BROKEN_FFS], [1], [Define to 1 if your `ffs' function is known to be broken.])
+  fi
 ])
 AC_CHECK_FUNCS([ffsl], [
   _HWLOC_CHECK_DECL([ffsl],[


[hwloc-devel] hwloc-1.4 "gmake check" failure on Solaris-10/SPARC/gccfss

2012-01-31 Thread Paul H. Hargrove

The problem I described below is ALSO present in hwloc-1.4
-Paul

On 1/31/2012 4:57 PM, Paul H. Hargrove wrote:
This report is the flip-side of the problem w/ Solaris Studio 
compilers on x86-64.
With Solaris-10 on SPARC, I find I *can* build hwloc-1.3.1 w/ SS12.x, 
but instead am failing w/ gcc.


Keep in mind that /usr/bin/gcc on this system is one from Sun, not the 
FSF:

-bash-3.00$ which gcc
/usr/bin/gcc
-bash-3.00$ gcc --version
sparc-sun-solaris2.10-gcc (GCC) 4.0.4 (gccfss)
Copyright (C) 2006 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

The key bit there is "(gccfss)" = "GCC for SPARC Systems"

The problem is a load-time missing symbol when I "gmake check":

$ gmake check V=1
Making check in src
[...]
gmake[2]: Entering directory `/home/hargrove/OMPI/hwloc-1.3.1-solaris10-sparcT2-gccfss404/BLD/utils'
ld.so.1: hwloc-calc: fatal: relocation error: file /home/hargrove/OMPI/hwloc-1.3.1-solaris10-sparcT2-gccfss404/BLD/src/.libs/libhwloc.so.4: symbol __ffssi2: referenced symbol not found

FAIL: test-hwloc-calc.sh
ld.so.1: hwloc-distrib: fatal: relocation error: file /home/hargrove/OMPI/hwloc-1.3.1-solaris10-sparcT2-gccfss404/BLD/src/.libs/libhwloc.so.4: symbol __ffssi2: referenced symbol not found

FAIL: test-hwloc-distrib.sh

2 of 2 tests failed
Please report to http://www.open-mpi.org/community/help/



Again I am sorry I didn't get a chance to discover this in 1.3.1rc2.

-Paul



--
Paul H. Hargrove  phhargr...@lbl.gov
Future Technologies Group
HPC Research Department   Tel: +1-510-495-2352
Lawrence Berkeley National Laboratory Fax: +1-510-486-6900