William Stein wrote:
> On Tue, Oct 20, 2009 at 5:54 AM, Dr. David Kirkby
> <[email protected]> wrote:
>> I reported earlier a problem with libm4ri, which has code that determines the
>> number of CPUs and their cache sizes using autoconf macros. I do not know if
>> there is any other code in Sage that does this, but it was pointed out to me
>> by John Carr on comp.sys.hp.hpux that it's unwise to do this.
>>
>> When you think about it, he is absolutely right. Furthermore, it has much
>> wider implications than the relatively rare HP-UX operating system which
>> caused me to investigate the issue, as one of the macros actually crashed
>> on HP-UX.
>>
>> If we make a Sage binary on sage.math with 24 cores, then any code like
>> libm4ri that determines the number of processors at compile time will most
>> likely work far from optimally on a typical computer. Therefore autoconf
>> macros, which only get executed at compile time, are not the place to
>> determine the number of processors available.
>
> We don't build our binaries on sage.math. We build them on virtual
> machines that have 1 core. It's reasonable that binaries would be
> slightly non-optimal compared to something one would build oneself
> from source.
>
>> Does, for example, ATLAS, if built on sage.math, assume the user who
>> downloads the binaries has 24 CPUs?
>
> No.
>
>> In the case of libm4ri, John Carr thought that it uses OpenMP, which would
>> have a way to determine the number of processors at run time.
>>
>> This is just one more example of where testing on an unusual operating system
>> (HP-UX) highlights issues which have far wider implications than just on the
>> platform where the issue was found.
>
> Respectfully, the assertion that it is "wrong" to use characteristics
> of the hardware, such as the number of CPUs (and cache sizes, etc.), to
> tune compilation strikes me as naive. Sure, it would be nice
> in theory if there existed some magic way to write code that could run
> optimally on all possible numbers of CPUs (processor flags, cache
> sizes, etc.), but I think that's basically impossible with present
> technology for nontrivial algorithms.
>
> -- William

Some things, like cache sizes, I would agree are probably non-trivial to get at
run time. But the number of processors should be trivial to determine at
run time. Wolfram Research do it in Mathematica. Here's their code for Linux,
Solaris and HP-UX.

if [ "${SystemID}" = "Linux" -o "${SystemID}" = "Linux-x86-64" -o "${SystemID}" = "Linux-IA64" ]; then
    if [ -z "${OMP_NUM_THREADS}" ]; then
        OMP_NUM_THREADS=`cat /proc/cpuinfo | grep processor | wc -l | tr -d ' '`;
        export OMP_NUM_THREADS;
    fi
fi

if [ "${SystemID}" = "Solaris-SPARC" -o "${SystemID}" = "Solaris-x86-64" ]; then
    if [ -z "${OMP_NUM_THREADS}" ]; then
        OMP_NUM_THREADS=`/usr/sbin/psrinfo | wc -l | tr -d ' '`;
        export OMP_NUM_THREADS;
    fi
fi

if [ "${SystemID}" = "HPUX-PA64" -o "${SystemID}" = "HP-RISC" ]; then
    if [ -z "${MLIB_NUMBER_OF_THREADS}" ]; then
        MLIB_NUMBER_OF_THREADS=`/usr/sbin/ioscan -k -C processor | grep processor | wc -l`;
        export MLIB_NUMBER_OF_THREADS;
    fi
fi

One might say their solution is not optimal, given the multi-core CPUs in
existence. But it does seem a bit silly to determine such things at compile
time if, as that person on the HP-UX newsgroup said, this could be done at
run time using OpenMP, which libm4ri uses anyway.
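
As a rough illustration of how little is needed to do this when the program
starts, here is a minimal sketch of a launcher fragment. It is not taken from
Sage, libm4ri or Mathematica: the function name detect_cpus is made up for
this example, and it assumes that either `getconf _NPROCESSORS_ONLN` (not
required by POSIX, but present on Linux with glibc and on macOS) or GNU
`nproc` is available, falling back to 1 otherwise.

```shell
#!/bin/sh
# Hypothetical helper: print the number of online processors, determined
# when the script runs rather than when the software was compiled.
detect_cpus() {
    n=""
    # Assumption: getconf understands _NPROCESSORS_ONLN on this system.
    if command -v getconf >/dev/null 2>&1; then
        n=`getconf _NPROCESSORS_ONLN 2>/dev/null` || n=""
    fi
    # Fall back to nproc (GNU coreutils) if getconf gave nothing.
    if [ -z "$n" ] && command -v nproc >/dev/null 2>&1; then
        n=`nproc 2>/dev/null` || n=""
    fi
    # Anything empty or non-numeric becomes 1, the safest guess.
    case "$n" in
        ''|*[!0-9]*) n=1 ;;
    esac
    [ "$n" -ge 1 ] || n=1
    echo "$n"
}

# Only set the variable if the user has not already chosen a value.
if [ -z "${OMP_NUM_THREADS}" ]; then
    OMP_NUM_THREADS=`detect_cpus`
    export OMP_NUM_THREADS
fi
```

Sourcing something like this from a startup script would give OpenMP code a
thread count that matches the machine the binary runs on, while still letting
the user override it by setting OMP_NUM_THREADS beforehand.
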

Given that:

1) The autoconf developers say that those two macros (one for the number of
CPUs, the other for cache size) in the autoconf archive are badly written.
2) The autoconf developers say the macros will only work for a subset of x86
CPUs.
3) Neither macro works on Solaris.
4) Neither macro works on HP-UX.
5) The macro for determining cache size crashes on HP-UX.

There must surely be a better way.

I would have thought that the performance impact of assuming too many CPUs
would be smaller than the performance impact of underestimating the number of
CPUs. If you have four processors and run assuming only one, there could be
up to a factor of 4 performance degradation. I suspect that assuming 4 CPUs
with only 1 present would, in general for multi-threaded code, have a far
smaller impact.

I was going to look at fixing those autoconf macros, but comments on both
the HP-UX newsgroup and the autoconf mailing list have convinced me that not
only is their implementation poor, but the use of them is suspect.

Dave