The handling of BLAS/LAPACK libraries suck, Sage basically hardcodes ATLAS 
or OSX accelerate into the build process. Matrix operations are one place 
where using ISA extensions and careful choice of blocking to match 
processor cache really matters, but we have no mechanism to use a good 
implementation in binary builds. 

My proposal is to make the BLAS library configurable at runtime with 
defaults chosen by an automatic benchmark run. This is similar to what 
GMP/MPIR does but at a higher level. Instead of writing our own "FAT" BLAS, 
we use a wrapper library that resolves (using dlopen/dlsym) the BLAS 
library at runtime to the reference/netlib BLAS, ATLAS, OpenBlas, Intel 
MKL, or any other implementation. The backend can be selected with the 
LINEAR_ALGEBRA_CONFIG environment variable. 

Ultimately, there will be a Python program to build different BLAS/LAPACK 
libraries and install them either in /usr/lib/linear_algebra/NAME/ (or 
/usr/lib64 or any compile-time prefix) or 
$(HOME)/.local/lib/linear_algebra/NAME/ where NAME is a combination of 
implementation and compile options used. The LINEAR_ALGEBRA_CONFIG 
environment variable will then be set to NAME.

As a demo, I wrote the wrapper library linalg_cblas.so, see

https://github.com/vbraun/linear_algebra_config/blob/master/library_src/cblas.c

which wraps two CBLAS functions. For now, LINEAR_ALGEBRA_CONFIG is used as 
name (possibly with full path info) of the cblas library, though this will 
be changed. The example program

https://github.com/vbraun/linear_algebra_config/blob/master/test/startup.c

is just a call to these two CBLAS functions and is linked with 
-llinalg_cblas -ldl. Then the CBLAS library can be selected at runtime 
(here switched between Sage GSL CBLAS, the Fedora GSL CBLAS, and ATLAS):

[vbraun@volker-desktop linear_algebra]$ 
LINEAR_ALGEBRA_CONFIG=~/Sage/sage/local/lib/libgslcblas.so ./test/startup 
Address of cblas_zdotc_sub: d817a0b0
Address of cblas_dgemm: d815ffb0
destructor
[vbraun@volker-desktop linear_algebra]$ 
LINEAR_ALGEBRA_CONFIG=libgslcblas.so.0 ./test/startup 
Address of cblas_zdotc_sub: c30260f0
Address of cblas_dgemm: c300c000
destructor
[vbraun@volker-desktop linear_algebra]$ LINEAR_ALGEBRA_CONFIG=libcblas.so 
./test/startup 
Address of cblas_zdotc_sub: c58159e0
Address of cblas_dgemm: c580c420
destructor

The computational cost is of course an extra stack frame whenever you call 
a BLAS/LAPACK function. But then thats already faster than any operation 
that you can do in Python. 

In the Sage build process we would then build the wrapper library and the 
reference blas by default. The wrapper will try to pick up system 
BLAS/LAPACK implementations if they are available. Installing ATLAS or 
OpenBlas would then be optional. Right now we compile ATLAS in parallel 
together with anything else, which makes little sense if you want to get 
accurate timings. Also the ATLAS library interface changed in ATLAS-3.10 so 
we'll have to modify everything sooner or later. Finally, ATLAS-3.10 
doesn't work on Itanium any more and there is not much interest in making 
it work there, so if we ever want to get AVX support, say, on modern Intel 
chips we need to fall back to a dumb implementation on Itanium (or face the 
fact that Itanium is dead).

-- 
You received this message because you are subscribed to the Google Groups 
"sage-devel" group.
To post to this group, send email to [email protected].
To unsubscribe from this group, send email to 
[email protected].
Visit this group at http://groups.google.com/group/sage-devel?hl=en.


Reply via email to