Re: [Numpy-discussion] Calling scipy blas from cython is extremely slow

2013-02-24 Thread Pauli Virtanen
23.02.2013 20:31, Sergio Callegari wrote:
> Partially fixed.
>
> I was mixing up the row and column order.  For some reason this worked in
> some cases. Now I've fixed it and it *always* works.
>
> However, it is still slower than the cblas:
>
> cblas - 0.69 sec
> scipy blas - 0.74 sec

The possible explanations are that either the routine called is
different in the two cases, or the benchmark is somehow faulty.
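
To check the second possibility, one common approach is to repeat each
measurement several times and keep the minimum, so that scheduling noise
does not masquerade as a real difference. The sketch below is only an
illustration: `best_time` and `relative_gap` are hypothetical helper names,
not part of the original benchmark, and the callables to time would be the
two loop variants from this thread. It also puts the quoted figures in
perspective: 0.74 s against 0.69 s is a gap of roughly 7%.

```python
import timeit


def best_time(func, repeat=7, number=1000):
    """Return the best per-call time in seconds over `repeat` runs.

    Taking the minimum filters out runs disturbed by other processes.
    """
    runs = timeit.repeat(func, repeat=repeat, number=number)
    return min(runs) / number


def relative_gap(t_a, t_b):
    """Relative difference of t_b with respect to t_a."""
    return (t_b - t_a) / t_a


# The two figures quoted above: 0.69 s (cblas) and 0.74 s (scipy blas).
gap = relative_gap(0.69, 0.74)
print(f"{gap:.1%}")  # roughly 7%
```

If the gap between the two variants is smaller than the spread between
repeated runs of the same variant, the benchmark cannot distinguish them.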

-- 
Pauli Virtanen

___
NumPy-Discussion mailing list
NumPy-Discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


[Numpy-discussion] Calling scipy blas from cython is extremely slow

2013-02-23 Thread Sergio Callegari
Hi,

following the excellent advice of V. Armando Sole, I have finally succeeded in
calling the blas routines shipped with scipy from cython.

I am doing this to avoid shipping an extra blas library for some project of
mine that uses scipy but has some things coded in cython for extra speed.

So far I have managed to get things working on Linux.  Here is what I do:

The following code snippet gives me the dgemv pointer (which is a pointer to a
Fortran function, even though it comes from scipy.linalg.blas.cblas, which is
odd).

from cpython cimport PyCObject_AsVoidPtr
import scipy as sp
__import__('scipy.linalg.blas')

ctypedef void (*dgemv_ptr) (char *trans, int *m, int *n,\
 double *alpha, double *a, int *lda, double *x,\
 int *incx,\
 double *beta,  double *y, int *incy)
cdef dgemv_ptr dgemv=<dgemv_ptr>PyCObject_AsVoidPtr(\
    sp.linalg.blas.cblas.dgemv._cpointer)


Then, in a tight loop, I can call dgemv by first defining the constants
and then calling dgemv inside the loop

cdef int one=1
cdef double onedot = 1.0
cdef double zerodot = 0.0
cdef char trans = 'N'
for i in xrange(N):
    # every argument is passed by reference, as the Fortran routine expects
    dgemv(&trans, &nq, &order,\
        &onedot, <double *>np.PyArray_DATA(C), &order, \
        <double *>np.PyArray_DATA(c_x0), &one, \
        &zerodot, <double *>np.PyArray_DATA(y0), &one)


It works, but it is much slower than linking to the cblas that is available
on the same system.  Specifically, I have about 8 calls to BLAS in my tight
loop: 4 of them are to dgemv and the others are to dcopy.  Changing a
single dgemv call from the system cblas to the blas function returned by
scipy.linalg.blas.cblas.dgemv._cpointer makes the execution time of a test case
jump from about 0.7 s to 1.25 s on my system.

Any clue about why this is happening?

After all, on Linux, scipy dynamically links to ATLAS exactly as I link to
ATLAS when I use the cblas functions.




Re: [Numpy-discussion] Calling scipy blas from cython is extremely slow

2013-02-23 Thread Sergio Callegari
... and it is not deterministic either...

About one time in six, the code calling the scipy blas gives a completely wrong
result. How can this be?



Re: [Numpy-discussion] Calling scipy blas from cython is extremely slow

2013-02-23 Thread Sergio Callegari
Partially fixed.

I was mixing up the row and column order.  For some reason this worked in some
cases. Now I've fixed it and it *always* works.
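
One possible reason for the "worked in some cases" behaviour, offered here
only as an assumption and not confirmed anywhere in this thread: a Fortran
BLAS routine reads the matrix buffer in column-major order, so a row-major
(C-ordered) array is seen as its transpose. For a symmetric matrix the two
coincide and the result looks right; for a general matrix it does not. The
pure-Python sketch below imitates that indexing; `gemv_colmajor` and
`flatten_rowmajor` are hypothetical names used for illustration, not BLAS or
scipy functions.

```python
def gemv_colmajor(m, n, a, lda, x):
    """Compute y = A x, reading the flat buffer `a` in column-major order.

    A is m-by-n with leading dimension lda, i.e. A[i, j] == a[i + j * lda],
    which is how a Fortran BLAS routine interprets the memory.
    """
    y = [0.0] * m
    for j in range(n):
        for i in range(m):
            y[i] += a[i + j * lda] * x[j]
    return y


def flatten_rowmajor(rows):
    """Lay out a nested list the way a C-ordered array sits in memory."""
    return [v for row in rows for v in row]


sym = [[2.0, 1.0], [1.0, 3.0]]   # symmetric: equals its own transpose
gen = [[1.0, 2.0], [3.0, 4.0]]   # general: differs from its transpose
x = [1.0, 1.0]

# Symmetric case: matches the intended product sym @ x == [3.0, 4.0].
y_sym = gemv_colmajor(2, 2, flatten_rowmajor(sym), 2, x)

# General case: yields gen.T @ x == [4.0, 6.0], not gen @ x == [3.0, 7.0].
y_gen = gemv_colmajor(2, 2, flatten_rowmajor(gen), 2, x)
```

With a real BLAS call, the usual remedies are to pass the transposed
operation flag or to store the array in Fortran order.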

However, it is still slower than the cblas:

cblas - 0.69 sec
scipy blas - 0.74 sec

Any clue why?


