Re: [Numpy-discussion] inplace dot products
On 20-Feb-09, at 6:41 AM, Olivier Grisel wrote:

> Alright, thanks for the reply. Is there a canonical way / sample code to
> gain low-level access to BLAS / LAPACK / ATLAS routines using ctypes from
> numpy / scipy code? I don't mind fixing the dimensions and the dtype of my
> array if it can decrease the memory overhead.

I got some clarification from Pearu Peterson off-list. For gemm the issue is
that if the matrix C is not Fortran-ordered, it will be copied, and that copy
will be overwritten. Passing order='F' when creating the array to be
overwritten fixes this.

DWF

___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion
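A minimal sketch of the fix Pearu describes, assuming the wrapper's modern location scipy.linalg.blas.dgemm (at the time of this thread it lived at scipy.linalg.fblas.dgemm):

```python
import numpy as np
from scipy.linalg.blas import dgemm  # scipy.linalg.fblas.dgemm in 2009-era SciPy

x = np.arange(20, dtype=np.float64).reshape((10, 2))
W = np.arange(6, dtype=np.float64).reshape((2, 3))

# The output array must be Fortran-ordered: otherwise gemm copies it,
# overwrites the copy, and leaves the original untouched.
results = np.zeros((10, 3), dtype=np.float64, order='F')

# C <- 1.0 * x @ W + 0.0 * C, written directly into `results`
out = dgemm(1.0, x, W, beta=0.0, c=results, overwrite_c=1)
```

With order='F' and overwrite_c=1, the returned array shares memory with `results`; drop either and you silently get a fresh allocation instead.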
Re: [Numpy-discussion] inplace dot products
Hi Olivier,

There was this idea posted on the Scipy-user list a while back:

http://projects.scipy.org/pipermail/scipy-user/2008-August/017954.html

but it doesn't look like he got anywhere with it, or even got a response. I
just tried it and I observe the same behaviour. A quick look at the SciPy
sources tells me there is something fishy:

    subroutine <tchar=s,d,c,z>gemm(m,n,k,alpha,a,b,beta,c,trans_a,trans_b,lda,ka,ldb,kb)
    ! c = gemm(alpha,a,b,beta=0,c=0,trans_a=0,trans_b=0,overwrite_c=0)
    ! Calculate C <- alpha * op(A) * op(B) + beta * C

I don't read Fortran very well, but it seems to me as though the Fortran
prototype doesn't match the Python prototype.

I'll poke around a little more, but in summary: there's no numpy-sanctioned
way to specify an output array for a dot(), AFAIK. This is a bit of an
annoyance, I agree, though I seem to remember Robert Kern offering a fairly
compelling argument why it's hard. I just don't know what that argument is :)

David

On 18-Feb-09, at 4:06 AM, Olivier Grisel wrote:

> Hi numpy people,
>
> I discovered ufuncs and their ability to compute results into preallocated
> arrays:
>
>     >>> a = arange(10, dtype=float32)
>     >>> b = arange(10, dtype=float32) + 1
>     >>> c = add(a, b, a)
>     >>> c is a
>     True
>     >>> a
>     array([  1.,   3.,   5.,   7.,   9.,  11.,  13.,  15.,  17.,  19.], dtype=float32)
>
> My question is: is there an equivalent for the dot product operation? I
> want ATLAS to compute my dot products without allocating a temporary array,
> reusing a preallocated array of results. Suppose I have:
>
>     results = empty((10, 3), dtype=float32)
>     W = arange(6, dtype=float32).reshape((2, 3))
>     x = arange(20, dtype=float32).reshape((10, 2))
>
> What I want is the equivalent of the following, without the intermediate
> call to malloc:
>
>     results[:] = dot(x, W)
>
> Any idea? I tried to introspect the various docstrings of the numpy core
> modules but I could not get any lead.
--
Olivier
Re: [Numpy-discussion] inplace dot products
2009/2/20 David Warde-Farley d...@cs.toronto.edu:

> There was this idea posted on the Scipy-user list a while back:
>
> http://projects.scipy.org/pipermail/scipy-user/2008-August/017954.html
>
> but it doesn't look like he got anywhere with it, or even got a response. I
> just tried it and I observe the same behaviour. A quick look at the SciPy
> sources tells me there is something fishy:
>
>     subroutine <tchar=s,d,c,z>gemm(m,n,k,alpha,a,b,beta,c,trans_a,trans_b,lda,ka,ldb,kb)
>     ! c = gemm(alpha,a,b,beta=0,c=0,trans_a=0,trans_b=0,overwrite_c=0)
>     ! Calculate C <- alpha * op(A) * op(B) + beta * C
>
> I don't read Fortran very well, but it seems to me as though the Fortran
> prototype doesn't match the Python prototype.
>
> I'll poke around a little more, but in summary: there's no numpy-sanctioned
> way to specify an output array for a dot(), AFAIK. This is a bit of an
> annoyance, I agree, though I seem to remember Robert Kern offering a fairly
> compelling argument why it's hard. I just don't know what that argument is :)

Alright, thanks for the reply. Is there a canonical way / sample code to gain
low-level access to BLAS / LAPACK / ATLAS routines using ctypes from numpy /
scipy code? I don't mind fixing the dimensions and the dtype of my array if
it can decrease the memory overhead.

BTW, Robert, your insight on this topic would be very much appreciated.

--
Olivier
Re: [Numpy-discussion] inplace dot products
Olivier Grisel wrote:

> Alright, thanks for the reply. Is there a canonical way / sample code to
> gain low-level access to BLAS / LAPACK / ATLAS routines using ctypes from
> numpy / scipy code?

You can just use ctypes to access ATLAS, as you would do for any library. Or
do you mean something else?

cheers,

David
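A rough sketch of the ctypes route David mentions, calling cblas_dgemm with a preallocated output. Everything about locating the library is an assumption: the shared-library name (ATLAS, OpenBLAS, reference CBLAS, ...) and whether it exports the C interface at all vary by platform, so the call is guarded rather than definitive:

```python
import ctypes
import ctypes.util

import numpy as np

# Probe a few common CBLAS-capable library names; None if nothing is found.
libname = (ctypes.util.find_library("cblas")
           or ctypes.util.find_library("openblas")
           or ctypes.util.find_library("blas"))

x = np.arange(20, dtype=np.float64).reshape(10, 2)
W = np.arange(6, dtype=np.float64).reshape(2, 3)
results = np.zeros((10, 3), dtype=np.float64)  # preallocated, C-ordered

called = False
if libname is not None:
    blas = ctypes.CDLL(libname)
    # A Fortran-only BLAS exports dgemm_ but not the CBLAS entry point.
    if hasattr(blas, "cblas_dgemm"):
        CblasRowMajor, CblasNoTrans = 101, 111
        dbl_p = ctypes.POINTER(ctypes.c_double)
        # C <- 1.0 * x @ W + 0.0 * C, written directly into `results`
        blas.cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                         10, 3, 2,                    # M, N, K
                         ctypes.c_double(1.0),
                         x.ctypes.data_as(dbl_p), 2,  # A, lda
                         W.ctypes.data_as(dbl_p), 3,  # B, ldb
                         ctypes.c_double(0.0),
                         results.ctypes.data_as(dbl_p), 3)  # C, ldc
        called = True
```

No Python-level temporary is allocated here; the price is that you take responsibility for dtypes, contiguity, and the leading dimensions yourself.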
Re: [Numpy-discussion] inplace dot products
On 20-Feb-09, at 6:39 AM, David Cournapeau wrote:

> You can just use ctypes to access ATLAS, as you would do for any library.
> Or do you mean something else?

Say, David... :) Do you have any idea why the pyf wrapper for fblas3
completely ignores the overwrite_c argument? Fiddling around, I've found
other BLAS/LAPACK functions where the same argument is offered but the
choice actually does something.

Regards,

(Other) David
Re: [Numpy-discussion] inplace dot products
On Fri, Feb 20, 2009 at 06:25, David Warde-Farley d...@cs.toronto.edu wrote:

> Do you have any idea why the pyf wrapper for fblas3 completely ignores the
> overwrite_c argument?

I believe the `copy` is the culprit:

    double precision dimension(m,n),intent(in,out,copy),depend(m,n),optional :: c

> Fiddling around, I've found other BLAS/LAPACK functions where the same
> argument is offered but the choice actually does something.

Examples?

--
Robert Kern

I have come to believe that the whole world is an enigma, a harmless enigma
that is made terrible by our own mad attempt to interpret it as though it
had an underlying truth. -- Umberto Eco
Re: [Numpy-discussion] inplace dot products
On Fri, Feb 20, 2009 at 05:18, David Warde-Farley d...@cs.toronto.edu wrote:

> A quick look at the SciPy sources tells me there is something fishy:
>
>     subroutine <tchar=s,d,c,z>gemm(m,n,k,alpha,a,b,beta,c,trans_a,trans_b,lda,ka,ldb,kb)
>     ! c = gemm(alpha,a,b,beta=0,c=0,trans_a=0,trans_b=0,overwrite_c=0)
>     ! Calculate C <- alpha * op(A) * op(B) + beta * C
>
> I don't read Fortran very well, but it seems to me as though the Fortran
> prototype doesn't match the Python prototype.

What do you mean? Based on the rest of the argument information in that
block, f2py creates the Python prototype. For example, all of the
m,n,k,lda,ka,ldb,kb dimensions are found from the input arrays themselves,
optional arguments are given defaults, etc.

> I'll poke around a little more, but in summary: there's no numpy-sanctioned
> way to specify an output array for a dot(), AFAIK. This is a bit of an
> annoyance, I agree, though I seem to remember Robert Kern offering a fairly
> compelling argument why it's hard.

I believe I was only talking about why it would be hard to use a
higher-precision accumulator.

--
Robert Kern
[Numpy-discussion] inplace dot products (Robert Kern) (Re: Numpy-discussion Digest, Vol 29, Issue 69)
Hello all,

I've been toying with the idea of an extended-precision accumulator for the
dot product, written in numpy/core/src/multiarraymodule.c. Once that
modification is being made anyway, there is no reason not to also allow the
specification of an output array.

The functions that exist now: the routine cumsum already takes an accumulator
of a specific type, and the output location is an optional argument:

    numpy.cumsum(a, axis=None, dtype=None, out=None)

Various forms of inner product (dot product) also exist:

    numpy.tensordot(a, b, axes=2)
        Returns the tensor dot product for (ndim >= 1) arrays along the given
        axes. The first element of the `axes` sequence determines the axis or
        axes in `a` to sum over, and the second element determines the axis
        or axes in `b` to sum over.

    numpy.vdot(a, b)
        Returns the dot product of a and b for scalars and vectors of
        floating point and complex types. The first argument, a, is
        conjugated.

    numpy.innerproduct(a, b)
        Returns the inner product of a and b for arrays of floating point
        types. Like the generic NumPy equivalent, the product sum is over the
        last dimension of a and b. NB: the first argument is not conjugated.

The proposed extensions (found in numpy/core/src/multiarraymodule.c):

    numpy.tensordot(a, b, axes=2, dtype=None, out=None)
    numpy.vdot(a, b, dtype=None, out=None)
    numpy.innerproduct(a, b, dtype=None, out=None)

Is this so difficult an extension to implement? Is there a better functional
specification to be made?

Granted, these routines do not exist in BLAS, so they wouldn't be speedy, at
least without BLAS extensions. But the key routines to make the generic
extension already exist in numpy without BLAS. From numeric.py:

    When Numpy is built with an accelerated BLAS like ATLAS, these functions
    are replaced to make use of the faster implementations. The faster
    implementations only affect float32, float64, complex64, and complex128
    arrays. Furthermore, the BLAS API only includes matrix-matrix,
    matrix-vector, and vector-vector products. Products of arrays with larger
    dimensionalities use the built in functions and are not accelerated.

Looking at numpy/core/src/multiarraymodule.c, the existing routine that
allows an accumulator specification and an output array:

    static PyObject *
    PyArray_Sum(PyArrayObject *self, int axis, int rtype, PyArrayObject *out)

The routine that would be modified, with its new prototype:

    static PyObject *
    PyArray_InnerProduct(PyObject *op1, PyObject *op2, int rtype, PyArrayObject *out)

Other C code that might be modified to support the new functionality is in
arraytypes.inc.src. Routines there perform a dot product along one dimension
for various argument dtypes and accumulator dtypes. It looks like the
accumulator type is encoded in the #out signature. As I read it, no change is
needed to this source, as all the different accumulator types are already
present in the output signature:

    /**begin repeat
    #name=BYTE, UBYTE, SHORT, USHORT, INT, UINT, LONG, ULONG, LONGLONG,
          ULONGLONG, FLOAT, DOUBLE, LONGDOUBLE#
    #type=byte, ubyte, short, ushort, int, uint, long, ulong, longlong,
          ulonglong, float, double, longdouble#
    #out=long, ulong, long, ulong, long, ulong, long, ulong, longlong,
         ulonglong, float, double, longdouble#
    */
    static void
    @name@_dot(char *ip1, intp is1, char *ip2, intp is2, char *op, intp n,

So far, the changes don't seem to be too messy. Am I missing anything here?
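The precedent the proposal leans on is already available today: cumsum accepts both an accumulator dtype and a preallocated output array. The dtype=/out= arguments on tensordot, vdot, and innerproduct are the proposal, not current API, but they would behave analogously to this:

```python
import numpy as np

a = np.arange(10, dtype=np.float32)
out = np.empty(10, dtype=np.float64)

# Existing API: accumulate in float64 and write into a preallocated array.
r = np.cumsum(a, dtype=np.float64, out=out)
print(r is out)  # True: no fresh result array is allocated
```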
David
Re: [Numpy-discussion] inplace dot products
On 20-Feb-09, at 10:39 AM, Robert Kern wrote:

>> Fiddling around, I've found other BLAS/LAPACK functions where the same
>> argument is offered but the choice actually does something.
>
> Examples?

scipy.lib.lapack.flapack.dpotri, for example. I'm not sure of the proper
usage, but when I pass it an identity matrix, depending on whether
overwrite_c is True or not, the memory pointed to by the variable gets
overwritten.

David
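A rough illustration of the experiment David describes, assuming the wrapper's modern location scipy.linalg.lapack.dpotri (scipy.lib.lapack.flapack.dpotri at the time). The identity matrix stands in for a real Cholesky factor, just as in his test:

```python
import numpy as np
from scipy.linalg.lapack import dpotri  # scipy.lib.lapack.flapack.dpotri in 2009

# dpotri computes inv(A) from A's Cholesky factor. The identity is a
# trivially valid factor (A = I @ I.T = I), so the call is well-defined.
c = np.eye(3, order='F')
inv, info = dpotri(c, overwrite_c=1)

# info == 0 on success; with overwrite_c=1 the result is written into
# the memory backing `c` rather than a fresh array.
```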
Re: [Numpy-discussion] inplace dot products
On 20-Feb-09, at 10:39 AM, Robert Kern wrote:

>> Fiddling around, I've found other BLAS/LAPACK functions where the same
>> argument is offered but the choice actually does something.
>
> Examples?

An even better example is scipy.linalg.fblas.dgemv, the matrix-vector
equivalent of dgemm. overwrite_y behaves correctly there.

David
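The matrix-vector case can be sketched like this, assuming the wrapper's modern location scipy.linalg.blas.dgemv; the in-place behaviour of overwrite_y is the claim made above:

```python
import numpy as np
from scipy.linalg.blas import dgemv  # scipy.linalg.fblas.dgemv in 2009-era SciPy

A = np.arange(6, dtype=np.float64).reshape(2, 3)
x = np.ones(3, dtype=np.float64)
y = np.zeros(2, dtype=np.float64)   # preallocated output vector

# y <- 1.0 * A @ x + 0.0 * y; with overwrite_y=1 and a contiguous float64
# vector, the product is written into `y` without a temporary.
out = dgemv(1.0, A, x, beta=0.0, y=y, overwrite_y=1)
```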
[Numpy-discussion] inplace dot products
Hi numpy people,

I discovered ufuncs and their ability to compute results into preallocated
arrays:

    >>> a = arange(10, dtype=float32)
    >>> b = arange(10, dtype=float32) + 1
    >>> c = add(a, b, a)
    >>> c is a
    True
    >>> a
    array([  1.,   3.,   5.,   7.,   9.,  11.,  13.,  15.,  17.,  19.], dtype=float32)

My question is: is there an equivalent for the dot product operation? I want
ATLAS to compute my dot products without allocating a temporary array,
reusing a preallocated array of results. Suppose I have:

    results = empty((10, 3), dtype=float32)
    W = arange(6, dtype=float32).reshape((2, 3))
    x = arange(20, dtype=float32).reshape((10, 2))

What I want is the equivalent of the following, without the intermediate call
to malloc:

    results[:] = dot(x, W)

Any idea? I tried to introspect the various docstrings of the numpy core
modules but I could not get any lead.

--
Olivier