Re: [Numpy-discussion] inplace dot products

2009-03-16 Thread David Warde-Farley
On 20-Feb-09, at 6:41 AM, Olivier Grisel wrote:

 Alright, thanks for the reply.

 Is there a canonical way / sample code to gain low-level access to
 blas / lapack atlas routines using ctypes from numpy / scipy code?

 I don't mind fixing the dimensions and the dtype of my array if it
 can decrease the memory overhead.

I got some clarification from Pearu Peterson off-list.

For gemm the issue is that if the matrix C is not Fortran-ordered, it
will be copied by the wrapper, and that copy is what gets overwritten.
Passing order='F' when creating the array to be overwritten fixes this.
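A minimal sketch of that order='F' trick, using scipy.linalg.blas (the
later home of the old fblas wrappers; assumes a SciPy with that module):

```python
import numpy as np
from scipy.linalg.blas import dgemm

a = np.arange(20, dtype=np.float64).reshape(10, 2)
b = np.arange(6, dtype=np.float64).reshape(2, 3)

# The output buffer must be Fortran-ordered and float64; otherwise the
# f2py wrapper silently copies it and overwrites the copy, not our array.
c = np.zeros((10, 3), dtype=np.float64, order='F')

res = dgemm(1.0, a, b, c=c, overwrite_c=True)
```

With an F-ordered c and overwrite_c=True, res aliases c, so no result
array is allocated by the call.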

DWF
___
Numpy-discussion mailing list
Numpy-discussion@scipy.org
http://mail.scipy.org/mailman/listinfo/numpy-discussion


Re: [Numpy-discussion] inplace dot products

2009-02-20 Thread David Warde-Farley
Hi Olivier,

There was this idea posted on the Scipy-user list a while back:

http://projects.scipy.org/pipermail/scipy-user/2008-August/017954.html

but it doesn't look like he got anywhere with it, or even got a  
response.

I just tried it and I observe the same behaviour. A quick look at the  
SciPy sources tells me there is something fishy.

subroutine <tchar=s,d,c,z>gemm(m,n,k,alpha,a,b,beta,c,trans_a,trans_b,lda,ka,ldb,kb)
   ! c = gemm(alpha,a,b,beta=0,c=0,trans_a=0,trans_b=0,overwrite_c=0)
   ! Calculate C <- alpha * op(A) * op(B) + beta * C

I don't read Fortran very well, but it seems to me as though the  
Fortran prototype doesn't match the python prototype.

I'll poke around a little more, but in summary: there's no
numpy-sanctioned way to specify an output array for a dot(), AFAIK.
This is a bit of an annoyance, I agree, though I seem to remember
Robert Kern offering a fairly compelling argument why it's hard. I
just don't know what that argument is :)

David


On 18-Feb-09, at 4:06 AM, Olivier Grisel wrote:

 Hi numpist people,

 I discovered ufuncs and their ability to compute results into
 preallocated arrays:

 >>> a = arange(10, dtype=float32)
 >>> b = arange(10, dtype=float32) + 1
 >>> c = add(a, b, a)
 >>> c is a
 True
 >>> a
 array([  1.,   3.,   5.,   7.,   9.,  11.,  13.,  15.,  17.,  19.], dtype=float32)

 My question is: is there an equivalent for the dot product
 operation? I want atlas to compute my dot products without
 allocating a temporary array, reusing a preallocated array for the
 results. Suppose I have:

 >>> results = empty((10, 3), dtype=float32)
 >>> W = arange(6, dtype=float32).reshape((2, 3))
 >>> x = arange(20, dtype=float32).reshape((10, 2))

 What I want is the equivalent of the following without the
 intermediate call to malloc:

 >>> results[:] = dot(x, W)

 Any idea? I tried to introspect the various docstrings of the numpy
 core modules but could not find any lead.

 -- 
 Olivier



Re: [Numpy-discussion] inplace dot products

2009-02-20 Thread Olivier Grisel
2009/2/20 David Warde-Farley d...@cs.toronto.edu:
 Hi Olivier,

 There was this idea posted on the Scipy-user list a while back:

        http://projects.scipy.org/pipermail/scipy-user/2008-August/017954.html

 but it doesn't look like he got anywhere with it, or even got a
 response.

 I just tried it and I observe the same behaviour. A quick look at the
 SciPy sources tells me there is something fishy.

 subroutine <tchar=s,d,c,z>gemm(m,n,k,alpha,a,b,beta,c,trans_a,trans_b,lda,ka,ldb,kb)
   ! c = gemm(alpha,a,b,beta=0,c=0,trans_a=0,trans_b=0,overwrite_c=0)
   ! Calculate C <- alpha * op(A) * op(B) + beta * C

 I don't read Fortran very well, but it seems to me as though the
 Fortran prototype doesn't match the python prototype.

 I'll poke around a little more, but in summary: there's no
 numpy-sanctioned way to specify an output array for a dot(), AFAIK.
 This is a bit of an annoyance, I agree, though I seem to remember
 Robert Kern offering a fairly compelling argument why it's hard. I
 just don't know what that argument is :)


Alright, thanks for the reply.

Is there a canonical way / sample code to gain low-level access to
blas / lapack atlas routines using ctypes from numpy / scipy code?

I don't mind fixing the dimensions and the dtype of my array if it
can decrease the memory overhead.

BTW, Robert, your insight on this topic would be very much appreciated.

-- 
Olivier


Re: [Numpy-discussion] inplace dot products

2009-02-20 Thread David Cournapeau
Olivier Grisel wrote:
 2009/2/20 David Warde-Farley d...@cs.toronto.edu:
   
 Hi Olivier,

 There was this idea posted on the Scipy-user list a while back:

http://projects.scipy.org/pipermail/scipy-user/2008-August/017954.html

 but it doesn't look like he got anywhere with it, or even got a
 response.

 I just tried it and I observe the same behaviour. A quick look at the
 SciPy sources tells me there is something fishy.

 subroutine <tchar=s,d,c,z>gemm(m,n,k,alpha,a,b,beta,c,trans_a,trans_b,lda,ka,ldb,kb)
   ! c = gemm(alpha,a,b,beta=0,c=0,trans_a=0,trans_b=0,overwrite_c=0)
   ! Calculate C <- alpha * op(A) * op(B) + beta * C

 I don't read Fortran very well, but it seems to me as though the
 Fortran prototype doesn't match the python prototype.

 I'll poke around a little more, but in summary: there's no
 numpy-sanctioned way to specify an output array for a dot(), AFAIK.
 This is a bit of an annoyance, I agree, though I seem to remember
 Robert Kern offering a fairly compelling argument why it's hard. I
 just don't know what that argument is :)

 

 Alright, thanks for the reply.

 Is there a canonical way / sample code to gain low-level access to
 blas / lapack atlas routines using ctypes from numpy / scipy code?
   

You can just use ctypes to access ATLAS, as you would for any
library. Or do you mean something else?
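A hedged sketch of that ctypes route, calling cblas_ddot directly; the
library name and the presence of CBLAS symbols vary by platform (they
are assumptions here), so everything is guarded:

```python
import ctypes
import ctypes.util

import numpy as np

x = np.arange(4, dtype=np.float64)   # [0, 1, 2, 3]
y = np.ones(4, dtype=np.float64)

d = None
# Library name is a guess; a Fortran-only BLAS without CBLAS symbols
# is skipped by the AttributeError check below.
libname = (ctypes.util.find_library("openblas")
           or ctypes.util.find_library("cblas")
           or ctypes.util.find_library("blas"))
if libname is not None:
    lib = ctypes.CDLL(libname)
    try:
        ddot = lib.cblas_ddot
    except AttributeError:
        ddot = None
    if ddot is not None:
        # double cblas_ddot(int n, const double *x, int incx,
        #                   const double *y, int incy)
        ddot.restype = ctypes.c_double
        ddot.argtypes = [ctypes.c_int,
                         ctypes.POINTER(ctypes.c_double), ctypes.c_int,
                         ctypes.POINTER(ctypes.c_double), ctypes.c_int]
        d = ddot(x.size,
                 x.ctypes.data_as(ctypes.POINTER(ctypes.c_double)), 1,
                 y.ctypes.data_as(ctypes.POINTER(ctypes.c_double)), 1)
```

The same pattern extends to cblas_dgemm, where you pass the
preallocated output pointer yourself and no temporary is created.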

cheers,

David


Re: [Numpy-discussion] inplace dot products

2009-02-20 Thread David Warde-Farley

On 20-Feb-09, at 6:39 AM, David Cournapeau wrote:

 You can just use ctypes to access ATLAS, as you would do for any
 library. Or do you mean something else ?

Say, David... :)

Do you have any idea why the pyf wrapper for fblas3 completely ignores
the overwrite_c argument? Fiddling around, I've found other blas/lapack
functions where the same arg is offered but the choice actually does
something.

Regards,

(Other) David


Re: [Numpy-discussion] inplace dot products

2009-02-20 Thread Robert Kern
On Fri, Feb 20, 2009 at 06:25, David Warde-Farley d...@cs.toronto.edu wrote:

 On 20-Feb-09, at 6:39 AM, David Cournapeau wrote:

 You can just use ctypes to access ATLAS, as you would do for any
 library. Or do you mean something else ?

 Say, David... :)

 Do you have any idea why the pyf wrapper for fblas3 completely ignores
 the overwrite_c argument?

I believe the copy is the culprit:

  double precision dimension(m,n),intent(in,out,copy),depend(m,n),optional :: c

 Fiddling around I've found other blas/lapack
 functions where the same arg is offered but the choice actually does
 something.

Examples?

-- 
Robert Kern

I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth.
  -- Umberto Eco


Re: [Numpy-discussion] inplace dot products

2009-02-20 Thread Robert Kern
On Fri, Feb 20, 2009 at 05:18, David Warde-Farley d...@cs.toronto.edu wrote:
 Hi Olivier,

 There was this idea posted on the Scipy-user list a while back:

http://projects.scipy.org/pipermail/scipy-user/2008-August/017954.html

 but it doesn't look like he got anywhere with it, or even got a
 response.

 I just tried it and I observe the same behaviour. A quick look at the
 SciPy sources tells me there is something fishy.

 subroutine <tchar=s,d,c,z>gemm(m,n,k,alpha,a,b,beta,c,trans_a,trans_b,lda,ka,ldb,kb)
   ! c = gemm(alpha,a,b,beta=0,c=0,trans_a=0,trans_b=0,overwrite_c=0)
   ! Calculate C <- alpha * op(A) * op(B) + beta * C

 I don't read Fortran very well, but it seems to me as though the
 Fortran prototype doesn't match the python prototype.

What do you mean? Based on the rest of the argument information in
that block, f2py creates the Python prototype. For example, all of the
m,n,k,lda,ka,ldb,kb dimensions are found from the input arrays
themselves, optional arguments are given defaults, etc.

 I'll poke around a little more, but in summary: there's no
 numpy-sanctioned way to specify an output array for a dot(), AFAIK.
 This is a bit of an annoyance, I agree, though I seem to remember
 Robert Kern offering a fairly compelling argument why it's hard. I
 just don't know

I believe I was only talking about why it would be hard to use a
higher-precision accumulator.
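A quick sketch of the precision gap in question; how large it is
depends on the BLAS in use and its summation order, so the numbers
are illustrative only:

```python
import numpy as np

rng = np.random.default_rng(0)
a = rng.standard_normal(1_000_000).astype(np.float32)

# Same reduction at two accumulator widths; the gap is what a
# higher-precision accumulator for dot() would remove.
low = float(np.dot(a, a))                                 # float32 pipeline
high = float(np.dot(a.astype(np.float64), a.astype(np.float64)))
rel_err = abs(low - high) / abs(high)
```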

-- 
Robert Kern

I have come to believe that the whole world is an enigma, a harmless
enigma that is made terrible by our own mad attempt to interpret it as
though it had an underlying truth.
  -- Umberto Eco


[Numpy-discussion] inplace dot products (Robert Kern) (Re: Numpy-discussion Digest, Vol 29, Issue 69)

2009-02-20 Thread David Henderson

Hello all,

I've been toying with the idea of an extended-precision accumulator
for the dot product, written in numpy/core/src/multiarraymodule.c.

Once that modification is made, there is no reason not to also allow
the specification of an output array.


The functions that exist now:

The routine cumsum already exists with an accumulator of a specific  
type. The output location is an optional argument.

numpy.cumsum(a, axis=None, dtype=None, out=None)
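A short sketch of that existing dtype/out pattern in cumsum, which is
the interface the proposal would mirror:

```python
import numpy as np

a = np.full(1000, 0.1, dtype=np.float32)
out = np.empty(1000, dtype=np.float64)

# dtype selects the accumulator type; out supplies preallocated storage.
ret = np.cumsum(a, dtype=np.float64, out=out)
```

cumsum returns a reference to out when it is given, so no result array
is allocated.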

Various forms of inner product (dot product) also exist:

numpy.tensordot(a, b, axes=2)
Returns the tensor dot product for (ndim >= 1) arrays along the given axes.
The first element of the `axes` argument sequence determines the axis
or axes in `a` to sum over, and the second element determines the axis
or axes in `b` to sum over.

numpy.vdot(a,b)
Returns the dot product of a and b for scalars and vectors
of floating point and complex types.  The first argument, a, is  
conjugated.


numpy.innerproduct(a,b)
Returns the inner product of a and b for arrays of floating point types.
Like the generic NumPy equivalent the product sum is over
the last dimension of a and b.
NB: The first argument is not conjugated.

The proposed extensions:

numpy.tensordot(a, b, axes=2, dtype=None, out=None)
numpy.vdot(a, b, dtype=None, out=None)
numpy.innerproduct(a, b, dtype=None, out=None)   (found in numpy/core/src/multiarraymodule.c)


Is this so difficult an extension to implement? Is there a better  
functional specification to be made?
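For comparison, numpy's einsum eventually grew essentially this pair of
arguments (out=, plus dtype=/casting= for the intermediate precision);
a sketch of that later interface, not the proposal itself:

```python
import numpy as np

x = np.arange(20, dtype=np.float32).reshape(10, 2)
W = np.arange(6, dtype=np.float32).reshape(2, 3)
out = np.empty((10, 3), dtype=np.float32)

# Writes the matrix product directly into the preallocated array.
ret = np.einsum('ij,jk->ik', x, W, out=out)
```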


Granted, these routines do not exist in blas, so they wouldn't be
speedy, at least without blas extensions. But the key routines needed
for the generic extension already exist in numpy without blas.


from numeric.py:

    When Numpy is built with an accelerated BLAS like ATLAS, these
    functions are replaced to make use of the faster implementations.
    The faster implementations only affect float32, float64,
    complex64, and complex128 arrays. Furthermore, the BLAS API only
    includes matrix-matrix, matrix-vector, and vector-vector
    products. Products of arrays with larger dimensionalities use
    the built in functions and are not accelerated.



Looking at numpy/core/src/multiarraymodule.c:

The existing routine that allows an accumulator specification and  
output array:


static PyObject *
PyArray_Sum(PyArrayObject *self, int axis, int rtype, PyArrayObject *out)


The routine that would be modified, with its new function prototype:

static PyObject *
PyArray_InnerProduct(PyObject *op1, PyObject *op2, int rtype, PyArrayObject *out)


Other C code that might be modified to support the new functionality  
is in arraytypes.inc.src. Routines here perform a dot product along  
one dimension for various argument dtypes and accumulator dtypes.


It looks like the accumulator type is encoded in the #out signature.  
As I read it, there is no change needed to this source as all the  
different accumulator types are present in the output signature.


/**begin repeat
#name=BYTE, UBYTE, SHORT, USHORT, INT, UINT, LONG, ULONG, LONGLONG, ULONGLONG, FLOAT, DOUBLE, LONGDOUBLE#
#type=byte, ubyte, short, ushort, int, uint, long, ulong, longlong, ulonglong, float, double, longdouble#
#out=long, ulong, long, ulong, long, ulong, long, ulong, longlong, ulonglong, float, double, longdouble#
*/
static void
@name@_dot(char *ip1, intp is1, char *ip2, intp is2, char *op, intp n,


So far, the changes don't seem to be too messy. Am I missing anything
here?


David




Re: [Numpy-discussion] inplace dot products

2009-02-20 Thread David Warde-Farley
On 20-Feb-09, at 10:39 AM, Robert Kern wrote:

 Fiddling around I've found other blas/lapack
 functions where the same arg is offered but the choice actually does
 something.

 Examples?

scipy.lib.lapack.flapack.dpotri, for example. I'm not sure of the  
proper usage, but when I pass it an identity matrix, depending on  
whether overwrite_c is True or not, the memory pointed to by the  
variable gets overwritten.

David


Re: [Numpy-discussion] inplace dot products

2009-02-20 Thread David Warde-Farley
On 20-Feb-09, at 10:39 AM, Robert Kern wrote:

 Fiddling around I've found other blas/lapack
 functions where the same arg is offered but the choice actually does
 something.

 Examples?

An even better example is scipy.linalg.fblas.dgemv, the matrix-vector  
equivalent of dgemm. overwrite_y behaves correctly there.
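A sketch of that correct overwrite_y behaviour, using the later module
name scipy.linalg.blas in place of scipy.linalg.fblas (assumes SciPy
is available):

```python
import numpy as np
from scipy.linalg.blas import dgemv

a = np.arange(6, dtype=np.float64).reshape(2, 3)
x = np.ones(3, dtype=np.float64)
y = np.zeros(2, dtype=np.float64)   # 1-D, so both C- and Fortran-contiguous

# With overwrite_y=True the result lands in y itself; no hidden copy.
res = dgemv(1.0, a, x, y=y, overwrite_y=True)
```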

David


[Numpy-discussion] inplace dot products

2009-02-18 Thread Olivier Grisel
Hi numpist people,

I discovered ufuncs and their ability to compute results into
preallocated arrays:

>>> a = arange(10, dtype=float32)
>>> b = arange(10, dtype=float32) + 1
>>> c = add(a, b, a)
>>> c is a
True
>>> a
array([  1.,   3.,   5.,   7.,   9.,  11.,  13.,  15.,  17.,  19.], dtype=float32)

My question is: is there an equivalent for the dot product operation?
I want atlas to compute my dot products without allocating a
temporary array, reusing a preallocated array for the results.
Suppose I have:

>>> results = empty((10, 3), dtype=float32)
>>> W = arange(6, dtype=float32).reshape((2, 3))
>>> x = arange(20, dtype=float32).reshape((10, 2))

What I want is the equivalent of the following without the
intermediate call to malloc:

>>> results[:] = dot(x, W)

Any idea? I tried to introspect the various docstrings of the numpy
core modules but could not find any lead.
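For readers finding this thread in the archive: numpy's dot did later
accept an output array via out= (the buffer must match the result's
shape and dtype and be C-contiguous). A minimal sketch:

```python
import numpy as np

results = np.empty((10, 3), dtype=np.float32)
W = np.arange(6, dtype=np.float32).reshape(2, 3)
x = np.arange(20, dtype=np.float32).reshape(10, 2)

# No intermediate allocation: the product is written into `results`,
# and the same array is returned.
ret = np.dot(x, W, out=results)
```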

-- 
Olivier