Re: [Numpy-discussion] Fwd: GPU Numpy

2009-09-10 Thread Rohit Garg
> Yes. However, it is worth making the distinction between > embarrassingly parallel problems and SIMD problems. Not all > embarrassingly parallel problems are SIMD-capable. GPUs do SIMD, not > generally embarrassing problems. GPUs exploit both dimensions of parallelism: SIMD (aka vectorizati

Re: [Numpy-discussion] Fwd: GPU Numpy

2009-09-10 Thread Robert Kern
On Thu, Sep 10, 2009 at 07:28, Francesc Alted wrote: > On Thursday 10 September 2009 11:37:24 Gael Varoquaux wrote: > >> On Thu, Sep 10, 2009 at 11:29:49AM +0200, Francesc Alted wrote: > >> > The point is: are GPUs prepared to compete with general-purpose CPUs > >> > in all-road operations, lik

Re: [Numpy-discussion] Fwd: GPU Numpy

2009-09-10 Thread Rohit Garg
> I think whatever is supported by the underlying CPU, whether it is extended > double precision (12 bytes) or quad precision (16 bytes). Classic 64-bit CPUs support neither. > > -- > > Francesc Alted

Re: [Numpy-discussion] Fwd: GPU Numpy

2009-09-10 Thread Francesc Alted
On Thursday 10 September 2009 15:51:15 Rohit Garg wrote: > Apart from float and double, which floating point formats are > supported by numpy? I think whatever is supported by the underlying CPU, whether it is extended double precision (12 bytes) or quad precision (16 bytes). -- Francesc Alted

Re: [Numpy-discussion] Fwd: GPU Numpy

2009-09-10 Thread Rohit Garg
Apart from float and double, which floating point formats are supported by numpy? On Thu, Sep 10, 2009 at 7:09 PM, Bruce Southey wrote: > On 09/10/2009 07:40 AM, Francesc Alted wrote: > > On Thursday 10 September 2009 14:36:16 Rohit Garg wrote: > >> > That's nice to see. I think I'll change my

Re: [Numpy-discussion] Fwd: GPU Numpy

2009-09-10 Thread Bruce Southey
On 09/10/2009 07:40 AM, Francesc Alted wrote: On Thursday 10 September 2009 14:36:16 Rohit Garg wrote: > > That's nice to see. I think I'll change my mind if someone could perform > > a vector-vector multiplication (an operation that is typically > > memory-bound) > > You mean a dot pr

Re: [Numpy-discussion] Fwd: GPU Numpy

2009-09-10 Thread Francesc Alted
On Thursday 10 September 2009 14:36:16 Rohit Garg wrote: > > That's nice to see. I think I'll change my mind if someone could perform > > a vector-vector multiplication (an operation that is typically > > memory-bound) > > You mean a dot product? Whatever, dot product or element-wise product.

Re: [Numpy-discussion] Fwd: GPU Numpy

2009-09-10 Thread Rohit Garg
> That's nice to see. I think I'll change my mind if someone could perform a > vector-vector multiplication (an operation that is typically memory-bound) You mean a dot product? -- Rohit Garg http://rpg-314.blogspot.com/ Senior Undergraduate Department of Physics Indian Institute of Technolog

Re: [Numpy-discussion] Fwd: GPU Numpy

2009-09-10 Thread Rohit Garg
> a = np.cos(b) > > where b is a 1x1 matrix is *very* embarrassing (in the parallel > meaning of the term ;-) On this operation, GPUs will eat up CPUs like a pack of piranhas. :) -- Rohit Garg http://rpg-314.blogspot.com/ Senior Undergraduate Department of Physics Indian Institute of

Re: [Numpy-discussion] Fwd: GPU Numpy

2009-09-10 Thread Francesc Alted
On Thursday 10 September 2009 11:37:24 Gael Varoquaux wrote: > On Thu, Sep 10, 2009 at 11:29:49AM +0200, Francesc Alted wrote: > >The point is: are GPUs prepared to compete with general-purpose CPUs > > in all-road operations, like evaluating transcendental functions, > > conditionals all o

Re: [Numpy-discussion] Fwd: GPU Numpy

2009-09-10 Thread Francesc Alted
On Thursday 10 September 2009 11:40:48 Sturla Molden wrote: > Francesc Alted wrote: > > Numexpr already uses the Python parser, instead of building a new one. > > However the bytecode emitted after the compilation process is > > different, of course. > > > > Also, I don't see the point in requiring

Re: [Numpy-discussion] Fwd: GPU Numpy

2009-09-10 Thread Rohit Garg
> The point is: are GPUs prepared to compete with general-purpose CPUs in > all-road operations, like evaluating transcendental functions, conditionals > all of this with a rich set of data types? Yup. -- Rohit Garg http://rpg-314.blogspot.com/ Senior Undergraduate Department of Physics India

Re: [Numpy-discussion] Fwd: GPU Numpy

2009-09-10 Thread Matthieu Brucher
> Sure. Especially because NumPy is all about embarrassingly parallel problems > (after all, this is how a ufunc works, doing operations > element-by-element). > > The point is: are GPUs prepared to compete with general-purpose CPUs in > all-road operations, like evaluating transcendental function

Re: [Numpy-discussion] Fwd: GPU Numpy

2009-09-10 Thread Sturla Molden
Francesc Alted wrote: > > Numexpr already uses the Python parser, instead of building a new one. > However the bytecode emitted after the compilation process is > different, of course. > > Also, I don't see the point in requiring immutable buffers. Could you > develop this further? > If you do lazy

Re: [Numpy-discussion] Fwd: GPU Numpy

2009-09-10 Thread Gael Varoquaux
On Thu, Sep 10, 2009 at 11:29:49AM +0200, Francesc Alted wrote: >The point is: are GPUs prepared to compete with general-purpose CPUs in >all-road operations, like evaluating transcendental functions, >conditionals all of this with a rich set of data types? I would like to >believ

Re: [Numpy-discussion] Fwd: GPU Numpy

2009-09-10 Thread Francesc Alted
On Thursday 10 September 2009 11:20:21 Gael Varoquaux wrote: > On Thu, Sep 10, 2009 at 10:36:27AM +0200, Francesc Alted wrote: > >Where are you getting this info from? IMO the technology of memory in > >graphics boards cannot be so different than in commercial > > motherboards. It could b

Re: [Numpy-discussion] Fwd: GPU Numpy

2009-09-10 Thread Francesc Alted
On Thursday 10 September 2009 10:58:13 Rohit Garg wrote: > > Where are you getting this info from? IMO the technology of memory in > > graphics boards cannot be so different than in commercial motherboards. > > It could be a *bit* faster (at the expense of packing less of it), but > > I'd say no

Re: [Numpy-discussion] Fwd: GPU Numpy

2009-09-10 Thread Gael Varoquaux
On Thu, Sep 10, 2009 at 10:36:27AM +0200, Francesc Alted wrote: >Where are you getting this info from? IMO the technology of memory in >graphics boards cannot be so different than in commercial motherboards. It >could be a *bit* faster (at the expense of packing less of it), but I'd >

Re: [Numpy-discussion] Fwd: GPU Numpy

2009-09-10 Thread Francesc Alted
On Thursday 10 September 2009 11:11:22 Sturla Molden wrote: > Citi, Luca wrote: > > That is exactly why numexpr is faster in these cases. > > I hope one day numpy will be able to perform such > > optimizations. > > I think it is going to require lazy evaluation. Whenever possible, an > operator w

Re: [Numpy-discussion] Fwd: GPU Numpy

2009-09-10 Thread Sturla Molden
Rohit Garg wrote: > gtx280-->141GBps-->has 1GB > ati4870-->115GBps-->has 1GB > ati5870-->153GBps (launches sept 22, 2009)-->2GB models will be there too > That is going to help if buffers are kept in graphics memory. But the problem is that graphics memory is a scarce resource. S.M.
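To put the quoted bandwidths in perspective, a back-of-the-envelope sketch (the ~25 GB/s CPU figure is the approximate Intel i7 number mentioned elsewhere in this thread; all values are peak, not sustained):

```python
# Time to stream 1 GB once at each device's quoted peak memory bandwidth.
# Bandwidth figures (GB/s) are taken from the messages in this thread.
bandwidths = [("gtx280", 141), ("ati4870", 115), ("ati5870", 153), ("i7 (approx.)", 25)]

for name, gbps in bandwidths:
    ms = 1.0 / gbps * 1e3  # milliseconds to move 1 GB
    print(f"{name}: {ms:.1f} ms per GB streamed")
```

On these numbers the GPU advantage for a purely memory-bound operation is roughly the bandwidth ratio, i.e. 4-6x, provided the data already lives in graphics memory.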

Re: [Numpy-discussion] Fwd: GPU Numpy

2009-09-10 Thread Sturla Molden
Citi, Luca wrote: > That is exactly why numexpr is faster in these cases. > I hope one day numpy will be able to perform such > optimizations. > I think it is going to require lazy evaluation. Whenever possible, an operator would just return a symbolic representation of the operation. This wou
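The lazy-evaluation idea Sturla describes can be sketched in a few lines of plain Python. All names here (`Lazy`, `evaluate`) are hypothetical illustrations, not a proposed NumPy API:

```python
# Sketch: operators build an expression tree instead of computing
# immediately; evaluate() then walks the tree once per element,
# so no full-size temporary arrays are materialized along the way.

class Lazy:
    def __init__(self, op, *args):
        self.op, self.args = op, args

    def __add__(self, other):
        return Lazy('add', self, other)

    def __mul__(self, other):
        return Lazy('mul', self, other)

def evaluate(node):
    """Recursively evaluate a Lazy expression tree element-wise."""
    if node.op == 'leaf':
        return node.args[0]              # a plain list of numbers
    left = evaluate(node.args[0])
    right = evaluate(node.args[1])
    fn = {'add': lambda x, y: x + y, 'mul': lambda x, y: x * y}[node.op]
    return [fn(x, y) for x, y in zip(left, right)]

a = Lazy('leaf', [1.0, 2.0, 3.0])
b = Lazy('leaf', [4.0, 5.0, 6.0])
expr = a * b + a                         # builds a tree, computes nothing yet
print(evaluate(expr))                    # [5.0, 12.0, 21.0]
```

A real implementation would fuse the tree into one compiled loop (or GPU kernel) rather than recursing in Python, but the deferral mechanism is the same.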

Re: [Numpy-discussion] Fwd: GPU Numpy

2009-09-10 Thread Rohit Garg
> Where are you getting this info from? IMO the technology of memory in > graphics boards cannot be so different than in commercial motherboards. It > could be a *bit* faster (at the expense of packing less of it), but I'd say > not as much as 4x faster (100 GB/s vs 25 GB/s of Intel i7 in sequenti

Re: [Numpy-discussion] Fwd: GPU Numpy

2009-09-10 Thread Citi, Luca
Hi Sturla, > The proper way to speed up "dot(a*b+c*sqrt(d), e)" is to get rid of > temporary intermediates. I implemented a patch http://projects.scipy.org/numpy/ticket/1153 that reduces the number of temporary intermediates. In your example from 4 to 2. There is a big improvement in terms of me
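The kind of saving targeted by the temporaries patch can be illustrated by hand with `out=` style in-place operations. This is a manual equivalent for the thread's running example, not the patch itself:

```python
import numpy as np

n = 1000
a, b, c, d = (np.random.rand(n) for _ in range(4))
e = np.random.rand(n)

# Naive form: a*b, sqrt(d), c*sqrt(d) and the final sum each allocate
# a fresh temporary array before dot() ever runs (4 temporaries).
r1 = np.dot(a * b + c * np.sqrt(d), e)

# Hand-optimized form: one scratch buffer, updated in place.
tmp = np.sqrt(d)      # tmp = sqrt(d)
tmp *= c              # tmp = c*sqrt(d), no new allocation
tmp += a * b          # one remaining temporary, for a*b
r2 = np.dot(tmp, e)

assert np.allclose(r1, r2)
```

The payoff is mostly in memory traffic: each avoided temporary is one less full-array write and read through RAM.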

Re: [Numpy-discussion] Fwd: GPU Numpy

2009-09-10 Thread Francesc Alted
On Thursday 10 September 2009 09:45:29 Rohit Garg wrote: > > You do realize that the throughput from onboard (video) RAM is going > > to be much higher, right? It's not just the parallelization but the > > memory bandwidth. And as James pointed out, if you can keep most of > > your intermediate c

Re: [Numpy-discussion] Fwd: GPU Numpy

2009-09-10 Thread Rohit Garg
> You do realize that the throughput from onboard (video) RAM is going > to be much higher, right? It's not just the parallelization but the > memory bandwidth. And as James pointed out, if you can keep most of > your intermediate computation on-card, you stand to benefit immensely, > even if doing

Re: [Numpy-discussion] Fwd: GPU Numpy

2009-09-09 Thread Fernando Perez
On Wed, Sep 9, 2009 at 9:47 PM, Sturla Molden wrote: > James Bergstra wrote: >> Suppose you want to evaluate "dot(a*b+c*sqrt(d), e)". The GPU is >> great for doing dot(), > The CPU is equally great (or better?) for doing dot(). In both cases: > > - memory access scales O(n) for dot products. > - com

Re: [Numpy-discussion] Fwd: GPU Numpy

2009-09-09 Thread David Warde-Farley
On 10-Sep-09, at 12:47 AM, Sturla Molden wrote: > The CPU is equally great (or better?) for doing dot(). In both cases: > > - memory access scales O(n) for dot products. > - computation scales O(n) for dot products. > - memory is slow > - computation is fast (faster for GPU) You do realize that the th

Re: [Numpy-discussion] Fwd: GPU Numpy

2009-09-09 Thread Sturla Molden
James Bergstra wrote: > Suppose you want to evaluate "dot(a*b+c*sqrt(d), e)". The GPU is > great for doing dot(), The CPU is equally great (or better?) for doing dot(). In both cases: - memory access scales O(n) for dot products. - computation scales O(n) for dot products. - memory is slow - computat

Re: [Numpy-discussion] Fwd: GPU Numpy

2009-09-09 Thread Sturla Molden
George Dahl wrote: > I know that for my work, I can get around a 50-fold speedup over > numpy using a python wrapper for a simple GPU matrix class. So I might be > dealing with a lot of matrix products where I multiply a fixed 512 by 784 matrix > by a 784 by 256 matrix that chan

Re: [Numpy-discussion] Fwd: GPU Numpy

2009-09-09 Thread Dag Sverre Seljebotn
Christopher Barker wrote: > George Dahl wrote: >> Sturla Molden molden.no> writes: >>> Teraflops peak performance of modern GPUs is impressive. But NumPy >>> cannot easily benefit from that. > >> I know that for my work, I can get around a 50-fold speedup over >> numpy using a pytho

Re: [Numpy-discussion] Fwd: GPU Numpy

2009-09-09 Thread James Bergstra
On Wed, Sep 9, 2009 at 10:41 AM, Francesc Alted wrote: >> Numexpr mainly supports functions that are meant to be used element-wise, >> so the operation/element ratio is normally 1 (or close to 1). In these >> scenarios, improved memory access is much more important than CPU >> (or, for th

Re: [Numpy-discussion] Fwd: GPU Numpy

2009-09-09 Thread Francesc Alted
On Wednesday 09 September 2009 11:26:06 Francesc Alted wrote: > On Tuesday 08 September 2009 23:21:53 Christopher Barker wrote: > > Also, perhaps a GPU-aware numexpr could be helpful, which I think is the > > kind of thing that Sturla was referring to when she wrote: > > > > "Incidentally, this

Re: [Numpy-discussion] Fwd: GPU Numpy

2009-09-09 Thread Lev Givon
Received from Francesc Alted on Wed, Sep 09, 2009 at 05:18:48AM EDT: (snip) > The point here is that matrix-matrix multiplications (or, in general, > functions with a large operation/element ratio) are a *tiny* part of all the > possible operations between arrays that NumPy supports. This is w

Re: [Numpy-discussion] Fwd: GPU Numpy

2009-09-09 Thread Francesc Alted
On Tuesday 08 September 2009 23:21:53 Christopher Barker wrote: > Also, perhaps a GPU-aware numexpr could be helpful, which I think is the > kind of thing that Sturla was referring to when she wrote: > > "Incidentally, this will also make it easier to leverage on modern GPUs." Numexpr mainly sup
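The memory-access advantage that numexpr exploits can be sketched with a blocked evaluation loop in plain NumPy. The function name and block size below are illustrative, not numexpr's actual internals:

```python
import numpy as np

def blocked_eval(a, b, c, block=4096):
    """Evaluate a*b + c block by block, numexpr-style: each block fits
    in cache, so only small cache-sized temporaries are created instead
    of two full-length intermediate arrays."""
    out = np.empty_like(a)
    for i in range(0, len(a), block):
        s = slice(i, i + block)
        out[s] = a[s] * b[s] + c[s]
    return out

a, b, c = (np.random.rand(100_000) for _ in range(3))
assert np.allclose(blocked_eval(a, b, c), a * b + c)
```

For operation/element ratios near 1, this blocking is where most of numexpr's speedup comes from; a GPU backend would face the extra cost of moving each block over the bus.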

Re: [Numpy-discussion] Fwd: GPU Numpy

2009-09-09 Thread Francesc Alted
On Tuesday 08 September 2009 21:19:05 George Dahl wrote: > Sturla Molden molden.no> writes: > > Erik Tollerud wrote: > > >> NumPy arrays on the GPU memory is an easy task. But then I would have > > >> to write the computation in OpenCL's dialect of C99? > > > > > > This is true to some extent, b

Re: [Numpy-discussion] Fwd: GPU Numpy

2009-09-08 Thread Christopher Barker
George Dahl wrote: > Sturla Molden molden.no> writes: >> Teraflops peak performance of modern GPUs is impressive. But NumPy >> cannot easily benefit from that. > I know that for my work, I can get around a 50-fold speedup over > numpy using a python wrapper for a simple GPU matrix c

Re: [Numpy-discussion] Fwd: GPU Numpy

2009-09-08 Thread George Dahl
Sturla Molden molden.no> writes: > > Erik Tollerud wrote: > >> NumPy arrays on the GPU memory is an easy task. But then I would have to > >> write the computation in OpenCL's dialect of C99? > > This is true to some extent, but also probably difficult to do given > > the fact that parallelizabl

Re: [Numpy-discussion] Fwd: GPU Numpy

2009-09-02 Thread Romain Brette
Hi everyone, In case anyone is interested, I just set up a google group to discuss GPU-based simulation for our Python neural simulator Brian: http://groups.google.fr/group/brian-on-gpu Our simulator relies heavily on Numpy. I would be very happy if the GPU experts here would like to share their ex

Re: [Numpy-discussion] Fwd: GPU Numpy

2009-08-21 Thread Sturla Molden
Erik Tollerud wrote: >> NumPy arrays on the GPU memory is an easy task. But then I would have to >> write the computation in OpenCL's dialect of C99? > This is true to some extent, but also probably difficult to do given > the fact that parallelizable algorithms are generally more difficult > to f

Re: [Numpy-discussion] Fwd: GPU Numpy

2009-08-20 Thread Erik Tollerud
I realize this topic is a bit old, but I couldn't help but add something I forgot to mention earlier... >> I mean, once the computations are moved elsewhere numpy is basically a >> convenient way to address memory. > > That is how I mostly use NumPy, though. Computations I often do in > Fortran 95

Re: [Numpy-discussion] Fwd: GPU Numpy

2009-08-07 Thread Romain Brette
Sturla Molden wrote: > Thus, here is my plan: > > 1. a special context-manager class > 2. immutable arrays inside with statement > 3. lazy evaluation: expressions build up a parse tree > 4. dynamic code generation > 5. evaluation on exit > There seems to be some similarity with what we want t
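The five-step plan quoted above can be sketched very schematically. Everything here (`DeferredContext`, `defer`) is a hypothetical illustration of the control flow, not a real or proposed NumPy API:

```python
# Sketch of the proposed design: inside the `with` block, operations
# are only recorded (step 3); __exit__ stands in for the code
# generation and evaluation steps (4 and 5).

class DeferredContext:
    def __init__(self):
        self.exprs = []

    def __enter__(self):
        return self

    def defer(self, fn, *args):
        self.exprs.append((fn, args))    # record, don't compute
        return len(self.exprs) - 1       # handle to the pending result

    def __exit__(self, *exc):
        # A real system would generate and run fused code here; we
        # simply evaluate the recorded operations in order.
        self.results = [fn(*args) for fn, args in self.exprs]
        return False

with DeferredContext() as ctx:
    h = ctx.defer(lambda x, y: [i + j for i, j in zip(x, y)],
                  [1, 2, 3], [10, 20, 30])
# Evaluation happened on exit:
print(ctx.results[h])                    # [11, 22, 33]
```

Step 2 (immutability inside the block) is what makes the deferred evaluation safe: no recorded input can change between being captured and being evaluated.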

Re: [Numpy-discussion] Fwd: GPU Numpy

2009-08-06 Thread Robert Kern
On Thu, Aug 6, 2009 at 19:00, Fernando Perez wrote: > On Thu, Aug 6, 2009 at 1:57 PM, Sturla Molden wrote: >> In order to reduce the effect of immutable arrays, we could introduce a >> context-manager. Inside the with statement, all arrays would be >> immutable. Second, the __exit__ method could tr

Re: [Numpy-discussion] Fwd: GPU Numpy

2009-08-06 Thread Fernando Perez
On Thu, Aug 6, 2009 at 1:57 PM, Sturla Molden wrote: > In order to reduce the effect of immutable arrays, we could introduce a > context-manager. Inside the with statement, all arrays would be > immutable. Second, the __exit__ method could trigger the code generator > and do all the evaluation. So

Re: [Numpy-discussion] Fwd: GPU Numpy

2009-08-06 Thread Charles R Harris
On Thu, Aug 6, 2009 at 5:10 PM, Sturla Molden wrote: > Charles R Harris wrote: > > > I mean, once the computations are moved elsewhere numpy is basically a > > convenient way to address memory. > > That is how I mostly use NumPy, though. Computations I often do in > Fortran 95 or C. > > NumPy arr

Re: [Numpy-discussion] Fwd: GPU Numpy

2009-08-06 Thread Sturla Molden
James Bergstra wrote: > The plan you describe is a good one, and Theano > (www.pylearn.org/theano) almost exactly implements it. You should > check it out. It does not use 'with' syntax at the moment, but it > could provide the backend machinery for your mechanism if you want to > go forward with

Re: [Numpy-discussion] Fwd: GPU Numpy

2009-08-06 Thread Sturla Molden
Charles R Harris wrote: > I mean, once the computations are moved elsewhere numpy is basically a > convenient way to address memory. That is how I mostly use NumPy, though. Computations I often do in Fortran 95 or C. NumPy arrays on the GPU memory is an easy task. But then I would have to wri

Re: [Numpy-discussion] Fwd: GPU Numpy

2009-08-06 Thread Charles R Harris
On Thu, Aug 6, 2009 at 4:36 PM, Sturla Molden wrote: > Charles R Harris wrote: > > Whether the code that gets compiled is written using lazy evaluation > > (ala Sturla), or is expressed some other way seems like an independent > > issue. It sounds like one important thing would be having arrays t

Re: [Numpy-discussion] Fwd: GPU Numpy

2009-08-06 Thread Sturla Molden
Sturla Molden wrote: > Memory management is slow compared to computation. Operations like > malloc, free and memcpy are not faster for VRAM than for RAM. Actually it's not VRAM anymore, but whatever you call the memory dedicated to the GPU. It is cheap to put 8 GB of RAM into a computer, but gr

Re: [Numpy-discussion] Fwd: GPU Numpy

2009-08-06 Thread Sturla Molden
Charles R Harris wrote: > Whether the code that gets compiled is written using lazy evaluation > (ala Sturla), or is expressed some other way seems like an independent > issue. It sounds like one important thing would be having arrays that > reside on the GPU. Memory management is slow compared

Re: [Numpy-discussion] Fwd: GPU Numpy

2009-08-06 Thread Charles R Harris
On Thu, Aug 6, 2009 at 3:29 PM, James Bergstra wrote: > On Thu, Aug 6, 2009 at 4:57 PM, Sturla Molden wrote: > > > >> Now linear algebra or FFTs on a GPU would probably be a huge boon, > >> I'll admit - especially if it's in the form of a drop-in replacement > >> for the numpy or scipy versions. >

Re: [Numpy-discussion] Fwd: GPU Numpy

2009-08-06 Thread James Bergstra
On Thu, Aug 6, 2009 at 4:57 PM, Sturla Molden wrote: > >> Now linear algebra or FFTs on a GPU would probably be a huge boon, >> I'll admit - especially if it's in the form of a drop-in replacement >> for the numpy or scipy versions. > > > NumPy generates temporary arrays for expressions involving nd

Re: [Numpy-discussion] Fwd: GPU Numpy

2009-08-06 Thread Sturla Molden
Robert Kern wrote: > I believe that is exactly the point that Erik is making. :-) > I wasn't arguing against him, just suggesting a solution. :-) I have big hopes for lazy evaluation, if we can find a way to do it right. Sturla

Re: [Numpy-discussion] Fwd: GPU Numpy

2009-08-06 Thread Robert Kern
On Thu, Aug 6, 2009 at 15:57, Sturla Molden wrote: > >> Now linear algebra or FFTs on a GPU would probably be a huge boon, >> I'll admit - especially if it's in the form of a drop-in replacement >> for the numpy or scipy versions. > > NumPy generates temporary arrays for expressions involving ndarra

Re: [Numpy-discussion] Fwd: GPU Numpy

2009-08-06 Thread Sturla Molden
> Now linear algebra or FFTs on a GPU would probably be a huge boon, > I'll admit - especially if it's in the form of a drop-in replacement > for the numpy or scipy versions. NumPy generates temporary arrays for expressions involving ndarrays. This extra allocation and copying often takes more

Re: [Numpy-discussion] Fwd: GPU Numpy

2009-08-06 Thread David Warde-Farley
On 6-Aug-09, at 2:54 PM, Erik Tollerud wrote: > Now linear algebra or FFTs on a GPU would probably be a huge boon, > I'll > admit - especially if it's in the form of a drop-in replacement for > the > numpy or scipy versions. The word I'm hearing from people in my direct acquaintance who are

Re: [Numpy-discussion] Fwd: GPU Numpy

2009-08-06 Thread Matthieu Brucher
2009/8/6 Erik Tollerud : > Note that this is from a "user" perspective, as I have no particular plan of > developing the details of this implementation, but I've thought for a long > time that GPU support could be great for numpy (I would also vote for OpenCL > support over cuda, although conceptua

Re: [Numpy-discussion] Fwd: GPU Numpy

2009-08-06 Thread Erik Tollerud
Note that this is from a "user" perspective, as I have no particular plan of developing the details of this implementation, but I've thought for a long time that GPU support could be great for numpy (I would also vote for OpenCL support over cuda, although conceptually they seem quite similar)... B

Re: [Numpy-discussion] Fwd: GPU Numpy

2009-08-06 Thread James Bergstra
On Thu, Aug 6, 2009 at 1:19 PM, Charles R Harris wrote: > It almost looks like you are reimplementing numpy, in c++ no less. Is there > any reason why you aren't working with a numpy branch and just adding > ufuncs? I don't know how that would work. The Ufuncs need a datatype to work with, and AFA

Re: [Numpy-discussion] Fwd: GPU Numpy

2009-08-06 Thread Charles R Harris
On Thu, Aug 6, 2009 at 11:12 AM, James Bergstra wrote: > >David Warde-Farley cs.toronto.edu> writes: > >> It did inspire some of our colleagues in Montreal to create this, > >> though: > >> > >> http://code.google.com/p/cuda-ndarray/ > >> > >> I gather it is VERY early in development, but I'

[Numpy-discussion] Fwd: GPU Numpy

2009-08-06 Thread James Bergstra
>David Warde-Farley cs.toronto.edu> writes: >> It did inspire some of our colleagues in Montreal to create this, >> though: >> >>      http://code.google.com/p/cuda-ndarray/ >> >> I gather it is VERY early in development, but I'm sure they'd love >> contributions! >> > >Hi David, >That does look q