> Yes. However, it is worth making the distinction between
> embarrassingly parallel problems and SIMD problems. Not all
> embarrassingly parallel problems are SIMD-capable. GPUs do SIMD; they
> do not handle arbitrary embarrassingly parallel problems.
GPUs exploit both dimensions of parallelism, both simd (aka
vectorization) and multithreading.
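To make the distinction concrete, here is a small illustrative sketch
(mine, not from the thread). An elementwise ufunc applies the same
instruction to every element, which is exactly what SIMD hardware wants;
a per-element task with data-dependent branching is still embarrassingly
parallel, but SIMD lanes cannot run it in lockstep.

import numpy as np

x = np.linspace(0.0, 10.0, 1_000_000)

# SIMD-friendly: one instruction stream applied uniformly to every
# element, so the loop vectorizes across SIMD lanes.
y = np.cos(x)

# Embarrassingly parallel but SIMD-hostile: the elements are independent,
# yet the work per element is data-dependent (a while loop of varying
# length), so lockstep SIMD lanes would mostly sit idle.
def collatz_steps(n):
    steps = 0
    while n != 1:
        n = 3 * n + 1 if n % 2 else n // 2
        steps += 1
    return steps

counts = [collatz_steps(n) for n in range(1, 101)]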
On Thu, Sep 10, 2009 at 07:28, Francesc Alted wrote:
> On Thursday 10 September 2009 11:37:24, Gael Varoquaux wrote:
>
>> On Thu, Sep 10, 2009 at 11:29:49AM +0200, Francesc Alted wrote:
>> > The point is: are GPUs prepared to compete with general-purpose CPUs
>> > in all-road operations, like evaluating transcendental functions,
>> > conditionals, all of this with a rich set of data types?
> I think whatever is supported by the underlying CPU, whether it is
> extended double precision (12 bytes) or quad precision (16 bytes).
Classic 64-bit CPUs support neither.
>
> --
>
> Francesc Alted
>
On Thursday 10 September 2009 15:51:15, Rohit Garg wrote:
> Apart from float and double, which floating point formats are
> supported by numpy?
I think whatever is supported by the underlying CPU, whether it is extended
double precision (12 bytes) or quad precision (16 bytes).
--
Francesc Alted
Apart from float and double, which floating point formats are
supported by numpy?
On Thu, Sep 10, 2009 at 7:09 PM, Bruce Southey wrote:
> On 09/10/2009 07:40 AM, Francesc Alted wrote:
>
> On Thursday 10 September 2009 14:36:16, Rohit Garg wrote:
>
>> > That's nice to see. I think I'll change my mind if someone could
>> > perform a vector-vector multiplication (an operation that is
>> > typically memory-bound)
On 09/10/2009 07:40 AM, Francesc Alted wrote:
On Thursday 10 September 2009 14:36:16, Rohit Garg wrote:
> > That's nice to see. I think I'll change my mind if someone could
> > perform a vector-vector multiplication (an operation that is
> > typically memory-bound)
>
> You mean a dot product?
On Thursday 10 September 2009 14:36:16, Rohit Garg wrote:
> > That's nice to see. I think I'll change my mind if someone could
> > perform a vector-vector multiplication (an operation that is
> > typically memory-bound)
>
> You mean a dot product?
Whatever, dot product or element-wise product.
> That's nice to see. I think I'll change my mind if someone could perform
> a vector-vector multiplication (an operation that is typically
> memory-bound)
You mean a dot product?
--
Rohit Garg
http://rpg-314.blogspot.com/
Senior Undergraduate
Department of Physics
Indian Institute of Technology
> a = np.cos(b)
>
> where b is a 1x1 matrix is *very* embarrassing (in the parallel
> meaning of the term ;-)
On this operation, GPUs will eat up CPUs like a pack of piranhas. :)
--
Rohit Garg
http://rpg-314.blogspot.com/
Senior Undergraduate
Department of Physics
Indian Institute of Technology
On Thursday 10 September 2009 11:37:24, Gael Varoquaux wrote:
> On Thu, Sep 10, 2009 at 11:29:49AM +0200, Francesc Alted wrote:
> > The point is: are GPUs prepared to compete with general-purpose CPUs
> > in all-road operations, like evaluating transcendental functions,
> > conditionals, all of this with a rich set of data types?
On Thursday 10 September 2009 11:40:48, Sturla Molden wrote:
> Francesc Alted wrote:
> > Numexpr already uses the Python parser, instead of building a new one.
> > However the bytecode emitted after the compilation process is
> > different, of course.
> >
> > Also, I don't see the point in requiring immutable buffers. Could you
> > develop this further?
> The point is: are GPUs prepared to compete with general-purpose CPUs in
> all-road operations, like evaluating transcendental functions,
> conditionals, all of this with a rich set of data types?
Yup.
--
Rohit Garg
http://rpg-314.blogspot.com/
Senior Undergraduate
Department of Physics
Indian Institute of Technology
> Sure. Especially because NumPy is all about embarrassingly parallel
> problems (after all, this is how a ufunc works, doing operations
> element-by-element).
>
> The point is: are GPUs prepared to compete with general-purpose CPUs in
> all-road operations, like evaluating transcendental functions,
> conditionals, all of this with a rich set of data types?
Francesc Alted wrote:
>
> Numexpr already uses the Python parser, instead of building a new one.
> However the bytecode emitted after the compilation process is
> different, of course.
>
> Also, I don't see the point in requiring immutable buffers. Could you
> develop this further?
>
If you do lazy evaluation
On Thu, Sep 10, 2009 at 11:29:49AM +0200, Francesc Alted wrote:
>The point is: are GPUs prepared to compete with general-purpose CPUs in
>all-road operations, like evaluating transcendental functions,
>conditionals, all of this with a rich set of data types? I would like to
>believe
On Thursday 10 September 2009 11:20:21, Gael Varoquaux wrote:
> On Thu, Sep 10, 2009 at 10:36:27AM +0200, Francesc Alted wrote:
> >Where are you getting this info from? IMO the technology of memory in
> >graphics boards cannot be so different from that in commercial
> >motherboards. It could be a *bit* faster (at the expense of packing
> >less of it), but I'd say not as much as 4x faster
On Thursday 10 September 2009 10:58:13, Rohit Garg wrote:
> > Where are you getting this info from? IMO the technology of memory in
> > graphics boards cannot be so different from that in commercial
> > motherboards. It could be a *bit* faster (at the expense of packing
> > less of it), but I'd say not as much as 4x faster
On Thu, Sep 10, 2009 at 10:36:27AM +0200, Francesc Alted wrote:
>Where are you getting this info from? IMO the technology of memory in
>graphics boards cannot be so different from that in commercial
>motherboards. It could be a *bit* faster (at the expense of packing less
>of it), but I'd say not as much as 4x faster (100 GB/s vs 25 GB/s of
>Intel i7 in sequential access)
On Thursday 10 September 2009 11:11:22, Sturla Molden wrote:
> Citi, Luca wrote:
> > That is exactly why numexpr is faster in these cases.
> > I hope one day numpy will be able to perform such
> > optimizations.
>
> I think it is going to require lazy evaluation. Whenever possible, an
> operator would just return a symbolic representation of the operation.
Rohit Garg wrote:
> gtx280 --> 141 GB/s --> has 1 GB
> ati4870 --> 115 GB/s --> has 1 GB
> ati5870 --> 153 GB/s (launches Sept 22, 2009) --> 2 GB models will be there too
>
That is going to help if buffers are kept in graphics memory. But the
problem is that graphics memory is a scarce resource.
S.M.
Citi, Luca wrote:
> That is exactly why numexpr is faster in these cases.
> I hope one day numpy will be able to perform such
> optimizations.
>
I think it is going to require lazy evaluation. Whenever possible, an
operator would just return a symbolic representation of the operation.
This would
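A toy sketch of what such lazy evaluation could look like (hypothetical
code, not an actual NumPy or numexpr API): each operator returns a node
in an expression tree, and nothing is computed until evaluate() is
called.

import numpy as np

class Lazy:
    """Toy lazy array: arithmetic builds a tree instead of computing."""
    def __init__(self, value=None, op=None, args=()):
        self.value, self.op, self.args = value, op, args

    def __add__(self, other):
        return Lazy(op=np.add, args=(self, other))

    def __mul__(self, other):
        return Lazy(op=np.multiply, args=(self, other))

    def evaluate(self):
        # A real implementation would fuse the whole tree into one
        # loop (or GPU kernel); this sketch just walks it recursively.
        if self.op is None:
            return self.value
        return self.op(*[arg.evaluate() for arg in self.args])

a = Lazy(np.arange(5.0))
b = Lazy(np.arange(5.0))
c = Lazy(np.ones(5))
expr = a * b + c        # builds a tree; no arithmetic happens here
print(expr.evaluate())  # [ 1.  2.  5. 10. 17.]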
> Where are you getting this info from? IMO the technology of memory in
> graphics boards cannot be so different from that in commercial
> motherboards. It could be a *bit* faster (at the expense of packing less
> of it), but I'd say not as much as 4x faster (100 GB/s vs 25 GB/s of
> Intel i7 in sequential access)
Hi Sturla,
> The proper way to speed up "dot(a*b+c*sqrt(d), e)" is to get rid of
> temporary intermediates.
I implemented a patch
http://projects.scipy.org/numpy/ticket/1153
that reduces the number of temporary intermediates.
In your example, from 4 to 2.
There is a big improvement in terms of memory
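For readers counting along, this is where the 4 and the 2 come from (an
illustrative sketch of the idea, not the patch itself): rewriting the
expression with in-place operations reuses two of the buffers.

import numpy as np

n = 1_000_000
a, b, c, d, e = (np.random.rand(n) for _ in range(5))

# Naive evaluation allocates four temporaries:
#   t1 = a*b, t2 = sqrt(d), t3 = c*t2, t4 = t1 + t3
r1 = np.dot(a * b + c * np.sqrt(d), e)

# With buffer reuse, only two temporaries are ever allocated:
t1 = a * b           # temporary 1
t2 = np.sqrt(d)      # temporary 2
t2 *= c              # in place, reuses t2
t1 += t2             # in place, reuses t1
r2 = np.dot(t1, e)

assert np.allclose(r1, r2)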
On Thursday 10 September 2009 09:45:29, Rohit Garg wrote:
> > You do realize that the throughput from onboard (video) RAM is going
> > to be much higher, right? It's not just the parallelization but the
> > memory bandwidth. And as James pointed out, if you can keep most of
> > your intermediate computation on-card, you stand to benefit immensely,
> > even if doing
> You do realize that the throughput from onboard (video) RAM is going
> to be much higher, right? It's not just the parallelization but the
> memory bandwidth. And as James pointed out, if you can keep most of
> your intermediate computation on-card, you stand to benefit immensely,
> even if doing
On Wed, Sep 9, 2009 at 9:47 PM, Sturla Molden wrote:
> James Bergstra wrote:
>> Suppose you want to evaluate "dot(a*b+c*sqrt(d), e)". The GPU is
>> great for doing dot(),
> The CPU is equally great (or better?) for doing dot(). In both cases:
>
> - memory access scales as O(n) for dot products.
> - computation scales as O(n) for dot products.
On 10-Sep-09, at 12:47 AM, Sturla Molden wrote:
> The CPU is equally great (or better?) for doing dot(). In both cases:
>
> - memory access scales as O(n) for dot products.
> - computation scales as O(n) for dot products.
> - memory is slow
> - computation is fast (faster for GPU)
You do realize that the throughput from onboard (video) RAM is going
to be much higher, right?
James Bergstra wrote:
> Suppose you want to evaluate "dot(a*b+c*sqrt(d), e)". The GPU is
> great for doing dot(),
The CPU is equally great (or better?) for doing dot(). In both cases:
- memory access scales as O(n) for dot products.
- computation scales as O(n) for dot products.
- memory is slow
- computation is fast (faster for GPU)
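A back-of-the-envelope calculation makes the point (my arithmetic, using
the 25 GB/s bandwidth figure quoted earlier in the thread):

# A dot product of two length-n float64 vectors performs 2n flops
# (one multiply and one add per element) while reading 2n * 8 bytes,
# an arithmetic intensity of 2/16 = 0.125 flops per byte. Memory
# bandwidth, not compute, is therefore the ceiling.
bandwidth = 25e9            # bytes/s, the i7 sequential figure above
intensity = 2.0 / (2 * 8)   # flops per byte for a dot product
print("%.1f Gflop/s max" % (intensity * bandwidth / 1e9))  # ~3.1 Gflop/s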
George Dahl wrote:
> I know that for my work, I can get on the order of a 50-fold speedup
> over numpy using a python wrapper for a simple GPU matrix class. So I
> might be dealing with a lot of matrix products where I multiply a fixed
> 512 by 784 matrix by a 784 by 256 matrix that changes
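That size of matrix product is exactly the high operation/element-ratio
case where a GPU's flops pay off; a rough count (my arithmetic, not
George's):

# For C = A @ B with A 512x784 and B 784x256:
#   flops    = 2 * 512 * 784 * 256          (about 2.1e8)
#   elements = 512*784 + 784*256 + 512*256  (about 7.3e5 touched)
# Roughly 280 flops per element, so the product is compute-bound,
# unlike the ~1 flop/element ufunc case discussed elsewhere in the thread.
m, k, n = 512, 784, 256
flops = 2 * m * k * n
elements = m * k + k * n + m * n
print("flops per element: %.0f" % (flops / elements))  # ~280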
Christopher Barker wrote:
> George Dahl wrote:
>> Sturla Molden writes:
>>> Teraflops peak performance of modern GPUs is impressive. But NumPy
>>> cannot easily benefit from that.
>
>> I know that for my work, I can get on the order of a 50-fold speedup
>> over numpy using a python wrapper for a simple GPU matrix class.
On Wed, Sep 9, 2009 at 10:41 AM, Francesc Alted wrote:
>> Numexpr mainly supports functions that are meant to be used
>> element-wise, so the operation/element ratio is normally 1 (or close to
>> 1). These are the scenarios where improved memory access is much more
>> important than CPU (or, for th
On Wednesday 09 September 2009 11:26:06, Francesc Alted wrote:
> On Tuesday 08 September 2009 23:21:53, Christopher Barker wrote:
> > Also, perhaps a GPU-aware numexpr could be helpful which I think is the
> > kind of thing that Sturla was referring to when she wrote:
> >
> > "Incidentally, this
Received from Francesc Alted on Wed, Sep 09, 2009 at 05:18:48AM EDT:
(snip)
> The point here is that matrix-matrix multiplications (or, in general,
> functions with a large operation/element ratio) are a *tiny* part of all the
> possible operations between arrays that NumPy supports. This is w
On Tuesday 08 September 2009 23:21:53, Christopher Barker wrote:
> Also, perhaps a GPU-aware numexpr could be helpful which I think is the
> kind of thing that Sturla was referring to when she wrote:
>
> "Incidentally, this will also make it easier to leverage on modern GPUs."
Numexpr mainly supports
On Tuesday 08 September 2009 21:19:05, George Dahl wrote:
> Sturla Molden writes:
> > Erik Tollerud wrote:
> > >> Putting NumPy arrays in GPU memory is an easy task. But then I would
> > >> have to write the computation in OpenCL's dialect of C99?
> > >
> > > This is true to some extent, but also probably difficult to do given
> > > the fact that parallelizable algorithms are generally more difficult
George Dahl wrote:
> Sturla Molden writes:
>> Teraflops peak performance of modern GPUs is impressive. But NumPy
>> cannot easily benefit from that.
> I know that for my work, I can get on the order of a 50-fold speedup over
> numpy using a python wrapper for a simple GPU matrix class.
Sturla Molden writes:
>
> Erik Tollerud wrote:
> >> Putting NumPy arrays in GPU memory is an easy task. But then I would
> >> have to write the computation in OpenCL's dialect of C99?
> > This is true to some extent, but also probably difficult to do given
> > the fact that parallelizable algorithms are generally more difficult
Hi everyone,
In case anyone is interested, I just set up a google group to discuss
GPU-based simulation for our Python neural simulator Brian:
http://groups.google.fr/group/brian-on-gpu
Our simulator relies heavily on NumPy. I would be very happy if the GPU
experts here would like to share their ex
Erik Tollerud wrote:
>> Putting NumPy arrays in GPU memory is an easy task. But then I would
>> have to write the computation in OpenCL's dialect of C99?
> This is true to some extent, but also probably difficult to do given
> the fact that parallelizable algorithms are generally more difficult
> to f
I realize this topic is a bit old, but I couldn't help but add
something I forgot to mention earlier...
>> I mean, once the computations are moved elsewhere numpy is basically a
>> convenient way to address memory.
>
> That is how I mostly use NumPy, though. Computations I often do in
> Fortran 95 or C.
Sturla Molden wrote:
> Thus, here is my plan:
>
> 1. a special context-manager class
> 2. immutable arrays inside with statement
> 3. lazy evaluation: expressions build up a parse tree
> 4. dynamic code generation
> 5. evaluation on exit
>
There seems to be some similarity with what we want to
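A minimal sketch of how steps 1-3 and 5 of that plan might fit together
(hypothetical code; no such API exists in NumPy): the with block records
deferred expressions, and __exit__ evaluates them all at once, which is
where step 4's code generation would slot in.

import numpy as np

class lazy_block:
    """Toy context manager: defer expressions, evaluate on exit."""
    def __init__(self):
        self.pending = []    # recorded (name, ufunc, operands) triples
        self.results = {}

    def defer(self, name, ufunc, *operands):
        # Steps 2-3: record the expression instead of evaluating it.
        self.pending.append((name, ufunc, operands))

    def __enter__(self):
        return self

    def __exit__(self, *exc_info):
        # Step 5 (and, in a real system, step 4: generate and compile
        # fused code here). This sketch just evaluates the recorded ops.
        for name, ufunc, operands in self.pending:
            self.results[name] = ufunc(*operands)
        return False

b = np.linspace(0.0, np.pi, 5)
with lazy_block() as ctx:
    ctx.defer("a", np.cos, b)     # nothing is computed inside the block
print(ctx.results["a"])           # evaluated on __exit__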
On Thu, Aug 6, 2009 at 19:00, Fernando Perez wrote:
> On Thu, Aug 6, 2009 at 1:57 PM, Sturla Molden wrote:
>> In order to reduce the effect of immutable arrays, we could introduce a
>> context-manager. Inside the with statement, all arrays would be
>> immutable. Second, the __exit__ method could trigger the code generator
>> and do all the evaluation.
On Thu, Aug 6, 2009 at 1:57 PM, Sturla Molden wrote:
> In order to reduce the effect of immutable arrays, we could introduce a
> context-manager. Inside the with statement, all arrays would be
> immutable. Second, the __exit__ method could trigger the code generator
> and do all the evaluation. So
On Thu, Aug 6, 2009 at 5:10 PM, Sturla Molden wrote:
> Charles R Harris wrote:
>
> > I mean, once the computations are moved elsewhere numpy is basically a
> > convenient way to address memory.
>
> That is how I mostly use NumPy, though. Computations I often do in
> Fortran 95 or C.
>
> Putting NumPy arrays in GPU memory is an easy task. But then I would have
> to write the computation in OpenCL's dialect of C99?
James Bergstra wrote:
> The plan you describe is a good one, and Theano
> (www.pylearn.org/theano) almost exactly implements it. You should
> check it out. It does not use 'with' syntax at the moment, but it
> could provide the backend machinery for your mechanism if you want to
> go forward with
Charles R Harris wrote:
> I mean, once the computations are moved elsewhere numpy is basically a
> convenient way to address memory.
That is how I mostly use NumPy, though. Computations I often do in
Fortran 95 or C.
Putting NumPy arrays in GPU memory is an easy task. But then I would have to
write the computation in OpenCL's dialect of C99?
On Thu, Aug 6, 2009 at 4:36 PM, Sturla Molden wrote:
> Charles R Harris wrote:
> > Whether the code that gets compiled is written using lazy evaluation
> > (ala Sturla), or is expressed some other way seems like an independent
> > issue. It sounds like one important thing would be having arrays that
> > reside on the GPU.
Sturla Molden wrote:
> Memory management is slow compared to computation. Operations like
> malloc, free and memcpy are not faster for VRAM than for RAM.
Actually it's not VRAM anymore, but whatever you call the memory
dedicated to the GPU.
It is cheap to put 8 GB of RAM into a computer, but gr
Charles R Harris wrote:
> Whether the code that gets compiled is written using lazy evaluation
> (ala Sturla), or is expressed some other way seems like an independent
> issue. It sounds like one important thing would be having arrays that
> reside on the GPU.
Memory management is slow compared to computation. Operations like
malloc, free and memcpy are not faster for VRAM than for RAM.
On Thu, Aug 6, 2009 at 3:29 PM, James Bergstra wrote:
> On Thu, Aug 6, 2009 at 4:57 PM, Sturla Molden wrote:
> >
> >> Now linear algebra or FFTs on a GPU would probably be a huge boon,
> >> I'll admit - especially if it's in the form of a drop-in replacement
> >> for the numpy or scipy versions.
>
On Thu, Aug 6, 2009 at 4:57 PM, Sturla Molden wrote:
>
>> Now linear algebra or FFTs on a GPU would probably be a huge boon,
>> I'll admit - especially if it's in the form of a drop-in replacement
>> for the numpy or scipy versions.
>
>
> NumPy generates temporary arrays for expressions involving ndarrays.
Robert Kern wrote:
> I believe that is exactly the point that Erik is making. :-)
>
I wasn't arguing against him, just suggesting a solution. :-)
I have big hopes for lazy evaluation, if we can find a way to do it right.
Sturla
On Thu, Aug 6, 2009 at 15:57, Sturla Molden wrote:
>
>> Now linear algebra or FFTs on a GPU would probably be a huge boon,
>> I'll admit - especially if it's in the form of a drop-in replacement
>> for the numpy or scipy versions.
>
> NumPy generates temporary arrays for expressions involving ndarrays.
> Now linear algebra or FFTs on a GPU would probably be a huge boon,
> I'll admit - especially if it's in the form of a drop-in replacement
> for the numpy or scipy versions.
NumPy generates temporary arrays for expressions involving ndarrays. This
extra allocation and copying often takes more time than the computation
itself.
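This is precisely the case numexpr targets: the whole expression is
compiled once and evaluated over cache-sized blocks, so no full-size
temporaries are allocated. A usage sketch (requires the numexpr
package):

import numpy as np
import numexpr as ne

a, b, c, d = (np.random.rand(1_000_000) for _ in range(4))

# Plain NumPy materializes a full-size temporary per sub-expression.
r_np = a * b + c * np.sqrt(d)

# numexpr streams the operands through the compiled expression in
# blocks, keeping the working set in cache instead of in temporaries.
r_ne = ne.evaluate("a * b + c * sqrt(d)")

assert np.allclose(r_np, r_ne)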
On 6-Aug-09, at 2:54 PM, Erik Tollerud wrote:
> Now linear algebra or FFTs on a GPU would probably be a huge boon,
> I'll
> admit - especially if it's in the form of a drop-in replacement for
> the
> numpy or scipy versions.
The word I'm hearing from people in my direct acquaintance who are
2009/8/6 Erik Tollerud :
> Note that this is from a "user" perspective, as I have no particular plan of
> developing the details of this implementation, but I've thought for a long
> time that GPU support could be great for numpy (I would also vote for OpenCL
> support over cuda, although conceptually they seem quite similar)...
Note that this is from a "user" perspective, as I have no particular plan of
developing the details of this implementation, but I've thought for a long
time that GPU support could be great for numpy (I would also vote for OpenCL
support over cuda, although conceptually they seem quite similar)...
B
On Thu, Aug 6, 2009 at 1:19 PM, Charles R Harris wrote:
> It almost looks like you are reimplementing numpy, in c++ no less. Is there
> any reason why you aren't working with a numpy branch and just adding
> ufuncs?
I don't know how that would work. The Ufuncs need a datatype to work
with, and AFA
On Thu, Aug 6, 2009 at 11:12 AM, James Bergstra wrote:
> >David Warde-Farley writes:
> >> It did inspire some of our colleagues in Montreal to create this,
> >> though:
> >>
> >> http://code.google.com/p/cuda-ndarray/
> >>
> >> I gather it is VERY early in development, but I'm sure they'd love
> >> contributions!
>David Warde-Farley writes:
>> It did inspire some of our colleagues in Montreal to create this,
>> though:
>>
>> http://code.google.com/p/cuda-ndarray/
>>
>> I gather it is VERY early in development, but I'm sure they'd love
>> contributions!
>>
>
>Hi David,
>That does look q