Re: [Numpy-discussion] GPU implementation?

2007-06-03 Thread Dave P. Novakovic
This may be of interest:

LLVM support in Mesa - and I believe there is work going on with LLVM
and Python in the PyPy camp.

http://zrusin.blogspot.com/2007/05/mesa-and-llvm.html

I just stumbled on this page while this conversation was happening :)

Dave

On 6/2/07, Bob Lewis [EMAIL PROTECTED] wrote:
 James Turner wrote:

  Hi Martin,
 
I was wondering if anyone has thought about accelerating NumPy with a
GPU. For example, nVidia's CUDA SDK provides a feasible way to offload
vector math onto the very fast SIMD processors available on the GPU.
Currently GPUs primarily support single-precision floats and are not
IEEE compliant, but they could still be useful for some applications.
 
  I wasn't actually there, but I noticed that last year's SciPy
  conference page includes a talk entitled "GpuPy: Using GPUs to
  Accelerate NumPy" by Benjamin Eitzen (I think I also found his Web
  page via Google):
 
http://www.scipy.org/SciPy2006/Schedule
 
  I also wondered whether Benjamin or anyone else who is interested had
  come across the Open Graphics Project (I hadn't got around to asking).

 Thanks for your interest.  Ben and I (mostly Ben, it's his MS thesis)
 are working on gpupy and expect to have a version ready for testing
 by people other than ourselves some time this summer.

 (Very) preliminary results are promising.

 - Bob Lewis
   School of EECS
   Washington State University



[Numpy-discussion] GPU implementation?

2007-05-31 Thread Martin Ünsal
I was wondering if anyone has thought about accelerating NumPy with a
GPU. For example, nVidia's CUDA SDK provides a feasible way to offload
vector math onto the very fast SIMD processors available on the GPU.
Currently GPUs primarily support single-precision floats and are not
IEEE compliant, but they could still be useful for some applications.

If there turns out to be a significant speedup over using the CPU, this
could be a very accessible way to do scientific and numerical
computation using GPUs, much easier than coding directly to the GPU APIs.
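
Concretely, the kind of elementwise workload I have in mind looks
something like this (a hedged example; the float32 casts reflect the
single-precision limitation):

    import numpy as np

    a = np.random.rand(1000000).astype(np.float32)  # single precision only
    b = np.random.rand(1000000).astype(np.float32)
    c = a * b + np.sin(a)   # every element independent - ideal SIMD work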

Martin




Re: [Numpy-discussion] GPU implementation?

2007-05-31 Thread Charles R Harris

On 5/31/07, Martin Ünsal [EMAIL PROTECTED] wrote:


I was wondering if anyone has thought about accelerating NumPy with a
GPU. For example, nVidia's CUDA SDK provides a feasible way to offload
vector math onto the very fast SIMD processors available on the GPU.
Currently GPUs primarily support single-precision floats and are not
IEEE compliant, but they could still be useful for some applications.



I've thought about it, but I think it would be a heck of a lot of work.
NumPy works with subarrays a lot, and I suspect this would make it tricky
to stream through a GPU. Making good use of the GPU's several pipelines
would also require a degree of parallelism that isn't there now. We would
also need GPU computation of sin, cos, and the other functions used by
ufuncs, and that might not work well. For ordinary matrix/array
arithmetic, the shortest route might be a version of ATLAS/BLAS, some of
LAPACK, and maybe an FFT library written to use a GPU.
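
To make the subarray point concrete, here is a small illustration (a
hedged sketch; it assumes a GPU backend would need a contiguous buffer
to transfer, which NumPy's strided views don't guarantee):

    import numpy as np

    a = np.arange(64, dtype=np.float32).reshape(8, 8)
    view = a[::2, 1:]                  # a strided view - no copy is made
    print(view.flags['C_CONTIGUOUS'])  # False: can't be streamed as-is
    buf = np.ascontiguousarray(view)   # the extra copy a GPU transfer forces

Every such copy would eat into whatever speedup the GPU provides.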

Chuck


Re: [Numpy-discussion] GPU implementation?

2007-05-31 Thread James Turner
Hi Martin,

  I was wondering if anyone has thought about accelerating NumPy with a
  GPU. For example, nVidia's CUDA SDK provides a feasible way to offload
  vector math onto the very fast SIMD processors available on the GPU.
  Currently GPUs primarily support single-precision floats and are not
  IEEE compliant, but they could still be useful for some applications.

I wasn't actually there, but I noticed that last year's SciPy
conference page includes a talk entitled "GpuPy: Using GPUs to
Accelerate NumPy" by Benjamin Eitzen (I think I also found his Web
page via Google):

   http://www.scipy.org/SciPy2006/Schedule

I also wondered whether Benjamin or anyone else who is interested had
come across the Open Graphics Project (I hadn't got around to asking).

   http://wiki.duskglow.com/tiki-index.php?page=open-graphics

This would be quite a specialized combination, but I'm sure it could
be useful to some people with high-performance requirements, or maybe
for building some kind of special appliance.

Cheers,

James.



Re: [Numpy-discussion] GPU implementation?

2007-05-31 Thread Andrew Corrigan
Martin Ünsal martinunsal at gmail.com writes:

 
 I was wondering if anyone has thought about accelerating NumPy with a
 GPU. For example, nVidia's CUDA SDK provides a feasible way to offload
 vector math onto the very fast SIMD processors available on the GPU.
 Currently GPUs primarily support single-precision floats and are not
 IEEE compliant, but they could still be useful for some applications.
 
 If there turns out to be a significant speedup over using the CPU, this
 could be a very accessible way to do scientific and numerical
 computation using GPUs, much easier than coding directly to the GPU APIs.
 
 Martin
 

I've thought about this too and think that it's a great idea. The
existing library Brook, which has a programming model similar to
NumPy's, proves that it's feasible. And Brook was originally done with
OpenGL & DirectX as backends to access the hardware. Needless to say,
that's a lot harder than using CUDA. Since it hasn't already been
pointed out: CUDA includes the cuBLAS and cuFFT libraries. I don't
know what the status of a LAPACK built on top of cuBLAS is, but I'd be
surprised if someone isn't already working on it. Also, NVIDIA has
stated that double-precision hardware will be available later this
year, in case that's an issue for anyone.

I agree very much that it would make GPUs more accessible, although
CUDA has done an amazing job at that already. I think the most helpful
thing would be if it let us code against the existing NumPy array
interface in a way that automatically runs on the GPU in an optimized
fashion - using shared memory and avoiding bank conflicts.
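
As a rough sketch of the interface I mean (entirely hypothetical - the
"device buffer" below is just a host array standing in for GPU memory,
where a real backend would do uploads and kernel launches):

    import numpy as np

    class GPUArray(object):
        """Mimics a NumPy array while keeping data 'on the device'."""
        def __init__(self, host):
            # a real backend would upload to GPU memory here
            self._dev = np.asarray(host, dtype=np.float32)
        def __add__(self, other):
            # a real backend would launch an elementwise kernel and
            # leave the result in device memory, with shared memory and
            # bank conflicts handled behind this interface
            return GPUArray(self._dev + other._dev)
        def get(self):
            # a real backend would download from the device here
            return self._dev.copy()

    a = GPUArray(np.random.rand(1000))
    b = GPUArray(np.random.rand(1000))
    c = (a + b).get()   # same syntax as NumPy, GPU underneath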

I'd happily contribute to such a project if someone else got it started.



Re: [Numpy-discussion] GPU implementation?

2007-05-31 Thread Brian Granger
This is very much worth pursuing.  I have been working on things
related to this on and off at my day job.  I can't say specifically
what I have been doing, but I can make some general comments:

* It is very easy to wrap the different parts of CUDA using ctypes and
call it from Python/NumPy (a sketch follows at the end of this list).

* Compared to a recent fast Intel CPU, the speedups we see are
consistent with what the NVIDIA literature reports: 10-30x is common,
and in some cases we have seen up to 170x.

* Certain parts of NumPy will be very easy to accelerate: things
covered by BLAS, FFTs, ufuncs, and random variates - but each of these
will have very different speedups.

* LAPACK will be tough, extremely tough in some cases. The main issue
is that various algorithms in LAPACK rely on different levels of BLAS
(1, 2, or 3). The algorithms in LAPACK that primarily use level 1 BLAS
functions (vector operations), like LU-decomp, are probably not worth
porting to the GPU - at least not using the BLAS that NVIDIA provides.
On the other hand, the algorithms that use more of the level 2 and 3
BLAS functions are probably worth looking at.

* NVIDIA made a design decision in its implementation of cuBLAS and
cuFFT that is somewhat detrimental for certain algorithms. In their
implementation, the BLAS and FFT routines can _only_ be called from
the CPU, not from code running on the GPU. Thus if you have an
algorithm that makes many calls to cuBLAS/cuFFT, you pay a large
overhead in having to keep the main flow of the algorithm on the CPU
(the second sketch below shows the pattern). It is not uncommon for
this overhead to completely erode any speedup you may have gotten on
the GPU.

* For many BLAS calls, cuBLAS won't be much faster than a good
optimized BLAS from ATLAS or Goto.
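
As promised above, a minimal sketch of the ctypes approach. It assumes
the legacy cuBLAS C interface shipped with the CUDA SDK (libcublas.so)
and its documented entry points; treat it as illustrative rather than
tested code, and note the library name/path may differ per system:

    import ctypes
    import numpy as np

    cublas = ctypes.CDLL("libcublas.so")
    cublas.cublasInit()

    n = 1024
    a = np.random.rand(n, n).astype(np.float32)  # single precision only
    b = np.random.rand(n, n).astype(np.float32)
    c = np.empty((n, n), dtype=np.float32)

    # allocate device buffers and copy the data over
    d_a, d_b, d_c = ctypes.c_void_p(), ctypes.c_void_p(), ctypes.c_void_p()
    for host, dev in ((a, d_a), (b, d_b), (c, d_c)):
        cublas.cublasAlloc(n * n, 4, ctypes.byref(dev))
        cublas.cublasSetVector(n * n, 4,
                               host.ctypes.data_as(ctypes.c_void_p),
                               1, dev, 1)

    # cuBLAS is column-major, so passing (b, a) yields the row-major a*b
    cublas.cublasSgemm(ctypes.c_char(b"N"), ctypes.c_char(b"N"),
                       n, n, n, ctypes.c_float(1.0),
                       d_b, n, d_a, n,
                       ctypes.c_float(0.0), d_c, n)

    # copy the result back and clean up
    cublas.cublasGetVector(n * n, 4, d_c, 1,
                           c.ctypes.data_as(ctypes.c_void_p), 1)
    for dev in (d_a, d_b, d_c):
        cublas.cublasFree(dev)
    cublas.cublasShutdown()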
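
And the second sketch, showing the call-from-CPU-only problem. It
reuses the wrapper and device buffers above (imagine it running before
the cleanup step); the iteration count and convergence test are
hypothetical, and cublasSdot is a level 1 routine whose scalar result
lands back in host memory on every pass:

    # each cublasSdot launches from the CPU and returns its scalar to
    # the host, so the main loop never leaves the CPU; the per-call
    # overhead can swallow the GPU's advantage for such algorithms
    cublas.cublasSdot.restype = ctypes.c_float
    for step in range(1000):
        alpha = cublas.cublasSdot(n * n, d_a, 1, d_b, 1)
        if abs(alpha) < 1e-6:   # decision made on the CPU every pass
            break

There is no way to make that call from inside a GPU kernel, which is
exactly the overhead described above.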

Brian


On 5/31/07, Martin Ünsal [EMAIL PROTECTED] wrote:
 I was wondering if anyone has thought about accelerating NumPy with a
 GPU. For example, nVidia's CUDA SDK provides a feasible way to offload
 vector math onto the very fast SIMD processors available on the GPU.
 Currently GPUs primarily support single-precision floats and are not
 IEEE compliant, but they could still be useful for some applications.

 If there turns out to be a significant speedup over using the CPU, this
 could be a very accessible way to do scientific and numerical
 computation using GPUs, much easier than coding directly to the GPU APIs.

 Martin

