Hi all,
I am happy to announce the availability of PyCuda [1,8], which is a
value-added Python wrapper around Nvidia's CUDA [2] GPU Computation
framework. In the presence of other wrapping modules [3,4], why would you
want to use PyCuda?
* It's designed to work and interact with numpy.
*
On Sun, Jun 22, 2008 at 3:58 PM, Andreas Klöckner [EMAIL PROTECTED] wrote:
PyCuda is based on the driver API. CUBLAS uses the high-level API. One
*can* violate this rule without crashing immediately. But sketchy stuff does
happen. Instead, for BLAS-1 operations, PyCuda comes with a class
On Sunday 22 June 2008, Kevin Jacobs [EMAIL PROTECTED] wrote:
Thanks for the clarification. That makes perfect sense. Do you have any
feelings on the relative performance of GPUArray versus CUBLAS?
Same. If you check out the past version of PyCuda that still has CUBLAS, there
are files
select_types(PyUFuncObject *self, int *arg_types,
PyUFuncGenericFunction *function, void **data,
PyArray_SCALARKIND *scalars,
PyObject *typetup)
{
int i, j;
char start_type;
int userdef=-1;
int userdef_ind=-1;
if (self->userloops) {
For what it's worth, I tried deleting scipy/testing, changed all the
scipy.testing references to numpy.testing, and it would appear that
the NumPy test setup is able to run all the SciPy tests just fine
(which shouldn't be too surprising, since that's where I stole it from
in the first place). So
Matthew Brett wrote:
Hi,
The feature of compiling code for multiple types is somewhat orthogonal
to ndarray support; better to treat them separately and take one at a time.
Well, it's relevant to numpy because if you want to implement - for
example - a numpy sort, then you've got to deal
I tried to tweak my Cython code for performance by manually inlining a small
function, and ended up with less performant code. I must confess I
don't really understand what is going on here. If somebody has an
explanation, I'd be delighted. The code follows.
Gael Varoquaux wrote:
I tried to tweak my Cython code for performance by manually inlining a small
function, and ended up with less performant code. I must confess I
don't really understand what is going on here. If somebody has an
explanation, I'd be delighted. The code follows.
On Sun, Jun 22, 2008 at 06:39:21PM -1000, Eric Firing wrote:
Another typo is the culprit:
In [2]:timeit do_Mandelbrot_cython()
10 loops, best of 3: 53.8 ms per loop
In [3]:timeit do_Mandelbrot_cython2()
10 loops, best of 3: 54 ms per loop
This is after I put the underscore in the x_buffer
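
For anyone following along without the original Cython source, here is a rough pure-Python sketch of the same kind of comparison: one Mandelbrot loop that calls a small helper, and one with the helper's body pasted in by hand. The names (escape_time, mandelbrot_call, mandelbrot_inline, compare) and the grid parameters are made up for illustration and are not the code from this thread. The point it tries to make is the one the typo above suggests: check that both variants return identical results before trusting any timing difference between them.

```python
import timeit

MAX_ITER = 50  # hypothetical iteration cap, chosen only for this sketch


def escape_time(cr, ci, max_iter=MAX_ITER):
    """Return how many iterations z = z*z + c takes to leave the radius-2 disk."""
    zr = zi = 0.0
    for n in range(max_iter):
        zr, zi = zr * zr - zi * zi + cr, 2.0 * zr * zi + ci
        if zr * zr + zi * zi > 4.0:
            return n
    return max_iter


def mandelbrot_call(size=40):
    """Variant 1: the per-pixel work is delegated to escape_time()."""
    out = []
    for j in range(size):
        ci = -1.5 + 3.0 * j / size
        for i in range(size):
            cr = -2.0 + 3.0 * i / size
            out.append(escape_time(cr, ci))
    return out


def mandelbrot_inline(size=40):
    """Variant 2: the body of escape_time() inlined by hand into the loop."""
    out = []
    for j in range(size):
        ci = -1.5 + 3.0 * j / size
        for i in range(size):
            cr = -2.0 + 3.0 * i / size
            zr = zi = 0.0
            count = MAX_ITER
            for n in range(MAX_ITER):
                zr, zi = zr * zr - zi * zi + cr, 2.0 * zr * zi + ci
                if zr * zr + zi * zi > 4.0:
                    count = n
                    break
            out.append(count)
    return out


def compare(repeat=3, number=5):
    """Time both variants, but only after confirming they agree."""
    if mandelbrot_call() != mandelbrot_inline():
        raise ValueError("variants disagree; fix that before timing")
    t_call = min(timeit.repeat(mandelbrot_call, repeat=repeat, number=number))
    t_inline = min(timeit.repeat(mandelbrot_inline, repeat=repeat, number=number))
    return t_call, t_inline
```

Calling compare() returns the best-of-N wall times for the two variants as a tuple. In pure Python the inlined version mostly saves function-call overhead; in Cython the picture can be quite different, so treat this only as a template for the equality-then-timing workflow.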