Hello everybody,
I've written a kernel to perform a custom dot operation, and it would work
perfectly if it weren't for an issue with the memory layout. Maybe I am
missing something in the mapping process?
From what I understood, matrices are allocated column-wise, so b[0] and b[1]
in the kernel would be b[0][0] and b[1][0] respectively. But judging from the
result, the matrix seems to be stored by rows instead. Is there an option to
control this?
Thanks,
RL
#### the code ####
import pycuda.autoinit
import pycuda.driver as drv
import numpy
from pycuda.compiler import SourceModule
a = numpy.array([0,1],dtype=numpy.float32)
b = numpy.array([[0,1,2],[0,1,2]],dtype=numpy.float32)
BLOCK_SIZE = (len(a),1,1)
GRID_SIZE = (len(b[0]),1)
mod = SourceModule("""
__global__ void vecmatdot(float *dest, float *a, float *b)
{
    float sum = 0;
    const int bx = blockIdx.x;
    /* flatten (block, thread) into a linear index into b,
       assuming one column of length blockDim.x per block */
    const int linear_thr_idx = bx * blockDim.x + threadIdx.x;
    sum += a[threadIdx.x] * b[linear_thr_idx];
    /* each thread writes its own partial product to dest[bx] */
    dest[bx] = sum;
}
""")
vecmatdot = mod.get_function("vecmatdot")
dest = numpy.zeros_like(b[0])
vecmatdot(drv.Out(dest), drv.In(a), drv.In(b),
          block=BLOCK_SIZE, grid=GRID_SIZE)
print dest
#### the end ####
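For reference, the result I'm trying to reproduce on the GPU is the plain numpy vector-matrix product (dest[j] = sum_i a[i] * b[i][j]):

```python
import numpy

a = numpy.array([0, 1], dtype=numpy.float32)
b = numpy.array([[0, 1, 2], [0, 1, 2]], dtype=numpy.float32)
# expected dest: dot of a with each column of b
print(numpy.dot(a, b))  # [0. 1. 2.]
```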
_______________________________________________
PyCUDA mailing list
[email protected]
http://lists.tiker.net/listinfo/pycuda