Hello everybody,

I've written a kernel to perform a custom dot operation that would work
perfectly if it weren't for an issue with the memory layout. Maybe I am
missing something in the mapping process?
From what I understood, matrices are allocated column-wise. In that case b[0]
and b[1] (in the kernel) would correspond to b[0][0] and b[1][0], but from the
result it looks like the matrix is stored by rows. Is that possible?

Thanks,
RL

#### the code ####
import pycuda.autoinit
import pycuda.driver as drv
import numpy
from pycuda.compiler import SourceModule

a = numpy.array([0,1],dtype=numpy.float32)
b = numpy.array([[0,1,2],[0,1,2]],dtype=numpy.float32)

BLOCK_SIZE = (len(a),1,1)
GRID_SIZE = (len(b[0]),1)

mod = SourceModule("""
__global__ void vecmatdot(float *dest, float *a, float *b)
{
  float sum = 0;
  const int bx = blockIdx.x;                                // one block per column of b
  const int linear_thr_idx = bx * blockDim.x + threadIdx.x; // flat index into b
  sum += a[threadIdx.x] * b[linear_thr_idx];                // per-thread product
  dest[bx] = sum;   // every thread in the block writes dest[bx]
}
""")

vecmatdot = mod.get_function("vecmatdot")
dest = numpy.zeros_like(b[0])
vecmatdot( drv.Out(dest), drv.In(a), drv.In(b),
        block=BLOCK_SIZE, grid=GRID_SIZE)
print(dest)

#### the end ####
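In case it helps frame the question: if the kernel really does assume a column-wise layout, one option might be to hand PyCUDA a Fortran-ordered copy of the matrix, so that consecutive flat addresses walk down a column. A minimal sketch (not tested against the kernel above; `b_f` is just an illustrative name):

```python
import numpy

b = numpy.array([[0, 1, 2], [0, 1, 2]], dtype=numpy.float32)

# asfortranarray makes a column-major (Fortran-ordered) copy, so the
# memory order becomes column-by-column rather than row-by-row.
b_f = numpy.asfortranarray(b)

# order='K' reads elements in memory order, confirming the new layout.
print(b_f.ravel(order='K'))
```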



_______________________________________________
PyCUDA mailing list
[email protected]
http://lists.tiker.net/listinfo/pycuda
