Javier Baladron wrote:
Bogdan Opanchuk wrote:
Hi Javier,

It would probably help if you attach the source of the expon_them()
function (since something is definitely happening there).

I'll try to do some psychic debugging though. I find these lines suspicious:
self.weights_lateral = gpuarray.to_gpu(self.weight_matrixLateral())
self.weights_local = gpuarray.to_gpu(self.weight_matrixLocal())
...
self.unitarysX = gpuarray.to_gpu(unitaryX)
self.unitarysY = gpuarray.to_gpu(unitaryY)

In other places you use single precision (np.float32) explicitly, but
here it looks like you are just tossing numpy arrays to GPU, and numpy
arrays have double precision by default. If this is the case, try to
change the lines like:
self.weights_lateral = gpuarray.to_gpu(self.weight_matrixLateral().astype(np.float32))
... and so on.
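To illustrate the point (a minimal host-side sketch, not code from the application itself): numpy builds double-precision arrays by default, so the conversion has to happen before the upload.

```python
import numpy as np

# numpy creates double-precision arrays unless told otherwise
w = np.zeros((4, 4))
print(w.dtype)  # float64

# convert on the host before handing the array to gpuarray.to_gpu()
w32 = w.astype(np.float32)
print(w32.dtype)  # float32
# gpuarray.to_gpu(w32) would then upload 4-byte floats,
# matching kernels whose parameters are declared `float *`
```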

Best regards,
Bogdan

On Wed, Oct 13, 2010 at 4:17 AM, Javier <[email protected]> wrote:
Hello,

I am writing some code using PyCUDA and have run into a problem; I hope somebody can help me!

I am writing an application that makes several calls to a function that launches kernels. Every call shares some of the same data, so I plan to send this data to the GPU before the first call and keep it there for the later ones. That way I will not need to copy the same thing over and over again to the GPU, saving time. Also, this function calls two different kernels.

Let me show you how I am trying to do this.

In the __init__ of my class I have:

self.weights_lateral = gpuarray.to_gpu(self.weight_matrixLateral())
self.weights_local = gpuarray.to_gpu(self.weight_matrixLocal())
unitaryX, unitaryY = self.unitary_vectors()
self.unitarysX = gpuarray.to_gpu(unitaryX)
self.unitarysY = gpuarray.to_gpu(unitaryY)

These are the GPU arrays I want to keep in GPU memory.

Afterwards I have a function that does this:

#load data into the gpu
values_gpu = gpuarray.to_gpu(np.array(current_values, dtype=np.float32))

#calculate the array of S(v)
expon_them = mod.get_function("expon_them")

svv = gpuarray.to_gpu(np.zeros(self.angle_size*self.image_size*self.image_size, dtype=np.float32))
expon_them(svv, values_gpu, np.float32(0.0), np.float32(self.lamb),
           grid=(self.num_blocks, 1), block=(self.threads_per_block, 1, 1))

resul = gpuarray.to_gpu(np.zeros(self.angle_size*self.image_size*self.image_size, dtype=np.float32))
dv_them = mod.get_function("dv_them")

dv_them(resul, values_gpu, self.dx, self.weights_lateral, self.weights_local,
        svv, self.alpha, self.mu, self.beta, self.inpt,
        self.unitarysX, self.unitarysY, np.float32(0), self.lamb,
        grid=(self.num_blocks, 1), block=(self.threads_per_block, 1, 1))


I am getting an error when I do:

resul = gpuarray.to_gpu(np.zeros(self.angle_size*self.image_size*self.image_size, dtype=np.float32))

The error is: LaunchError: cuMemcpyHtoD failed: launch failed

I have also been checking whether I can access the GPU arrays that I keep in memory, and I cannot access them.

Something seems to be happening to the context after I execute expon_them, and I don't know what.

Could somebody please tell me how to fix this?
--
View this message in context: http://pycuda.2962900.n2.nabble.com/Some-help-with-contexts-please-tp5627674p5627674.html
Sent from the PyCuda mailing list archive at Nabble.com.

_______________________________________________
PyCUDA mailing list
[email protected]
http://lists.tiker.net/listinfo/pycuda

Thanks a lot for the quick answer.

I will change the lines as you suggested in order to improve my code, but I don't think that is the cause of the error, because the expon_them function is giving good results.

The code of the expon_them function is this:

__device__ float S(float x, float lam);  // forward declaration; S is defined below but called first

__global__ void expon_them(float *dest, float *v, int startingPoint, float lambda){

   // i is offset by startingPoint (left over from the multi-GPU version);
   // reali is the plain per-thread output index
   const int i = blockDim.x*blockIdx.x + threadIdx.x + startingPoint;
   const int reali = blockDim.x*blockIdx.x + threadIdx.x;
   dest[reali] = S(v[i], lambda);
}

// logistic sigmoid centered on a fixed threshold
__device__ float S(float x, float lam){

   float threshold = 0.4f;
   return 1.0f/(1.0f + expf(-lam*(x - threshold)));

}


The reali and i business is there because I took this kernel from C code I wrote before that uses multiple GPUs, and it just manages that. Now I want to make a simpler version using Python and PyCUDA.
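One detail that may be worth double-checking against the signature above (a side observation, sketched here on the host only): the kernel declares startingPoint as an int, while the call passes np.float32(0.0). PyCUDA forwards numpy scalars byte-for-byte, so the wrapper type has to match the C parameter type exactly. For 0.0 the bit pattern happens to be all zeros, so it may be harmless in this particular call, but any other value would decode to garbage inside the kernel.

```python
import numpy as np

# expon_them is declared as (float *dest, float *v, int startingPoint, float lambda);
# the scalar wrapper should match the C type, e.g. np.int32 for `int`
start_ok = np.int32(0)        # matches `int startingPoint`
start_bad = np.float32(0.0)   # float bits the kernel would reinterpret as int

# 0.0f is all-zero bits, so it accidentally still reads as int 0...
print(start_bad.view(np.int32))        # 0
# ...but any other float decodes to a nonsense index:
print(np.float32(1.0).view(np.int32))  # 1065353216, not 1
```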

I hope that with this info we can get to a solution. Thank you very much!

Regards,

Javier
