Javier Baladron wrote:
Bogdan Opanchuk wrote:
Hi Javier,
It would probably help if you attach the source of the expon_them()
function (since something is definitely happening there).
I'll try to do some psychic debugging though. I find these lines
suspicious:
self.weights_lateral = gpuarray.to_gpu(self.weight_matrixLateral())
self.weights_local = gpuarray.to_gpu(self.weight_matrixLocal())
...
self.unitarysX = gpuarray.to_gpu(unitaryX)
self.unitarysY = gpuarray.to_gpu(unitaryY)
In other places you use single precision (np.float32) explicitly, but here it looks like you are just tossing NumPy arrays to the GPU, and NumPy arrays have double precision by default. If this is the case, try changing those lines like so:
self.weights_lateral = gpuarray.to_gpu(self.weight_matrixLateral().astype(np.float32))
... and so on.
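To make the precision point concrete, here is a minimal NumPy-only sketch (no GPU needed; the array here is just a stand-in for the weight matrices):

```python
import numpy as np

# NumPy builds double-precision arrays by default:
weights = np.zeros((2, 3))
print(weights.dtype)                       # float64 -- 8 bytes per element

# .astype(np.float32) gives the single precision a float* kernel expects:
weights32 = weights.astype(np.float32)
print(weights32.dtype, weights32.nbytes)   # float32 24
```

A kernel that reads float* from a float64 buffer will see garbage values and index half the data, so the mismatch can fail silently or crash.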
Best regards,
Bogdan
On Wed, Oct 13, 2010 at 4:17 AM, Javier <[email protected]>
wrote:
Hello,
I am writing some code using PyCUDA and have run into a problem; I hope somebody can help me!
I am writing an application that will make several calls to a function that launches kernels. Every call shares some of the same data, so I am planning to send this data to the GPU before the first call and keep it there for the future ones. That way I will not need to copy the same thing over and over again to the GPU, saving time. This function also makes calls to two different kernels.
This is how I am trying to do it:
In the __init__ of my class I have:
self.weights_lateral = gpuarray.to_gpu(self.weight_matrixLateral())
self.weights_local = gpuarray.to_gpu(self.weight_matrixLocal())
unitaryX, unitaryY = self.unitary_vectors()
self.unitarysX = gpuarray.to_gpu(unitaryX)
self.unitarysY = gpuarray.to_gpu(unitaryY)
These are the GPU arrays I want to keep in GPU memory.
Afterwards I have a function that does this:
# load data into the gpu
values_gpu = gpuarray.to_gpu(np.array(current_values, dtype=np.float32))

# calculate the array of S(v)
expon_them = mod.get_function("expon_them")
svv = gpuarray.to_gpu(np.zeros(self.angle_size*self.image_size*self.image_size, dtype=np.float32))
expon_them(svv, values_gpu, np.float32(0.0), np.float32(self.lamb),
           grid=(self.num_blocks, 1), block=(self.threads_per_block, 1, 1))

resul = gpuarray.to_gpu(np.zeros(self.angle_size*self.image_size*self.image_size, dtype=np.float32))
dv_them = mod.get_function("dv_them")
dv_them(resul, values_gpu, self.dx, self.weights_lateral, self.weights_local,
        svv, self.alpha, self.mu, self.beta, self.inpt,
        self.unitarysX, self.unitarysY, np.float32(0), self.lamb,
        grid=(self.num_blocks, 1), block=(self.threads_per_block, 1, 1))
I am getting an error when I do:
resul = gpuarray.to_gpu(np.zeros(self.angle_size*self.image_size*self.image_size, dtype=np.float32))
The error is: LaunchError: cuMemcpyHtoD failed: launch failed
I have also checked whether I can access the GPU arrays that I keep in memory, and I cannot access them.
Something seems to be happening to the context after I execute expon_them, and I don't know what.
Could somebody please tell me how to fix this?
_______________________________________________
PyCUDA mailing list
[email protected]
http://lists.tiker.net/listinfo/pycuda
Thanks a lot for the quick answer.
I will change those lines as you said in order to improve my code, but I don't think that is the cause of the error, because the expon_them function is giving good results.
The code of the expon_them function is this:
__device__ float S(float x, float lam){
    float threshold = 0.4;
    return 1/(1+expf(-1*lam*(x-threshold)));
}

__global__ void expon_them(float *dest, float *v, int startingPoint, float lambda){
    const int i = blockDim.x*blockIdx.x + threadIdx.x + startingPoint;
    const int reali = blockDim.x*blockIdx.x + threadIdx.x;
    float temp;
    temp = S(v[i], lambda);
    dest[reali] = temp;
}
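As a side note for anyone reading along: S() is a logistic sigmoid centered at the threshold. A quick host-side NumPy check of the same formula (my own re-implementation mirroring the device function, not a GPU test):

```python
import numpy as np

def S(x, lam, threshold=0.4):
    # Same formula as the __device__ float S(float x, float lam) above
    return 1.0 / (1.0 + np.exp(-lam * (x - threshold)))

print(S(0.4, 10.0))          # 0.5 exactly at the threshold
print(S(1.0, 10.0) > 0.99)   # True: saturates toward 1 above it
```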
The reali and i business is there because I took this kernel from C code I wrote earlier that uses multiple GPUs; it is just for managing that. Now I want to make a simple version using Python and PyCUDA.
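One detail that may be worth double-checking, purely as an observation on the call shown earlier (not a confirmed diagnosis): the kernel declares startingPoint as int, while the call passes np.float32(0.0). PyCUDA hands scalar arguments to the kernel as raw bytes, so a float passed where an int is expected gets reinterpreted bit-for-bit. For 0.0 the bit pattern happens to be all zeros, so it is harmless here, but any other value would turn into a huge bogus index and send v[i] out of bounds, which is exactly the kind of fault that kills a context. A GPU-free NumPy sketch of the reinterpretation:

```python
import numpy as np

# Reinterpret a float's 4 bytes as a 32-bit int (no numeric conversion):
print(int(np.float32(0.0).view(np.int32)))   # 0 -- all-zero bits, harmless
print(int(np.float32(5.0).view(np.int32)))   # 1084227584 -- a bogus index

# Safer to match the kernel signature explicitly, e.g.
# expon_them(svv, values_gpu, np.int32(0), np.float32(self.lamb), ...)
```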
I hope that with this info we can get to a solution. Thank you very much.
Regards,
Javier