Javier Baladron wrote:
Bogdan Opanchuk wrote:
Hi Javier,
It would probably help if you attach the source of the expon_them()
function (since something is definitely happening there).
I'll try to do some psychic debugging though. I find these lines
suspicious:
self.weights_lateral = gpuarray.to_gpu(self.weight_matrixLateral())
self.weights_local = gpuarray.to_gpu(self.weight_matrixLocal())
...
self.unitarysX = gpuarray.to_gpu(unitaryX)
self.unitarysY = gpuarray.to_gpu(unitaryY)
In other places you use single precision (np.float32) explicitly, but here it looks like you are just tossing NumPy arrays to the GPU, and NumPy arrays have double precision by default. If this is the case, try changing those lines like so:
self.weights_lateral = gpuarray.to_gpu(self.weight_matrixLateral().astype(np.float32))
... and so on.
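To make the precision point concrete, here is a minimal NumPy-only sketch (no GPU needed; the array here is just a stand-in for the weight matrices):

```python
import numpy as np

# NumPy builds double-precision arrays by default:
weights = np.zeros((2, 3))
print(weights.dtype)                       # float64 -- 8 bytes per element

# .astype(np.float32) gives the single precision a float* kernel expects:
weights32 = weights.astype(np.float32)
print(weights32.dtype, weights32.nbytes)   # float32 24
```

A kernel that reads float* from a float64 buffer will see garbage values and index half the data, so the mismatch can fail silently or crash.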
Best regards,
Bogdan
On Wed, Oct 13, 2010 at 4:17 AM, Javier <[email protected]>
wrote:
Hello,
I am writing some code using PyCUDA and have run into a problem; I hope somebody can help me!
I am writing an application that will make several calls to a function that launches kernels. Every call shares some of the same data, so I am planning to send this data to the GPU before the first call and keep it there for the future ones. That way I will not need to copy the same thing over and over again to the GPU, saving time. This function also makes calls to two different kernels.
This is how I am trying to do it:
In the __init__ of my class I have:
self.weights_lateral = gpuarray.to_gpu(self.weight_matrixLateral())
self.weights_local = gpuarray.to_gpu(self.weight_matrixLocal())
unitaryX, unitaryY = self.unitary_vectors()
self.unitarysX = gpuarray.to_gpu(unitaryX)
self.unitarysY = gpuarray.to_gpu(unitaryY)
These are the GPU arrays I want to keep in GPU memory.
Afterwards I have a function that does this:
# load data into the gpu
values_gpu = gpuarray.to_gpu(np.array(current_values, dtype=np.float32))

# calculate the array of S(v)
expon_them = mod.get_function("expon_them")
svv = gpuarray.to_gpu(np.zeros(self.angle_size*self.image_size*self.image_size, dtype=np.float32))
expon_them(svv, values_gpu, np.float32(0.0), np.float32(self.lamb),
           grid=(self.num_blocks, 1), block=(self.threads_per_block, 1, 1))

resul = gpuarray.to_gpu(np.zeros(self.angle_size*self.image_size*self.image_size, dtype=np.float32))
dv_them = mod.get_function("dv_them")
dv_them(resul, values_gpu, self.dx, self.weights_lateral, self.weights_local,
        svv, self.alpha, self.mu, self.beta, self.inpt,
        self.unitarysX, self.unitarysY, np.float32(0), self.lamb,
        grid=(self.num_blocks, 1), block=(self.threads_per_block, 1, 1))
I am getting an error when I do:
resul = gpuarray.to_gpu(np.zeros(self.angle_size*self.image_size*self.image_size, dtype=np.float32))
The error is: LaunchError: cuMemcpyHtoD failed: launch failed
I have also checked whether I can access the GPU arrays that I keep in memory, and I cannot access them.
Something seems to be happening to the context after I execute expon_them, and I don't know what.
Could somebody please tell me how to fix this?
_______________________________________________
PyCUDA mailing list
[email protected]
http://lists.tiker.net/listinfo/pycuda
Thanks a lot for the quick answer.
I will change those lines as you said in order to improve my code, but I don't think that is the cause of the error, because the expon_them function is giving good results.
The code of the expon_them function is this:
__device__ float S(float x, float lam){
    float threshold = 0.4;
    return 1/(1+expf(-1*lam*(x-threshold)));
}

__global__ void expon_them(float *dest, float *v, int startingPoint, float lambda){
    const int i = blockDim.x*blockIdx.x + threadIdx.x + startingPoint;
    const int reali = blockDim.x*blockIdx.x + threadIdx.x;
    float temp;
    temp = S(v[i], lambda);
    dest[reali] = temp;
}
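As a side note for anyone reading along: S() is a logistic sigmoid centered at the threshold. A quick host-side NumPy check of the same formula (my own re-implementation mirroring the device function, not a GPU test):

```python
import numpy as np

def S(x, lam, threshold=0.4):
    # Same formula as the __device__ float S(float x, float lam) above
    return 1.0 / (1.0 + np.exp(-lam * (x - threshold)))

print(S(0.4, 10.0))          # 0.5 exactly at the threshold
print(S(1.0, 10.0) > 0.99)   # True: saturates toward 1 above it
```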
The reali and i business is there because I took this kernel from C code I wrote earlier that uses multiple GPUs; it is just for managing that. Now I want to make a simple version using Python and PyCUDA.
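One detail that may be worth double-checking, purely as an observation on the call shown earlier (not a confirmed diagnosis): the kernel declares startingPoint as int, while the call passes np.float32(0.0). PyCUDA hands scalar arguments to the kernel as raw bytes, so a float passed where an int is expected gets reinterpreted bit-for-bit. For 0.0 the bit pattern happens to be all zeros, so it is harmless here, but any other value would turn into a huge bogus index and send v[i] out of bounds, which is exactly the kind of fault that kills a context. A GPU-free NumPy sketch of the reinterpretation:

```python
import numpy as np

# Reinterpret a float's 4 bytes as a 32-bit int (no numeric conversion):
print(int(np.float32(0.0).view(np.int32)))   # 0 -- all-zero bits, harmless
print(int(np.float32(5.0).view(np.int32)))   # 1084227584 -- a bogus index

# Safer to match the kernel signature explicitly, e.g.
# expon_them(svv, values_gpu, np.int32(0), np.float32(self.lamb), ...)
```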
I hope that with this info we can get to a solution. Thank you very much.
Regards,
Javier