Thanks Ian and Andreas,
About the algorithm: memory isn't a huge concern, so if I'm doing this
operation on an array of length N I don't mind having a permanently
allocated extra array of length N for storing the indices, even though
I'll probably only ever use the first 20 elements of it. At the moment,
this is what I'm doing in my C code to save going through the array twice.
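In (pretend) Python, the single-pass version I have in mind looks roughly
like this (the function name and the preallocated J are just for
illustration, not anything from my actual code):

```python
def threshold_indices(x, x0, J):
    """Single pass over x: store the index of every element above x0
    in the preallocated array J, and return how many were stored.
    Only J[:count] is meaningful afterwards; the rest of J is scratch."""
    count = 0
    for i, xi in enumerate(x):
        if xi > x0:
            J[count] = i
            count += 1
    return count
```

So J is allocated once at full length, but after each call only the
first `count` entries matter.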
About the GPU code, I think what you're saying is that I should have an
array x, say, a global-memory array J, and a global index j into J, and
then do something like:
__global__ void threshold(double *x, double x0)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (x[i] > x0) {
        atomicInc(&j, N);  /* j, J and N global, as above */
        J[j] = i;
    }
}
(Bear with me if I'm way off; as I said, I've only just started
programming with CUDA.)
Isn't there a danger that at the end of the atomicInc instruction,
before the J[j]=i instruction, another thread could do a second
atomicInc, and so one of the elements of J would be skipped? It's true
that this would be a rare event, but it's almost certain to happen
eventually. Ah, although maybe the idea is to have global_j be the
global index, and then do:

    unsigned int j = atomicInc(&global_j, N);
    J[j] = i;

I guess this would work even in that case, since atomicInc returns the
old value of global_j, so each thread reserves its own slot in J before
writing to it?
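To convince myself, here's a little CPU simulation of that
reserve-then-write pattern in Python, with a lock standing in for the
hardware atomic (everything here is made up for illustration):

```python
import threading

class AtomicCounter:
    """CPU stand-in for CUDA's atomicInc: increments and returns the
    OLD value, like `unsigned int j = atomicInc(&global_j, N);`."""
    def __init__(self):
        self._value = 0
        self._lock = threading.Lock()

    def fetch_inc(self):
        with self._lock:
            old = self._value
            self._value += 1
            return old

def demo(n_threads=8, per_thread=100):
    counter = AtomicCounter()
    J = [None] * (n_threads * per_thread)

    def worker(tid):
        for k in range(per_thread):
            i = tid * per_thread + k   # this thread's "element index"
            j = counter.fetch_inc()    # reserve a unique slot first...
            J[j] = i                   # ...then write into it

    threads = [threading.Thread(target=worker, args=(t,))
               for t in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return J
```

Because each thread writes to the slot whose old value it received, no
entry of J is ever skipped or overwritten, however the threads interleave.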
One last technical question: I think I see how the
pycuda.driver.mem_alloc function works, but how do I refer to this
memory in the CUDA code? (I don't think there's an example that
demonstrates this in the pycuda release.) The NVIDIA CUDA documentation
talks about having to manage the global memory by offsets, so I would
guess you do something like this (based on the NVIDIA docs):
extern __device__ int J0[];

__global__ void threshold(double *x, double x0)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    int *J = (int*)J0;
    unsigned int *global_j = (unsigned int*)&J[N];  /* N assumed known */
    if (x[i] > x0) {
        unsigned int j = atomicInc(global_j, N);
        J[j] = i;
    }
}
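Or, alternatively, maybe the mem_alloc'd memory can just be passed
straight through as an ordinary kernel argument, sidestepping the offset
bookkeeping? Pure guesswork on my part, but something like this (the
kernel string and all the names are mine, and I haven't checked the
pycuda API details):

```python
import numpy as np

# CUDA source as a string, to be compiled at runtime by pycuda's
# SourceModule. Here J and the counter are ordinary kernel arguments
# rather than an extern array managed by offsets.
KERNEL_SOURCE = """
__global__ void threshold(double *x, double x0, int *J,
                          unsigned int *global_j)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (x[i] > x0) {
        unsigned int j = atomicInc(global_j, 0xffffffffu);
        J[j] = i;
    }
}
"""

def run_threshold(x, x0):
    """Launch the kernel above; needs a CUDA-capable GPU and pycuda
    (double support also needs the right -arch flag, I believe)."""
    import pycuda.autoinit            # creates a context on the first GPU
    import pycuda.driver as drv
    from pycuda.compiler import SourceModule

    n = len(x)
    mod = SourceModule(KERNEL_SOURCE)
    threshold = mod.get_function("threshold")

    # mem_alloc returns an object that (I think) can be passed directly
    # as a kernel argument, standing in for the device pointer.
    J_gpu = drv.mem_alloc(n * np.dtype(np.int32).itemsize)
    j_gpu = drv.mem_alloc(np.dtype(np.uint32).itemsize)
    drv.memcpy_htod(j_gpu, np.zeros(1, dtype=np.uint32))

    threshold(drv.In(np.asarray(x, dtype=np.float64)), np.float64(x0),
              J_gpu, j_gpu, block=(n, 1, 1), grid=(1, 1))

    count = np.empty(1, dtype=np.uint32)
    drv.memcpy_dtoh(count, j_gpu)
    J = np.empty(n, dtype=np.int32)
    drv.memcpy_dtoh(J, J_gpu)
    return J[:count[0]]
```

If that's how it works, there would be no need for the extern-array
trick at all. Someone please correct me if this is nonsense.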
Is that right? I'll go and have a play around with this now, but I
figure it probably won't work so I'm getting my question in early. ;-)
Dan
p.s. apologies if this posts twice, I sent it from the wrong email
address before but maybe it will go through anyway.
_______________________________________________
PyCuda mailing list
[email protected]
http://tiker.net/mailman/listinfo/pycuda_tiker.net