Hi all, I am rather new to CUDA and am having trouble with a simple program I am writing to classify the pixels of an image into terrain types. It currently fails with the following error:

python new_main.py
Traceback (most recent call last):
  File "new_main.py", line 56, in <module>
    processedArray = process.tercAccel(slopewin, tpiwin, xwin, ywin)
  File "/home/nathan/NetBeansProjects/HeightAboveAverage/src/nonFreq.py", line 131, in tercAccel
    cuda.memcpy_dtoh(out,out_gpu)
pycuda._driver.LaunchError: cuMemcpyDtoH failed: launch failed
Error in atexit._run_exitfuncs:
Traceback (most recent call last):
  File "/usr/lib/python2.6/atexit.py", line 24, in _run_exitfuncs
    func(*targs, **kargs)
LaunchError: cuCtxPopCurrent failed: launch failed
Error in sys.exitfunc:
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuMemFree failed: invalid context
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuMemFree failed: invalid context
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuMemFree failed: invalid context
PyCUDA WARNING: a clean-up operation failed (dead context maybe?)
cuModuleUnload failed: invalid context
Traceback (most recent call last):
  File "/usr/lib/python2.6/atexit.py", line 24, in _run_exitfuncs
    func(*targs, **kargs)
pycuda._driver.LaunchError: cuCtxPopCurrent failed: launch failed
-------------------------------------------------------------------
PyCUDA ERROR: The context stack was not empty upon module cleanup.
-------------------------------------------------------------------
A context was still active when the context stack was being
cleaned up. At this point in our execution, CUDA may already
have been deinitialized, so there is no way we can finish
cleanly. The program will be aborted now.
Use Context.pop() to avoid this problem.

The code producing the error:

    def tercAccel(self, slopeArray, tpiArray, xwin, ywin):
        out = zeros((ywin,xwin),dtype=numpy.int16)

        slope_gpu = cuda.mem_alloc(slopeArray.size * slopeArray.dtype.itemsize)
        tpi_gpu = cuda.mem_alloc(tpiArray.size * tpiArray.dtype.itemsize)
        out_gpu = cuda.mem_alloc(out.size * out.dtype.itemsize)

        cuda.memcpy_htod(slope_gpu,slopeArray)
        cuda.memcpy_htod(tpi_gpu,tpiArray)

        mod = SourceModule("""
        __global__ void classify(int *out, float *slope, float *tpi)
        {
            int x = blockIdx.x * blockDim.x + threadIdx.x;
            int y = blockIdx.y * blockDim.y + threadIdx.y;
            int idx = y * 6688 + x;
            float s = slope[idx];
            float t = tpi[idx];
            /*
            #Terrain Classification
            #1 ridge         tpi > 1
            #2 upper slope   tpi >0.5 and tpi =<1
            #3 middle slope  tpi >-0.5 and tpi < 0.5 and slope > 5
            #4 flats slope   tpi >= -0.5 and tpi <= 0.5 and slope <= 5
            #5 lower slope   tpi >= -1.0 and tpi < 0.5
            #6 valleys       tpi < -1.0
            */
            if( t > 1 ){
                out[idx] = 1;
            }else if( t > 0.5 && t <= 1){
                out[idx] = 2;
            }else if( t > -0.5 && t < 0.5 && s > 5 ){
                out[idx] = 3;
            }else if( t >= -0.5 && t <= 0.5 && s <= 5){
                out[idx] = 4;
            }else if( t >= -1.0 and t < 0.5 ){
                out[idx] = 5;
            }else if( t < -1.0){
                out[idx] = 6;
            }
        }
        """)

        classify = mod.get_function("classify")
        classify(out_gpu,slope_gpu,tpi_gpu,block=(32,32,1),grid=(209,209))
        cuda.memcpy_dtoh(out,out_gpu)
        return out

The "image" is 6688x6688, and the code is running on a GTX 460. Any ideas as to why this program is failing?

Thanks in advance,
Nathaniel H Clay
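
As a side check on the launch configuration and buffer sizes in the code above, here is a minimal sketch in plain Python (no GPU needed). The helper name launch_summary and its parameters are hypothetical, made up for illustration; the numbers come from the post. It assumes a CUDA int is 4 bytes and compares that against the 2-byte numpy.int16 host array used to size out_gpu.

```python
import ctypes

WIDTH = HEIGHT = 6688   # image size stated in the post
BLOCK = (32, 32)        # from block=(32,32,1) in the launch
GRID = (209, 209)       # from grid=(209,209) in the launch


def launch_summary(width, height, block, grid, host_itemsize, kernel_itemsize):
    """Hypothetical helper: thread coverage and output-buffer byte counts."""
    threads_x = block[0] * grid[0]
    threads_y = block[1] * grid[1]
    n_pixels = width * height
    return {
        "threads_x": threads_x,
        "threads_y": threads_y,
        # True when the grid launches exactly one thread per pixel
        "exact_fit": threads_x == width and threads_y == height,
        # bytes reserved by mem_alloc(out.size * out.dtype.itemsize)
        "allocated_bytes": n_pixels * host_itemsize,
        # bytes the kernel touches when it writes one element per pixel
        "written_bytes": n_pixels * kernel_itemsize,
    }


summary = launch_summary(
    WIDTH, HEIGHT, BLOCK, GRID,
    host_itemsize=ctypes.sizeof(ctypes.c_int16),    # numpy.int16 on the host
    kernel_itemsize=ctypes.sizeof(ctypes.c_int32),  # assumed size of CUDA int
)

for key, value in sorted(summary.items()):
    print(key, value)
```

If this arithmetic is right, the grid covers the 6688x6688 image exactly, but the kernel's int *out writes would span twice the bytes allocated for out_gpu. That mismatch may be worth ruling out as a cause of the launch failure; declaring the parameter as short *out, or allocating out as numpy.int32, would make the two sizes agree.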