Received from Ahmed Fasih on Wed, Nov 07, 2012 at 11:07:31PM EST: (snip)
> Thanks Lev! These gists were really useful in understanding how to use
> these functions, and they work for me too. Nonetheless, I tried and
> succeeded in breaking the second one: see
> https://gist.github.com/4036693
>
> First, I had to add "assert" in the calls to np.allclose to make sure
> I'd be informed if things weren't all close. Then I extended the
> kernel to work with multiple blocks, and finally I moved the unpinned
> test first. As I increased N from 20 to 22, both tests passed. But at
> N=23 (23 by 23 array), although the unpinned version works, the pinned
> assertion fails and PyCUDA complains that cleanup operations failed.
>
> I can't find any documented limit on the size of page-locked memory
> allocations, but it ought to be >3kb, right?

I'm not aware of any such limits.

> Ubuntu 11.10, NVIDIA driver 304.51, CUDA 5, PyCUDA 2012.1, Tesla
> C2050. If you or any other kind soul is able to successfully run this
> gist, let me know! https://gist.github.com/4036693
>
> Thanks again,
> Ahmed

When N*N > 512, the array size in bytes (np.double().nbytes*N*N) exceeds
the default alignment assumed by pycuda.driver.aligned_empty() (4096).
Because of this mismatch, not all of the array elements are properly
updated. If you preallocate a device-mapped array, you don't need to
worry about setting the alignment.

L.G.
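To make the arithmetic behind the N*N > 512 threshold concrete, here is a
minimal sketch in plain Python. It does not call PyCUDA at all. It assumes
that np.double().nbytes is 8 and takes the 4096 default alignment from the
reply above. The helper names array_nbytes and exceeds_alignment are
hypothetical and exist only for this illustration.

```python
# Assumption: one np.double element occupies 8 bytes (np.double().nbytes).
ITEMSIZE = 8

# Default alignment for pycuda.driver.aligned_empty(), as quoted in the reply.
DEFAULT_ALIGNMENT = 4096


def array_nbytes(n):
    """Size in bytes of an n-by-n array of doubles."""
    return ITEMSIZE * n * n


def exceeds_alignment(n, alignment=DEFAULT_ALIGNMENT):
    """True when the n-by-n array is larger than the alignment value."""
    return array_nbytes(n) > alignment


if __name__ == "__main__":
    # Print element count, byte size, and whether the size exceeds 4096.
    for n in (20, 21, 22, 23):
        print(n, n * n, array_nbytes(n), exceeds_alignment(n))
```

With these assumptions, N=22 gives 484 elements (3872 bytes), which stays
under 4096, while N=23 gives 529 elements (4232 bytes), which is the first
size to cross it. That matches the point at which the pinned test in the
gist started failing.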
