The first error is easy: "cuMemAlloc failed: out of memory".
This example is _extremely_ memory inefficient, and I am using the bulk of the
memory on my 260 to do 5000x5000.
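
For a rough sense of scale, here is a back-of-the-envelope estimate of what it costs to hold three full arrays (x, y and itr) on the card. The element sizes are assumptions, since the script's dtypes are not shown in this thread, and `footprint_mb` is just a hypothetical helper name.

```python
def footprint_mb(width, height, n_arrays=3, bytes_per_element=4):
    """Return the MiB needed for n_arrays dense width x height arrays."""
    total_bytes = width * height * n_arrays * bytes_per_element
    return total_bytes / (1024.0 * 1024.0)


# Assumed element sizes: 4 bytes (float32/int32) and 8 bytes (float64).
for itemsize in (4, 8):
    for side in (5000, 1000):
        mb = footprint_mb(side, side, bytes_per_element=itemsize)
        print("%dx%d, %d-byte elements: %.1f MiB" % (side, side, itemsize, mb))
```

Whatever the real dtypes are, dropping from 5000x5000 to 1000x1000 cuts the footprint by a factor of 25, which is why a smaller shape gets past the allocation.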
Ideally the x, y coords should be generated in the kernel (with some index
tricks) and only the extents of the domain should be passed.
In the next example I am going to do this, and try to use only shared memory
for a wicked speed increase.
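
The index trick mentioned above could look roughly like this. It is plain Python standing in for what each thread would compute inside the kernel, not the actual code from the demo; `pixel_to_coord` and its parameter names are hypothetical, and a row-major pixel layout is assumed.

```python
def pixel_to_coord(idx, width, height, x_min, x_max, y_min, y_max):
    """Map a flat, row-major pixel index to an (x, y) point in the domain.

    In a CUDA kernel, idx would typically come from
    threadIdx.x + blockIdx.x * blockDim.x.
    Assumes width > 1 and height > 1 so the step sizes are well defined.
    """
    col = idx % width
    row = idx // width
    x = x_min + col * (x_max - x_min) / float(width - 1)
    y = y_min + row * (y_max - y_min) / float(height - 1)
    return x, y


# Only the four extents and the image size are needed, not two full arrays.
width, height = 4, 3
for idx in range(width * height):
    print(idx, pixel_to_coord(idx, width, height, -2.0, 1.0, -1.0, 1.0))
```

With this approach the host only has to pass the extents and the shape, so the x and y arrays never need to be allocated on the device at all.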
The error in the second case looks pretty straightforward too.
Firstly, you need to respect the max block size of your device (I don't know
what that is, but one of the example scripts will tell you).
Secondly, the block size multiplied by the grid size needs to equal the pixel
count. There is a particular page I cite in the blog entry, but the diagram
within the actual CUDA manual will make this idea a bit clearer, I hope.
So if you change the shape to 1000x1000, and the line:
mb( drv.In(x), drv.In(y), drv.InOut(itr), block=(500,1,1), grid = (50000,1))
To:
mb( drv.In(x), drv.In(y), drv.InOut(itr), block=(100,1,1), grid = (10000,1))
Things ought to roll on.
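
The two rules above can be sanity-checked before launching with something like the sketch below. `check_launch` is a hypothetical helper, and the default limits (512 threads per block, 65535 per grid dimension) are assumptions based on what I believe applies to cards of this generation; query your own device rather than trusting them.

```python
def check_launch(shape, block, grid,
                 max_threads_per_block=512, max_grid_dim=65535):
    """Return a list of problems with a launch configuration (empty if OK)."""
    pixels = shape[0] * shape[1]
    threads_per_block = block[0] * block[1] * block[2]
    total_threads = threads_per_block * grid[0] * grid[1]

    problems = []
    if threads_per_block > max_threads_per_block:
        problems.append("block has %d threads, limit is %d"
                        % (threads_per_block, max_threads_per_block))
    if max(grid) > max_grid_dim:
        problems.append("grid dimension %d exceeds limit %d"
                        % (max(grid), max_grid_dim))
    if total_threads != pixels:
        problems.append("%d threads launched for %d pixels"
                        % (total_threads, pixels))
    return problems


# The failing combination: 1000x1000 shape with the original launch sizes.
print(check_launch((1000, 1000), (500, 1, 1), (50000, 1)))
# The suggested fix.
print(check_launch((1000, 1000), (100, 1, 1), (10000, 1)))
```

In the failing case the launch asks for 25,000,000 threads against arrays of 1,000,000 elements, so (assuming the kernel does no bounds check) most threads would be indexing past the end of the arrays, which is consistent with the "launch failed" error.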
Let me know if that works for you!
-Matt
On Mon, Jan 5, 2009 at 9:46 AM, Randy Heiland <[email protected]> wrote:
> When I try to run this example, I get an error and I was hoping folks might
> help me understand why:
> running your script as is (you can disregard the deprecation warning as I'm
> using Python 2.6):
>
> $ python mset.py
> /Users/heiland/dev/Python-2.6.1/framework/Python.framework/Versions/2.6/lib/python2.6/site-packages/pycuda-0.91.1-py2.6-macosx-10.3-i386.egg/pycuda/driver.py:421:
> DeprecationWarning: the md5 module is deprecated; use hashlib instead
> import md5
> Traceback (most recent call last):
> File "mset.py", line 63, in <module>
> mb( drv.In(x), drv.In(y), drv.InOut(itr), block=(500,1,1), grid =
> (50000,1))
> File
> "/Users/heiland/dev/Python-2.6.1/framework/Python.framework/Versions/2.6/lib/python2.6/site-packages/pycuda-0.91.1-py2.6-macosx-10.3-i386.egg/pycuda/driver.py",
> line 108, in function_call
> handlers = func.param_set(*args)
> File
> "/Users/heiland/dev/Python-2.6.1/framework/Python.framework/Versions/2.6/lib/python2.6/site-packages/pycuda-0.91.1-py2.6-macosx-10.3-i386.egg/pycuda/driver.py",
> line 69, in function_param_set
> arg_data.append(int(arg.get_device_alloc()))
> File
> "/Users/heiland/dev/Python-2.6.1/framework/Python.framework/Versions/2.6/lib/python2.6/site-packages/pycuda-0.91.1-py2.6-macosx-10.3-i386.egg/pycuda/driver.py",
> line 21, in get_device_alloc
> self.dev_alloc = mem_alloc_like(self.array)
> File
> "/Users/heiland/dev/Python-2.6.1/framework/Python.framework/Versions/2.6/lib/python2.6/site-packages/pycuda-0.91.1-py2.6-macosx-10.3-i386.egg/pycuda/driver.py",
> line 264, in mem_alloc_like
> return mem_alloc(ary.nbytes)
> pycuda._driver.MemoryError: cuMemAlloc failed: out of memory
>
>
> then I tried changing 'shape' to be 1000x1000 and get:
> mb( drv.In(x), drv.In(y), drv.InOut(itr), block=(500,1,1), grid =
> (50000,1))
> File
> "/Users/heiland/dev/Python-2.6.1/framework/Python.framework/Versions/2.6/lib/python2.6/site-packages/pycuda-0.91.1-py2.6-macosx-10.3-i386.egg/pycuda/driver.py",
> line 130, in function_call
> Context.synchronize()
> pycuda._driver.LaunchError: cuCtxSynchronize failed: launch failed
> terminate called after throwing an instance of 'cuda::error'
> what(): cuMemFree failed: launch failed
> Abort trap
>
> -Randy
>
> On Jan 4, 2009, at 12:36 AM, Matt G wrote:
>
> So, I put together a Mandelbrot demo if anyone is interested. I put it (and
> a big description of how it all works) up on one of the blogs I maintain. I
> realize there are better ways to accomplish some of the things in the demo
> code, but I was hoping you guys might look over it and tell me I have the
> thinking half correct.
>
> http://scipyed.blogspot.com/
>
> Next up CFD . . . ugh.
> Thanks!
>
> -Matt
_______________________________________________
PyCuda mailing list
[email protected]
http://tiker.net/mailman/listinfo/pycuda_tiker.net