Two more quick points... If I let the code keep running on the ION2 system I get this:
<http://www.spvi.com/files/bccd-out-9.txt>

And... if I set the environment variable to show compiler output on the ION2 system, I see this:

<http://www.spvi.com/files/bccd-compiler-output-9.txt>

I'm struggling to interpret what all of that means. ;-) Any hints appreciated.

BTW... is there a 'release' method I need to call to free memory when using pyopencl? Do I need to create my context/queue only once and pass it around to be reused all the time? (I've pasted a few rough sketches of what I mean at the end of this message.)

thanks,
-steve

On Jan 27, 2012, at 6:24 AM, Steve Spicklemire <[email protected]> wrote:

> Hi Folks,
>
> More on this saga. ;-)
>
> Short story... I *think* I'm having memory management trouble, but I'm not
> sure how, or how to track it down.
>
> I've changed my code a fair amount after getting a bit more educated WRT GPU
> programming.
>
> I've got two systems I'm testing on: my laptop (15" MacBook Pro, NVIDIA
> GeForce GT 330M, 512 MB) and a baby cluster I've built using BCCD (6x Debian
> Intel Atom ITX boards with ION2 graphics built in).
>
> The laptop is more portable. ;-)
>
> I decided to try to use ranluxcl directly inside a custom kernel rather than
> the cl.rand module (but I read the source and tried to use that as an example
> of its use).
>
> I'm still using the ReductionKernel class to get the final result.
>
> Here's the code I'm running on the mac:
>
> <http://www.spvi.com/files/compute_pi_9.py>
>
> And here are the results...
>
> <http://www.spvi.com/files/mac-out-9.txt>
>
> It runs to completion... but notice that the 'random' numbers aren't behaving
> randomly! I thought the period of ranlux was very large, so I'm puzzled.
>
> Next... when I run this code:
>
> <http://www.spvi.com/files/bccd-compute_pi_9.py>
>
> on one of the cluster nodes, I get this:
>
> <http://www.spvi.com/files/bccd-out-9.txt>
>
> Wacky! Same code (more or less... just startup is different).
>
> If I let it keep running it will eventually say "Host memory exhausted" or
> some such. By "host" I'm assuming it means the CPU, not the GPU, right? Very
> little host memory is involved, I think... it's almost entirely on the GPU.
> But anyway, doesn't memory get freed when the function exits and the local
> Python variables go out of scope? Mysterious!
>
> I'm pretty sure I'm still missing some basic rule/concept about pyopencl...
> any feedback appreciated!
>
> thanks,
> -steve
>
> On Jan 20, 2012, at 10:55 AM, Andreas Kloeckner wrote:
>
>>>
>>> I guess I was hoping for a significant speedup going to a GPU
>>> approach. (Note I'm naturally uninterested in the actual value of pi!
>>> I'm just trying to figure out how to get results out of a GPU. I'm
>>> building a small cluster with 6 baby GPUs and I'd like to get smart
>>> about making use of the resource.)
>>>
>>> I'm also a little worried about the warning I'm getting about "can't
>>> query SIMD group size". Looking at the source, it appears the platform
>>> is returning "Apple" as a vendor, and that case is not treated in the
>>> code that checks, so it just returns None. When I run
>>> 'dump_properties' I see that the max group size is pretty big!
>>>
>>> <http://spvi.com/files/pyopencl-mc-lux0-time>
>>>
>>> Anyway... I'll try your idea of using enqueue_marker to try to track
>>> down what's really taking the time. (I guess 60% of it *was*
>>> generating excessively luxurious random numbers!) But I still feel I
>>> should be able to beat the CPU by quite a lot.
>>
>> Set
>>
>> export COMPUTE_PROFILE=1
>>
>> and rerun your code. The driver will have written a profiler log file
>> that breaks down what's using time on the GPU.
>> (This might not be true on Apple CL if you're on a MacBook; not sure if
>> that provides an equivalent facility. If you find out, please report back
>> to the list.)
>>
>> Next, take into account that a GT330M lags by a factor of ~6-7 compared to
>> a 'real' discrete GPU, firstly in memory bandwidth (GT330M: 25 GB/s, good
>> discrete chip: ~180 GB/s), and, less critically, in processing power. Also
>> consider that your CPU can probably get to ~10 GB/s memory bandwidth if
>> used well.
>>
>> HTH,
>> Andreas
>
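P.S. Here's the kind of thing I mean by creating the context/queue once and releasing buffers explicitly. This is only a stripped-down sketch, not my actual code; run_batch is just a made-up placeholder:

import numpy as np
import pyopencl as cl

# Create the context and queue once, at the top level, and pass them
# into every function that needs them (rather than recreating them on
# every call).
ctx = cl.create_some_context()
queue = cl.CommandQueue(ctx)

def run_batch(ctx, queue, n):
    # Placeholder for one batch of work; not my real compute_pi code.
    host_data = np.random.rand(n).astype(np.float32)
    buf = cl.Buffer(ctx,
                    cl.mem_flags.READ_ONLY | cl.mem_flags.COPY_HOST_PTR,
                    hostbuf=host_data)
    # ... enqueue kernels that use buf here ...
    # The device allocation is freed when the Buffer object is garbage
    # collected, but calling release() frees it deterministically, which
    # matters if you allocate inside a loop.
    buf.release()
    return host_data.sum()

for i in range(10):
    print(run_batch(ctx, queue, 1000000))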
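Also, for reference, the ReductionKernel approach I described in the quoted message above looks roughly like this if you use pyopencl.clrandom instead of ranluxcl to keep the sketch short. This is illustrative only, not the code at the compute_pi_9.py link:

import numpy as np
import pyopencl as cl
import pyopencl.clrandom as clrandom
from pyopencl.reduction import ReductionKernel

ctx = cl.create_some_context()
queue = cl.CommandQueue(ctx)

n = 1000000

# Count how many random (x, y) points land inside the unit quarter circle.
count_inside = ReductionKernel(
    ctx, np.int32,
    neutral="0",
    reduce_expr="a+b",
    map_expr="(x[i]*x[i] + y[i]*y[i]) <= 1.0f ? 1 : 0",
    arguments="__global const float *x, __global const float *y")

x = clrandom.rand(queue, n, np.float32)
y = clrandom.rand(queue, n, np.float32)

hits = count_inside(x, y).get()
print("pi is approximately %f" % (4.0 * hits / n))

# Dropping the references (or calling x.data.release()) frees the device
# memory; doing that explicitly inside a loop keeps allocations from
# piling up between iterations.
del x, y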

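Finally, in case COMPUTE_PROFILE turns out not to exist under Apple's CL, I gather OpenCL event profiling through pyopencl can give similar per-kernel timings. Again, just a sketch under that assumption; the double_it kernel is a made-up example:

import numpy as np
import pyopencl as cl

ctx = cl.create_some_context()
# Profiling has to be enabled when the queue is created.
queue = cl.CommandQueue(
    ctx, properties=cl.command_queue_properties.PROFILING_ENABLE)

prg = cl.Program(ctx, """
__kernel void double_it(__global float *a)
{
    int i = get_global_id(0);
    a[i] = 2.0f * a[i];
}
""").build()

a = np.arange(1000000, dtype=np.float32)
buf = cl.Buffer(ctx, cl.mem_flags.READ_WRITE | cl.mem_flags.COPY_HOST_PTR,
                hostbuf=a)

evt = prg.double_it(queue, a.shape, None, buf)
evt.wait()

# profile.start and profile.end are device timestamps in nanoseconds.
print("kernel took %.3f ms" % ((evt.profile.end - evt.profile.start) * 1e-6))
buf.release()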