Dear Steve,

On Thu, 19 Jan 2012 15:18:55 -0700, Steve Spicklemire <[email protected]> wrote:
> First, thanks much for your reply. I tried the luxury dial.. (set it
> to zero) and got a factor of 3 speedup! So that's encouraging. My
> comparison is a similar approach with weave.inline, not threaded, all
> [...]
> CPU giving me 10**8 x,y pairs and computing pi in more like 2.8
> seconds wall time.
>
> <http://spvi.com/files/weave-monte-carlo>
>
> <http://spvi.com/files/weave-mc-time>
>
> I guess I was hoping for a significant speedup going to a GPU
> approach. (note I'm naturally uninterested in the actual value of pi!
> I'm just trying to figure out how to get results out of a GPU. I'm
> building a small cluster with 6 baby GPUs and I'd like to get smart
> about making use of the resource)
>
> I'm also a little worried about the warning I'm getting about "can't
> query SIMD group size". Looking at the source it appears the platform
> is returning "Apple" as a vendor, and that case is not treated in the
> code that checks.. so it just returns None. When I run
> 'dump_properties' I see that the max group size is pretty big!
>
> <http://spvi.com/files/pyopencl-mc-lux0-time>
>
> Anyway.. I'll try your idea of using enqueue_marker to try to track
> down what's really taking the time. (I guess 60% of it *was*
> generating excessively luxurious random numbers!) But I still feel I
> should be able to beat the CPU by quite a lot.

Run "export COMPUTE_PROFILE=1" and rerun your code. Afterwards, the driver will have written a profiler log file that breaks down what's using time on the GPU. (This might not work on Apple CL if you're on a MacBook; I'm not sure whether that provides an equivalent facility. If you find out, please report back to the list.)

Next, take into account that a GT330M lags by a factor of ~6-7 compared to a 'real' discrete GPU, firstly in memory bandwidth (GT330M: ~25 GB/s, good discrete chip: ~180 GB/s), and, less critically, in processing power. Also consider that your CPU can probably get to ~10 GB/s memory bandwidth if used well.

HTH,
Andreas
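
P.S. To put rough numbers on the bandwidth comparison, here is a minimal sketch that only computes ratios from the approximate figures quoted above (25, 180 and 10, all in the same unit, so the ratios do not depend on the unit). The variable and function names are made up for illustration. The ratios would bound the achievable speedup only if the kernel were limited by memory bandwidth, which is an assumption that the profiler output would have to confirm.

```python
# Approximate peak memory bandwidths quoted in the message above.
# All three use the same unit, so only their ratios matter here.
GT330M_BW = 25.0         # mobile GPU
DISCRETE_GPU_BW = 180.0  # "good discrete chip"
CPU_BW = 10.0            # CPU, "if used well"


def bandwidth_ratio(fast, slow):
    """Return how many times more bandwidth `fast` has than `slow`."""
    return fast / slow


# How far the mobile GPU lags behind a discrete GPU.
discrete_vs_mobile = bandwidth_ratio(DISCRETE_GPU_BW, GT330M_BW)

# Best-case advantage of the mobile GPU over the CPU, assuming
# (hypothetically) that the kernel is memory-bandwidth-bound.
mobile_vs_cpu = bandwidth_ratio(GT330M_BW, CPU_BW)

print(f"discrete GPU vs GT330M: {discrete_vs_mobile:.1f}x")
print(f"GT330M vs well-used CPU: {mobile_vs_cpu:.1f}x")
```

So under that assumption the GT330M would have only a modest edge over a well-used CPU, while a discrete card would have a much larger one.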
_______________________________________________
PyOpenCL mailing list
[email protected]
http://lists.tiker.net/listinfo/pyopencl
