On Tue, 2012-01-17 at 10:01 -0500, Andreas Kloeckner wrote:
> On Tue, 17 Jan 2012 15:05:00 +0100, Tomasz Rybak wrote:
> > On Mon, 2012-01-16 at 20:58 -0500, Andreas Kloeckner wrote:
> > > Hi Tomasz,
> > >
> > > >
> > > > I think I found it.
> > > > Like the CUDA reduction bug (related to Fermi), it again seems
> > > > to be related to too eager concurrency when reducing results.
> > > > According to http://oscarbg.blogspot.com/2009/10/news-from-web.html
> > > > "Actually the wavefront size is only 64 for the highend cards(48XX,
> > > > 58XX, 57XX), but 32 for the middleend cards and 16 for the lowend
> > > > cards."
> > > > IMO we should use PREFERRED_WORK_GROUP_SIZE_MULTIPLE to get
> > > > non_sync_size. At the same time, we lose the SIMD CPU optimisation,
> > > > and for now I do not know how to fix both at once.
> > > > The attached patch fixes the problem on Loveland without breaking
> > > > anything on NVIDIA ION.
> > > >
> > > > Investigating this, I found another problem with the reasonable_work_*
> > > > functions. First, dev.warp_size_nv was raising LogicError (not
> > > > AttributeError), so I changed it to be the same as in
> > > > get_simd_group_size. Second, there was a problem with getting attributes
> > > > from a compiled but not built kernel. I had to add prg.build() and
> > > > __kernel and __global; without those I was getting a SEGFAULT
> > > > from the AMD OpenCL libraries.
> > >
> > > Thank you very much for investigating this, and for your fixes. I've
> > > changed your fix slightly, in that get_simd_group() now *uses*
> > > reasonable_work_group_size_multiple to find its best guess at the AMD
> > > GPU wavefront size.
> > >
> > > I'd much appreciate it if you could check the current code and report
> > > back. We can then debate what to do about PyOpenCL 2012.1 (yes, it'll be
> > > that).
> >
> > Code works OK on both Loveland and ION (all tests except image on CPU
> > pass).
> > I had to add pyopencl.characterize to setup.py (patch attached)
> > for the package to install characterize on Debian after your changes,
> > though.
>
> Good catch, thanks. Applied. Now there are two options: release as-is,
> or add a bit more 'scan magic'. By that I mean a) segmented scan and b)
> all those little scan-based magic tricks that Thrust can do: copy_if,
> unique_by_key, etc. Given that we have a working scan, those aren't hard
> to add. It would take about a week, I guess. I'll leave the choice up to
> you.
>
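The reduction failure traced in the quoted thread can be illustrated with a small, hypothetical pure-Python model. This is not PyOpenCL's reduction kernel; the function name `unsynced_tail_reduce` and the worst-case schedule are assumptions made for illustration. The model runs the barrier-free tail of a work-group tree reduction: lanes `0..non_sync_size-1` keep adding without barriers, and only lanes inside the same wavefront are assumed to move in lockstep. When the wavefront is narrower than `non_sync_size` (e.g. 16 versus 32), one wavefront can read partial sums that the other has not written yet.

```python
def unsynced_tail_reduce(values, non_sync_size, wavefront_size):
    """Hypothetical model of the barrier-free tail of a tree reduction.

    values holds 2 * non_sync_size partial sums in "local memory".
    Lanes 0..non_sync_size-1 execute every step without a barrier, so
    only lanes in the same wavefront are in lockstep. As a worst-case
    schedule (an assumption), each wavefront runs all of its steps
    before the next wavefront starts.
    """
    if len(values) != 2 * non_sync_size:
        raise ValueError("expected 2 * non_sync_size partial sums")
    s = list(values)

    # Offsets non_sync_size, non_sync_size/2, ..., 1 (power of two assumed).
    offsets = []
    off = non_sync_size
    while off >= 1:
        offsets.append(off)
        off //= 2

    for first in range(0, non_sync_size, wavefront_size):
        lanes = range(first, min(first + wavefront_size, non_sync_size))
        for off in offsets:
            # Lockstep inside one wavefront: all lanes read, then all write.
            sums = [s[lid] + s[lid + off] for lid in lanes]
            for lid, total in zip(lanes, sums):
                s[lid] = total
    return s[0]


data = [1] * 64
print(unsynced_tail_reduce(data, 32, 64))  # 64: one wavefront covers all lanes
print(unsynced_tail_reduce(data, 32, 32))  # 64: wavefront == non_sync_size
print(unsynced_tail_reduce(data, 32, 16))  # 48: wavefront too narrow, stale reads
```

Under this model, the tail is only safe when `non_sync_size` does not exceed the device's wavefront, which is why the thread settles on PREFERRED_WORK_GROUP_SIZE_MULTIPLE as a best guess at the wavefront size.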
I am not in a hurry, and Debian will not freeze for some time, so in my opinion we can wait for 2012.1. Is there some description of the planned scan improvements? I would like to help.

Regards.
-- 
Tomasz Rybak  GPG/PGP key ID: 2AD5 9860
Fingerprint A481 824E 7DD3 9C0E C40A  488E C654 FB33 2AD5 9860
http://member.acm.org/~tomaszrybak
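As a starting point for the "scan magic" discussed above, here is a sequential Python reference model of the scan-based primitives. The helper names mirror Thrust's copy_if and unique_by_key, but these functions and their signatures are hypothetical and are not a PyOpenCL API. Stream compaction uses an exclusive scan of 0/1 flags to give each kept element its output slot. A segmented scan is an ordinary inclusive scan over (segment-start flag, value) pairs with a lifted operator; that operator is associative, so on the device it could reuse an existing parallel scan.

```python
import operator
from itertools import accumulate


def exclusive_scan(xs, op=operator.add, identity=0):
    """out[i] = identity op xs[0] op ... op xs[i-1] (sequential stand-in)."""
    out, acc = [], identity
    for x in xs:
        out.append(acc)
        acc = op(acc, x)
    return out


def compact(items, flags):
    """Scatter items whose flag is 1 to slots given by an exclusive scan."""
    slots = exclusive_scan(flags)
    count = slots[-1] + flags[-1] if flags else 0
    out = [None] * count
    # On a GPU, each work item would perform one of these scatters independently.
    for item, flag, dst in zip(items, flags, slots):
        if flag:
            out[dst] = item
    return out


def copy_if(xs, pred):
    return compact(xs, [1 if pred(x) else 0 for x in xs])


def unique_by_key(keys, values):
    """Keep the first (key, value) of each run of equal consecutive keys."""
    flags = [1 if i == 0 or keys[i] != keys[i - 1] else 0
             for i in range(len(keys))]
    return compact(keys, flags), compact(values, flags)


def segmented_inclusive_scan(xs, starts, op=operator.add):
    """Inclusive scan that restarts wherever starts[i] is True."""
    def combine(a, b):
        a_flag, a_val = a
        b_flag, b_val = b
        # A segment start on the right discards everything to its left.
        return (a_flag or b_flag, b_val if b_flag else op(a_val, b_val))
    return [val for _, val in accumulate(zip(starts, xs), combine)]


print(copy_if([5, -1, 3, -7, 0], lambda x: x > 0))             # [5, 3]
print(unique_by_key([1, 1, 2, 2, 2, 3, 1], list("abcdefg")))   # ([1, 2, 3, 1], ['a', 'c', 'f', 'g'])
print(segmented_inclusive_scan([1, 2, 3, 4, 5],
                               [True, False, True, False, False]))  # [1, 3, 3, 7, 12]
```

The point of the sketch is structural: once a correct scan exists, each of these primitives is just a flag computation, a scan, and an independent scatter or a change of operator.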
_______________________________________________
PyOpenCL mailing list
http://lists.tiker.net/listinfo/pyopencl
