On Tue, 14 Feb 2012 18:36:16 +0100, Holger Rapp <[email protected]> wrote:
> >> >> 2) I pass the params_buf as __constant to my kernel. I have some
> >> >> functions doing arithmetic with DualQuaternions and I have to first
> >> >> copy all data from my structure before working with them: e.g.
> >> >>
> >> >>    void conjugate(const DualQuaternion * a, DualQuaternion * rv);
> >> >>
> >> >>    DualQuaternion rv;
> >> >>    conjugate(&measurement->w2c, &rv);
> >> >> Gives this error:
> >> >>    passing 'DualQuaternion __attribute__((address_space(2)))const *'
> >> >>    discards qualifiers, expected 'DualQuaternion const *'
> >> >>
> >> >>    DualQuaternion temp = measurement->w2c;
> >> >>    conjugate(&temp, &rv);
> >> >> is working okay.
> >> >>
> >> >> I understand the reason for this I think: functions need to work in
> >> >> one address space only. But is there a way to pass my structures to
> >> >> my kernel so that the explicit copy is not needed?
> >> >
> >> >My advice would be to pass the arguments to conjugate() by value and use
> >> >a return value. This avoids issues of address space matching
> >> >(i.e. declaring __constant args in conjugate()), and any half-way smart
> >> >compiler will generate equivalent code anyway.
> >>
> >> My profiling shows that this is not the case. Passing const
> >> DualQuaternion* is roughly 20% faster than passing const
> >> DualQuaternion. Maybe I need to activate optimization or so? Would
> >> that be cl.Program.build(["-O2"])?
> >Good question. If you're on Nv, maybe start by looking at the
> >PTX. (prg.binaries[0])
> I guess this is beyond my capabilities. I am actually on NVIDIA but
> using Apple's OpenCL. I will just turn optimization on and keep
> monitoring my performance.
>
> I already ran into a new problem. The following program fails for me on
> one box (Linux 64 bit) but not the other (Apple). I'd like to know why
> it fails on the Linux box. Below is the sample program, the output when
> it is run and the properties of the Linux card. I see no apparent reason
> why the image does not work - it only has ~30 MB. Can other programs
> influence the amount of memory available on the card?

No problem on Linux with

VERSION: OpenCL 1.1 CUDA 4.1.1
<pyopencl.Device 'GeForce GTX 260' on 'NVIDIA CUDA' at 0x2b8e4a0>

I figure that might be an issue with the implementation/version you're
using.

Andreas
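
For reference, a minimal sketch of the by-value suggestion quoted above.
Several things here are assumptions, since the thread does not show them:
the DualQuaternion layout (two float4s with the scalar part in .w), the
Measurement struct (only its w2c member appears in the thread), the
kernel name demo, the helper build_program, and the choice of conjugate
(negating the vector part of both quaternions, which is only one of
several dual quaternion conjugates). Because conjugate() takes and
returns its argument by value, the call site reads straight from
__constant memory and no address space has to match.

```python
# OpenCL C source kept as a Python string; nothing here touches the GPU
# until build_program() is called.
KERNEL_SRC = r"""
typedef struct {
    float4 real;   /* assumed layout: (x, y, z, w), scalar part in w */
    float4 dual;
} DualQuaternion;

typedef struct {
    DualQuaternion w2c;   /* hypothetical: only w2c is named in the thread */
} Measurement;

/* By-value argument and return value: no address space qualifier
   appears in the signature. */
DualQuaternion conjugate(DualQuaternion a)
{
    DualQuaternion rv;
    rv.real = (float4)(-a.real.x, -a.real.y, -a.real.z, a.real.w);
    rv.dual = (float4)(-a.dual.x, -a.dual.y, -a.dual.z, a.dual.w);
    return rv;
}

__kernel void demo(__constant Measurement *measurement,
                   __global DualQuaternion *out)
{
    /* Passing measurement->w2c by value copies it out of __constant. */
    out[get_global_id(0)] = conjugate(measurement->w2c);
}
"""

# The OpenCL specification enables optimizations by default;
# "-cl-opt-disable" is the standard option that switches them off.
BUILD_OPTIONS = []


def build_program(ctx):
    """Build KERNEL_SRC for an existing pyopencl.Context."""
    import pyopencl as cl  # imported lazily so this file loads without PyOpenCL
    return cl.Program(ctx, KERNEL_SRC).build(options=BUILD_OPTIONS)
```

With a context at hand (e.g. from cl.create_some_context()),
prg = build_program(ctx) gives a program whose prg.binaries[0] can then
be inspected as mentioned above. Whether the by-value and pointer
versions compile to the same code depends on the implementation, so
comparing the two builds that way is the safest check.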
_______________________________________________
PyOpenCL mailing list
[email protected]
http://lists.tiker.net/listinfo/pyopencl
