On Tue, Feb 14, 2012 at 07:18:24PM -0500, Andreas Kloeckner wrote:
Thank you. Burned 6 hours on this yesterday. Time to dig in for a new card I guess.

On Tue, 14 Feb 2012 18:36:16 +0100, Holger Rapp <[email protected]> wrote:
>> >> 2) I pass the params_buf as __constant to my kernel. I have some
>> >> functions doing arithmetic with DualQuaternions and I have to first copy
>> >> all data from my structure before working with them: e.g.
>> >>
>> >> void conjugate(const DualQuaternion * a, DualQuaternion * rv);
>> >>
>> >> DualQuaternion rv;
>> >> conjugate(&measurement->w2c, &rv);
>> >> Gives this error:
>> >> passing 'DualQuaternion __attribute__((address_space(2)))const *' discards qualifiers, expected 'DualQuaternion const *'
>> >>
>> >> DualQuaternion temp = measurement->w2c;
>> >> conjugate(&temp, &rv);
>> >> is working okay.
>> >>
>> >> I understand the reason for this I think: functions need to work in one
>> >> address space only. But is there a way to pass my structures to my kernel
>> >> so that the explicit copy is not needed?
>> >
>> >My advice would be to pass the arguments to conjugate() by value and use
>> >a return value. This avoids issues of address space matching
>> >(i.e. declaring __constant args in conjugate()), and any half-way smart
>> >compiler will generate equivalent code anyway.
>>
>> My profiling shows that this is not the case. Passing const
>> DualQuaternion* is roughly 20% faster than passing const
>> DualQuaternion. Maybe I need to activate optimization or so? Would
>> that be cl.Program.build(["-O2"])?
>Good question. If you're on Nv, maybe start by looking at the
>PTX. (prg.binaries[0])

I guess this is beyond my capabilities. I am actually on NVIDIA hardware but using Apple's OpenCL. I will just turn optimization on and keep monitoring my performance.

I already ran into a new problem. The following program fails for me on one box (Linux 64 bit) but not the other (Apple). I'd like to know why it fails on the Linux box. Below is the sample program, the output when it is run, and the properties of the Linux card.
I see no apparent reason why the image does not work - it only has ~30 MB. Can other programs influence the amount of memory available on the card?

No problem on Linux with
VERSION: OpenCL 1.1 CUDA 4.1.1
<pyopencl.Device 'GeForce GTX 260' on 'NVIDIA CUDA' at 0x2b8e4a0>

I figure that might be an issue with the implementation/version you're using.
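
On the memory question above, one thing that may help narrow it down is comparing the limits each device reports. The sketch below is only an illustration: the helper names (describe_limits, image_fits, print_all_limits) are made up for this note, and the check is rough. The device attributes it reads are the standard PyOpenCL device info properties. As far as I know, core OpenCL 1.1 only reports these static limits and has no standard query for memory that is currently free, so memory held by other programs on the same card would not show up here.

```python
def describe_limits(dev):
    """Collect the reported limits that bound a single 2D image allocation.

    `dev` is expected to be a pyopencl.Device (or anything with the same
    attribute names).
    """
    return {
        "name": dev.name,
        "global_mem_bytes": dev.global_mem_size,
        "max_single_alloc_bytes": dev.max_mem_alloc_size,
        "image_support": bool(dev.image_support),
        "max_image2d": (dev.image2d_max_width, dev.image2d_max_height),
    }


def image_fits(limits, width, height, bytes_per_pixel):
    """Rough check (an assumption, not a guarantee): the image stays within
    the reported 2D dimensions and the single-allocation limit."""
    w_max, h_max = limits["max_image2d"]
    nbytes = width * height * bytes_per_pixel
    return (
        limits["image_support"]
        and width <= w_max
        and height <= h_max
        and nbytes <= limits["max_single_alloc_bytes"]
    )


def print_all_limits():
    """Print the limits of every device on every platform."""
    import pyopencl as cl  # imported lazily so the helpers above need no device

    for platform in cl.get_platforms():
        for dev in platform.get_devices():
            print(describe_limits(dev))
```

Running print_all_limits() on both boxes and comparing the output against the image dimensions would at least show whether a static limit is being exceeded on one of them.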
Cheers, Holger
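
For reference, here is a minimal sketch of the by-value suggestion quoted above, together with a way to get at the compiled binary via prg.binaries. Several things in it are assumptions rather than anything from the original code: the DualQuaternion layout (two float4 with w as the scalar part), the choice of conjugate (negating the vector part of both halves, which is only one of several conventions), and the names conj_all, conjugate_ref and build_variants. On build flags: the OpenCL 1.1 specification lists -cl-opt-disable and says optimizations are on by default; as far as I can tell "-O2" is not among the options it lists, so whether an implementation accepts it is implementation-specific.

```python
KERNEL_SRC = r"""
/* Assumed layout: xyz = vector part, w = scalar part. */
typedef struct { float4 real; float4 dual; } DualQuaternion;

/* By-value version: no pointer arguments, so no address space to match. */
DualQuaternion conjugate(DualQuaternion a)
{
    DualQuaternion rv;
    rv.real = (float4)(-a.real.x, -a.real.y, -a.real.z, a.real.w);
    rv.dual = (float4)(-a.dual.x, -a.dual.y, -a.dual.z, a.dual.w);
    return rv;
}

__kernel void conj_all(__constant DualQuaternion *in,
                       __global DualQuaternion *out)
{
    size_t i = get_global_id(0);
    out[i] = conjugate(in[i]);  /* copies the struct out of __constant */
}
"""


def conjugate_ref(q):
    """Host-side reference for the same (assumed) conjugate convention.

    `q` is ((x, y, z, w), (x, y, z, w)) for the real and dual parts.
    """
    return tuple((-x, -y, -z, w) for (x, y, z, w) in q)


def build_variants(ctx, src=KERNEL_SRC):
    """Build the program with default options and with optimizations disabled.

    Returns (default_program, unoptimized_program); timing both gives a
    baseline for how much the compiler's optimizer is contributing.
    """
    import pyopencl as cl  # imported lazily; needs an OpenCL implementation

    default = cl.Program(ctx, src).build()
    unoptimized = cl.Program(ctx, src).build(options=["-cl-opt-disable"])
    return default, unoptimized


def first_binary(prg):
    """Return the binary for the first device. On NVIDIA's implementation
    this is the PTX text; other implementations may return something else."""
    return prg.binaries[0]
```

With a context from cl.create_some_context(), printing first_binary(...) for the by-value and the pointer version side by side is one way to see whether the compiler really generates different code for the two.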
_______________________________________________
PyOpenCL mailing list
[email protected]
http://lists.tiker.net/listinfo/pyopencl
