On Tue, 14 Feb 2012 18:36:16 +0100, Holger Rapp <[email protected]> wrote:
> >> >> 2) I pass the params_buf as __constant to my kernel. I have some
> >> >> functions doing arithmetic with DualQuaternions and I have to first copy
> >> >> all data from my structure before working with them: e.g.
> >> >>
> >> >> void conjugate(const DualQuaternion * a, DualQuaternion * rv);
> >> >>
> >> >> DualQuaternion rv;
> >> >> conjugate(&measurement->w2c, &rv);
> >> >> Gives this error:
> >> >> passing 'DualQuaternion __attribute__((address_space(2)))const *' 
> >> >> discards qualifiers, expected 'DualQuaternion const *'
> >> >>
> >> >> DualQuaternion temp = measurement->w2c;
> >> >> conjugate(&temp, &rv);
> >> >> is working okay.
> >> >>
> >> >> I understand the reason for this I think: functions need to work in one
> >> >> address space only. But is there a way to pass my structures to my
> >> >> kernel so that the explicit copy is not needed?
> >> >
> >> >My advice would be to pass the arguments to conjugate() by value and use
> >> >a return value. This avoids issues of address space matching
> >> >(i.e. declaring __constant args in conjugate()), and any half-way smart
> >> >compiler will generate equivalent code anyway.
> >>
> >> My profiling shows that this is not the case. Passing const
> >> DualQuaternion* is roughly 20% faster than passing const
> >> DualQuaternion. Maybe I need to activate optimization or so? Would
> >> that be cl.Program.build(["-O2"])?
> >Good question. If you're on Nv, maybe start by looking at the
> >PTX. (prg.binaries[0])
> I guess this is beyond my capabilities. I am actually on NVIDIA but
> using Apple's OpenCL. I will just turn optimization on and keep
> monitoring my performance.
> 
> I already ran into a new problem. The following program fails for me on 
> one box (Linux 64 bit) but not the other (Apple). I'd like to know why 
> it fails on the Linux box. Below is the sample program, the output when 
> it is run and the properties of the Linux card. I see no apparent reason
> why the image does not work - it is only ~30 MB. Can other programs
> influence the amount of memory available on the card?

No problem on Linux with

VERSION: OpenCL 1.1 CUDA 4.1.1
<pyopencl.Device 'GeForce GTX 260' on 'NVIDIA CUDA' at 0x2b8e4a0>

I figure that might be an issue with the implementation/version you're using.
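
On the earlier conjugate() question quoted above, here is a minimal sketch of
the by-value variant, written as plain C so it can be tried on the host. The
struct layout is an assumption (the thread never shows the real
DualQuaternion), and "conjugate" here simply negates the vector part of both
halves, which may not be the convention your code uses.

```c
/* Hypothetical layout; the real DualQuaternion is not shown in this thread. */
typedef struct { float w, x, y, z; } Quaternion;
typedef struct { Quaternion real, dual; } DualQuaternion;

/* Negate the vector part of a single quaternion. */
Quaternion quat_conjugate(Quaternion q)
{
    Quaternion r = { q.w, -q.x, -q.y, -q.z };
    return r;
}

/* Argument and result are passed by value: there is no pointer parameter,
 * so there is no address space qualifier for the caller to match. */
DualQuaternion conjugate(DualQuaternion a)
{
    DualQuaternion r;
    r.real = quat_conjugate(a.real);
    r.dual = quat_conjugate(a.dual);
    return r;
}
```

In the kernel, the call site would then read
"DualQuaternion rv = conjugate(measurement->w2c);", with no explicit
temporary. Whether this is as fast as the pointer version depends on the
compiler, as the timings reported above suggest.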

Andreas


_______________________________________________
PyOpenCL mailing list
[email protected]
http://lists.tiker.net/listinfo/pyopencl