On Tue, Feb 14, 2012 at 07:18:24PM -0500, Andreas Kloeckner wrote:
On Tue, 14 Feb 2012 18:36:16 +0100, Holger Rapp <[email protected]> wrote:
>> >> 2) I pass the params_buf as __constant to my kernel. I have some
>> >> functions doing arithmetic with DualQuaternions and I have to first copy
>> >> all data from my structure before working with them: e.g.
>> >>
>> >> void conjugate(const DualQuaternion * a, DualQuaternion * rv);
>> >>
>> >> DualQuaternion rv;
>> >> conjugate(&measurement->w2c, &rv);
>> >> Gives this error:
>> >> passing 'DualQuaternion __attribute__((address_space(2)))const *'
>> >> discards qualifiers, expected 'DualQuaternion const *'
>> >>
>> >> DualQuaternion temp = measurement->w2c;
>> >> conjugate(&temp, &rv);
>> >> is working okay.
>> >>
>> >> I think I understand the reason for this: a function's pointer arguments
>> >> refer to one address space only. But is there a way to pass my structures
>> >> to my kernel so that the explicit copy is not needed?
>> >
>> >My advice would be to pass the arguments to conjugate() by value and use
>> >a return value. This avoids issues of address space matching
>> >(i.e. declaring __constant args in conjugate()), and any half-way smart
>> >compiler will generate equivalent code anyway.
>>
>> My profiling shows that this is not the case. Passing const
>> DualQuaternion* is roughly 20% faster than passing const
>> DualQuaternion. Maybe I need to activate optimization or so? Would
>> that be cl.Program.build(["-O2"])?
>Good question. If you're on Nv, maybe start by looking at the
>PTX. (prg.binaries[0])
I guess this is beyond my capabilities. I am actually on NVIDIA hardware but
using Apple's OpenCL. I will just turn optimization on and keep
monitoring my performance.
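
A minimal sketch of the two options discussed above, for illustration only.
The thread never shows the real DualQuaternion, so the struct layout, the
Measurement wrapper, the conjugation convention (negating the vector part of
both halves) and the names conjugate_constant() and conjugate_w2c() are all
assumptions. The guard at the top is also an addition: it blanks out the
OpenCL qualifiers so the same source can be checked with an ordinary C
compiler.

```c
/* Host-side build only: blank out the OpenCL qualifiers so this file also
   compiles with a plain C compiler for testing. */
#ifndef __OPENCL_VERSION__
#include <assert.h>
#define __constant
#define __global
#define __kernel
#endif

/* Hypothetical layout; the real struct is not shown in the thread. */
typedef struct {
    float real[4];  /* w, x, y, z of the real part */
    float dual[4];  /* w, x, y, z of the dual part */
} DualQuaternion;

/* Hypothetical stand-in for the struct behind `measurement`. */
typedef struct {
    DualQuaternion w2c;
} Measurement;

/* Option 1: by-value argument and return value. No pointer is involved,
   so there is no address space to match at the call site. */
DualQuaternion conjugate(DualQuaternion a)
{
    DualQuaternion rv = a;
    int i;
    for (i = 1; i < 4; ++i) {   /* keep w, negate x, y, z */
        rv.real[i] = -a.real[i];
        rv.dual[i] = -a.dual[i];
    }
    return rv;
}

/* Option 2: keep the pointer interface, but declare the input as
   __constant so it matches the address space of the kernel argument. */
void conjugate_constant(__constant const DualQuaternion *a, DualQuaternion *rv)
{
    int i;
    rv->real[0] = a->real[0];
    rv->dual[0] = a->dual[0];
    for (i = 1; i < 4; ++i) {
        rv->real[i] = -a->real[i];
        rv->dual[i] = -a->dual[i];
    }
}

/* Call site for option 1: the copy out of __constant memory is implicit
   in the by-value call, so no temporary has to be written by hand. */
DualQuaternion conjugate_w2c(__constant const Measurement *measurement)
{
    return conjugate(measurement->w2c);
}
```

Option 2 only accepts __constant inputs, so a second copy of the function
would be needed for private or __global data; the by-value version works for
every address space. Whether the two compile to equivalent code depends on
the implementation, as the timings reported above suggest.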

I already ran into a new problem. The following program fails for me on
one box (Linux 64 bit) but not the other (Apple). I'd like to know why
it fails on the Linux box. Below is the sample program, the output when
it is run and the properties of the Linux card. I see no apparent reason
why the image does not work - it only has ~30 MB. Can other programs
influence the amount of memory available on the card?

No problem on Linux with

VERSION: OpenCL 1.1 CUDA 4.1.1
<pyopencl.Device 'GeForce GTX 260' on 'NVIDIA CUDA' at 0x2b8e4a0>

I figure that might be an issue with the implementation/version you're using.
Thank you. Burned 6 hours on this yesterday. Time to dig in for a new card I guess.

Cheers,
Holger



