> On 15.02.2016 at 22:43, Andreas Kloeckner <[email protected]> wrote:
> 
> Gregor Thalhammer <[email protected]> writes:
>> on my macbook pro (running os x 10.11) I have been plagued with nasty
>> segfaults when using the nvidia 750M GPU and the high level methods of
>> pyopencl.array together with complex64 arrays. Ultimately it seems to
>> be due to buggy nvidia OpenCL drivers on os x, but I found a
>> workaround:
>> 
>> The typedef in pyopencl/cl/pyopencl-complex.h for cfloat_t, which
>> (after macro expansion) reads
>> "union {struct {float x, y;}; struct {float real, imag;};}",
>> seems to be too sophisticated for the driver. The same holds for a
>> simpler "struct {float x, y;}". But the segfaults go away if I instead
>> use "typedef float2 cfloat_t;" and then replace a.real by a.x (and
>> likewise .imag by .y). All tests pass (I could not test complex
>> double). This is not beautiful, but it works for me.
>> 
>> The docs state that the struct has been introduced to avoid silent
>> bugs, e.g. complex + real not giving the expected result, so I
>> understand that my workaround is not acceptable for a PR.
> 
> I would be happy to accept a pull request to this effect that, based on
> some flag, replaces the struct-based definition of the complex number
> with a vector-based one, to work around broken OpenCL implementations,
> of which there appear to be many. The one requirement would be that the
> struct-based implementation remains the default, and that all code
> continues to work with it.
> 
> Andreas
> 

Great, I will try to look into this. At first glance it seems that the
infrastructure for providing such a flag is not there yet. Workarounds like
this are device dependent, so I am thinking about adding device-specific
build options, similar to _PLAT_BUILD_OPTIONS in pyopencl/__init__.py. These
options would define a macro (e.g. PYOPENCL_COMPLEX_NOSTRUCT), and
pyopencl-complex.h would switch between the two definitions based on it.
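
To make the idea concrete, here is a minimal Python sketch of such a
device-specific lookup. Everything in it is hypothetical: the table name
_DEVICE_BUILD_OPTIONS, the function device_build_options, and the platform
and device strings are assumptions for illustration, not existing pyopencl
code.

```python
# Hypothetical sketch: none of these names exist in pyopencl.
# Each entry maps (platform name substring, device name substring)
# to extra compiler options. The strings below are assumed, not verified.
_DEVICE_BUILD_OPTIONS = [
    (("Apple", "GeForce GT 750M"), ["-DPYOPENCL_COMPLEX_NOSTRUCT"]),
]


def device_build_options(platform_name, device_name):
    """Return the extra build options for a platform/device pair."""
    options = []
    for (plat_substr, dev_substr), extra in _DEVICE_BUILD_OPTIONS:
        # Both substrings must match, so other platforms and other
        # devices on the same platform keep the default struct types.
        if plat_substr in platform_name and dev_substr in device_name:
            options.extend(extra)
    return options
```

The returned list would then be appended to the options passed when building
a program for that device. An empty list means the struct-based definition
stays in effect, which keeps it the default.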

A drawback of optionally reverting to the old vector-based complex types is
that the nice .real and .imag attribute access can no longer be used. Instead,
one has to fall back to the .x and .y attributes or introduce macros for
attribute access.
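
As a rough illustration of the macro variant, the following Python sketch
generates the OpenCL C definitions for either case. The function name
cfloat_definition and the macro names CFLOAT_REAL and CFLOAT_IMAG are made up
for this example; they are not part of pyopencl-complex.h.

```python
# Hypothetical sketch: the function and macro names are invented here.
def cfloat_definition(nostruct):
    """Return OpenCL C source defining cfloat_t and accessor macros."""
    if nostruct:
        # Vector-based workaround: components are only reachable as .x/.y.
        typedef = "typedef float2 cfloat_t;"
        real, imag = "x", "y"
    else:
        # Struct-based default, as quoted from the header earlier.
        typedef = (
            "typedef union {\n"
            "    struct { float x, y; };\n"
            "    struct { float real, imag; };\n"
            "} cfloat_t;"
        )
        real, imag = "real", "imag"
    lines = [
        typedef,
        "#define CFLOAT_REAL(a) ((a).%s)" % real,
        "#define CFLOAT_IMAG(a) ((a).%s)" % imag,
    ]
    return "\n".join(lines) + "\n"
```

Kernel code written as CFLOAT_REAL(a) instead of a.real would then compile
against both definitions, at the cost of less readable source.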

Do you think that is a reasonable approach?

Gregor



_______________________________________________
PyOpenCL mailing list
[email protected]
https://lists.tiker.net/listinfo/pyopencl
