Hi,

For a research project of mine, I'd like to accelerate the calculations
using OpenCL. I prefer OpenCL over CUDA because it is more lightweight and
also covers non-NVIDIA GPUs. I can live with the fact that it has been
deprecated on the Mac since 10.14 'Mojave', since the program will rarely
be used on anything other than Windows.

I've tried to work with "macro" kernels that solve *entire problems* in one
step, but that didn't work out well. I would need to maintain two
completely different implementations (numpy.ndarray and OpenCL), and the
OpenCL code is much longer than the numpy code. So I've come to the
conclusion that it will be best to use OpenCL *on the atomic* layer, that
is, to implement clndarray.

Here are the requirements:

   1. Implementation of offsets and strides, using pyopencl.Buffer objects
   for the data storage. I only need C order, but once offsets and strides
   are implemented, the order is probably rather arbitrary. Handling this
   shape metadata can happen in plain Python/numpy. Only the actual
   calculations need to be carried out on the GPU.
   2. Providing index access and index assignment *using integers and
   slices only*. I (currently) do not need boolean indices or
   integer-valued index arrays.
   3. The code using clndarray must also run when pyopencl is not
   present.
   4. It might be wise to limit the number of dimensions to three;
   otherwise I would need to replicate numpy for the excess dimensions.
   Such a limitation would probably simplify the implementation, at the
   expense of losing generality (at least in the core).
   5. In my use case, I'd like to be able to try all devices (including
   the CPU) in succession, in case the code fails to run on a specific
   device (e.g. due to lack of memory).
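
To make requirements 3 and 5 a bit more concrete, here is a rough sketch of
what I have in mind. The helper names (available_devices, run_with_fallback)
and the opencl_impl/numpy_impl callables are hypothetical and not part of
pyopencl; only cl.get_platforms(), Platform.get_devices() and cl.Error are
actual pyopencl names.

```python
try:
    import pyopencl as cl  # optional dependency (requirement 3)
except ImportError:
    cl = None


def available_devices():
    """Return all OpenCL devices, or an empty list if pyopencl is unusable."""
    if cl is None:
        return []
    try:
        return [device
                for platform in cl.get_platforms()
                for device in platform.get_devices()]
    except cl.Error:
        # For example, no OpenCL platform is installed on this machine.
        return []


def run_with_fallback(opencl_impl, numpy_impl, *args):
    """Try opencl_impl on each device in turn (requirement 5).

    opencl_impl(device, *args) and numpy_impl(*args) are hypothetical
    callables supplied by the caller. If every device fails, or if there
    is no device at all, the plain numpy implementation is used.
    """
    for device in available_devices():
        try:
            return opencl_impl(device, *args)
        except (cl.Error, MemoryError):
            continue  # e.g. out of memory on this device; try the next one
    return numpy_impl(*args)
```

With this shape, code that uses clndarray would not need to know whether
pyopencl is installed; it would only ever call run_with_fallback.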

For the implementation of offsets/strides I think it would be useful to
have a look at the numpy code, so as not to reinvent the wheel. This is the
reason for this email: I'd *very much* appreciate *any pointer* towards
*where to look for the strides implementation in numpy*. Maybe I can use
the code to avoid pitfalls, and maybe my coding process could be more
efficient.
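
For reference, this is roughly the metadata handling I mean, as a minimal
sketch in plain Python. The function names c_strides and index_view are my
own placeholders, not numpy API. It assumes byte strides and a byte offset
into the buffer, handles integers and slices only, and uses numpy itself as
the reference to check against.

```python
import operator

import numpy as np


def c_strides(shape, itemsize):
    """Byte strides of a C-ordered (row-major) array with the given shape."""
    strides = []
    step = itemsize
    for n in reversed(shape):
        strides.append(step)
        step *= n
    return tuple(reversed(strides))


def index_view(shape, strides, offset, key):
    """Apply integers/slices to (shape, strides, offset); return the new triple."""
    if not isinstance(key, tuple):
        key = (key,)
    if len(key) > len(shape):
        raise IndexError("too many indices")
    new_shape, new_strides = [], []
    for axis, (n, s) in enumerate(zip(shape, strides)):
        k = key[axis] if axis < len(key) else slice(None)
        if isinstance(k, slice):
            start, stop, step = k.indices(n)  # normalises None and negatives
            new_shape.append(len(range(start, stop, step)))
            new_strides.append(s * step)
            offset += s * start
        else:
            i = operator.index(k)  # an integer index removes this axis
            if i < 0:
                i += n
            if not 0 <= i < n:
                raise IndexError("index out of bounds")
            offset += s * i
    return tuple(new_shape), tuple(new_strides), offset


# Compare against what numpy computes for the same view.
a = np.arange(24, dtype=np.float32).reshape(2, 3, 4)
key = (1, slice(None, None, -1), slice(1, 4, 2))
shape, strides, offset = index_view(a.shape, a.strides, 0, key)
view = a[key]
base_ptr = a.__array_interface__["data"][0]
view_ptr = view.__array_interface__["data"][0]
assert (shape, strides) == (view.shape, view.strides)
assert offset == view_ptr - base_ptr
```

No data is touched here; an OpenCL kernel would only receive the resulting
shape, strides and offset together with the buffer.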

I've got some ideas about how to solve the design questions outlined above,
but for the moment I consider these off-topic. I'd first like to see how
they work out once clndarray picks up some speed.

Best,
Friedrich
_______________________________________________
NumPy-Discussion mailing list -- numpy-discussion@python.org