Hi,

For a research project of mine, I'd like to accelerate the calculations using OpenCL. I prefer OpenCL over CUDA because it is more lightweight and also covers non-NVIDIA GPUs. I can live with the fact that it has been deprecated on the Mac since 10.14 'Mojave', because the program will rarely be used on anything other than Windows.
I've tried to work with "macro" kernels that solve *entire problems* in one step, but that didn't work out properly. I would need to maintain two completely different implementations (numpy.ndarray and OpenCL), and the OpenCL code is rather long compared to the numpy code. So I've come to the conclusion that it is best to use OpenCL *on the atomic* layer, that is, to implement a clndarray.

Here are the requirements:

1. Implementation of offsets and strides, using pyopencl.Buffer objects for the data storage. I only need C order, but once offsets and strides are implemented, the order is probably rather arbitrary. Handling this shape metadata can happen in plain Python/numpy on the host. Only the actual calculations need to be carried out on the GPU.

2. Index access and index assignment *using integers and slices only*. I (currently) do not need boolean indices or integer-valued index arrays.

3. Code using clndarray also needs to run when pyopencl is not present.

4. It might be wise to limit the number of dimensions to three; otherwise I would need to replicate numpy's handling of the excess dimensions. Such a limit would probably simplify the implementation, at the expense of generality (at least in the core).

5. In my use case, I'd like to be able to try the code on all devices (including the CPU) in succession, in case it fails to run on a specific device (e.g. due to lack of memory).

For the implementation of offsets/strides I think it would be useful to have a look at the numpy code, so as not to reinvent the wheel. This is the reason for this email: I'd appreciate *very much* *any pointer* towards *where to look for the strides implementation in numpy*. Maybe I can use the code to avoid pitfalls, and maybe my coding process could be more efficient.

I've got some ideas about how to solve the design questions outlined above, but for the moment I consider these off-topic.
I'd like to see first how they work out once clndarray picks up some speed.

Best,
Friedrich
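
P.S. To make requirements 1 and 2 a bit more concrete, here is a rough sketch of the host-side metadata handling, in plain Python. The class name StridedView and its methods are placeholders invented for illustration; they are not existing numpy or pyopencl API. The sketch assumes strides counted in elements (numpy's own strides are in bytes) and a flat buffer addressed by a single integer index. It only shows how integer and slice indexing map to a new (shape, strides, offset) triple; it does not touch the GPU at all.

```python
import operator

import numpy as np


class StridedView:
    """Shape metadata for a view onto a flat buffer (hypothetical helper)."""

    def __init__(self, shape, strides, offset=0):
        self.shape = tuple(shape)
        self.strides = tuple(strides)  # in elements, not bytes
        self.offset = offset           # flat index of the first element

    @classmethod
    def c_contiguous(cls, shape):
        """Build the metadata of a freshly allocated C-order array."""
        strides, step = [], 1
        for n in reversed(shape):
            strides.append(step)
            step *= n
        return cls(shape, reversed(strides))

    def __getitem__(self, key):
        """Index with integers and slices only; returns a new view."""
        if not isinstance(key, tuple):
            key = (key,)
        if len(key) > len(self.shape):
            raise IndexError("too many indices")
        shape, strides, offset = [], [], self.offset
        for axis, (n, stride) in enumerate(zip(self.shape, self.strides)):
            k = key[axis] if axis < len(key) else slice(None)
            if isinstance(k, slice):
                start, stop, step = k.indices(n)
                length = len(range(start, stop, step))
                shape.append(length)
                strides.append(stride * step)
                if length:  # leave the offset alone for empty views
                    offset += start * stride
            else:
                i = operator.index(k)  # rejects floats, arrays, etc.
                if i < 0:
                    i += n
                if not 0 <= i < n:
                    raise IndexError("index out of bounds")
                offset += i * stride   # an integer index drops the axis
        return StridedView(shape, strides, offset)

    def flat_indices(self):
        """Flat buffer index of every element (for checking against numpy)."""
        idx = np.full(self.shape, self.offset, dtype=np.intp)
        for axis, (n, stride) in enumerate(zip(self.shape, self.strides)):
            bshape = [1] * len(self.shape)
            bshape[axis] = n
            idx += (np.arange(n) * stride).reshape(bshape)
        return idx


# Compare against numpy on a small C-ordered array.
a = np.arange(2 * 3 * 4).reshape(2, 3, 4)
buf = a.ravel()                     # stands in for the device buffer
view = StridedView.c_contiguous(a.shape)[1, ::-1, 1:4:2]
expected = a[1, ::-1, 1:4:2]
print(np.array_equal(buf[view.flat_indices()], expected))
```

On the device side, a kernel would presumably receive offset and strides as arguments and compute the same flat index per work item. Dividing expected.strides by a.itemsize should give the same numbers as view.strides, which is a cheap way to cross-check such a helper against numpy itself.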