Re: DCompute OpenCL kernels now work

2017-10-08 Thread Dmitry Olshansky via Digitalmars-d-announce

On Sunday, 8 October 2017 at 04:40:35 UTC, Nicholas Wilson wrote:
> I am happy to announce that DCompute will soon[1] support the
> OpenCL 2.1 runtime. I have tested it locally and it works now :)
> I wasted a good deal of time wondering why it wasn't working; the
> cause was a small typo in DerelictCL that tried to load OpenCL 2.2
> symbols when loading 2.1. That fix, along with OpenCL 2.x support
> for DerelictCL, will be merged and tagged soon, and then [1] can
> be merged.
>
> Most of the hard work is now done, leaving general polish and
> user feedback as the main development tasks. A unified driver
> abstracting over the CUDA and OpenCL drivers is on my todo list.
>
> [1]: https://github.com/libmir/dcompute/pull/36


This is awesome! I have some ambitious plans which depend on this 
development.





DCompute OpenCL kernels now work

2017-10-07 Thread Nicholas Wilson via Digitalmars-d-announce
I am happy to announce that DCompute will soon[1] support the 
OpenCL 2.1 runtime. I have tested it locally and it works now :) 
I wasted a good deal of time wondering why it wasn't working; the 
cause was a small typo in DerelictCL that tried to load OpenCL 2.2 
symbols when loading 2.1. That fix, along with OpenCL 2.x support 
for DerelictCL, will be merged and tagged soon, and then [1] can be merged.


Most of the hard work is now done, leaving general polish and 
user feedback as the main development tasks. A unified driver 
abstracting over the CUDA and OpenCL drivers is on my todo list.


Launching a kernel is as simple as:
```
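// Standard-library imports used below. The DCompute OpenCL driver module
// (providing Platform, Context, Program, Buffer, Memory and Event) and the
// module defining the saxpy kernel must be imported as well.
import std.exception : enforce;
import std.experimental.allocator : theAllocator;
import std.file : read;
import std.stdio : writeln;

// Host-side input and output arrays.
enum size_t N = 128;
float alpha = 5.0;
float[N] res, x, y;
foreach (i; 0 .. N)
{
    x[i] = N - i;
    y[i] = i * i;
}

// Pick a platform and device, then create a context on that device.
auto platforms = Platform.getPlatforms(theAllocator);
auto platform  = platforms[0]; // Assuming the 0th platform supports 2.1

auto devices = platform.getDevices(theAllocator);
auto ctx     = Context(devices[0 .. 1], null);

// Load the compiled SPIR-V kernels and build them for the devices.
Program.globalProgram =
    ctx.createProgram(cast(ubyte[]) read("./.dub/obj/kernels_ocl200_64.spv"));
Program.globalProgram.build(devices, "");
auto queue = ctx.createQueue(devices[0]);

// Wrap the host arrays in device buffers.
Buffer!(float) b_res, b_x, b_y;
b_res = ctx.createBuffer(res[], Memory.Flags.useHostPointer | Memory.Flags.readWrite);
b_x   = ctx.createBuffer(x[], Memory.Flags.useHostPointer | Memory.Flags.readWrite);
b_y   = ctx.createBuffer(y[], Memory.Flags.useHostPointer | Memory.Flags.readWrite);

// Launch saxpy over N work items and wait for it to finish.
Event e = queue.enqueue!(saxpy)([N])(b_res, alpha, b_x, b_y, N);
e.wait();

// Check the result against the expected alpha * x + y.
foreach (i; 0 .. N)
    enforce(res[i] == alpha * x[i] + y[i]);
writeln(res[]);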
```
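
For readers who want to see what the final `enforce` loop is checking, here is a minimal host-only sketch in plain D. It assumes, as that check implies, that `saxpy` computes `res[i] = alpha * x[i] + y[i]`. The function name `saxpyReference` is hypothetical and not part of DCompute; nothing here touches OpenCL.

```d
import std.stdio : writeln;

// Hypothetical CPU reference (not a DCompute API): computes
// res[i] = alpha * x[i] + y[i] for every element.
void saxpyReference(float[] res, float alpha, const float[] x, const float[] y)
{
    assert(res.length == x.length && x.length == y.length,
           "all arrays must have the same length");
    foreach (i; 0 .. res.length)
        res[i] = alpha * x[i] + y[i];
}

void main()
{
    // Same inputs as the kernel launch example.
    enum size_t N = 128;
    float alpha = 5.0;
    float[N] res, x, y;
    foreach (i; 0 .. N)
    {
        x[i] = N - i;
        y[i] = i * i;
    }

    saxpyReference(res[], alpha, x[], y[]);

    // First element: 5 * 128 + 0; last element: 5 * 1 + 127 * 127.
    assert(res[0] == 640);
    assert(res[N - 1] == 16134);
    writeln(res[]);
}
```

Comparing the device output against a reference like this is one way to sanity-check a new driver backend before trusting larger kernels.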

Mike, want me to do another blog post about this and the CUDA 
support?


[1]: https://github.com/libmir/dcompute/pull/36

P.S. Could those who answer foundat...@dlang.org please tell me 
what you think of my plan to advance the development and exposure 
of DCompute?