Hi,
If the context and queue are not attached to objects, then they would
essentially represent global state, which is something I want to avoid.
I was thinking that the context returned would be specific to the Mat
and the device it was about to run on.
Users who want to do the assembly right on the OpenCL device usually
want us to use *their* context, hence the need for such an interface.
What if, for example, a user wants to split the matrix across multiple
OpenCL contexts (e.g. an AMD GPU and a Xeon Phi)?
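A minimal sketch of "context specific to the Mat and the device", assuming the Mat simply keeps the user's own context and queue for each device it is split across. All names here (`MatDeviceBackend`, `MatOpenCLInfo`, `MatAttachDevice`, `MatGetDeviceContext`) are hypothetical and not part of PETSc; the `cl_context`/`cl_command_queue` handles are replaced by opaque `void *` so the sketch compiles without OpenCL headers.

```c
#include <assert.h>
#include <stddef.h>

#define MAT_MAX_DEVICES 4 /* hypothetical limit, keeps the sketch allocation-free */

/* Hypothetical per-device record holding the *user's* OpenCL objects.
 * In real code, context/queue would be cl_context / cl_command_queue. */
typedef struct {
  int    device_id;  /* hypothetical device identifier */
  void  *context;    /* user-supplied context for this device */
  void  *queue;      /* user-supplied command queue for this device */
  size_t row_begin;  /* first row of the Mat assembled on this device */
  size_t row_end;    /* one past the last such row */
} MatDeviceBackend;

/* Hypothetical Mat-side bookkeeping: one entry per device the Mat is split across. */
typedef struct {
  MatDeviceBackend backends[MAT_MAX_DEVICES];
  size_t           num_backends;
} MatOpenCLInfo;

/* Attach a user context/queue pair to a row range; returns 0 on success, 1 if full. */
int MatAttachDevice(MatOpenCLInfo *info, int device_id, void *context, void *queue,
                    size_t row_begin, size_t row_end)
{
  assert(info != NULL && row_begin <= row_end);
  if (info->num_backends == MAT_MAX_DEVICES) return 1;
  MatDeviceBackend b = {device_id, context, queue, row_begin, row_end};
  info->backends[info->num_backends++] = b;
  return 0;
}

/* Look up the context/queue this Mat uses on a given device.
 * Returns NULL if the Mat has no rows there; no global state is consulted. */
const MatDeviceBackend *MatGetDeviceContext(const MatOpenCLInfo *info, int device_id)
{
  assert(info != NULL);
  for (size_t i = 0; i < info->num_backends; ++i)
    if (info->backends[i].device_id == device_id) return &info->backends[i];
  return NULL;
}
```

For the GPU/Xeon Phi split, one would attach two entries (say rows [0, 600) on device 0 and [600, 1000) on device 1), each carrying the context the user created for that device.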
Maybe GetSource() should take an argument specifying which device it
is obtaining code for? I'm not convinced that this sort of hybridism
is useful, however.
This depends on the degree of optimization. If you really want to go for
utmost performance, you need to return a source string which is device
specific. For a first implementation it is sufficient to have just one
kernel for all devices. The main benefit of assembling on the device
is avoiding transfers over PCI-Express, so a few percent in raw kernel
performance can be considered microtuning...
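A sketch of how a device-aware GetSource() could start with one portable kernel and let device-specific variants be added later, assuming devices are classified into a few hypothetical kinds. `DeviceKind`, `SetTunedSource` and the `scale` kernel are placeholders for illustration only; real code might classify devices by querying `CL_DEVICE_TYPE` through `clGetDeviceInfo`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical device classes. */
typedef enum { DEVICE_GENERIC = 0, DEVICE_GPU, DEVICE_ACCELERATOR, DEVICE_KIND_COUNT } DeviceKind;

/* One portable OpenCL kernel (placeholder body): enough for a first implementation. */
static const char generic_source[] =
  "__kernel void scale(__global float *vals, const float alpha)\n"
  "{\n"
  "  vals[get_global_id(0)] *= alpha;\n"
  "}\n";

/* Optional device-specific variants; static storage, so all start out NULL. */
static const char *tuned_source[DEVICE_KIND_COUNT];

/* Register a tuned source string for one device kind. */
void SetTunedSource(DeviceKind kind, const char *src)
{
  assert(kind < DEVICE_KIND_COUNT);
  tuned_source[kind] = src;
}

/* Hypothetical GetSource() taking the target device: tuned variant if one
 * exists, otherwise the generic kernel. */
const char *GetSource(DeviceKind kind)
{
  assert(kind < DEVICE_KIND_COUNT);
  return tuned_source[kind] ? tuned_source[kind] : generic_source;
}
```

The fallback keeps the interface stable: callers always pass the device, and microtuned kernels can be slotted in per device without changing any call sites.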
I think you were referring to the 'Mat' on the device, while I was
referring to the plain PETSc Mat. The difficulty with a 'Mat' on the
device is a limitation of OpenCL in defining opaque types: it is not
possible to have something like
typedef struct OpenCLMat {
  __global int   *row_indices;
  __global int   *col_indices;
  __global float *entries;
} PetscMat;
and pass this as a single kernel argument
(cf. the OpenCL standard or
http://stackoverflow.com/questions/17635898/passing-struct-with-pointer-members-to-opencl-kernel-using-pyopencl).
Umm, can't I copy the struct to the device and give the user a pointer
that they can shuttle into their kernels?
Not a struct containing device buffer handles. You can copy
struct A {
int a;
double b[5];
};
and anything else which is of fixed size, but you are not allowed to
pack cl_mem into a struct and pass that struct on to the kernel. This
is, unfortunately, a considerable abstraction problem, because it does
not allow you to 'just pass an opaque object'. For example, the three
CSR arrays need to be passed as separate kernel arguments. Yes, this is
ugly.
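To illustrate the workaround, here is a plain-C emulation of one work-item of a CSR sparse matrix-vector kernel. It is a sketch, not PETSc code: `CSRInfo`, `csr_spmv_row` and `csr_spmv` are hypothetical names. The three CSR arrays are separate parameters, while the fixed-size, pointer-free `CSRInfo` can travel by value (on the host via `clSetKernelArg(kernel, idx, sizeof(CSRInfo), &info)`, assuming host and device agree on the struct layout). In actual OpenCL C the pointer parameters would be `__global` and `row` would come from `get_global_id(0)`.

```c
#include <assert.h>
#include <stddef.h>

/* Fixed-size metadata without pointers or cl_mem handles: legal as a by-value kernel argument. */
typedef struct {
  int num_rows;
  int nnz;
} CSRInfo;

/* Body of one work-item: y[row] = sum_k entries[k] * x[col_indices[k]].
 * The three CSR arrays must be separate parameters; they cannot be bundled. */
void csr_spmv_row(int row, CSRInfo info,
                  const int *row_indices,  /* length num_rows + 1 */
                  const int *col_indices,  /* length nnz */
                  const float *entries,    /* length nnz */
                  const float *x, float *y)
{
  if (row >= info.num_rows) return; /* guard for padded global work sizes */
  float sum = 0.0f;
  for (int k = row_indices[row]; k < row_indices[row + 1]; ++k)
    sum += entries[k] * x[col_indices[k]];
  y[row] = sum;
}

/* Host-side stand-in for the NDRange launch: one call per work-item. */
void csr_spmv(CSRInfo info, const int *row_indices, const int *col_indices,
              const float *entries, const float *x, float *y)
{
  assert(row_indices[info.num_rows] == info.nnz); /* CSR consistency check */
  for (int row = 0; row < info.num_rows; ++row)
    csr_spmv_row(row, info, row_indices, col_indices, entries, x, y);
}
```

For the 2x2 matrix [[1, 2], [0, 3]] the arrays are row_indices = {0, 2, 3}, col_indices = {0, 1, 1}, entries = {1, 2, 3}; with x = {1, 2} this yields y = {5, 6}.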
Best regards,
Karli