Hi,

If the context and queue are not attached to objects, then they would
essentially represent global state, which is something I want to avoid.

I was thinking that the context returned would be specific to the Mat
and the device it was about to run on.

Users who want to do the assembly right on the OpenCL device usually want us to use *their* context, hence the need for such an interface.


What if a user, for example, wants to split the matrix across multiple
OpenCL contexts (e.g. an AMD GPU and a Xeon Phi)?

Maybe the GetSource() should take an argument specifying which device it
was obtaining code for?  I'm not convinced that this sort of hybridism
is useful, however.

This depends on the degree of optimization. If you really want to go for utmost performance, you need to return a source string which is device-specific. For a first implementation it is sufficient to have just one kernel for all devices. The main benefit of assembly on the device is avoiding data transfers over PCI-Express, so a few percent in raw kernel performance can be considered microtuning...
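
To illustrate the "one kernel for all devices, tuned variants later" idea, here is a rough sketch in plain C. Everything in it is an assumption made for illustration: the DeviceKind enum, the function name get_source_for(), and both kernel strings are hypothetical and not part of PETSc or of any existing interface. A real implementation would more likely query the OpenCL device itself instead of using a hand-written enum.

```c
#include <stddef.h>

/* Hypothetical device classes, used only for this sketch. */
typedef enum { DEV_GENERIC = 0, DEV_GPU = 1, DEV_ACCELERATOR = 2 } DeviceKind;

/* Generic kernel source: valid on every device (the "first implementation"). */
static const char *generic_src =
  "__kernel void fill(__global float *entries, float v) {\n"
  "  entries[get_global_id(0)] = v;\n"
  "}\n";

/* Hypothetical tuned variant for GPUs: same kernel, fixed work-group size. */
static const char *gpu_src =
  "__kernel __attribute__((reqd_work_group_size(64, 1, 1)))\n"
  "void fill(__global float *entries, float v) {\n"
  "  entries[get_global_id(0)] = v;\n"
  "}\n";

/* Return the tuned source for this device kind if one exists,
   otherwise fall back to the generic kernel. */
const char *get_source_for(DeviceKind kind)
{
  switch (kind) {
    case DEV_GPU: return gpu_src;
    default:      return generic_src;  /* no tuned variant (yet) */
  }
}
```

With this layout, device-specific tuning can be added one device class at a time without changing the callers.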


I think you were referring to the 'Mat' on the device, while I was
referring to the plain PETSc Mat. The difficulty for a 'Mat' on the
device is a limitation of OpenCL in defining opaque types: It is not
possible to have something like
   typedef struct OpenCLMat {
     __global int *row_indices;
     __global int *col_indices;
     __global float *entries;
   } PetscMat;
and pass this as a single kernel argument.
(cf. OpenCL standard or
http://stackoverflow.com/questions/17635898/passing-struct-with-pointer-members-to-opencl-kernel-using-pyopencl)

Umm, can't I copy the struct to the device and give the user a pointer
that they can shuttle into their kernels?

Not a struct containing device buffer handles. You can copy
  struct A {
    int a;
    double b[5];
  };
and anything else which is of fixed size, but you are not allowed to pack cl_mem into a struct and pass that struct on to the kernel. This is, unfortunately, a considerable abstraction problem, because it does not allow you to 'just pass an opaque object'. For example, the three CSR arrays need to be passed as separate kernel arguments. Yes, this is ugly.
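
As a sketch of what this looks like in practice, the function below is a plain-C, host-side stand-in for a kernel that adds a shift to the diagonal of a CSR matrix. The names csr_add_to_diagonal, CsrParams, and row_offsets are hypothetical and chosen only for this example. The point is the signature: the three CSR arrays are three separate arguments, while the small fixed-size struct can travel as a single one.

```c
/* Fixed-size data like this can be passed by value as one kernel argument. */
typedef struct {
  int   num_rows;
  float shift;
} CsrParams;

/* Host-side stand-in for an OpenCL kernel. In OpenCL C the signature would be
     __kernel void csr_add_to_diagonal(__global const int *row_offsets,
                                       __global const int *col_indices,
                                       __global float     *entries,
                                       CsrParams           params)
   so each CSR array is its own kernel argument. */
void csr_add_to_diagonal(const int *row_offsets, const int *col_indices,
                         float *entries, CsrParams params)
{
  /* On the device, each row would typically be handled by one work-item. */
  for (int row = 0; row < params.num_rows; ++row) {
    for (int k = row_offsets[row]; k < row_offsets[row + 1]; ++k) {
      if (col_indices[k] == row)
        entries[k] += params.shift;  /* only the diagonal entry changes */
    }
  }
}
```

On the host this translates into one clSetKernelArg(kernel, i, sizeof(cl_mem), &buffer) call per array, plus one call passing sizeof(CsrParams) and a pointer to the struct.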

Best regards,
Karli
