Karl Rupp <[email protected]> writes:

> Hi,
>
>>> If the context and queue are not attached to objects, then they would
>>> essentially represent global state, which is something I want to avoid.
>>
>> I was thinking that the context returned would be specific to the Mat
>> and the device it was about to run on.
>
> Users who want to do the assembly right on the OpenCL device usually
> want us to use *their* context, hence all the need for such an interface.
Hmm, I think we're using "context" to mean different things. When I say
"matrix context", I mean whatever kernels use to identify the matrix
into which they want to set entries.
I think you were referring to the cl_context, which (now that you have
pointed out the issue) I think should be passed to the Mat*GetSource().
>>> What if a user for example wants to split the matrix across multiple
>>> OpenCL contexts (e.g. an AMD GPU and a Xeon Phi)?
>>
>> Maybe the GetSource() should take an argument specifying which device it
>> was obtaining code for? I'm not convinced that this sort of hybridism
>> is useful, however.
>
> This depends on the degree of optimization. If you really want to go for
> utmost performance, you need to return a source string which is device
> specific. For a first implementation it is sufficient to just have one
> kernel for all devices. The main benefit of the assembly on the device
> is avoiding PCI-Express, so a few percent in raw kernel performance can
> be considered microtuning...
Yup.
>> Umm, can't I copy the struct to the device and give the user a pointer
>> that they can shuttle into their kernels?
>
> Not a struct containing device buffer handles. You can copy
> struct A {
> int a;
> double b[5];
> };
> and anything else which is of fixed size, but you are not allowed to
> pack cl_mem into a struct and pass that struct on to the kernel. This
> is, unfortunately, a considerable abstraction problem, because it does
> not allow you to 'just pass an opaque object'. For example, the three
> CSR arrays need to be passed as separate kernel arguments. Yes, this is
> ugly.
Ugh, that's terrible. Alternative:
We return an array of (size, arg_data) pairs. The user adds these to
their kernel. We'll provide a struct that they initialize somewhere at
the top of their kernel to pack all our programmatically-generated
arguments into something they can pass around reasonably. Ugly, but not
a showstopper.
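A minimal host-side sketch of the (size, arg_data) idea, to make it concrete. Everything named here is hypothetical (MatKernelArg, MatKernelArgsAppend, SetArgFn, and the mock kernel); none of it is an existing PETSc or OpenCL interface. The callback only mirrors the shape of clSetKernelArg(kernel, arg_index, arg_size, arg_value), so that the sketch runs without an OpenCL runtime.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical: one programmatically generated kernel argument.
 * For a buffer this would be { sizeof(cl_mem), &buffer_handle }. */
typedef struct {
  size_t      size; /* bytes, as for clSetKernelArg's arg_size */
  const void *data; /* pointer to the value, as for arg_value */
} MatKernelArg;

/* Assumed callback with the same shape as clSetKernelArg; 0 means success. */
typedef int (*SetArgFn)(void *kernel, unsigned index, size_t size, const void *value);

/* Append the library's arguments after the user's own arguments.
 * Returns the next free argument index, or -1 if setting one failed. */
static int MatKernelArgsAppend(void *kernel, SetArgFn set_arg, unsigned first_index,
                               const MatKernelArg *args, size_t nargs)
{
  assert(set_arg != NULL);
  for (size_t i = 0; i < nargs; i++) {
    if (set_arg(kernel, first_index + (unsigned)i, args[i].size, args[i].data) != 0) return -1;
  }
  return (int)(first_index + nargs);
}

/* Stand-in for a kernel object: records the size given for each argument slot. */
enum { MOCK_MAX_ARGS = 16 };
typedef struct {
  size_t   sizes[MOCK_MAX_ARGS];
  unsigned count;
} MockKernel;

static int MockSetArg(void *kernel, unsigned index, size_t size, const void *value)
{
  MockKernel *k = (MockKernel *)kernel;
  if (index >= MOCK_MAX_ARGS || value == NULL) return 1;
  k->sizes[index] = size;
  if (index + 1 > k->count) k->count = index + 1;
  return 0;
}
```

With a real kernel, the user would set their own arguments first and then call something like MatKernelArgsAppend with a thin wrapper around clSetKernelArg. The three CSR arrays would arrive as three consecutive entries, and the kernel-side struct would be filled from those trailing arguments.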
