Karl Rupp <[email protected]> writes:

> a)
> I think this needs a second thought on how we manage the raw OpenCL
> buffers. My suggestion last year was that we 'wrap' pointers to raw
> memory buffers into something like
>  struct generic_ptr {
>    void * cpu_ptr;
>    void * cuda_ptr;
>    cl_mem opencl_ptr;
>  };
> underneath the 'special pointer' for Vec and Mat, but we then decided on
> using a library-specific dispatch, i.e. spptr points to whatever a
> library needs. For MatOpenCLGetSetValuesSource() we would have to be
> very careful in the way the buffers are passed to the kernel, as
> different OpenCL backends may expect slightly different semantics.
> Currently we only have ViennaCL for that purpose, but even though it is
> 'my own' library, there is no point in being restrictive here.

Perhaps that *GetSource method should also return an opaque device "Mat"
pointer that the user is responsible for shepherding into the kernel,
from which they call the device MatSetValues?

> b)
> Other than that, I'm not sure whether I understand the semantics of the
> proposed function correctly. In order for MatOpenCLGetSetValuesSource()
> to be callable by device threads,

The *GetSource method would be called from the CPU and would return a
string containing the source of a type-specialized MatSetValues
implementation. The user would prepend that source to the string they
pass to the OpenCL compiler. Their own part of that string would contain
code that calls MatSetValues (perhaps with a name that makes it clear
that it's running on the device).

> it needs to be all embedded into the OpenCL sources, which means that
> it has no knowledge about any of the PETSc types. If, on the other
> hand, this is supposed to be a PETSc function, then I don't know what
> 'synchronization_mechanism' is supposed to do. In addition, the OpenCL
> context and command queue should be passed as parameters to
> MatOpenCLGetSetValuesSource().

Suppose the column indices have been set in advance. Now if the
application already has a way of preventing conflicting cross-threadblock
writes to those slots within an insertion round (e.g., coloring), PETSc
would not need any synchronization and wouldn't need to stash
possibly-conflicting writes elsewhere. Otherwise, PETSc would have to
manage the stashing, use atomics, or use some other scheme.
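
To make the "prepend the source" idea concrete, here is a rough host-side
sketch in plain C. Everything named in it is hypothetical:
example_setvalues_source() merely stands in for whatever string a
*GetSource call might return, and MatSetValues_Device, its argument
list, and the slot-index layout are assumptions for illustration, not an
existing PETSc interface. The embedded device routine does no
synchronization at all, i.e. it assumes the application already prevents
conflicting writes (e.g., by coloring).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* HYPOTHETICAL stand-in for the string a *GetSource call might return:
   OpenCL C source for a type-specialized, device-side insertion routine.
   The name MatSetValues_Device and its signature are assumptions. It adds
   into precomputed slots and uses no atomics or stashing. */
static const char *example_setvalues_source(void)
{
  return
    "void MatSetValues_Device(__global float *vals, __global const int *slot,\n"
    "                         int n, const float *v)\n"
    "{\n"
    "  for (int i = 0; i < n; i++) vals[slot[i]] += v[i];\n"
    "}\n";
}

/* The user's own part of the program: a kernel that calls the device
   routine above. Also purely illustrative. */
static const char *user_kernel_source =
  "__kernel void assemble(__global float *vals, __global const int *slot)\n"
  "{\n"
  "  float v[2] = {1.0f, 2.0f};\n"
  "  MatSetValues_Device(vals, slot + 2 * get_global_id(0), 2, v);\n"
  "}\n";

/* Return a newly allocated string holding prefix followed by user_src,
   or NULL if allocation fails. The caller frees the result. */
static char *prepend_source(const char *prefix, const char *user_src)
{
  assert(prefix != NULL && user_src != NULL);
  size_t np = strlen(prefix), nu = strlen(user_src);
  char *full = malloc(np + nu + 1);
  if (full == NULL) return NULL;
  memcpy(full, prefix, np);
  memcpy(full + np, user_src, nu + 1); /* copies the terminating NUL too */
  return full;
}
```

The string returned by prepend_source(example_setvalues_source(),
user_kernel_source) is what the user would then hand to the OpenCL
compiler, e.g. via clCreateProgramWithSource() on their own context.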
