We have some motivated users who would like to assemble matrices on a
device without storing all the element matrices in global memory or
transferring them to the CPU.  Given GPU execution models, this means
we need something that can be done on the spot inside kernels.  So what
about a function that can be called by device threads?

PetscErrorCode MatOpenCLGetSetValuesSource(Mat, synchronization_mechanism,
                                           char **);

The function returns type-specialized source code, which the user
concatenates into their own kernel source; their kernels can then call
MatSetValues() directly.  The users I'm talking to synchronize by
coordinating threads with a form of coloring.  The user still needs to
call MatAssemblyBegin/End from outside a kernel, though those functions
may or may not need to launch a kernel of their own.
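
To make the coloring idea concrete, here is a minimal host-side sketch in
plain C, not PETSc or OpenCL code.  It assumes a 1D mesh of two-node
elements, a dense global matrix, and a two-color (even/odd) element
coloring; set_values_add() is a hypothetical stand-in for the
device-callable MatSetValues(), and all names are made up for
illustration.  Elements of the same color share no nodes, so on a device
each inner-loop iteration could be a separate thread writing without
atomics, with a barrier or a separate kernel launch between colors.

```c
#include <assert.h>

#define N_ELEM 6              /* hypothetical mesh: 6 two-node elements */
#define N_NODE (N_ELEM + 1)   /* 7 nodes, so a 7x7 global matrix        */

/* Hypothetical stand-in for a device-callable MatSetValues(): adds a
 * 2x2 element matrix into a dense, row-major n-by-n global matrix. */
static void set_values_add(double *A, int n, const int rows[2],
                           const double ke[4])
{
  assert(n >= 2);
  for (int i = 0; i < 2; i++)
    for (int j = 0; j < 2; j++)
      A[rows[i] * n + rows[j]] += ke[2 * i + j];
}

/* Assemble a 1D Laplacian stiffness matrix one color at a time. */
static void assemble_colored(double *A)
{
  const double ke[4] = {1.0, -1.0, -1.0, 1.0};  /* element matrix */

  for (int i = 0; i < N_NODE * N_NODE; i++) A[i] = 0.0;

  for (int color = 0; color < 2; color++) {
    /* Elements e, e+2, e+4, ... touch disjoint node pairs, so these
     * iterations are independent and could run as concurrent threads. */
    for (int e = color; e < N_ELEM; e += 2) {
      const int rows[2] = {e, e + 1};
      set_values_add(A, N_NODE, rows, ke);
    }
    /* On a device, a synchronization point would go here. */
  }
}
```

Calling assemble_colored() on a buffer of N_NODE * N_NODE doubles fills
in the usual tridiagonal matrix.  The element matrices exist only as
locals inside the loop body, which is the property the users are after.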

Crazy?
