Hi Lawrence,

>> That's it for now, after some more refining I'll start with a careful
>> migration of the code/concepts into PETSc. Comments are, of course,
>> always welcome.
>
> So we're working on FE assembly + solve on GPUs using fenics kernels
> (github.com/OP2/PyOP2). For the GPU solve, it would be nice if we could
> backdoor assembled matrices straight on to the GPU. That is, create a Mat
> saying "this is the sparsity pattern" and then, rather than calling
> MatSetValues on the host, just pass a pointer to the device data.

Thanks for the input. My reference implementation supports this kind of
backdooring, so there is no conceptual problem with that. What I don't
know yet is 'The Right Way' of integrating this functionality into the
existing PETSc interface routines. Anyhow, I see this as an essential
feature, so it's already on my roadmap.

> At the moment, we're doing a similar thing using CUSP, but are looking at
> doing multi-GPU assembly + solve and would like not to have to reinvent too
> many wheels, in particular, the MPI-parallel layer. Additionally, we're
> already using PETSc for the CPU-side linear algebra so it would be nice to
> use the same interface everywhere.

Yes, that's what we are aiming for. The existing MPI layer works well
irrespective of whether you're dealing with CPUs or GPUs on each rank.

> I guess effectively we'd like something like MatCreateSeqAIJWithArrays and
> MatCreateMPIAIJWithSplitArrays but with the ability to pass device pointers
> rather than host pointers. Is there any roadmap in PETSc for this kind of
> thing? Would patches in this direction be welcome?

Type safety is a bit nasty. CUDA lets you work with plain 'void *', while
OpenCL expects cl_mem. This suggests using something like
MatCreateSeqAIJWithCUDAArrays() and MatCreateSeqAIJWithOpenCLArrays(), but
as I said above, I haven't come to a decision on that yet. I'm not aware
of any roadmap on the GPU part, but I want to integrate this sooner rather
than later.
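
To make the type-safety point concrete, here is a rough sketch of what
separate per-backend entry points could look like. Everything prefixed
with demo_ is hypothetical and exists neither in PETSc nor in
CUDA/OpenCL; demo_cl_mem is only a stand-in for OpenCL's opaque cl_mem
handle so that the snippet compiles without any GPU headers. It is not
a proposal for the final interface.

```c
#include <assert.h>

/* Stand-in for OpenCL's opaque cl_mem handle (assumption: real code
   would include <CL/cl.h> and use cl_mem directly). */
typedef struct demo_cl_mem_ *demo_cl_mem;

typedef enum { DEMO_BACKEND_CUDA, DEMO_BACKEND_OPENCL } demo_backend;

/* Hypothetical descriptor of a CSR matrix whose arrays already live
   on the device. The tag records which union member is valid. */
typedef struct {
    demo_backend backend;
    int rows, cols;
    union { void *cuda; demo_cl_mem opencl; } row_ptr, col_idx, values;
} demo_device_csr;

/* CUDA flavour: device memory is addressed through plain pointers. */
demo_device_csr demo_csr_with_cuda_arrays(int rows, int cols,
                                          void *i, void *j, void *a)
{
    demo_device_csr m;
    assert(rows >= 0 && cols >= 0);
    m.backend = DEMO_BACKEND_CUDA;
    m.rows = rows;
    m.cols = cols;
    m.row_ptr.cuda = i;
    m.col_idx.cuda = j;
    m.values.cuda = a;
    return m;
}

/* OpenCL flavour: the compiler rejects anything that is not a handle. */
demo_device_csr demo_csr_with_opencl_arrays(int rows, int cols,
                                            demo_cl_mem i, demo_cl_mem j,
                                            demo_cl_mem a)
{
    demo_device_csr m;
    assert(rows >= 0 && cols >= 0);
    m.backend = DEMO_BACKEND_OPENCL;
    m.rows = rows;
    m.cols = cols;
    m.row_ptr.opencl = i;
    m.col_idx.opencl = j;
    m.values.opencl = a;
    return m;
}
```

With two entry points the handle type is checked at compile time. A
single routine taking 'void *' for both backends would also compile,
but it would accept a CUDA pointer where a cl_mem is expected without
any diagnostic.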

Patches are of course welcome, either for the current branch, or later on
based on the refurbished GPU extensions.

Best regards,
Karli
