Another thing perhaps of interest is the stencil-based GPU matrix assembly functionality that Mark introduced.
> On 13.03.2021 at 07:59, Stefano Zampini <stefano.zamp...@gmail.com> wrote:
>
> The COO assembly is entirely based on Thrust primitives. I don't have enough experience to say whether we would get a serious speedup by writing our own kernels, but it is definitely worth a try if we end up adopting COO as the entry point for GPU irregular assembly.
>
> Jed, you mentioned BDDC deluxe. What do you mean by that? Porting the setup/application of deluxe scaling onto the GPU?
>
> Timings are not so bad for me to join the hackathon.
>
>> On Mar 13, 2021, at 8:17 AM, Barry Smith <bsm...@petsc.dev> wrote:
>>
>>> On Mar 12, 2021, at 10:49 PM, Jed Brown <j...@jedbrown.org> wrote:
>>>
>>> Barry Smith <bsm...@petsc.dev> writes:
>>>
>>>>> On Mar 12, 2021, at 6:58 PM, Jed Brown <j...@jedbrown.org> wrote:
>>>>>
>>>>> Barry Smith <bsm...@petsc.dev> writes:
>>>>>
>>>>>> I think we should start porting the PetscFE infrastructure (numerical integrations, vector and matrix assembly) to GPUs soon. It is dog slow on CPUs and should be able to deliver higher performance on GPUs.
>>>>>
>>>>> IMO, this comes via interfaces to libCEED, not rolling yet another way to invoke quadrature routines on GPUs.
>>>>
>>>> I am not talking about matrix-free stuff; that definitely belongs in libCEED, and there is no reason to rewrite it.
>>>>
>>>> But does libCEED also support the traditional finite element construction process, where the matrices are built explicitly? Or does it provide some of the code (integration points, integration formulas, etc.) that could be shared and used as a starting point? If it includes all of these "traditional" things, then we should definitely get it all hooked into PetscFE/DMPlex and go to town. (But then there is not so much need for the GPU hackathon, since it is more wiring than GPU code.) The way I have always heard libCEED described was as a matrix-free engine, so I may have misunderstood. It is definitely not my intention to start a project that reproduces functionality we can just use.
>>>
>>> MFEM wants this too, and it's in a draft libCEED PR right now. My intent is to ensure it's compatible with Stefano's split-phase COO assembly.
>>
>> Cool. Would this be something that, in combination with perhaps some libCEED folks, could be incorporated into the hackathon? Anyone can join our hackathon group; they don't have to have any financial connection with "PETSc".
>>
>>>> We do need solid support for traditional finite element assembly on GPUs; matrix-free finite elements alone is not enough.
>>>
>>> Agreed, and while libCEED could be further optimized for lowest order, even naive assembly will be faster than what's in DMPlex.