Another thing perhaps of interest is the stencil-based GPU matrix assembly 
functionality that Mark introduced.

> On 13.03.2021, at 07:59, Stefano Zampini <stefano.zamp...@gmail.com> wrote:
> 
> The COO assembly is entirely based on Thrust primitives. I don't have enough 
> experience to say whether we would get a serious speedup by writing our own 
> kernels, but it is definitely worth a try if we end up adopting COO as the 
> entry point for GPU irregular assembly.
> Jed, you mentioned BDDC deluxe; what do you mean by that? Porting the 
> setup/application of deluxe scaling onto the GPU?
> 
> The timing is not so bad for me to join the hackathon. 
> 
>> On Mar 13, 2021, at 8:17 AM, Barry Smith <bsm...@petsc.dev> wrote:
>> 
>> 
>> 
>>> On Mar 12, 2021, at 10:49 PM, Jed Brown <j...@jedbrown.org> wrote:
>>> 
>>> Barry Smith <bsm...@petsc.dev> writes:
>>> 
>>>>> On Mar 12, 2021, at 6:58 PM, Jed Brown <j...@jedbrown.org> wrote:
>>>>> 
>>>>> Barry Smith <bsm...@petsc.dev> writes:
>>>>> 
>>>>>>    I think we should start porting the PetscFE infrastructure (numerical 
>>>>>> integration, vector and matrix assembly) to GPUs soon. It is dog slow on 
>>>>>> CPUs and should be able to deliver higher performance on GPUs. 
>>>>> 
>>>>> IMO, this comes via interfaces to libCEED, not by rolling yet another way 
>>>>> to invoke quadrature routines on GPUs.
>>>> 
>>>>  I am not talking about matrix-free stuff; that definitely belongs in 
>>>> libCEED, and there is no reason to rewrite it. 
>>>> 
>>>>  But does libCEED also support the traditional finite element construction 
>>>> process where the matrices are built explicitly? Or does it provide some 
>>>> of the code, integration points, integration formulas, etc. that could be 
>>>> shared and used as a starting point? If it includes all of these 
>>>> "traditional" things, then we should definitely get it all hooked into 
>>>> PetscFE/DMPLEX and go to town. (But then there is not so much need for the 
>>>> GPU hackathon, since that is more wiring than GPU code.) I have always 
>>>> heard libCEED described as a matrix-free engine, so I may have 
>>>> misunderstood. It is definitely not my intention to start a project that 
>>>> reproduces functionality we can just use. 
>>> 
>>> MFEM wants this too and it's in a draft libCEED PR right now. My intent is 
>>> to ensure it's compatible with Stefano's split-phase COO assembly. 
>> 
>>  Cool, would this be something that, in combination with perhaps some 
>> libCEED folk, could be incorporated into the Hackathon? Anyone can join our 
>> Hackathon group; they don't have to have any financial connection with 
>> "PETSc". 
>> 
>>> 
>>>>  We do need solid support for traditional finite element assembly on GPUs; 
>>>> matrix-free finite elements alone are not enough.
>>> 
>>> Agreed, and while libCEED could be further optimized for lowest order, even 
>>> naive assembly will be faster than what's in DMPlex.
> 
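
For readers following along, here is a minimal sketch of the split-phase COO 
assembly Stefano and Jed discuss above, using PETSc's MatSetPreallocationCOO()/
MatSetValuesCOO() entry points. The 2x2 pattern and values are made up for 
illustration, and exact signatures may differ between PETSc versions.

    /* Minimal sketch of split-phase COO assembly; the 2x2 pattern and
     * values below are illustrative only. */
    #include <petscmat.h>

    int main(int argc, char **argv)
    {
      Mat         A;
      PetscInt    coo_i[4] = {0, 0, 1, 1}; /* row indices (repeats allowed) */
      PetscInt    coo_j[4] = {0, 1, 0, 1}; /* column indices */
      PetscScalar coo_v[4] = {4.0, -1.0, -1.0, 4.0};

      PetscCall(PetscInitialize(&argc, &argv, NULL, NULL));
      PetscCall(MatCreate(PETSC_COMM_WORLD, &A));
      PetscCall(MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, 2, 2));
      PetscCall(MatSetFromOptions(A)); /* e.g. -mat_type aijcusparse */

      /* Symbolic phase: hand over the full (i,j) pattern once; duplicate
       * entries are combined here, which is where the Thrust sort/unique
       * primitives mentioned above come in. */
      PetscCall(MatSetPreallocationCOO(A, 4, coo_i, coo_j));

      /* Numeric phase: repeatable and allocation-free, so it maps well
       * onto the GPU; for GPU matrix types the values may live on the
       * device. */
      PetscCall(MatSetValuesCOO(A, coo_v, INSERT_VALUES));

      PetscCall(MatView(A, PETSC_VIEWER_STDOUT_WORLD));
      PetscCall(MatDestroy(&A));
      PetscCall(PetscFinalize());
      return 0;
    }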

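And a rough sketch of how the libCEED assembled-operator work Jed mentions 
could feed those two phases, assuming the CeedOperatorLinearAssembleSymbolic()/
CeedOperatorLinearAssemble() interface. The glue function name is hypothetical, 
integer/scalar types are assumed to match between the two libraries, and 
libCEED error checking is omitted for brevity.

    /* Hypothetical glue: assemble a libCEED operator into a PETSc Mat via
     * the split-phase COO interface. Assumes CeedInt == PetscInt and
     * CeedScalar == PetscScalar (both are configure-time choices). */
    #include <ceed.h>
    #include <petscmat.h>
    #include <stdlib.h>

    PetscErrorCode MatAssembleFromCeed(Ceed ceed, CeedOperator op, Mat A)
    {
      CeedSize          nnz;
      CeedInt          *rows, *cols;
      CeedVector        vals;
      const CeedScalar *v;

      PetscFunctionBeginUser;
      /* Symbolic phase: libCEED reports the element-matrix (i,j) pattern;
       * PETSc's COO preallocation combines the duplicate entries. */
      CeedOperatorLinearAssembleSymbolic(op, &nnz, &rows, &cols);
      PetscCall(MatSetPreallocationCOO(A, nnz, rows, cols));
      free(rows);
      free(cols);

      /* Numeric phase: libCEED computes the values (on the device for
       * GPU backends); PETSc scatters them into its storage format. */
      CeedVectorCreate(ceed, nnz, &vals);
      CeedOperatorLinearAssemble(op, vals);
      CeedVectorGetArrayRead(vals, CEED_MEM_HOST, &v);
      PetscCall(MatSetValuesCOO(A, v, INSERT_VALUES));
      CeedVectorRestoreArrayRead(vals, &v);
      CeedVectorDestroy(&vals);
      PetscFunctionReturn(PETSC_SUCCESS);
    }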