Re: [petsc-users] GPU implementation of serial smoothers

2023-01-10 Thread Stefano Zampini
DILU in OpenFOAM is our block Jacobi with ILU subdomain solvers.

On Tue, Jan 10, 2023, 23:45 Barry Smith wrote:
> The default is some kind of Jacobi plus Chebyshev; for a certain class of problems, it is quite good.
>
>> On Jan 10, 2023, at 3:31 PM, Mark Lohry wrote:
>> So what are people
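
A minimal sketch of the block Jacobi + ILU subdomain configuration Stefano is referring to; option names come from PETSc's documented interface, and fill level 0 is only stated here as the usual default:

  -pc_type bjacobi \
  -sub_pc_type ilu     # ILU(0) on each local block is the default fill level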

Re: [petsc-users] GPU implementation of serial smoothers

2023-01-10 Thread Barry Smith
The default is some kind of Jacobi plus Chebyshev; for a certain class of problems, it is quite good.

> On Jan 10, 2023, at 3:31 PM, Mark Lohry wrote:
>
> So what are people using for GAMG configs on GPU? I was hoping PETSc today would be performance competitive with AMGx but it sounds
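
A minimal sketch of selecting that Chebyshev + Jacobi level smoother explicitly, assuming a GAMG hierarchy; the iteration count shown is illustrative, not necessarily the default:

  -pc_type gamg \
  -mg_levels_ksp_type chebyshev \
  -mg_levels_pc_type jacobi \
  -mg_levels_ksp_max_it 2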

Re: [petsc-users] GPU implementation of serial smoothers

2023-01-10 Thread Mark Lohry
So what are people using for GAMG configs on GPU? I was hoping PETSc today would be performance competitive with AMGx, but it sounds like that's not the case?

On Tue, Jan 10, 2023 at 3:03 PM Jed Brown wrote:
> Mark Lohry writes:
>
>> I definitely need multigrid. I was under the impression that

Re: [petsc-users] GPU implementation of serial smoothers

2023-01-10 Thread Jed Brown
Mark Lohry writes:

> I definitely need multigrid. I was under the impression that GAMG was relatively CUDA-complete; is that not the case? What functionality works fully on GPU and what doesn't, without any host transfers (aside from what's needed for MPI)?
>
> If I use -ksp-pc_type gamg

Re: [petsc-users] GPU implementation of serial smoothers

2023-01-10 Thread Mark Lohry
> BTW, on unstructured grids, coloring requires a lot of colors and thus many times more bandwidth (due to multiple passes) than the operator itself.

I've noticed -- in AMGx the multicolor GS was generally dramatically slower than Jacobi because of lots of colors with few elements. You can

Re: [petsc-users] GPU implementation of serial smoothers

2023-01-10 Thread Jed Brown
The joy of GPUs. You can use sparse triangular kernels like ILU (provided by cuSPARSE), but they are so mindbogglingly slow that you'll go back to the drawing board and try to use a multigrid method of some sort with polynomial/point-block smoothing. BTW, on unstructured grids, coloring requires
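
A minimal sketch of the point-block flavor of smoothing mentioned here, assuming a GAMG hierarchy and a matrix whose block size has been set (e.g. via MatSetBlockSize); exact defaults may differ:

  -pc_type gamg \
  -mg_levels_ksp_type chebyshev \
  -mg_levels_pc_type pbjacobi   # point-block Jacobi as the level smoother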

Re: [petsc-users] GPU implementation of serial smoothers

2023-01-10 Thread Mark Lohry
Well that's suboptimal. What are my options for 100% GPU solves with no host transfers?

On Tue, Jan 10, 2023, 2:23 PM Barry Smith wrote:
>
>> On Jan 10, 2023, at 2:19 PM, Mark Lohry wrote:
>>
>>> Is DILU a point-block method? We have -pc_type pbjacobi (and vpbjacobi if the node size is not
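
As a sketch of how one might check this rather than something stated in the thread: when PETSc is built with CUDA, -log_view reports per-event CpuToGpu/GpuToCpu copy counts and sizes, which shows whether a solve is staying on the device (./app is a placeholder executable name):

  ./app -pc_type gamg -mat_type aijcusparse -vec_type cuda -log_view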

Re: [petsc-users] GPU implementation of serial smoothers

2023-01-10 Thread Barry Smith
> On Jan 10, 2023, at 2:19 PM, Mark Lohry wrote:
>
>> Is DILU a point-block method? We have -pc_type pbjacobi (and vpbjacobi if the node size is not uniform). These are good choices for scale-resolving CFD on GPUs.
>
> I was hoping you'd know :) pbjacobi is underperforming ilu by a

Re: [petsc-users] GPU implementation of serial smoothers

2023-01-10 Thread Mark Lohry
> Is DILU a point-block method? We have -pc_type pbjacobi (and vpbjacobi if the node size is not uniform). These are good choices for scale-resolving CFD on GPUs.

I was hoping you'd know :) pbjacobi is underperforming ilu by a pretty wide margin on some of the systems I'm looking at. We

Re: [petsc-users] GPU implementation of serial smoothers

2023-01-10 Thread Barry Smith
We don't have colored smoothers currently in PETSc.

> On Jan 10, 2023, at 12:56 PM, Jed Brown wrote:
>
> Is DILU a point-block method? We have -pc_type pbjacobi (and vpbjacobi if the node size is not uniform). These are good choices for scale-resolving CFD on GPUs.
>
> Mark Lohry

Re: [petsc-users] GPU implementation of serial smoothers

2023-01-10 Thread Jed Brown
Is DILU a point-block method? We have -pc_type pbjacobi (and vpbjacobi if the node size is not uniform). These are good choices for scale-resolving CFD on GPUs.

Mark Lohry writes:

> I'm running GAMG with CUDA, and I'm wondering how the nominally serial smoother algorithms are implemented on
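
A minimal sketch of the two options named here, assuming CUDA matrix/vector types; pbjacobi expects a uniform block size on the matrix, while vpbjacobi can take variable block sizes (e.g. via MatSetVariableBlockSizes):

  # uniform node size
  -pc_type pbjacobi -mat_type aijcusparse -vec_type cuda

  # non-uniform node size
  -pc_type vpbjacobi -mat_type aijcusparse -vec_type cuda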

[petsc-users] GPU implementation of serial smoothers

2023-01-10 Thread Mark Lohry
I'm running GAMG with CUDA, and I'm wondering how the nominally serial smoother algorithms are implemented on GPU? Specifically SOR/GS and ILU(0) -- in e.g. AMGx these are applied by first creating a coloring, and the smoother passes are done color by color. Is this how it's done in PETSc AMG?
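
For reference, a minimal sketch of the kind of run being described (GAMG with the CUDA backend); the option names are from PETSc's documented interface and ./app is a placeholder executable name:

  ./app -pc_type gamg -mat_type aijcusparse -vec_type cuda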