Re: [petsc-users] Matvecs and KSPSolves with multiple vectors

2023-12-14 Thread Pierre Jolivet

> On 14 Dec 2023, at 11:45 PM, Sreeram R Venkat  wrote:
> 
> Thanks, I will try to create a minimal reproducible example. This may take me 
> some time though, as I need to figure out how to extract only the relevant 
> parts (the full program this solve is used in is getting quite complex).

You could just do -ksp_view_mat binary:Amat.bin -ksp_view_pmat binary:Pmat.bin 
-ksp_view_rhs binary:rhs.bin and send me those three files (I’m guessing you 
are using double-precision scalars with 32-bit PetscInt).
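For reference, a minimal sketch of a reproducer that reads those three files back in (file names as in the options above; the rest of the program is an assumption, not your code):

#include <petscksp.h>

int main(int argc, char **argv)
{
  Mat         A, P;
  Vec         b;
  PetscViewer viewer;

  PetscCall(PetscInitialize(&argc, &argv, NULL, NULL));
  /* operator written by -ksp_view_mat binary:Amat.bin */
  PetscCall(PetscViewerBinaryOpen(PETSC_COMM_WORLD, "Amat.bin", FILE_MODE_READ, &viewer));
  PetscCall(MatCreate(PETSC_COMM_WORLD, &A));
  PetscCall(MatLoad(A, viewer));
  PetscCall(PetscViewerDestroy(&viewer));
  /* preconditioning matrix written by -ksp_view_pmat binary:Pmat.bin */
  PetscCall(PetscViewerBinaryOpen(PETSC_COMM_WORLD, "Pmat.bin", FILE_MODE_READ, &viewer));
  PetscCall(MatCreate(PETSC_COMM_WORLD, &P));
  PetscCall(MatLoad(P, viewer));
  PetscCall(PetscViewerDestroy(&viewer));
  /* right-hand side written by -ksp_view_rhs binary:rhs.bin */
  PetscCall(PetscViewerBinaryOpen(PETSC_COMM_WORLD, "rhs.bin", FILE_MODE_READ, &viewer));
  PetscCall(VecCreate(PETSC_COMM_WORLD, &b));
  PetscCall(VecLoad(b, viewer));
  PetscCall(PetscViewerDestroy(&viewer));

  PetscCall(MatDestroy(&A));
  PetscCall(MatDestroy(&P));
  PetscCall(VecDestroy(&b));
  PetscCall(PetscFinalize());
  return 0;
}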

> I'll also try out some of the BoomerAMG options to see if that helps.

These should work (this is where all “PCMatApply()-ready” PCs are being tested): 
https://petsc.org/release/src/ksp/ksp/tutorials/ex79.c.html#line215
You can see it’s also testing PCHYPRE + KSPHPDDM on device (but not with HIP).
I’m aware the performance may not be optimal (see your comment about 
host/device copies). I have funding to hire someone to work on this, but: a) I 
need to find the right engineer/post-doc, and b) I currently don’t have good 
use cases (of course, I could generate a synthetic benchmark, for science).
So even if you send me the three Mat, an MWE would be appreciated if 
KSPMatSolve() is performance-critical for you (see point b) above).

Thanks,
Pierre

> Thanks,
> Sreeram
> 
> On Thu, Dec 14, 2023, 1:12 PM Pierre Jolivet wrote:
>> 
>> 
>>> On 14 Dec 2023, at 8:02 PM, Sreeram R Venkat wrote:
>>> 
>>> Hello Pierre,
>>> 
>>> Thank you for your reply. I tried out the HPDDM CG as you said, and it 
>>> seems to be doing the batched solves, but the KSP is not converging due to 
>>> a NaN or Inf being generated. I also noticed there are a lot of 
>>> host-to-device and device-to-host copies of the matrices (the non-batched 
>>> KSP solve did not have any memcopies). I have attached dump.0 again. Could 
>>> you please take a look?
>> 
>> Yes, but you’d need to send me something I can run with your set of options 
>> (if you are more confident doing this in private, you can remove the list 
>> from c/c).
>> Not all BoomerAMG smoothers handle blocks of right-hand sides, and there is 
>> not much error checking, so instead of erroring out, this may be the reason 
>> why you are getting garbage.
>> 
>> Thanks,
>> Pierre
>> 
>>> Thanks,
>>> Sreeram
>>> 
>>> On Thu, Dec 14, 2023 at 12:42 AM Pierre Jolivet wrote:
 Hello Sreeram,
 KSPCG (PETSc implementation of CG) does not handle solves with multiple 
 columns at once.
 There is only a single native PETSc KSP implementation which handles 
 solves with multiple columns at once: KSPPREONLY.
 If you use --download-hpddm, you can use a CG (or GMRES, or more advanced 
 methods) implementation which handles solves with multiple columns at once 
 (via -ksp_type hpddm -ksp_hpddm_type cg or KSPSetType(ksp, KSPHPDDM); 
 KSPHPDDMSetType(ksp, KSP_HPDDM_TYPE_CG);).
 I’m the main author of HPDDM, there is preliminary support for device 
 matrices, but if it’s not working as intended/not faster than column by 
 column, I’d be happy to have a deeper look (maybe in private), because 
 most (if not all) of my users interested in (pseudo-)block Krylov solvers 
 (i.e., solvers that treat right-hand sides in a single go) are using plain 
 host matrices.
 
 Thanks,
 Pierre
 
 PS: you could have a look at 
 https://www.sciencedirect.com/science/article/abs/pii/S089812212155 to 
 understand the philosophy behind block iterative methods in PETSc (and in 
 HPDDM), src/mat/tests/ex237.c, the benchmark I mentioned earlier, was 
 developed in the context of this paper to produce Figures 2-3. Note that 
 this paper is now slightly outdated, since then, PCHYPRE and PCMG (among 
 others) have been made “PCMatApply()-ready”.
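As an illustration only (not part of the original exchange; the helper name and the dense B/X layout are assumptions), a minimal sketch of the API route described above, solving for a whole block of right-hand sides at once:

#include <petscksp.h>

/* sketch: solve A X = B for a block of right-hand sides in one go, using the
   HPDDM CG implementation (requires PETSc configured with --download-hpddm) */
static PetscErrorCode SolveBlock(Mat A, Mat B, Mat X)
{
  KSP ksp;

  PetscFunctionBeginUser;
  PetscCall(KSPCreate(PetscObjectComm((PetscObject)A), &ksp));
  PetscCall(KSPSetOperators(ksp, A, A));
  PetscCall(KSPSetType(ksp, KSPHPDDM));               /* -ksp_type hpddm */
  PetscCall(KSPHPDDMSetType(ksp, KSP_HPDDM_TYPE_CG)); /* -ksp_hpddm_type cg */
  PetscCall(KSPSetFromOptions(ksp));
  PetscCall(KSPMatSolve(ksp, B, X)); /* B and X are dense Mat, one column per right-hand side */
  PetscCall(KSPDestroy(&ksp));
  PetscFunctionReturn(PETSC_SUCCESS);
}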
 
> On 13 Dec 2023, at 11:05 PM, Sreeram R Venkat wrote:
> 
> Hello Pierre,
> 
> I am trying out the KSPMatSolve with the BoomerAMG preconditioner. 
> However, I am noticing that it is still solving column by column (this is 
> stated explicitly in the info dump attached). I looked at the code for 
> KSPMatSolve_Private() and saw that as long as ksp->ops->matsolve is true, 
> it should do the batched solve, though I'm not sure where that gets set. 
> 
> I am using the options -pc_type hypre -pc_hypre_type boomeramg when 
> running the code.
> 
> Can you please help me with this?
> 
> Thanks,
> Sreeram
> 
> 
> On Thu, Dec 7, 2023 at 4:04 PM Mark Adams wrote:
>> N.B., AMGX interface is a bit experimental.
>> Mark
>> 
>> On Thu, Dec 7, 2023 at 4:11 PM Sreeram R Venkat wrote:
>>> Oh, in that case I will try out BoomerAMG. Getting AMGX to build 
>>> correctly was also tricky so hopefully the HYPRE build will be easier.

[petsc-users] PETSc configuration to solve Poisson equation on a 2D cartesian grid of points with nVidia GPUs (CUDA)

2023-12-14 Thread Vittorio Sciortino



Dear PETSc developers,

My name is Vittorio Sciortino; I am a PhD student in Italy and I am 
really curious about the applications and possibilities of your 
library. I would like to ask you two questions about PETSc.


My case study consists of the development of a 2D electrostatic Particle 
In Cell code which simulates a plasma interacting with the shaped 
surface of adjacent divertor mono-blocks.
This type of scenario requires solving the electrostatic Poisson 
equation on the whole set of grid nodes (a Cartesian grid), applying some 
boundary conditions.
Currently, we are using the KSPSolve routine with the GMRES 
iterative method in conjunction with hypre (used as preconditioner).
Some boundary conditions are necessary for our specific problem 
(Dirichlet and Neumann conditions on specific lines of points).
I have two small questions about the possibilities offered by your 
library, which is very interesting:


1. Are we using the best possible solver/preconditioner pair for our problem?

2. Currently, PETSc is compiled with OpenMP parallelization and the 
iterative method is executed on the CPU.
Is it possible to configure the build of the library so that these 
iterations run on an NVIDIA GPU? Which compilation options do you 
suggest? (A possible starting point is sketched below.)
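For context, one possible starting point, as a sketch only (the executable name and process count are placeholders; exact options depend on the machine and on how PETSc/hypre were built): configure PETSc with CUDA support and select GPU-backed Vec/Mat types at run time, e.g.

./configure --with-cuda --download-hypre   (plus the usual compiler/MPI options)
mpiexec -n 4 ./my_app -ksp_type gmres -pc_type hypre -vec_type cuda -mat_type aijcusparse

The -vec_type/-mat_type options take effect when the application sets its Vec/Mat types from options (e.g. via VecSetFromOptions()/MatSetFromOptions() or a DM).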


Thank you in advance.
Greetings,
Vittorio Sciortino
PhD student in Physics
Bari, Italy

Recently, I sent a subscription request to the users mailing list using 
another e-mail address, because this one could be deactivated in two or three 
months. Private email: vsciortino.phdcou...@gmail.com

--
Vittorio Sciortino

Support university education and research with your "5 per mille" to the 
University of Bari.
Sign the box "Funding of scientific research and of the University",

indicating the fiscal code 80002170720.

Your contribution can make a difference: today more than ever!



Re: [petsc-users] Matvecs and KSPSolves with multiple vectors

2023-12-14 Thread Sreeram R Venkat
Thanks, I will try to create a minimal reproducible example. This may take
me some time though, as I need to figure out how to extract only the
relevant parts (the full program this solve is used in is getting quite
complex).

I'll also try out some of the BoomerAMG options to see if that helps.

Thanks,
Sreeram

On Thu, Dec 14, 2023, 1:12 PM Pierre Jolivet  wrote:

>
>
> On 14 Dec 2023, at 8:02 PM, Sreeram R Venkat  wrote:
>
> Hello Pierre,
>
> Thank you for your reply. I tried out the HPDDM CG as you said, and it
> seems to be doing the batched solves, but the KSP is not converging due to
> a NaN or Inf being generated. I also noticed there are a lot of
> host-to-device and device-to-host copies of the matrices (the non-batched
> KSP solve did not have any memcopies). I have attached dump.0 again. Could
> you please take a look?
>
>
> Yes, but you’d need to send me something I can run with your set of
> options (if you are more confident doing this in private, you can remove
> the list from c/c).
> Not all BoomerAMG smoothers handle blocks of right-hand sides, and there
> is not much error checking, so instead of erroring out, this may be the
> reason why you are getting garbage.
>
> Thanks,
> Pierre
>
> Thanks,
> Sreeram
>
> On Thu, Dec 14, 2023 at 12:42 AM Pierre Jolivet  wrote:
>
>> Hello Sreeram,
>> KSPCG (PETSc implementation of CG) does not handle solves with multiple
>> columns at once.
>> There is only a single native PETSc KSP implementation which handles
>> solves with multiple columns at once: KSPPREONLY.
>> If you use --download-hpddm, you can use a CG (or GMRES, or more advanced
>> methods) implementation which handles solves with multiple columns at once
>> (via -ksp_type hpddm -ksp_hpddm_type cg or KSPSetType(ksp, KSPHPDDM);
>> KSPHPDDMSetType(ksp, KSP_HPDDM_TYPE_CG);).
>> I’m the main author of HPDDM, there is preliminary support for device
>> matrices, but if it’s not working as intended/not faster than column by
>> column, I’d be happy to have a deeper look (maybe in private), because most
>> (if not all) of my users interested in (pseudo-)block Krylov solvers (i.e.,
>> solvers that treat right-hand sides in a single go) are using plain host
>> matrices.
>>
>> Thanks,
>> Pierre
>>
>> PS: you could have a look at
>> https://www.sciencedirect.com/science/article/abs/pii/S089812212155 to
>> understand the philosophy behind block iterative methods in PETSc (and in
>> HPDDM), src/mat/tests/ex237.c, the benchmark I mentioned earlier, was
>> developed in the context of this paper to produce Figures 2-3. Note that
>> this paper is now slightly outdated, since then, PCHYPRE and PCMG (among
>> others) have been made “PCMatApply()-ready”.
>>
>> On 13 Dec 2023, at 11:05 PM, Sreeram R Venkat 
>> wrote:
>>
>> Hello Pierre,
>>
>> I am trying out the KSPMatSolve with the BoomerAMG preconditioner.
>> However, I am noticing that it is still solving column by column (this is
>> stated explicitly in the info dump attached). I looked at the code for
>> KSPMatSolve_Private() and saw that as long as ksp->ops->matsolve is
>> true, it should do the batched solve, though I'm not sure where that gets
>> set.
>>
>> I am using the options -pc_type hypre -pc_hypre_type boomeramg when
>> running the code.
>>
>> Can you please help me with this?
>>
>> Thanks,
>> Sreeram
>>
>>
>> On Thu, Dec 7, 2023 at 4:04 PM Mark Adams  wrote:
>>
>>> N.B., AMGX interface is a bit experimental.
>>> Mark
>>>
>>> On Thu, Dec 7, 2023 at 4:11 PM Sreeram R Venkat 
>>> wrote:
>>>
 Oh, in that case I will try out BoomerAMG. Getting AMGX to build
 correctly was also tricky so hopefully the HYPRE build will be easier.

 Thanks,
 Sreeram

 On Thu, Dec 7, 2023, 3:03 PM Pierre Jolivet  wrote:

>
>
> On 7 Dec 2023, at 9:37 PM, Sreeram R Venkat 
> wrote:
>
> Thank you Barry and Pierre; I will proceed with the first option.
>
> I want to use the AMGX preconditioner for the KSP. I will try it out
> and see how it performs.
>
>
> Just FYI, AMGX does not handle systems with multiple RHS, and thus has
> no PCMatApply() implementation.
> BoomerAMG does, and there is a PCMatApply_HYPRE_BoomerAMG()
> implementation.
> But let us know if you need assistance figuring things out.
>
> Thanks,
> Pierre
>
> Thanks,
> Sreeram
>
> On Thu, Dec 7, 2023 at 2:02 PM Pierre Jolivet  wrote:
>
>> To expand on Barry’s answer, we have observed repeatedly that
>> MatMatMult with MatAIJ performs better than MatMult with MatMAIJ, you can
>> reproduce this on your own with
>> https://petsc.org/release/src/mat/tests/ex237.c.html.
>> Also, I’m guessing you are using some sort of preconditioner within
>> your KSP.
>> Not all are “KSPMatSolve-ready”, i.e., they may treat blocks of
>> right-hand sides column by column, which is very inefficient.
>> You could run your code with -info dump and send us dump.0 to see

Re: [petsc-users] Call to DMSetMatrixPreallocateSkip not changing allocation behavior

2023-12-14 Thread Jed Brown
I had a one-character typo in the diff above. This MR to release should work 
now.

https://gitlab.com/petsc/petsc/-/merge_requests/7120

Jed Brown  writes:

> 17 GB for a 1D DMDA, wow. :-)
>
> Could you try applying this diff to make it work for DMDA (it's currently 
> handled by DMPlex)?
>
> diff --git i/src/dm/impls/da/fdda.c w/src/dm/impls/da/fdda.c
> index cad4d926504..bd2a3bda635 100644
> --- i/src/dm/impls/da/fdda.c
> +++ w/src/dm/impls/da/fdda.c
> @@ -675,19 +675,21 @@ PetscErrorCode DMCreateMatrix_DA(DM da, Mat *J)
>      specialized setting routines depend only on the particular preallocation
>      details of the matrix, not the type itself.
>   */
> -  PetscCall(PetscObjectQueryFunction((PetscObject)A, "MatMPIAIJSetPreallocation_C", &aij));
> -  if (!aij) PetscCall(PetscObjectQueryFunction((PetscObject)A, "MatSeqAIJSetPreallocation_C", &aij));
> -  if (!aij) {
> -    PetscCall(PetscObjectQueryFunction((PetscObject)A, "MatMPIBAIJSetPreallocation_C", &baij));
> -    if (!baij) PetscCall(PetscObjectQueryFunction((PetscObject)A, "MatSeqBAIJSetPreallocation_C", &baij));
> -    if (!baij) {
> -      PetscCall(PetscObjectQueryFunction((PetscObject)A, "MatMPISBAIJSetPreallocation_C", &sbaij));
> -      if (!sbaij) PetscCall(PetscObjectQueryFunction((PetscObject)A, "MatSeqSBAIJSetPreallocation_C", &sbaij));
> -      if (!sbaij) {
> -        PetscCall(PetscObjectQueryFunction((PetscObject)A, "MatMPISELLSetPreallocation_C", &sell));
> -        if (!sell) PetscCall(PetscObjectQueryFunction((PetscObject)A, "MatSeqSELLSetPreallocation_C", &sell));
> +  if (!dm->prealloc_skip) { // Flag is likely set when user intends to use MatSetPreallocationCOO()
> +    PetscCall(PetscObjectQueryFunction((PetscObject)A, "MatMPIAIJSetPreallocation_C", &aij));
> +    if (!aij) PetscCall(PetscObjectQueryFunction((PetscObject)A, "MatSeqAIJSetPreallocation_C", &aij));
> +    if (!aij) {
> +      PetscCall(PetscObjectQueryFunction((PetscObject)A, "MatMPIBAIJSetPreallocation_C", &baij));
> +      if (!baij) PetscCall(PetscObjectQueryFunction((PetscObject)A, "MatSeqBAIJSetPreallocation_C", &baij));
> +      if (!baij) {
> +        PetscCall(PetscObjectQueryFunction((PetscObject)A, "MatMPISBAIJSetPreallocation_C", &sbaij));
> +        if (!sbaij) PetscCall(PetscObjectQueryFunction((PetscObject)A, "MatSeqSBAIJSetPreallocation_C", &sbaij));
> +        if (!sbaij) {
> +          PetscCall(PetscObjectQueryFunction((PetscObject)A, "MatMPISELLSetPreallocation_C", &sell));
> +          if (!sell) PetscCall(PetscObjectQueryFunction((PetscObject)A, "MatSeqSELLSetPreallocation_C", &sell));
> +        }
> +        if (!sell) PetscCall(PetscObjectQueryFunction((PetscObject)A, "MatISSetPreallocation_C", &is));
>        }
> -      if (!sell) PetscCall(PetscObjectQueryFunction((PetscObject)A, "MatISSetPreallocation_C", &is));
>      }
>    }
>    if (aij) {
>
>
> "Fackler, Philip via petsc-users"  writes:
>
>> I'm using the following sequence of functions related to the Jacobian matrix:
>>
>> DMDACreate1d(..., &da);
>> DMSetFromOptions(da);
>> DMSetUp(da);
>> DMSetMatType(da, MATAIJKOKKOS);
>> DMSetMatrixPreallocateSkip(da, PETSC_TRUE);
>> Mat J;
>> DMCreateMatrix(da, &J);
>> MatSetPreallocationCOO(J, ...);
>>
>> I recently added the call to DMSetMatrixPreallocateSkip, hoping the 
>> allocation would be delayed to MatSetPreallocationCOO, and that it would 
>> require less memory. The 
>> documentation
>>  says that the data structures will not be preallocated. The following data 
>> from heaptrack shows that the allocation is still happening in the call to 
>> DMCreateMatrix.
>>
>> [attached image: heaptrack allocation trace]
>>
>> Can someone help me understand this?
>>
>> Thanks,
>>
>> Philip Fackler
>> Research Software Engineer, Application Engineering Group
>> Advanced Computing Systems Research Section
>> Computer Science and Mathematics Division
>> Oak Ridge National Laboratory


Re: [petsc-users] Call to DMSetMatrixPreallocateSkip not changing allocation behavior

2023-12-14 Thread Matthew Knepley
On Thu, Dec 14, 2023 at 2:06 PM Fackler, Philip via petsc-users <
petsc-users@mcs.anl.gov> wrote:

> I'm using the following sequence of functions related to the Jacobian
> matrix:
>
> DMDACreate1d(..., &da);
> DMSetFromOptions(da);
> DMSetUp(da);
> DMSetMatType(da, MATAIJKOKKOS);
> DMSetMatrixPreallocateSkip(da, PETSC_TRUE);
> Mat J;
> DMCreateMatrix(da, &J);
> MatSetPreallocationCOO(J, ...);
>
> I recently added the call to DMSetMatrixPreallocateSkip, hoping the
> allocation would be delayed to MatSetPreallocationCOO, and that it would
> require less memory. The documentation
>  says
> that the data structures will not be preallocated.
>

You are completely correct. DMDA is just ignoring this flag. We will fix it.

  Thanks for catching this.

 Matt


> The following data from heaptrack shows that the allocation is still
> happening in the call to DMCreateMatrix.
>
>
> Can someone help me understand this?
>
> Thanks,
>
>
> *Philip Fackler *
> Research Software Engineer, Application Engineering Group
> Advanced Computing Systems Research Section
> Computer Science and Mathematics Division
> *Oak Ridge National Laboratory*
>


-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/ 


Re: [petsc-users] Call to DMSetMatrixPreallocateSkip not changing allocation behavior

2023-12-14 Thread Jed Brown
17 GB for a 1D DMDA, wow. :-)

Could you try applying this diff to make it work for DMDA (it's currently 
handled by DMPlex)?

diff --git i/src/dm/impls/da/fdda.c w/src/dm/impls/da/fdda.c
index cad4d926504..bd2a3bda635 100644
--- i/src/dm/impls/da/fdda.c
+++ w/src/dm/impls/da/fdda.c
@@ -675,19 +675,21 @@ PetscErrorCode DMCreateMatrix_DA(DM da, Mat *J)
      specialized setting routines depend only on the particular preallocation
      details of the matrix, not the type itself.
   */
-  PetscCall(PetscObjectQueryFunction((PetscObject)A, "MatMPIAIJSetPreallocation_C", &aij));
-  if (!aij) PetscCall(PetscObjectQueryFunction((PetscObject)A, "MatSeqAIJSetPreallocation_C", &aij));
-  if (!aij) {
-    PetscCall(PetscObjectQueryFunction((PetscObject)A, "MatMPIBAIJSetPreallocation_C", &baij));
-    if (!baij) PetscCall(PetscObjectQueryFunction((PetscObject)A, "MatSeqBAIJSetPreallocation_C", &baij));
-    if (!baij) {
-      PetscCall(PetscObjectQueryFunction((PetscObject)A, "MatMPISBAIJSetPreallocation_C", &sbaij));
-      if (!sbaij) PetscCall(PetscObjectQueryFunction((PetscObject)A, "MatSeqSBAIJSetPreallocation_C", &sbaij));
-      if (!sbaij) {
-        PetscCall(PetscObjectQueryFunction((PetscObject)A, "MatMPISELLSetPreallocation_C", &sell));
-        if (!sell) PetscCall(PetscObjectQueryFunction((PetscObject)A, "MatSeqSELLSetPreallocation_C", &sell));
+  if (!dm->prealloc_skip) { // Flag is likely set when user intends to use MatSetPreallocationCOO()
+    PetscCall(PetscObjectQueryFunction((PetscObject)A, "MatMPIAIJSetPreallocation_C", &aij));
+    if (!aij) PetscCall(PetscObjectQueryFunction((PetscObject)A, "MatSeqAIJSetPreallocation_C", &aij));
+    if (!aij) {
+      PetscCall(PetscObjectQueryFunction((PetscObject)A, "MatMPIBAIJSetPreallocation_C", &baij));
+      if (!baij) PetscCall(PetscObjectQueryFunction((PetscObject)A, "MatSeqBAIJSetPreallocation_C", &baij));
+      if (!baij) {
+        PetscCall(PetscObjectQueryFunction((PetscObject)A, "MatMPISBAIJSetPreallocation_C", &sbaij));
+        if (!sbaij) PetscCall(PetscObjectQueryFunction((PetscObject)A, "MatSeqSBAIJSetPreallocation_C", &sbaij));
+        if (!sbaij) {
+          PetscCall(PetscObjectQueryFunction((PetscObject)A, "MatMPISELLSetPreallocation_C", &sell));
+          if (!sell) PetscCall(PetscObjectQueryFunction((PetscObject)A, "MatSeqSELLSetPreallocation_C", &sell));
+        }
+        if (!sell) PetscCall(PetscObjectQueryFunction((PetscObject)A, "MatISSetPreallocation_C", &is));
       }
-      if (!sell) PetscCall(PetscObjectQueryFunction((PetscObject)A, "MatISSetPreallocation_C", &is));
     }
   }
   if (aij) {


"Fackler, Philip via petsc-users"  writes:

> I'm using the following sequence of functions related to the Jacobian matrix:
>
> DMDACreate1d(..., &da);
> DMSetFromOptions(da);
> DMSetUp(da);
> DMSetMatType(da, MATAIJKOKKOS);
> DMSetMatrixPreallocateSkip(da, PETSC_TRUE);
> Mat J;
> DMCreateMatrix(da, &J);
> MatSetPreallocationCOO(J, ...);
>
> I recently added the call to DMSetMatrixPreallocateSkip, hoping the 
> allocation would be delayed to MatSetPreallocationCOO, and that it would 
> require less memory. The 
> documentation
>  says that the data structures will not be preallocated. The following data 
> from heaptrack shows that the allocation is still happening in the call to 
> DMCreateMatrix.
>
> [attached image: heaptrack allocation trace]
>
> Can someone help me understand this?
>
> Thanks,
>
> Philip Fackler
> Research Software Engineer, Application Engineering Group
> Advanced Computing Systems Research Section
> Computer Science and Mathematics Division
> Oak Ridge National Laboratory


Re: [petsc-users] Matvecs and KSPSolves with multiple vectors

2023-12-14 Thread Pierre Jolivet


> On 14 Dec 2023, at 8:02 PM, Sreeram R Venkat  wrote:
> 
> Hello Pierre,
> 
> Thank you for your reply. I tried out the HPDDM CG as you said, and it seems 
> to be doing the batched solves, but the KSP is not converging due to a NaN or 
> Inf being generated. I also noticed there are a lot of host-to-device and 
> device-to-host copies of the matrices (the non-batched KSP solve did not have 
> any memcopies). I have attached dump.0 again. Could you please take a look?

Yes, but you’d need to send me something I can run with your set of options (if 
you are more confident doing this in private, you can remove the list from c/c).
Not all BoomerAMG smoothers handle blocks of right-hand sides, and there is not 
much error checking, so instead of erroring out, this may be the reason why you 
are getting garbage.

Thanks,
Pierre

> Thanks,
> Sreeram
> 
> On Thu, Dec 14, 2023 at 12:42 AM Pierre Jolivet wrote:
>> Hello Sreeram,
>> KSPCG (PETSc implementation of CG) does not handle solves with multiple 
>> columns at once.
>> There is only a single native PETSc KSP implementation which handles solves 
>> with multiple columns at once: KSPPREONLY.
>> If you use --download-hpddm, you can use a CG (or GMRES, or more advanced 
>> methods) implementation which handles solves with multiple columns at once 
>> (via -ksp_type hpddm -ksp_hpddm_type cg or KSPSetType(ksp, KSPHPDDM); 
>> KSPHPDDMSetType(ksp, KSP_HPDDM_TYPE_CG);).
>> I’m the main author of HPDDM, there is preliminary support for device 
>> matrices, but if it’s not working as intended/not faster than column by 
>> column, I’d be happy to have a deeper look (maybe in private), because most 
>> (if not all) of my users interested in (pseudo-)block Krylov solvers (i.e., 
>> solvers that treat right-hand sides in a single go) are using plain host 
>> matrices.
>> 
>> Thanks,
>> Pierre
>> 
>> PS: you could have a look at 
>> https://www.sciencedirect.com/science/article/abs/pii/S089812212155 to 
>> understand the philosophy behind block iterative methods in PETSc (and in 
>> HPDDM), src/mat/tests/ex237.c, the benchmark I mentioned earlier, was 
>> developed in the context of this paper to produce Figures 2-3. Note that 
>> this paper is now slightly outdated, since then, PCHYPRE and PCMG (among 
>> others) have been made “PCMatApply()-ready”.
>> 
>>> On 13 Dec 2023, at 11:05 PM, Sreeram R Venkat wrote:
>>> 
>>> Hello Pierre,
>>> 
>>> I am trying out the KSPMatSolve with the BoomerAMG preconditioner. However, 
>>> I am noticing that it is still solving column by column (this is stated 
>>> explicitly in the info dump attached). I looked at the code for 
>>> KSPMatSolve_Private() and saw that as long as ksp->ops->matsolve is true, 
>>> it should do the batched solve, though I'm not sure where that gets set. 
>>> 
>>> I am using the options -pc_type hypre -pc_hypre_type boomeramg when running 
>>> the code.
>>> 
>>> Can you please help me with this?
>>> 
>>> Thanks,
>>> Sreeram
>>> 
>>> 
>>> On Thu, Dec 7, 2023 at 4:04 PM Mark Adams wrote:
 N.B., AMGX interface is a bit experimental.
 Mark
 
 On Thu, Dec 7, 2023 at 4:11 PM Sreeram R Venkat wrote:
> Oh, in that case I will try out BoomerAMG. Getting AMGX to build 
> correctly was also tricky so hopefully the HYPRE build will be easier.
> 
> Thanks,
> Sreeram
> 
> On Thu, Dec 7, 2023, 3:03 PM Pierre Jolivet wrote:
>> 
>> 
>>> On 7 Dec 2023, at 9:37 PM, Sreeram R Venkat wrote:
>>> 
>>> Thank you Barry and Pierre; I will proceed with the first option. 
>>> 
>>> I want to use the AMGX preconditioner for the KSP. I will try it out 
>>> and see how it performs.
>> 
>> Just FYI, AMGX does not handle systems with multiple RHS, and thus has 
>> no PCMatApply() implementation.
>> BoomerAMG does, and there is a PCMatApply_HYPRE_BoomerAMG() 
>> implementation.
>> But let us know if you need assistance figuring things out.
>> 
>> Thanks,
>> Pierre
>> 
>>> Thanks,
>>> Sreeram
>>> 
>>> On Thu, Dec 7, 2023 at 2:02 PM Pierre Jolivet wrote:
 To expand on Barry’s answer, we have observed repeatedly that 
 MatMatMult with MatAIJ performs better than MatMult with MatMAIJ, you 
 can reproduce this on your own with 
 https://petsc.org/release/src/mat/tests/ex237.c.html.
 Also, I’m guessing you are using some sort of preconditioner within 
 your KSP.
 Not all are “KSPMatSolve-ready”, i.e., they may treat blocks of 
 right-hand sides column by column, which is very inefficient.
 You could run your code with -info dump and send us dump.0 to see what 
 needs to be done on our end to make things more efficient, 

[petsc-users] Call to DMSetMatrixPreallocateSkip not changing allocation behavior

2023-12-14 Thread Fackler, Philip via petsc-users
I'm using the following sequence of functions related to the Jacobian matrix:

DMDACreate1d(..., &da);
DMSetFromOptions(da);
DMSetUp(da);
DMSetMatType(da, MATAIJKOKKOS);
DMSetMatrixPreallocateSkip(da, PETSC_TRUE);
Mat J;
DMCreateMatrix(da, &J);
MatSetPreallocationCOO(J, ...);
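For context, a minimal sketch of how the COO path is typically completed after this sequence (illustrative only: it assumes one degree of freedom per grid point and a simple tridiagonal stencil, not the actual application Jacobian; the helper name is hypothetical):

#include <petscdmda.h>

static PetscErrorCode BuildJacobianCOO(DM da, Mat J)
{
  PetscInt     xs, xm, M, k = 0;
  PetscInt    *coo_i, *coo_j;
  PetscScalar *v;

  PetscFunctionBeginUser;
  PetscCall(DMDAGetInfo(da, NULL, &M, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL));
  PetscCall(DMDAGetCorners(da, &xs, NULL, NULL, &xm, NULL, NULL));
  PetscCall(PetscMalloc2(3 * xm, &coo_i, 3 * xm, &coo_j));
  for (PetscInt i = xs; i < xs + xm; i++) {
    for (PetscInt d = -1; d <= 1; d++) {
      PetscInt col = i + d;
      coo_i[k] = i;
      coo_j[k] = (col >= 0 && col < M) ? col : -1; /* negative index => entry is ignored */
      k++;
    }
  }
  PetscCall(MatSetPreallocationCOO(J, k, coo_i, coo_j)); /* allocation is expected to happen here */
  PetscCall(PetscFree2(coo_i, coo_j));
  /* values can now be set (and reset repeatedly) with MatSetValuesCOO() */
  PetscCall(PetscCalloc1(3 * xm, &v));
  PetscCall(MatSetValuesCOO(J, v, INSERT_VALUES));
  PetscCall(PetscFree(v));
  PetscFunctionReturn(PETSC_SUCCESS);
}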

I recently added the call to DMSetMatrixPreallocateSkip, hoping the allocation 
would be delayed to MatSetPreallocationCOO, and that it would require less 
memory. The 
documentation
 says that the data structures will not be preallocated. The following data 
from heaptrack shows that the allocation is still happening in the call to 
DMCreateMatrix.

[attached image: heaptrack allocation trace]

Can someone help me understand this?

Thanks,

Philip Fackler
Research Software Engineer, Application Engineering Group
Advanced Computing Systems Research Section
Computer Science and Mathematics Division
Oak Ridge National Laboratory


Re: [petsc-users] PETSc 3.14 to PETSc 3.20: Different (slower) convergence for classical AMG (sequential and especially in parallel)

2023-12-14 Thread Mark Adams
On Thu, Dec 14, 2023 at 10:15 AM LEDAC Pierre  wrote:

> Hello Mark,
>
>
> Thanks for your answer. Indeed, I didn't see the information that
> classical AMG was not really supported:
>
>
>  -solver2_pc_gamg_type : Type of AMG method 
> (only
> 'agg' supported and useful) (one of) classical geo agg (PCGAMGSetType)
>
> We switched very recently from GAMG("agg") to GAMG("classical") for a weak
> scaling test up to 32000 cores, where we saw very good scalability with 
> GAMG("classical")
> compared to GAMG("agg"). But it was with PETSc 3.14...
>

AMG is sensitive to parameters.
What PDE and discretization are you solving?
For example, I recently optimized the Q2 Laplacian benchmark and found good
scaling with
-pc_gamg_threshold 0.04 -pc_gamg_threshold_scale .25
Hypre scaled well without optimization (see below),


>
> So today, we are going to upgrade to 3.20 and focus on GAMG("agg") or
> Hypre Classical AMG. We will see how it compares.
>

You might want to update to v3.20.2
That has some of my recent GAMG updates.


> May I ask you what is your point of view of the current state of the GPU
> versions of GAMG("agg") versus Hypre AMG Classical ?
>

Hypre is well supported, with several developers over decades, whereas I
really just maintain GAMG, plus some recent additions such as anisotropy
support.
But I build on the PETSc sparse linear algebra, which is well supported in
PETSc and hypre, and we have several good people doing that.

TL;DR
Both run the solve and matrix setup phase on the GPU.
Hypre puts the graph setup phase on the GPU, but this phase is 1) not well
suited to GPUs and 2) is amortized in most applications (just done once).
GAMG is easier to deal with because it is built-in and the interface to
hypre can be fragile with respect to GPUs (eg, if you use '-mat_type
hypre') in my experience.
If performance is critical and you have the time to put into it, hypre will
be a good option, and GAMG can be a backup.


>
> In fact, the reason of our move from 3.14 to 3.20 is to take advantage of
> all the progress in PETSc and Hypre on accelerated solvers/preconditioners
> during the last 2 years.
>
>
And I can give you advice on GAMG parameters, if you send me the output
with '-info :pc' (and 'grep GAMG').

Thanks,
Mark


> Greatly appreciate your help,
>
> Pierre LEDAC
> Commissariat à l’énergie atomique et aux énergies alternatives
> Centre de SACLAY
> DES/ISAS/DM2S/SGLS/LCAN
> Bâtiment 451 – point courrier n°43
> F-91191 Gif-sur-Yvette
> +33 1 69 08 04 03
> +33 6 83 42 05 79
> --
> *De :* Mark Adams 
> *Envoyé :* mercredi 13 décembre 2023 20:54:17
> *À :* LEDAC Pierre
> *Cc :* petsc-users@mcs.anl.gov; BRUNETON Adrien
> *Objet :* Re: [petsc-users] PETSc 3.14 to PETSc 3.20: Different (slower)
> convergence for classical AMG (sequential and especially in parallel)
>
> Hi Pierre,
>
> Sorry I missed this post and your issues were brought to my attention
> today.
>
> First, the classic version is not supported well. The postdoc that wrote
> the code is long gone and I don't know the code at all.
> It is really a reference implementation that someone could build on and is
> not meant for production.
> In 10 years you are the first user that has connected us.
>
> The hypre package is a very good AMG solver and it uses classical AMG as
> the main solver.
> I wrote GAMG ("agg") which is a smoothed aggregation AMG solver and is
> very different from classical.
> I would suggest you move to hypre or '-pc_gamg_type agg'.
>
> The coarsening was developed in this time frame and there was a lot of
> churn as a new strategy for aggressive coarsening did not work well for
> some users and I had to add the old method in and then made it the default
> (again).
> This change missed v3.20, but you can get the old aggressive strategy with
> '-pc_gamg_aggressive_square_graph'.
> Check with -options_left to check that it is being used.
>
> As far as your output (nice formatting, thank you), the coarse grid is
> smaller in the new code.
> rows=41, cols=41   |  rows=30, cols=30
> "square graph" should fix this.
>
> You can also try not using aggressive coarsening with:
> You could try '-pc_gamg_aggressive_coarsening 0'
>
> Let me know how it goes and let's try to get you into a more sustainable
> state ... I really try not to change this code but sometimes need to.
>
> Thanks,
> Mark
>
>
>
>
>
> On Mon, Oct 9, 2023 at 10:43 AM LEDAC Pierre  wrote:
>
>> Hello all,
>>
>>
>> I am struggling to find the same convergence in iterations when using
>> classical algebraic multigrid in my code with PETSc 3.20 compared to PETSc
>> 3.14.
>>
>>
>> I am using in order to solve a Poisson system:
>>
>> *-ksp_type cg -pc_type gamg -pc_gamg_type classical*
>>
>>
>> I read the different releases notes between 3.15 and 3.20:
>>
>> https://petsc.org/release/changes/317
>>
>> https://petsc.org/main/manualpages/PC/PCGAMGSetThreshold/
>>
>>
>> And have a look at the archive mailing list (especially 

Re: [petsc-users] PETSc 3.14 to PETSc 3.20: Different (slower) convergence for classical AMG (sequential and especially in parallel)

2023-12-14 Thread LEDAC Pierre
Hello Mark,


Thanks for your answer. Indeed, I didn't see the information that classical AMG 
was not really supported:


 -solver2_pc_gamg_type : Type of AMG method (only 
'agg' supported and useful) (one of) classical geo agg (PCGAMGSetType)

We switched very recently from GAMG("agg") to GAMG("classical") for a weak 
scaling test up to 32000 cores, where we saw very good scalability with 
GAMG("classical") compared to GAMG("agg"). But it was with PETSc 3.14...

So today, we are going to upgrade to 3.20 and focus on GAMG("agg") or Hypre 
Classical AMG. We will see how it compares.

May I ask you what is your point of view of the current state of the GPU 
versions of GAMG("agg") versus Hypre AMG Classical ?

In fact, the reason of our move from 3.14 to 3.20 is to take advantage of all 
the progress in PETSc and Hypre on accelerated solvers/preconditioners during 
the last 2 years.

Greatly appreciate your help,


Pierre LEDAC
Commissariat à l’énergie atomique et aux énergies alternatives
Centre de SACLAY
DES/ISAS/DM2S/SGLS/LCAN
Bâtiment 451 – point courrier n°43
F-91191 Gif-sur-Yvette
+33 1 69 08 04 03
+33 6 83 42 05 79

De : Mark Adams 
Envoyé : mercredi 13 décembre 2023 20:54:17
À : LEDAC Pierre
Cc : petsc-users@mcs.anl.gov; BRUNETON Adrien
Objet : Re: [petsc-users] PETSc 3.14 to PETSc 3.20: Different (slower) 
convergence for classical AMG (sequential and especially in parallel)

Hi Pierre,

Sorry I missed this post and your issues were brought to my attention today.

First, the classic version is not supported well. The postdoc that wrote the 
code is long gone and I don't know the code at all.
It is really a reference implementation that someone could build on and is not 
meant for production.
In 10 years you are the first user that has connected us.

The hypre package is a very good AMG solver and it uses classical AMG as the 
main solver.
I wrote GAMG ("agg") which is a smoothed aggregation AMG solver and is very 
different from classical.
I would suggest you move to hypre or '-pc_gamg_type agg'.

The coarsening was developed in this time frame and there was a lot of churn as 
a new strategy for aggressive coarsening did not work well for some users and I 
had to add the old method in and then made it the default (again).
This change missed v3.20, but you can get the old aggressive strategy with 
'-pc_gamg_aggressive_square_graph'.
Check with -options_left to check that it is being used.

As far as your output (nice formatting, thank you), the coarse grid is smaller 
in the new code.
rows=41, cols=41   |  rows=30, cols=30
"square graph" should fix this.

You can also try not using aggressive coarsening with:
You could try '-pc_gamg_aggressive_coarsening 0'

Let me know how it goes and let's try to get you into a more sustainable state 
... I really try not to change this code but sometimes need to.

Thanks,
Mark





On Mon, Oct 9, 2023 at 10:43 AM LEDAC Pierre wrote:

Hello all,


I am struggling to find the same convergence in iterations when using classical 
algebraic multigrid in my code with PETSc 3.20 compared to PETSc 3.14.


I am using in order to solve a Poisson system:

-ksp_type cg -pc_type gamg -pc_gamg_type classical


I read the different releases notes between 3.15 and 3.20:

https://petsc.org/release/changes/317

https://petsc.org/main/manualpages/PC/PCGAMGSetThreshold/


And have a look at the archive mailing list (especially this one: 
https://www.mail-archive.com/petsc-users@mcs.anl.gov/msg46688.html)

so I added some other options to try to have the same behaviour than PETSc 3.14:


-ksp_type cg -pc_type gamg -pc_gamg_type classical -mg_levels_pc_type sor 
-pc_gamg_threshold 0.


It improves the convergence, but there is still a difference (26 vs 18 
iterations).

On another of my test cases, the number of levels is also different (e.g. 6 vs 
4); here it is the same, but with a different coarsening according to the 
output from the -ksp_view option.


The main point is that the convergence dramatically degrades in parallel on a 
third test case, so unfortunately I can't upgrade to PETSc 3.20 for now.


I am sending you the partial report (petsc_314_vs_petsc_320.ksp_view) produced 
with -ksp_view (left: PETSc 3.14, right: PETSc 3.20) and the configure/command 
line options used (in the petsc_XXX_petsc.TU files).


Could my issue be related to the following 3.18 changes? I have not tried the 
first one.

  *   Remove PCGAMGSetSymGraph() and -pc_gamg_sym_graph. The user should now 
indicate symmetry and structural symmetry using MatSetOption() (see the sketch 
after this list), and GAMG will symmetrize the graph if a symmetric option is 
not set.

  *   Change -pc_gamg_reuse_interpolation default from false to true.
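As a sketch of what the first change note asks for (illustrative only; the helper name is hypothetical and A is assumed to be an already-assembled operator known to be symmetric):

#include <petscmat.h>

/* sketch: mark the operator as symmetric so GAMG does not have to symmetrize the graph itself */
static PetscErrorCode MarkOperatorSymmetric(Mat A)
{
  PetscFunctionBeginUser;
  PetscCall(MatSetOption(A, MAT_SYMMETRIC, PETSC_TRUE));
  PetscCall(MatSetOption(A, MAT_SYMMETRY_ETERNAL, PETSC_TRUE)); /* keep the flag across later assemblies */
  PetscFunctionReturn(PETSC_SUCCESS);
}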


Any advice would be greatly appreciated,


Pierre LEDAC
Commissariat à l’énergie atomique et aux énergies alternatives
Centre de SACLAY
DES/ISAS/DM2S/SGLS/LCAN
Bâtiment 451 – 

Re: [petsc-users] 捕获 (Capture)

2023-12-14 Thread Matthew Knepley
On Thu, Dec 14, 2023 at 1:27 AM 291--- via petsc-users <
petsc-users@mcs.anl.gov> wrote:

> Dear SLEPc Developers,
>
> I am a student at Tongji University. Recently I have been trying to write a C++
> program for matrix solving, which requires importing the PETSc library that
> you have developed. However, a lot of errors occur in the .cpp file when I
> use #include  directly. I also tried to use extern "C", but it
> gives me the error in the picture below. Is there a good way to use the
> PETSc library in a C++ program? (I compile using CMake and my compiler is
> g++ (GCC) 4.8.5 20150623 (Red Hat 4.8.5-44).)
>
> My cmakelists.txt is:
>
> cmake_minimum_required(VERSION 3.1.0)
>
> set(CMAKE_INSTALL_RPATH_USE_LINK_PATH TRUE)
>
> set(PETSC $ENV{PETSC_DIR}/$ENV{PETSC_ARCH})
> set(SLEPC $ENV{SLEPC_DIR}/$ENV{PETSC_ARCH})
> set(ENV{PKG_CONFIG_PATH} ${PETSC}/lib/pkgconfig:${SLEPC}/lib/pkgconfig)
>
> set (CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -std=c++11")
> set (CMAKE_C_FLAGS "${CMAKE_C_FLAGS} -std=c99")
>
> project(test)
>
> add_executable(${PROJECT_NAME} eigen_test2.cpp)
> find_package(PkgConfig REQUIRED)
>
> pkg_search_module(PETSc REQUIRED IMPORTED_TARGET PETSc)
> target_link_libraries(${PROJECT_NAME} PkgConfig::PETSc)
>
> The testing code (eigen_test2.cpp) is:
>

First, get rid of the "extern C" in front of the headers.

  Thanks,

 Matt


> extern "C"{
>//#include 
>#include 
>#include 
>#include 
>#include 
> }
>
> int main(int argc,char **argv)
> {
>return 0;
> }
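For illustration, a minimal version of the test without the extern "C" wrapper (the header name below is an assumption, since the real names were stripped from the message; PETSc/SLEPc headers already contain the required C++ linkage guards):

// sketch: include the PETSc/SLEPc headers directly, with no extern "C" around them
#include <slepceps.h>   // assumed header; it pulls in the PETSc headers it needs

int main(int argc, char **argv)
{
  PetscCall(SlepcInitialize(&argc, &argv, NULL, NULL));
  // ... create the Mat/EPS objects and solve the eigenproblem here ...
  PetscCall(SlepcFinalize());
  return 0;
}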
>
>
>
> Best regards
>
> Weijie Xu
>
>

-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/