[petsc-users] Re: Mat created by DMStag cannot access ghost points

2022-05-31 Thread Ye Changqing
Dear all,

[BugReport.c] is a sample code, [BugReportParallel.output] is the output when 
BugReport is executed with mpiexec, and [BugReportSerial.output] is the output 
from serial execution.

Best,
Changqing


From: Dave May 
Sent: 31 May 2022 22:55
To: Ye Changqing 
Cc: petsc-users@mcs.anl.gov 
Subject: Re: [petsc-users] Mat created by DMStag cannot access ghost points



On Tue 31. May 2022 at 16:28, Ye Changqing <ye_changq...@outlook.com> wrote:
Dear developers of PETSc,

I encountered a problem when using the DMStag module. The program runs 
perfectly in serial, but errors are thrown in parallel (using mpiexec): some 
rows of the Mat cannot be accessed from the local process when looping over all 
elements of the DMStag. The DM object I use has only one DOF per element, so I 
could switch to the DMDA module easily, and the program is then back to normal.

Some snippets are below.

Initialise a DMStag object:
PetscCall(DMStagCreate2d(PETSC_COMM_WORLD, DM_BOUNDARY_NONE, DM_BOUNDARY_NONE, M, N, PETSC_DECIDE, PETSC_DECIDE, 0, 0, 1, DMSTAG_STENCIL_BOX, 1, NULL, NULL, &(s_ctx->dm_P)));
Created a Mat:
PetscCall(DMCreateMatrix(s_ctx->dm_P, A));
Loop:
PetscCall(DMStagGetCorners(s_ctx->dm_V, &startx, &starty, &startz, &nx, &ny, &nz, &extrax, &extray, &extraz));
for (ey = starty; ey < starty + ny; ++ey)
for (ex = startx; ex < startx + nx; ++ex)
{
...
PetscCall(DMStagMatSetValuesStencil(s_ctx->dm_P, *A, 2, &row[0], 2, &col[0], &val_A[0][0], ADD_VALUES));  // The traceback shows the problem is in here.
}

In addition to the code or MWE, please forward us the complete stack trace / 
error thrown to stdout.

Thanks,
Dave



Best,
Changqing

#include <petscdmstag.h>

#define ELEMENT DMSTAG_ELEMENT

int main(int argc, char **argv)
{
    PetscCall(PetscInitialize(&argc, &argv, (char *)0, NULL));
    PetscInt M = 4, N = 4;
    PetscScalar H_x = 1.0 / (double)M, H_y = 1.0 / (double)N, avg_kappa_e;
    DM dm_P;
    PetscCall(DMStagCreate2d(PETSC_COMM_WORLD, DM_BOUNDARY_NONE, DM_BOUNDARY_NONE, M, N, PETSC_DECIDE, PETSC_DECIDE, 0, 0, 1, DMSTAG_STENCIL_BOX, 1, NULL, NULL, &dm_P));
    // Only one DOF in the center of the element.
    PetscCall(DMSetUp(dm_P));
    Mat A;
    PetscCall(DMCreateMatrix(dm_P, &A));
    PetscInt ex, ey, ez, startx, starty, startz, nx, ny, nz, extrax, extray, extraz;
    PetscCall(DMStagGetCorners(dm_P, &startx, &starty, &startz, &nx, &ny, &nz, &extrax, &extray, &extraz));
    PetscScalar meas_elem = H_x * H_y;
    for (ey = starty; ey < starty + ny; ++ey)
        for (ex = startx; ex < startx + nx; ++ex)
        {
            if (ex >= 1)
            {
                DMStagStencil row[2], col[2];
                PetscScalar val_A[2][2];
                row[0] = (DMStagStencil){.i = ex - 1, .j = ey, .c = 0, .loc = ELEMENT};
                row[1] = (DMStagStencil){.i = ex, .j = ey, .c = 0, .loc = ELEMENT};
                col[0] = (DMStagStencil){.i = ex - 1, .j = ey, .c = 0, .loc = ELEMENT};
                col[1] = (DMStagStencil){.i = ex, .j = ey, .c = 0, .loc = ELEMENT};
                avg_kappa_e = 1.0;
                val_A[0][0] = H_y * H_y / meas_elem * avg_kappa_e;
                val_A[0][1] = -H_y * H_y / meas_elem * avg_kappa_e;
                val_A[1][0] = -H_y * H_y / meas_elem * avg_kappa_e;
                val_A[1][1] = H_y * H_y / meas_elem * avg_kappa_e;
                PetscCall(DMStagMatSetValuesStencil(dm_P, A, 2, &row[0], 2, &col[0], &val_A[0][0], ADD_VALUES));
            }
            if (ey >= 1)
            {
                DMStagStencil row[2], col[2];
                PetscScalar val_A[2][2];
                row[0] = (DMStagStencil){.i = ex, .j = ey - 1, .c = 0, .loc = ELEMENT};
                row[1] = (DMStagStencil){.i = ex, .j = ey, .c = 0, .loc = ELEMENT};
                col[0] = (DMStagStencil){.i = ex, .j = ey - 1, .c = 0, .loc = ELEMENT};
                col[1] = (DMStagStencil){.i = ex, .j = ey, .c = 0, .loc = ELEMENT};
                avg_kappa_e = 2.0;
                val_A[0][0] = H_x * H_x / meas_elem * avg_kappa_e;
                val_A[0][1] = -H_x * H_x / meas_elem * avg_kappa_e;
                val_A[1][0] = H_x * H_x / meas_elem * avg_kappa_e;
                val_A[1][1] = H_x * H_x / meas_elem * avg_kappa_e;
                PetscCall(DMStagMatSetValuesStencil(dm_P, A, 2, &row[0], 2, &col[0], &val_A[0][0], ADD_VALUES));
            }
        }
    PetscCall(MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY));
    PetscCall(MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY));
    PetscCall(PetscPrintf(PETSC_COMM_WORLD, "Successfully construct the Mat!\n"));
    PetscCall(MatDestroy(&A));
    PetscCall(DMDestroy(&dm_P));
    PetscCall(PetscFinalize());
    return 0;
}
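
(For anyone wanting to reproduce this: with PETSC_DIR/PETSC_ARCH set and a 
standard PETSc makefile providing the target, the reproducer can be built and 
run roughly as follows; the target name and rank count are only illustrative.)

  make BugReport
  ./BugReport               # serial run: reaches the final PetscPrintf
  mpiexec -n 2 ./BugReport  # parallel run: errors inside DMStagMatSetValuesStencil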

BugReportParallel.output
Description: BugReportParallel.output


BugReportSerial.output
Description: BugReportSerial.output


Re: [petsc-users] Accelerating eigenvalue computation / removing portion of spectrum

2022-05-31 Thread Jose E. Roman
Probably MUMPS is taking most of the time...

If the matrices are not too large, send them to my personal email and I will 
have a look.

Jose


> On 31 May 2022, at 22:13, Lucas Banting  wrote:
> 
> Thanks for the sharing the article. 
> For my application, I think using an interval region to exclude the unneeded 
> eigenvalues will still be faster than forming a larger constrained system. 
> Specifying an interval appears to run in a similar amount of time.
> 
> Lucas
> From: Jose E. Roman 
> Sent: Tuesday, May 31, 2022 2:08 PM
> To: Lucas Banting 
> Cc: PETSc 
> Subject: Re: [petsc-users] Accelerating eigenvalue computation / removing 
> portion of spectrum
>  
> Caution: This message was sent from outside the University of Manitoba.
> 
> 
> Please respond to the list also.
> 
> The problem with EPSSetDeflationSpace() is that it internally orthogonalizes 
> the vectors that you pass in, so it is not viable for thousands of vectors.
> 
> You can try implementing any of the alternative schemes described in 
> https://doi.org/10.1002/nla.307
> 
> Another thing you can try is to use a region for filtering, as explained in 
> section 2.6.4 of the users manual. Use a region that excludes -1.0 and you 
> will have more chances to get the wanted eigenvalues faster. But still 
> convergence may be slow.
> 
> Jose
> 
> 
> > On 31 May 2022, at 20:52, Lucas Banting  wrote:
> >
> > Thanks for the response Jose,
> >
> > There is an analytical solution for these modes actually, however there are 
> > thousands of them and they are all sparse.
> > I assume it is a non-trivial thing for EPSSetDeflationSpace() to take 
> > something like a MATAIJ as input?
> >
> > Lucas
> > From: Jose E. Roman 
> > Sent: Tuesday, May 31, 2022 1:11 PM
> > To: Lucas Banting 
> > Cc: petsc-users@mcs.anl.gov 
> > Subject: Re: [petsc-users] Accelerating eigenvalue computation / removing 
> > portion of spectrum
> >
> > Caution: This message was sent from outside the University of Manitoba.
> >
> >
> > If you know how to cheaply compute a basis of the nullspace of S, then you 
> > can try passing it to the solver via 
> > EPSSetDeflationSpace()https://slepc.upv.es/documentation/current/docs/manualpages/EPS/EPSSetDeflationSpace.html
> >
> > Jose
> >
> >
> > > On 31 May 2022, at 19:28, Lucas Banting  wrote:
> > >
> > > Hello,
> > >
> > > I have a general non hermitian eigenvalue problem arising from the 3D 
> > > helmholtz equation.
> > > The form of the helmholtz equaton is:
> > >
> > > (S - k^2M)v = lambda k^2 M v
> > >
> > > Where S is the stiffness/curl-curl matrix and M is the mass matrix 
> > > associated with edge elements used to discretize the problem.
> > > The helmholtz equation creates eigenvalues of -1.0, which I believe are 
> > > eigenvectors that are part of the null space of the curl-curl operator S.
> > >
> > > For my application, I would like to compute eigenvalues > -1.0, and avoid 
> > > computation of eigenvalues of -1.0.
> > > I am currently using shift invert ST with mumps LU direct solver. By 
> > > increasing the shift away from lambda=-1.0. I get faster computation of 
> > > eigenvectors, and the lambda=-1.0 eigenvectors appear to slow down the 
> > > computation by about a factor of two.
> > > Is there a way to avoid these lambda = -1.0 eigenpairs with a GNHEP 
> > > problem type?
> > >
> > > Regards,
> > > Lucas



Re: [petsc-users] Accelerating eigenvalue computation / removing portion of spectrum

2022-05-31 Thread Lucas Banting
Thanks for sharing the article.
For my application, I think using an interval region to exclude the unneeded 
eigenvalues will still be faster than forming a larger constrained system. 
Specifying an interval appears to run in a similar amount of time.

Lucas

From: Jose E. Roman 
Sent: Tuesday, May 31, 2022 2:08 PM
To: Lucas Banting 
Cc: PETSc 
Subject: Re: [petsc-users] Accelerating eigenvalue computation / removing 
portion of spectrum

Caution: This message was sent from outside the University of Manitoba.


Please respond to the list also.

The problem with EPSSetDeflationSpace() is that it internally orthogonalizes 
the vectors that you pass in, so it is not viable for thousands of vectors.

You can try implementing any of the alternative schemes described in 
https://doi.org/10.1002/nla.307

Another thing you can try is to use a region for filtering, as explained in 
section 2.6.4 of the users manual. Use a region that excludes -1.0 and you will 
have more chances to get the wanted eigenvalues faster. But still convergence 
may be slow.

Jose


> On 31 May 2022, at 20:52, Lucas Banting  wrote:
>
> Thanks for the response Jose,
>
> There is an analytical solution for these modes actually, however there are 
> thousands of them and they are all sparse.
> I assume it is a non-trivial thing for EPSSetDeflationSpace() to take 
> something like a MATAIJ as input?
>
> Lucas
> From: Jose E. Roman 
> Sent: Tuesday, May 31, 2022 1:11 PM
> To: Lucas Banting 
> Cc: petsc-users@mcs.anl.gov 
> Subject: Re: [petsc-users] Accelerating eigenvalue computation / removing 
> portion of spectrum
>
> Caution: This message was sent from outside the University of Manitoba.
>
>
> If you know how to cheaply compute a basis of the nullspace of S, then you 
> can try passing it to the solver via 
> EPSSetDeflationSpace()https://slepc.upv.es/documentation/current/docs/manualpages/EPS/EPSSetDeflationSpace.html
>
> Jose
>
>
> > On 31 May 2022, at 19:28, Lucas Banting  wrote:
> >
> > Hello,
> >
> > I have a general non hermitian eigenvalue problem arising from the 3D 
> > helmholtz equation.
> > The form of the helmholtz equaton is:
> >
> > (S - k^2M)v = lambda k^2 M v
> >
> > Where S is the stiffness/curl-curl matrix and M is the mass matrix 
> > associated with edge elements used to discretize the problem.
> > The helmholtz equation creates eigenvalues of -1.0, which I believe are 
> > eigenvectors that are part of the null space of the curl-curl operator S.
> >
> > For my application, I would like to compute eigenvalues > -1.0, and avoid 
> > computation of eigenvalues of -1.0.
> > I am currently using shift invert ST with mumps LU direct solver. By 
> > increasing the shift away from lambda=-1.0. I get faster computation of 
> > eigenvectors, and the lambda=-1.0 eigenvectors appear to slow down the 
> > computation by about a factor of two.
> > Is there a way to avoid these lambda = -1.0 eigenpairs with a GNHEP problem 
> > type?
> >
> > Regards,
> > Lucas



Re: [petsc-users] Accelerating eigenvalue computation / removing portion of spectrum

2022-05-31 Thread Jose E. Roman
Please respond to the list also.

The problem with EPSSetDeflationSpace() is that it internally orthogonalizes 
the vectors that you pass in, so it is not viable for thousands of vectors.

You can try implementing any of the alternative schemes described in 
https://doi.org/10.1002/nla.307

Another thing you can try is to use a region for filtering, as explained in 
section 2.6.4 of the users manual. Use a region that excludes -1.0 and you will 
have a better chance of getting the wanted eigenvalues faster, but convergence 
may still be slow.
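
For reference, a minimal sketch of that region setup (assuming the standard 
SLEPc RG interface; eps is the already-created EPS, and the thin box around 
-1.0 is only illustrative) could look like:

  RG rg;
  PetscCall(EPSGetRG(eps, &rg));
  PetscCall(RGSetType(rg, RGINTERVAL));
  /* a thin box around lambda = -1.0 ... */
  PetscCall(RGIntervalSetEndpoints(rg, -1.0 - 1e-6, -1.0 + 1e-6, -1e-6, 1e-6));
  /* ... taken as the excluded set by using the complement of the region */
  PetscCall(RGSetComplement(rg, PETSC_TRUE));

or, equivalently, from the command line with -rg_type interval, 
-rg_interval_endpoints and -rg_complement.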

Jose
 

> On 31 May 2022, at 20:52, Lucas Banting  wrote:
> 
> Thanks for the response Jose,
> 
> There is an analytical solution for these modes actually, however there are 
> thousands of them and they are all sparse.
> I assume it is a non-trivial thing for EPSSetDeflationSpace() to take 
> something like a MATAIJ as input?
> 
> Lucas
> From: Jose E. Roman 
> Sent: Tuesday, May 31, 2022 1:11 PM
> To: Lucas Banting 
> Cc: petsc-users@mcs.anl.gov 
> Subject: Re: [petsc-users] Accelerating eigenvalue computation / removing 
> portion of spectrum
>  
> Caution: This message was sent from outside the University of Manitoba.
> 
> 
> If you know how to cheaply compute a basis of the nullspace of S, then you 
> can try passing it to the solver via 
> EPSSetDeflationSpace()https://slepc.upv.es/documentation/current/docs/manualpages/EPS/EPSSetDeflationSpace.html
> 
> Jose
> 
> 
> > On 31 May 2022, at 19:28, Lucas Banting  wrote:
> >
> > Hello,
> >
> > I have a general non hermitian eigenvalue problem arising from the 3D 
> > helmholtz equation.
> > The form of the helmholtz equaton is:
> >
> > (S - k^2M)v = lambda k^2 M v
> >
> > Where S is the stiffness/curl-curl matrix and M is the mass matrix 
> > associated with edge elements used to discretize the problem.
> > The helmholtz equation creates eigenvalues of -1.0, which I believe are 
> > eigenvectors that are part of the null space of the curl-curl operator S.
> >
> > For my application, I would like to compute eigenvalues > -1.0, and avoid 
> > computation of eigenvalues of -1.0.
> > I am currently using shift invert ST with mumps LU direct solver. By 
> > increasing the shift away from lambda=-1.0. I get faster computation of 
> > eigenvectors, and the lambda=-1.0 eigenvectors appear to slow down the 
> > computation by about a factor of two.
> > Is there a way to avoid these lambda = -1.0 eigenpairs with a GNHEP problem 
> > type?
> >
> > Regards,
> > Lucas



Re: [petsc-users] Question about DMPlexDistribute & distribute mesh over processes

2022-05-31 Thread Sami BEN ELHAJ SALAH
Thank you very much!

I have succeeded in distributing the mesh using the following routine.

PetscDM dmdist = NULL;
PetscPartitioner part;
DMPlexGetPartitioner(dm_mesh, &part);
PetscPartitionerSetType(part, PETSCPARTITIONERPARMETIS);
PetscPartitionerSetFromOptions(part);
DMPlexDistribute(dm_mesh, 0, NULL, &dmdist);
if (dmdist)
{
    DMDestroy(&dm_mesh);
    dm_mesh = dmdist;
}
DMView(dm_mesh, PETSC_VIEWER_STDOUT_WORLD);

The result is given below:
DM Object: Parallel Mesh 4 MPI processes
  type: plex
Parallel Mesh in 3 dimensions:
  0-cells: 12 12 12 12
  3-cells: 2 2 2 2
Labels:
  depth: 2 strata with value/size (0 (12), 1 (2))
  material-id: 1 strata with value/size (68 (2))

Thank you a lot for your help and have a good day,

Sami

--
Dr. Sami BEN ELHAJ SALAH
Ingénieur de Recherche (CNRS)
Institut Pprime - ISAE - ENSMA
Mobile: 06.62.51.26.74
Email: sami.ben-elhaj-sa...@ensma.fr
www.samibenelhajsalah.com 




> On 31 May 2022 at 18:20, Matthew Knepley  wrote:
> 
> On Tue, May 31, 2022 at 12:08 PM Sami BEN ELHAJ SALAH <sami.ben-elhaj-sa...@ensma.fr> wrote:
> Hi Matthew,
> 
> Two question in this mail:
> 
> I tested my code with 2 processes.  I use gmsh mesh( example:  8 hexahedral 
> elements, 27 nodes, and Global Size of my jacobian matrix = 81 ). Then I used 
> DMView(dm_mesh, PETSC_VIEWER_STDOUT_WORLD) to visualize my DM and see if it 
> is correctly distributed over processes, so I got this:
> 
> (***)
> DM Object: DM_0x3_0 2 MPI processes
>   type: plex
> DM_0x3_0 in 3 dimensions:
>   0-cells: 27 0
>   3-cells: 8 0
> 
> Notice here that your whole mesh is on process 0. You would probably call 
> DMPlexDistribute() to rebalance it.
>  
> Labels:
>   material-id: 3 strata with value/size (66 (4), 67 (2), 68 (2))
>   depth: 2 strata with value/size (0 (27), 1 (8))
>   celltype: 2 strata with value/size (7 (8), 0 (27))
> 
> As you see, I created some material-id labels, so I found them over my 8 
> cells when using DMView. So it seems good to me. The SNES viewer is shown 
> below:
> 
> SNES Object: 2 MPI processes
>   type: ksponly
>   SNES has not been set up so information may be incomplete
>   maximum iterations=50, maximum function evaluations=1
>   tolerances: relative=1e-08, absolute=1e-50, solution=1e-08
>   total number of linear solver iterations=0
>   total number of function evaluations=0
>   norm schedule ALWAYS
> SNES Object: 2 MPI processes
>   type: ksponly
>   SNES has not been set up so information may be incomplete
>   maximum iterations=50, maximum function evaluations=1
>   tolerances: relative=1e-08, absolute=1e-50, solution=1e-08
>   total number of linear solver iterations=0
>   total number of function evaluations=0
>   norm schedule ALWAYS
>   SNESLineSearch Object: 2 MPI processes
> type: bt
>   interpolation: cubic
>   alpha=1.00e-04
>   SNESLineSearch Object: 2 MPI processes
> type: bt
>   interpolation: cubic
>   alpha=1.00e-04
> maxstep=1.00e+08, minlambda=1.00e-12
> tolerances: relative=1.00e-08, absolute=1.00e-15, 
> lambda=1.00e-08
> maximum iterations=40
> maxstep=1.00e+08, minlambda=1.00e-12
> tolerances: relative=1.00e-08, absolute=1.00e-15, 
> lambda=1.00e-08
> maximum iterations=40
>   KSP Object: 2 MPI processes
> type: gmres
>   restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization 
> with no iterative refinement
>   KSP Object: 2 MPI processes
> type: gmres
>   restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization 
> with no iterative refinement
>   happy breakdown tolerance 1e-30
> maximum iterations=1, initial guess is zero
> tolerances:  relative=1e-05, absolute=1e-50, divergence=1.
> left preconditioning
> using DEFAULT norm type for convergence test
>   PC Object: 2 MPI processes
>   happy breakdown tolerance 1e-30
> maximum iterations=1, initial guess is zero
> tolerances:  relative=1e-05, absolute=1e-50, divergence=1.
> left preconditioning
> using DEFAULT norm type for convergence test
>   PC Object: 2 MPI processes
> type: bjacobi
> PC has not been set up so information may be incomplete
>   number of blocks = -1
>   Local solve is same for all blocks, in the following KSP and PC objects:
> linear system matrix = precond matrix:
> type: bjacobi
> PC has not been set up so information may be incomplete
>   number of blocks = -1
>   Local solve is same for all blocks, in the following KSP and PC objects:
> linear system matrix = precond matrix:
> Mat Object: 2 MPI processes
>   type: mpiaij
> Mat Object: 2 MPI processes
>   type: mpiaij
>   rows=81, cols=81, bs=3
>   rows=81, cols=81, bs=3
>   total: nonzeros=3087, allocated nonzeros=3087
>   total number of m

Re: [petsc-users] Accelerating eigenvalue computation / removing portion of spectrum

2022-05-31 Thread Jose E. Roman
If you know how to cheaply compute a basis of the nullspace of S, then you can 
try passing it to the solver via EPSSetDeflationSpace() 
https://slepc.upv.es/documentation/current/docs/manualpages/EPS/EPSSetDeflationSpace.html
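
(For completeness, a minimal sketch of how such a basis would be passed in, 
assuming it is already available as an array of Vec; nc and defl are only 
illustrative names:)

  /* defl[0..nc-1] hold a basis of the nullspace of S */
  PetscCall(EPSSetDeflationSpace(eps, nc, defl));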

Jose


> On 31 May 2022, at 19:28, Lucas Banting  wrote:
> 
> Hello,
> 
> I have a general non hermitian eigenvalue problem arising from the 3D 
> helmholtz equation.
> The form of the helmholtz equaton is:
> 
> (S - k^2M)v = lambda k^2 M v
> 
> Where S is the stiffness/curl-curl matrix and M is the mass matrix associated 
> with edge elements used to discretize the problem.
> The helmholtz equation creates eigenvalues of -1.0, which I believe are 
> eigenvectors that are part of the null space of the curl-curl operator S. 
> 
> For my application, I would like to compute eigenvalues > -1.0, and avoid 
> computation of eigenvalues of -1.0.
> I am currently using shift invert ST with mumps LU direct solver. By 
> increasing the shift away from lambda=-1.0. I get faster computation of 
> eigenvectors, and the lambda=-1.0 eigenvectors appear to slow down the 
> computation by about a factor of two.
> Is there a way to avoid these lambda = -1.0 eigenpairs with a GNHEP problem 
> type?
> 
> Regards,
> Lucas



[petsc-users] Accelerating eigenvalue computation / removing portion of spectrum

2022-05-31 Thread Lucas Banting
Hello,

I have a general non-Hermitian eigenvalue problem arising from the 3D Helmholtz 
equation. The form of the Helmholtz equation is:

(S - k^2 M) v = lambda k^2 M v

where S is the stiffness/curl-curl matrix and M is the mass matrix associated 
with the edge elements used to discretize the problem.
The Helmholtz equation produces eigenvalues of -1.0, whose eigenvectors I 
believe lie in the null space of the curl-curl operator S.

For my application, I would like to compute eigenvalues > -1.0 and avoid 
computing the eigenvalues at -1.0.
I am currently using a shift-and-invert ST with the MUMPS LU direct solver. By 
increasing the shift away from lambda = -1.0, I get faster computation of the 
eigenvectors, and the lambda = -1.0 eigenvectors appear to slow down the 
computation by about a factor of two.
Is there a way to avoid these lambda = -1.0 eigenpairs with a GNHEP problem 
type?
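
(For context, the shift-and-invert/MUMPS setup described above corresponds 
roughly to the sketch below; eps is the already-created EPS with the operators 
set, and the target value is only illustrative, not the one actually used.)

  ST  st;
  KSP ksp;
  PC  pc;
  PetscCall(EPSSetProblemType(eps, EPS_GNHEP));
  PetscCall(EPSGetST(eps, &st));
  PetscCall(STSetType(st, STSINVERT));               /* shift-and-invert */
  PetscCall(EPSSetTarget(eps, -0.9));                /* shift moved away from -1.0 */
  PetscCall(EPSSetWhichEigenpairs(eps, EPS_TARGET_REAL));
  PetscCall(STGetKSP(st, &ksp));
  PetscCall(KSPSetType(ksp, KSPPREONLY));
  PetscCall(KSPGetPC(ksp, &pc));
  PetscCall(PCSetType(pc, PCLU));
  PetscCall(PCFactorSetMatSolverType(pc, MATSOLVERMUMPS));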

Regards,
Lucas




Re: [petsc-users] Question about DMPlexDistribute & distribute mesh over processes

2022-05-31 Thread Matthew Knepley
On Tue, May 31, 2022 at 12:08 PM Sami BEN ELHAJ SALAH <
sami.ben-elhaj-sa...@ensma.fr> wrote:

> Hi Matthew,
>
> *Two question in this mail:*
>
> I tested my code *with 2 processes*.  I use gmsh mesh( example:  8
> hexahedral elements, 27 nodes, and Global Size of my jacobian matrix = 81
> ). Then I used DMView(dm_mesh, PETSC_VIEWER_STDOUT_WORLD) to visualize my
> DM and see if it is correctly distributed over processes, so I got this:
>
> *(***)*
> *DM Object: DM_0x3_0 2 MPI processes*
> *  type: plex*
> *DM_0x3_0 in 3 dimensions:*
> *  0-cells: 27 0*
> *  3-cells: 8 0*
>

Notice here that your whole mesh is on process 0. You would probably call
DMPlexDistribute() to rebalance it.
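
(A minimal sketch of that rebalancing step, with an overlap of 0 chosen only 
for illustration:)

  DM dmDist = NULL;
  PetscCall(DMPlexDistribute(dm_mesh, 0, NULL, &dmDist));
  if (dmDist) {                  /* NULL is returned when nothing was moved */
    PetscCall(DMDestroy(&dm_mesh));
    dm_mesh = dmDist;
  }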


> *Labels:*
> *  material-id: 3 strata with value/size (66 (4), 67 (2), 68 (2))*
> *  depth: 2 strata with value/size (0 (27), 1 (8))*
> *  celltype: 2 strata with value/size (7 (8), 0 (27))*
>
> As you see, I created some material-id labels, so I found them over my 8
> cells when using DMView. So it seems good to me. The SNES viewer is shown
> below:
>
> SNES Object: 2 MPI processes
>   type: ksponly
>   SNES has not been set up so information may be incomplete
>   maximum iterations=50, maximum function evaluations=1
>   tolerances: relative=1e-08, absolute=1e-50, solution=1e-08
>   total number of linear solver iterations=0
>   total number of function evaluations=0
>   norm schedule ALWAYS
> SNES Object: 2 MPI processes
>   type: ksponly
>   SNES has not been set up so information may be incomplete
>   maximum iterations=50, maximum function evaluations=1
>   tolerances: relative=1e-08, absolute=1e-50, solution=1e-08
>   total number of linear solver iterations=0
>   total number of function evaluations=0
>   norm schedule ALWAYS
>   SNESLineSearch Object: 2 MPI processes
> type: bt
>   interpolation: cubic
>   alpha=1.00e-04
>   SNESLineSearch Object: 2 MPI processes
> type: bt
>   interpolation: cubic
>   alpha=1.00e-04
> maxstep=1.00e+08, minlambda=1.00e-12
> tolerances: relative=1.00e-08, absolute=1.00e-15,
> lambda=1.00e-08
> maximum iterations=40
> maxstep=1.00e+08, minlambda=1.00e-12
> tolerances: relative=1.00e-08, absolute=1.00e-15,
> lambda=1.00e-08
> maximum iterations=40
>   KSP Object: 2 MPI processes
> type: gmres
>   restart=30, using Classical (unmodified) Gram-Schmidt
> Orthogonalization with no iterative refinement
>   KSP Object: 2 MPI processes
> type: gmres
>   restart=30, using Classical (unmodified) Gram-Schmidt
> Orthogonalization with no iterative refinement
>   happy breakdown tolerance 1e-30
> maximum iterations=1, initial guess is zero
> tolerances:  relative=1e-05, absolute=1e-50, divergence=1.
> left preconditioning
> using DEFAULT norm type for convergence test
>   PC Object: 2 MPI processes
>   happy breakdown tolerance 1e-30
> maximum iterations=1, initial guess is zero
> tolerances:  relative=1e-05, absolute=1e-50, divergence=1.
> left preconditioning
> using DEFAULT norm type for convergence test
>   PC Object: 2 MPI processes
> type: bjacobi
> PC has not been set up so information may be incomplete
>   number of blocks = -1
>   Local solve is same for all blocks, in the following KSP and PC
> objects:
> linear system matrix = precond matrix:
> type: bjacobi
> PC has not been set up so information may be incomplete
>   number of blocks = -1
>   Local solve is same for all blocks, in the following KSP and PC
> objects:
> linear system matrix = precond matrix:
> Mat Object: 2 MPI processes
>   type: mpiaij
> Mat Object: 2 MPI processes
>   type: mpiaij
>   rows=81, cols=81, bs=3
>   rows=81, cols=81, bs=3
>   total: nonzeros=3087, allocated nonzeros=3087
>   total number of mallocs used during MatSetValues calls=0
>   total: nonzeros=3087, allocated nonzeros=3087
>   total number of mallocs used during MatSetValues calls=0
> using I-node (on process 0) routines: found 27 nodes, limit used
> is 5
> not using I-node (on process 0) routines
>
> *Question 1:*
> *B**ased on the result given by DMView (see **(***)* *), **I didn't
> understand if my mesh is correctly distributed **? or my code is missing
> something ? because when I visualize the local and global sizes of my
> Jacobian matrix, I found*
>
> *PETSc::NonLinearSolver::INIT Size from Jac Matrix: M=81 m =0   //(M:
> global size, m: local size) this result is given by the proc 1*
> *and *
> *PETSc::NonLinearSolver::INIT Size from Jac Matrix: M=81 m =81 // **this
> result is given by the proc 2*
>

Yes, your mesh in only on process 0.


> Let me give other information:
> I create my jacobian matrix using:
> PetscErrorCode err = DMCreateMatrix(dm_mesh, &_matrix);
> *and I use the PetscSection to tell the DM to use this data layout.*
>
> In my code I wrote this line:
> DMSe

Re: [petsc-users] Question about DMPlexDistribute & distribute mesh over processes

2022-05-31 Thread Sami BEN ELHAJ SALAH
Hi Matthew,

Two questions in this mail:

I tested my code with 2 processes. I use a Gmsh mesh (example: 8 hexahedral 
elements, 27 nodes, and a global size of my Jacobian matrix of 81). Then I used 
DMView(dm_mesh, PETSC_VIEWER_STDOUT_WORLD) to visualize my DM and see whether 
it is correctly distributed over the processes, and I got this:

(***)
DM Object: DM_0x3_0 2 MPI processes
  type: plex
DM_0x3_0 in 3 dimensions:
  0-cells: 27 0
  3-cells: 8 0
Labels:
  material-id: 3 strata with value/size (66 (4), 67 (2), 68 (2))
  depth: 2 strata with value/size (0 (27), 1 (8))
  celltype: 2 strata with value/size (7 (8), 0 (27))

As you see, I created some material-id labels, so I found them over my 8 cells 
when using DMView. So it seems good to me. The SNES viewer is shown below:

SNES Object: 2 MPI processes
  type: ksponly
  SNES has not been set up so information may be incomplete
  maximum iterations=50, maximum function evaluations=1
  tolerances: relative=1e-08, absolute=1e-50, solution=1e-08
  total number of linear solver iterations=0
  total number of function evaluations=0
  norm schedule ALWAYS
SNES Object: 2 MPI processes
  type: ksponly
  SNES has not been set up so information may be incomplete
  maximum iterations=50, maximum function evaluations=1
  tolerances: relative=1e-08, absolute=1e-50, solution=1e-08
  total number of linear solver iterations=0
  total number of function evaluations=0
  norm schedule ALWAYS
  SNESLineSearch Object: 2 MPI processes
type: bt
  interpolation: cubic
  alpha=1.00e-04
  SNESLineSearch Object: 2 MPI processes
type: bt
  interpolation: cubic
  alpha=1.00e-04
maxstep=1.00e+08, minlambda=1.00e-12
tolerances: relative=1.00e-08, absolute=1.00e-15, 
lambda=1.00e-08
maximum iterations=40
maxstep=1.00e+08, minlambda=1.00e-12
tolerances: relative=1.00e-08, absolute=1.00e-15, 
lambda=1.00e-08
maximum iterations=40
  KSP Object: 2 MPI processes
type: gmres
  restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization 
with no iterative refinement
  KSP Object: 2 MPI processes
type: gmres
  restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization 
with no iterative refinement
  happy breakdown tolerance 1e-30
maximum iterations=1, initial guess is zero
tolerances:  relative=1e-05, absolute=1e-50, divergence=1.
left preconditioning
using DEFAULT norm type for convergence test
  PC Object: 2 MPI processes
  happy breakdown tolerance 1e-30
maximum iterations=1, initial guess is zero
tolerances:  relative=1e-05, absolute=1e-50, divergence=1.
left preconditioning
using DEFAULT norm type for convergence test
  PC Object: 2 MPI processes
type: bjacobi
PC has not been set up so information may be incomplete
  number of blocks = -1
  Local solve is same for all blocks, in the following KSP and PC objects:
linear system matrix = precond matrix:
type: bjacobi
PC has not been set up so information may be incomplete
  number of blocks = -1
  Local solve is same for all blocks, in the following KSP and PC objects:
linear system matrix = precond matrix:
Mat Object: 2 MPI processes
  type: mpiaij
Mat Object: 2 MPI processes
  type: mpiaij
  rows=81, cols=81, bs=3
  rows=81, cols=81, bs=3
  total: nonzeros=3087, allocated nonzeros=3087
  total number of mallocs used during MatSetValues calls=0
  total: nonzeros=3087, allocated nonzeros=3087
  total number of mallocs used during MatSetValues calls=0
using I-node (on process 0) routines: found 27 nodes, limit used is 5
not using I-node (on process 0) routines

Question 1:
Based on the result given by DMView (see (***)), I don't understand whether my 
mesh is correctly distributed, or whether my code is missing something, because 
when I look at the local and global sizes of my Jacobian matrix I find

PETSc::NonLinearSolver::INIT Size from Jac Matrix: M=81 m =0   //(M: global 
size, m: local size) this result is given by the proc 1
and 
PETSc::NonLinearSolver::INIT Size from Jac Matrix: M=81 m =81 // this result is 
given by the proc 2


Let me give other information:
I create my jacobian matrix using:
PetscErrorCode err = DMCreateMatrix(dm_mesh, &_matrix);
and I use the PetscSection to tell the DM to use this data layout.
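
(For reference, that data-layout step usually looks something like the sketch 
below: create a PetscSection over the mesh points, set the dof counts, and 
attach it with DMSetLocalSection() before calling DMCreateMatrix(). The 3 dofs 
per vertex only mirror the bs=3 seen in the Mat output above; all names are 
illustrative.)

  PetscSection s;
  PetscInt     pStart, pEnd, vStart, vEnd, p;
  PetscCall(PetscSectionCreate(PetscObjectComm((PetscObject)dm_mesh), &s));
  PetscCall(DMPlexGetChart(dm_mesh, &pStart, &pEnd));
  PetscCall(DMPlexGetDepthStratum(dm_mesh, 0, &vStart, &vEnd)); /* vertices */
  PetscCall(PetscSectionSetChart(s, pStart, pEnd));
  for (p = vStart; p < vEnd; ++p) PetscCall(PetscSectionSetDof(s, p, 3));
  PetscCall(PetscSectionSetUp(s));
  PetscCall(DMSetLocalSection(dm_mesh, s));
  PetscCall(PetscSectionDestroy(&s));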

In my code I wrote this line: 
DMSetFromOptions(dm_mesh);
Then, to run my code I use
mpirun -np 2 /home/benelhasa/fox_petsc/build_test/bin/Debug/FoXtroT 
-snes_test_jacobian_view -snes_converged_reason -snes_monitor -ksp_monitor 
-ksp_xmonitor -dm_view -petscpartitioner_type parmetis -dm_distribute 
-dm_refine 0 cub_8C3D8.fxt

Question 2:
I have another question, just for my comprehension. I understand that 
DMSetFromOptions(dm_mesh) allows me to use the -dm_distribute option, but I 
didn't understand how I can use -petscpartitioner_type parmet

Re: [petsc-users] Mat created by DMStag cannot access ghost points

2022-05-31 Thread Dave May
On Tue 31. May 2022 at 16:28, Ye Changqing  wrote:

> Dear developers of PETSc,
>
> I encountered a problem when using the DMStag module. The program could be
> executed perfectly in serial, while errors are thrown out in parallel
> (using mpiexec). Some rows in Mat cannot be accessed in local processes
> when looping all elements in DMStag. The DM object I used only has one DOF
> in each element. Hence, I could switch to the DMDA module easily, and the
> program now is back to normal.
>
> Some snippets are below.
>
> Initialise a DMStag object:
> PetscCall(DMStagCreate2d(PETSC_COMM_WORLD, DM_BOUNDARY_NONE,
> DM_BOUNDARY_NONE, M, N, PETSC_DECIDE, PETSC_DECIDE, 0, 0, 1,
> DMSTAG_STENCIL_BOX, 1, NULL, NULL, &(s_ctx->dm_P)));
> Created a Mat:
> PetscCall(DMCreateMatrix(s_ctx->dm_P, A));
> Loop:
> PetscCall(DMStagGetCorners(s_ctx->dm_V, &startx, &starty, &startz, &nx,
> &ny, &nz, &extrax, &extray, &extraz));
> for (ey = starty; ey < starty + ny; ++ey)
> for (ex = startx; ex < startx + nx; ++ex)
> {
> ...
> PetscCall(DMStagMatSetValuesStencil(s_ctx->dm_P, *A, 2, &row[0], 2,
> &col[0], &val_A[0][0], ADD_VALUES));  // The traceback shows the problem is
> in here.
> }
>

In addition to the code or MWE, please forward us the complete stack trace
/ error thrown to stdout.

Thanks,
Dave



> Best,
> Changqing
>
>


Re: [petsc-users] Mat created by DMStag cannot access ghost points

2022-05-31 Thread Matthew Knepley
On Tue, May 31, 2022 at 10:28 AM Ye Changqing 
wrote:

> Dear developers of PETSc,
>
> I encountered a problem when using the DMStag module. The program could be
> executed perfectly in serial, while errors are thrown out in parallel
> (using mpiexec). Some rows in Mat cannot be accessed in local processes
> when looping all elements in DMStag. The DM object I used only has one DOF
> in each element. Hence, I could switch to the DMDA module easily, and the
> program now is back to normal.
>

Is it possible to send a simple example? We have parallel examples with
DMStag in PETSc, just from this description I cannot see what is going
wrong for you.

  Thanks,

 Matt


> Some snippets are below.
>
> Initialise a DMStag object:
> PetscCall(DMStagCreate2d(PETSC_COMM_WORLD, DM_BOUNDARY_NONE,
> DM_BOUNDARY_NONE, M, N, PETSC_DECIDE, PETSC_DECIDE, 0, 0, 1,
> DMSTAG_STENCIL_BOX, 1, NULL, NULL, &(s_ctx->dm_P)));
> Created a Mat:
> PetscCall(DMCreateMatrix(s_ctx->dm_P, A));
> Loop:
> PetscCall(DMStagGetCorners(s_ctx->dm_V, &startx, &starty, &startz, &nx,
> &ny, &nz, &extrax, &extray, &extraz));
> for (ey = starty; ey < starty + ny; ++ey)
> for (ex = startx; ex < startx + nx; ++ex)
> {
> ...
> PetscCall(DMStagMatSetValuesStencil(s_ctx->dm_P, *A, 2, &row[0], 2,
> &col[0], &val_A[0][0], ADD_VALUES));  // The traceback shows the problem is
> in here.
> }
>
> Best,
> Changqing
>
>

-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/ 


Re: [petsc-users] PetscSF Object on Distributed DMPlex for Halo Data Exchange

2022-05-31 Thread Matthew Knepley
On Tue, May 31, 2022 at 10:26 AM Mike Michell  wrote:

> Thank you. But, it seems that PetscSFCreateSectionSF() also requires
> petscsf.h file. Which header file I should include to call
> PetscSFCreateSectionSF() from Fortran?
>

I will have to write a binding. I will send you the MR when I finish.

  THanks,

Matt


> Thanks,
> Mike
>
>
> On Tue, May 31, 2022 at 10:04 AM Mike Michell 
>> wrote:
>>
>>> As a follow-up question on your example, is it possible to call
>>> PetscSFCreateRemoteOffsets() from Fortran?
>>>
>>> My code is written in .F90 and in "petsc/finclude/" there is no
>>> petscsf.h so that the code currently cannot find
>>> PetscSFCreateRemoteOffsets().
>>>
>>
>> I believe if you pass in NULL for remoteOffsets, that function will be
>> called internally.
>>
>>   Thanks,
>>
>>  Matt
>>
>>
>>> Thanks,
>>> Mike
>>>
>>>
>>> I will also point out that Toby has created a nice example showing how
 to create an SF for halo exchange between local vectors.

   https://gitlab.com/petsc/petsc/-/merge_requests/5267

   Thanks,

  Matt

 On Sun, May 22, 2022 at 9:47 PM Matthew Knepley 
 wrote:

> On Sun, May 22, 2022 at 4:28 PM Mike Michell 
> wrote:
>
>> Thanks for the reply. The diagram makes sense and is helpful for
>> understanding 1D representation.
>>
>> However, something is still unclear. From your diagram, the number of
>> roots per process seems to vary according to run arguments, such as
>> "-dm_distribute_overlap", because "the number of roots for a DMPlex is 
>> the
>> number of mesh points in the local portion of the mesh (cited from your
>> answer to my question (1))" will end up change according to that 
>> argument.
>> However, from my mock-up code, number of roots is independent to
>> -dm_distribute_overlap argument. The summation of "number of roots" 
>> through
>> processes was always equal to number of physical vertex on my mesh, if I
>> define the section layout on vertex with 1DOF. But in your diagram 
>> example,
>> the summation of "nroots" is larger than the actual number of mesh 
>> points,
>> which is 13.
>>
>
> I do not understand your question. Notice the -dm_distribute_overlap
> does _not_ change the owned points for any process. It only puts in new
> leaves, so it also never
> changes the roots for this way of using the SF.
>
>
>> Also, it is still unclear how to get the size of "roots" from the
>> PetscSection & PetscSF on distributed DMPlex?
>>
>
> For an SF mapping ghost dofs in a global vector, the number of roots
> is just the size of the local portion of the vector.
>
>
>> In your diagram, how can you tell your code and make it allocate the
>> "nroots=7 for P0, nroots=9 for P1, and nroots=7 for P2" arrays before you
>> call PetscSFBcastBegin/End()? It seems that we need to define arrays 
>> having
>> the size of nroots & nleaves before calling PetscSFBcastBegin/End().
>>
>
> I just want to note that this usage is different from the canonical
> usage in Plex. It is fine to do this, but this will not match what I do in
> the library if you look.
> In Plex, I distinguish two linear spaces:
>
>   1) Global space: This is the vector space for the solvers. Each
> point is uniquely represented and owned by some process
>
>   2) Local space: This is the vector space for assembly. Some points
> are represented multiple times.
>
> I create an SF that maps from the global space (roots) to the local
> space (leaves), and it is called in DMGlobalToLocal() (and
> associated functions). This
> is more natural in FEM. You seem to want an SF that maps between
> global vectors. This will also work. The roots would be the local dofs, 
> and
> the leaves
> would be shared dofs.
>
>   Does this make sense?
>
>  Thanks,
>
>Matt
>
>
>> Thanks,
>> Mike
>>
>> Here's a diagram of a 1D mesh with overlap and 3 partitions, showing
>>> what the petscsf data is for each.  The number of roots is the number of
>>> mesh points in the local representation, and the number of leaves is the
>>> number of mesh points that are duplicates of mesh points on other
>>> processes.  With that in mind, answering your questions
>>>
>>> > (1) It seems that the "roots" means the number of vertex not
>>> considering overlap layer, and "leaves" seems the number of distributed
>>> vertex for each processor that includes overlap layer. Can you 
>>> acknowledge
>>> that this is correct understanding? I have tried to find clearer 
>>> examples
>>> from PETSc team's articles relevant to Star Forest, but I am still 
>>> unclear
>>> about the exact relation & graphical notation of roots & leaves in SF if
>

[petsc-users] Mat created by DMStag cannot access ghost points

2022-05-31 Thread Ye Changqing
Dear developers of PETSc,

I encountered a problem when using the DMStag module. The program runs 
perfectly in serial, but errors are thrown in parallel (using mpiexec): some 
rows of the Mat cannot be accessed from the local process when looping over all 
elements of the DMStag. The DM object I use has only one DOF per element, so I 
could switch to the DMDA module easily, and the program is then back to normal.

Some snippets are below.

Initialise a DMStag object:
PetscCall(DMStagCreate2d(PETSC_COMM_WORLD, DM_BOUNDARY_NONE, DM_BOUNDARY_NONE, M, N, PETSC_DECIDE, PETSC_DECIDE, 0, 0, 1, DMSTAG_STENCIL_BOX, 1, NULL, NULL, &(s_ctx->dm_P)));
Created a Mat:
PetscCall(DMCreateMatrix(s_ctx->dm_P, A));
Loop:
PetscCall(DMStagGetCorners(s_ctx->dm_V, &startx, &starty, &startz, &nx, &ny, &nz, &extrax, &extray, &extraz));
for (ey = starty; ey < starty + ny; ++ey)
for (ex = startx; ex < startx + nx; ++ex)
{
...
PetscCall(DMStagMatSetValuesStencil(s_ctx->dm_P, *A, 2, &row[0], 2, &col[0], &val_A[0][0], ADD_VALUES));  // The traceback shows the problem is in here.
}

Best,
Changqing



Re: [petsc-users] PetscSF Object on Distributed DMPlex for Halo Data Exchange

2022-05-31 Thread Mike Michell
Thank you. But it seems that PetscSFCreateSectionSF() also requires the
petscsf.h file. Which header file should I include to call
PetscSFCreateSectionSF() from Fortran?

Thanks,
Mike


On Tue, May 31, 2022 at 10:04 AM Mike Michell  wrote:
>
>> As a follow-up question on your example, is it possible to call
>> PetscSFCreateRemoteOffsets() from Fortran?
>>
>> My code is written in .F90 and in "petsc/finclude/" there is no petscsf.h
>> so that the code currently cannot find PetscSFCreateRemoteOffsets().
>>
>
> I believe if you pass in NULL for remoteOffsets, that function will be
> called internally.
>
>   Thanks,
>
>  Matt
>
>
>> Thanks,
>> Mike
>>
>>
>> I will also point out that Toby has created a nice example showing how to
>>> create an SF for halo exchange between local vectors.
>>>
>>>   https://gitlab.com/petsc/petsc/-/merge_requests/5267
>>>
>>>   Thanks,
>>>
>>>  Matt
>>>
>>> On Sun, May 22, 2022 at 9:47 PM Matthew Knepley 
>>> wrote:
>>>
 On Sun, May 22, 2022 at 4:28 PM Mike Michell 
 wrote:

> Thanks for the reply. The diagram makes sense and is helpful for
> understanding 1D representation.
>
> However, something is still unclear. From your diagram, the number of
> roots per process seems to vary according to run arguments, such as
> "-dm_distribute_overlap", because "the number of roots for a DMPlex is the
> number of mesh points in the local portion of the mesh (cited from your
> answer to my question (1))" will end up change according to that argument.
> However, from my mock-up code, number of roots is independent to
> -dm_distribute_overlap argument. The summation of "number of roots" 
> through
> processes was always equal to number of physical vertex on my mesh, if I
> define the section layout on vertex with 1DOF. But in your diagram 
> example,
> the summation of "nroots" is larger than the actual number of mesh points,
> which is 13.
>

 I do not understand your question. Notice the -dm_distribute_overlap
 does _not_ change the owned points for any process. It only puts in new
 leaves, so it also never
 changes the roots for this way of using the SF.


> Also, it is still unclear how to get the size of "roots" from the
> PetscSection & PetscSF on distributed DMPlex?
>

 For an SF mapping ghost dofs in a global vector, the number of roots is
 just the size of the local portion of the vector.


> In your diagram, how can you tell your code and make it allocate the
> "nroots=7 for P0, nroots=9 for P1, and nroots=7 for P2" arrays before you
> call PetscSFBcastBegin/End()? It seems that we need to define arrays 
> having
> the size of nroots & nleaves before calling PetscSFBcastBegin/End().
>

 I just want to note that this usage is different from the canonical
 usage in Plex. It is fine to do this, but this will not match what I do in
 the library if you look.
 In Plex, I distinguish two linear spaces:

   1) Global space: This is the vector space for the solvers. Each point
 is uniquely represented and owned by some process

   2) Local space: This is the vector space for assembly. Some points
 are represented multiple times.

 I create an SF that maps from the global space (roots) to the local
 space (leaves), and it is called in DMGlobalToLocal() (and
 associated functions). This
 is more natural in FEM. You seem to want an SF that maps between global
 vectors. This will also work. The roots would be the local dofs, and the
 leaves
 would be shared dofs.

   Does this make sense?

  Thanks,

Matt


> Thanks,
> Mike
>
> Here's a diagram of a 1D mesh with overlap and 3 partitions, showing
>> what the petscsf data is for each.  The number of roots is the number of
>> mesh points in the local representation, and the number of leaves is the
>> number of mesh points that are duplicates of mesh points on other
>> processes.  With that in mind, answering your questions
>>
>> > (1) It seems that the "roots" means the number of vertex not
>> considering overlap layer, and "leaves" seems the number of distributed
>> vertex for each processor that includes overlap layer. Can you 
>> acknowledge
>> that this is correct understanding? I have tried to find clearer examples
>> from PETSc team's articles relevant to Star Forest, but I am still 
>> unclear
>> about the exact relation & graphical notation of roots & leaves in SF if
>> it's the case of DMPlex solution arrays.
>>
>> No, the number of roots for a DMPlex is the number of mesh points in
>> the local portion of the mesh
>>
>> > (2) If it is so, there is an issue that I cannot define "root data"
>> and "leave data" generally. I am trying to following
>> "src/vec/i

Re: [petsc-users] PetscSF Object on Distributed DMPlex for Halo Data Exchange

2022-05-31 Thread Mike Michell
As a follow-up question on your example, is it possible to call
PetscSFCreateRemoteOffsets() from Fortran?

My code is written in .F90, and in "petsc/finclude/" there is no petscsf.h, so
the code currently cannot find PetscSFCreateRemoteOffsets().

Thanks,
Mike

On Tue, May 24, 2022 at 8:46 PM, Matthew Knepley wrote:

> I will also point out that Toby has created a nice example showing how to
> create an SF for halo exchange between local vectors.
>
>   https://gitlab.com/petsc/petsc/-/merge_requests/5267
>
>   Thanks,
>
>  Matt
>
> On Sun, May 22, 2022 at 9:47 PM Matthew Knepley  wrote:
>
>> On Sun, May 22, 2022 at 4:28 PM Mike Michell 
>> wrote:
>>
>>> Thanks for the reply. The diagram makes sense and is helpful for
>>> understanding 1D representation.
>>>
>>> However, something is still unclear. From your diagram, the number of
>>> roots per process seems to vary according to run arguments, such as
>>> "-dm_distribute_overlap", because "the number of roots for a DMPlex is the
>>> number of mesh points in the local portion of the mesh (cited from your
>>> answer to my question (1))" will end up change according to that argument.
>>> However, from my mock-up code, number of roots is independent to
>>> -dm_distribute_overlap argument. The summation of "number of roots" through
>>> processes was always equal to number of physical vertex on my mesh, if I
>>> define the section layout on vertex with 1DOF. But in your diagram example,
>>> the summation of "nroots" is larger than the actual number of mesh points,
>>> which is 13.
>>>
>>
>> I do not understand your question. Notice the -dm_distribute_overlap does
>> _not_ change the owned points for any process. It only puts in new leaves,
>> so it also never
>> changes the roots for this way of using the SF.
>>
>>
>>> Also, it is still unclear how to get the size of "roots" from the
>>> PetscSection & PetscSF on distributed DMPlex?
>>>
>>
>> For an SF mapping ghost dofs in a global vector, the number of roots is
>> just the size of the local portion of the vector.
>>
>>
>>> In your diagram, how can you tell your code and make it allocate the
>>> "nroots=7 for P0, nroots=9 for P1, and nroots=7 for P2" arrays before you
>>> call PetscSFBcastBegin/End()? It seems that we need to define arrays having
>>> the size of nroots & nleaves before calling PetscSFBcastBegin/End().
>>>
>>
>> I just want to note that this usage is different from the canonical usage
>> in Plex. It is fine to do this, but this will not match what I do in the
>> library if you look.
>> In Plex, I distinguish two linear spaces:
>>
>>   1) Global space: This is the vector space for the solvers. Each point
>> is uniquely represented and owned by some process
>>
>>   2) Local space: This is the vector space for assembly. Some points are
>> represented multiple times.
>>
>> I create an SF that maps from the global space (roots) to the local space
>> (leaves), and it is called in DMGlobalToLocal() (and associated functions).
>> This
>> is more natural in FEM. You seem to want an SF that maps between global
>> vectors. This will also work. The roots would be the local dofs, and the
>> leaves
>> would be shared dofs.
>>
>>   Does this make sense?
>>
>>  Thanks,
>>
>>Matt
>>
>>
>>> Thanks,
>>> Mike
>>>
>>> Here's a diagram of a 1D mesh with overlap and 3 partitions, showing
 what the petscsf data is for each.  The number of roots is the number of
 mesh points in the local representation, and the number of leaves is the
 number of mesh points that are duplicates of mesh points on other
 processes.  With that in mind, answering your questions

 > (1) It seems that the "roots" means the number of vertex not
 considering overlap layer, and "leaves" seems the number of distributed
 vertex for each processor that includes overlap layer. Can you acknowledge
 that this is correct understanding? I have tried to find clearer examples
 from PETSc team's articles relevant to Star Forest, but I am still unclear
 about the exact relation & graphical notation of roots & leaves in SF if
 it's the case of DMPlex solution arrays.

 No, the number of roots for a DMPlex is the number of mesh points in
 the local portion of the mesh

 > (2) If it is so, there is an issue that I cannot define "root data"
 and "leave data" generally. I am trying to following
 "src/vec/is/sf/tutorials/ex1f.F90", however, in that example, size of roots
 and leaves are predefined as 6. How can I generalize that? Because I can
 get size of leaves using DAG depth(or height), which is equal to number of
 vertices each proc has. But, how can I get the size of my "roots" region
 from SF? Any example about that? This question is connected to how can I
 define "rootdata" for "PetscSFBcastBegin/End()".

 Does the diagram help you generalize?

 > (3) More importantly, with the attached PetscSection & SF layout, my
 vector is only

Re: [petsc-users] PetscSF Object on Distributed DMPlex for Halo Data Exchange

2022-05-31 Thread Matthew Knepley
On Tue, May 31, 2022 at 10:04 AM Mike Michell  wrote:

> As a follow-up question on your example, is it possible to call
> PetscSFCreateRemoteOffsets() from Fortran?
>
> My code is written in .F90 and in "petsc/finclude/" there is no petscsf.h
> so that the code currently cannot find PetscSFCreateRemoteOffsets().
>

I believe if you pass in NULL for remoteOffsets, that function will be
called internally.
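
(In C this would look roughly like the sketch below, where passing NULL for 
remoteOffsets lets PetscSFCreateRemoteOffsets() be invoked internally; pointSF, 
rootSection and leafSection stand for whatever objects you already have.)

  PetscSF sectionSF;
  PetscCall(PetscSFCreateSectionSF(pointSF, rootSection, NULL, leafSection, &sectionSF));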

  Thanks,

 Matt


> Thanks,
> Mike
>
> On Tue, May 24, 2022 at 8:46 PM, Matthew Knepley wrote:
>
>> I will also point out that Toby has created a nice example showing how to
>> create an SF for halo exchange between local vectors.
>>
>>   https://gitlab.com/petsc/petsc/-/merge_requests/5267
>>
>>   Thanks,
>>
>>  Matt
>>
>> On Sun, May 22, 2022 at 9:47 PM Matthew Knepley 
>> wrote:
>>
>>> On Sun, May 22, 2022 at 4:28 PM Mike Michell 
>>> wrote:
>>>
 Thanks for the reply. The diagram makes sense and is helpful for
 understanding 1D representation.

 However, something is still unclear. From your diagram, the number of
 roots per process seems to vary according to run arguments, such as
 "-dm_distribute_overlap", because "the number of roots for a DMPlex is the
 number of mesh points in the local portion of the mesh (cited from your
 answer to my question (1))" will end up change according to that argument.
 However, from my mock-up code, number of roots is independent to
 -dm_distribute_overlap argument. The summation of "number of roots" through
 processes was always equal to number of physical vertex on my mesh, if I
 define the section layout on vertex with 1DOF. But in your diagram example,
 the summation of "nroots" is larger than the actual number of mesh points,
 which is 13.

>>>
>>> I do not understand your question. Notice the -dm_distribute_overlap
>>> does _not_ change the owned points for any process. It only puts in new
>>> leaves, so it also never
>>> changes the roots for this way of using the SF.
>>>
>>>
 Also, it is still unclear how to get the size of "roots" from the
 PetscSection & PetscSF on distributed DMPlex?

>>>
>>> For an SF mapping ghost dofs in a global vector, the number of roots is
>>> just the size of the local portion of the vector.
>>>
>>>
 In your diagram, how can you tell your code and make it allocate the
 "nroots=7 for P0, nroots=9 for P1, and nroots=7 for P2" arrays before you
 call PetscSFBcastBegin/End()? It seems that we need to define arrays having
 the size of nroots & nleaves before calling PetscSFBcastBegin/End().

>>>
>>> I just want to note that this usage is different from the canonical
>>> usage in Plex. It is fine to do this, but this will not match what I do in
>>> the library if you look.
>>> In Plex, I distinguish two linear spaces:
>>>
>>>   1) Global space: This is the vector space for the solvers. Each point
>>> is uniquely represented and owned by some process
>>>
>>>   2) Local space: This is the vector space for assembly. Some points are
>>> represented multiple times.
>>>
>>> I create an SF that maps from the global space (roots) to the local
>>> space (leaves), and it is called in DMGlobalToLocal() (and
>>> associated functions). This
>>> is more natural in FEM. You seem to want an SF that maps between global
>>> vectors. This will also work. The roots would be the local dofs, and the
>>> leaves
>>> would be shared dofs.
>>>
>>>   Does this make sense?
>>>
>>>  Thanks,
>>>
>>>Matt
>>>
>>>
 Thanks,
 Mike

 Here's a diagram of a 1D mesh with overlap and 3 partitions, showing
> what the petscsf data is for each.  The number of roots is the number of
> mesh points in the local representation, and the number of leaves is the
> number of mesh points that are duplicates of mesh points on other
> processes.  With that in mind, answering your questions
>
> > (1) It seems that the "roots" means the number of vertex not
> considering overlap layer, and "leaves" seems the number of distributed
> vertex for each processor that includes overlap layer. Can you acknowledge
> that this is correct understanding? I have tried to find clearer examples
> from PETSc team's articles relevant to Star Forest, but I am still unclear
> about the exact relation & graphical notation of roots & leaves in SF if
> it's the case of DMPlex solution arrays.
>
> No, the number of roots for a DMPlex is the number of mesh points in
> the local portion of the mesh
>
> > (2) If it is so, there is an issue that I cannot define "root data"
> and "leave data" generally. I am trying to following
> "src/vec/is/sf/tutorials/ex1f.F90", however, in that example, size of 
> roots
> and leaves are predefined as 6. How can I generalize that? Because I can
> get size of leaves using DAG depth(or height), which is equal to number of
> vertices each proc has. But, how can I get the size o

Re: [petsc-users] How to read and write HDF5 in parallel withPETSc

2022-05-31 Thread Matthew Knepley
On Tue, May 31, 2022 at 1:02 AM 冯宏磊 <12132...@mail.sustech.edu.cn> wrote:

> my code is below:
> ierr = PetscViewerCreate(PETSC_COMM_WORLD,&h5);CHKERRQ(ierr);
> ierr = PetscViewerHDF5Open(PETSC_COMM_WORLD,"explicit.h5",
> FILE_MODE_WRITE, &h5);CHKERRQ(ierr);
> ierr = PetscObjectSetName((PetscObject) z,
> "explicit-vector");CHKERRQ(ierr);
> ierr = PetscObjectSetName((PetscObject) tem,
> "explicit-necess-data");CHKERRQ(ierr);
> ierr = VecView(tem, h5);CHKERRQ(ierr);
> ierr = VecView(z, h5);CHKERRQ(ierr);
> ierr = PetscViewerDestroy(&h5);CHKERRQ(ierr);
>
> when I use 1 core run this, it can save the right answer.
>
> when I use 3 cores run this, it prints that :
> [0]PETSC ERROR: No support for this operation for this object type
> [0]PETSC ERROR: Cannot use parallel HDF5 viewer since the given HDF5 does
> not support parallel I/O (H5_HAVE_PARALLEL is unset)
>

If you want parallel HDF5, the package has to be compiled for parallelism.
One way to do this is to have PETSc build it for you:

   --download-hdf5

in configure.
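
For example (the compiler/MPI options are only placeholders for whatever you 
already configure with):

  ./configure --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpif90 --download-hdf5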

  Thanks,

 Matt


> I don't know how to solve this problem.
>
> By the way, I use PETSc-3.16.6, HDF5-1.12.1, C for programming.
>
>
>
>
>
> Feng Honglei (冯宏磊)
>
> Southern University of Science and Technology / graduate student, class of 2021
>
> 1088 Xueyuan Avenue, Nanshan District, Shenzhen, Guangdong
>
>


-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/ 


Re: [petsc-users] Sparse linear system solving

2022-05-31 Thread Matthew Knepley
I have looked at the local logs. First, you have run problems of size 12
and 24. As a rule of thumb, you need 10,000
variables per process in order to see good speedup.
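
For example, something along these lines (the solver options match what is 
being used in this thread; add whatever problem-size options the example 
provides so that each process owns at least ~10,000 unknowns):

  mpiexec -n 4 ./ex5 -ksp_type gmres -pc_type gamg \
      -ksp_monitor_true_residual -ksp_converged_reason -log_view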

  Thanks,

 Matt

On Tue, May 31, 2022 at 8:19 AM Matthew Knepley  wrote:

> On Tue, May 31, 2022 at 7:39 AM Lidia  wrote:
>
>> Matt, Mark, thank you much for your answers!
>>
>>
>> Now we have run example # 5 on our computer cluster and on the local
>> server and also have not seen any performance increase, but by unclear
>> reason running times on the local server are much better than on the
>> cluster.
>>
> I suspect that you are trying to get speedup without increasing the memory
> bandwidth:
>
>
> https://petsc.org/main/faq/#what-kind-of-parallel-computers-or-clusters-are-needed-to-use-petsc-or-why-do-i-get-little-speedup
>
>   Thanks,
>
>  Matt
>
>> Now we will try to run petsc #5 example inside a docker container on our
>> server and see if the problem is in our environment. I'll write you the
>> results of this test as soon as we get it.
>>
>> The ksp_monitor outputs for the 5th test at the current local server
>> configuration (for 2 and 4 MPI processes) and for the cluster (for 1 and 3
>> MPI processes) are attached.
>>
>>
>> And one more question. Potentially we can use 10 nodes with 96 threads on
>> each node on our cluster. Which combination of MPI processes and OpenMP
>> threads do you think would be best for the 5th example?
>>
>> Thank you!
>>
>>
>> Best,
>> Lidiia
>>
>> On 31.05.2022 05:42, Mark Adams wrote:
>>
>> And if you see "NO" change in performance I suspect the solver/matrix is
>> all on one processor.
>> (PETSc does not use threads by default so threads should not change
>> anything).
>>
>> As Matt said, it is best to start with a PETSc example that does
>> something like what you want (parallel linear solve, see
>> src/ksp/ksp/tutorials for examples), and then add your code to it.
>> That way you get the basic infrastructure in place for you, which is
>> pretty obscure to the uninitiated.
>>
>> Mark
>>
>> On Mon, May 30, 2022 at 10:18 PM Matthew Knepley 
>> wrote:
>>
>>> On Mon, May 30, 2022 at 10:12 PM Lidia 
>>> wrote:
>>>
 Dear colleagues,

 Is there anyone here who has solved big sparse linear systems using PETSc?

>>>
>>> There are lots of publications with this kind of data. Here is one
>>> recent one: https://arxiv.org/abs/2204.01722
>>>
>>>
 We have found NO performance improvement while using more and more MPI
 processes (1-2-3) and OpenMP threads (from 1 to 72 threads). Has anyone
 faced this problem? Does anyone know any possible reasons for such
 behaviour?

>>>
>>> Solver behavior is dependent on the input matrix. The only
>>> general-purpose solvers
>>> are direct, but they do not scale linearly and have high memory
>>> requirements.
>>>
>>> Thus, in order to make progress you will have to be specific about your
>>> matrices.
>>>
>>>
 We use an AMG preconditioner and the GMRES solver from the KSP package, as our
 matrix is large (from 100,000 to 1e+6 rows and columns), sparse,
 non-symmetric and includes both positive and negative values. But
 performance problems also exist while using CG solvers with symmetric
 matrices.

>>>
>>> There are many PETSc examples, such as example 5 for the Laplacian, that
>>> exhibit
>>> good scaling with both AMG and GMG.
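
As a concrete starting point, a minimal C sketch of selecting GMRES with
PETSc's native algebraic multigrid (GAMG) in code, while leaving the choice
overridable from the command line (the function name is illustrative; the
equivalent runtime options are -ksp_type gmres -pc_type gamg):

#include <petscksp.h>

/* Sketch: GMRES preconditioned by PETSc's built-in AMG (GAMG); calling
   KSPSetFromOptions afterwards lets command-line options override this. */
PetscErrorCode ConfigureSolver(KSP ksp)
{
  PC pc;

  PetscFunctionBeginUser;
  PetscCall(KSPSetType(ksp, KSPGMRES));
  PetscCall(KSPGetPC(ksp, &pc));
  PetscCall(PCSetType(pc, PCGAMG));
  PetscCall(KSPSetFromOptions(ksp));
  PetscFunctionReturn(0);
}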
>>>
>>>
 Could anyone help us set appropriate options for the preconditioner
 and solver? Now we use default parameters; maybe they are not the best,
 but we do not know a good combination. Or maybe you could suggest any
 other pairs of preconditioner+solver for such tasks?

 I can provide more information: the matrices that we solve, the C++ script
 that runs the solve with PETSc, and any statistics obtained from our runs.

>>>
>>> First, please provide a description of the linear system, and the output
>>> of
>>>
>>>   -ksp_view -ksp_monitor_true_residual -ksp_converged_reason -log_view
>>>
>>> for each test case.
>>>
>>>   Thanks,
>>>
>>>  Matt
>>>
>>>
 Thank you in advance!

 Best regards,
 Lidiia Varshavchik,
 Ioffe Institute, St. Petersburg, Russia

>>>
>>>
>>> --
>>> What most experimenters take for granted before they begin their
>>> experiments is infinitely more interesting than any results to which their
>>> experiments lead.
>>> -- Norbert Wiener
>>>
>>> https://www.cse.buffalo.edu/~knepley/
>>> 
>>>
>>
>
> --
> What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which their
> experiments lead.
> -- Norbert Wiener
>
> https://www.cse.buffalo.edu/~knepley/
> 
>


-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/

Re: [petsc-users] Sparse linear system solving

2022-05-31 Thread Matthew Knepley
On Tue, May 31, 2022 at 7:39 AM Lidia  wrote:

> Matt, Mark, thank you very much for your answers!
>
>
> Now we have run example #5 on our computer cluster and on the local
> server and also have not seen any performance increase, but for some unclear
> reason the running times on the local server are much better than on the
> cluster.
>
I suspect that you are trying to get speedup without increasing the memory
bandwidth:


https://petsc.org/main/faq/#what-kind-of-parallel-computers-or-clusters-are-needed-to-use-petsc-or-why-do-i-get-little-speedup

  Thanks,

 Matt

> Now we will try to run PETSc example #5 inside a Docker container on our
> server and see if the problem is in our environment. I'll write you the
> results of this test as soon as we get them.
>
> The ksp_monitor outputs for the 5th test at the current local server
> configuration (for 2 and 4 MPI processes) and for the cluster (for 1 and 3
> MPI processes) are attached.
>
>
> And one more question. Potentially we can use 10 nodes with 96 threads on
> each node on our cluster. Which combination of MPI processes and OpenMP
> threads do you think would be best for the 5th example?
>
> Thank you!
>
>
> Best,
> Lidiia
>
> On 31.05.2022 05:42, Mark Adams wrote:
>
> And if you see "NO" change in performance I suspect the solver/matrix is
> all on one processor.
> (PETSc does not use threads by default so threads should not change
> anything).
>
> As Matt said, it is best to start with a PETSc example that does something
> like what you want (parallel linear solve, see src/ksp/ksp/tutorials for
> examples), and then add your code to it.
> That way you get the basic infrastructure in place for you, which is
> pretty obscure to the uninitiated.
>
> Mark
>
> On Mon, May 30, 2022 at 10:18 PM Matthew Knepley 
> wrote:
>
>> On Mon, May 30, 2022 at 10:12 PM Lidia  wrote:
>>
>>> Dear colleagues,
>>>
>>> Is there anyone here who has solved big sparse linear systems using PETSc?
>>>
>>
>> There are lots of publications with this kind of data. Here is one recent
>> one: https://arxiv.org/abs/2204.01722
>>
>>
>>> We have found NO performance improvement while using more and more MPI
>>> processes (1-2-3) and OpenMP threads (from 1 to 72 threads). Has anyone
>>> faced this problem? Does anyone know any possible reasons for such
>>> behaviour?
>>>
>>
>> Solver behavior is dependent on the input matrix. The only
>> general-purpose solvers
>> are direct, but they do not scale linearly and have high memory
>> requirements.
>>
>> Thus, in order to make progress you will have to be specific about your
>> matrices.
>>
>>
>>> We use an AMG preconditioner and the GMRES solver from the KSP package, as our
>>> matrix is large (from 100,000 to 1e+6 rows and columns), sparse,
>>> non-symmetric and includes both positive and negative values. But
>>> performance problems also exist while using CG solvers with symmetric
>>> matrices.
>>>
>>
>> There are many PETSc examples, such as example 5 for the Laplacian, that
>> exhibit
>> good scaling with both AMG and GMG.
>>
>>
>>> Could anyone help us set appropriate options for the preconditioner
>>> and solver? Now we use default parameters; maybe they are not the best,
>>> but we do not know a good combination. Or maybe you could suggest any
>>> other pairs of preconditioner+solver for such tasks?
>>>
>>> I can provide more information: the matrices that we solve, the C++ script
>>> that runs the solve with PETSc, and any statistics obtained from our runs.
>>>
>>
>> First, please provide a description of the linear system, and the output
>> of
>>
>>   -ksp_view -ksp_monitor_true_residual -ksp_converged_reason -log_view
>>
>> for each test case.
>>
>>   Thanks,
>>
>>  Matt
>>
>>
>>> Thank you in advance!
>>>
>>> Best regards,
>>> Lidiia Varshavchik,
>>> Ioffe Institute, St. Petersburg, Russia
>>>
>>
>>
>> --
>> What most experimenters take for granted before they begin their
>> experiments is infinitely more interesting than any results to which their
>> experiments lead.
>> -- Norbert Wiener
>>
>> https://www.cse.buffalo.edu/~knepley/
>> 
>>
>

-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/ 


Re: [petsc-users] Sparse linear system solving

2022-05-31 Thread Lidia

Matt, Mark, thank you very much for your answers!


Now we have run example #5 on our computer cluster and on the local
server and also have not seen any performance increase, but for some unclear
reason the running times on the local server are much better than on the
cluster.


Now we will try to run PETSc example #5 inside a Docker container on our
server and see if the problem is in our environment. I'll write you the
results of this test as soon as we get them.


The ksp_monitor outputs for the 5th test at the current local server
configuration (for 2 and 4 MPI processes) and for the cluster (for 1 and
3 MPI processes) are attached.



And one more question. Potentially we can use 10 nodes with 96 threads on
each node on our cluster. Which combination of MPI processes and OpenMP
threads do you think would be best for the 5th example?


Thank you!


Best,
Lidiia

On 31.05.2022 05:42, Mark Adams wrote:
And if you see "NO" change in performance I suspect the solver/matrix 
is all on one processor.
(PETSc does not use threads by default so threads should not change 
anything).


As Matt said, it is best to start with a PETSc example that does 
something like what you want (parallel linear solve, see 
src/ksp/ksp/tutorials for examples), and then add your code to it.
That way you get the basic infrastructure in place for you, which is 
pretty obscure to the uninitiated.


Mark

On Mon, May 30, 2022 at 10:18 PM Matthew Knepley  
wrote:


On Mon, May 30, 2022 at 10:12 PM Lidia 
wrote:

Dear colleagues,

Is there anyone here who has solved big sparse linear systems
using PETSc?


There are lots of publications with this kind of data. Here is one
recent one: https://arxiv.org/abs/2204.01722

We have found NO performance improvement while using more and more MPI
processes (1-2-3) and OpenMP threads (from 1 to 72 threads). Has anyone
faced this problem? Does anyone know any possible reasons for such
behaviour?


Solver behavior is dependent on the input matrix. The only
general-purpose solvers
are direct, but they do not scale linearly and have high memory
requirements.

Thus, in order to make progress you will have to be specific about
your matrices.

We use an AMG preconditioner and the GMRES solver from the KSP package,
as our matrix is large (from 100,000 to 1e+6 rows and columns), sparse,
non-symmetric and includes both positive and negative values. But
performance problems also exist while using CG solvers with symmetric
matrices.


There are many PETSc examples, such as example 5 for the
Laplacian, that exhibit
good scaling with both AMG and GMG.

Could anyone help us set appropriate options for the preconditioner
and solver? Now we use default parameters; maybe they are not the best,
but we do not know a good combination. Or maybe you could suggest any
other pairs of preconditioner+solver for such tasks?

I can provide more information: the matrices that we solve, the C++ script
that runs the solve with PETSc, and any statistics obtained from our runs.


First, please provide a description of the linear system, and the
output of

  -ksp_view -ksp_monitor_true_residual -ksp_converged_reason -log_view

for each test case.

  Thanks,

     Matt

Thank you in advance!

Best regards,
Lidiia Varshavchik,
Ioffe Institute, St. Petersburg, Russia



-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/

user@kitesrv:/data/raid1/tmp/petsc/src/ksp/ksp/tutorials$ 
LD_LIBRARY_PATH=/data/raid1/tmp/petsc/install/lib time 
/data/raid1/tmp/petsc/install/bin/mpirun -n 4 ./ex5 -ksp_view 
-ksp_monitor_true_residual -ksp_converged_reason -log_view
  0 KSP preconditioned resid norm 5.925618307774e+02 true resid norm 
1.489143042155e+03 ||r(i)||/||b|| 1.e+00
  1 KSP preconditioned resid norm 2.022444592869e+02 true resid norm 
5.516335411837e+02 ||r(i)||/||b|| 3.704369060378e-01
  2 KSP preconditioned resid norm 1.058407283487e+02 true resid norm 
2.759478800294e+02 ||r(i)||/||b|| 1.853064965673e-01
  3 KSP preconditioned resid norm 3.693395834711e+01 true resid norm 
1.275254917089e+02 ||r(i)||/||b|| 8.563683145197e-02
  4 KSP preconditioned resid norm 1.408997772215e+01 true resid norm 
4.766518770804e+01 ||r(i)||/||b|| 3.200846819863e-02
  5 KSP preconditioned resid norm 5.361143512330e+00 true resid norm 
2.037934685918e+01 ||r(i)||/||b|| 1.368528494730e-02
  6 KSP preconditioned resid norm 1.510583748885e+00 true resid norm 
6.398081426940e+00 ||r(i)||/||b|| 4.296485