Re: [petsc-users] Extremely slow DMNetwork Jacobian assembly

2019-05-08 Thread Justin Chang via petsc-users
Hi everyone,

Yes I have these lines in my code:

ierr = DMCreateMatrix(networkdm,&J);CHKERRQ(ierr);
ierr = MatSetOption(J,MAT_NEW_NONZERO_ALLOCATION_ERR,PETSC_FALSE);CHKERRQ(ierr);
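
For reference, a minimal sketch (using the J created above) that would also print the malloc count directly, in addition to what -info reports:

MatInfo info;
ierr = MatGetInfo(J,MAT_LOCAL,&info);CHKERRQ(ierr); /* call after MatAssemblyEnd */
/* info.mallocs counts reallocations triggered inside MatSetValues */
ierr = PetscPrintf(PETSC_COMM_SELF,"mallocs during assembly: %g\n",info.mallocs);CHKERRQ(ierr);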

I tried -info and here's my output:

[0] PetscInitialize(): PETSc successfully started: number of processors = 1
[0] PetscInitialize(): Running on machine: jchang31606s.domain
[0] PetscCommDuplicate(): Duplicating a communicator 4436504608
140550815662944 max tags = 2147483647
[0] PetscCommDuplicate(): Using internal PETSc communicator 4436504608
140550815662944
[0] PetscCommDuplicate(): Using internal PETSc communicator 4436504608
140550815662944
Base power = 0.17, numbus = 115000, numgen = 5000, numyl = 75000, numdl
= 5000, numlbr = 10, numtbr = 5000

 Power flow dist case 

Base power = 0.17, nbus = 115000, ngen = 5000, nwye = 75000, ndelta =
5000, nbranch = 114999
[0] PetscCommDuplicate(): Duplicating a communicator 4436505120
140550815683104 max tags = 2147483647
[0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120
140550815683104
[0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120
140550815683104
[0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120
140550815683104
[0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120
140550815683104
[0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120
140550815683104
[0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120
140550815683104
[0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120
140550815683104
[0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120
140550815683104
[0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120
140550815683104
[0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120
140550815683104
[0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120
140550815683104
[0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120
140550815683104
[0] MatAssemblyEnd_SeqAIJ(): Matrix size: 62 X 62; storage space: 0
unneeded,10799928 used
[0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
[0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 28
[0] MatCheckCompressedRow(): Found the ratio (num_zerorows
0)/(num_localrows 62) < 0.6. Do not use CompressedRow routines.
[0] MatSeqAIJCheckInode(): Found 205000 nodes of 62. Limit used: 5.
Using Inode routines
[0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120
140550815683104
[0] PetscCommDuplicate(): Using internal PETSc communicator 4436504608
140550815662944
[0] DMGetDMSNES(): Creating new DMSNES
[0] DMGetDMKSP(): Creating new DMKSP
[0] PetscCommDuplicate(): Using internal PETSc communicator 4436505120
140550815683104
  0 SNES Function norm 1155.45

nothing else -info related shows up as I'm iterating through the vertex
loop.

I'll have a MWE for you guys to play with shortly.

Thanks,
Justin

On Wed, May 8, 2019 at 12:10 PM Smith, Barry F.  wrote:

>
>   Justin,
>
>  Are you providing matrix entries that connect  directly one vertex to
> another vertex ACROSS an edge? I don't think that is supported by the
> DMNetwork model. The assumption is that edges are only connected to
> vertices and vertices are only connected to neighboring edges.
>
>   Everyone,
>
>   I second Matt's reply.
>
>   How is the DMNetwork preallocating for the Jacobian? Does it take into
> account coupling between neighboring vertices/edges? Or does it assume no
> coupling. Or assume full coupling. If it assumes no coupling and the user
> has a good amount of coupling it will be very slow.
>
>   There would need to be a way for the user provide the coupling
> information between neighboring vertices/edges if it assumes no coupling.
>
> Barry
>
>
> > On May 8, 2019, at 7:44 AM, Matthew Knepley via petsc-users <
> petsc-users@mcs.anl.gov> wrote:
> >
> > On Wed, May 8, 2019 at 4:45 AM Justin Chang via petsc-users <
> petsc-users@mcs.anl.gov> wrote:
> > Hi guys,
> >
> > I have a fully working distribution system solver written using
> DMNetwork, The idea is that each electrical bus can have up to three phase
> nodes, and each phase node has two unknowns: voltage magnitude and angle.
> In a completely balanced system, each bus has three nodes, but in an
> unbalanced system some of the buses can be either single phase or two-phase.
> >
> > The working DMNetwork code I developed, loosely based on the SNES
> network/power.c, essentially represents each vertex as a bus.
> DMNetworkAddNumVariables() function will add either 2, 4, or 6 unknowns to
> each vertex. If every single bus had the same number of variables, the mat
> block size = 2, 4, or 6, and my code is both fast and scalable. However, if
> the unknowns per D

Re: [petsc-users] Confusing Schur preconditioner behaviour

2019-03-18 Thread Justin Chang via petsc-users
Colin,

1) What equations are you solving?

2) In your second case, you set the outer ksp to preonly, thus we are
unable to see the ksp_monitor for the (firedrake_0_) solver. Set it to
gmres and see if you have a similar output to your first case:

0 KSP preconditioned resid norm 4.985448866758e+00 true resid norm
1.086016610848e-03 ||r(i)||/||b|| 1.e+00
1 KSP preconditioned resid norm 1.245615753306e-13 true resid norm
2.082000915439e-14 ||r(i)||/||b|| 1.917098591903e-11

Because, according to the first ksp_view output, a single LU sweep for the
(firedrake_0_fieldsplit_1_) solver, that is, going from:

0 KSP preconditioned resid norm 8.819238435108e-02 true resid norm
1.797571993221e-01 ||r(i)||/||b|| 1.e+00

to

1 KSP preconditioned resid norm 1.025167319984e-02 true resid norm
3.340583874349e-02 ||r(i)||/||b|| 1.858386694356e-01

appeared to give you an exact Schur complement.
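
In options form, the check I am suggesting is roughly this (a sketch using the firedrake_0_ prefix from your ksp_view; translate to however you pass solver parameters in Firedrake):

-firedrake_0_ksp_type gmres
-firedrake_0_ksp_monitor_true_residual
-firedrake_0_fieldsplit_1_ksp_type gmres
-firedrake_0_fieldsplit_1_pc_type lu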

Justin

On Mon, Mar 18, 2019 at 2:18 PM Cotter, Colin J via petsc-users <
petsc-users@mcs.anl.gov> wrote:

> Sorry, just to clarify, in the second case I see several *inner*
> iterations, even though I'm using LU on a supposedly exact Schur complement
> as the preconditioner for the Schur system.
> --
> *From:* petsc-users  on behalf of
> Cotter, Colin J via petsc-users 
> *Sent:* 18 March 2019 20:14:48
> *To:* petsc-users@mcs.anl.gov
> *Subject:* [petsc-users] Confusing Schur preconditioner behaviour
>
>
> Dear petsc-users,
>
>   I'm solving a 2x2 block system, for which I can construct the Schur
> complement analytically (through compatible FEM stuff),
>
> which I can pass as the preconditioning matrix.
>
>
> When using gmres on the outer iteration, and preonly+lu on the inner
> iterations with a Schur complement preconditioner,
>
> I see convergence in 1 iteration as expected. However, when I set gmres+lu
> on the inner iteration for S, I see several iterations.
>
>
> This seems strange to me, as the first result seems to confirm that I have
> an exact Schur complement, but the second result
>
> implies not.
>
> What could be going on here?
>
>
> I've appended output to the bottom of this message, first the preonly+lu
> and then for gmres+lu.
>
>
> all the best
>
> --Colin
>
>
>Linear firedrake_0_fieldsplit_1_ solve converged due to CONVERGED_ITS
> iterations 1
> Residual norms for firedrake_0_ solve.
> 0 KSP preconditioned resid norm 4.985448866758e+00 true resid norm
> 1.086016610848e-03 ||r(i)||/||b|| 1.e+00
> Linear firedrake_0_fieldsplit_1_ solve converged due to CONVERGED_ITS
> iterations 1
> 1 KSP preconditioned resid norm 1.245615753306e-13 true resid norm
> 2.082000915439e-14 ||r(i)||/||b|| 1.917098591903e-11
> KSP Object: (firedrake_0_) 1 MPI processes
>   type: gmres
> restart=30, using Classical (unmodified) Gram-Schmidt
> Orthogonalization with no iterative refinement
> happy breakdown tolerance 1e-30
>   maximum iterations=1, initial guess is zero
>   tolerances:  relative=1e-07, absolute=1e-50, divergence=1.
>   left preconditioning
>   using PRECONDITIONED norm type for convergence test
> PC Object: (firedrake_0_) 1 MPI processes
>   type: fieldsplit
> FieldSplit with Schur preconditioner, factorization FULL
> Preconditioner for the Schur complement formed from A11
> Split info:
> Split number 0 Defined by IS
> Split number 1 Defined by IS
> KSP solver for A00 block
>   KSP Object: (firedrake_0_fieldsplit_0_) 1 MPI processes
> type: preonly
> maximum iterations=1, initial guess is zero
> tolerances:  relative=1e-05, absolute=1e-50, divergence=1.
> left preconditioning
> using NONE norm type for convergence test
>   PC Object: (firedrake_0_fieldsplit_0_) 1 MPI processes
> type: lu
>   out-of-place factorization
>   tolerance for zero pivot 2.22045e-14
>   matrix ordering: nd
>   factor fill ratio given 5., needed 1.
> Factored matrix follows:
>   Mat Object: 1 MPI processes
> type: seqaij
> rows=6144, cols=6144
> package used to perform factorization: petsc
> total: nonzeros=18432, allocated nonzeros=18432
> total number of mallocs used during MatSetValues calls =0
>   using I-node routines: found 2048 nodes, limit used is 5
> linear system matrix = precond matrix:
> Mat Object: (firedrake_0_fieldsplit_0_) 1 MPI processes
>   type: seqaij
>   rows=6144, cols=6144
>   total: nonzeros=18432, allocated nonzeros=18432
>   total number of mallocs used during MatSetValues calls =0
> using I-node routines: found 2048 nodes, limit used is 5
> KSP solver for S = A11 - A10 inv(A00) A01
>   KSP Object: (firedrake_0_fieldsplit_1_) 1 MPI processes
> type: preonly
> maximum iterations=1, initial guess is 

Re: [petsc-users] BJACOBI with FIELDSPLIT

2019-03-18 Thread Justin Chang via petsc-users
Use -ksp_type fgmres if your inner ksp solvers are gmres. Maybe that will
help?
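
For example, based on the prefixes in the ksp_view below (a sketch; adapt to your setup):

-ksp_type fgmres
-fieldsplit_0_ksp_type gmres
-fieldsplit_1_ksp_type gmres

FGMRES tolerates a preconditioner that changes from one outer iteration to the next, which is effectively what an inner GMRES solve is.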

On Mon, Mar 18, 2019 at 1:33 PM Rossi, Simone via petsc-users <
petsc-users@mcs.anl.gov> wrote:

> Thanks Barry.
>
> Let me know if you can spot anything out of the ksp_view
>
>
> KSP Object: 1 MPI processes
>
>   type: gmres
>
> restart=30, using Classical (unmodified) Gram-Schmidt
> Orthogonalization with no iterative refinement
>
> happy breakdown tolerance 1e-30
>
>   maximum iterations=5000, nonzero initial guess
>
>   tolerances:  relative=0.001, absolute=1e-50, divergence=1.
>
>   left preconditioning
>
>   using PRECONDITIONED norm type for convergence test
>
> PC Object: 1 MPI processes
>
>   type: fieldsplit
>
> FieldSplit with MULTIPLICATIVE composition: total splits = 2,
> blocksize = 2
>
> Solver info for each split is in the following KSP objects:
>
> Split number 0 Fields  0
>
> KSP Object: (fieldsplit_0_) 1 MPI processes
>
>   type: gmres
>
> restart=30, using Classical (unmodified) Gram-Schmidt
> Orthogonalization with no iterative refinement
>
> happy breakdown tolerance 1e-30
>
>   maximum iterations=1, initial guess is zero
>
>   tolerances:  relative=1e-12, absolute=1e-50, divergence=1.
>
>   left preconditioning
>
>   using PRECONDITIONED norm type for convergence test
>
> PC Object: (fieldsplit_0_) 1 MPI processes
>
>   type: hypre
>
> HYPRE BoomerAMG preconditioning
>
>   Cycle type V
>
>   Maximum number of levels 25
>
>   Maximum number of iterations PER hypre call 1
>
>   Convergence tolerance PER hypre call 0.
>
>   Threshold for strong coupling 0.25
>
>   Interpolation truncation factor 0.
>
>   Interpolation: max elements per row 0
>
>   Number of levels of aggressive coarsening 0
>
>   Number of paths for aggressive coarsening 1
>
>   Maximum row sums 0.9
>
>   Sweeps down 1
>
>   Sweeps up   1
>
>   Sweeps on coarse1
>
>   Relax down  symmetric-SOR/Jacobi
>
>   Relax upsymmetric-SOR/Jacobi
>
>   Relax on coarse Gaussian-elimination
>
>   Relax weight  (all)  1.
>
>   Outer relax weight (all) 1.
>
>   Using CF-relaxation
>
>   Not using more complex smoothers.
>
>   Measure typelocal
>
>   Coarsen typeFalgout
>
>   Interpolation type  classical
>
>   linear system matrix = precond matrix:
>
>   Mat Object: (fieldsplit_0_) 1 MPI processes
>
> type: seqaij
>
> rows=35937, cols=35937
>
> total: nonzeros=912673, allocated nonzeros=912673
>
> total number of mallocs used during MatSetValues calls =0
>
>   not using I-node routines
>
> Split number 1 Fields  1
>
> KSP Object: (fieldsplit_1_) 1 MPI processes
>
>   type: gmres
>
> restart=30, using Classical (unmodified) Gram-Schmidt
> Orthogonalization with no iterative refinement
>
> happy breakdown tolerance 1e-30
>
>   maximum iterations=1, initial guess is zero
>
>   tolerances:  relative=1e-12, absolute=1e-50, divergence=1.
>
>   left preconditioning
>
>   using PRECONDITIONED norm type for convergence test
>
> PC Object: (fieldsplit_1_) 1 MPI processes
>
>   type: hypre
>
> HYPRE BoomerAMG preconditioning
>
>   Cycle type V
>
>   Maximum number of levels 25
>
>   Maximum number of iterations PER hypre call 1
>
>   Convergence tolerance PER hypre call 0.
>
>   Threshold for strong coupling 0.25
>
>   Interpolation truncation factor 0.
>
>   Interpolation: max elements per row 0
>
>   Number of levels of aggressive coarsening 0
>
>   Number of paths for aggressive coarsening 1
>
>   Maximum row sums 0.9
>
>   Sweeps down 1
>
>   Sweeps up   1
>
>   Sweeps on coarse1
>
>   Relax down  symmetric-SOR/Jacobi
>
>   Relax upsymmetric-SOR/Jacobi
>
>   Relax on coarse Gaussian-elimination
>
>   Relax weight  (all)  1.
>
>   Outer relax weight (all) 1.
>
>   Using CF-relaxation
>
>   Not using more complex smoothers.
>
>   Measure typelocal
>
>   Coarsen typeFalgout
>
>   Interpolation type  classical
>
>   linear system matrix = precond matrix:
>
>   Mat Object: (fieldsplit_1_) 1 MPI processes
>
> type: seqaij
>
> rows=35937, cols=35937
>
> total: nonzeros=912673, allocated nonzeros=912673
>
> total number of mallocs used during MatSetValues calls =0
>
>   not using I-node routines
>
>   linear system matrix = precond matrix:
>
>   Mat Object: () 1 MPI processes
>
> type: seqaij
>
> rows=71874, cols=71874
>
> 

Re: [petsc-users] Meaning of PETSc error code 77

2019-02-08 Thread Justin Chang via petsc-users
 Hi Aditya,

I wouldn't trust FEniCS' wrappers around PETSc. It might also depend on the
version of DOLFIN you're working with.

Shameless plug: Try using pfibs

https://github.com/NREL/pfibs

 We wrote a slightly better PETSc interface to FEniCS. It does require
FEniCS 2018.1.0 or higher. This should at least give better error logs for
your problem.

Also I know a lot of PETSc solvers rely on some sort of DM, which FEniCS
(at the moment) does not automatically provide. We have a pFibs branch
which attaches a DMShell to your functionspace.

https://github.com/NREL/pfibs/tree/dm-nested-solver-feature

It's still in its early stages, but it should at least be able to handle
the PETScOptions you've set above.

Hopefully this helps,
Justin

On Fri, Feb 8, 2019 at 10:41 AM aditya kumar via petsc-users <
petsc-users@mcs.anl.gov> wrote:

> Hello,
>
> I am using PETSc with FEniCS project libraries to solve a nonlinear
> problem. I am using PETSc Krylov solver with the following configuration
>
> pc = PETScPreconditioner("petsc_amg")
> PETScOptions.set("mg_levels_ksp_type", "chebyshev")
> PETScOptions.set("mg_levels_pc_type", "jacobi")
> PETScOptions.set("mg_levels_esteig_ksp_type", "cg")
> PETScOptions.set("mg_levels_ksp_chebyshev_esteig_steps", 50) solver_u =
> PETScKrylovSolver("cg", pc)
>
> I am encountering this error sometimes during the solution of the linear
> system:
> Error: Unable to successfully call PETSc function 'KSPSolve'. *** Reason:
> PETSc error code is: 77 (Petsc has generated inconsistent data). *** Where:
> This error was encountered inside
> /tmp/dolfin/dolfin/la/PETScKrylovSolver.cpp.
>
> *** Process: 130
>
> Can anybody help me in understanding the possible reasons for this error
> code? Or refer me to any available documentation? It will be really
> helpful.
>
> Also, I will like to know if there is a way to prevent the code from
> stopping due to such PETSc errors, so that I can tweak some parameters to
> hopefully make it progress.
>
> Thanks and Regards,
> Aditya
>
>


Re: [petsc-users] Preconditioning systems of equations with complex numbers

2019-02-04 Thread Justin Chang via petsc-users
Thanks everyone for your suggestions/feedback. So a few things:

1) When I examined larger distribution networks (~150k buses) my eigenvalue
estimates from the chebyshev method get enormous indeed. See below:

[0] PCSetUp_GAMG(): level 0) N=320745, n data rows=1, n data cols=1,
nnz/row (ave)=6, np=1
[0] PCGAMGFilterGraph():   98.5638% nnz after filtering, with threshold 0.,
6.01293 nnz ave. (N=320745)
[0] PCGAMGCoarsen_AGG(): Square Graph on level 1 of 1 to square
[0] PCGAMGProlongator_AGG(): New grid 44797 nodes
[0] PCGAMGOptProlongator_AGG(): Smooth P0: max eigen=2.403335e+00
min=4.639523e-02 PC=jacobi
[0] PCSetUp_GAMG(): 1) N=44797, n data cols=1, nnz/row (ave)=7, 1 active pes
[0] PCGAMGFilterGraph():   99.9753% nnz after filtering, with threshold 0.,
7.32435 nnz ave. (N=44797)
[0] PCGAMGProlongator_AGG(): New grid 13043 nodes
[0] PCGAMGOptProlongator_AGG(): Smooth P0: max eigen=8.173298e+00
min=9.687506e-01 PC=jacobi
[0] PCSetUp_GAMG(): 2) N=13043, n data cols=1, nnz/row (ave)=22, 1 active
pes
[0] PCGAMGFilterGraph():   99.684% nnz after filtering, with threshold 0.,
22.5607 nnz ave. (N=13043)
[0] PCGAMGProlongator_AGG(): New grid 2256 nodes
[0] PCGAMGOptProlongator_AGG(): Smooth P0: max eigen=5.696594e+00
min=6.150856e-01 PC=jacobi
[0] PCSetUp_GAMG(): 3) N=2256, n data cols=1, nnz/row (ave)=79, 1 active pes
[0] PCGAMGFilterGraph():   93.859% nnz after filtering, with threshold 0.,
79.5142 nnz ave. (N=2256)
[0] PCGAMGProlongator_AGG(): New grid 232 nodes
[0] PCGAMGOptProlongator_AGG(): Smooth P0: max eigen=1.454120e+00
min=6.780909e-01 PC=jacobi
[0] PCSetUp_GAMG(): 4) N=232, n data cols=1, nnz/row (ave)=206, 1 active pes
[0] PCGAMGFilterGraph():   99.1729% nnz after filtering, with threshold 0.,
206.379 nnz ave. (N=232)
[0] PCGAMGProlongator_AGG(): New grid 9 nodes
[0] PCGAMGOptProlongator_AGG(): Smooth P0: max eigen=2.443612e+00
min=2.153627e-01 PC=jacobi
[0] PCSetUp_GAMG(): 5) N=9, n data cols=1, nnz/row (ave)=9, 1 active pes
[0] PCSetUp_GAMG(): 6 levels, grid complexity = 1.44058

2) I tried all the suggestions mentioned before: setting
-pc_gamg_agg_nsmooths 0  -pc_gamg_square_graph 10 did not improve my
convergence. Neither did explicitly setting -mg_coarse_pc_type lu or more
iterations of richardson/sor.

3a) -pc_type asm works only if I set -sub_pc_type lu. Basically I'm just
solving LU on the whole system (options sketched below, after point B).

3b) The problem is that I can't even get it to speed up across 2 MPI
processes for my 150k bus case (~300k dofs) - and I already checked that
this problem could in theory be parallelized by setting the -ksp_max_it to
something low and observing that the KSPSolve time decreases as MPI
concurrency increases. The potential speedup is countered by the fact that
the algorithmic convergence rate blows up when either more MPI processes
are added or when I tune the block_size/overlap parameters.

Which leads me to one of two conclusions/questions:

A) Is my 300k DOF problem still too small? Do I need to look at a problem
with, say, 10 Million DOF or more to see that increasing the asm_block_size
will have better performance (despite having more KSP iterations) than a
single giant asm_block?

B) Or is there something even more sinister about my system of equations in
complex form?
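
(For reference, the asm configuration mentioned in 3a boils down to something like the following sketch; block count and overlap flags were left at their defaults:)

-ksp_type gmres
-pc_type asm
-sub_ksp_type preonly
-sub_pc_type lu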

Thanks,
Justin

On Sat, Feb 2, 2019 at 1:41 PM Matthew Knepley  wrote:

> The coarse grid is getting set to SOR rather than LU.
>
>Matt
>
> On Fri, Feb 1, 2019 at 3:54 PM Justin Chang  wrote:
>
>> I tried these options:
>>
>> -ksp_view
>> -ksp_monitor_true_residual
>> -ksp_type gmres
>> -ksp_max_it 50
>> -pc_type gamg
>> -mg_coarse_pc_type sor
>> -mg_levels_1_ksp_type richardson
>> -mg_levels_2_ksp_type richardson
>> -mg_levels_3_ksp_type richardson
>> -mg_levels_4_ksp_type richardson
>> -mg_levels_1_pc_type sor
>> -mg_levels_2_pc_type sor
>> -mg_levels_3_pc_type sor
>> -mg_levels_4_pc_type sor
>>
>> And still have a non-converging solution:
>>
>>   0 KSP preconditioned resid norm 1.573625743885e+05 true resid norm
>> 1.2075e+08 ||r(i)||/||b|| 1.e+00
>>   1 KSP preconditioned resid norm 7.150979785488e+00 true resid norm
>> 1.218223835173e+04 ||r(i)||/||b|| 1.008881022917e-04
>>   2 KSP preconditioned resid norm 5.627140225062e+00 true resid norm
>> 1.209346465397e+04 ||r(i)||/||b|| 1.001529163889e-04
>>   3 KSP preconditioned resid norm 5.077324098998e+00 true resid norm
>> 1.211293532189e+04 ||r(i)||/||b|| 1.003141641564e-04
>>   4 KSP preconditioned resid norm 4.324840589068e+00 true resid norm
>> 1.213758867766e+04 ||r(i)||/||b|| 1.005183327342e-04
>>   5 KSP preconditioned resid norm 4.107764194462e+00 true resid norm
>> 1.212180975158e+04 ||r(i)||/||b|| 1.003876583981e-04
>>   6 KSP preconditioned resid norm 4.033152911227e+00 true resid norm
>> 1.216543340208e+04 ||r(i)||/||b|| 1.

Re: [petsc-users] Preconditioning systems of equations with complex numbers

2019-02-01 Thread Justin Chang via petsc-users
-smoother)
  Down solver (pre-smoother) on level 3 ---
KSP Object: (mg_levels_3_) 1 MPI processes
  type: richardson
damping factor=1.
  maximum iterations=2, nonzero initial guess
  tolerances:  relative=1e-05, absolute=1e-50, divergence=1.
  left preconditioning
  using NONE norm type for convergence test
PC Object: (mg_levels_3_) 1 MPI processes
  type: sor
type = local_symmetric, iterations = 1, local iterations = 1, omega
= 1.
  linear system matrix = precond matrix:
  Mat Object: 1 MPI processes
type: seqaij
rows=1541, cols=1541
total: nonzeros=8039, allocated nonzeros=8039
total number of mallocs used during MatSetValues calls =0
  not using I-node routines
  Up solver (post-smoother) same as down solver (pre-smoother)
  Down solver (pre-smoother) on level 4 ---
KSP Object: (mg_levels_4_) 1 MPI processes
  type: richardson
damping factor=1.
  maximum iterations=2, nonzero initial guess
  tolerances:  relative=1e-05, absolute=1e-50, divergence=1.
  left preconditioning
  using NONE norm type for convergence test
PC Object: (mg_levels_4_) 1 MPI processes
  type: sor
type = local_symmetric, iterations = 1, local iterations = 1, omega
= 1.
  linear system matrix = precond matrix:
  Mat Object: 1 MPI processes
type: seqaij
rows=8541, cols=8541
total: nonzeros=46299, allocated nonzeros=46299
total number of mallocs used during MatSetValues calls =0
  using I-node routines: found 5464 nodes, limit used is 5
  Up solver (post-smoother) same as down solver (pre-smoother)
  linear system matrix = precond matrix:
  Mat Object: 1 MPI processes
type: seqaij
rows=8541, cols=8541
total: nonzeros=46299, allocated nonzeros=46299
total number of mallocs used during MatSetValues calls =0
  using I-node routines: found 5464 nodes, limit used is 5

Am I doing this right? Did I miss anything?



On Fri, Feb 1, 2019 at 1:22 PM Matthew Knepley  wrote:

> On Fri, Feb 1, 2019 at 3:05 PM Justin Chang  wrote:
>
>> Hi Mark,
>>
>> 1) So with these options:
>>
>> -ksp_type gmres
>> -ksp_rtol 1e-15
>> -ksp_monitor_true_residual
>> -ksp_converged_reason
>> -pc_type bjacobi
>>
>> This is what I get:
>>
>>   0 KSP preconditioned resid norm 1.900749341028e+04 true resid norm
>> 1.603849239146e+07 ||r(i)||/||b|| 1.e+00
>>   1 KSP preconditioned resid norm 1.363172190058e+03 true resid norm
>> 8.144804150077e+05 ||r(i)||/||b|| 5.078285384488e-02
>>   2 KSP preconditioned resid norm 1.744431536443e+02 true resid norm
>> 2.668646207476e+04 ||r(i)||/||b|| 1.663900909350e-03
>>   3 KSP preconditioned resid norm 2.798960448093e+01 true resid norm
>> 2.142171349084e+03 ||r(i)||/||b|| 1.335643835343e-04
>>   4 KSP preconditioned resid norm 2.938319245576e+00 true resid norm
>> 1.457432872748e+03 ||r(i)||/||b|| 9.087093956061e-05
>>   5 KSP preconditioned resid norm 1.539484356450e-12 true resid norm
>> 6.667274739274e-09 ||r(i)||/||b|| 4.157045797412e-16
>> Linear solve converged due to CONVERGED_RTOL iterations 5
>>
>> 2) With richardson/sor:
>>
>
> Okay, its looks like Richardson/SOR solves this just fine. You can use
> this as the smoother for GAMG instead
> of Cheby/Jacobi, and probably see better results on the larger problems.
>
>   Matt
>
>
>> -ksp_type richardson
>> -ksp_rtol 1e-15
>> -ksp_monitor_true_residual
>> -pc_type sor
>>
>> This is what I get:
>>
>>   0 KSP preconditioned resid norm 1.772935756018e+04 true resid norm
>> 1.603849239146e+07 ||r(i)||/||b|| 1.e+00
>>   1 KSP preconditioned resid norm 1.206881305953e+03 true resid norm
>> 4.507533687463e+03 ||r(i)||/||b|| 2.810447252426e-04
>>   2 KSP preconditioned resid norm 4.166906741810e+02 true resid norm
>> 1.221098715634e+03 ||r(i)||/||b|| 7.613550487354e-05
>>   3 KSP preconditioned resid norm 1.540698682668e+02 true resid norm
>> 4.241154731706e+02 ||r(i)||/||b|| 2.644359973612e-05
>>   4 KSP preconditioned resid norm 5.904921520051e+01 true resid norm
>> 1.587778309916e+02 ||r(i)||/||b|| 9.899797756316e-06
>>   5 KSP preconditioned resid norm 2.327938633860e+01 true resid norm
>> 6.161476099609e+01 ||r(i)||/||b|| 3.841680345773e-06
>>   6 KSP preconditioned resid norm 9.409043410169e+00 true resid norm
>> 2.458600025207e+01 ||r(i)||/||b|| 1.532937114786e-06
>>   7 KSP preconditioned resid norm 3.888365194933e+00 true resid norm
>> 1.005868527582e+01 ||r(i)||/||b|| 6.271590265661e-07
>>   8 KSP preconditioned resid norm 1.638018293396e

Re: [petsc-users] Preconditioning systems of equations with complex numbers

2019-01-31 Thread Justin Chang via petsc-users
Here's IMHO the simplest explanation of the equations I'm trying to solve:

http://home.eng.iastate.edu/~jdm/ee458_2011/PowerFlowEquations.pdf

Right now we're just trying to solve eq(5) (in section 1), inverting the
linear Y-bus matrix. Eventually we have to be able to solve equations like
those in the next section.

On Thu, Jan 31, 2019 at 1:47 PM Matthew Knepley  wrote:

> On Thu, Jan 31, 2019 at 3:20 PM Justin Chang via petsc-users <
> petsc-users@mcs.anl.gov> wrote:
>
>> Hi all,
>>
>> I'm working with some folks to extract a linear system of equations from
>> an external software package that solves power flow equations in complex
>> form. Since that external package uses serial direct solvers like KLU from
>> suitesparse, I want a proof-of-concept where the same matrix can be solved
>> in PETSc using its parallel solvers.
>>
>> I got mumps to achieve a very minor speedup across two MPI processes on a
>> single node (went from solving a 300k dof system in 1.8 seconds to 1.5
>> seconds). However I want to use iterative solvers and preconditioners but I
>> have never worked with complex numbers so I am not sure what the "best"
>> options are given PETSc's capabilities.
>>
>> So far I tried GMRES/BJACOBI and it craps out (unsurprisingly). I believe
>> I also tried BICG with BJACOBI and while it did converge it converged
>> slowly. Does anyone have recommendations on how one would go about
>> preconditioning PETSc matrices with complex numbers? I was originally
>> thinking about converting it to cartesian form: Declaring all voltages =
>> sqrt(real^2+imaginary^2) and all angles to be something like a conditional
>> arctan(imaginary/real) because all the papers I've seen in literature that
>> claim to successfully precondition power flow equations operate in this
>> form.
>>
>
> 1) We really need to see the (simplified) equations
>
> 2) All complex equations can be converted to a system of real equations
> twice as large, but this is not necessarily the best way to go
>
>  Thanks,
>
> Matt
>
>
>> Justin
>>
>
>
> --
> What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which their
> experiments lead.
> -- Norbert Wiener
>
> https://www.cse.buffalo.edu/~knepley/
> <http://www.cse.buffalo.edu/~knepley/>
>


Re: [petsc-users] Create a DM given sets of IS's

2019-01-02 Thread Justin Chang via petsc-users
Sorry, I forgot three more lines describing the 3rd level of split:

-pc_type fieldsplit
-pc_fieldsplit_0_fields 0,1,2
-pc_fieldsplit_1_fields 3,4,5
-pc_fieldsplit_2_fields 6,7,8
-fieldsplit_0_pc_type fieldsplit
-fieldsplit_1_pc_type fieldsplit
-fieldsplit_2_pc_type fieldsplit
-fieldsplit_0_pc_fieldsplit_0_fields 0,1
-fieldsplit_0_pc_fieldsplit_1_fields 2
-fieldsplit_1_pc_fieldsplit_0_fields 3,4
-fieldsplit_1_pc_fieldsplit_1_fields 5
-fieldsplit_2_pc_fieldsplit_0_fields 6,7
-fieldsplit_2_pc_fieldsplit_1_fields 8

-fieldsplit_0_fieldsplit_0_pc_type fieldsplit
-fieldsplit_1_fieldsplit_0_pc_type fieldsplit
-fieldsplit_2_fieldsplit_0_pc_type fieldsplit
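
And, if the innermost potential/concentration split is meant to be an actual Schur factorization, something along these lines would presumably be needed too (a sketch; the exact composed prefixes can be confirmed with -help):

-fieldsplit_0_fieldsplit_0_pc_fieldsplit_type schur
-fieldsplit_1_fieldsplit_0_pc_fieldsplit_type schur
-fieldsplit_2_fieldsplit_0_pc_fieldsplit_type schur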

On Wed, Jan 2, 2019 at 2:54 PM Justin Chang  wrote:

> Okay that's fine.
>
> Now consider one more case:
>
> Suppose I create a PetscSection with 9 fields:
>
> section = PETSc.Section().create()
> section.setNumFields(9)
> section.setFieldName(0,'u1')
> section.setFieldName(1,'c1')
> section.setFieldName(2,'t1')
> section.setFieldName(3,'u2')
> section.setFieldName(4,'c2')
> section.setFieldName(5,'t2')
> section.setFieldName(6,'u3')
> section.setFieldName(7,'c3')
> section.setFieldName(8,'t3')
> section.setFieldComponents(0,1)
> section.setFieldComponents(1,1)
> section.setFieldComponents(2,1)
> section.setFieldComponents(3,1)
> section.setFieldComponents(4,1)
> section.setFieldComponents(5,1)
> section.setFieldComponents(6,1)
> section.setFieldComponents(7,1)
> section.setFieldComponents(8,1)
>
> 
>
> where u is potential, c is concentration, and t is temperature. The
> numbers refer to different domains of a battery (anode, electrolyte,
> cathode).
>
> I want three levels of splits:
> 1) Split by domain
> 2) Split potential and concentration from temperature.
> 3) Split potential and concentration for schur complementing purpose.
>
> Do the following PETSc command line options describe the split I mentioned
> above:
>
> -pc_type fieldsplit
> -pc_fieldsplit_0_fields 0,1,2
> -pc_fieldsplit_1_fields 3,4,5
> -pc_fieldsplit_2_fields 6,7,8
> -fieldsplit_0_pc_type fieldsplit
> -fieldsplit_1_pc_type fieldsplit
> -fieldsplit_2_pc_type fieldsplit
> -fieldsplit_0_pc_fieldsplit_0_fields 0,1
> -fieldsplit_0_pc_fieldsplit_1_fields 2
> -fieldsplit_1_pc_fieldsplit_0_fields 3,4
> -fieldsplit_1_pc_fieldsplit_1_fields 5
> -fieldsplit_2_pc_fieldsplit_0_fields 6,7
> -fieldsplit_2_pc_fieldsplit_1_fields 8
>
> I don't have an actual code to test these out with yet as I've never done
> anything beyond two levels of split. So I am wondering if the above will
> break anything as far as you can tell.
>
> Thanks,
> Justin
>
> On Tue, Jan 1, 2019 at 7:10 AM Matthew Knepley  wrote:
>
>> On Tue, Jan 1, 2019 at 5:06 AM Justin Chang  wrote:
>>
>>> Okay so I managed to successfully do this by translating the IS's into a
>>> PetscSection, assigning it to an empty DMShell, and then assigning that to
>>> KSP.
>>>
>>> However, I seem to be unable to rename the outer fields that aggregate
>>> two or more inner fields.
>>>
>>
>> Right now you cannot do that because of the way that I check for the
>> option:
>>
>>
>> https://bitbucket.org/petsc/petsc/src/0a5f382fff62a328ec919e3e9fe959e6cbbcf413/src/ksp/pc/impls/fieldsplit/fieldsplit.c#lines-369
>>
>> I guess we could replace this by some search code that looked for any
>> option of that form and then read
>> out the splitname using scanf. Right now, I do not see a pattern search
>> function for the options database.
>>
>>   Thanks,
>>
>> Matt
>>
>>
>>> Consider this snippet of a four-field dual porosity/permeability
>>> FEniCS/petsc4py code:
>>>
>>> ...
>>>
>>> ## Extract FEniCS dof layout, global indices ##
>>> dof_total = np.array(W.dofmap().dofs())
>>> dof_v1 = np.array(W.sub(0).dofmap().dofs())
>>> dof_p1 = np.array(W.sub(1).dofmap().dofs())
>>> dof_v2 = np.array(W.sub(2).dofmap().dofs())
>>> dof_p2 = np.array(W.sub(3).dofmap().dofs())
>>> offset = np.min(dof_total)
>>>
>>> ## Create PetscSection ##
>>> section = PETSc.Section().create()
>>> section.setNumFields(4)
>>> section.setFieldName(0,'v1')
>>> section.setFieldName(1,'p1')
>>> section.setFieldName(2,'v2')
>>> section.setFieldName(3,'p2')
>>> section.setFieldComponents(0,1)
>>> section.setFieldComponents(1,1)
>>> section.setFieldComponents(2,1)
>>> section.setFieldComponents(3,1)
>>> section.setChart(0,len(dof_total))
>>> for i in np.nditer(dof_v1):
>>>   section

[petsc-users] Create a DM given sets of IS's

2018-12-30 Thread Justin Chang via petsc-users
Hi all,

I am solving a six field battery problem (concentration and potential for
each of the two solid and one electrolyte domains) and I want to experiment
with nested/recursice fieldsplitting. I have the IS's and am able to use
these to define my PCFieldSplitsSetIS(). However, I can imagine this
getting really messy from a programming standpoint, especially once I need
to add temperature into the mix, so it is my hope that I can translate
these index sets and fields into a DM (maybe DMShell?) so that I can just
rely on command line options to play around with various combinations of
field assignments and splits (e.g. -pc_fieldsplit_X_fields)
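
For context, the hand-wired version currently looks roughly like this sketch (pc and the is_* index sets are placeholders for the actual objects in my code):

PCSetType(pc,PCFIELDSPLIT);
PCFieldSplitSetIS(pc,"c1",is_c1);     /* concentration, domain 1 */
PCFieldSplitSetIS(pc,"phi1",is_phi1); /* potential, domain 1 */
/* ...and so on, one call per field and domain... */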

However, it doesn't seem clear to me how I would create a DM when you
already know the IS's for each fields. If I understand functions like
DMCreateFieldDecomposition() correctly, it seems that it returns to you the
IS's and sub DM's associated with the original DM, whereas I want to do it
the other way around. Perhaps the "reversal" of something like
DMCreateFieldIS(), where you convert the IS into a PetscSection, or is there
an easier/better way?

Any thoughts/help appreciated!

Justin


Re: [petsc-users] Setting SNESVI

2018-08-17 Thread Justin Chang
Have you tried SNESVINEWTONRSLS? In my experience, I seem to have slightly
better luck with that one instead of SNESVINEWTONSSLS
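
That is a one-line change from the code below, e.g.

SNESSetType(snes,SNESVINEWTONRSLS);

or -snes_type vinewtonrsls on the command line (since your snippet calls SNESSetFromOptions).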

On Sat, Aug 18, 2018 at 12:56 AM Amir  wrote:

> Hi
> I need to apply constraint to primitive variables. I am troubling to set
> up SNESVI. After running the code, I feel SNES is not actually solving.
> After one iteration of SNES, I got DIVERGED_LINESEARCH, with NO changing in
> function norm.
> I think the way I set VI maybe wrong. Sorry for my simple explanation.
> Very thankful for your time.
> Amir
> SNESCreate(PETSC_COMM_WORLD,&snes);
> SNESSetApplicationContext(snes,&user);
> SNESSetDM(snes,user.dm);
> SNESSetFunction(snes,NULL,FormFunction,&user);
> FormInitialSolution(user.X,&user);
> SNESSetSolution(snes,user.X);
> SNESMonitorSet(snes,MySNESMonitor,&user,NULL);
> SNESSetType(snes,SNESVINEWTONSSLS);
> SNESSetNormSchedule(snes, SNES_NORM_ALWAYS);
> SNESVISetVariableBounds(snes,user.XLOWER,user.XUPPER);
> SNESGetLineSearch(snes,&linesearch);
> SNESLineSearchSetType(linesearch,SNESLINESEARCHBT);
> SNESLineSearchSetVIFunctions(linesearch, NULL, SNESNormFunction);
> SNESSetFromOptions(snes);
> SNESSolve(snes,NULL,user.X);


Re: [petsc-users] Good performance For small problem on GALILEO

2017-10-20 Thread Justin Chang
600 unknowns is way too small to parallelize. Need at least 10,000 unknowns
per MPI process:
https://www.mcs.anl.gov/petsc/documentation/faq.html#slowerparallel

What problem are you solving? Sounds like you either compiled PETSc with
debugging mode on or you just have a really terrible solver. Show us the
output of -log_view.
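
For example, something like this (a sketch; substitute your actual executable and arguments):

mpirun -n 1 ./yourprogram <your usual options> -log_view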

On Fri, Oct 20, 2017 at 12:47 AM Luca Verzeroli <
l.verzer...@studenti.unibg.it> wrote:

> Good morning,
> For my thesis I'm dealing with GALILEO, one of the clusters owned by
> Cineca. http://www.hpc.cineca.it/hardware/galileo
> The first question is: What is the best configuration to run petsc on this
> kind of cluster? My code is only a MPI program and I would like to know if
> it's better to use more nodes or more CPUs with mpirun.
> This question comes from the speed up of my code using that cluster. I
> have a small problem. The global matrices are 600x600. Are they too small
> to see a speed up with more mpiprocess? I notice that a single core
> simulation and a multi cores one take a similar time (multi core a second
> more). The real problem comes when I have to run multiple simulation of the
> same code changing some parameters. So I would like to speed up the single
> simulation.
> Any advices?
>
>
> Luca Verzeroli
>


Re: [petsc-users] ex12 poisson solver

2017-10-09 Thread Justin Chang
You need this additional command line argument: -petscspace_order 1
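
That is, appending it to the command you posted (a sketch):

./ex12 -snes_type ksponly -snes_monitor -snes_converged_reason -ksp_converged_reason -ksp_monitor -ksp_rtol 1.e-6 -pc_type jacobi -petscspace_order 1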



On Mon, Oct 9, 2017 at 9:02 AM, David Fuentes  wrote:

>
> Hi,
>
> I'm trying to use petsc 3.8.0 with ex12.c example to setup a poisson
> solver: http://www.mcs.anl.gov/petsc/petsc-current/src/snes/
> examples/tutorials/ex12.c.html
>
> I seem to be getting zeros in my jacobian with this example?
> I attached a debugger and the assembly routines seems ok... but am getting
> zero jacobian somewhere along the way to the MatAssembly...
> Am I missing something with the command line arguments ?
>
>
>
> $ ./ex12 -snes_type ksponly  -snes_monitor -snes_converged_reason
> -ksp_converged_reason -ksp_monitor -ksp_rtol 1.e-6 -pc_type jacobi -info
> -info_exclude null,pc,vec
> --
> [[55310,1],0]: A high-performance Open MPI point-to-point messaging module
> was unable to find any relevant network interfaces:
>
> Module: OpenFabrics (openib)
>   Host: SCRGP2
>
> Another transport will be used instead, although this may result in
> lower performance.
> --
> [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 8 X 8; storage space: 0
> unneeded,8 used
> [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
> [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 1
> [0] MatCheckCompressedRow(): Found the ratio (num_zerorows
> 0)/(num_localrows 8) < 0.6. Do not use CompressedRow routines.
> [0] MatSeqAIJCheckInode(): Found 8 nodes out of 8 rows. Not using Inode
> routines
> [0] DMGetDMSNES(): Creating new DMSNES
> [0] DMGetDMKSP(): Creating new DMKSP
>   0 SNES Function norm 1.414213562373e+00
> [0] MatAssemblyEnd_SeqAIJ(): Matrix size: 8 X 8; storage space: 0
> unneeded,8 used
> [0] MatAssemblyEnd_SeqAIJ(): Number of mallocs during MatSetValues() is 0
> [0] MatAssemblyEnd_SeqAIJ(): Maximum nonzeros in any row is 1
> [0] MatCheckCompressedRow(): Found the ratio (num_zerorows
> 0)/(num_localrows 8) < 0.6. Do not use CompressedRow routines.
> [0] SNESComputeJacobian(): Rebuilding preconditioner
> 0 KSP Residual norm 1.414213562373e+00
> 1 KSP Residual norm 1.414213562373e+00
> [0] KSPGMRESBuildSoln(): Likely your matrix or preconditioner is singular.
> HH(it,it) is identically zero; it = 0 GRS(it) = 1.41421
>   Linear solve did not converge due to DIVERGED_BREAKDOWN iterations 1
> [0] SNESSolve_KSPONLY(): iter=0, number linear solve failures 1 greater
> than current SNES allowed, stopping solve
> Nonlinear solve did not converge due to DIVERGED_LINEAR_SOLVE iterations 0
> Number of SNES iterations = 0
> L_2 Error: 0.751285
>
>
> configured with: ./config/configure.py --with-shared-libraries
> --with-clanguage=c++ --CFLAGS='-g -O0' --CXXFLAGS='-g -O0'
> --download-ctetgen  --download-triangle --with-debugging=yes
> --with-exodusii-lib=[/usr/lib/x86_64-linux-gnu/libexoIIv2.so]
> --with-netcdf-lib=[/usr/lib/libnetcdf.so] --with-netcdf-include=/usr/include
> --with-hdf5-include=/usr/include/hdf5/serial/
> --with-hdf5-lib=[/usr/lib/x86_64-linux-gnu/hdf5/serial/libhdf5.so]
> --with-c2html=0 --with-exodusii-include=/usr/include
>
>
> thanks!
> David
>
>
>


Re: [petsc-users] PETSC profiling on 1536 cores

2017-06-15 Thread Justin Chang
Using no preconditioner is a bad bad idea and anyone with the gall to do
this deserves to be spanked.

For the Poisson equation, why not use PETSc's native algebraic multigrid
solver?

-pc_type gamg
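
A minimal option set for the CG + GAMG combination might look like this (a sketch; the tolerance is a placeholder):

-ksp_type cg
-pc_type gamg
-ksp_rtol 1e-8
-ksp_monitor
-log_view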

On Thu, Jun 15, 2017 at 3:09 PM Pietro Incardona 
wrote:

> Dear All
>
>
> I tried PETSC version 3.6.5 to solve a linear system with 256 000 000
> unknown. The equation is Finite differences Poisson equation.
>
>
> I am using Conjugate gradient (the matrix is symmetric) with no
> preconditioner. Visualizing the solution is reasonable.
>
> Unfortunately the Conjugate-Gradient does not scale at all and I am
> extremely concerned about this problem in paticular about the profiling
> numbers.
>
> Looking at the profiler it seem that
>
>
> 1536 cores = 24 cores x 64
>
>
> VecScatterBegin  348 1.0 2.3975e-01 1.8 0.00e+00 0.0 7.7e+06 3.1e+04
> 0.0e+00  0  0 85 99  0   0  0 85 99  0 0
> VecScatterEnd348 1.0 2.8680e+00 1.8 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00  1  0  0  0  0   1  0  0  0  0 0
> MatMult  348 1.0 4.1088e+00 1.4 8.18e+08 1.3 7.7e+06 3.1e+04
> 0.0e+00  2 52 85 99  0   2 52 85 99  0 281866
>
>
> I was expecting that this part was the most expensive and take around 4
> second in total that sound reasonable
>
>
> Unfortunately
>
>
> on 1536 cores = 24 cores x 64
>
>
> VecTDot  696 1.0 3.4442e+01 1.4 2.52e+08 1.3 0.0e+00 0.0e+00
> 7.0e+02 12 16  0  0 65  12 16  0  0 65 10346
> VecNorm  349 1.0 1.1101e+02 1.1 1.26e+08 1.3 0.0e+00 0.0e+00
> 3.5e+02 46  8  0  0 33  46  8  0  0 33  1610
> VecAXPY  696 1.0 8.3134e+01 1.1 2.52e+08 1.3 0.0e+00 0.0e+00
> 0.0e+00 34 16  0  0  0  34 16  0  0  0  4286
>
>
> Take over 228 seconds. Considering that doing some test on the cluster I
> do not see any problem with MPI_Reduce I do not understand how these
> numbers are possible
>
>
>
> // I also attach to the profiling part the
> inversion on 48 cores /
>
>
> VecTDot  696 1.0 1.4684e+01 1.3 3.92e+09 1.1 0.0e+00 0.0e+00
> 7.0e+02  6 16  0  0 65   6 16  0  0 65 24269
> VecNorm  349 1.0 4.9612e+01 1.3 1.96e+09 1.1 0.0e+00 0.0e+00
> 3.5e+02 22  8  0  0 33  22  8  0  0 33  3602
> VecCopy  351 1.0 8.8359e+00 7.7 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00  2  0  0  0  0   2  0  0  0  0 0
> VecSet 2 1.0 1.6177e-02 2.6 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00  0  0  0  0  0   0  0  0  0  0 0
> VecAXPY  696 1.0 8.8559e+01 1.1 3.92e+09 1.1 0.0e+00 0.0e+00
> 0.0e+00 42 16  0  0  0  42 16  0  0  0  4024
> VecAYPX  347 1.0 4.6790e+00 1.2 1.95e+09 1.1 0.0e+00 0.0e+00
> 0.0e+00  2  8  0  0  0   2  8  0  0  0 37970
> VecAssemblyBegin   2 1.0 5.0942e-02 2.9 0.00e+00 0.0 0.0e+00 0.0e+00
> 6.0e+00  0  0  0  0  1   0  0  0  0  1 0
> VecAssemblyEnd 2 1.0 1.9073e-05 6.7 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00  0  0  0  0  0   0  0  0  0  0 0
> VecScatterBegin  348 1.0 1.2763e+00 1.5 0.00e+00 0.0 4.6e+05 2.0e+05
> 0.0e+00  0  0 97100  0   0  0 97100  0 0
> VecScatterEnd348 1.0 4.6741e+00 5.6 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00  1  0  0  0  0   1  0  0  0  0 0
> MatMult  348 1.0 2.8440e+01 1.1 1.27e+10 1.1 4.6e+05 2.0e+05
> 0.0e+00 13 52 97100  0  13 52 97100  0 40722
> MatAssemblyBegin   1 1.0 7.4749e-0124.5 0.00e+00 0.0 0.0e+00 0.0e+00
> 2.0e+00  0  0  0  0  0   0  0  0  0  0 0
> MatAssemblyEnd 1 1.0 8.3194e-01 1.0 0.00e+00 0.0 2.7e+03 5.1e+04
> 8.0e+00  0  0  1  0  1   0  0  1  0  1 0
> KSPSetUp   1 1.0 8.2883e-02 1.7 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00  0  0  0  0  0   0  0  0  0  0 0
> KSPSolve   1 1.0 1.7964e+02 1.0 2.45e+10 1.1 4.6e+05 2.0e+05
> 1.0e+03 87100 97100 98  87100 97100 98 12398
> PCSetUp1 1.0 1.1921e-06 0.0 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00  0  0  0  0  0   0  0  0  0  0 0
> PCApply  349 1.0 8.8166e+00 7.8 0.00e+00 0.0 0.0e+00 0.0e+00
> 0.0e+00  2  0  0  0  0   2  0  0  0  0 0
>
>
>
>
> /
>
>
> If you need more information or test please let me know.
>
>
> Thanks in advance
>
>
> Here the log of 1536 cores
>
>
> 345 KSP Residual norm 1.007085286893e-02
> 346 KSP Residual norm 1.010054402040e-02
> 347 KSP Residual norm 1.002139574355e-02
> 348 KSP Residual norm 9.775851299055e-03
> Max div for vorticity 1.84572e-05   Integral: 6.62466e-09  130.764   -132
>
> 
> *** WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use 'enscript -r
> -fCourier9' to print this document***
>
> 
>
> 

Re: [petsc-users] examples of DMPlex*FVM methods

2017-04-06 Thread Justin Chang
There are many flavors of FEM and FVM. If by FEM you mean the Continuous
Galerkin FEM, then yes it is a far from ideal method for solving
advection-diffusion equations, especially when advection is the dominating
effect. The Discontinuous Galerkin (DG) FEM on the other hand is much
better for advection-diffusion equations (though far from perfect). It has
properties very similar to the FVM as it also ensures local/element-wise
mass conservation. Ferziger's book is quite biased in favor of FVM and
doesn't discuss other numerical methods in depth.

I don't think there are any PETSc DMPlex DG examples at the moment (though
I could be wrong). But this paper gives a nice introduction/overview to DG:
http://epubs.siam.org/doi/abs/10.1137/S0036142901384162

Now if you are interested in a really sophisticated "PETSc example" of
solving the advection-diffusion equation using the two-point flux FVM,
there's PFLOTRAN: http://www.pflotran.org

Justin

PS - Is there any particular reason why PFLOTRAN is not listed as on the
PETSc homepage? It seems to be a pretty major "Related packages that use
PETSc."

On Thu, Apr 6, 2017 at 1:16 AM, Ingo Gaertner 
wrote:

>
>
> 2017-04-05 19:56 GMT+02:00 Jed Brown :
>
>> Ingo Gaertner  writes:
>>
>> > Hi Matt,
>> > I don't care if FV is suboptimal to solve the Poisson equation. I only
>> want
>> > to better understand the method by getting my hands dirty, and also
>> > implement the general transport equation later. We were told that FVM is
>> > far more efficient for the transport equation than FEM, and this is why
>> > most CFD codes would use FVM. Do you contradict? Do you have benchmarks
>> > that show bad performance for the (parabolic) transport equation
>>
>> What is the "parabolic transport equation"?  Advection-dominated
>> diffusion?  The hyperbolic part is usually the hard part.  FEM can solve
>> these problems, but FV is a good method, particularly if you want local
>> conservation and monotonicity.
>>
>
> By transport equation I mean the advection-diffusion equation. This is
> always parabolic, independent of whether it is advection dominated or
> diffusion dominated. And the elliptic Poisson equation can be solved by
> making it time-dependent and letting it converge to steady state, again solving a
> parabolic equation. At least this is how I learned the terms.
> My impression is that everybody has his hammer, be it FEM or FVM, so that
> every problem looks like a nail. You can also hammer a screw into the wall
> if the wall isn't too hard.
>
> Ingo
>
>
>


Re: [petsc-users] Correlation between da_refine and pg_mg_levels

2017-04-05 Thread Justin Chang
BTW what are the relevant papers that describe this problem? Is this one

http://epubs.siam.org/doi/abs/10.1137/110834512


On Mon, Apr 3, 2017 at 1:04 PM Justin Chang <jychan...@gmail.com> wrote:

> So my makefile/script is slightly different from the tutorial directory.
> Basically I have a shell for loop that runs the 'make runex48' four times
> where -da_refine is increased each time. It showed Levels 1 0 then 2 1 0
> because the job was in the middle of the loop, and I cancelled it halfway
> when I realized it was returning errors as I didn't want to burn any
> precious SU's :)
>
> Anyway, I ended up using Edison with 16 cores/node and Cori/Haswell with
> 32 cores/node and got some nice numbers for 128x128x16 coarse grid. I am
> however having issues with Cori/KNL, which I think has more to do with how
> I configured PETSc and/or the job scripts.
>
> On Mon, Apr 3, 2017 at 6:23 AM, Jed Brown <j...@jedbrown.org> wrote:
>
> Matthew Knepley <knep...@gmail.com> writes:
> > I can't think why it would fail there, but DMDA really likes odd numbers of
> > vertices, because it wants
> > to take every other point, 129 seems good. I will see if I can reproduce
> > once I get a chance.
>
> This problem uses periodic boundary conditions so even is right, but
> Justin only defines the coarsest grid and uses -da_refine so it should
> actually be irrelevant.
>
>
>


Re: [petsc-users] Configuring PETSc for KNL

2017-04-05 Thread Justin Chang
I simply ran these KNL simulations in flat mode with the following options:

srun -n 64 -c 4 --cpu_bind=cores numactl -p 1 ./ex48 

Basically I told it that MCDRAM usage in NUMA domain 1 is preferred. I
followed the last example:
http://www.nersc.gov/users/computational-systems/cori/configuration/knl-processor-modes/

On Wed, Apr 5, 2017 at 12:00 PM, Matthew Knepley <knep...@gmail.com> wrote:

> On Wed, Apr 5, 2017 at 11:54 AM, Zhang, Hong <hongzh...@anl.gov> wrote:
>
>>
>> > On Apr 5, 2017, at 10:53 AM, Jed Brown <j...@jedbrown.org> wrote:
>> >
>> > "Zhang, Hong" <hongzh...@anl.gov> writes:
>> >
>> >> On Apr 4, 2017, at 10:45 PM, Justin Chang <jychan...@gmail.com> wrote:
>> >>
>> >> So I tried the following options:
>> >>
>> >> -M 40
>> >> -N 40
>> >> -P 5
>> >> -da_refine 1/2/3/4
>> >> -log_view
>> >> -mg_coarse_pc_type gamg
>> >> -mg_levels_0_pc_type gamg
>> >> -mg_levels_1_sub_pc_type cholesky
>> >> -pc_type mg
>> >> -thi_mat_type baij
>> >>
>> >> Performance improved dramatically. However, Haswell still beats out
>> KNL but only by a little. Now it seems like MatSOR is taking some time
>> (though I can't really judge whether it's significant or not). Attached are
>> the log files.
>> >>
>> >>
>> >> MatSOR takes only 3% of the total time. Most of the time is spent on
>> PCSetUp (~30%) and PCApply (~11%).
>> >
>> > I don't see any of your conclusions in the actual data, unless you only
>> > looked at the smallest size that Justin tested.  For example, from the
>> > largest problem size in Justin's logs:
>>
>> My mistake. I did not see the results for the large problem sizes. I was
>> talking about the data for the smallest case.
>>
>> Now I am very surprised by the performance of MatSOR:
>>
>> -da_refine 1 ~2x slower on KNL
>> -da_refine 2 ~2x faster on KNL
>> -da_refine 3 ~2x faster on KNL
>> -da_refine 4 almost the same
>>
>> KNL
>>
>> -da_refine 1 MatSOR  1185 1.0 2.8965e-01 1.1 7.01e+07 1.0
>> 0.0e+00 0.0e+00 0.0e+00  3 41  0  0  0   3 41  0  0  0 15231
>> -da_refine 2 MatSOR  1556 1.0 1.6883e+00 1.0 5.82e+08 1.0
>> 0.0e+00 0.0e+00 0.0e+00 11 44  0  0  0  11 44  0  0  0 22019
>> -da_refine 3 MatSOR  2240 1.0 1.4959e+01 1.0 5.51e+09 1.0
>> 0.0e+00 0.0e+00 0.0e+00 22 45  0  0  0  22 45  0  0  0 23571
>> -da_refine 4 MatSOR  2688 1.0 2.3942e+02 1.1 4.47e+10 1.0
>> 0.0e+00 0.0e+00 0.0e+00 36 45  0  0  0  36 45  0  0  0 11946
>>
>>
>> Haswell
>> -da_refine 1 MatSOR  1167 1.0 1.4839e-01 1.1 1.42e+08 1.0
>> 0.0e+00 0.0e+00 0.0e+00  3 42  0  0  0   3 42  0  0  0 30450
>> -da_refine 2 MatSOR  1532 1.0 2.9772e+00 1.0 1.17e+09 1.0
>> 0.0e+00 0.0e+00 0.0e+00 28 44  0  0  0  28 44  0  0  0 12539
>> -da_refine 3 MatSOR  1915 1.0 2.7142e+01 1.1 9.51e+09 1.0
>> 0.0e+00 0.0e+00 0.0e+00 45 45  0  0  0  45 45  0  0  0 11216
>> -da_refine 4 MatSOR  2262 1.0 2.2116e+02 1.1 7.56e+10 1.0
>> 0.0e+00 0.0e+00 0.0e+00 48 45  0  0  0  48 45  0  0  0 10936
>>
>
> SOR should track memory bandwidth, so it seems to me either
>
>   a) We fell out of MCDRAM
>
> or
>
>   b) We saturated the KNL node, but not the Haswell configuration
>
> I think these are all runs with identical parallelism, so its not b).
> Justin, did you tell it to fall back to DRAM, or fail?
>
>   Thanks,
>
> Matt
>
>
>
>> Hong (Mr.)
>>
>>
>> > KNL:
>> > MatSOR  2688 1.0 2.3942e+02 1.1 4.47e+10 1.0 0.0e+00
>> 0.0e+00 0.0e+00 36 45  0  0  0  36 45  0  0  0 11946
>> > KSPSolve   8 1.0 4.3837e+02 1.0 9.87e+10 1.0 1.5e+06
>> 8.8e+03 5.0e+03 68 99 98 61 98  68 99 98 61 98 14409
>> > SNESSolve  1 1.0 6.1583e+02 1.0 9.95e+10 1.0 1.6e+06
>> 1.4e+04 5.1e+03 96100100100 99  96100100100 99 10338
>> > SNESFunctionEval   9 1.0 3.8730e+01 1.0 0.00e+00 0.0 9.2e+03
>> 3.2e+04 0.0e+00  6  0  1  1  0   6  0  1  1  0 0
>> > SNESJacobianEval  40 1.0 1.5628e+02 1.0 0.00e+00 0.0 4.4e+04
>> 2.5e+05 1.4e+02 24  0  3 49  3  24  0  3 49  3 0
>> > PCSetUp   16 1.0 3.4525e+01 1.0 6.52e+07 1.0 2.8e+05
>> 1.0e+04 3.8e+03  5  0 18 13 74   5  0 18 13 74   119
>> > PCSetUpOnBlocks   60 1.0 9.5716e-01 1.1 1.41e+05 0.0 0.0e+00
>> 0.0e+00 0.0e+00  0  0  0

Re: [petsc-users] Configuring PETSc for KNL

2017-04-04 Thread Justin Chang
Attached are the job output files (which include -log_view) for SNES ex48
run on a single haswell and knl node (32 and 64 cores respectively).
Started off with a coarse grid of size 40x40x5 and ran three different
tests with -da_refine 1/2/3 and -pc_type mg

What's interesting/strange is that if i try to do -da_refine 4 on KNL, i
get a slurm error that says: "slurmstepd: error: Step 4408401.0 exceeded
memory limit (96737652 > 94371840), being killed" but it runs perfectly
fine on Haswell. Adding -pc_mg_levels 7 enables KNL to run on -da_refine 4
but the performance still does not beat out haswell.

The performance spectrum (dofs/sec) for 1-3 levels of refinement looks like
this:

Haswell:
2.416e+03
1.490e+04
5.188e+04

KNL:
9.308e+02
7.257e+03
3.838e+04

Which might suggest to me that KNL performs better with larger problem
sizes.

On Tue, Apr 4, 2017 at 11:05 AM, Matthew Knepley <knep...@gmail.com> wrote:

> On Tue, Apr 4, 2017 at 10:57 AM, Justin Chang <jychan...@gmail.com> wrote:
>
>> Thanks everyone for the helpful advice. So I tried all the suggestions
>> including using libsci. The performance did not improve for my particular
>> runs, which I think suggests the problem parameters chosen for my tests
>> (SNES ex48) are not optimal for KNL. Does anyone have example test runs I
>> could reproduce that compare the performance between KNL and
>> Haswell/Ivybridge/etc?
>>
>
> Lets try to see what is going on with your existing data first.
>
> First, I think that main thing is to make sure we are using MCDRAM.
> Everything else in KNL
> is window dressing (IMHO). All we have to look at is something like MAXPY.
> You can get the
> bandwidth estimate from the flop rate and problem size (I think), and we
> can at least get
> bandwidth ratios between Haswell and KNL with that number.
>
>Matt
>
>
>> On Mon, Apr 3, 2017 at 3:06 PM Richard Mills <richardtmi...@gmail.com>
>> wrote:
>>
>>> Yes, one should rely on MKL (or Cray LibSci, if using the Cray
>>> toolchain) on Cori.  But I'm guessing that this will make no noticeable
>>> difference for what Justin is doing.
>>>
>>> --Richard
>>>
>>> On Mon, Apr 3, 2017 at 12:57 PM, murat keçeli <kec...@gmail.com> wrote:
>>>
>>> How about replacing --download-fblaslapack with vendor specific
>>> BLAS/LAPACK?
>>>
>>> Murat
>>>
>>> On Mon, Apr 3, 2017 at 2:45 PM, Richard Mills <richardtmi...@gmail.com>
>>> wrote:
>>>
>>> On Mon, Apr 3, 2017 at 12:24 PM, Zhang, Hong <hongzh...@anl.gov> wrote:
>>>
>>>
>>> On Apr 3, 2017, at 1:44 PM, Justin Chang <jychan...@gmail.com> wrote:
>>>
>>> Richard,
>>>
>>> This is what my job script looks like:
>>>
>>> #!/bin/bash
>>> #SBATCH -N 16
>>> #SBATCH -C knl,quad,flat
>>> #SBATCH -p regular
>>> #SBATCH -J knlflat1024
>>> #SBATCH -L SCRATCH
>>> #SBATCH -o knlflat1024.o%j
>>> #SBATCH --mail-type=ALL
>>> #SBATCH --mail-user=jychan...@gmail.com
>>> #SBATCH -t 00:20:00
>>>
>>> #run the application:
>>> cd $SCRATCH/Icesheet
>>> sbcast --compress=lz4 ./ex48cori /tmp/ex48cori
>>> srun -n 1024 -c 4 --cpu_bind=cores numactl -p 1 /tmp/ex48cori -M 128 -N
>>> 128 -P 16 -thi_mat_type baij -pc_type mg -mg_coarse_pc_type gamg -da_refine
>>> 1
>>>
>>>
>>> Maybe it is a typo. It should be numactl -m 1.
>>>
>>>
>>> "-p 1" will also work.  "-p" means to "prefer" NUMA node 1 (the MCDRAM),
>>> whereas "-m" means to use only NUMA node 1.  In the former case, MCDRAM
>>> will be used for allocations until the available memory there has been
>>> exhausted, and then things will spill over into the DRAM.  One would think
>>> that "-m" would be better for doing performance studies, but on systems
>>> where the nodes have swap space enabled, you can get terrible performance
>>> if your code's working set exceeds the size of the MCDRAM, as the system
>>> will obediently obey your wishes to not use the DRAM and go straight to the
>>> swap disk!  I assume the Cori nodes don't have swap space, though I could
>>> be wrong.
>>>
>>>
>>> According to the NERSC info pages, they say to add the "numactl" if
>>> using flat mode. Previously I tried cache mode but the performance seems to
>>> be unaffected.
>>>
>>>
>>> Using cache mode should give similar performance as using flat mode

Re: [petsc-users] Configuring PETSc for KNL

2017-04-04 Thread Justin Chang
Thanks everyone for the helpful advice. So I tried all the suggestions
including using libsci. The performance did not improve for my particular
runs, which I think suggests the problem parameters chosen for my tests
(SNES ex48) are not optimal for KNL. Does anyone have example test runs I
could reproduce that compare the performance between KNL and
Haswell/Ivybridge/etc?


On Mon, Apr 3, 2017 at 3:06 PM Richard Mills <richardtmi...@gmail.com>
wrote:

> Yes, one should rely on MKL (or Cray LibSci, if using the Cray toolchain)
> on Cori.  But I'm guessing that this will make no noticeable difference for
> what Justin is doing.
>
> --Richard
>
> On Mon, Apr 3, 2017 at 12:57 PM, murat keçeli <kec...@gmail.com> wrote:
>
> How about replacing --download-fblaslapack with vendor specific
> BLAS/LAPACK?
>
> Murat
>
> On Mon, Apr 3, 2017 at 2:45 PM, Richard Mills <richardtmi...@gmail.com>
> wrote:
>
> On Mon, Apr 3, 2017 at 12:24 PM, Zhang, Hong <hongzh...@anl.gov> wrote:
>
>
> On Apr 3, 2017, at 1:44 PM, Justin Chang <jychan...@gmail.com> wrote:
>
> Richard,
>
> This is what my job script looks like:
>
> #!/bin/bash
> #SBATCH -N 16
> #SBATCH -C knl,quad,flat
> #SBATCH -p regular
> #SBATCH -J knlflat1024
> #SBATCH -L SCRATCH
> #SBATCH -o knlflat1024.o%j
> #SBATCH --mail-type=ALL
> #SBATCH --mail-user=jychan...@gmail.com
> #SBATCH -t 00:20:00
>
> #run the application:
> cd $SCRATCH/Icesheet
> sbcast --compress=lz4 ./ex48cori /tmp/ex48cori
> srun -n 1024 -c 4 --cpu_bind=cores numactl -p 1 /tmp/ex48cori -M 128 -N
> 128 -P 16 -thi_mat_type baij -pc_type mg -mg_coarse_pc_type gamg -da_refine
> 1
>
>
> Maybe it is a typo. It should be numactl -m 1.
>
>
> "-p 1" will also work.  "-p" means to "prefer" NUMA node 1 (the MCDRAM),
> whereas "-m" means to use only NUMA node 1.  In the former case, MCDRAM
> will be used for allocations until the available memory there has been
> exhausted, and then things will spill over into the DRAM.  One would think
> that "-m" would be better for doing performance studies, but on systems
> where the nodes have swap space enabled, you can get terrible performance
> if your code's working set exceeds the size of the MCDRAM, as the system
> will obediently obey your wishes to not use the DRAM and go straight to the
> swap disk!  I assume the Cori nodes don't have swap space, though I could
> be wrong.
>
>
> According to the NERSC info pages, they say to add the "numactl" if using
> flat mode. Previously I tried cache mode but the performance seems to be
> unaffected.
>
>
> Using cache mode should give similar performance as using flat mode with
> the numactl option. But both approaches should be significantly faster than
> using flat mode without the numactl option. I usually see over 3X speedup.
> You can also do such comparison to see if the high-bandwidth memory is
> working properly.
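For reference, a sketch of the sanity check described here (the cache-mode
constraint is the one used in the KNL job script later in this archive;
options are placeholders):

# cache mode: MCDRAM acts as a transparent cache, no numactl needed
#SBATCH -C knl,quad,cache
srun -n 1024 -c 4 --cpu_bind=cores /tmp/ex48cori <options> -log_view
# compare against the flat-mode runs above with and without "numactl -p 1";
# if the MCDRAM is being used, flat mode without numactl should be several
# times slower (roughly the 3X mentioned here)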
>
> I also compared 256 Haswell nodes vs 256 KNL nodes and Haswell is nearly
> 4-5x faster. Though I suspect this drastic change has much to do with the
> initial coarse grid size now being extremely small.
>
> I think you may be right about why you see such a big difference.  The KNL
> nodes need enough work to be able to use the SIMD lanes effectively.  Also,
> if your problem gets small enough, then it's going to be able to fit in the
> Haswell's L3 cache.  Although KNL has MCDRAM and this delivers *a lot* more
> memory bandwidth than the DDR4 memory, it will deliver a lot less bandwidth
> than the Haswell's L3.
>
> I'll give the COPTFLAGS a try and see what happens
>
>
> Make sure to use --with-memalign=64 for data alignment when configuring
> PETSc.
>
>
> Ah, yes, I forgot that.  Thanks for mentioning it, Hong!
>
>
> The option -xMIC-AVX512 would improve the vectorization performance. But
> it may cause problems for the MPIBAIJ format for some unknown reason.
> MPIAIJ should work fine with this option.
>
>
> Hmm.  Try both, and, if you see worse performance with MPIBAIJ, let us
> know and I'll try to figure this out.
>
> --Richard
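A sketch of the comparison suggested here, reusing the run line from the job
script above and toggling only the matrix format (all other options are
placeholders):

# MPIBAIJ, as in the original runs
srun -n 1024 -c 4 --cpu_bind=cores numactl -p 1 /tmp/ex48cori -M 128 -N 128 -P 16 \
    -thi_mat_type baij -pc_type mg -mg_coarse_pc_type gamg -da_refine 1 -log_view
# MPIAIJ, which should be safe to combine with -xMIC-AVX512
srun -n 1024 -c 4 --cpu_bind=cores numactl -p 1 /tmp/ex48cori -M 128 -N 128 -P 16 \
    -thi_mat_type aij -pc_type mg -mg_coarse_pc_type gamg -da_refine 1 -log_view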
>
>
>
> Hong (Mr.)
>
> Thanks,
> Justin
>
> On Mon, Apr 3, 2017 at 1:36 PM, Richard Mills <richardtmi...@gmail.com>
> wrote:
>
> Hi Justin,
>
> How is the MCDRAM (on-package "high-bandwidth memory") configured for your
> KNL runs?  And if it is in "flat" mode, what are you doing to ensure that
> you use the MCDRAM?  Doing this wrong seems to be one of the most common
> reasons for unexpected poor performance on KNL.
>
> I'm not that familiar with the environment on Cori, but I think that if
> you are building 

Re: [petsc-users] Configuring PETSc for KNL

2017-04-03 Thread Justin Chang
Richard,

This is what my job script looks like:

#!/bin/bash
#SBATCH -N 16
#SBATCH -C knl,quad,flat
#SBATCH -p regular
#SBATCH -J knlflat1024
#SBATCH -L SCRATCH
#SBATCH -o knlflat1024.o%j
#SBATCH --mail-type=ALL
#SBATCH --mail-user=jychan...@gmail.com
#SBATCH -t 00:20:00

#run the application:
cd $SCRATCH/Icesheet
sbcast --compress=lz4 ./ex48cori /tmp/ex48cori
srun -n 1024 -c 4 --cpu_bind=cores numactl -p 1 /tmp/ex48cori -M 128 -N 128
-P 16 -thi_mat_type baij -pc_type mg -mg_coarse_pc_type gamg -da_refine 1

According to the NERSC info pages, they say to add the "numactl" if using
flat mode. Previously I tried cache mode but the performance seems to be
unaffected.

I also compared 256 Haswell nodes vs 256 KNL nodes and Haswell is nearly
4-5x faster. Though I suspect this drastic change has much to do with the
initial coarse grid size now being extremely small.

I'll give the COPTFLAGS a try and see what happens

Thanks,
Justin

On Mon, Apr 3, 2017 at 1:36 PM, Richard Mills <richardtmi...@gmail.com>
wrote:

> Hi Justin,
>
> How is the MCDRAM (on-package "high-bandwidth memory") configured for your
> KNL runs?  And if it is in "flat" mode, what are you doing to ensure that
> you use the MCDRAM?  Doing this wrong seems to be one of the most common
> reasons for unexpected poor performance on KNL.
>
> I'm not that familiar with the environment on Cori, but I think that if
> you are building for KNL, you should add "-xMIC-AVX512" to your compiler
> flags to explicitly instruct the compiler to use the AVX512 instruction
> set.  I usually use something along the lines of
>
>   'COPTFLAGS=-g -O3 -fp-model fast -xMIC-AVX512'
>
> (The "-g" just adds symbols, which make the output from performance
> profiling tools much more useful.)
>
> That said, I think that if you are comparing 1024 Haswell cores vs. 1024
> KNL cores (so double the number of Haswell nodes), I'm not surprised that
> the simulations are almost twice as fast using the Haswell nodes.  Keep in
> mind that individual KNL cores are much less powerful than individual
> Haswell cores.  You are also using roughly twice the power footprint (a dual
> socket Haswell node should be roughly equivalent to a KNL node, I
> believe).  How do things look when you compare equal numbers of nodes?
>
> Cheers,
> Richard
>
> On Mon, Apr 3, 2017 at 11:13 AM, Justin Chang <jychan...@gmail.com> wrote:
>
>> Hi all,
>>
>> On NERSC's Cori I have the following configure options for PETSc:
>>
>> ./configure --download-fblaslapack --with-cc=cc --with-clib-autodetect=0
>> --with-cxx=CC --with-cxxlib-autodetect=0 --with-debugging=0 --with-fc=ftn
>> --with-fortranlib-autodetect=0 --with-mpiexec=srun --with-64-bit-indices=1
>> COPTFLAGS=-O3 CXXOPTFLAGS=-O3 FOPTFLAGS=-O3 PETSC_ARCH=arch-cori-opt
>>
>> Where I swapped out the default Intel programming environment with that
>> of Cray (e.g., 'module switch PrgEnv-intel/6.0.3 PrgEnv-cray/6.0.3'). I
>> want to document the performance difference between Cori's Haswell and KNL
>> processors.
>>
>> When I run a PETSc example like SNES ex48 on 1024 cores (32 Haswell and
>> 16 KNL nodes), the simulations are almost twice as fast on Haswell nodes.
>> Which leads me to suspect that I am not doing something right for KNL. Does
>> anyone know what are some "optimal" configure options for running PETSc on
>> KNL?
>>
>> Thanks,
>> Justin
>>
>
>


[petsc-users] Configuring PETSc for KNL

2017-04-03 Thread Justin Chang
Hi all,

On NERSC's Cori I have the following configure options for PETSc:

./configure --download-fblaslapack --with-cc=cc --with-clib-autodetect=0
--with-cxx=CC --with-cxxlib-autodetect=0 --with-debugging=0 --with-fc=ftn
--with-fortranlib-autodetect=0 --with-mpiexec=srun --with-64-bit-indices=1
COPTFLAGS=-O3 CXXOPTFLAGS=-O3 FOPTFLAGS=-O3 PETSC_ARCH=arch-cori-opt

Where I swapped out the default Intel programming environment with that of
Cray (e.g., 'module switch PrgEnv-intel/6.0.3 PrgEnv-cray/6.0.3'). I want
to document the performance difference between Cori's Haswell and KNL
processors.

When I run a PETSc example like SNES ex48 on 1024 cores (32 Haswell and 16
KNL nodes), the simulations are almost twice as fast on Haswell nodes.
Which leads me to suspect that I am not doing something right for KNL. Does
anyone know what are some "optimal" configure options for running PETSc on
KNL?

Thanks,
Justin
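Pulling the suggestions from the replies above into one place, a KNL
configure sketch might look like the following. This is only a sketch of the
ideas in this thread, not a verified recipe: the arch name is arbitrary, the
Intel flags assume PrgEnv-intel rather than PrgEnv-cray, and
--download-fblaslapack is dropped so that a vendor BLAS/LAPACK (MKL or Cray
LibSci) is picked up instead.

./configure --with-cc=cc --with-cxx=CC --with-fc=ftn \
  --with-clib-autodetect=0 --with-cxxlib-autodetect=0 --with-fortranlib-autodetect=0 \
  --with-mpiexec=srun --with-64-bit-indices=1 --with-debugging=0 \
  --with-memalign=64 \
  COPTFLAGS="-g -O3 -fp-model fast -xMIC-AVX512" \
  CXXOPTFLAGS="-g -O3 -fp-model fast -xMIC-AVX512" \
  FOPTFLAGS="-g -O3 -fp-model fast -xMIC-AVX512" \
  PETSC_ARCH=arch-cori-knl-opt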


Re: [petsc-users] Correlation between da_refine and pg_mg_levels

2017-04-03 Thread Justin Chang
So my makefile/script is slightly different from the tutorial directory.
Basically I have a shell for loop that runs the 'make runex48' four times
where -da_refine is increased each time. It showed Levels 1 0 then 2 1 0
because the job was in the middle of the loop, and I cancelled it halfway
when I realized it was returning errors as I didn't want to burn any
precious SU's :)
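The loop described above, as a sketch (the run line is illustrative; the
actual runs go through 'make runex48'):

for r in 1 2 3 4; do
  srun -n 1024 -c 2 --cpu_bind=cores ./ex48 -M 128 -N 128 -P 16 \
      -thi_mat_type baij -pc_type mg -mg_coarse_pc_type gamg \
      -da_refine $r -log_view > ex48_refine${r}.log
done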

Anyway, I ended up using Edison with 16 cores/node and Cori/Haswell with 32
cores/node and got some nice numbers for a 128x128x16 coarse grid. I am,
however, having issues with Cori/KNL, which I think have more to do with how
I configured PETSc and/or the job scripts.

On Mon, Apr 3, 2017 at 6:23 AM, Jed Brown  wrote:

> Matthew Knepley  writes:
> > I can't think why it would fail there, but DMDA really likes odd numbers
> > of vertices, because it wants
> > to take every other point; 129 seems good. I will see if I can reproduce
> > once I get a chance.
>
> This problem uses periodic boundary conditions so even is right, but
> Justin only defines the coarsest grid and uses -da_refine so it should
> actually be irrelevant.
>


Re: [petsc-users] Correlation between da_refine and pg_mg_levels

2017-04-03 Thread Justin Chang
I fixed the KNL issue - apparently "larger" jobs need to have the
executable copied into the /tmp directory to speed up loading/startup time,
so I did that instead of executing the program via the makefile.

On Mon, Apr 3, 2017 at 12:45 AM, Justin Chang <jychan...@gmail.com> wrote:

> So if I begin with a 128x128x8 grid on 1032 procs, it works fine for the
> first two levels of da_refine. However, on the third level I get this error:
>
> Level 3 domain size (m)1e+04 x1e+04 x1e+03, num elements 1024
> x 1024 x 57 (59768832), size (m) 9.76562 x 9.76562 x 17.8571
> Level 2 domain size (m)1e+04 x1e+04 x1e+03, num elements 512 x
> 512 x 29 (7602176), size (m) 19.5312 x 19.5312 x 35.7143
> Level 1 domain size (m)1e+04 x1e+04 x1e+03, num elements 256 x
> 256 x 15 (983040), size (m) 39.0625 x 39.0625 x 71.4286
> Level 0 domain size (m)1e+04 x1e+04 x1e+03, num elements 128 x
> 128 x 8 (131072), size (m) 78.125 x 78.125 x 142.857
> [0]PETSC ERROR: - Error Message
> --
> [0]PETSC ERROR: Petsc has generated inconsistent data
> [0]PETSC ERROR: Eigen estimator failed: DIVERGED_NANORINF at iteration 0
> [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html
> for trouble shooting.
> [0]PETSC ERROR: Petsc Development GIT revision: v3.7.5-3418-ge372536  GIT
> Date: 2017-03-30 13:35:15 -0500
> [0]PETSC ERROR: /scratch2/scratchdirs/jychang/Icesheet/./ex48edison on a
> arch-edison-c-opt named nid00865 by jychang Sun Apr  2 21:44:44 2017
> [0]PETSC ERROR: Configure options --download-fblaslapack --with-cc=cc
> --with-clib-autodetect=0 --with-cxx=CC --with-cxxlib-autodetect=0
> --with-debugging=0 --with-fc=ftn --with-fortranlib-autodetect=0
> --with-mpiexec=srun --with-64-bit-indices=1 COPTFLAGS=-O3 CXXOPTFLAGS=-O3
> FOPTFLAGS=-O3 PETSC_ARCH=arch-edison-c-opt
> [0]PETSC ERROR: #1 KSPSolve_Chebyshev() line 380 in
> /global/u1/j/jychang/Software/petsc/src/ksp/ksp/impls/cheby/cheby.c
> [0]PETSC ERROR: #2 KSPSolve() line 655 in /global/u1/j/jychang/Software/
> petsc/src/ksp/ksp/interface/itfunc.c
> [0]PETSC ERROR: #3 PCMGMCycle_Private() line 19 in
> /global/u1/j/jychang/Software/petsc/src/ksp/pc/impls/mg/mg.c
> [0]PETSC ERROR: #4 PCMGMCycle_Private() line 53 in
> /global/u1/j/jychang/Software/petsc/src/ksp/pc/impls/mg/mg.c
> [0]PETSC ERROR: #5 PCApply_MG() line 331 in /global/u1/j/jychang/Software/
> petsc/src/ksp/pc/impls/mg/mg.c
> [0]PETSC ERROR: #6 PCApply() line 458 in /global/u1/j/jychang/Software/
> petsc/src/ksp/pc/interface/precon.c
> [0]PETSC ERROR: #7 KSP_PCApply() line 251 in /global/homes/j/jychang/Softwa
> re/petsc/include/petsc/private/kspimpl.h
> [0]PETSC ERROR: #8 KSPInitialResidual() line 67 in
> /global/u1/j/jychang/Software/petsc/src/ksp/ksp/interface/itres.c
> [0]PETSC ERROR: #9 KSPSolve_GMRES() line 233 in
> /global/u1/j/jychang/Software/petsc/src/ksp/ksp/impls/gmres/gmres.c
> [0]PETSC ERROR: #10 KSPSolve() line 655 in /global/u1/j/jychang/Software/
> petsc/src/ksp/ksp/interface/itfunc.c
> [0]PETSC ERROR: #11 SNESSolve_NEWTONLS() line 224 in
> /global/u1/j/jychang/Software/petsc/src/snes/impls/ls/ls.c
> [0]PETSC ERROR: #12 SNESSolve() line 3967 in /global/u1/j/jychang/Software/
> petsc/src/snes/interface/snes.c
> [0]PETSC ERROR: #13 main() line 1548 in /scratch2/scratchdirs/jychang/
> Icesheet/ex48.c
> [0]PETSC ERROR: PETSc Option Table entries:
> [0]PETSC ERROR: -M 128
> [0]PETSC ERROR: -N 128
> [0]PETSC ERROR: -P 8
> [0]PETSC ERROR: -da_refine 3
> [0]PETSC ERROR: -mg_coarse_pc_type gamg
> [0]PETSC ERROR: -pc_mg_levels 4
> [0]PETSC ERROR: -pc_type mg
> [0]PETSC ERROR: -thi_mat_type baij
> [0]PETSC ERROR: End of Error Message ---send entire
> error message to petsc-ma...@mcs.anl.gov--
>
> If I changed the coarse grid to 129x129x8, no error whatsoever for up to 4
> levels of refinement.
>
> However, I am having trouble getting this started up on Cori's KNL...
>
> I am using a 136x136x8 coarse grid across 1088 cores, and Slurm is simply
> cancelling the job. No other PETSc error was given. This is literally what
> my log files say:
>
> Level 1 domain size (m)1e+04 x1e+04 x1e+03, num elements 272 x
> 272 x 15 (1109760), size (m) 36.7647 x 36.7647 x 71.4286
> Level 0 domain size (m)1e+04 x1e+04 x1e+03, num elements 136 x
> 136 x 8 (147968), size (m) 73.5294 x 73.5294 x 142.857
> makefile:25: recipe for target 'runcori' failed
> Level 2 domain size (m)1e+04 x1e+04 x1e+03, num elements 544 x
> 544 x 29 (8582144), size (m) 18.3824 x 18.3824 x 35.7143
> Level 1 domain size (m)1e+04 x1e+04 x1e+03, num elements 272 x
> 272 x 15 (1109760), size (

Re: [petsc-users] Correlation between da_refine and pg_mg_levels

2017-04-02 Thread Justin Chang
 660: Killed
srun: error: nid04139: task 539: Killed
srun: error: nid03873: task 63: Killed
srun: error: nid03960: task 170: Killed
srun: error: nid08164: task 821: Killed
srun: error: nid04139: task 507: Killed
srun: error: nid04005: task 299: Killed
srun: error: nid03960: tasks 136-169,171-198,200-201: Killed
srun: error: nid04005: task 310: Killed
srun: error: nid08166: task 1008: Killed
srun: error: nid04141: task 671: Killed
srun: error: nid03873: task 18: Killed
srun: error: nid04139: tasks 476-479,481-506,508-538,540-543: Killed
srun: error: nid04005: tasks 272-298,300-309,311-338: Killed
srun: error: nid04140: tasks 544-611: Killed
srun: error: nid04142: tasks 680-747: Killed
srun: error: nid04138: tasks 408-475: Killed
srun: error: nid04006: tasks 340-407: Killed
srun: error: nid08163: tasks 748-815: Killed
srun: error: nid08166: tasks 952-1007,1009-1019: Killed
srun: error: nid03873: tasks 0-2,4-17,19-31,33-62,64-67: Killed
srun: error: nid08165: tasks 884-951: Killed
srun: error: nid03883: tasks 68-135: Killed
srun: error: nid08164: tasks 816-820,822-883: Killed
srun: error: nid08167: tasks 1020-1087: Killed
srun: error: nid04141: tasks 612-659,661-670,672-679: Killed
srun: error: nid04004: tasks 204-263,265-271: Killed
make: [runcori] Error 137 (ignored)
[257]PETSC ERROR:

[257]PETSC ERROR: Caught signal number 15 Terminate: Some process (or the
batch system) has told this process to end
[257]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
[257]PETSC ERROR: or see
http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
[257]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS
X to find memory corruption errors
[257]PETSC ERROR: configure using --with-debugging=yes, recompile, link,
and run
[257]PETSC ERROR: to get more information on the crash.
[878]PETSC ERROR:

[878]PETSC ERROR: Caught signal number 15 Terminate: Some process (or the
batch system) has told this process to end
[878]PETSC ERROR: Try option -start_in_debugger or -on_error_attach_debugger
[878]PETSC ERROR: or see
http://www.mcs.anl.gov/petsc/documentation/faq.html#valgrind
[878]PETSC ERROR: or try http://valgrind.org on GNU/linux and Apple Mac OS
X to find memory corruption errors
[878]PETSC ERROR: configure using --with-debugging=yes, recompile, link,
and run
[878]PETSC ERROR: to get more information on the crash.

[clipped]




my job script for KNL looks like this:

#!/bin/bash
#SBATCH -N 16
#SBATCH -C knl,quad,cache
#SBATCH -p regular
#SBATCH -J knl1024
#SBATCH -L SCRATCH
#SBATCH -o knl1088.o%j
#SBATCH -e knl1088.e%j
#SBATCH --mail-type=ALL
#SBATCH --mail-user=jychan...@gmail.com
#SBATCH -t 00:20:00

srun -n 1088 -c 4 --cpu_bind=cores ./ex48 

Any ideas why this is happening? Or do I need to contact the NERSC folks?

Thanks,
Justin

On Sun, Apr 2, 2017 at 2:15 PM, Matthew Knepley <knep...@gmail.com> wrote:

> On Sun, Apr 2, 2017 at 2:13 PM, Barry Smith <bsm...@mcs.anl.gov> wrote:
>
>>
>> > On Apr 2, 2017, at 9:25 AM, Justin Chang <jychan...@gmail.com> wrote:
>> >
>> > Thanks guys,
>> >
>> > So I want to run SNES ex48 across 1032 processes on Edison, but I keep
>> getting segmentation violations. These are the parameters I am trying:
>> >
>> > srun -n 1032 -c 2 ./ex48 -M 80 -N 80 -P 9 -da_refine 1 -pc_type mg
>> -thi_mat_type baij -mg_coarse_pc_type gamg
>> >
>> > The above works perfectly fine if I used 96 processes. I also tried to
>> use a finer coarse mesh on 1032 but the error persists.
>> >
>> > Any ideas why this is happening? What are the ideal parameters to use
>> if I want to use 1k+ cores?
>> >
>>
>>Hmm, one should never get segmentation violations. You should only get
>> not completely useful error messages about incompatible sizes etc. Send an
>> example of the segmentation violations. (I sure hope you are checking the
>> error return codes for all functions?).
>
>
> He is just running SNES ex48.
>
>   Matt
>
>
>>
>>   Barry
>>
>> > Thanks,
>> > Justin
>> >
>> > On Fri, Mar 31, 2017 at 12:47 PM, Barry Smith <bsm...@mcs.anl.gov>
>> wrote:
>> >
>> > > On Mar 31, 2017, at 10:00 AM, Jed Brown <j...@jedbrown.org> wrote:
>> > >
>> > > Justin Chang <jychan...@gmail.com> writes:
>> > >
>> > >> Yeah based on my experiments it seems setting pc_mg_levels to
>> $DAREFINE + 1
>> > >> has decent performance.
>> > >>
>> > >> 1) is there ever a case where you'd want $MGLEVELS <= $DAREFINE? In
>> some of
>> &

Re: [petsc-users] Correlation between da_refine and pg_mg_levels

2017-04-02 Thread Justin Chang
It was sort of arbitrary. I want to conduct a performance spectrum
(dofs/sec) study where at least 1k processors are used on various HPC
machines (and hopefully one more case with 10k procs). Assuming all
available cores on these compute nodes (which I know is not the greatest
idea here), 1032 Ivybridge cores (24 cores/node) on Edison best match Cori's
1024 Haswell cores (32 cores/node).

How do I determine the shape of the DMDA? I am guessing the number of MPI
processes needs to be compatible with this?

Thanks,
Justin
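For what it is worth, a sketch of how to check the decomposition (the
-da_processors_* options are pointers to DMDA functionality and may not be
available in every PETSc version; everything else mirrors the run line used
earlier in this thread):

# let PETSc report the DMDA layout and process grid it chose
srun -n 1032 -c 2 ./ex48 -M 80 -N 80 -P 9 -da_refine 1 -pc_type mg \
    -thi_mat_type baij -mg_coarse_pc_type gamg -dm_view
# or pin a process grid whose product equals the rank count, e.g. 1032 = 24 x 43 x 1
srun -n 1032 -c 2 ./ex48 -M 80 -N 80 -P 9 -da_refine 1 -pc_type mg \
    -thi_mat_type baij -mg_coarse_pc_type gamg \
    -da_processors_x 24 -da_processors_y 43 -da_processors_z 1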

On Sun, Apr 2, 2017 at 11:29 AM, Jed Brown <j...@jedbrown.org> wrote:

> Justin Chang <jychan...@gmail.com> writes:
>
> > Thanks guys,
> >
> > So I want to run SNES ex48 across 1032 processes on Edison,
>
> How did you decide on 1032 processes?  What shape did the DMDA produce?
> Of course this should work, but we didn't explicitly test that in the
> paper since we were running on BG/P.
>
>   https://github.com/jedbrown/tme-ice/tree/master/shaheen/b
>
> > but I keep getting segmentation violations. These are the parameters I
> > am trying:
> >
> > srun -n 1032 -c 2 ./ex48 -M 80 -N 80 -P 9 -da_refine 1 -pc_type mg
> > -thi_mat_type baij -mg_coarse_pc_type gamg
> >
> > The above works perfectly fine if I used 96 processes. I also tried to
> use
> > a finer coarse mesh on 1032 but the error persists.
> >
> > Any ideas why this is happening? What are the ideal parameters to use if
> I
> > want to use 1k+ cores?
> >
> > Thanks,
> > Justin
> >
> > On Fri, Mar 31, 2017 at 12:47 PM, Barry Smith <bsm...@mcs.anl.gov>
> wrote:
> >
> >>
> >> > On Mar 31, 2017, at 10:00 AM, Jed Brown <j...@jedbrown.org> wrote:
> >> >
> >> > Justin Chang <jychan...@gmail.com> writes:
> >> >
> >> >> Yeah based on my experiments it seems setting pc_mg_levels to
> $DAREFINE
> >> + 1
> >> >> has decent performance.
> >> >>
> >> >> 1) is there ever a case where you'd want $MGLEVELS <= $DAREFINE? In
> >> some of
> >> >> the PETSc tutorial slides (e.g., http://www.mcs.anl.gov/
> >> >> petsc/documentation/tutorials/TutorialCEMRACS2016.pdf on slide
> 203/227)
> >> >> they say to use $MGLEVELS = 4 and $DAREFINE = 5, but when I ran
> this, it
> >> >> was almost twice as slow as if $MGLEVELS >= $DAREFINE
> >> >
> >> > Smaller coarse grids are generally more scalable -- when the problem
> >> > data is distributed, multigrid is a good solution algorithm.  But if
> >> > multigrid stops being effective because it is not preserving
> sufficient
> >> > coarse grid accuracy (e.g., for transport-dominated problems in
> >> > complicated domains) then you might want to stop early and use a more
> >> > robust method (like direct solves).
> >>
> >> Basically for symmetric positive definite operators you can make the
> >> coarse problem as small as you like (even 1 point) in theory. For
> >> indefinite and non-symmetric problems the theory says the "coarse grid
> must
> >> be sufficiently fine" (loosely speaking the coarse grid has to resolve
> the
> >> eigenmodes for the eigenvalues to the left of the x = 0).
> >>
> >> https://www.jstor.org/stable/2158375?seq=1#page_scan_tab_contents
> >>
> >>
> >>
>


Re: [petsc-users] Correlation between da_refine and pg_mg_levels

2017-04-02 Thread Justin Chang
Thanks guys,

So I want to run SNES ex48 across 1032 processes on Edison, but I keep
getting segmentation violations. These are the parameters I am trying:

srun -n 1032 -c 2 ./ex48 -M 80 -N 80 -P 9 -da_refine 1 -pc_type mg
-thi_mat_type baij -mg_coarse_pc_type gamg

The above works perfectly fine if I used 96 processes. I also tried to use
a finer coarse mesh on 1032 but the error persists.

Any ideas why this is happening? What are the ideal parameters to use if I
want to use 1k+ cores?

Thanks,
Justin

On Fri, Mar 31, 2017 at 12:47 PM, Barry Smith <bsm...@mcs.anl.gov> wrote:

>
> > On Mar 31, 2017, at 10:00 AM, Jed Brown <j...@jedbrown.org> wrote:
> >
> > Justin Chang <jychan...@gmail.com> writes:
> >
> >> Yeah based on my experiments it seems setting pc_mg_levels to $DAREFINE
> + 1
> >> has decent performance.
> >>
> >> 1) is there ever a case where you'd want $MGLEVELS <= $DAREFINE? In
> some of
> >> the PETSc tutorial slides (e.g., http://www.mcs.anl.gov/
> >> petsc/documentation/tutorials/TutorialCEMRACS2016.pdf on slide 203/227)
> >> they say to use $MGLEVELS = 4 and $DAREFINE = 5, but when I ran this, it
> >> was almost twice as slow as if $MGLEVELS >= $DAREFINE
> >
> > Smaller coarse grids are generally more scalable -- when the problem
> > data is distributed, multigrid is a good solution algorithm.  But if
> > multigrid stops being effective because it is not preserving sufficient
> > coarse grid accuracy (e.g., for transport-dominated problems in
> > complicated domains) then you might want to stop early and use a more
> > robust method (like direct solves).
>
> Basically for symmetric positive definite operators you can make the
> coarse problem as small as you like (even 1 point) in theory. For
> indefinite and non-symmetric problems the theory says the "coarse grid must
> be sufficiently fine" (loosely speaking the coarse grid has to resolve the
> eigenmodes for the eigenvalues to the left of the x = 0).
>
> https://www.jstor.org/stable/2158375?seq=1#page_scan_tab_contents
>
>
>


Re: [petsc-users] Correlation between da_refine and pg_mg_levels

2017-03-30 Thread Justin Chang
-dm_refine didn't work either

On Thu, Mar 30, 2017 at 4:21 PM, Matthew Knepley <knep...@gmail.com> wrote:

> I think its now -dm_refine
>
>Matt
>
> On Thu, Mar 30, 2017 at 4:17 PM, Justin Chang <jychan...@gmail.com> wrote:
>
>> Also, this was the output before the error message:
>>
>> Level 0 domain size (m)1e+04 x1e+04 x1e+03, num elements 5 x
>> 5 x 3 (75), size (m) 2000. x 2000. x 500.
>> Level -1 domain size (m)1e+04 x1e+04 x1e+03, num elements 2 x
>> 2 x 2 (8), size (m) 5000. x 5000. x 1000.
>> Level -2 domain size (m)1e+04 x1e+04 x1e+03, num elements 1 x
>> 1 x 1 (1), size (m) 1. x 1. x inf.
>>
>> Which tells me '-da_refine 4' is not registering
>>
>> On Thu, Mar 30, 2017 at 4:15 PM, Justin Chang <jychan...@gmail.com>
>> wrote:
>>
>>> Okay I'll give it a shot.
>>>
>>> Somewhat unrelated, but I tried running this on Cori's Haswell node
>>> (loaded the module 'petsc/3.7.4-64'). But I get these errors:
>>>
>>> [0]PETSC ERROR: - Error Message
>>> --
>>> [0]PETSC ERROR: Argument out of range
>>> [0]PETSC ERROR: Partition in y direction is too fine! 0 1
>>> [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html
>>> for trouble shooting.
>>> [0]PETSC ERROR: Petsc Release Version 3.7.4, Oct, 02, 2016
>>> [0]PETSC ERROR: /global/u1/j/jychang/Icesheet/./ex48 on a
>>> arch-cori-opt64-INTEL-3.7.4-64 named nid00020 by jychang Thu Mar 30
>>> 14:04:35 2017
>>> [0]PETSC ERROR: Configure options --known-sizeof-void-p=8
>>> --known-sizeof-short=2 --known-sizeof-int=4 --known-sizeof-long=8
>>> --known-sizeof-long-long=8 --known-sizeof-float=4 --known-sizeof-size_t=8
>>> --known-mpi-int64_t=1 --known-has-attribute-aligned=1
>>> --prefix=/global/common/cori/software/petsc/3.7.4-64/hsw/intel
>>> PETSC_ARCH=arch-cori-opt64-INTEL-3.7.4-64 --COPTFLAGS="-mkl -O2 -no-ipo
>>> -g -axMIC-AVX512,CORE-AVX2,AVX" --CXXOPTFLAGS="-mkl -O2 -no-ipo -g
>>> -axMIC-AVX512,CORE-AVX2,AVX" --FOPTFLAGS="-mkl -O2 -no-ipo -g
>>> -axMIC-AVX512,CORE-AVX2,AVX" 
>>> --with-hdf5-dir=/opt/cray/pe/hdf5-parallel/1.8.16/INTEL/15.0
>>> --with-hwloc-dir=/global/common/cori/software/hwloc/1.11.4/hsw
>>> --with-scalapack-include=/opt/intel/compilers_and_libraries_2017.1.132/linux/mkl/include
>>> --with-scalapack-lib= --LIBS="-mkl 
>>> -L/global/common/cori/software/petsc/3.7.4-64/hsw/intel/lib
>>> -I/global/common/cori/software/petsc/3.7.4-64/hsw/intel/include
>>> -L/global/common/cori/software/xz/5.2.2/hsw/lib
>>> -I/global/common/cori/software/xz/5.2.2/hsw/include
>>> -L/global/common/cori/software/zlib/1.2.8/hsw/intel/lib
>>> -I/global/common/cori/software/zlib/1.2.8/hsw/intel/include
>>> -L/global/common/cori/software/libxml2/2.9.4/hsw/lib
>>> -I/global/common/cori/software/libxml2/2.9.4/hsw/include
>>> -L/global/common/cori/software/numactl/2.0.11/hsw/lib
>>> -I/global/common/cori/software/numactl/2.0.11/hsw/include
>>> -L/global/common/cori/software/hwloc/1.11.4/hsw/lib
>>> -I/global/common/cori/software/hwloc/1.11.4/hsw/include
>>> -L/global/common/cori/software/openssl/1.1.0a/hsw/lib
>>> -I/global/common/cori/software/openssl/1.1.0a/hsw/include
>>> -L/global/common/cori/software/subversion/1.9.4/hsw/lib
>>> -I/global/common/cori/software/subversion/1.9.4/hsw/include -lhwloc
>>> -lpciaccess -lxml2 -lz -llzma -Wl,--start-group
>>> /opt/intel/compilers_and_libraries_2017.1.132/linux/mkl/lib/
>>> intel64/libmkl_scalapack_lp64.a /opt/intel/compilers_and_libra
>>> ries_2017.1.132/linux/mkl/lib/intel64/libmkl_core.a
>>> /opt/intel/compilers_and_libraries_2017.1.132/linux/mkl/lib/intel64/libmkl_intel_thread.a
>>> /opt/intel/compilers_and_libraries_2017.1.132/linux/mkl/lib/
>>> intel64/libmkl_blacs_intelmpi_lp64.a -Wl,--end-group -lstdc++"
>>> --download-parmetis --download-metis --with-ssl=0 --with-batch
>>> --known-mpi-shared-libraries=0 --with-clib-autodetect=0
>>> --with-cxxlib-autodetect=0 --with-debugging=0
>>> --with-fortranlib-autodetect=0 --with-mpiexec=srun
>>> --with-shared-libraries=0 --with-x=0 --known-mpi-int64-t=0
>>> --known-bits-per-byte=8 --known-sdot-returns-double=0
>>> --known-snrm2-returns-double=0 --known-level1-dcache-assoc=0
>>> --known-level1-dcache-linesize=32 --known-level1-dcache-size=32768
>>> --kn

Re: [petsc-users] Correlation between da_refine and pg_mg_levels

2017-03-30 Thread Justin Chang
Also, this was the output before the error message:

Level 0 domain size (m)1e+04 x1e+04 x1e+03, num elements 5 x 5
x 3 (75), size (m) 2000. x 2000. x 500.
Level -1 domain size (m)1e+04 x1e+04 x1e+03, num elements 2 x 2
x 2 (8), size (m) 5000. x 5000. x 1000.
Level -2 domain size (m)1e+04 x1e+04 x1e+03, num elements 1 x 1
x 1 (1), size (m) 1. x 1. x inf.

Which tells me '-da_refine 4' is not registering

On Thu, Mar 30, 2017 at 4:15 PM, Justin Chang <jychan...@gmail.com> wrote:

> Okay I'll give it a shot.
>
> Somewhat unrelated, but I tried running this on Cori's Haswell node
> (loaded the module 'petsc/3.7.4-64'). But I get these errors:
>
> [0]PETSC ERROR: - Error Message
> --
> [0]PETSC ERROR: Argument out of range
> [0]PETSC ERROR: Partition in y direction is too fine! 0 1
> [0]PETSC ERROR: See http://www.mcs.anl.gov/petsc/documentation/faq.html
> for trouble shooting.
> [0]PETSC ERROR: Petsc Release Version 3.7.4, Oct, 02, 2016
> [0]PETSC ERROR: /global/u1/j/jychang/Icesheet/./ex48 on a
> arch-cori-opt64-INTEL-3.7.4-64 named nid00020 by jychang Thu Mar 30
> 14:04:35 2017
> [0]PETSC ERROR: Configure options --known-sizeof-void-p=8
> --known-sizeof-short=2 --known-sizeof-int=4 --known-sizeof-long=8
> --known-sizeof-long-long=8 --known-sizeof-float=4 --known-sizeof-size_t=8
> --known-mpi-int64_t=1 --known-has-attribute-aligned=1
> --prefix=/global/common/cori/software/petsc/3.7.4-64/hsw/intel
> PETSC_ARCH=arch-cori-opt64-INTEL-3.7.4-64 --COPTFLAGS="-mkl -O2 -no-ipo
> -g -axMIC-AVX512,CORE-AVX2,AVX" --CXXOPTFLAGS="-mkl -O2 -no-ipo -g
> -axMIC-AVX512,CORE-AVX2,AVX" --FOPTFLAGS="-mkl -O2 -no-ipo -g
> -axMIC-AVX512,CORE-AVX2,AVX" --with-hdf5-dir=/opt/cray/pe/
> hdf5-parallel/1.8.16/INTEL/15.0 --with-hwloc-dir=/global/
> common/cori/software/hwloc/1.11.4/hsw --with-scalapack-include=/opt/
> intel/compilers_and_libraries_2017.1.132/linux/mkl/include
> --with-scalapack-lib= --LIBS="-mkl -L/global/common/cori/
> software/petsc/3.7.4-64/hsw/intel/lib -I/global/common/cori/
> software/petsc/3.7.4-64/hsw/intel/include 
> -L/global/common/cori/software/xz/5.2.2/hsw/lib
> -I/global/common/cori/software/xz/5.2.2/hsw/include -L/global/common/cori/
> software/zlib/1.2.8/hsw/intel/lib -I/global/common/cori/
> software/zlib/1.2.8/hsw/intel/include 
> -L/global/common/cori/software/libxml2/2.9.4/hsw/lib
> -I/global/common/cori/software/libxml2/2.9.4/hsw/include
> -L/global/common/cori/software/numactl/2.0.11/hsw/lib
> -I/global/common/cori/software/numactl/2.0.11/hsw/include
> -L/global/common/cori/software/hwloc/1.11.4/hsw/lib -I/global/common/cori/
> software/hwloc/1.11.4/hsw/include -L/global/common/cori/
> software/openssl/1.1.0a/hsw/lib -I/global/common/cori/
> software/openssl/1.1.0a/hsw/include -L/global/common/cori/
> software/subversion/1.9.4/hsw/lib -I/global/common/cori/
> software/subversion/1.9.4/hsw/include -lhwloc -lpciaccess -lxml2 -lz
> -llzma -Wl,--start-group /opt/intel/compilers_and_
> libraries_2017.1.132/linux/mkl/lib/intel64/libmkl_scalapack_lp64.a
> /opt/intel/compilers_and_libraries_2017.1.132/linux/mkl/lib/intel64/libmkl_core.a
> /opt/intel/compilers_and_libraries_2017.1.132/linux/
> mkl/lib/intel64/libmkl_intel_thread.a /opt/intel/compilers_and_
> libraries_2017.1.132/linux/mkl/lib/intel64/libmkl_blacs_intelmpi_lp64.a
> -Wl,--end-group -lstdc++" --download-parmetis --download-metis --with-ssl=0
> --with-batch --known-mpi-shared-libraries=0 --with-clib-autodetect=0
> --with-cxxlib-autodetect=0 --with-debugging=0
> --with-fortranlib-autodetect=0 --with-mpiexec=srun
> --with-shared-libraries=0 --with-x=0 --known-mpi-int64-t=0
> --known-bits-per-byte=8 --known-sdot-returns-double=0
> --known-snrm2-returns-double=0 --known-level1-dcache-assoc=0
> --known-level1-dcache-linesize=32 --known-level1-dcache-size=32768
> --known-memcmp-ok=1 --known-mpi-c-double-complex=1
> --known-mpi-long-double=1 --known-sizeof-MPI_Comm=4
> --known-sizeof-MPI_Fint=4 --known-sizeof-char=1 --known-sizeof-double=8
> CC=cc MPICC=cc CXX=CC MPICXX=CC FC=ftn F77=ftn F90=ftn MPIF90=ftn
> MPIF77=ftn CFLAGS=-axMIC-AVX512,CORE-AVX2,AVX 
> CXXFLAGS=-axMIC-AVX512,CORE-AVX2,AVX
> FFLAGS=-axMIC-AVX512,CORE-AVX2,AVX CC=cc MPICC=cc CXX=CC MPICXX=CC FC=ftn
> F77=ftn F90=ftn MPIF90=ftn MPIF77=ftn CFLAGS=-fPIC FFLAGS=-fPIC
> LDFLAGS=-fPIE --download-hypre --with-64-bit-indices
> [0]PETSC ERROR: #1 DMSetUp_DA_3D() line 298 in
> /global/cscratch1/sd/swowner/sleak/petsc-3.7.4/src/dm/impls/da/da3.c
> [0]PETSC ERROR: #2 DMSetUp_DA() line 27 in /global/cscratch1/sd/swowner/
> sleak/petsc-3.7.4/src/dm/impls/da/dareg.c
> [0]PETSC ERROR: #3 DMSetUp() line 744 in /gl

Re: [petsc-users] Correlation between da_refine and pg_mg_levels

2017-03-30 Thread Justin Chang
olve() line 4005 in
/global/cscratch1/sd/swowner/sleak/petsc-3.7.4/src/snes/interface/snes.c
[0]PETSC ERROR: #12 main() line 1548 in
/global/homes/j/jychang/Icesheet/ex48.c
[0]PETSC ERROR: PETSc Option Table entries:
[0]PETSC ERROR: -da_refine 4
[0]PETSC ERROR: -ksp_rtol 1e-7
[0]PETSC ERROR: -M 5
[0]PETSC ERROR: -N 5
[0]PETSC ERROR: -P 3
[0]PETSC ERROR: -pc_mg_levels 5
[0]PETSC ERROR: -pc_type mg
[0]PETSC ERROR: -thi_mat_type baij
[0]PETSC ERROR: End of Error Message ---send entire
error message to petsc-ma...@mcs.anl.gov--
Rank 0 [Thu Mar 30 14:04:35 2017] [c0-0c0s5n0] application called
MPI_Abort(MPI_COMM_WORLD, 63) - process 0
srun: error: nid00020: task 0: Aborted
srun: Terminating job step 4363145.1z

It seems to me the PETSc from this module is not registering the
'-da_refine' entry. This is strange because I have no issue with this on
the latest petsc-dev version. Does anyone know about this error and/or why it
happens?

On Thu, Mar 30, 2017 at 3:39 PM, Matthew Knepley <knep...@gmail.com> wrote:

> On Thu, Mar 30, 2017 at 3:38 PM, Justin Chang <jychan...@gmail.com> wrote:
>
>> Okay, got it. What are the options for setting GAMG as the coarse solver?
>>
>
> -mg_coarse_pc_type gamg I think
>
>
>> On Thu, Mar 30, 2017 at 3:37 PM, Matthew Knepley <knep...@gmail.com>
>> wrote:
>>
>>> On Thu, Mar 30, 2017 at 3:04 PM, Justin Chang <jychan...@gmail.com>
>>> wrote:
>>>
>>>> Yeah based on my experiments it seems setting pc_mg_levels to $DAREFINE
>>>> + 1 has decent performance.
>>>>
>>>> 1) is there ever a case where you'd want $MGLEVELS <= $DAREFINE? In
>>>> some of the PETSc tutorial slides (e.g., http://www.mcs.anl.gov/
>>>> petsc/documentation/tutorials/TutorialCEMRACS2016.pdf on slide
>>>> 203/227) they say to use $MGLEVELS = 4 and $DAREFINE = 5, but when I ran
>>>> this, it was almost twice as slow as if $MGLEVELS >= $DAREFINE
>>>>
>>>
>>> Depending on how big the initial grid is, you may want this. There is a
>>> balance between coarse grid and fine grid work.
>>>
>>>
>>>> 2) So I understand that one cannot refine further than one grid point
>>>> in each direction, but is there any benefit to having $MGLEVELS > $DAREFINE
>>>> by a lot?
>>>>
>>>
>>> Again, it depends on the size of the initial grid.
>>>
>>> On really large problems, you want to use GAMG as the coarse solver,
>>> which will move the problem onto a smaller number of nodes
>>> so that you can coarsen further.
>>>
>>>Matt
>>>
>>>
>>>> Thanks,
>>>> Justin
>>>>
>>>> On Thu, Mar 30, 2017 at 2:35 PM, Barry Smith <bsm...@mcs.anl.gov>
>>>> wrote:
>>>>
>>>>>
>>>>>   -da_refine $DAREFINE  determines how large the final problem will be.
>>>>>
>>>>>   By default if you don't supply pc_mg_levels then it uses $DAREFINE +
>>>>> 1 as the number of levels of MG to use; for example -da_refine 1 would
>>>>> result in 2 levels of multigrid.
>>>>>
>>>>>
>>>>> > On Mar 30, 2017, at 2:17 PM, Justin Chang <jychan...@gmail.com>
>>>>> wrote:
>>>>> >
>>>>> > Hi all,
>>>>> >
>>>>> > Just a general conceptual question: say I am tinkering around with
>>>>> SNES ex48.c and am running the program with these options:
>>>>> >
>>>>> > mpirun -n $NPROCS -pc_type mg -M $XSEED -N $YSEED -P $ZSEED
>>>>> -thi_mat_type baij -da_refine $DAREFINE -pc_mg_levels $MGLEVELS
>>>>> >
>>>>> > I am not too familiar with mg, but it seems to me there is a very
>>>>> strong correlation between $MGLEVELS and $DAREFINE as well as perhaps even
>>>>> the initial coarse grid size (provided by $X/YZSEED).
>>>>> >
>>>>> > Is there a rule of thumb on how these parameters should be? I am
>>>>> guessing it probably is also hardware/architectural dependent?
>>>>> >
>>>>> > Thanks,
>>>>> > Justin
>>>>>
>>>>>
>>>>
>>>
>>>
>>> --
>>> What most experimenters take for granted before they begin their
>>> experiments is infinitely more interesting than any results to which their
>>> experiments lead.
>>> -- Norbert Wiener
>>>
>>
>>
>
>
> --
> What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which their
> experiments lead.
> -- Norbert Wiener
>


Re: [petsc-users] Correlation between da_refine and pg_mg_levels

2017-03-30 Thread Justin Chang
Okay, got it. What are the options for setting GAMG as the coarse solver?

On Thu, Mar 30, 2017 at 3:37 PM, Matthew Knepley <knep...@gmail.com> wrote:

> On Thu, Mar 30, 2017 at 3:04 PM, Justin Chang <jychan...@gmail.com> wrote:
>
>> Yeah based on my experiments it seems setting pc_mg_levels to $DAREFINE +
>> 1 has decent performance.
>>
>> 1) is there ever a case where you'd want $MGLEVELS <= $DAREFINE? In some
>> of the PETSc tutorial slides (e.g., http://www.mcs.anl.gov/
>> petsc/documentation/tutorials/TutorialCEMRACS2016.pdf on slide 203/227)
>> they say to use $MGLEVELS = 4 and $DAREFINE = 5, but when I ran this, it
>> was almost twice as slow as if $MGLEVELS >= $DAREFINE
>>
>
> Depending on how big the initial grid is, you may want this. There is a
> balance between coarse grid and fine grid work.
>
>
>> 2) So I understand that one cannot refine further than one grid point in
>> each direction, but is there any benefit to having $MGLEVELS > $DAREFINE by
>> a lot?
>>
>
> Again, it depends on the size of the initial grid.
>
> On really large problems, you want to use GAMG as the coarse solver, which
> will move the problem onto a smaller number of nodes
> so that you can coarsen further.
>
>Matt
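Putting the pieces of this thread together, the kind of run line being
discussed is, as a sketch (same placeholder variables as above):

mpirun -n $NPROCS ./ex48 -M $XSEED -N $YSEED -P $ZSEED -thi_mat_type baij \
    -da_refine $DAREFINE -pc_type mg -pc_mg_levels $MGLEVELS \
    -mg_coarse_pc_type gamg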
>
>
>> Thanks,
>> Justin
>>
>> On Thu, Mar 30, 2017 at 2:35 PM, Barry Smith <bsm...@mcs.anl.gov> wrote:
>>
>>>
>>>   -da_refine $DAREFINE  determines how large the final problem will be.
>>>
>>>   By default if you don't supply pc_mg_levels then it uses $DAREFINE + 1
>>> as the number of levels of MG to use; for example -da_refine 1 would result
>>> in 2 levels of multigrid.
>>>
>>>
>>> > On Mar 30, 2017, at 2:17 PM, Justin Chang <jychan...@gmail.com> wrote:
>>> >
>>> > Hi all,
>>> >
>>> > Just a general conceptual question: say I am tinkering around with
>>> SNES ex48.c and am running the program with these options:
>>> >
>>> > mpirun -n $NPROCS -pc_type mg -M $XSEED -N $YSEED -P $ZSEED
>>> -thi_mat_type baij -da_refine $DAREFINE -pc_mg_levels $MGLEVELS
>>> >
>>> > I am not too familiar with mg, but it seems to me there is a very
>>> strong correlation between $MGLEVELS and $DAREFINE as well as perhaps even
>>> the initial coarse grid size (provided by $X/YZSEED).
>>> >
>>> > Is there a rule of thumb on how these parameters should be? I am
>>> guessing it probably is also hardware/architectural dependent?
>>> >
>>> > Thanks,
>>> > Justin
>>>
>>>
>>
>
>
> --
> What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which their
> experiments lead.
> -- Norbert Wiener
>


Re: [petsc-users] Correlation between da_refine and pg_mg_levels

2017-03-30 Thread Justin Chang
Yeah based on my experiments it seems setting pc_mg_levels to $DAREFINE + 1
has decent performance.

1) is there ever a case where you'd want $MGLEVELS <= $DAREFINE? In some of
the PETSc tutorial slides (e.g., http://www.mcs.anl.gov/
petsc/documentation/tutorials/TutorialCEMRACS2016.pdf on slide 203/227)
they say to use $MGLEVELS = 4 and $DAREFINE = 5, but when I ran this, it
was almost twice as slow as if $MGLEVELS >= $DAREFINE

2) So I understand that one cannot refine further than one grid point in
each direction, but is there any benefit to having $MGLEVELS > $DAREFINE by
a lot?

Thanks,
Justin

On Thu, Mar 30, 2017 at 2:35 PM, Barry Smith <bsm...@mcs.anl.gov> wrote:

>
>   -da_refine $DAREFINE  determines how large the final problem will be.
>
>   By default if you don't supply pc_mg_levels then it uses $DAREFINE + 1
> as the number of levels of MG to use; for example -da_refine 1 would result
> in 2 levels of multigrid.
>
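As a worked illustration, using the grid sizes that appear in the level
output quoted earlier in this archive: with -M 128 -N 128 -P 8 and
-da_refine 3, the DMDA hierarchy is 128x128x8 -> 256x256x15 -> 512x512x29
-> 1024x1024x57 (x and y double because they are periodic, z goes 2n-1), and
pc_mg_levels defaults to 3 + 1 = 4, one multigrid level per grid:

mpirun -n $NPROCS ./ex48 -M 128 -N 128 -P 8 -da_refine 3 -pc_type mg -thi_mat_type baij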
>
> > On Mar 30, 2017, at 2:17 PM, Justin Chang <jychan...@gmail.com> wrote:
> >
> > Hi all,
> >
> > Just a general conceptual question: say I am tinkering around with SNES
> ex48.c and am running the program with these options:
> >
> > mpirun -n $NPROCS -pc_type mg -M $XSEED -N $YSEED -P $ZSEED
> -thi_mat_type baij -da_refine $DAREFINE -pc_mg_levels $MGLEVELS
> >
> > I am not too familiar with mg, but it seems to me there is a very strong
> correlation between $MGLEVELS and $DAREFINE as well as perhaps even the
> initial coarse grid size (provided by $X/YZSEED).
> >
> > Is there a rule of thumb on how these parameters should be? I am
> guessing it probably is also hardware/architectural dependent?
> >
> > Thanks,
> > Justin
>
>


[petsc-users] Correlation between da_refine and pg_mg_levels

2017-03-30 Thread Justin Chang
Hi all,

Just a general conceptual question: say I am tinkering around with SNES
ex48.c and am running the program with these options:

mpirun -n $NPROCS -pc_type mg -M $XSEED -N $YSEED -P $ZSEED -thi_mat_type
baij -da_refine $DAREFINE -pc_mg_levels $MGLEVELS

I am not too familiar with mg, but it seems to me there is a very strong
correlation between $MGLEVELS and $DAREFINE as well as perhaps even the
initial coarse grid size (provided by $X/YZSEED).

Is there a rule of thumb on how these parameters should be? I am guessing
it probably is also hardware/architectural dependent?

Thanks,
Justin


Re: [petsc-users] GAMG huge hash being requested

2017-02-20 Thread Justin Chang
Okay thanks

On Sun, Feb 19, 2017 at 2:32 PM, Lawrence Mitchell <
lawrence.mitch...@imperial.ac.uk> wrote:

>
>
> > On 19 Feb 2017, at 18:55, Justin Chang <jychan...@gmail.com> wrote:
> >
> > Okay, it doesn't seem like the Firedrake fork (which is what I am using)
> has this latest fix. Lawrence, when do you think it's possible you folks
> can incorporate these fixes
>
> I'll fast forward our branch pointer on Monday.
>
> Lawrence
>


Re: [petsc-users] GAMG huge hash being requested

2017-02-19 Thread Justin Chang
Okay, it doesn't seem like the Firedrake fork (which is what I am using)
has this latest fix. Lawrence, when do you think it's possible you folks
can incorporate these fixes?

On Sun, Feb 19, 2017 at 8:56 AM, Matthew Knepley <knep...@gmail.com> wrote:

> Satish fixed this error. I believe the fix is now in master.
>
>   Thanks,
>
>  Matt
>
> On Sun, Feb 19, 2017 at 3:05 AM, Justin Chang <jychan...@gmail.com> wrote:
>
>> Hi all,
>>
>> So I am attempting to employ the DG1 finite element method on the Poisson
>> equation using GAMG. When I attempt to solve a problem with roughly 4
>> million DOFs across 20 cores, I get this error:
>>
>> Traceback (most recent call last):
>>   File "pFiredrake.py", line 109, in 
>> solve(a==L,solution,options_prefix='fe_',solver_parameters=
>> solver_params)
>>   File 
>> "/home/jchang23/Software/firedrake/src/firedrake/firedrake/solving.py",
>> line 122, in solve
>> _solve_varproblem(*args, **kwargs)
>>   File 
>> "/home/jchang23/Software/firedrake/src/firedrake/firedrake/solving.py",
>> line 152, in _solve_varproblem
>> solver.solve()
>>   File 
>> "/home/jchang23/Software/firedrake/src/firedrake/firedrake/variational_solver.py",
>> line 220, in solve
>> self.snes.solve(None, v)
>>   File "PETSc/SNES.pyx", line 537, in petsc4py.PETSc.SNES.solve
>> (src/petsc4py.PETSc.c:172359)
>> petsc4py.PETSc.Error: error code 63
>> [ 6] SNESSolve() line 4128 in /tmp/pip-FNpsya-build/src/snes
>> /interface/snes.c
>> [ 6] SNESSolve_KSPONLY() line 40 in /tmp/pip-FNpsya-build/src/snes
>> /impls/ksponly/ksponly.c
>> [ 6] KSPSolve() line 620 in /tmp/pip-FNpsya-build/src/ksp/
>> ksp/interface/itfunc.c
>> [ 6] KSPSetUp() line 393 in /tmp/pip-FNpsya-build/src/ksp/
>> ksp/interface/itfunc.c
>> [ 6] PCSetUp() line 968 in /tmp/pip-FNpsya-build/src/ksp/
>> pc/interface/precon.c
>> [ 6] PCSetUp_GAMG() line 524 in /tmp/pip-FNpsya-build/src/ksp/
>> pc/impls/gamg/gamg.c
>> [ 6] PCGAMGCoarsen_AGG() line 955 in /tmp/pip-FNpsya-build/src/ksp/
>> pc/impls/gamg/agg.c
>> [ 6] MatTransposeMatMult() line 9962 in /tmp/pip-FNpsya-build/src/mat/
>> interface/matrix.c
>> [ 6] MatTransposeMatMult_MPIAIJ_MPIAIJ() line 902 in
>> /tmp/pip-FNpsya-build/src/mat/impls/aij/mpi/mpimatmatmult.c
>> [ 6] MatTransposeMatMultSymbolic_MPIAIJ_MPIAIJ() line 1676 in
>> /tmp/pip-FNpsya-build/src/mat/impls/aij/mpi/mpimatmatmult.c
>> [ 6] PetscTableCreate() line 52 in /tmp/pip-FNpsya-build/src/sys/
>> utils/ctable.c
>> [ 6] PetscTableCreateHashSize() line 28 in /tmp/pip-FNpsya-build/src/sys/
>> utils/ctable.c
>> [ 6] Argument out of range
>> [ 6] A really huge hash is being requested.. cannot process: 4096000
>> 
>> --
>> MPI_ABORT was invoked on rank 6 in communicator MPI_COMM_WORLD
>> with errorcode 1.
>>
>> NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
>> You may or may not see output from other processes, depending on
>> exactly when Open MPI kills them.
>> 
>> --
>> Traceback (most recent call last):
>>   File "pFiredrake.py", line 109, in 
>> Traceback (most recent call last):
>> Traceback (most recent call last):
>>   File "pFiredrake.py", line 109, in 
>>   File "pFiredrake.py", line 109, in 
>> solve(a==L,solution,options_prefix='fe_',solver_parameters=
>> solver_params)
>>   File 
>> "/home/jchang23/Software/firedrake/src/firedrake/firedrake/solving.py",
>> line 122, in solve
>> solve(a==L,solution,options_prefix='fe_',solver_parameters=
>> solver_params)
>> Traceback (most recent call last):
>>   File "pFiredrake.py", line 109, in 
>> Traceback (most recent call last):
>>   File "pFiredrake.py", line 109, in 
>> _solve_varproblem(*args, **kwargs)
>> solve(a==L,solution,options_prefix='fe_',solver_parameters=
>> solver_params)
>>   File 
>> "/home/jchang23/Software/firedrake/src/firedrake/firedrake/solving.py",
>> line 152, in _solve_varproblem
>> Traceback (most recent call last):
>>   File "pFiredrake.py", line 109, in 
>>   File 
>> "/home/jchang23/Software/firedrake/src/firedrake/firedrake/solving.py",
>> line 122, in solve
>> Traceback (most recent call last):
>>   File "pFiredrake.py", line 109, in 
&g

[petsc-users] GAMG huge hash being requested

2017-02-19 Thread Justin Chang
Hi all,

So I am attempting to employ the DG1 finite element method on the Poisson
equation using GAMG. When I attempt to solve a problem with roughly 4
million DOFs across 20 cores, I get this error:

Traceback (most recent call last):
  File "pFiredrake.py", line 109, in 

solve(a==L,solution,options_prefix='fe_',solver_parameters=solver_params)
  File
"/home/jchang23/Software/firedrake/src/firedrake/firedrake/solving.py",
line 122, in solve
_solve_varproblem(*args, **kwargs)
  File
"/home/jchang23/Software/firedrake/src/firedrake/firedrake/solving.py",
line 152, in _solve_varproblem
solver.solve()
  File
"/home/jchang23/Software/firedrake/src/firedrake/firedrake/variational_solver.py",
line 220, in solve
self.snes.solve(None, v)
  File "PETSc/SNES.pyx", line 537, in petsc4py.PETSc.SNES.solve
(src/petsc4py.PETSc.c:172359)
petsc4py.PETSc.Error: error code 63
[ 6] SNESSolve() line 4128 in
/tmp/pip-FNpsya-build/src/snes/interface/snes.c
[ 6] SNESSolve_KSPONLY() line 40 in
/tmp/pip-FNpsya-build/src/snes/impls/ksponly/ksponly.c
[ 6] KSPSolve() line 620 in
/tmp/pip-FNpsya-build/src/ksp/ksp/interface/itfunc.c
[ 6] KSPSetUp() line 393 in
/tmp/pip-FNpsya-build/src/ksp/ksp/interface/itfunc.c
[ 6] PCSetUp() line 968 in
/tmp/pip-FNpsya-build/src/ksp/pc/interface/precon.c
[ 6] PCSetUp_GAMG() line 524 in
/tmp/pip-FNpsya-build/src/ksp/pc/impls/gamg/gamg.c
[ 6] PCGAMGCoarsen_AGG() line 955 in
/tmp/pip-FNpsya-build/src/ksp/pc/impls/gamg/agg.c
[ 6] MatTransposeMatMult() line 9962 in
/tmp/pip-FNpsya-build/src/mat/interface/matrix.c
[ 6] MatTransposeMatMult_MPIAIJ_MPIAIJ() line 902 in
/tmp/pip-FNpsya-build/src/mat/impls/aij/mpi/mpimatmatmult.c
[ 6] MatTransposeMatMultSymbolic_MPIAIJ_MPIAIJ() line 1676 in
/tmp/pip-FNpsya-build/src/mat/impls/aij/mpi/mpimatmatmult.c
[ 6] PetscTableCreate() line 52 in
/tmp/pip-FNpsya-build/src/sys/utils/ctable.c
[ 6] PetscTableCreateHashSize() line 28 in
/tmp/pip-FNpsya-build/src/sys/utils/ctable.c
[ 6] Argument out of range
[ 6] A really huge hash is being requested.. cannot process: 4096000
--
MPI_ABORT was invoked on rank 6 in communicator MPI_COMM_WORLD
with errorcode 1.

NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--
Traceback (most recent call last):
  File "pFiredrake.py", line 109, in 
Traceback (most recent call last):
Traceback (most recent call last):
  File "pFiredrake.py", line 109, in 
  File "pFiredrake.py", line 109, in 

solve(a==L,solution,options_prefix='fe_',solver_parameters=solver_params)
  File
"/home/jchang23/Software/firedrake/src/firedrake/firedrake/solving.py",
line 122, in solve

solve(a==L,solution,options_prefix='fe_',solver_parameters=solver_params)
Traceback (most recent call last):
  File "pFiredrake.py", line 109, in 
Traceback (most recent call last):
  File "pFiredrake.py", line 109, in 
_solve_varproblem(*args, **kwargs)

solve(a==L,solution,options_prefix='fe_',solver_parameters=solver_params)
  File
"/home/jchang23/Software/firedrake/src/firedrake/firedrake/solving.py",
line 152, in _solve_varproblem
Traceback (most recent call last):
  File "pFiredrake.py", line 109, in 
  File
"/home/jchang23/Software/firedrake/src/firedrake/firedrake/solving.py",
line 122, in solve
Traceback (most recent call last):
  File "pFiredrake.py", line 109, in 
Traceback (most recent call last):
  File "pFiredrake.py", line 109, in 

solve(a==L,solution,options_prefix='fe_',solver_parameters=solver_params)
  File
"/home/jchang23/Software/firedrake/src/firedrake/firedrake/solving.py",
line 122, in solve

solve(a==L,solution,options_prefix='fe_',solver_parameters=solver_params)
  File
"/home/jchang23/Software/firedrake/src/firedrake/firedrake/solving.py",
line 122, in solve
  File
"/home/jchang23/Software/firedrake/src/firedrake/firedrake/solving.py",
line 122, in solve

solve(a==L,solution,options_prefix='fe_',solver_parameters=solver_params)
  File
"/home/jchang23/Software/firedrake/src/firedrake/firedrake/solving.py",
line 122, in solve
_solve_varproblem(*args, **kwargs)
  File
"/home/jchang23/Software/firedrake/src/firedrake/firedrake/solving.py",
line 152, in _solve_varproblem
_solve_varproblem(*args, **kwargs)
  File
"/home/jchang23/Software/firedrake/src/firedrake/firedrake/solving.py",
line 152, in _solve_varproblem
_solve_varproblem(*args, **kwargs)
  File
"/home/jchang23/Software/firedrake/src/firedrake/firedrake/solving.py",
line 152, in _solve_varproblem
_solve_varproblem(*args, **kwargs)
  File
"/home/jchang23/Software/firedrake/src/firedrake/firedrake/solving.py",
line 152, in _solve_varproblem
solver.solve()
  File
"/home/jchang23/Software/firedrake/src/firedrake/firedrake/variational_solver.py",
line 220, in solve
_solve_varproblem(*args, **kwargs)
  File

Re: [petsc-users] Use previous solution when encountering "DIVERGED_LINE_SEARCH"

2016-11-30 Thread Justin Chang
Also, -snes_stol does not seem to work for vinewtonrsls?

On Wed, Nov 30, 2016 at 8:54 PM, Justin Chang <jychan...@gmail.com> wrote:

> Generally speaking, yes. But this is VIRS where I am enforcing maximum
> principles. From what I have seen, the residual typically exhibits this
> behavior if there are very few violations in the first place.
>
> How would I "terminate on stagnation"?
>
> On Wed, Nov 30, 2016 at 8:48 PM, Matthew Knepley <knep...@gmail.com>
> wrote:
>
>> On Wed, Nov 30, 2016 at 8:38 PM, Justin Chang <jychan...@gmail.com>
>> wrote:
>>
>>> By manually terminating I meant setting -snes_max_it to 5 if I know 
>>> DIVERGED_LINE_SEARCH
>>> occurs after 6 iterations. In a transient simulation I cannot do this
>>>
>>
>> Let me elaborate. I mean that using DIVERGED_LINE_SEARCH as an indication
>> of convergence is dicey. It may be that in the problem
>> you looked at this was true, but I see no reason to believe it's true in
>> general. Why does your residual quit decreasing?
>>
>>   Thanks,
>>
>>  Matt
>>
>>
>>> On Wed, Nov 30, 2016 at 8:35 PM, Matthew Knepley <knep...@gmail.com>
>>> wrote:
>>>
>>>> On Wed, Nov 30, 2016 at 8:27 PM, Justin Chang <jychan...@gmail.com>
>>>> wrote:
>>>>
>>>>> Hi all,
>>>>>
>>>>> I am running some transient simulations using SNESVINEWTONRSLS. At
>>>>> certain timesteps, I get a "DIVERGED_LINE_SEARCH" which essentially
>>>>> "resets" my solution to zero and messes everything up. I notice that this
>>>>> happens when the SNES Function norm no longer decreases, and if I were to
>>>>> manually terminate the solver right before the final iteration I get the
>>>>> answer I want.
>>>>>
>>>>
>>>> Yes, it's possible; however, isn't that a dangerous way to terminate?
>>>> Couldn't you terminate on stagnation?
>>>>
>>>>   Thanks,
>>>>
>>>>  Matt
>>>>
>>>>
>>>>> Is there a way to "detect" this error and use the solution from the
>>>>> previous non-failing iteration? Setting a fixed maximum iteration doesn't
>>>>> seem reasonable because every time level will require different numbers of
>>>>> iterations to converge.
>>>>>
>>>>> Thanks,
>>>>> Justin
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> What most experimenters take for granted before they begin their
>>>> experiments is infinitely more interesting than any results to which their
>>>> experiments lead.
>>>> -- Norbert Wiener
>>>>
>>>
>>>
>>
>>
>> --
>> What most experimenters take for granted before they begin their
>> experiments is infinitely more interesting than any results to which their
>> experiments lead.
>> -- Norbert Wiener
>>
>
>


Re: [petsc-users] Use previous solution when encountering "DIVERGED_LINE_SEARCH"

2016-11-30 Thread Justin Chang
Generally speaking, yes. But this is VIRS where I am enforcing maximum
principles. From what I have seen, the residual typically exhibits this
behavior if there are very few violations in the first place.

How would I "terminate on stagnation"?

On Wed, Nov 30, 2016 at 8:48 PM, Matthew Knepley <knep...@gmail.com> wrote:

> On Wed, Nov 30, 2016 at 8:38 PM, Justin Chang <jychan...@gmail.com> wrote:
>
>> By manually terminating I meant setting -snes_max_it to 5 if I know 
>> DIVERGED_LINE_SEARCH
>> occurs after 6 iterations. In a transient simulation I cannot do this
>>
>
> Let me elaborate. I mean that using DIVERGED_LINE_SEARCH as an indication
> of convergence is dicey. It may be that in the problem
> you looked at this was true, but I see no reason to believe it's true in
> general. Why does your residual quit decreasing?
>
>   Thanks,
>
>  Matt
>
>
>> On Wed, Nov 30, 2016 at 8:35 PM, Matthew Knepley <knep...@gmail.com>
>> wrote:
>>
>>> On Wed, Nov 30, 2016 at 8:27 PM, Justin Chang <jychan...@gmail.com>
>>> wrote:
>>>
>>>> Hi all,
>>>>
>>>> I am running some transient simulations using SNESVINEWTONRSLS. At
>>>> certain timesteps, I get a "DIVERGED_LINE_SEARCH" which essentially
>>>> "resets" my solution to zero and messes everything up. I notice that this
>>>> happens when the SNES Function norm no longer decreases, and if I were to
>>>> manually terminate the solver right before the final iteration I get the
>>>> answer I want.
>>>>
>>>
>>> Yes, it's possible; however, isn't that a dangerous way to terminate?
>>> Couldn't you terminate on stagnation?
>>>
>>>   Thanks,
>>>
>>>  Matt
>>>
>>>
>>>> Is there a way to "detect" this error and use the solution from the
>>>> previous non-failing iteration? Setting a fixed maximum iteration doesn't
>>>> seem reasonable because every time level will require different numbers of
>>>> iterations to converge.
>>>>
>>>> Thanks,
>>>> Justin
>>>>
>>>
>>>
>>>
>>> --
>>> What most experimenters take for granted before they begin their
>>> experiments is infinitely more interesting than any results to which their
>>> experiments lead.
>>> -- Norbert Wiener
>>>
>>
>>
>
>
> --
> What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which their
> experiments lead.
> -- Norbert Wiener
>


Re: [petsc-users] Use previous solution when encountering "DIVERGED_LINE_SEARCH"

2016-11-30 Thread Justin Chang
By manually terminating I meant setting -snes_max_it to 5 if I know
DIVERGED_LINE_SEARCH
occurs after 6 iterations. In a transient simulation I cannot do this

On Wed, Nov 30, 2016 at 8:35 PM, Matthew Knepley <knep...@gmail.com> wrote:

> On Wed, Nov 30, 2016 at 8:27 PM, Justin Chang <jychan...@gmail.com> wrote:
>
>> Hi all,
>>
>> I am running some transient simulations using SNESVINEWTONRSLS. At
>> certain timesteps, I get a "DIVERGED_LINE_SEARCH" which essentially
>> "resets" my solution to zero and messes everything up. I notice that this
>> happens when the SNES Function norm no longer decreases, and if I were to
>> manually terminate the solver right before the final iteration I get the
>> answer I want.
>>
>
> Yes, it's possible, however isn't that a dangerous way to terminate?
> Couldn't you terminate on stagnation?
>
>   Thanks,
>
>  Matt
>
>
>> Is there a way to "detect" this error and use the solution from the
>> previous non-failing iteration? Setting a fixed maximum iteration doesn't
>> seem reasonable because every time level will require different numbers of
>> iterations to converge.
>>
>> Thanks,
>> Justin
>>
>
>
>
> --
> What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which their
> experiments lead.
> -- Norbert Wiener
>


[petsc-users] Use previous solution when encountering "DIVERGED_LINE_SEARCH"

2016-11-30 Thread Justin Chang
Hi all,

I am running some transient simulations using SNESVINEWTONRSLS. At certain
timesteps, I get a "DIVERGED_LINE_SEARCH" which essentially "resets" my
solution to zero and messes everything up. I notice that this happens when
the SNES Function norm no longer decreases, and if I were to manually
terminate the solver right before the final iteration I get the answer I
want.

Is there a way to "detect" this error and use the solution from the
previous non-failing iteration? Setting a fixed maximum iteration doesn't
seem reasonable because every time level will require different numbers of
iterations to converge.

Thanks,
Justin
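
For the archive, a minimal sketch of detecting the failure at the outer
(time-step) level and falling back to a saved copy of the last accepted
solution (the vector u_prev and the surrounding loop are assumptions, and
whether the saved copy is the iterate you actually want depends on where in
the Newton loop the failure occurs):

  SNESConvergedReason reason;
  Vec                 u_prev;

  ierr = VecDuplicate(u, &u_prev);CHKERRQ(ierr);
  ierr = VecCopy(u, u_prev);CHKERRQ(ierr);       /* save the last accepted solution */
  ierr = SNESSolve(snes, NULL, u);CHKERRQ(ierr);
  ierr = SNESGetConvergedReason(snes, &reason);CHKERRQ(ierr);
  if (reason == SNES_DIVERGED_LINE_SEARCH) {
    ierr = VecCopy(u_prev, u);CHKERRQ(ierr);     /* discard the failed solve, keep the previous state */
  }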


Re: [petsc-users] pcfieldsplitting not working for VI

2016-11-28 Thread Justin Chang
FWIW I ended up doing this:

rs_solver.setDM(W._dm())

and was left with this error:

Traceback (most recent call last):
  File "sphere.py", line 180, in 
virs()
  File "sphere.py", line 163, in virs
rs_solver.solve(None,sol_vec)
  File "PETSc/SNES.pyx", line 537, in petsc4py.PETSc.SNES.solve
(src/petsc4py.PETSc.c:170012)
petsc4py.PETSc.Error: error code 75
[0] SNESSolve() line 4060 in
/private/var/folders/92/1kh0g4kn2z50fnwsmf8s269wgn/T/pip-gYEj0p-build/src/snes/interface/snes.c
[0] SNESSolve_VINEWTONRSLS() line 424 in
/private/var/folders/92/1kh0g4kn2z50fnwsmf8s269wgn/T/pip-gYEj0p-build/src/snes/impls/vi/rs/virs.c
[0] MatGetSubMatrix() line 7929 in
/private/var/folders/92/1kh0g4kn2z50fnwsmf8s269wgn/T/pip-gYEj0p-build/src/mat/interface/matrix.c
[0] MatGetSubMatrix_Nest() line 451 in
/private/var/folders/92/1kh0g4kn2z50fnwsmf8s269wgn/T/pip-gYEj0p-build/src/mat/impls/nest/matnest.c
[0] MatNestFindSubMat() line 424 in
/private/var/folders/92/1kh0g4kn2z50fnwsmf8s269wgn/T/pip-gYEj0p-build/src/mat/impls/nest/matnest.c
[0] MatNestFindIS() line 363 in
/private/var/folders/92/1kh0g4kn2z50fnwsmf8s269wgn/T/pip-gYEj0p-build/src/mat/impls/nest/matnest.c
[0] Arguments are incompatible
[0] Could not find index set

I must be doing something wrong here.

Justin

On Tue, Nov 29, 2016 at 1:24 AM, Justin Chang <jychan...@gmail.com> wrote:

> Lawrence,
>
> I added the following line:
>
> rs_solver.setDM(a._dm)
>
> And it gives this error:
>
> Traceback (most recent call last):
>   File "sphere.py", line 128, in 
> rs_solver.setDM(a._dm)
> AttributeError: 'Form' object has no attribute '_dm'
>
> Thanks,
> Justin
>
> On Tue, Nov 29, 2016 at 1:12 AM, Lawrence Mitchell <
> lawrence.mitch...@imperial.ac.uk> wrote:
>
>>
>>
>> > On 29 Nov 2016, at 02:02, Justin Chang <jychan...@gmail.com> wrote:
>> >
>> > Why is this happening? It seems to me VINEWTONRSLS is not understanding
>> that I have a two-field formulation.
>>
>> For the built in solvers, we set up a dmshell on the snes that knows how
>> to do field splitting.  Because you set your snes up by hand, it isn't
>> available. You'll need to call snes.setDM. The mixed function space provides
>> the correct dm via a ._dm property I think.
>>
>> Lawrence
>>
>
>


Re: [petsc-users] pcfieldsplitting not working for VI

2016-11-28 Thread Justin Chang
Lawrence,

I added the following line:

rs_solver.setDM(a._dm)

And it gives this error:

Traceback (most recent call last):
  File "sphere.py", line 128, in 
rs_solver.setDM(a._dm)
AttributeError: 'Form' object has no attribute '_dm'

Thanks,
Justin

On Tue, Nov 29, 2016 at 1:12 AM, Lawrence Mitchell <
lawrence.mitch...@imperial.ac.uk> wrote:

>
>
> > On 29 Nov 2016, at 02:02, Justin Chang <jychan...@gmail.com> wrote:
> >
> > Why is this happening? It seems to me VINEWTONRSLS is not understanding
> that I have a two-field formulation.
>
> For the built in solvers, we set up a dmshell on the snes that knows how
> to do field splitting.  Because you set your snes up by hand, it isn't
> available. You'll need to call snes.setDM. The mixed function space provides
> the correct dm via a ._dm property I think.
>
> Lawrence
>


Re: [petsc-users] question

2016-10-24 Thread Justin Chang
Sorry forgot to hit reply all

On Monday, October 24, 2016, Justin Chang <jychan...@gmail.com> wrote:

> It depends on your SNES solver. A SNES iteration could involve more than
> one function evaluation (e.g., line searching). Also, -snes_monitor may say
> 3 iterations whereas -snes_view might indicate 4 function evaluations which
> could suggest that the first call was for computing the initial residual.
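
For reference, these counters can also be queried directly in code instead of
being read off -snes_view; a minimal sketch (variable names are illustrative):

  PetscInt newton_its, fevals, lin_its;
  ierr = SNESGetIterationNumber(snes, &newton_its);CHKERRQ(ierr);    /* Newton steps */
  ierr = SNESGetNumberFunctionEvals(snes, &fevals);CHKERRQ(ierr);    /* residual/function evaluations */
  ierr = SNESGetLinearSolveIterations(snes, &lin_its);CHKERRQ(ierr); /* total KSP iterations over all steps */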
>
> On Mon, Oct 24, 2016 at 2:22 PM, Gideon Simpson <gideon.simp...@gmail.com
> <javascript:_e(%7B%7D,'cvml','gideon.simp...@gmail.com');>> wrote:
>
>> I notice that if I use -snes_view,
>>
>> I see lines like:
>>   total number of linear solver iterations=20
>>   total number of function evaluations=5
>> Just to clarify, the number of "function evaluations" corresponds to the
>> number of Newton (or Newton like) steps, and the total "number of linear
>> solver iterations" is the total number of iterations needed to solve the
>> linear problem at each Newton iteration.  Is that correct?  So in the
>> above, there are 5 steps of Newton and a total of 20 iterations of the
>> linear solver across all 5 Newton steps.
>>
>> -gideon
>>
>>
>


Re: [petsc-users] Looking for a quick example of a symmetric KKT system

2016-10-21 Thread Justin Chang
Something like this?

http://www.mcs.anl.gov/petsc/petsc-current/src/tao/constrained/examples/tutorials/toy.c.html

On Friday, October 21, 2016, Patrick Sanan  wrote:

> Are there any examples already in PETSc or TAO that assemble such a
> system (which could thus be dumped)? SNES example ex73f90t assembles a
> non-symmetric KKT system.
>


Re: [petsc-users] DG within DMPlex

2016-10-03 Thread Justin Chang
Advection-diffusion equations. Perhaps SNES ex12 could be modified to
include an advection term?

On Mon, Oct 3, 2016 at 9:52 PM, Barry Smith <bsm...@mcs.anl.gov> wrote:

>
> > On Oct 3, 2016, at 9:45 PM, Justin Chang <jychan...@gmail.com> wrote:
> >
> > I am just suggesting the Poisson problem as an example since that is one of
> the simpler PDEs out there and already exists.
>
>   Sometimes an example for the wrong approach is worse than no example.
> Can you suggest a simple example where Discontinuous Galerkin makes good
> sense instead of when it may not make sense?
>
>Barry
>
> >
> > On Mon, Oct 3, 2016 at 9:28 PM, Matthew Knepley <knep...@gmail.com>
> wrote:
> > On Mon, Oct 3, 2016 at 6:36 PM, Justin Chang <jychan...@gmail.com>
> wrote:
> > Hi all,
> >
> > Is there, or will there be, support for implementing Discontinuous
> Galerkin formulations within the DMPlex framework? I think it would be nice
> to have something such as the SIPG formulation for the Poisson problem in
> SNES ex12.c
> >
> > We will have a trial DG in PETSc shortly. However, I don't think DG
> methods make much sense for elliptic
> > problems. Why would I use it there?
> >
> >   Thanks,
> >
> > Matt
> >
> > Thanks,
> > Justin
> > --
> > What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which their
> experiments lead.
> > -- Norbert Wiener
> >
>
>


Re: [petsc-users] DG within DMPlex

2016-10-03 Thread Justin Chang
I am just suggesting the Poisson problem as an example since that is one of
the simpler PDEs out there and already exists.

On Mon, Oct 3, 2016 at 9:28 PM, Matthew Knepley <knep...@gmail.com> wrote:

> On Mon, Oct 3, 2016 at 6:36 PM, Justin Chang <jychan...@gmail.com> wrote:
>
>> Hi all,
>>
>> Is there, or will there be, support for implementing Discontinuous
>> Galerkin formulations within the DMPlex framework? I think it would be nice
>> to have something such as the SIPG formulation for the Poisson problem in
>> SNES ex12.c
>>
>
> We will have a trial DG in PETSc shortly. However, I don't think DG
> methods make much sense for elliptic
> problems. Why would I use it there?
>
>   Thanks,
>
> Matt
>
>
>> Thanks,
>> Justin
>>
> --
> What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which their
> experiments lead.
> -- Norbert Wiener
>


[petsc-users] DG within DMPlex

2016-10-03 Thread Justin Chang
Hi all,

Is there, or will there be, support for implementing Discontinuous Galerkin
formulations within the DMPlex framework? I think it would be nice to have
something such as the SIPG formulation for the Poisson problem in SNES
ex12.c

Thanks,
Justin


Re: [petsc-users] PETSc parallel scalability

2016-09-07 Thread Justin Chang
Lots of topics to discuss here...

- 315,342 unknowns is a very small problem. The PETSc gurus recommend at
minimum 10,000 unknowns per process for the computation time to outweigh
communication time (although 20,000 unknowns or more is preferable). So when
using 32 MPI processes or more, you're going to have ~10k unknowns or fewer
per process, which is one reason why you're going to see less speedup.

- Another reason you get poor parallel scalability is that PETSc is limited
by memory bandwidth, meaning you have to use the optimal number of cores per
compute node (or whatever it is you're running on). The PETSc gurus discuss
this issue in depth. So not only do you need proper MPI process bindings, but
you will likely not want to saturate all available cores on a single node
(the STREAMS benchmark can tell you this). In other words, 16 cores spread
across 2 nodes is going to outperform 16 cores on 1 node.

- If operations like MatMult are not scaling, this is likely due to the
memory bandwidth limitations. If operations like VecDot or VecNorm are not
scaling, this is likely due to the network latency between compute nodes.

- What kind of problem are you solving? CG/BJacobi is a mediocre
solver/preconditioner combination, and solver iterations will increase with
the number of MPI processes if your tolerances are too lax. You can try CG
with one of the multigrid preconditioners like GAMG if you have something
nice like the Poisson equation (a minimal options sketch follows at the end
of this message).

- The best way to improve parallel performance is to make your code really
inefficient and crappy.

- And most importantly, always send -log_view if you want people to help
identify performance related issues with your application :)

Justin
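
As mentioned above, a minimal sketch of switching to CG with an algebraic
multigrid preconditioner (these are just the standard options, nothing
problem-specific):

  command line:  -ksp_type cg -pc_type gamg -ksp_converged_reason -log_view

  or in code:
  PC pc;
  ierr = KSPSetType(ksp, KSPCG);CHKERRQ(ierr);
  ierr = KSPGetPC(ksp, &pc);CHKERRQ(ierr);
  ierr = PCSetType(pc, PCGAMG);CHKERRQ(ierr);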

On Wed, Sep 7, 2016 at 8:37 PM, Jinlei Shen  wrote:

> Hi,
>
> I am trying to test the parallel scalablity of iterative solver (CG with
> BJacobi preconditioner) in PETSc.
>
> Since the iteration number increases with more processors, I calculated
> the single iteration time by dividing the total KSPSolve time by number of
> iteration in this test.
>
> The linear system I'm solving has 315342 unknowns. Only KSPSolve cost is
> analyzed.
>
> The results show that the parallelism works well with small number of
> processes (less than 32 in my case), and is almost perfect parallel within
> first 10 processors.
>
> However, the effect of parallelization degrades if I use more processors.
> The weird thing is that with more than 100 processors, the single iteration
> cost is slightly increasing.
>
> To investigate this issue, I then looked into the composition of KSPSolve
> time.
> It seems KSPSolve consists of MatMult, 
> VecTDot(min),VecNorm(min),VecAXPY(max),VecAXPX(max),ApplyPC.
> Please correct me if I'm wrong.
>
> And I found for small number of processors, all these components scale
> well.
> However, using more processors(roughly larger than 40), MatMult,
> VecTDot(min), VecNorm(min) behave worse, and even increase after 100
> processors, while the other three parts parallel well even for 1000
> processors.
> Since MatMult makes up the major cost of a single iteration, the total single
> iteration cost increases as well (see the figure below).
>
> My question:
> 1. Is such situation reasonable? Could anyone explain why MatMult scales
> poor after certain number of processors? I heard some about different
> algorithms for matrix multiplication. Is that the bottleneck?
>
> 2. Is the parallelism dependent of matrix size? If I use larger system
> size,e.g. million , can the solver scale better?
>
> 3. Do you have any idea to improve the parallel performance?
>
> Thank you very much.
>
> JInlei
>
> [image: Inline image 1]
>
>
>


[petsc-users] Coloring of -mat_view draw

2016-09-05 Thread Justin Chang
Hi all,

So i used the following command-line options to view the non-zero structure
of my assembled matrix:

-mat_view draw -draw_pause -1

And I got an image filled with cyan squares and dots. However, if I
right-click on the image a couple times, I now get cyan, blue, and red
squares. What do the different colors mean? Attached are the images of the
two cases:

Thanks,
Justin


Re: [petsc-users] strong-scaling vs weak-scaling

2016-08-31 Thread Justin Chang
Matt,

So is the "solve phase" going to be KSPSolve() - PCSetUp()?

In other words, if I want to look at time/iterations, should it just be
over KSPSolve or should I exclude the PC setup?

Justin


On Wed, Aug 31, 2016 at 5:13 AM, Matthew Knepley <knep...@gmail.com> wrote:

> On Wed, Aug 31, 2016 at 2:01 AM, Justin Chang <jychan...@gmail.com> wrote:
>
>> Attached is the -log_view output (from firedrake). Event Stage 1:
>> Linear_solver is where I assemble and solve the linear system of equations.
>>
>> I am using the HYPRE BoomerAMG preconditioner so log_view cannot "see
>> into" the exact steps, but based on what it can see, how do I distinguish
>> between these various setup and timing phases?
>>
>> For example, when I look at these lines:
>>
>> PCSetUp1 1.0 2.2858e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
>> 0.0e+00  9  0  0  0  0  11  0  0  0  0 0
>> PCApply   38 1.0 1.4102e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
>> 0.0e+00 56  0  0  0  0  66  0  0  0  0 0
>> KSPSetUp   1 1.0 9.9111e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
>> 0.0e+00  0  0  0  0  0   0  0  0  0  0 0
>> KSPSolve   1 1.0 1.7529e+01 1.0 2.44e+09 1.0 0.0e+00 0.0e+00
>> 0.0e+00 70  7  0  0  0  82  7  0  0  0   139
>> SNESSolve  1 1.0 2.1056e+01 1.0 3.75e+10 1.0 0.0e+00 0.0e+00
>> 0.0e+00 84100  0  0  0  99100  0  0  0  1781
>> SNESFunctionEval   1 1.0 1.0763e+00 1.0 1.07e+10 1.0 0.0e+00 0.0e+00
>> 0.0e+00  4 29  0  0  0   5 29  0  0  0  9954
>> SNESJacobianEval   1 1.0 2.4495e+00 1.0 2.43e+10 1.0 0.0e+00 0.0e+00
>> 0.0e+00 10 65  0  0  0  12 65  0  0  0  9937
>>
>> So how do I break down "mesh setup", "matrix setup", and "solve time"
>> phases? I am guessing "PCSetUp" has to do with one of the first two phases,
>> but how would I categorize the rest of the events? I see that HYPRE doesn't
>> have as much information as the other PCs like GAMG and ML but can one
>> still breakdown the timing phases through log_view alone?
>>
>
> 1) It looks like you call PCSetUp() yourself, since otherwise KSPSetUp()
> would contain that time. Notice that you can ignore KSPSetUp() here.
>
> 2) The setup time is usually KSPSetUp(), but if here you add to it
> PCSetUp() since you called it.
>
> 3) The solve time for SNES can be split into
>
>   a) KSPSolve() for the update calculation
>
>   b) SNESFunctionEval, SNESJacobianEval for everything else (conv check,
> line search, J calc, etc.) or you can just take SNESSolve() - KSPSolve()
>
>  4) Note that PCApply() is most of KSPSolve(), which is generally good
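
A sketch of how these phases can be separated explicitly in the application so
that -log_view reports each one in its own stage (the stage names and the
explicit KSPSetUp() call are illustrative):

  PetscLogStage setup_stage, solve_stage;
  ierr = PetscLogStageRegister("Matrix/PC setup", &setup_stage);CHKERRQ(ierr);
  ierr = PetscLogStageRegister("Solve", &solve_stage);CHKERRQ(ierr);

  ierr = PetscLogStagePush(setup_stage);CHKERRQ(ierr);
  ierr = KSPSetOperators(ksp, A, A);CHKERRQ(ierr);
  ierr = KSPSetUp(ksp);CHKERRQ(ierr);             /* triggers PCSetUp() */
  ierr = PetscLogStagePop();CHKERRQ(ierr);

  ierr = PetscLogStagePush(solve_stage);CHKERRQ(ierr);
  ierr = KSPSolve(ksp, b, x);CHKERRQ(ierr);
  ierr = PetscLogStagePop();CHKERRQ(ierr);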
>
>   Thanks,
>
>  Matt
>
>
>> Thanks,
>> Justin
>>
>> On Tue, Aug 30, 2016 at 11:14 PM, Jed Brown <j...@jedbrown.org> wrote:
>>
>>> Mark Adams <mfad...@lbl.gov> writes:
>>>
>>> >>
>>> >>
>>> >> Anyway, what I really wanted to say is, it's good to know that these
>>> >> "dynamic range/performance spectrum/static scaling" plots are
>>> designed to
>>> >> go past the sweet spots. I also agree that it would be interesting to
>>> see a
>>> >> time vs dofs*iterations/time plot. Would it then also be useful to
>>> look at
>>> >> the step to setting up the preconditioner?
>>> >>
>>> >>
>>> > Yes, I generally split up timing between "mesh setup" (symbolic
>>> > factorization of LU), "matrix setup" (eg, factorizations), and solve
>>> time.
>>> > The degree of amortization that you get for the two setup phases
>>> depends on
>>> > your problem and so it is useful to separate them.
>>>
>>> Right, there is nothing wrong with splitting up the phases, but if you
>>> never show a spectrum for the total, then I will be suspicious.  And if
>>> you only show "per iteration" instead of for a complete solve, then I
>>> will assume that you're only doing that because convergence is unusably
>>> slow.
>>>
>>
>>
>
>
> --
> What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which their
> experiments lead.
> -- Norbert Wiener
>


Re: [petsc-users] strong-scaling vs weak-scaling

2016-08-31 Thread Justin Chang
Attached is the -log_view output (from firedrake). Event Stage 1:
Linear_solver is where I assemble and solve the linear system of equations.

I am using the HYPRE BoomerAMG preconditioner so log_view cannot "see into"
the exact steps, but based on what it can see, how do I distinguish between
these various setup and timing phases?

For example, when I look at these lines:

PCSetUp1 1.0 2.2858e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00  9  0  0  0  0  11  0  0  0  0 0
PCApply   38 1.0 1.4102e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 56  0  0  0  0  66  0  0  0  0 0
KSPSetUp   1 1.0 9.9111e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00  0  0  0  0  0   0  0  0  0  0 0
KSPSolve   1 1.0 1.7529e+01 1.0 2.44e+09 1.0 0.0e+00 0.0e+00
0.0e+00 70  7  0  0  0  82  7  0  0  0   139
SNESSolve  1 1.0 2.1056e+01 1.0 3.75e+10 1.0 0.0e+00 0.0e+00
0.0e+00 84100  0  0  0  99100  0  0  0  1781
SNESFunctionEval   1 1.0 1.0763e+00 1.0 1.07e+10 1.0 0.0e+00 0.0e+00
0.0e+00  4 29  0  0  0   5 29  0  0  0  9954
SNESJacobianEval   1 1.0 2.4495e+00 1.0 2.43e+10 1.0 0.0e+00 0.0e+00
0.0e+00 10 65  0  0  0  12 65  0  0  0  9937

So how do I break down "mesh setup", "matrix setup", and "solve time"
phases? I am guessing "PCSetUp" has to do with one of the first two phases,
but how would I categorize the rest of the events? I see that HYPRE doesn't
have as much information as the other PCs like GAMG and ML but can one
still breakdown the timing phases through log_view alone?

Thanks,
Justin

On Tue, Aug 30, 2016 at 11:14 PM, Jed Brown  wrote:

> Mark Adams  writes:
>
> >>
> >>
> >> Anyway, what I really wanted to say is, it's good to know that these
> >> "dynamic range/performance spectrum/static scaling" plots are designed
> to
> >> go past the sweet spots. I also agree that it would be interesting to
> see a
> >> time vs dofs*iterations/time plot. Would it then also be useful to look
> at
> >> the step to setting up the preconditioner?
> >>
> >>
> > Yes, I generally split up timing between "mesh setup" (symbolic
> > factorization of LU), "matrix setup" (eg, factorizations), and solve
> time.
> > The degree of amortization that you get for the two setup phases depends
> on
> > your problem and so it is useful to separate them.
>
> Right, there is nothing wrong with splitting up the phases, but if you
> never show a spectrum for the total, then I will be suspicious.  And if
> you only show "per iteration" instead of for a complete solve, then I
> will assume that you're only doing that because convergence is unusably
> slow.
>
Residual norms for linear_ solve.
0 KSP Residual norm 5.261660052036e+02 
1 KSP Residual norm 1.356995663739e+02 
2 KSP Residual norm 4.098866223191e+01 
3 KSP Residual norm 1.600475709119e+01 
4 KSP Residual norm 6.956667251063e+00 
5 KSP Residual norm 3.861942754258e+00 
6 KSP Residual norm 2.331981130299e+00 
7 KSP Residual norm 1.404876311943e+00 
8 KSP Residual norm 8.215556397889e-01 
9 KSP Residual norm 5.226439657305e-01 
   10 KSP Residual norm 3.421520551962e-01 
   11 KSP Residual norm 2.382992002722e-01 
   12 KSP Residual norm 1.743249670147e-01 
   13 KSP Residual norm 1.277911689618e-01 
   14 KSP Residual norm 9.453802371730e-02 
   15 KSP Residual norm 7.022732618304e-02 
   16 KSP Residual norm 5.276835142527e-02 
   17 KSP Residual norm 3.966717849679e-02 
   18 KSP Residual norm 2.987708356527e-02 
   19 KSP Residual norm 2.221046390150e-02 
   20 KSP Residual norm 1.631262945106e-02 
   21 KSP Residual norm 1.188030506469e-02 
   22 KSP Residual norm 8.655984108945e-03 
   23 KSP Residual norm 6.239072936196e-03 
   24 KSP Residual norm 4.455419528387e-03 
   25 KSP Residual norm 3.235023376588e-03 
   26 KSP Residual norm 2.345588803418e-03 
   27 KSP Residual norm 1.668600898579e-03 
   28 KSP Residual norm 1.180578845647e-03 
   29 KSP Residual norm 8.327223711005e-04 
   30 KSP Residual norm 5.853054571413e-04 
   31 KSP Residual norm 4.038722556707e-04 
   32 KSP Residual norm 2.731786184181e-04 
   33 KSP Residual norm 1.853188978548e-04 
   34 KSP Residual norm 1.277834040044e-04 
   35 KSP Residual norm 8.853670330190e-05 
   36 KSP Residual norm 6.151569062192e-05 
   37 KSP Residual norm 4.247283089736e-05 
  Linear linear_ solve converged due to CONVERGED_RTOL iterations 37
Wall-clock time: 2.126e+01 seconds

[-log_view performance summary for 3D_ex1.py truncated in the archive]

Re: [petsc-users] strong-scaling vs weak-scaling

2016-08-30 Thread Justin Chang
Thanks everyone. I still think there is an even better phrase for this,
like, static scaling? Because unlike strong/weak scaling, concurrency is
fixed (hence "static") and we only scale the problem, so this is a mix
between strong and weak scaling.

¯\_(ツ)_/¯

Anyway, what I really wanted to say is, it's good to know that these
"dynamic range/performance spectrum/static scaling" plots are designed to
go past the sweet spots. I also agree that it would be interesting to see a
time vs dofs*iterations/time plot. Would it then also be useful to look at
the step to setting up the preconditioner?

Justin

On Tue, Aug 30, 2016 at 1:55 PM, Jed Brown  wrote:

> Mark Adams  writes:
> > I would guess it is the latter.
>
> In this case, definitely.
>
> > It is hard to get "rollover" to the right.  You could get it on KNL
> > (cache configuration of HBM) when you spill out of HBM.
>
> Yes, but the same occurs if you start repeatedly spilling from some
> level of cache, which can happen even if the overall data structure is
> much larger than cache.  Not all algorithms have the flexibility to
> choose tile sizes independently from problem size and specification;
> it's easy to forget that this luxury is not universal when focusing on
> dense linear algebra, for example.
>
> > Personally, if you are you going to go into this much detail (eg, more
> than
> > just one plot) I would show a plot of iteration count vs problem size,
> and
> > be done with it, and then fix the iteration count for the weak scaling
> and
> > dynamic range plot (I agree we could use a better name).
>
> Alternatively, plot the performance spectrum (dynamic range) for the
> end-to-end solve and per iteration.  The end user ultimately doesn't
> care about the cost per iteration (and it's meaningless when comparing
> to an algorithm that converges differently), so I'd prefer that the
> spectrum for the end-to-end application always be shown.
>


[petsc-users] GMRES restart guidelines

2016-08-26 Thread Justin Chang
Hi all,

When exactly would one need to modify the default restart value of 30 for
GMRES? I have seen papers like this which suggest playing around with values
between 1 and 30, but are there
cases where people need to use large restart values like 50 or 100?


Thanks,
Justin
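
For reference, the restart length is set with -ksp_gmres_restart or
KSPGMRESSetRestart(); a minimal sketch (100 is just an example value):

  command line:  -ksp_type gmres -ksp_gmres_restart 100

  in code:
  ierr = KSPSetType(ksp, KSPGMRES);CHKERRQ(ierr);
  ierr = KSPGMRESSetRestart(ksp, 100);CHKERRQ(ierr);  /* default is 30 */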


Re: [petsc-users] strong-scaling vs weak-scaling

2016-08-23 Thread Justin Chang
Redid some of those experiments for 8 and 20 cores and scaled it up to even
larger problems. Attached is the plot.

Looking at this "dynamic plot" (if you ask me, I honestly think there could
be a better word for this out there), the lines curve up for the smaller
problems, have a "flat line" in the middle, then slowly tail down as the
problem gets bigger. I am guessing these downward curves have to do with
either memory bandwidth effects or simply the solver requiring more effort
to handle larger problems (or a combination of both). I currently only have
access to a small 80 node (20 cores per node) HPC cluster so obviously I am
unable to experiment with 10k cores or more.

If our goal is to see how close flat the lines get, we can easily game the
system by scaling the problem until we find the "sweet spot(s)". In the
weak-scaling and strong-scaling studies there are perfect lines we can
compare to, but there does not seem to be such lines for this type of study
even in the seemingly flat regions. Seems these plots are useful if we
simply compare different solvers/preconditioners/etc or different HPC
platforms.

Also, the solver iteration count increases with problem size - it went from
9 KSP iterations for 1,331 dofs to 48 KSP iterations for 3,442,951 dofs.
Algorithmic time-to-solution is not linearly proportional to problem size
so the RHS of the graph is obviously going to have lower N/time rates at
some point - similar to what we observe from weak-scaling.

Also, the N/time rate seems very similar to the floating-point rate,
although I can see why it's more informative.

Any thoughts on anything I said or did thus far? Just wanting to make sure
I understand these correctly :)

On Mon, Aug 22, 2016 at 9:03 PM, Justin Chang <jychan...@gmail.com> wrote:

> Thanks all. So this issue was one of our ATPESC2015 exam questions, and
> turned some friends into foes. Most eventually fell into the strong-scale
> is harder camp, but some of these "friends" also believed PETSc is *not*
> capable of handling dense matrices and is not portable. Just wanted to hear
> some expert opinions on this :)
>
> Anyway, in one of my applications, I am comparing the performance of some
> VI solvers (i.e., with variable bounds) with that of just standard linear
> solves (i.e., no variable bounds) for 3D advection-diffusion equations in
> highly heterogeneous and anisotropic porous media. The parallel efficiency
> in the strong-sense is roughly the same but the parallel efficiency in the
> weak-sense is significantly worse for VI solvers. I suppose one inference
> that can be made is that VI solvers take longer to solve as the problem
> size grows. And yes solver iteration counts do grow so that has some to do
> with it.
>
> As for these "dynamic range" plots, I tried something like this across 1
> and 8 MPI processes with the following problem sizes for a 3D anisotropic
> diffusion problem with CG/BoomerAMG:
>
> 1,331
> 9,261
> 29,791
> 68,921
> 132,651
> 226,981
> 357,911
> 531,441
> 753,571
> 1,030,301
>
> Using a single Intel Xeon E5-2670 compute node for this. Attached is the
> plot, but instead of flat or incline lines, i get concave down curves. If
> my problem size gets too big, the N/time rate decreases, whereas for very
> small problems it increases. I am guessing bandwidth limitations have
> something to do with the decrease in performance. In that HPGMG
> presentation you attached the other day, it seems the rate should decrease
> as problem size decreases. Perhaps this study should be done with more MPI
> processes?
>
>
> On Mon, Aug 22, 2016 at 4:14 PM, Karl Rupp <r...@iue.tuwien.ac.at> wrote:
>
>> Hi Justin,
>>
>>
>> I have seen some people claim that strong-scaling is harder to achieve
>>> than weak scaling
>>> (e.g., https://www.sharcnet.ca/help/index.php/Measuring_Parallel_Sc
>>> aling_Performance)
>>> and generally speaking it makes sense - communication overhead increases
>>> with concurrency.
>>>
>>> However, we know that most PETSc solvers/applications are not only
>>> memory-bandwidth bound, but may not scale as well w.r.t. problem size as
>>> other solvers (e.g., ILU(0) may beat out GAMG for small elliptic
>>> problems but GAMG will eventually beat out ILU(0) for larger problems),
>>> so wouldn't weak-scaling not only be the more interesting but more
>>> difficult performance metric to achieve? Strong-scaling issues arise
>>> mostly from communication overhead but weak-scaling issues may come from
>>> that and also solver/algorithmic scalability w.r.t problem size (e.g.,
>>> problem size N takes 10*T seconds to compute but problem size 2*N takes
>>>

Re: [petsc-users] strong-scaling vs weak-scaling

2016-08-22 Thread Justin Chang
Thanks all. So this issue was one of our ATPESC2015 exam questions, and
turned some friends into foes. Most eventually fell into the strong-scale
is harder camp, but some of these "friends" also believed PETSc is *not*
capable of handling dense matrices and is not portable. Just wanted to hear
some expert opinions on this :)

Anyway, in one of my applications, I am comparing the performance of some
VI solvers (i.e., with variable bounds) with that of just standard linear
solves (i.e., no variable bounds) for 3D advection-diffusion equations in
highly heterogeneous and anisotropic porous media. The parallel efficiency
in the strong-sense is roughly the same but the parallel efficiency in the
weak-sense is significantly worse for VI solvers. I suppose one inference
that can be made is that VI solvers take longer to solve as the problem
size grows. And yes solver iteration counts do grow so that has some to do
with it.

As for these "dynamic range" plots, I tried something like this across 1
and 8 MPI processes with the following problem sizes for a 3D anisotropic
diffusion problem with CG/BoomerAMG:

1,331
9,261
29,791
68,921
132,651
226,981
357,911
531,441
753,571
1,030,301

Using a single Intel Xeon E5-2670 compute node for this. Attached is the
plot, but instead of flat or incline lines, i get concave down curves. If
my problem size gets too big, the N/time rate decreases, whereas for very
small problems it increases. I am guessing bandwidth limitations have
something to do with the decrease in performance. In that HPGMG
presentation you attached the other day, it seems the rate should decrease
as problem size decreases. Perhaps this study should be done with more MPI
processes?


On Mon, Aug 22, 2016 at 4:14 PM, Karl Rupp  wrote:

> Hi Justin,
>
>
> I have seen some people claim that strong-scaling is harder to achieve
>> than weak scaling
>> (e.g., https://www.sharcnet.ca/help/index.php/Measuring_Parallel_Sc
>> aling_Performance)
>> and generally speaking it makes sense - communication overhead increases
>> with concurrency.
>>
>> However, we know that most PETSc solvers/applications are not only
>> memory-bandwidth bound, but may not scale as well w.r.t. problem size as
>> other solvers (e.g., ILU(0) may beat out GAMG for small elliptic
>> problems but GAMG will eventually beat out ILU(0) for larger problems),
>> so wouldn't weak-scaling not only be the more interesting but more
>> difficult performance metric to achieve? Strong-scaling issues arise
>> mostly from communication overhead but weak-scaling issues may come from
>> that and also solver/algorithmic scalability w.r.t problem size (e.g.,
>> problem size N takes 10*T seconds to compute but problem size 2*N takes
>> 50*T seconds to compute).
>>
>> In other words, if one were to propose or design a new algorithm/solver
>> capable of handling large-scale problems, would it be equally if not
>> more important to show the weak-scaling potential? Because if you really
>> think about it, a "truly efficient" algorithm will be less likely to
>> scale in the strong sense but computation time will be close to linearly
>> proportional to problem size (hence better scaling in the weak-sense).
>> It seems if I am trying to convince someone that a proposed
>> computational framework is "high performing" without getting too deep
>> into performance modeling, a poor parallel efficiency (arising due to
>> good sequential efficiency) in the strong sense may not look promising.
>>
>
> These are all valid thoughts. Let me add another perspective: If you are
> only interested in the machine aspects of scaling, you could run for a
> fixed number of solver iterations. That allows you to focus on the actual
> computational work done and your results will exclusively reflect the
> machine's performance. Thus, even though fixing solver iterations and thus
> not running solvers to convergence is a bad shortcut from the solver point
> of view, it can be a handy way of eliminating algorithmic fluctuations.
> (Clearly, this simplistic approach has not only been used, but also
> abused...)
>
> Best regards,
> Karli
>
>


example_dynamic_range.eps
Description: PostScript document


[petsc-users] strong-scaling vs weak-scaling

2016-08-21 Thread Justin Chang
Hi all,

This may or may not be a PETSc specific question but...

I have seen some people claim that strong-scaling is harder to achieve than
weak scaling (e.g.,
https://www.sharcnet.ca/help/index.php/Measuring_Parallel_Scaling_Performance)
and generally speaking it makes sense - communication overhead increases
with concurrency.

However, we know that most PETSc solvers/applications are not only
memory-bandwidth bound, but may not scale as well w.r.t. problem size as
other solvers (e.g., ILU(0) may beat out GAMG for small elliptic problems
but GAMG will eventually beat out ILU(0) for larger problems), so wouldn't
weak-scaling not only be the more interesting but more difficult
performance metric to achieve? Strong-scaling issues arise mostly from
communication overhead but weak-scaling issues may come from that and also
solver/algorithmic scalability w.r.t problem size (e.g., problem size N
takes 10*T seconds to compute but problem size 2*N takes 50*T seconds to
compute).

In other words, if one were to propose or design a new algorithm/solver
capable of handling large-scale problems, would it be equally if not more
important to show the weak-scaling potential? Because if you really think
about it, a "truly efficient" algorithm will be less likely to scale in the
strong sense but computation time will be close to linearly proportional to
problem size (hence better scaling in the weak-sense). It seems if I am
trying to convince someone that a proposed computational framework is "high
performing" without getting too deep into performance modeling, a poor
parallel efficiency (arising due to good sequential efficiency) in the
strong sense may not look promising.

Thanks,
Justin


Re: [petsc-users] What exactly goes into DMPlexSetRefinementLimit

2016-08-17 Thread Justin Chang
Ah I didn't realize that in DMPlexCreateBoxMesh(...) one of the fields was
to define number of faces in each spatial direction. Right now it is set to
1 for 3D, so I guess I will just have to change this.

Thanks!

On Wed, Aug 17, 2016 at 2:46 PM, Matthew Knepley <knep...@gmail.com> wrote:

> On Wed, Aug 17, 2016 at 2:35 PM, Justin Chang <jychan...@gmail.com> wrote:
>
>> Because the base mesh starts off with 6 elements and 8 vertices. This is
>> enough data for one cell per MPI process. Refinement is done after
>> DMPlexDistribute(...). If I use any more than 6 cores, some of the MPI ranks
>> will have an empty DM Object. For example:
>>
>
> You could
>
> 1) Just stick an explicit call to uniform refinement in for the serial mesh
>
> 2) Use a cell volume that is 1/N, where N is the number of cells you want
>
>Matt
>
>
>> $ mpirun -n 8 --bind-to-core --bysocket ./ex12 -dim 3 -run_type full
>> -interpolate 1 -petscspace_order 1 -bc_type dirichlet -ksp_rtol 1.0e-7
>> -pc_type ml -refinement_limit 1 -dm_refine 1 -dm_view
>>
>> DM Object: Parallel Mesh 8 MPI processes
>>
>>   type: plex
>>
>> Parallel Mesh in 3 dimensions:
>>
>>   0-cells: 10 0 10 10 10 0 10 10
>>
>>   1-cells: 25 0 25 25 25 0 25 25
>>
>>   2-cells: 24 0 24 24 24 0 24 24
>>
>>   3-cells: 8 0 8 8 8 0 8 8
>>
>> Labels:
>>
>>   marker: 1 strata of sizes (18)
>>
>>   depth: 4 strata of sizes (10, 25, 24, 8)
>>
>> Number of SNES iterations = 1
>>
>> L_2 Error: 0.118178
>>
>> On Wed, Aug 17, 2016 at 2:28 PM, Matthew Knepley <knep...@gmail.com>
>> wrote:
>>
>>> On Wed, Aug 17, 2016 at 2:04 PM, Justin Chang <jychan...@gmail.com>
>>> wrote:
>>>
>>>> When I enter values  like 1/16, 1/12, 1/24, and so on, I was expecting
>>>> to get roughly the same dm object as if I simply did -dm_refine <0/1/2/3>.
>>>> Instead it seems I get highly unstructured grids, and the smaller the
>>>> number gets, the fewer additional cells I get. Is there a way to make the
>>>> refinement limit uniform?
>>>>
>>>
>>> Why not just use -dm_refine ?
>>>
>>>Matt
>>>
>>>
>>>> On Wed, Aug 17, 2016 at 9:38 AM, Matthew Knepley <knep...@gmail.com>
>>>> wrote:
>>>>
>>>>> On Wed, Aug 17, 2016 at 5:23 AM, Justin Chang <jychan...@gmail.com>
>>>>> wrote:
>>>>>
>>>>>> Hi all,
>>>>>>
>>>>>> Playing around with SNES ex12.c and I am attempting to tinker around
>>>>>> with 3D options. I am trying to understand what kind of values go into
>>>>>> -refinement_limit for 3D simplices.
>>>>>>
>>>>>
>>>>> The cell volume limits for any cells created out of the existing
>>>>> cells. This is how TetGen understands refinement. I think a
>>>>> better way is to use a metric tensor field, and we now have an
>>>>> interface to pragmatic for this (I think currently I only hooked it up
>>>>> to DMCoarsen() but it does both). Clearly the interface is immature,
>>>>> but this is the way we are headed.
>>>>>
>>>>>   Matt
>>>>>
>>>>>
>>>>>> Thanks,
>>>>>> Justin
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> What most experimenters take for granted before they begin their
>>>>> experiments is infinitely more interesting than any results to which their
>>>>> experiments lead.
>>>>> -- Norbert Wiener
>>>>>
>>>>
>>>>
>>>
>>>
>>> --
>>> What most experimenters take for granted before they begin their
>>> experiments is infinitely more interesting than any results to which their
>>> experiments lead.
>>> -- Norbert Wiener
>>>
>>
>>
>
>
> --
> What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which their
> experiments lead.
> -- Norbert Wiener
>


Re: [petsc-users] What exactly goes into DMPlexSetRefinementLimit

2016-08-17 Thread Justin Chang
Because the base mesh starts off with 6 elements and 8 vertices. This is
enough data for one cell per MPI process. Refinement is done after
DMPlexDistribute(...). If I use any more than 6 cores, some of the MPI ranks
will have an empty DM Object. For example:

$ mpirun -n 8 --bind-to-core --bysocket ./ex12 -dim 3 -run_type full
-interpolate 1 -petscspace_order 1 -bc_type dirichlet -ksp_rtol 1.0e-7
-pc_type ml -refinement_limit 1 -dm_refine 1 -dm_view

DM Object: Parallel Mesh 8 MPI processes

  type: plex

Parallel Mesh in 3 dimensions:

  0-cells: 10 0 10 10 10 0 10 10

  1-cells: 25 0 25 25 25 0 25 25

  2-cells: 24 0 24 24 24 0 24 24

  3-cells: 8 0 8 8 8 0 8 8

Labels:

  marker: 1 strata of sizes (18)

  depth: 4 strata of sizes (10, 25, 24, 8)

Number of SNES iterations = 1

L_2 Error: 0.118178

On Wed, Aug 17, 2016 at 2:28 PM, Matthew Knepley <knep...@gmail.com> wrote:

> On Wed, Aug 17, 2016 at 2:04 PM, Justin Chang <jychan...@gmail.com> wrote:
>
>> When I enter values  like 1/16, 1/12, 1/24, and so on, I was expecting to
>> get roughly the same dm object as if I simply did -dm_refine <0/1/2/3>.
>> Instead it seems I get highly unstructured grids, and the smaller the
>> number gets, the fewer additional cells I get. Is there a way to make the
>> refinement limit uniform?
>>
>
> Why not just use -dm_refine ?
>
>Matt
>
>
>> On Wed, Aug 17, 2016 at 9:38 AM, Matthew Knepley <knep...@gmail.com>
>> wrote:
>>
>>> On Wed, Aug 17, 2016 at 5:23 AM, Justin Chang <jychan...@gmail.com>
>>> wrote:
>>>
>>>> Hi all,
>>>>
>>>> Playing around with SNES ex12.c and I am attempting to tinker around
>>>> with 3D options. I am trying to understand what kind of values go into
>>>> -refinement_limit for 3D simplices.
>>>>
>>>
>>> The cell volume limits for any cells created out of the existing cells.
>>> This is how TetGen understands refinement. I think a
>>> better way is to use a metric tensor field, and we now have an interface
>>> to pragmatic for this (I think currently I only hooked it up
>>> to DMCoarsen() but it does both). Clearly the interface is immature, but
>>> this is the way we are headed.
>>>
>>>   Matt
>>>
>>>
>>>> Thanks,
>>>> Justin
>>>>
>>>
>>>
>>>
>>> --
>>> What most experimenters take for granted before they begin their
>>> experiments is infinitely more interesting than any results to which their
>>> experiments lead.
>>> -- Norbert Wiener
>>>
>>
>>
>
>
> --
> What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which their
> experiments lead.
> -- Norbert Wiener
>


Re: [petsc-users] What exactly goes into DMPlexSetRefinementLimit

2016-08-17 Thread Justin Chang
When I enter values  like 1/16, 1/12, 1/24, and so on, I was expecting to
get roughly the same dm object as if I simply did -dm_refine <0/1/2/3>.
Instead it seems I get highly unstructured grids, and the smaller the
number gets, the fewer additional cells I get. Is there a way to make the
refinement limit uniform?

On Wed, Aug 17, 2016 at 9:38 AM, Matthew Knepley <knep...@gmail.com> wrote:

> On Wed, Aug 17, 2016 at 5:23 AM, Justin Chang <jychan...@gmail.com> wrote:
>
>> Hi all,
>>
>> Playing around with SNES ex12.c and I am attempting to tinker around with
>> 3D options. I am trying to understand what kind of values go into
>> -refinement_limit for 3D simplices.
>>
>
> The cell volume limits for any cells created out of the existing cells.
> This is how TetGen understands refinement. I think a
> better way is to use a metric tensor field, and we now have an interface
> to pragmatic for this (I think currently I only hooked it up
> to DMCoarsen() but it does both). Clearly the interface is immature, but
> this is the way we are headed.
>
>   Matt
>
>
>> Thanks,
>> Justin
>>
>
>
>
> --
> What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which their
> experiments lead.
> -- Norbert Wiener
>


[petsc-users] What exactly goes into DMPlexSetRefinementLimit

2016-08-17 Thread Justin Chang
Hi all,

Playing around with SNES ex12.c and I am attempting to tinker around with
3D options. I am trying to understand what kind of values go into
-refinement_limit for 3D simplices.

Thanks,
Justin


Re: [petsc-users] KSPSetOperators(ksp, A, A, SAME_NONZERO_PATTERN) does not work any more

2016-08-07 Thread Justin Chang
The MatStructure argument was removed from KSPSetOperators() in PETSc 3.5 and
later (PETSc now tracks changes to the operators internally), so you just
need "KSPSetOperators(ksp,A,A)"; a minimal sketch is below.



On Sun, Aug 7, 2016 at 2:59 AM, 丁老师  wrote:

> Dear friends:
>  I just updated to 3.6.3, but the command "
> KSPSetOperators(ksp,A,A,SAME_NONZERO_PATTERN)" does not work any more.
> Could you please give me some suggestions?
> Regards
>
>
>
>
>
>
>
>
>
>
>
>


[petsc-users] Reference for multi-grid tuning

2016-08-03 Thread Justin Chang
Hi all,

I want to play around with algebraic multi-grid parameters to optimize
performance for my 3D problems but I don't have a good idea of where to
start. Don't really see a detailed reference or tutorial for tuning these
parameters in the PETSc manual.

Can someone point me to a good paper/reference/book/online resource/etc
that can give good guidelines on how to tune parameters for implementations
similar to what's currently in GAMG/ML for instance.

Thanks,
Justin


Re: [petsc-users] Transient poisson example in petsc

2016-07-06 Thread Justin Chang
Julian,

I hand-wrote my own time-stepping scheme (backward Euler) for SNES ex12.c
because I had to apply TAO's convex optimization solvers at every time
level (a rough sketch of such a loop is below). I am sure Matt or one of the
other PETSc developers can tell you how to make today's SNES ex12.c transient.

Thanks,
Justin
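
For the archive, a rough sketch of what such a hand-written loop can look like
(this is not the actual modified ex12.c; the AppCtx pointer "user", its u_old,
t, and dt members, and the residual convention are assumptions, and the
user-supplied residual must implement F(u) = (u - u_old)/dt + spatial terms):

  for (step = 0; step < nsteps; ++step) {
    ierr = VecCopy(u, user->u_old);CHKERRQ(ierr); /* u_old is read inside the residual/Jacobian callbacks */
    ierr = SNESSolve(snes, NULL, u);CHKERRQ(ierr);
    user->t += user->dt;                          /* advance time for time-dependent forcing/BCs */
  }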

On Wednesday, July 6, 2016, Julian Andrej  wrote:

> Hi,
>
> i've seen your question on the mailing list regarding a transient
> poisson example using PetscFE and TS. Do you have any working
> solution? I'm still confused about what to change from snes ex12 for
> example to make it transient.
>
> I hope you could help me out there :)
>
> regards
> Julian
>


Re: [petsc-users] View wall-clock time of a PETSc function via command-line

2016-07-05 Thread Justin Chang
Okay, thanks!

On Tue, Jul 5, 2016 at 11:50 AM, Barry Smith <bsm...@mcs.anl.gov> wrote:

>
>   ./ex1 -log_view | grep KSPSolve
>
>   or
>
>./ex1 -log_view | egrep "(KSPSolve|SNESSolve)"
>
>
> > On Jul 5, 2016, at 11:46 AM, Justin Chang <jychan...@gmail.com> wrote:
> >
> > Hi all,
> >
> > Is there a quick way (e.g., through command-line options) to output the
> wall-clock time of a PETSc function (e.g., SNESSolve(), KSPSolve(), etc)
> without outputting the entire -log_view?
> >
> > Thanks,
> > Justin
>
>
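
If one wants the number from inside the code instead of grepping, a minimal
sketch using PetscTime() around the call of interest (this only measures the
caller's wall-clock time, not the full -log_view event data):

  PetscLogDouble t0, t1;
  ierr = PetscTime(&t0);CHKERRQ(ierr);
  ierr = KSPSolve(ksp, b, x);CHKERRQ(ierr);
  ierr = PetscTime(&t1);CHKERRQ(ierr);
  ierr = PetscPrintf(PETSC_COMM_WORLD, "KSPSolve time: %g s\n", (double)(t1 - t0));CHKERRQ(ierr);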


[petsc-users] View wall-clock time of a PETSc function via command-line

2016-07-05 Thread Justin Chang
Hi all,

Is there a quick way (e.g., through command-line options) to output the
wall-clock time of a PETSc function (e.g., SNESSolve(), KSPSolve(), etc)
without outputting the entire -log_view?

Thanks,
Justin


Re: [petsc-users] Dose Petsc has DMPlex example

2016-07-03 Thread Justin Chang
Hoang, if you run this example taken from config/builder.py

./ex62 -run_type full -refinement_limit 0.00625 -bc_type dirichlet
-interpolate 1 -vel_petscspace_order 2 -pres_petscspace_order 1 -ksp_type
fgmres -ksp_gmres_restart 100 -ksp_rtol 1.0e-9 -pc_type fieldsplit
-pc_fieldsplit_type schur -pc_fieldsplit_schur_factorization_type full
-fieldsplit_pressure_ksp_rtol 1e-10 -fieldsplit_velocity_ksp_type gmres
-fieldsplit_velocity_pc_type lu -fieldsplit_pressure_pc_type jacobi
-snes_monitor_short -ksp_monitor_short -snes_converged_reason
-ksp_converged_reason -snes_view -show_solution 0


it should work

On Sun, Jul 3, 2016 at 9:06 AM, Hoang Giang Bui  wrote:

> Hi Matt
>
> I tried to run ex62 with 1 proc (petsc 3.7.2), but it all produces zero
>
> The output is:
> hbui@bermuda:~/workspace/petsc/snes$ ./ex62 run_type full -bc_type
> dirichlet -refinement_limit 0.00625 -interpolate 1 -snes_monitor_short
> -snes_converged_reason -snes_view -ksp_type fgmres -ksp_gmres_restart 100
> -ksp_rtol 1.0e-9 -ksp_monitor_short -pc_type fieldsplit -pc_fieldsplit_type
> schur -pc_fieldsplit_schur_factorization_type full
> -fieldsplit_velocity_ksp_type gmres -fieldsplit_velocity_pc_type lu
> -fieldsplit_pressure_ksp_rtol 1e-10 -fieldsplit_pressure_pc_type jacobi
>   0 SNES Function norm 0.265165
> 0 KSP Residual norm 0.265165
> Nonlinear solve did not converge due to DIVERGED_LINEAR_SOLVE iterations 0
> SNES Object: 1 MPI processes
>   type: newtonls
>   maximum iterations=50, maximum function evaluations=1
>   tolerances: relative=1e-08, absolute=1e-50, solution=1e-08
>   total number of linear solver iterations=0
>   total number of function evaluations=1
>   norm schedule ALWAYS
>   SNESLineSearch Object:   1 MPI processes
> type: bt
>   interpolation: cubic
>   alpha=1.00e-04
> maxstep=1.00e+08, minlambda=1.00e-12
> tolerances: relative=1.00e-08, absolute=1.00e-15,
> lambda=1.00e-08
> maximum iterations=40
>   KSP Object:   1 MPI processes
> type: fgmres
>   GMRES: restart=100, using Classical (unmodified) Gram-Schmidt
> Orthogonalization with no iterative refinement
>   GMRES: happy breakdown tolerance 1e-30
> maximum iterations=1, initial guess is zero
> tolerances:  relative=1e-09, absolute=1e-50, divergence=1.
> right preconditioning
> using UNPRECONDITIONED norm type for convergence test
>   PC Object:   1 MPI processes
> type: fieldsplit
>   FieldSplit with Schur preconditioner, factorization FULL
>   Preconditioner for the Schur complement formed from A11
>   Split info:
>   Split number 0 Defined by IS
>   Split number 1 Defined by IS
>   KSP solver for A00 block
> KSP Object:(fieldsplit_velocity_) 1 MPI processes
>   type: gmres
> GMRES: restart=30, using Classical (unmodified) Gram-Schmidt
> Orthogonalization with no iterative refinement
> GMRES: happy breakdown tolerance 1e-30
>   maximum iterations=1, initial guess is zero
>   tolerances:  relative=1e-05, absolute=1e-50, divergence=1.
>   left preconditioning
>   using PRECONDITIONED norm type for convergence test
> PC Object:(fieldsplit_velocity_) 1 MPI processes
>   type: lu
> LU: out-of-place factorization
> tolerance for zero pivot 2.22045e-14
> matrix ordering: nd
> factor fill ratio given 5., needed 1.
>   Factored matrix follows:
> Mat Object: 1 MPI processes
>   type: seqaij
>   rows=512, cols=512, bs=2
>   package used to perform factorization: petsc
>   total: nonzeros=1024, allocated nonzeros=1024
>   total number of mallocs used during MatSetValues calls =0
> using I-node routines: found 256 nodes, limit used is 5
>   linear system matrix = precond matrix:
>   Mat Object:  (fieldsplit_velocity_)   1 MPI
> processes
> type: seqaij
> rows=512, cols=512, bs=2
> total: nonzeros=1024, allocated nonzeros=1024
> total number of mallocs used during MatSetValues calls =0
>   using I-node routines: found 256 nodes, limit used is 5
>   KSP solver for S = A11 - A10 inv(A00) A01
> KSP Object:(fieldsplit_pressure_) 1 MPI processes
>   type: gmres
> GMRES: restart=30, using Classical (unmodified) Gram-Schmidt
> Orthogonalization with no iterative refinement
> GMRES: happy breakdown tolerance 1e-30
>   maximum iterations=1, initial guess is zero
>   tolerances:  relative=1e-10, absolute=1e-50, divergence=1.
>   left preconditioning
>   using PRECONDITIONED norm type for convergence test
> PC Object:

Re: [petsc-users] SetVariableBounds vs ComputeVariableBounds

2016-06-27 Thread Justin Chang
Thanks all,

Btw, do Tao's Hessian evaluation routines also "cheat" the way the
Jacobian routines do? Or is it fine to supply the Hessian only once (assume
it is independent of the solution)?

Thanks,
Justin

On Monday, June 27, 2016, Barry Smith <bsm...@mcs.anl.gov> wrote:

>
>There is the same issue with ODE integrators for linear problems. The
> solvers tromp on the Jacobian.
>
>We should actually add an error indicator in these TAO/TS solvers: if
> the "Jacobian" state value is not increased in the next time step/iteration,
> this means the person did not supply a new Jacobian (in other words, the
> Jacobian is still whatever it was tromped to), so the solver should error
> out and tell the user their mistake.
>
>
>   Barry
>
>
>
> > On Jun 27, 2016, at 1:41 PM, Munson, Todd <tmun...@mcs.anl.gov
> <javascript:;>> wrote:
> >
> >
> > Hi Justin,
> >
> > I will have to look regarding the TAO semismooth solvers.  The TAO
> > solvers probably "cheated" and modified the Jacobian matrix rather
> > than extracting submatrices and shifting the diagonal or using a
> > matrix-free version.
> >
> > Note: the TAO interior-point and semismooth methods start from an element
> > of the generalized Jacobian matrix for a semismooth reformulation of
> > the VI problem.  This generalized Jacobian is a diagonal
> > perturbation to a scaled version of the Jacobian you input.
> > If we cheat and modify the matrix, then it needs to be
> > filled back in during a Jacobian evaluation.
> >
> > At some point, the plan was to move all the VI methods into PETSc proper,
> > but we may have stopped with the active-set (ASLS) method because that
> > tends to have the best performance for PDE-related problems.
> >
> > Todd.
> >
> >> On Jun 27, 2016, at 12:37 PM, Justin Chang <jychan...@gmail.com
> <javascript:;>> wrote:
> >>
> >> So I figured it out. I had to explicitly form the Tao
> Gradient/Constraints and Jacobian. I couldn't just "pre-process" the
> gradient Vec and Jacobian Mat through SNESComputeXXX. Attached is the
> updated file and makefile.
> >>
> >> My question now is, why exactly is this the case? This preprocessing
> strategy seemed to work for bound-constrained optimization solvers (e.g.,
> TRON/BLMVM) but apparently not with the MCP ones. The system is linear so
> my original reasoning was that the Jacobian shouldn't change, thus it just
> needs to be assembled once. I recall faintly from a previous discussion
> that the SNESVI solvers will vary the Mat and Vec sizes depending on the
> regions that need to be "constrained" or something?
> >>
> >> Thanks,
> >> Justin
> >>
> >> On Sun, Jun 26, 2016 at 5:03 AM, Barry Smith <bsm...@mcs.anl.gov
> <javascript:;>> wrote:
> >>
> >>  I wish I could answer this but I am weak on these algorithms.
> Hopefully Todd has a good understanding of their application and strengths
> and weaknesses.
> >>
> >>  Barry
> >>
> >>> On Jun 25, 2016, at 3:31 PM, Justin Chang <jychan...@gmail.com
> <javascript:;>> wrote:
> >>>
> >>> Hi all,
> >>>
> >>> So I modified SNES ex9.c so that one has the option to use TAO's
> complementarity solvers for this problem. Attached is the file.
> >>>
> >>> I expect the TAO solvers to behave the same as the SNESVI ones, but I
> am having the same issues as before - SSILS and SSFLS do not work
> whatsoever but for some reason ASILS and ASFLS work. Although the latter
> two produce the same results as the SNES VI counterparts, they converge
> much slower, and something tells me I am not doing something correctly.
> Based on what I have seen from the two TAO complementarity examples, I
> would also expect the AS and SS solvers to be roughly the same.
> >>>
> >>> BTW, in the modified code, I made some "shortcuts." Instead of
> explicitly forming the Tao versions of the Gradient and Jacobian, I first
> assemble the residual r and Jacobian J through the SNESComputeXXX
> functions. Then I pass them into the TaoSetConstraints and TaoSetJacobian
> routines. Because this is a linear system, I have:
> >>>
> >>> f = r - J*u^0
> >>> gradient g = J*u - f = J*(u + u^0) - r
> >>>
> >>> where u^0 is the initial vector. I am not sure if this "shortcut" has
> anything to do with the issue at hand. Attached is the makefile which has
> instructions on how to run the problem.
> >>&

Re: [petsc-users] SetVariableBounds vs ComputeVariableBounds

2016-06-27 Thread Justin Chang
So I figured it out. I had to explicitly form the Tao Gradient/Constraints
and Jacobian. I couldn't just "pre-process" the gradient Vec and Jacobian
Mat through SNESComputeXXX. Attached is the updated file and makefile.

My question now is, why exactly is this the case? This preprocessing
strategy seemed to work for bound-constrained optimization solvers (e.g.,
TRON/BLMVM) but apparently not with the MCP ones. The system is linear so
my original reasoning was that the Jacobian shouldn't change, thus it just
needs to be assembled once. I recall faintly from a previous discussion
that the SNESVI solvers will vary the Mat and Vec sizes depending on the
regions that need to be "constrained" or something?

Thanks,
Justin

On Sun, Jun 26, 2016 at 5:03 AM, Barry Smith <bsm...@mcs.anl.gov> wrote:

>
>   I wish I could answer this but I am weak on these algorithms. Hopefully
> Todd has a good understanding of their application and strengths and
> weaknesses.
>
>   Barry
>
> > On Jun 25, 2016, at 3:31 PM, Justin Chang <jychan...@gmail.com> wrote:
> >
> > Hi all,
> >
> > So I modified SNES ex9.c so that one has the option to use TAO's
> complementarity solvers for this problem. Attached is the file.
> >
> > I expect the TAO solvers to behave the same as the SNESVI ones, but I am
> having the same issues as before - SSILS and SSFLS do not work whatsoever
> but for some reason ASILS and ASFLS work. Although the latter two produce
> the same results as the SNES VI counterparts, they converge much slower,
> and something tells me I am not doing something correctly. Based on what I
> have seen from the two TAO complementarity examples, I would also expect
> the AS and SS solvers to be roughly the same.
> >
> > BTW, in the modified code, I made some "shortcuts." Instead of
> explicitly forming the Tao versions of the Gradient and Jacobian, I first
> assemble the residual r and Jacobian J through the SNESComputeXXX
> functions. Then I pass them into the TaoSetConstraints and TaoSetJacobian
> routines. Because this is a linear system, I have:
> >
> > f = r - J*u^0
> > gradient g = J*u - f = J*(u + u^0) - r
> >
> > where u^0 is the initial vector. I am not sure if this "shortcut" has
> anything to do with the issue at hand. Attached is the makefile which has
> instructions on how to run the problem.
> >
> > Any ideas what is going on??
> >
> > Thanks!
> > Justin
> >
> > On Wed, Jun 22, 2016 at 9:42 PM, Ed Bueler <elbue...@alaska.edu> wrote:
> > Justin --
> >
> > Yeah, good point.  SNESVISetVariableBounds() works fine, at least in
> ex9.c (see attached patch).  The reason for the other choice, which I found
> in my 5 year old email, was some bug in petsc3.2.
> >
> > Ed
> >
> > Date: Wed, 22 Jun 2016 08:42:33 +0100
> > From: Justin Chang <jychan...@gmail.com>
> > To: petsc-users <petsc-users@mcs.anl.gov>
> > Subject: [petsc-users] SetVariableBounds vs ComputeVariableBounds
> >
> > Hi all,
> >
> > I am looking at the SNES tutorials ex9.c and ex58.c and am wondering why
> > SNESVISetComputeVariableBounds() is called instead of just
> > SNESVISetVariableBounds(). When would it be appropriate to use only using
> > the latter?
> >
> > Thanks,
> > Justin
> >
> > 
>
>
static const char help[] = "Solves obstacle problem in 2D as a variational inequality.\n\
Modified to include option of TAO complementarity solvers (SSILS/SSFLS/ASILS/ASFLS). \n\
An elliptic problem with solution  u  constrained to be above a given function  psi. \n\
Exact solution is known.\n";

/*  Solve on a square R = {-2<x<2,-2<y<2}:
u_{xx} + u_{yy} = 0
on the set where membrane is above obstacle.  Constraint is  u(x,y) >= psi(x,y).
Here psi is the upper hemisphere of the unit ball.  On the boundary of R
we have nonhomogenous Dirichlet boundary conditions coming from the exact solution.

Method is centered finite differences.

This example was contributed by Ed Bueler.  The exact solution is known for the
given psi and boundary values in question.  See
  https://github.com/bueler/fem-code-challenge/blob/master/obstacleDOC.pdf?raw=true.

Modified by Justin Chang to use TAO.

Example usage follows.

Get help:
  ./ex9_TAO -help

Basic run:
  ./ex9_TAO -solver_type <0/1>

  0 = SNES Variational Inequality
  1 = TAO Complementarity solvers

  Default SNESVI and TAO solvers are 'vinewtonrsls' and 'ssils' respectively.

*/

#include <petscdm.h>    /* header names were stripped by the mail archive;        */
#include <petscdmda.h>  /* these four are the likely includes for this             */
#include <petscsnes.h>  /* DMDA + SNES VI + TAO obstacle-problem code              */
#include <petsctao.h>

/* application context for obstacle problem solver */
typedef struct {
  DM da;
  Vec psi, uexact;
  Vec f; /* r - J*u if needed */
  Mat J; /* Jacobian J if needed */
} ObsCtx;

[petsc-users] SetVariableBounds vs ComputeVariableBounds

2016-06-22 Thread Justin Chang
Hi all,

I am looking at the SNES tutorials ex9.c and ex58.c and am wondering why
SNESVISetComputeVariableBounds() is called instead of just
SNESVISetVariableBounds(). When would it be appropriate to use only using
the latter?

Thanks,
Justin


Re: [petsc-users] Does PETSc have these solvers bounded-constrained optimization problems?

2016-06-15 Thread Justin Chang
Attached is my petsc4py/firedrake code which has the implementation I
described.

As of right now, my current implementation with ASILS/ASFLS solvers work
(although they are much slower compared to TRON) but the SSILS/SSFLS
solvers don't even converge.

For example, if I run the code as:

> python NR_Nonlinear_poisson.py 50 50 2 -opt_tao_type asils
-opt_tao_monitor -opt_tao_max_it 10 -opt_tao_view
iter =   0, Function value: 10.268,  Residual: 30.1512
iter =   1, Function value: 2.26616,  Residual: 3.28276
iter =   2, Function value: 0.442395,  Residual: 0.409106
iter =   3, Function value: 0.0180227,  Residual: 0.0305439
iter =   4, Function value: 0.00381325,  Residual: 0.0134287
iter =   5, Function value: 0.00191598,  Residual: 0.00599256
iter =   6, Function value: 0.00114725,  Residual: 0.00383069
iter =   7, Function value: 0.00007,  Residual: 0.0032619
iter =   8, Function value: 0.000434005,  Residual: 0.00148022
iter =   9, Function value: 0.000179781,  Residual: 0.000565633
iter =  10, Function value: 7.54948e-05,  Residual: 0.000242123
iter =  11, Function value: 3.14898e-05,  Residual: 0.000112043
Tao Object:(opt_) 1 MPI processes
  type: asils
  TaoLineSearch Object:  (opt_)   1 MPI processes
type: armijo
  KSP Object:  (opt_)   1 MPI processes
type: gmres
  total KSP iterations: 511
  Active Set subset type: subvec
  convergence tolerances: gatol=1e-16,   steptol=0.,   gttol=0.
  Residual in Function/Gradient:=0.000112043
  convergence tolerances: function minimum=1e-08
  Objective value=3.14898e-05
  total number of iterations=11,  (max: 10)
  total number of constraint function evaluations=23
  total number of Jacobian evaluations=12
  Solver terminated: -2   Maximum Iterations

The above indicates that the solver converges. However if I run the same
code but with this:

> python NR_Nonlinear_poisson.py 50 50 2 -opt_tao_type ssils
-opt_tao_monitor -opt_tao_max_it 10 -opt_tao_view
iter =   0, Function value: 10.268,  Residual: 30.1512
iter =   1, Function value: 10.268,  Residual: 30.1512
iter =   2, Function value: 10.268,  Residual: 30.1512
iter =   3, Function value: 10.268,  Residual: 30.1512
iter =   4, Function value: 10.268,  Residual: 30.1512
iter =   5, Function value: 10.268,  Residual: 30.1512
iter =   6, Function value: 10.268,  Residual: 30.1512
iter =   7, Function value: 10.268,  Residual: 30.1512
iter =   8, Function value: 10.268,  Residual: 30.1512
iter =   9, Function value: 10.268,  Residual: 30.1512
iter =  10, Function value: 10.268,  Residual: 30.1512
iter =  11, Function value: 10.268,  Residual: 30.1512
Tao Object:(opt_) 1 MPI processes
  type: ssils
  TaoLineSearch Object:  (opt_)   1 MPI processes
type: armijo
  KSP Object:  (opt_)   1 MPI processes
type: gmres
  total KSP iterations: 451
  Active Set subset type: subvec
  convergence tolerances: gatol=1e-16,   steptol=0.,   gttol=0.
  Residual in Function/Gradient:=30.1512
  convergence tolerances: function minimum=1e-08
  Objective value=10.268
  total number of iterations=11,  (max: 10)
  total number of constraint function evaluations=331
  total number of Jacobian evaluations=1
  Solver terminated: -2   Maximum Iterations

Clearly something is wrong. Any insight as to why?

Thanks,
Justin

On Tue, Jun 14, 2016 at 6:29 AM, Jed Brown  wrote:

> Barry Smith  writes:
> >In Tao it is solving it as an optimization problem while
> >SNESVINEWTONSSLS solves it as a "nonlinear equation" with
> >constraints; I think the answer should be the same.
>
> The latter may have more solutions.
>
#==
#
#  Nonlinear poisson equation with Dirichlet conditions
#  Non-negative formulation with consistent Newton-Raphson 
#
#  Run as:
#python NR_Nonlinear_poisson.py <xseed> <yseed> <opt>
#
#  <xseed> = number of elements in x direction
#  <yseed> = number of elements in y direction
#  <opt>   = Use standard linear solver (0)
#            TRON solver (1)
#            SSILS solver (2)
#
#==

from firedrake import *
from firedrake.petsc import PETSc
from ufl import log
log.set_level(log.CRITICAL)
parameters["assembly_cache"]["enabled"] = True
import numpy as np
import sys
op2.init(log_level='ERROR')
xseed = int(sys.argv[1])
yseed = int(sys.argv[2])
opt = int(sys.argv[3])

#=
#  Create mesh
#=
mesh = UnitSquareMesh(xseed, yseed, quadrilateral=True)
V = FunctionSpace(mesh, 'CG', 1)
Q = TensorFunctionSpace(mesh, 'CG', 1)
u = TrialFunction(V)
v = TestFunction(V)

# old solution
u_k = Function(V)
# new solution
u_k1 = Function(V)

#=
#  Medium properties
#=
D = Function(Q)
alpha = Constant(0.0)
d1 = 1.0
d2 = 0.0001
theta = pi / 6.0
co = cos(theta)
si = sin(theta)
D = interpolate(Expression(("co*co*d1+si*si*d2", "-co*si*(d1-d2)",
  "-co*si*(d1-d2)", 

Re: [petsc-users] Does PETSc have these solvers bounded-constrained optimization problems?

2016-06-13 Thread Justin Chang
Thanks Barry,

1) Say this is my problem:

min 1/2 x^T*H*x - x^T*f
s.t. x_lower < x < x_upper

If i want to use any of the following TAO complementarity solvers:

- ASILS
- ASFLS
- SSILS
- SSFLS

I would need these routines:

TaoSetConstraintsRoutine(tao,c,formGradient(...),NULL)
TaoSetJacobianRoutine(tao,J,J,formHessian(...),NULL)
TaoSetVariableBounds(tao,x_lower,x_upper)

where formGradient() would return c = H*x - f
and formHessian would return J = H

Is this correct?

2) If so, then what exactly is the difference between these TAO
complementarity solvers and the SNESVINEWTONSSLS solver? From the online
manuals and documentation, both claim to be variational inequality solvers.

Thanks!
Justin
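
For comparison with the complementarity routines listed above, here is a
minimal sketch of the bound-constrained path (TRON/BLMVM, mentioned in Barry's
reply quoted below), which works from an objective/gradient/Hessian instead of
a constraint/Jacobian pair. QPCtx, FormObjGrad, and FormHessian are
illustrative names, and the work vector is assumed to be created elsewhere:

typedef struct {
  Mat H;      /* quadratic term             */
  Vec f;      /* linear term                */
  Vec work;   /* scratch vector, same size  */
} QPCtx;

PetscErrorCode FormObjGrad(Tao tao, Vec x, PetscReal *obj, Vec g, void *ptr)
{
  QPCtx         *user = (QPCtx*)ptr;
  PetscScalar    xHx, xf;
  PetscErrorCode ierr;

  PetscFunctionBeginUser;
  ierr = MatMult(user->H, x, user->work);CHKERRQ(ierr);        /* work = H*x */
  ierr = VecDot(x, user->work, &xHx);CHKERRQ(ierr);
  ierr = VecDot(x, user->f, &xf);CHKERRQ(ierr);
  *obj = 0.5*PetscRealPart(xHx) - PetscRealPart(xf);           /* 1/2 x^T*H*x - x^T*f */
  ierr = VecWAXPY(g, -1.0, user->f, user->work);CHKERRQ(ierr); /* g = H*x - f */
  PetscFunctionReturn(0);
}

PetscErrorCode FormHessian(Tao tao, Vec x, Mat H, Mat Hpre, void *ptr)
{
  PetscFunctionBeginUser;   /* H is constant */
  PetscFunctionReturn(0);
}

/* registration: */
ierr = TaoSetType(tao, TAOTRON);CHKERRQ(ierr);                 /* or TAOBLMVM */
ierr = TaoSetObjectiveAndGradientRoutine(tao, FormObjGrad, &user);CHKERRQ(ierr);
ierr = TaoSetHessianRoutine(tao, user.H, user.H, FormHessian, &user);CHKERRQ(ierr);
ierr = TaoSetVariableBounds(tao, x_lower, x_upper);CHKERRQ(ierr);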


On Sat, Jun 11, 2016 at 12:19 PM, Barry Smith <bsm...@mcs.anl.gov> wrote:

>
>Justin,
>
> For bound constrained optimization problems, there is:
>
>  TRON -- a truncated Newton method from your favorite inventor of such
> methods
>  BLMVM -- a limited memory quasi-Newton method for bound constraints
> (projected quasi-Newton)
>
> There used to be the KT solvers, which were just a wrapper around the
> complementarity methods.  Jason -- what happened to this?
>
> Anyways, from the gradient and Hessian (or Hessian vector product), you
> can apply the
> complementarity solvers to the optimality conditions:
>
>  ASLS -- active set family (e.g. projected newton)
>  SSLS -- semismooth family
>
> For more general constraints, there is:
>
>  IPM -- interior-point method
>
> I have not used it or tested it though.
>
> For PDE constrained problems, there is:
>
>  LCL -- linearly constrained augmented Lagrangian.
>
>
> ROL has imitations either directly copied from our code or written from
> our papers.
>
> > On Jun 10, 2016, at 5:57 PM, Justin Chang <jychan...@gmail.com> wrote:
> >
> > Hi all,
> >
> > Does PETSc currently have any of these solvers for bounded constraint
> problems:
> >
> > 1) Semismooth Newton methods (aka primal-dual active-set methods)
> > 2) Projected Newton methods
> >
> > Trilinos' Rapid Optimization Library (ROL) has them, and I have seen
> papers and books claiming that these solvers are state-of-the-art.
> >
> > I see there's SNESVINEWTONSSLS and TAOGPCG but are these the same as the
> above methods?
> >
> > Thanks,
> > Justin
> >
> >
> >
>
>


[petsc-users] Does PETSc have these solvers bounded-constrained optimization problems?

2016-06-10 Thread Justin Chang
Hi all,

Does PETSc currently have any of these solvers for bounded constraint
problems:

1) Semismooth Newton methods (aka primal-dual active-set methods)
2) Projected Newton methods

Trilinos' Rapid Optimization Library (ROL, https://trilinos.org/packages/rol)
has them, and I have seen papers and books claiming that these solvers are
state-of-the-art.

I see there's SNESVINEWTONSSLS and TAOGPCG, but are these the same as the
above methods?

Thanks,
Justin


Re: [petsc-users] DIVERGED_NANORINF for CG/Bjacobi for transient diffusion

2016-03-30 Thread Justin Chang
Matt,

So do I use:

"-pc_type bjacobi -sub_pc_type lu"

or this:

"-sub_pc_type lu"

Thanks,
Justin

On Wed, Mar 30, 2016 at 4:23 PM, Matthew Knepley <knep...@gmail.com> wrote:

> On Wed, Mar 30, 2016 at 4:18 PM, Justin Chang <jychan...@gmail.com> wrote:
>
>> Hi all,
>>
>> I am getting this error:
>>
>> Linear solve did not converge due to DIVERGED_NANORINF
>>
>> when solving an FEM for transient diffusion using hexahedron elements.
>> The problem is highly heterogeneous (i.e., dispersion tensor with varying
>> cell-centered velocity) and normally CG/Bjacobi has done well for me. If I
>> did CG/Jacobi, my solver would require nearly 7000 iterations, but would
>> get the "expected" solution.
>>
>> What does the above error mean? And what could I do to address it?
>>
>
> Almost certainly ILU craps out. Try using LU as the subsolver instead.
>
>   Matt
>
>
>> Thanks,
>> Justin
>>
>
>
>
> --
> What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which their
> experiments lead.
> -- Norbert Wiener
>


[petsc-users] DIVERGED_NANORINF for CG/Bjacobi for transient diffusion

2016-03-30 Thread Justin Chang
Hi all,

I am getting this error:

Linear solve did not converge due to DIVERGED_NANORINF

when solving an FEM for transient diffusion using hexahedron elements. The
problem is highly heterogeneous (i.e., dispersion tensor with varying
cell-centered velocity) and normally CG/Bjacobi has done well for me. If I
did CG/Jacobi, my solver would require nearly 7000 iterations, but would
get the "expected" solution.

What does the above error mean? And what could I do to address it?

Thanks,
Justin


Re: [petsc-users] Status on parallel mesh reader in DMPlex

2016-03-22 Thread Justin Chang
Hi Matt,

Is this still on track to come out in the summer? Or is it going to be here
sooner than that?

Thanks,
Justin

On Fri, Dec 18, 2015 at 6:05 PM, Matthew Knepley <knep...@gmail.com> wrote:

> I am doing it with Michael Lange. We plan to have it done by the summer.
> There is not much left to do.
>
>   Thanks,
>
> Matt
>
> On Fri, Dec 18, 2015 at 8:21 AM, Justin Chang <jychan...@gmail.com> wrote:
>
>> Hi all,
>>
>> What's the status on the implementation of the parallel
>> mesh reader/generator for DMPlex meshes? Is anyone actively working on
>> this? If so is there a branch that I can peek into?
>>
>> Thanks,
>> Justin
>>
>
>
>
> --
> What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which their
> experiments lead.
> -- Norbert Wiener
>


Re: [petsc-users] Tao TRON solver tolerances

2016-03-15 Thread Justin Chang
Thanks Barry. Yes, KSP solver computing an "update" to the solution makes
sense. I also get that it's impossible to know whether this update is
"enough of" a descent direction.

What I am wondering, however, is why TAO no longer has -tao_fatol or
-tao_frtol options. 9 months ago, TAO had those options and that's what I
used for my problems. Back then, when I used BLMVM, it took ~900 tao
iterations and ~5000 seconds of wall-clock time for one of my problems.
Today, when I run the exact same problem but with -tao_grtol, it now takes
~1900 iterations and ~1000 seconds. Same solution, but twice the amount of
work.

I am guessing this means that the gradient tolerance need not be as
stringent as the objective functional tolerance. But do you guys know why
it was removed in the first place?

Thanks,
Justin
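
For what it's worth, the option-based experiment above can also be done in
code by grabbing the solver's inner KSP; a minimal sketch, assuming tao is
already created and its callbacks are set:

KSP            ksp;
PetscErrorCode ierr;

ierr = TaoSetType(tao, TAOTRON);CHKERRQ(ierr);
ierr = TaoGetKSP(tao, &ksp);CHKERRQ(ierr);
if (ksp) {
  /* equivalent to -ksp_rtol 1e-1: a loose inner solve is often enough, since
     TRON only needs a usable descent direction from each KSP solve          */
  ierr = KSPSetTolerances(ksp, 1.0e-1, PETSC_DEFAULT, PETSC_DEFAULT, PETSC_DEFAULT);CHKERRQ(ierr);
}
ierr = TaoSetFromOptions(tao);CHKERRQ(ierr);  /* -tao_gatol/-tao_grtol still apply */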

On Tue, Mar 15, 2016 at 9:54 PM, Barry Smith <bsm...@mcs.anl.gov> wrote:

>
> > On Mar 10, 2016, at 12:03 PM, Justin Chang <jychan...@gmail.com> wrote:
> >
> > Hi again,
> >
> > I was reading through the TAO manual and the impression I am getting is
> that the KSP solver computes the gradient/projection, not necessarily the
> solution itself. Meaning it matters not how accurate this projection is, so
> long as the actual objective tolerance is met.
>
>   Not sure what you mean by this.
>
>   The KSP solver computes an "update" to the solution. So long as the
> update is "enough of" a descent direction then you will get convergence of
> the optimization problem.
>
> >
> > Is this a correct assessment of why one can get away with a less
> stringent KSP tolerance and still attain an accurate solution?
>
>   Of course knowing how accurate the KSP must be to insure the update is
> "enough of" a descent direction is impossible :-)
>
>   Barry
>
> >
> > Thanks,
> > Justin
> >
> > On Tuesday, March 8, 2016, Justin Chang <jychan...@gmail.com> wrote:
> > Hi all,
> >
> > So I am solving a convex optimization problem of the following form:
> >
> > min 1/2 x^T*H*x - x^T*f
> > s.t. 0 < x < 1
> >
> > Using the TAOTRON solver, I also have CG/ILU for KSP/PC. The following
> TAO solver tolerances are used for my specific problem:
> >
> > -tao_gatol 1e-12
> > -tao_grtol 1e-7
> >
> > I noticed that the KSP tolerance truly defines the performance of this
> solver. Attached are three run cases with -ksp_rtol 1e-7, 1e-3, and 1e-1
> with "-ksp_converged_reason -ksp_monitor_true_residual -tao_view
> -tao_converged_reason -log_view". It seems that the lower the KSP
> tolerance, the faster the time-to-solution where the number of KSP/TAO
> solve iterations remains roughly the same.
> >
> > So my question is, is this "normal"? That is, if using TRON, one may
> relax the KSP tolerances because the convergence of the solver is primarily
> due to the objective functional from TRON and not necessarily the KSP solve
> itself? Is there a general rule of thumb for this, because it would seem to
> me that for any TRON solve I do, i could just set a really low KSP rtol and
> still get roughly the same performance.
> >
> > Thanks,
> > Justin
> >
>
>


Re: [petsc-users] CPU vs GPU for PETSc applications

2016-03-10 Thread Justin Chang
Matt,

So what's an example of "doing a bunch of iterations to make sending the
initial data down worth it"? Is there a correlation between that and
arithmetic intensity, where an application is more likely to be
compute-bound than memory-bandwidth bound?

Thanks,
Justin

On Thu, Mar 10, 2016 at 2:50 PM, Matthew Knepley <knep...@gmail.com> wrote:

> On Thu, Mar 10, 2016 at 12:29 PM, Justin Chang <jychan...@gmail.com>
> wrote:
>
>> Hi all,
>>
>> When would I ever use GPU computing for a finite element simulation where
>> the limiting factor of performance is the memory bandwidth bound? Say I
>> want to run problems similar to SNES ex12 and 62. I understand that there
>> is an additional bandwidth associated with offloading data from the CPU to
>> GPU but is there more to it? I recall reading through some email threads
>> about GPU's potentially giving you a speed up of 3x that on a CPU but the
>> gain in performance may not be worth the increase in time moving data
>> around.
>
>
> The main use case is if you are being forced to use a machine which has
> GPUs. Then you can indeed get some benefit
> from the larger bandwidth. You need a problem where you are doing a bunch
> of iterations to make sending the initial data
> down worth it.
>
> It would certainly be better if you are computing the action of your
> operator directly on the GPU, but that is much more
> disruptive to the code right now.
>
>   Matt
>
>
>> Thanks,
>> Justin
>>
> --
> What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which their
> experiments lead.
> -- Norbert Wiener
>


[petsc-users] CPU vs GPU for PETSc applications

2016-03-10 Thread Justin Chang
Hi all,

When would I ever use GPU computing for a finite element simulation where
the limiting factor of performance is the memory bandwidth bound? Say I
want to run problems similar to SNES ex12 and 62. I understand that there
is an additional bandwidth associated with offloading data from the CPU to
GPU but is there more to it? I recall reading through some email threads
about GPU's potentially giving you a speed up of 3x that on a CPU but the
gain in performance may not be worth the increase in time moving data
around.

Thanks,
Justin


[petsc-users] Tao TRON solver tolerances

2016-03-10 Thread Justin Chang
Hi again,

I was reading through the TAO manual and the impression I am getting is
that the KSP solver computes the gradient/projection, not necessarily the
solution itself. Meaning it matters not how accurate this projection is, so
long as the actual objective tolerance is met.

Is this a correct assessment of why one can get away with a less stringent
KSP tolerance and still attain an accurate solution?

Thanks,
Justin

On Tuesday, March 8, 2016, Justin Chang <jychan...@gmail.com> wrote:

> Hi all,
>
> So I am solving a convex optimization problem of the following form:
>
> min 1/2 x^T*H*x - x^T*f
> s.t. 0 < x < 1
>
> Using the TAOTRON solver, I also have CG/ILU for KSP/PC. The following TAO
> solver tolerances are used for my specific problem:
>
> -tao_gatol 1e-12
> -tao_grtol 1e-7
>
> I noticed that the KSP tolerance truly defines the performance of this
> solver. Attached are three run cases with -ksp_rtol 1e-7, 1e-3, and 1e-1
> with "-ksp_converged_reason -ksp_monitor_true_residual -tao_view
> -tao_converged_reason -log_view". It seems that the lower the KSP
> tolerance, the faster the time-to-solution where the number of KSP/TAO
> solve iterations remains roughly the same.
>
> So my question is, is this "normal"? That is, if using TRON, one may relax
> the KSP tolerances because the convergence of the solver is primarily due
> to the objective functional from TRON and not necessarily the KSP solve
> itself? Is there a general rule of thumb for this, because it would seem to
> me that for any TRON solve I do, i could just set a really low KSP rtol and
> still get roughly the same performance.
>
> Thanks,
> Justin
>
>


Re: [petsc-users] Strange GAMG performance for mixed FE formulation

2016-03-04 Thread Justin Chang
[0] PCGAMGCoarsen_AGG(): Square Graph on level 3 of 10 to square

[0] PCGAMGProlongator_AGG(): New grid 11 nodes

[0] PCGAMGOptProlongator_AGG(): Smooth P0: max eigen=1.368729e+00
min=1.563750e-01 PC=jacobi

[0] PCSetUp_GAMG(): 3) N=11, n data cols=1, nnz/row (ave)=11, 1 active pes

[0] PCSetUp_GAMG(): 4 levels, grid complexity = 12.0376

On Fri, Mar 4, 2016 at 8:31 AM, Lawrence Mitchell <
lawrence.mitch...@imperial.ac.uk> wrote:

>
> > On 4 Mar 2016, at 15:24, Justin Chang <jychan...@gmail.com> wrote:
> >
> > So with -pc_gamg_square_graph 10 I get the following:
>
> Because you're using gamg inside the fieldsplit, I think you need:
>
> -fieldsplit_1_pc_gamg_square_graph 10
>
>
>
> > [0] PCSetUp_GAMG(): level 0) N=48000, n data rows=1, n data cols=1,
> nnz/row (ave)=9, np=1
> > [0] PCGAMGFilterGraph():   55.7114% nnz after filtering, with
> threshold 0., 8.79533 nnz ave. (N=48000)
> > [0] PCGAMGCoarsen_AGG(): Square Graph on level 1 of 1 to square
>  ^
>
> Cheers,
>
> Lawrence
>


Re: [petsc-users] Strange GAMG performance for mixed FE formulation

2016-03-04 Thread Justin Chang
Time to solution went from 100 seconds to 30 seconds once i used 10 graphs.
Using 20 graphs started to increase in time slightly

On Fri, Mar 4, 2016 at 8:35 AM, Justin Chang <jychan...@gmail.com> wrote:

> You're right. This is what I have:
>
> [0] PCSetUp_*GAMG*(): level 0) N=48000, n data rows=1, n data cols=1,
> nnz/row (ave)=9, np=1
>
> [0] PC*GAMG*FilterGraph():  55.7114% nnz after filtering, with threshold
> 0., 8.79533 nnz ave. (N=48000)
>
> [0] PC*GAMG*Coarsen_AGG(): Square Graph on level 1 of 1 to square
>
> [0] PC*GAMG*Prolongator_AGG(): New grid 6672 nodes
>
> [0] PC*GAMG*OptProlongator_AGG(): Smooth P0: max eigen=1.954700e+00
> min=1.040410e-02 PC=jacobi
>
> [0] PCSetUp_*GAMG*(): 1) N=6672, n data cols=1, nnz/row (ave)=623, 1
> active pes
>
> [0] PC*GAMG*FilterGraph():  3.40099% nnz after filtering, with threshold
> 0., 623.135 nnz ave. (N=6672)
>
> [0] PC*GAMG*Prolongator_AGG(): New grid 724 nodes
>
> [0] PC*GAMG*OptProlongator_AGG(): Smooth P0: max eigen=1.313339e+00
> min=2.474586e-02 PC=jacobi
>
> [0] PCSetUp_*GAMG*(): 2) N=724, n data cols=1, nnz/row (ave)=724, 1
> active pes
>
> [0] PC*GAMG*FilterGraph():  9.82914% nnz after filtering, with threshold
> 0., 724. nnz ave. (N=724)
>
> [0] PC*GAMG*Prolongator_AGG(): New grid 37 nodes
>
> [0] PC*GAMG*OptProlongator_AGG(): Smooth P0: max eigen=2.011784e+00
> min=2.759552e-01 PC=jacobi
>
> [0] PCSetUp_*GAMG*(): 3) N=37, n data cols=1, nnz/row (ave)=37, 1 active
> pes
>
> [0] PCSetUp_*GAMG*(): 4 levels, grid complexity = 12.0928
>
> [0] PCSetUp_*GAMG*(): level 0) N=48000, n data rows=1, n data cols=1,
> nnz/row (ave)=9, np=1
>
> [0] PC*GAMG*FilterGraph():  55.7114% nnz after filtering, with threshold
> 0., 8.79533 nnz ave. (N=48000)
>
> [0] PC*GAMG*Coarsen_AGG(): Square Graph on level 1 of 1 to square
>
> [0] PC*GAMG*Prolongator_AGG(): New grid 6672 nodes
>
> [0] PC*GAMG*OptProlongator_AGG(): Smooth P0: max eigen=1.954700e+00
> min=1.040410e-02 PC=jacobi
>
> [0] PCSetUp_*GAMG*(): 1) N=6672, n data cols=1, nnz/row (ave)=623, 1
> active pes
>
> [0] PC*GAMG*FilterGraph():  3.40099% nnz after filtering, with threshold
> 0., 623.135 nnz ave. (N=6672)
>
> [0] PC*GAMG*Prolongator_AGG(): New grid 724 nodes
>
> [0] PC*GAMG*OptProlongator_AGG(): Smooth P0: max eigen=1.313339e+00
> min=2.474586e-02 PC=jacobi
>
> [0] PCSetUp_*GAMG*(): 2) N=724, n data cols=1, nnz/row (ave)=724, 1
> active pes
>
> [0] PC*GAMG*FilterGraph():  9.82914% nnz after filtering, with threshold
> 0., 724. nnz ave. (N=724)
>
> [0] PC*GAMG*Prolongator_AGG(): New grid 37 nodes
>
> [0] PC*GAMG*OptProlongator_AGG(): Smooth P0: max eigen=2.011784e+00
> min=2.759552e-01 PC=jacobi
>
> [0] PCSetUp_*GAMG*(): 3) N=37, n data cols=1, nnz/row (ave)=37, 1 active
> pes
>
> [0] PCSetUp_*GAMG*(): 4 levels, grid complexity = 12.0928
>
> [0] PCSetUp_*GAMG*(): level 0) N=162000, n data rows=1, n data cols=1,
> nnz/row (ave)=9, np=1
>
> [0] PC*GAMG*FilterGraph():  55.6621% nnz after filtering, with threshold
> 0., 8.863 nnz ave. (N=162000)
>
> [0] PC*GAMG*Coarsen_AGG(): Square Graph on level 1 of 1 to square
>
> [0] PC*GAMG*Prolongator_AGG(): New grid 22085 nodes
>
> [0] PC*GAMG*OptProlongator_AGG(): Smooth P0: max eigen=1.955376e+00
> min=8.260696e-03 PC=jacobi
>
> [0] PCSetUp_*GAMG*(): 1) N=22085, n data cols=1, nnz/row (ave)=704, 1
> active pes
>
> [0] PC*GAMG*FilterGraph():  3.1314% nnz after filtering, with threshold
> 0., 704.128 nnz ave. (N=22085)
>
> [0] PC*GAMG*Prolongator_AGG(): New grid 2283 nodes
>
> [0] PC*GAMG*OptProlongator_AGG(): Smooth P0: max eigen=1.311291e+00
> min=1.484874e-02 PC=jacobi
>
> [0] PCSetUp_*GAMG*(): 2) N=2283, n data cols=1, nnz/row (ave)=2283, 1
> active pes
>
> [0] PC*GAMG*FilterGraph():  3.64497% nnz after filtering, with threshold
> 0., 2283. nnz ave. (N=2283)
>
> [0] PC*GAMG*Prolongator_AGG(): New grid 97 nodes
>
> [0] PC*GAMG*OptProlongator_AGG(): Smooth P0: max eigen=2.043254e+00
> min=1.321528e-01 PC=jacobi
>
> [0] PCSetUp_*GAMG*(): 3) N=97, n data cols=1, nnz/row (ave)=97, 1 active
> pes
>
> [0] PC*GAMG*FilterGraph():  66.8403% nnz after filtering, with threshold
> 0., 97. nnz ave. (N=97)
>
> [0] PC*GAMG*Prolongator_AGG(): New grid 5 nodes
>
> [0] PC*GAMG*OptProlongator_AGG(): Smooth P0: max eigen=1.653762e+00
> min=4.460582e-01 PC=jacobi
>
> [0] PCSetUp_*GAMG*(): 4) N=5, n data cols=1, nnz/row (ave)=5, 1 active pes
>
> [0] PCSetUp_*GAMG*(): 5 levels, grid complexity = 15.4673
>
> [0] PCSetUp_*GAMG*(): level 0) N=162000, n data rows=1, n data cols=1,
> nnz/row (ave)=9, np=1
>
> [0] PC*GAMG*FilterGraph():  55.6621% nnz after filtering, with th

Re: [petsc-users] Strange GAMG performance for mixed FE formulation

2016-03-03 Thread Justin Chang
Mark,

Using "-pc_gamg_square_graph 10" didn't change anything. I used values of
1, 10, 100, and 1000 and the performance seemed unaffected.

Changing the threshold of -pc_gamg_threshold to 0.8 did decrease wall-clock
time but it required more iterations.

I am not really sure how I go about tinkering around with GAMG or even ML.
Do you have a manual/reference/paper/etc that describes what's going on
within gamg?

Thanks,
Justin

On Thursday, March 3, 2016, Mark Adams <mfad...@lbl.gov> wrote:

> You have a very sparse 3D problem, with 9 non-zeros per row.   It is
> coarsening very slowly and creating huge coarse grids. which are expensive
> to construct.  The superlinear speedup is from cache effects, most likely.
> First try with:
>
> -pc_gamg_square_graph 10
>
> ML must have some AI in there to do this automatically, because gamg and ML
> are pretty similar algorithmically.  There is a threshold parameter that is
> important (-pc_gamg_threshold <0.0>) and I think ML has the same default.
> ML is doing OK, but I would guess that if you use like 0.02 for MLs
> threshold you would see some improvement.
>
> Hypre is doing pretty bad also.  I suspect that it is getting confused as
> well.  I know less about how to deal with hypre.
>
> If you use -info and grep on GAMG you will see about 20 lines that will
> tell you the number of equations on level and the average number of
> non-zeros per row.  In 3D the reduction per level should be -- very
> approximately -- 30x and the number of non-zeros per row should not
> explode, but getting up to several hundred is OK.
>
> If you care to test this we should be able to get ML and GAMG to agree
> pretty well.  ML is a nice solver, but our core numerics should be about
> the same.  I tested this on a 3D elasticity problem a few years ago.  That
> said, I think your ML solve is pretty good.
>
> Mark
>
>
>
>
> On Thu, Mar 3, 2016 at 4:36 AM, Lawrence Mitchell <
> lawrence.mitch...@imperial.ac.uk> wrote:
>
>> On 02/03/16 22:28, Justin Chang wrote:
>> ...
>>
>>
>> > Down solver (pre-smoother) on level 3
>> >
>> >   KSP Object:  (solver_fieldsplit_1_mg_levels_3_)
>> > linear system matrix = precond matrix:
>> ...
>> > Mat Object: 1 MPI processes
>> >
>> >   type: seqaij
>> >
>> >   rows=52147, cols=52147
>> >
>> >   total: nonzeros=38604909, allocated nonzeros=38604909
>> >
>> >   total number of mallocs used during MatSetValues calls =2
>> >
>> > not using I-node routines
>> >
>> > Down solver (pre-smoother) on level 4
>> >
>> >   KSP Object:  (solver_fieldsplit_1_mg_levels_4_)
>> > linear system matrix followed by preconditioner matrix:
>> >
>> > Mat Object:(solver_fieldsplit_1_)
>>
>> ...
>> >
>> > Mat Object: 1 MPI processes
>> >
>> >   type: seqaij
>> >
>> >   rows=384000, cols=384000
>> >
>> >   total: nonzeros=3416452, allocated nonzeros=3416452
>>
>>
>> This looks pretty suspicious to me.  The original matrix on the finest
>> level has 3.8e5 rows and ~3.4e6 nonzeros.  The next level up, the
>> coarsening produces 5.2e4 rows, but 38e6 nonzeros.
>>
>> FWIW, although Justin's PETSc is from Oct 2015, I get the same
>> behaviour with:
>>
>> ad5697c (Master as of 1st March).
>>
>> If I compare with the coarse operators that ML produces on the same
>> problem:
>>
>> The original matrix has, again:
>>
>> Mat Object: 1 MPI processes
>>   type: seqaij
>>   rows=384000, cols=384000
>>   total: nonzeros=3416452, allocated nonzeros=3416452
>>   total number of mallocs used during MatSetValues calls=0
>> not using I-node routines
>>
>> While the next finest level has:
>>
>> Mat Object: 1 MPI processes
>>   type: seqaij
>>   rows=65258, cols=65258
>>   total: nonzeros=1318400, allocated nonzeros=1318400
>>   total number of mallocs used during MatSetValues calls=0
>> not using I-node routines
>>
>> So we have 6.5e4 rows and 1.3e6 nonzeros, which seems more plausible.
>>
>> Cheers,
>>
>> Lawrence
>>
>>
>


[petsc-users] Strange GAMG performance for mixed FE formulation

2016-03-02 Thread Justin Chang
Dear all,

Using the firedrake project, I am solving this simple mixed poisson problem:

mesh = UnitCubeMesh(40,40,40)
V = FunctionSpace(mesh,"RT",1)
Q = FunctionSpace(mesh,"DG",0)
W = V*Q

v, p = TrialFunctions(W)
w, q = TestFunctions(W)

f = Function(Q)
f.interpolate(Expression("12*pi*pi*sin(pi*x[0]*2)*sin(pi*x[1]*2)*sin(2*pi*x[2])"))

a = dot(v,w)*dx - p*div(w)*dx + div(v)*q*dx
L = f*q*dx

u = Function(W)
solve(a==L,u,solver_parameters={...})

This problem has 1161600 degrees of freedom. The solver_parameters are:

-ksp_type gmres
-pc_type fieldsplit
-pc_fieldsplit_type schur
-pc_fieldsplit_schur_fact_type: upper
-pc_fieldsplit_schur_precondition selfp
-fieldsplit_0_ksp_type preonly
-fieldsplit_0_pc_type bjacobi
-fieldsplit_1_ksp_type preonly
-fieldsplit_1_pc_type hypre/ml/gamg

for the last option, I compared the wall-clock timings for hypre, ml,and
gamg. Here are the strong-scaling results (across 64 cores, 8 cores per
Intel Xeon E5-2670 node) for hypre, ml, and gamg:

hypre:
1 core: 47.5 s, 12 solver iters
2 cores: 34.1 s, 15 solver iters
4 cores: 21.5 s, 15 solver iters
8 cores: 16.6 s, 15 solver iters
16 cores: 10.2 s, 15 solver iters
24 cores: 7.66 s, 15 solver iters
32 cores: 6.31 s, 15 solver iters
40 cores: 5.68 s, 15 solver iters
48 cores: 5.36 s, 16 solver iters
56 cores: 5.12 s, 16 solver iters
64 cores: 4.99 s, 16 solver iters

ml:
1 core: 4.44 s, 14 solver iters
2 cores: 2.85 s, 16 solver iters
4 cores: 1.6 s, 17 solver iters
8 cores: 0.966 s, 17 solver iters
16 cores: 0.585 s, 18 solver iters
24 cores: 0.440 s, 18 solver iters
32 cores: 0.375 s, 18 solver iters
40 cores: 0.332 s, 18 solver iters
48 cores: 0.307 s, 17 solver iters
56 cores: 0.290 s, 18 solver iters
64 cores: 0.281 s, 18 solver items

gamg:
1 core: 613 s, 12 solver iters
2 cores: 204 s, 15 solver iters
4 cores: 77.1 s, 15 solver iters
8 cores: 38.1 s, 15 solver iters
16 cores: 15.9 s, 16 solver iters
24 cores: 9.24 s, 16 solver iters
32 cores: 5.92 s, 16 solver iters
40 cores: 4.72 s, 16 solver iters
48 cores: 3.89 s, 16 solver iters
56 cores: 3.65 s, 16 solver iters
64 cores: 3.46 s, 16 solver iters

The performance difference between ML and HYPRE makes sense to me, but what
I am really confused about is GAMG. It seems GAMG is really slow on a
single core but something internally is causing it to speed up
super-linearly as I increase the number of MPI processes. Shouldn't ML and
GAMG have the same performance? I am not sure what log outputs to give you
guys, but for starters, below is -ksp_view for the single core case with
GAMG

KSP Object:(solver_) 1 MPI processes

  type: gmres

GMRES: restart=30, using Classical (unmodified) Gram-Schmidt
Orthogonalization with no iterative refinement

GMRES: happy breakdown tolerance 1e-30

  maximum iterations=1, initial guess is zero

  tolerances:  relative=1e-07, absolute=1e-50, divergence=1.

  left preconditioning

  using PRECONDITIONED norm type for convergence test

PC Object:(solver_) 1 MPI processes

  type: fieldsplit

FieldSplit with Schur preconditioner, factorization UPPER

Preconditioner for the Schur complement formed from Sp, an assembled
approximation to S, which uses (lumped, if requested) A00's diagonal's
inverse

Split info:

Split number 0 Defined by IS

Split number 1 Defined by IS

KSP solver for A00 block

  KSP Object:  (solver_fieldsplit_0_)   1 MPI processes

type: preonly

maximum iterations=1, initial guess is zero

tolerances:  relative=1e-05, absolute=1e-50, divergence=1.

left preconditioning

using NONE norm type for convergence test

  PC Object:  (solver_fieldsplit_0_)   1 MPI processes

type: bjacobi

  block Jacobi: number of blocks = 1

  Local solve is same for all blocks, in the following KSP and PC
objects:

  KSP Object:  (solver_fieldsplit_0_sub_)   1 MPI
processes

type: preonly

maximum iterations=1, initial guess is zero

tolerances:  relative=1e-05, absolute=1e-50, divergence=1.

left preconditioning

using NONE norm type for convergence test

  PC Object:  (solver_fieldsplit_0_sub_)   1 MPI
processes

type: ilu

  ILU: out-of-place factorization

  0 levels of fill

  tolerance for zero pivot 2.22045e-14

  matrix ordering: natural

  factor fill ratio given 1., needed 1.

Factored matrix follows:

  Mat Object:   1 MPI processes

type: seqaij

rows=777600, cols=777600

package used to perform factorization: petsc

total: nonzeros=5385600, allocated nonzeros=5385600

total number of mallocs used during MatSetValues calls
=0

  not using I-node routines


[petsc-users] Do more solver iterations = more communication?

2016-02-18 Thread Justin Chang
Hi all,

For a poisson problem with roughly 1 million dofs (using second-order
elements), I solved the problem using two different solver/preconditioner
combinations: CG/ILU and CG/GAMG.

ILU takes roughly 82 solver iterations whereas with GAMG it takes 14
iterations (wall clock time is roughly 15 and 46 seconds respectively). I
have seen from previous mailing threads that there is a strong correlation
between solver iterations and communication (which could lead to less
strong-scaling scalability). It makes sense to me if I strictly use one of
these preconditioners to solve two different problems and compare the
number of respective iterations, but what about solving the same problem
with two different preconditioners?

If GAMG takes 14 iterations whereas ILU takes 82 iterations, does this
necessarily mean GAMG has less communication? I would think that the
"bandwidth" that happens within a single GAMG iteration would be much
greater than that within a single ILU iteration. Is there a way to
officially determine this?

I see from log_summary that we have this information:

MPI Messages: 5.000e+00  1.0   5.000e+00  5.000e+00
MPI Message Lengths:  5.816e+07  1.0   1.163e+07  5.816e+07
MPI Reductions:   2.000e+01  1.0

Can this information be used to determine the "bandwidth"? If so, does
PETSc have the ability to document this for other preconditioner packages
like HYPRE's BoomerAMG or Trilinos' ML?

Thanks,
Justin


[petsc-users] Optimization methods in PETSc/TAO

2016-01-22 Thread Justin Chang
Hi all,

Consider the following problem:

minimize  1/2<c,Kc> - <c,f>
subject to  c >= 0 (P1)

To solve (P1) using TAO, I recall that there were two recommended solvers
to use: TRON and BLMVM

I recently got reviews for this paper of mine that uses BLMVM and got
hammered for this, as I quote, "convenient yet inadequate choice" of
solver.

It was suggested that I use either semi smooth Newton methods or projected
Newton methods for the optimization problem. My question is, are these
methodologies/solvers available currently within PETSc/TAO?

1) I see that we have SNESVINEWTONSSLS, and I tried this over half a year
ago but it didn't seem to work. I believe I was told by one of the PETSc
developers (Matt?) that this was not the one to use?

2) Is TRON a type of projected Newton method? I know it's an active-set
Newton trust region, but is this a well-accepted high performing
optimization method to use?

I was also referred to ROL: https://trilinos.org/packages/rol but I am
guessing this isn't accessible/downloadable from petsc at the moment?

Thanks,
Justin


Re: [petsc-users] Optimization methods in PETSc/TAO

2016-01-22 Thread Justin Chang
This was one of the citations provided:

M. Ulbrich, "Semismooth Newton Methods for Variational
Inequalities and Constrained Optimization Problems in
Function Spaces", SIAM, 2011,

Haven't looked into this in detail, but is what's described in that book
equivalent to the SNESVINEWTONSSLS?

On Fri, Jan 22, 2016 at 3:42 PM, Matthew Knepley <knep...@gmail.com> wrote:

> On Fri, Jan 22, 2016 at 4:27 PM, Justin Chang <jychan...@gmail.com> wrote:
>
>> Hi all,
>>
>> Consider the following problem:
>>
>> minimize  1/2<c,Kc> - <c,f>
>> subject to  c >= 0 (P1)
>>
>> To solve (P1) using TAO, I recall that there were two recommended solvers
>> to use: TRON and BLMVM
>>
>> I recently got reviews for this paper of mine that uses BLMVM and got
>> hammered for this, as I quote, "convenient yet inadequate choice" of
>> solver.
>>
>
> If they did not back this up with a citation it is just empty snobbery,
> not surprising from some quarters.
>
>
>> It was suggested that I use either semi smooth Newton methods or
>> projected Newton methods for the optimization problem. My question is, are
>> these methodologies/solvers available currently within PETSc/TAO?
>>
>
> You can Google TRON and BLMVM and they come up on the NEOS pages. BLMVM is
> a gradient descent method, but
> TRON is a Newton method, so trying it may silence the doubters.
>
>   Matt
>
>
>> 1) I see that we have SNESVINEWTONSSLS, and I tried this over half a year
>> ago but it didn't seem to work. I believe I was told by one of the PETSc
>> developers (Matt?) that this was not the one to use?
>>
>> 2) Is TRON a type of projected Newton method? I know it's an active-set
>> Newton trust region, but is this a well-accepted high performing
>> optimization method to use?
>>
>> I was also referred to ROL: https://trilinos.org/packages/rol but I am
>> guessing this isn't accessible/downloadable from petsc at the moment?
>>
>> Thanks,
>> Justin
>>
>
>
>
> --
> What most experimenters take for granted before they begin their
> experiments is infinitely more interesting than any results to which their
> experiments lead.
> -- Norbert Wiener
>


Re: [petsc-users] HPCToolKit/HPCViewer on OS X

2016-01-13 Thread Justin Chang
HPCToolkit for MacOSX doesn't require any installation.

Just go to:

http://hpctoolkit.org/download/hpcviewer/

and download this file:

hpctraceviewer-5.4.2-r20160111-macosx.cocoa.x86_64.zip

Important note: be sure to unzip the file via the terminal, not with
Finder. It may screw up the GUI. On my MacOSX I had to "Download Linked
File As..."

Then you can drag the corresponding hpctraceviewer.app into your
Applications directory. Now you *should* be good to go.

Thanks,
Justin

On Wed, Jan 13, 2016 at 10:17 PM, Griffith, Boyce Eugene <
boy...@email.unc.edu> wrote:

> I see one hot spot:
>
> On Jan 14, 2016, at 12:12 AM, Bhalla, Amneet Pal S 
> wrote:
>
>   ##
>   ##
>   #  WARNING!!!#
>   ##
>   #   This code was compiled with a debugging option,  #
>   #   To get timing results run ./configure#
>   #   using --with-debugging=no, the performance will  #
>   #   be generally two or three times faster.  #
>   ##
>   ##
>
>
>


Re: [petsc-users] Why use MATMPIBAIJ?

2016-01-13 Thread Justin Chang
Okay that makes sense, thanks

On Wed, Jan 13, 2016 at 10:12 PM, Barry Smith <bsm...@mcs.anl.gov> wrote:

>
> > On Jan 13, 2016, at 10:24 PM, Justin Chang <jychan...@gmail.com> wrote:
> >
> > Thanks Barry,
> >
> > 1) So for block matrices, the ja array is smaller. But what's the
> "hardware" explanation for this performance improvement? Does it have to do
> with spatial locality where you are more likely to reuse data in that ja
> array, or does it have to do with the fact that loading/storing smaller
> arrays are less likely to invoke a cache miss, thus reducing the amount of
> bandwidth?
>
> There are two distinct reasons for the improvement:
>
> 1) For 5 by 5 blocks the ja array is 1/25th the size. The "hardware"
> savings is that you have to load something that is much smaller than
> before. Cache/spatial locality have nothing to do with this particular
> improvement.
>
> 2) The other improvement comes from the reuse of each x[j] value
> multiplied by 5 values (a column) of the little block. The hardware
> explanation is that x[j] can be reused in a register for the 5 multiplies
> (while otherwise it would have to come from cache to register 5 times and
> sometimes might even have been flushed from the cache so would have to come
> from memory). This is why we have code like
>
> for (j=0; j<n; j++) {
>   xb= x + 5*(*idx++);
>   x1= xb[0]; x2 = xb[1]; x3 = xb[2]; x4 = xb[3]; x5 = xb[4];
>   sum1 += v[0]*x1 + v[5]*x2 + v[10]*x3  + v[15]*x4 + v[20]*x5;
>   sum2 += v[1]*x1 + v[6]*x2 + v[11]*x3  + v[16]*x4 + v[21]*x5;
>   sum3 += v[2]*x1 + v[7]*x2 + v[12]*x3  + v[17]*x4 + v[22]*x5;
>   sum4 += v[3]*x1 + v[8]*x2 + v[13]*x3  + v[18]*x4 + v[23]*x5;
>   sum5 += v[4]*x1 + v[9]*x2 + v[14]*x3  + v[19]*x4 + v[24]*x5;
>   v+= 25;
> }
>
> to do the block multiple.
>
> >
> > 2) So if one wants to assemble a monolithic matrix (i.e., aggregation of
> more than one dof per point) then using the BAIJ format is highly
> advisable. But if I want to form a nested matrix, say I am solving Stokes
> equation, then each "submatrix" is of AIJ format? Can these sub matrices
> also be BAIJ?
>
>Sure, but if you have separated all the variables of pressure,
> velocity_x, velocity_y, etc into there own regions of the vector then the
> block size for the sub matrices would be 1 so BAIJ does not help.
>
>There are Stokes solvers that use Vanka smoothing that keep the
> variables interlaced and hence would use BAIJ and NOT use fieldsplit
>
>
> >
> > Thanks,
> > Justin
> >
> > On Wed, Jan 13, 2016 at 9:12 PM, Barry Smith <bsm...@mcs.anl.gov> wrote:
> >
> > > On Jan 13, 2016, at 9:57 PM, Justin Chang <jychan...@gmail.com> wrote:
> > >
> > > Hi all,
> > >
> > > 1) I am guessing MATMPIBAIJ could theoretically have better
> performance than simply using MATMPIAIJ. Why is that? Is it similar to the
> reasoning that block (dense) matrix-vector multiply is "faster" than simple
> matrix-vector?
> >
> >   See for example table 1 in
> http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.38.7668&rep=rep1&type=pdf
> >
> > >
> > > 2) I am looking through the manual and online documentation and it
> seems the term "block" used everywhere. In the section on "block matrices"
> (3.1.3 of the manual), it refers to field splitting, where you could either
> have a monolithic matrix or a nested matrix. Does that concept have
> anything to do with MATMPIBAIJ?
> >
> >Unfortunately the numerical analysis literature uses the term block
> in multiple ways. For small blocks, sometimes called "point-block" with
> BAIJ and for very large blocks (where the blocks are sparse themselves). I
> used fieldsplit for big sparse blocks to try to avoid confusion in PETSc.
> > >
> > > It makes sense to me that one could create a BAIJ where if you have 5
> dofs of the same type of physics (e.g., five different primary species of a
> geochemical reaction) per grid point, you could create a block size of 5.
> And if you have different physics (e.g., velocity and pressure) you would
> ideally want to separate them out (i.e., nested matrices) for better
> preconditioning.
> >
> >Sometimes you put them together with BAIJ and sometimes you keep them
> separate with nested matrices.
> >
> > >
> > > Thanks,
> > > Justin
> >
> >
>
>
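
To make the point-block idea concrete, here is a minimal sketch of assembling
with a block size of 5 (say, five species per grid point). The sizes,
preallocation numbers, and block values are illustrative:

Mat            A;
PetscInt       bs = 5, nblockrows = 1000;        /* illustrative sizes             */
PetscInt       i  = 0, j = 0, k;
PetscScalar    vals[25];                         /* one dense 5x5 block, row-major */
PetscErrorCode ierr;

for (k = 0; k < 25; k++) vals[k] = 1.0;          /* placeholder block entries      */
ierr = MatCreate(PETSC_COMM_WORLD, &A);CHKERRQ(ierr);
ierr = MatSetSizes(A, PETSC_DECIDE, PETSC_DECIDE, bs*nblockrows, bs*nblockrows);CHKERRQ(ierr);
ierr = MatSetType(A, MATBAIJ);CHKERRQ(ierr);     /* becomes SEQBAIJ or MPIBAIJ     */
ierr = MatSeqBAIJSetPreallocation(A, bs, 7, NULL);CHKERRQ(ierr);
ierr = MatMPIBAIJSetPreallocation(A, bs, 7, NULL, 3, NULL);CHKERRQ(ierr);
/* insert one 5x5 block at block-row i, block-column j */
ierr = MatSetValuesBlocked(A, 1, &i, 1, &j, vals, INSERT_VALUES);CHKERRQ(ierr);
ierr = MatAssemblyBegin(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
ierr = MatAssemblyEnd(A, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);

The same values could go into an AIJ matrix with MatSetValues(); the BAIJ gain
is purely the smaller ja array and the register reuse described above.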


Re: [petsc-users] Why use MATMPIBAIJ?

2016-01-13 Thread Justin Chang
Thanks Barry,

1) So for block matrices, the ja array is smaller. But what's the
"hardware" explanation for this performance improvement? Does it have to do
with spatial locality where you are more likely to reuse data in that ja
array, or does it have to do with the fact that loading/storing smaller
arrays are less likely to invoke a cache miss, thus reducing the amount of
bandwidth?

2) So if one wants to assemble a monolithic matrix (i.e., aggregation of
more than one dof per point) then using the BAIJ format is highly
advisable. But if I want to form a nested matrix, say I am solving Stokes
equation, then each "submatrix" is of AIJ format? Can these sub matrices
also be BAIJ?

Thanks,
Justin

On Wed, Jan 13, 2016 at 9:12 PM, Barry Smith <bsm...@mcs.anl.gov> wrote:

>
> > On Jan 13, 2016, at 9:57 PM, Justin Chang <jychan...@gmail.com> wrote:
> >
> > Hi all,
> >
> > 1) I am guessing MATMPIBAIJ could theoretically have better performance
> than simply using MATMPIAIJ. Why is that? Is it similar to the reasoning
> that block (dense) matrix-vector multiply is "faster" than simple
> matrix-vector?
>
>   See for example table 1 in
> http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.38.7668&rep=rep1&type=pdf
>
> >
> > 2) I am looking through the manual and online documentation and it seems
> the term "block" used everywhere. In the section on "block matrices" (3.1.3
> of the manual), it refers to field splitting, where you could either have a
> monolithic matrix or a nested matrix. Does that concept have anything to do
> with MATMPIBAIJ?
>
>Unfortunately the numerical analysis literature uses the term block in
> multiple ways. For small blocks, sometimes called "point-block" with BAIJ
> and for very large blocks (where the blocks are sparse themselves). I used
> fieldsplit for big sparse blocks to try to avoid confusion in PETSc.
> >
> > It makes sense to me that one could create a BAIJ where if you have 5
> dofs of the same type of physics (e.g., five different primary species of a
> geochemical reaction) per grid point, you could create a block size of 5.
> And if you have different physics (e.g., velocity and pressure) you would
> ideally want to separate them out (i.e., nested matrices) for better
> preconditioning.
>
>Sometimes you put them together with BAIJ and sometimes you keep them
> separate with nested matrices.
>
> >
> > Thanks,
> > Justin
>
>


[petsc-users] Difference between Block Jacobi and ILU?

2016-01-13 Thread Justin Chang
Hi all,

What exactly is the difference between these two preconditioners? When I
use them to solve a Galerkin finite element poisson problem, I get the
exact same performance (iterations, wall-clock time, etc). Only thing is I
can't seem to use ILU in parallel though.

Thanks,
Justin


Re: [petsc-users] Difference between Block Jacobi and ILU?

2016-01-13 Thread Justin Chang
Thanks Satish,

And yes I meant sequentially.

On Wed, Jan 13, 2016 at 8:26 PM, Satish Balay <ba...@mcs.anl.gov> wrote:

> On Wed, 13 Jan 2016, Justin Chang wrote:
>
> > Hi all,
> >
> > What exactly is the difference between these two preconditioners? When I
> > use them to solve a Galerkin finite element poisson problem, I get the
> > exact same performance (iterations, wall-clock time, etc).
>
> you mean - when you run sequentially?
>
> With block jacobi - you decide the number of blocks. The default is
> 1-block/proc,
> i.e., for a sequential run you have only 1 block, i.e., the whole matrix.
>
> So the following are essentially the same:
> -pc_type bjacobi -pc_bjacobi_blocks 1 [default] -sub_pc_type ilu [default]
> -pc_type ilu
>
> Satish
>
> > Only thing is I can't seem to use ILU in parallel though.
>
>
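
A small sketch of the same configuration done in code rather than through the
options database, assuming ksp already has its operators set:

PC             pc, subpc;
KSP           *subksp;
PetscInt       nlocal, first, i;
PetscErrorCode ierr;

ierr = KSPGetPC(ksp, &pc);CHKERRQ(ierr);
ierr = PCSetType(pc, PCBJACOBI);CHKERRQ(ierr);
ierr = KSPSetUp(ksp);CHKERRQ(ierr);                         /* blocks exist only after setup  */
ierr = PCBJacobiGetSubKSP(pc, &nlocal, &first, &subksp);CHKERRQ(ierr);
for (i = 0; i < nlocal; i++) {
  ierr = KSPGetPC(subksp[i], &subpc);CHKERRQ(ierr);
  ierr = PCSetType(subpc, PCILU);CHKERRQ(ierr);             /* or PCLU for a direct sub-solve */
}

With the default of one block per process, a single-process run reduces to
ILU on the whole matrix, which is the equivalence described above.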


[petsc-users] Why use MATMPIBAIJ?

2016-01-13 Thread Justin Chang
Hi all,

1) I am guessing MATMPIBAIJ could theoretically have better performance
than simply using MATMPIAIJ. Why is that? Is it similar to the reasoning
that block (dense) matrix-vector multiply is "faster" than simple
matrix-vector?

2) I am looking through the manual and online documentation and it seems
the term "block" used everywhere. In the section on "block matrices" (3.1.3
of the manual), it refers to field splitting, where you could either have a
monolithic matrix or a nested matrix. Does that concept have anything to do
with MATMPIBAIJ?

It makes sense to me that one could create a BAIJ where if you have 5 dofs
of the same type of physics (e.g., five different primary species of a
geochemical reaction) per grid point, you could create a block size of 5.
And if you have different physics (e.g., velocity and pressure) you would
ideally want to separate them out (i.e., nested matrices) for better
preconditioning.

Thanks,
Justin


Re: [petsc-users] Array of SNES's

2016-01-05 Thread Justin Chang
Okay so i think there's no need for this in my case.

Doing a standard NewtonLS and using the same SNES was no issue at all. My
original issue was dealing with the Variational Inequality at each grid
point which seemed to break down unless I "reset" the SNES. But when I use
these options: -snes_fd -ksp_type preonly -pc_type lu, it works perfectly
if I use the same SNES for all N grid points. Yes, I sacrifice some time
forming the finite-difference Jacobian, but that cost is not as great as
creating a new SNES object each time.

Strange

Justin
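
For the archive, a minimal sketch of that reuse pattern, written in C for
brevity (the petsc4py calls mirror it one-to-one). FormPointFunction,
PointCtx, and npoints are illustrative names, and the size-1 objects stand in
for the per-dof subproblems:

typedef struct { PetscInt point; /* plus any per-point data */ } PointCtx;

extern PetscErrorCode FormPointFunction(SNES, Vec, Vec, void*);  /* user residual */

/* inside main(), after PetscInitialize(): */
SNES           snes;
Mat            J;
Vec            x, r, xl, xu;
PointCtx       ctx;
PetscInt       p, npoints = 100;                  /* illustrative */
PetscErrorCode ierr;

ierr = VecCreateSeq(PETSC_COMM_SELF, 1, &x);CHKERRQ(ierr);
ierr = VecDuplicate(x, &r);CHKERRQ(ierr);
ierr = VecDuplicate(x, &xl);CHKERRQ(ierr);
ierr = VecDuplicate(x, &xu);CHKERRQ(ierr);
ierr = MatCreateSeqDense(PETSC_COMM_SELF, 1, 1, NULL, &J);CHKERRQ(ierr);
ierr = SNESCreate(PETSC_COMM_SELF, &snes);CHKERRQ(ierr);
ierr = SNESSetType(snes, SNESVINEWTONRSLS);CHKERRQ(ierr);
ierr = SNESSetFunction(snes, r, FormPointFunction, &ctx);CHKERRQ(ierr);
ierr = SNESSetJacobian(snes, J, J, SNESComputeJacobianDefault, NULL);CHKERRQ(ierr); /* the -snes_fd Jacobian */
ierr = SNESVISetVariableBounds(snes, xl, xu);CHKERRQ(ierr);
ierr = SNESSetFromOptions(snes);CHKERRQ(ierr);    /* picks up -ksp_type preonly -pc_type lu */
for (p = 0; p < npoints; p++) {
  ctx.point = p;                                  /* point the residual at dof p */
  /* load the initial guess for dof p into x here */
  ierr = SNESSolve(snes, NULL, x);CHKERRQ(ierr);
  /* harvest the post-processed value from x here */
}
ierr = SNESDestroy(&snes);CHKERRQ(ierr);          /* destroy J, x, r, xl, xu likewise */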

On Tue, Jan 5, 2016 at 9:30 PM, Patrick Sanan <patrick.sa...@gmail.com>
wrote:

> Do all the SNES's need to be constructed at the same time? It will
> obviously require a lot of memory to store N SNES objects (or perhaps
> your N is small), and if they don't all need to exist simultaneously, then
> do you have the option to
> create and destroy one at a time as you loop over your grid points?
> On Tue, Jan 05, 2016 at 09:24:11PM -0700, Justin Chang wrote:
> > Timothee,
> >
> > No i haven't tried, mainly because I don't know how. Btw I am not doing
> > this in C or FORTRAN, I want to do this in python (via petsc4py) since I
> am
> > trying to make this compatible with Firedrake (which is also
> python-based).
> >
> > Thanks,
> > Justin
> >
> > On Tue, Jan 5, 2016 at 8:57 PM, Timothée Nicolas <
> timothee.nico...@gmail.com
> > > wrote:
> >
> > > Hello and happy new year,
> > >
> > > Have you actually tried ? I just declared an array of 10 snes and
> created
> > > them, and there is no complaint whatsoever. Also, something I do
> usually is
> > > that I declare a derived type which contains some Petsc Objects (like
> SNES,
> > > KSP, matrices, vectors, whatever), and create arrays of this derived
> types.
> > > This works perfectly fine in my case (I use FORTRAN btw).
> > >
> > > Best wishes
> > >
> > > Timothee
> > >
> > >
> > > 2016-01-06 11:53 GMT+09:00 Justin Chang <jychan...@gmail.com>:
> > >
> > >> Hi all,
> > >>
> > >> Is it possible to create an array of SNES's? If I have a problem size
> N
> > >> degrees of freedom, I want each dof to have its own SNES solver
> (basically
> > >> a pointer to N SNES's). Reason for this is because I am performing a
> > >> "post-processing" step where after my global solve, each entry of my
> > >> solution vector of size N will go through some algebraic manipulation.
> > >>
> > >> If I did a standard LU solve for these individual SNES's, I could use
> the
> > >> same snes and this issue would be moot. But i am using the Variational
> > >> Inequality, which basically requires a fresh SNES for each problem.
> > >>
> > >> Thanks,
> > >> Justin
> > >>
> > >
> > >
>


[petsc-users] Array of SNES's

2016-01-05 Thread Justin Chang
Hi all,

Is it possible to create an array of SNES's? If I have a problem size N
degrees of freedom, I want each dof to have its own SNES solver (basically
a pointer to N SNES's). Reason for this is because I am performing a
"post-processing" step where after my global solve, each entry of my
solution vector of size N will go through some algebraic manipulation.

If I did a standard LU solve for these individual SNES's, I could use the
same snes and this issue would be moot. But i am using the Variational
Inequality, which basically requires a fresh SNES for each problem.

Thanks,
Justin


Re: [petsc-users] Array of SNES's

2016-01-05 Thread Justin Chang
Timothee,

No i haven't tried, mainly because I don't know how. Btw I am not doing
this in C or FORTRAN, I want to do this in python (via petsc4py) since I am
trying to make this compatible with Firedrake (which is also python-based).

Thanks,
Justin

On Tue, Jan 5, 2016 at 8:57 PM, Timothée Nicolas <timothee.nico...@gmail.com
> wrote:

> Hello and happy new year,
>
> Have you actually tried ? I just declared an array of 10 snes and created
> them, and there is no complaint whatsoever. Also, something I do usually is
> that I declare a derived type which contains some Petsc Objects (like SNES,
> KSP, matrices, vectors, whatever), and create arrays of this derived types.
> This works perfectly fine in my case (I use FORTRAN btw).
>
> Best wishes
>
> Timothee
>
>
> 2016-01-06 11:53 GMT+09:00 Justin Chang <jychan...@gmail.com>:
>
>> Hi all,
>>
>> Is it possible to create an array of SNES's? If I have a problem size N
>> degrees of freedom, I want each dof to have its own SNES solver (basically
>> a pointer to N SNES's). Reason for this is because I am performing a
>> "post-processing" step where after my global solve, each entry of my
>> solution vector of size N will go through some algebraic manipulation.
>>
>> If I did a standard LU solve for these individual SNES's, I could use the
>> same snes and this issue would be moot. But i am using the Variational
>> Inequality, which basically requires a fresh SNES for each problem.
>>
>> Thanks,
>> Justin
>>
>
>


[petsc-users] Status on parallel mesh reader in DMPlex

2015-12-18 Thread Justin Chang
Hi all,

What's the status on the implementation of the parallel
mesh reader/generator for DMPlex meshes? Is anyone actively working on
this? If so is there a branch that I can peek into?

Thanks,
Justin


Re: [petsc-users] Distributing data along with DMPlex

2015-12-17 Thread Justin Chang
So you would use something like DMPlexDistributeField()
(http://www.mcs.anl.gov/petsc/petsc-dev/docs/manualpages/DM/DMPlexDistributeField.html)
in that case. You have your original/global DM, PetscSection, and Vec of
values. Then, using the PetscSF that was created during DMPlexDistribute(),
you map the data from the original layout onto the distributed one.
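
A minimal sketch of that call sequence, assuming dm, sec, and vec hold the
original (rank-0) mesh, PetscSection, and data, and taking the first argument
of DMPlexDistributeField() to be the DMPlex the field lives on, per its man
page:

DM             dmDist;
PetscSF        sf;
PetscSection   secDist;
Vec            vecDist;
PetscErrorCode ierr;

ierr = DMPlexDistribute(dm, 0, &sf, &dmDist);CHKERRQ(ierr);
ierr = PetscSectionCreate(PetscObjectComm((PetscObject)dmDist), &secDist);CHKERRQ(ierr);
ierr = VecCreate(PetscObjectComm((PetscObject)dmDist), &vecDist);CHKERRQ(ierr);
/* pushes (sec, vec) from the original point numbering onto the distributed one */
ierr = DMPlexDistributeField(dm, sf, sec, vec, secDist, vecDist);CHKERRQ(ierr);
ierr = PetscSFDestroy(&sf);CHKERRQ(ierr);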

On Thu, Dec 17, 2015 at 11:21 AM, Alejandro D Otero <aot...@fi.uba.ar>
wrote:

> Thanks,
> The problem then is that after DMPlexDistribute the DMPlex 'points' are
> renumbered. So if the values are related to each point in the original
> numbering how do I set the values after the distribution. I know the
> property stored in the vector related to the entities with the numbering of
> the original mesh which I use to create the first DMPlex.
>
> Ideally for me, I would like to set the values in the vector before
> DMPlexDistribute and get the vector components renumbered and redistributed
> accordingly in a global vector. And then, get the local vector.
>
> Hope it could be more clear now.
> Regards,
>
> Alejandro
>
>
>
> On Wed, Dec 16, 2015 at 7:01 PM, Justin Chang <jychan...@gmail.com> wrote:
>
>> I think you would follow this order:
>>
>> 1*) create a DMPlex (depth, chart, etc) on rank 0. Other ranks have an
>> empty DM
>>
>> 2) DMPlexDistribute()
>>
>> 3*) Create the PetscSection
>>
>> 4) DMCreateGlobalVector()
>>
>> 5) DMCreateLocalVector()
>>
>> Now you have a global vector and a local vector for your distributed
>> DMPlex. The mapping/ghosting/etc of dofs is already taken care of.
>>
>> * if you're using standard Galerkin FE then in SNES examples 12 and 62
>> (and maybe others?) the first step is handled through the mesh generation
>> functions and step 3 is handled through step 4
>>
>> Thanks,
>> Justin
>>
>> On Wednesday, December 16, 2015, Alejandro D Otero <aot...@fi.uba.ar>
>> wrote:
>>
>>> Hi, I need some help understanding how to distribute data together with
>>> a dmplex representing a FE mesh.
>>> At the beginning I define the structure of the dmplex assigning certain
>>> number of DoF to cells, edges and vertexes, in one process (the dmplex in
>>> the rest is empty)
>>> I create a PetscSection and an associated global vector with the
>>> quantities I want to store.
>>> Then I distribute the dmplex over all the processes.
>>> * Although this does not perform well it is just a starting point. I
>>> know it has to be improved.
>>>
>>> I would like to have the global vector distributed accordingly so that
>>> each process has access to the corresponding local part with its DoF
>>> (possibly adding some ghost values corresponding to the shared DoF not
>>> taken care by it).
>>>
>>> Is there any 'correct' way to do that in PETSc?
>>>
>>> Thanks in advance,
>>>
>>> Alejandro
>>>
>>
>


Re: [petsc-users] Distributing data along with DMPlex

2015-12-16 Thread Justin Chang
I think you would follow this order:

1*) create a DMPlex (depth, chart, etc) on rank 0. Other ranks have an
empty DM

2) DMPlexDistribute()

3*) Create the PetscSection

4) DMCreateGlobalVector()

5) DMCreateLocalVector()

Now you have a global vector and a local vector for your distributed
DMPlex. The mapping/ghosting/etc of dofs is already taken care of.

* if you're using standard Galerkin FE then in SNES examples 12 and 62 (and
maybe others?) the first step is handled through the mesh generation
functions and step 3 is handled through step 4
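
For concreteness, a rough petsc4py sketch of that sequence (illustrative
only: the tiny two-triangle mesh, the one-dof-per-point section, and the
exact name of the section-attach call are assumptions; the latter was
setDefaultSection in 2015-era petsc4py and setLocalSection in recent
versions):

  import numpy as np
  from petsc4py import PETSc

  comm = PETSc.COMM_WORLD
  if comm.rank == 0:                                   # 1) mesh defined on rank 0 only
      cells  = np.asarray([[0, 1, 2], [1, 3, 2]], dtype=np.int32)
      coords = np.asarray([[0., 0.], [1., 0.], [0., 1.], [1., 1.]])
  else:                                                #    empty mesh on the other ranks
      cells  = np.zeros((0, 3), dtype=np.int32)
      coords = np.zeros((0, 2))
  dm = PETSc.DMPlex().createFromCellList(2, cells, coords, comm=comm)

  dm.distribute(overlap=0)                             # 2) every rank now owns part of the mesh

  sec = PETSc.Section().create(comm=comm)              # 3) dof layout over the distributed points
  pStart, pEnd = dm.getChart()
  sec.setChart(pStart, pEnd)
  for p in range(pStart, pEnd):
      sec.setDof(p, 1)                                 #    e.g. one dof per point; adjust per field
  sec.setUp()
  dm.setLocalSection(sec)                              #    setDefaultSection in older petsc4py

  gvec = dm.createGlobalVec()                          # 4)
  lvec = dm.createLocalVec()                           # 5)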

Thanks,
Justin

On Wednesday, December 16, 2015, Alejandro D Otero  wrote:

> Hi, I need some help understanding how to distribute data together with a
> dmplex representing a FE mesh.
> At the beginning I define the structure of the dmplex assigning certain
> number of DoF to cells, edges and vertexes, in one process (the dmplex in
> the rest is empty)
> I create a PetscSection and an associated global vector with the
> quantities I want to store.
> Then I distribute the dmplex over all the processes.
> * Although this does not perform well it is just a starting point. I know
> it has to be improved.
>
> I would like to have the global vector distributed accordingly so that
> each process has access to the corresponding local part with its DoF
> (possibly adding some ghost values corresponding to the shared DoF not
> taken care by it).
>
> Is there any 'correct' way to do that in PETSc?
>
> Thanks in advance,
>
> Alejandro
>


Re: [petsc-users] Solving/creating SPD systems

2015-12-11 Thread Justin Chang
Pardon me for my apparent lack of understanding of what may be simple
concepts, but why is the div(u)*div(v) term singular in the context of LSFEM?

On Fri, Dec 11, 2015 at 12:15 PM, Jed Brown <j...@jedbrown.org> wrote:

> Justin Chang <jychan...@gmail.com> writes:
>
> > Jed,
> >
> > 1) What exactly are the PETSc options for CGNE?
>
> -ksp_type cgne
>
> (Conjugate Gradients on the Normal Equations)
>
> > Also, would LSQR be worth trying? I am doing all of this through
> > Firedrake, so I hope these things can be done directly through simply
> > providing command line PETSc options :)
>
> You can try, but I think this line of thinking is getting off in the weeds.
>
> > 2) So I spoke with Matt the other day, and the primary issue I am having
> > with LSFEM is finding a suitable preconditioner for the problematic penalty
> > term in Darcy's equation (i.e., the div-div term). So if I had this:
> >
> > (u, v) + (u, grad(q)) + (grad(p), v) + (grad(p), grad(q)) + (div(u), div(v))
> > = (rho*b, v + grad(q))
> >
> > If I remove the div-div term, I have a very nice SPD system which could
> > simply be solved with CG/HYPRE. Do you know of any good preconditioning
> > strategies for this type of problem?
>
> That term is singular, so if the penalty is strong, it will be a bear to
> solve.
>
> Penalties suck.
>
> Sometimes you can add more variables to get better compatibility.  See
> FOSLL*, for example.
>
> My opinion is that least squares methods are riddled with lame
> compromises and tradeoffs that you shouldn't have to make.  If you want
> something robust, use compatible spaces and (possibly) deal with the
> fact that you are solving a saddle point problem.
>


Re: [petsc-users] Solving/creating SPD systems

2015-12-11 Thread Justin Chang
Jed,

1) What exactly are the PETSc options for CGNE? Also, would LSQR be worth
trying? I am doing all of this through Firedrake, so I hope these things
can be done directly through simply providing command line PETSc options :)

2) So I spoke with Matt the other day, and the primary issue I am having
with LSFEM is finding a suitable preconditioner for the problematic penalty
term in Darcy's equation (i.e., the div-div term). So if I had this:

(u, v) + (u, grad(q)) + (grad(p), v) + (grad(p), grad(q)) + (div(u), div(v))
= (rho*b, v + grad(q))

If I remove the div-div term, I have a very nice SPD system which could
simply be solved with CG/HYPRE. Do you know of any good preconditioning
strategies for this type of problem?
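
(For context, and assuming the standard least-squares construction, the weak
form above is the first variation of the Darcy least-squares functional

    J(u, p) = \tfrac{1}{2}\,\| u + \nabla p - \rho b \|^{2}_{L^2}
            + \tfrac{1}{2}\,\| \nabla\!\cdot u \|^{2}_{L^2},

whose Gateaux derivative in a direction (v, q) gives

    (u + \nabla p - \rho b,\; v + \nabla q) + (\nabla\!\cdot u,\; \nabla\!\cdot v) = 0,

i.e., exactly the terms written above, with the div-div contribution coming
from the second piece of the functional.)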

Thanks,
Justin

On Thu, Dec 10, 2015 at 9:13 PM, Jed Brown <j...@jedbrown.org> wrote:

> Justin Chang <jychan...@gmail.com> writes:
>
> > So I am wanting to compare the performance of various FEM discretizations
> > with their respective "best" possible solver/preconditioner. There
> > are saddle-point systems for which H(div) formulations like RT0 work, but then
> > there are others like LSFEM that are naturally SPD and so the CG solver can
> > be used (though finding a good preconditioner is still an open problem).
>
> LSFEM ensures an SPD system because it's effectively solving the normal
> equations.  You can use CGNE if you want, but it's not likely to work
> well.  Note that LS methods give up local conservation, which was the
> reason to choose an H(div) formulation in the first place.  You can use
> compatible spaces/Lagrange multipliers with LS techniques to maintain
> conservation, but you'll return to a saddle point problem (with more
> degrees of freedom).  There's no free lunch.
>
> > I have read and learned that the advantage of LSFEM is that it will always
> > give you an SPD system, even for non-linear problems (because what you do
> > is linearize the problem first and then minimize/take the Gateaux
> > derivative to get the weak form). But after talking to some people and
> > reading some stuff online, it seems one could also make non-SPD systems SPD
> > (hence eliminating what may be the only advantage of LSFEM).
>
> (Symmetric) saddle point problems have an SPD Schur complement.  Schur
> complements are dense in general, but some discretization choices (e.g.,
> quadrature in BDM) can make the primal block diagonal or block-diagonal,
> resulting in a sparse Schur complement.  If the Schur complement is
> dense, you might be able to approximate it by a sparse matrix.  The
> quality of such an approximation depends on the physics and the
> discretization.
>
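
(For reference, the structure Jed describes: a symmetric saddle-point system
$K = \begin{bmatrix} A & B^{T} \\ B & 0 \end{bmatrix}$ with $A$ SPD has the
Schur complement $S = B A^{-1} B^{T}$, which is symmetric positive
semi-definite, and positive definite whenever $B$ has full row rank; that is
the SPD operator one could, in principle, hand to CG.)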


[petsc-users] Fwd: Issues with the Variational Inequality solver

2015-12-03 Thread Justin Chang
Sorry forgot to hit reply all

-- Forwarded message --
From: Justin Chang <jychan...@gmail.com>
Date: Thu, Dec 3, 2015 at 2:40 PM
Subject: Re: [petsc-users] Issues with the Variational Inequality solver
To: Matthew Knepley <knep...@gmail.com>


Matt, for these problems I set an upper bound of 10, so I am not sure
that's related to the problem at hand. However, that is the *next* issue I
will bring up if this current one is resolved ;)

Another thing I found out...

The five case studies I listed used no preconditioner. When I use the
jacobi preconditioner, the Newton and VI solutions are identical.

Whereas these problems now fail:

with Newton:

Case 0:  psiA: 6.297272e-01  psiC: 1.705043e-08  k: 2.00
         cA:   6.297272e-01  cB:   4.616557e-16  cC: 1.705043e-08

Case 1:  psiA: 7.570078e-01  psiC: 1.754758e-08  k: 2.00
         cA:   7.570078e-01  cB:   4.067561e-16  cC: 1.754758e-08

with VI:

Case 0:  psiA: 6.297272e-01  psiC: 1.705043e-08  k: 2.00
         cA:  -7.684746e-17  cB:   5.362599e-09  cC: 4.616557e-16

Case 1:  psiA: 7.570078e-01  psiC: 1.754758e-08  k: 2.00
         cA:   3.781787e-16  cB:   1.152521e-08  cC: 4.067561e-16

For the first set of problems, Jacobi works but using no preconditioner fails.
For the second set of problems, Jacobi fails but using no preconditioner works.

??

Justin

On Thu, Dec 3, 2015 at 2:26 PM, Matthew Knepley <knep...@gmail.com> wrote:

> On Thu, Dec 3, 2015 at 3:00 PM, Justin Chang <jychan...@gmail.com> wrote:
>
>> Thanks Barry,
>>
>> Next issue:
>>
>
> Barry, when Justin described this problem to me, it sounded like there
> might be a bug in the active set method which
> put in the lower value when it was supposed to be the upper value.
>
>Matt
>
>
>> Consider the following problem:
>>
>> psiA = cA - 2*cB (1a)
>> psiC = cC + 2*cB(1b)
>> cC^2 = k*cA^2*cB  (1c)
>>
>> where psiA, psiC, and k are given. If I solve these five problems using
>> the standard Newton method:
>>
>> Case 0:
>>
>> psiA: 9.400282e-01
>>
>> psiC: 6.045961e-09
>>
>> k: 2.00
>>
>> cA: 9.400282e-01
>>
>> cB: 1.555428e-16
>>
>> cC: 6.045961e-09
>>
>> Case 1:
>>
>> psiA: 8.472901e-01
>>
>> psiC: 7.425271e-09
>>
>> k: 2.00
>>
>> cA: 8.472901e-01
>>
>> cB: 2.602870e-16
>>
>> cC: 7.425270e-09
>>
>> Case 2:
>>
>> psiA: 8.337199e-01
>>
>> psiC: 7.831614e-09
>>
>> k: 2.00
>>
>> cA: 8.337199e-01
>>
>> cB: 2.942675e-16
>>
>> cC: 7.831613e-09
>>
>> Case 3:
>>
>> psiA: 8.268140e-01
>>
>> psiC: 7.912911e-09
>>
>> k: 2.00
>>
>> cA: 8.268140e-01
>>
>> cB: 3.029178e-16
>>
>> cC: 7.912910e-09
>>
>> Case 4:
>>
>> psiA: 9.852477e-01
>>
>> psiC: 7.992223e-09
>>
>> k: 2.00
>>
>> cA: 9.852477e-01
>>
>> cB: 2.593282e-16
>>
>> cC: 7.99e-09
>>
>> These solutions are "correct", that is if I plug the c's back into
>> equations (1a)-(1b), i get the original psi's.
>>
>> Now suppose I use the Variational Inequality such that cA/B/C >= 0, I
>> would expect to get the exact same solution (since the c's would be
>> non-negative regardless). But instead I get these solutions:
>>
>> Case 0:
>>
>> psiA: 9.400282e-01
>>
>> psiC: 6.045961e-09
>>
>> k: 2.00
>>
>> cA: 1.318866e-16
>>
>> cB: 3.097570e-08
>>
>> cC: 6.045961e-09
>>
>> Case 1:
>>
>> psiA: 8.472901e-01
>>
>> psiC: 7.425271e-09
>>
>> k: 2.00
>>
>> cA: 4.624944e-17
>>
>> cB: 3.015792e-08
>>
>> cC: 7.425271e-09
>>
>> Case 2:
>>
>> psiA: 8.337199e-01
>>
>> psiC: 7.831614e-09
>>
>> k: 2.00
>>
>> cA: 1.733276e-16
>>
>> cB: 3.079856e-08
>>
>> cC: 7.831614e-09
>>
>> Case 3:
>>
>> psiA: 8.268140e-01
>>
>> psiC: 7.912911e-09
>>
>> k: 2.00
>>
>> cA: 1.721078e-16
>>
>> cB: 3.061785e-08
>>
>> cC: 7.912911e-09
>>
>> Case 4:
>>
>> psiA: 9.852477e-01
>>
>> psiC: 7.992223e-09
>>
>> k: 2.00
>>
>> cA: 2.666770e-16
>>
>> cB: 4.610822e-08
>>
>> cC: 7.992223e-09
>>
>> Basically, the cA's shoot down to zero and the cB's are slightly greater
>> than zero. When I plug the c's into equations (1a) - (1b), I do not get the
>> cor

Re: [petsc-users] Issues with the Variational Inequality solver

2015-12-03 Thread Justin Chang
Thanks Barry,

Next issue:

Consider the following problem:

psiA = cA - 2*cB     (1a)
psiC = cC + 2*cB     (1b)
cC^2 = k*cA^2*cB     (1c)

where psiA, psiC, and k are given. If I solve these five problems using the
standard Newton method:

Case 0:  psiA: 9.400282e-01  psiC: 6.045961e-09  k: 2.00
         cA:   9.400282e-01  cB:   1.555428e-16  cC: 6.045961e-09

Case 1:  psiA: 8.472901e-01  psiC: 7.425271e-09  k: 2.00
         cA:   8.472901e-01  cB:   2.602870e-16  cC: 7.425270e-09

Case 2:  psiA: 8.337199e-01  psiC: 7.831614e-09  k: 2.00
         cA:   8.337199e-01  cB:   2.942675e-16  cC: 7.831613e-09

Case 3:  psiA: 8.268140e-01  psiC: 7.912911e-09  k: 2.00
         cA:   8.268140e-01  cB:   3.029178e-16  cC: 7.912910e-09

Case 4:  psiA: 9.852477e-01  psiC: 7.992223e-09  k: 2.00
         cA:   9.852477e-01  cB:   2.593282e-16  cC: 7.99e-09

These solutions are "correct"; that is, if I plug the c's back into
equations (1a)-(1b), I get the original psi's.

Now suppose I use the Variational Inequality such that cA/B/C >= 0. I would
expect to get the exact same solution (since the c's would be non-negative
regardless). But instead I get these solutions:

Case 0:  psiA: 9.400282e-01  psiC: 6.045961e-09  k: 2.00
         cA:   1.318866e-16  cB:   3.097570e-08  cC: 6.045961e-09

Case 1:  psiA: 8.472901e-01  psiC: 7.425271e-09  k: 2.00
         cA:   4.624944e-17  cB:   3.015792e-08  cC: 7.425271e-09

Case 2:  psiA: 8.337199e-01  psiC: 7.831614e-09  k: 2.00
         cA:   1.733276e-16  cB:   3.079856e-08  cC: 7.831614e-09

Case 3:  psiA: 8.268140e-01  psiC: 7.912911e-09  k: 2.00
         cA:   1.721078e-16  cB:   3.061785e-08  cC: 7.912911e-09

Case 4:  psiA: 9.852477e-01  psiC: 7.992223e-09  k: 2.00
         cA:   2.666770e-16  cB:   4.610822e-08  cC: 7.992223e-09

Basically, the cA's shoot down to zero and the cB's are slightly greater
than zero. When I plug the c's into equations (1a)-(1b), I do not get the
correct solution at all. I would expect discrepancies if my c's were
originally negative, but for these five problems, shouldn't the VI solver
give the same answer as Newton's method?
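
(For anyone who wants to reproduce the setup without the attachment, here is
a rough petsc4py sketch of one such 3-unknown VI solve. This is not the
attached script; the function and variable names are purely illustrative.)

  from petsc4py import PETSc

  def make_vi_solver(psiA, psiC, k):
      def residual(snes, x, f):
          cA, cB, cC = x.getArray(readonly=True)
          fa = f.getArray()
          fa[0] = cA - 2.0*cB - psiA            # (1a)
          fa[1] = cC + 2.0*cB - psiC            # (1b)
          fa[2] = cC*cC - k*cA*cA*cB            # (1c)

      def jacobian(snes, x, J, P):
          cA, cB, cC = x.getArray(readonly=True)
          P.setValues(range(3), range(3),
                      [1.0, -2.0, 0.0,
                       0.0,  2.0, 1.0,
                       -2.0*k*cA*cB, -k*cA*cA, 2.0*cC])
          P.assemble()

      snes = PETSc.SNES().create(PETSc.COMM_SELF)
      snes.setType('vinewtonrsls')              # use 'newtonls' for the plain Newton comparison
      x = PETSc.Vec().createSeq(3)
      J = PETSc.Mat().createDense([3, 3], comm=PETSc.COMM_SELF)
      J.setUp()
      snes.setFunction(residual, x.duplicate())
      snes.setJacobian(jacobian, J)
      lb = x.duplicate(); lb.set(0.0)           # cA, cB, cC >= 0
      ub = x.duplicate(); ub.set(PETSc.INFINITY)
      snes.setVariableBounds(lb, ub)
      snes.setFromOptions()
      return snes, x

  snes, x = make_vi_solver(psiA=9.400282e-01, psiC=6.045961e-09, k=2.0)
  x.set(0.5)                                    # some positive initial guess
  snes.solve(None, x)
  print(x.getArray())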

Attached is the petsc4py code, the two input files containing psiA and
psiC, and the outputs from '-snes_monitor -snes_view
-ksp_monitor_true_residual'. Run as:

python testVI.py testA testC 2 <0/1>

where 0 is Newton, 1 is VI.

Know what's going on here?

Thanks,
Justin
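
(The fix Barry describes in the quoted exchange below, calling SNESReset()
between problems of different sizes while reusing one SNES, might look
roughly like this in petsc4py; the residual and the sizes are toy
placeholders, not from the real code.)

  from petsc4py import PETSc

  def residual(snes, x, f):
      x.copy(f)
      f.shift(-1.0)                       # toy residual F(x) = x - 1

  def jacobian(snes, x, J, P):
      for i in range(x.getSize()):
          P.setValue(i, i, 1.0)           # identity Jacobian for the toy residual
      P.assemble()

  snes = PETSc.SNES().create(PETSc.COMM_SELF)
  for n in (3, 5, 8):                     # problems of different sizes
      snes.reset()                        # wipe size-dependent internals before reusing the SNES
      x = PETSc.Vec().createSeq(n)
      snes.setFunction(residual, x.duplicate())
      snes.setJacobian(jacobian, PETSc.Mat().createAIJ([n, n], nnz=1, comm=PETSc.COMM_SELF))
      snes.setFromOptions()
      snes.solve(None, x)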

On Wed, Dec 2, 2015 at 1:36 PM, Barry Smith <bsm...@mcs.anl.gov> wrote:

>
> > On Dec 2, 2015, at 1:56 PM, Justin Chang <jychan...@gmail.com> wrote:
> >
> > Barry,
> >
> > So I do not believe there is any problem with the ISEqual() per se,
> > because I am solving 5151 different VI problems using the
> > same SNES/KSP/PC/Mat. That is, I am not "resetting" these data structures
> > once I am done with one problem and move on to the next. If I ran each case
> > individually, I get no error; the error comes when I run these problems
> > sequentially without resetting the SNES after each problem.
> >
> > I haven't run the C debugger or done any of that yet, but could this be
> > a plausible explanation for my error?
>
>    This is exactly the issue. Independent of VIs, if you change the size of
> a problem you pass to SNES, you need to call SNESReset() between calls
> with different sizes.
>
> > Originally I was thinking about creating/destroying SNES for each
> > problem but I was wondering if that might affect performance.
>
>    Using SNESReset() is the way you should do it. Creating and destroying
> it each time would only be a tiny, immeasurable amount slower.
>
>   Barry
>
> >
> > Thanks,
> > Justin
> >
> > On Tue, Dec 1, 2015 at 5:58 PM, Barry Smith <bsm...@mcs.anl.gov> wrote:
> >
> > > On Dec 1, 2015, at 5:13 PM, Justin Chang <jychan...@gmail.com> wrote:
> > >
> > > Hi all,
> > >
> > > So I am running into some issues with the SNESVINEWTONRSLS solver. I
> > > originally had this done in Firedrake, but have ported it to petsc4py.
> > > Attached is the code as well as the two required input files.
> > >
> > > First issue, when I run the program like this:
> > >
> > > python testVI.py psiA_1 psiB_1 2 1
> > >
> > > I get this error:
> > >
> > > Traceback (most recent call last):
> > >   File "testVI.py", line 103, in <module>
> > > snes.solve(None,X)
> > >   File "PETSc/SNES.pyx", line 520, in petsc4py.PETSc.SNES.solve
> > > (src/petsc4py.PETSc.c:169677)
> > > petsc4py.PETSc.Error: error code 60
> > > [0] SNESSolve() line 3992 in /Users/justin/Software/petsc/src/snes/interface/snes.c
> > > [0] SNESSolve_VINEWTONRSLS() line 507 in /Users/justin/Software/petsc/src/snes/impls/vi/rs/virs.c
> > > [0] KSPSetOperators() line 544 in /Users/justin/So
