Re: [petsc-users] pcfieldsplit for a composite dm with multiple subfields

Barry Smith Mon, 07 Sep 2015 18:02:57 -0700

  My guess is that the Jacobian is not correct (or not correct "enough"), so 
PETSc SNES is generating a poor descent direction. You can try adding 
-snes_mf_operator -ksp_monitor_true_residual as additional arguments. What 
happens?


  Barry



> On Sep 7, 2015, at 7:49 PM, Gideon Simpson <[email protected]> wrote:
> 
> No problem Matt, I don’t think we had previously discussed that output.  Here 
> is a case where things fail.
> 
>       0 SNES Function norm 4.027481756921e-09 
>       1 SNES Function norm 1.760477878365e-12 
>     Nonlinear solve converged due to CONVERGED_SNORM_RELATIVE iterations 1
>     0 SNES Function norm 5.066222213176e+03 
>     1 SNES Function norm 8.484697184230e+02 
>     2 SNES Function norm 6.549559723294e+02 
>     3 SNES Function norm 5.770723278153e+02 
>     4 SNES Function norm 5.237702240594e+02 
>     5 SNES Function norm 4.753909019848e+02 
>     6 SNES Function norm 4.221784590755e+02 
>     7 SNES Function norm 3.806525080483e+02 
>     8 SNES Function norm 3.762054656019e+02 
>     9 SNES Function norm 3.758975226873e+02 
>    10 SNES Function norm 3.757032042706e+02 
>    11 SNES Function norm 3.728798164234e+02 
>    12 SNES Function norm 3.723078741075e+02 
>    13 SNES Function norm 3.721848059825e+02 
>    14 SNES Function norm 3.720227575629e+02 
>    15 SNES Function norm 3.720051998555e+02 
>    16 SNES Function norm 3.718945430587e+02 
>    17 SNES Function norm 3.700412694044e+02 
>    18 SNES Function norm 3.351964889461e+02 
>    19 SNES Function norm 3.096016086233e+02 
>    20 SNES Function norm 3.008410789787e+02 
>    21 SNES Function norm 2.752316716557e+02 
>    22 SNES Function norm 2.707658474165e+02 
>    23 SNES Function norm 2.698436736049e+02 
>    24 SNES Function norm 2.618233857172e+02 
>    25 SNES Function norm 2.600121920634e+02 
>    26 SNES Function norm 2.585046423168e+02 
>    27 SNES Function norm 2.568551090220e+02 
>    28 SNES Function norm 2.556404537064e+02 
>    29 SNES Function norm 2.536353523683e+02 
>    30 SNES Function norm 2.533596070171e+02 
>    31 SNES Function norm 2.532324379596e+02 
>    32 SNES Function norm 2.531842335211e+02 
>    33 SNES Function norm 2.531684527520e+02 
>    34 SNES Function norm 2.531637604618e+02 
>    35 SNES Function norm 2.531624767821e+02 
>    36 SNES Function norm 2.531621359093e+02 
>    37 SNES Function norm 2.531620504925e+02 
>    38 SNES Function norm 2.531620350055e+02 
>    39 SNES Function norm 2.531620310522e+02 
>    40 SNES Function norm 2.531620300471e+02 
>    41 SNES Function norm 2.531620298084e+02 
>    42 SNES Function norm 2.531620297478e+02 
>    43 SNES Function norm 2.531620297324e+02 
>    44 SNES Function norm 2.531620297303e+02 
>    45 SNES Function norm 2.531620297302e+02 
>   Nonlinear solve did not converge due to DIVERGED_LINE_SEARCH iterations 45
>   0 SNES Function norm 9.636339304380e+03 
>   1 SNES Function norm 8.997731184634e+03 
>   2 SNES Function norm 8.120498349232e+03 
>   3 SNES Function norm 7.322379894820e+03 
>   4 SNES Function norm 6.599581599149e+03 
>   5 SNES Function norm 6.374872854688e+03 
>   6 SNES Function norm 6.372518007653e+03 
>   7 SNES Function norm 6.073996314301e+03 
>   8 SNES Function norm 5.635965277054e+03 
>   9 SNES Function norm 5.155389064046e+03 
>  10 SNES Function norm 5.080567902638e+03 
>  11 SNES Function norm 5.058878643969e+03 
>  12 SNES Function norm 5.058835649793e+03 
>  13 SNES Function norm 5.058491285707e+03 
>  14 SNES Function norm 5.057452865337e+03 
>  15 SNES Function norm 5.057226140688e+03 
>  16 SNES Function norm 5.056651272898e+03 
>  17 SNES Function norm 5.056575190057e+03 
>  18 SNES Function norm 5.056574632598e+03 
>  19 SNES Function norm 5.056574520229e+03 
>  20 SNES Function norm 5.056574492569e+03 
>  21 SNES Function norm 5.056574485124e+03 
>  22 SNES Function norm 5.056574483029e+03 
>  23 SNES Function norm 5.056574482427e+03 
>  24 SNES Function norm 5.056574482302e+03 
>  25 SNES Function norm 5.056574482287e+03 
>  26 SNES Function norm 5.056574482282e+03 
>  27 SNES Function norm 5.056574482281e+03 
> Nonlinear solve did not converge due to DIVERGED_LINE_SEARCH iterations 27
> SNES Object: 1 MPI processes
>   type: newtonls
>   maximum iterations=50, maximum function evaluations=10000
>   tolerances: relative=1e-08, absolute=1e-50, solution=1e-08
>   total number of linear solver iterations=28
>   total number of function evaluations=323
>   total number of grid sequence refinements=2
>   SNESLineSearch Object:   1 MPI processes
>     type: bt
>       interpolation: cubic
>       alpha=1.000000e-04
>     maxstep=1.000000e+08, minlambda=1.000000e-12
>     tolerances: relative=1.000000e-08, absolute=1.000000e-15, 
> lambda=1.000000e-08
>     maximum iterations=40
>   KSP Object:   1 MPI processes
>     type: gmres
>       GMRES: restart=30, using Classical (unmodified) Gram-Schmidt 
> Orthogonalization with no iterative refinement
>       GMRES: happy breakdown tolerance 1e-30
>     maximum iterations=10000, initial guess is zero
>     tolerances:  relative=1e-05, absolute=1e-50, divergence=10000
>     left preconditioning
>     using PRECONDITIONED norm type for convergence test
>   PC Object:   1 MPI processes
>     type: lu
>       LU: out-of-place factorization
>       tolerance for zero pivot 2.22045e-14
>       matrix ordering: nd
>       factor fill ratio given 0, needed 0
>         Factored matrix follows:
>           Mat Object:           1 MPI processes
>             type: seqaij
>             rows=15991, cols=15991
>             package used to perform factorization: mumps
>             total: nonzeros=255801, allocated nonzeros=255801
>             total number of mallocs used during MatSetValues calls =0
>               MUMPS run parameters:
>                 SYM (matrix type):                   0 
>                 PAR (host participation):            1 
>                 ICNTL(1) (output for error):         6 
>                 ICNTL(2) (output of diagnostic msg): 0 
>                 ICNTL(3) (output for global info):   0 
>                 ICNTL(4) (level of printing):        0 
>                 ICNTL(5) (input mat struct):         0 
>                 ICNTL(6) (matrix prescaling):        7 
>                 ICNTL(7) (sequentia matrix ordering):6 
>                 ICNTL(8) (scalling strategy):        77 
>                 ICNTL(10) (max num of refinements):  0 
>                 ICNTL(11) (error analysis):          0 
>                 ICNTL(12) (efficiency control):                         1 
>                 ICNTL(13) (efficiency control):                         0 
>                 ICNTL(14) (percentage of estimated workspace increase): 20 
>                 ICNTL(18) (input mat struct):                           0 
>                 ICNTL(19) (Shur complement info):                       0 
>                 ICNTL(20) (rhs sparse pattern):                         0 
>                 ICNTL(21) (somumpstion struct):                            0 
>                 ICNTL(22) (in-core/out-of-core facility):               0 
>                 ICNTL(23) (max size of memory can be allocated locally):0 
>                 ICNTL(24) (detection of null pivot rows):               0 
>                 ICNTL(25) (computation of a null space basis):          0 
>                 ICNTL(26) (Schur options for rhs or solution):          0 
>                 ICNTL(27) (experimental parameter):                     -8 
>                 ICNTL(28) (use parallel or sequential ordering):        1 
>                 ICNTL(29) (parallel ordering):                          0 
>                 ICNTL(30) (user-specified set of entries in inv(A)):    0 
>                 ICNTL(31) (factors is discarded in the solve phase):    0 
>                 ICNTL(33) (compute determinant):                        0 
>                 CNTL(1) (relative pivoting threshold):      0.01 
>                 CNTL(2) (stopping criterion of refinement): 1.49012e-08 
>                 CNTL(3) (absomumpste pivoting threshold):      0 
>                 CNTL(4) (vamumpse of static pivoting):         -1 
>                 CNTL(5) (fixation for null pivots):         0 
>                 RINFO(1) (local estimated flops for the elimination after 
> analysis): 
>                   [0] 1.95838e+06 
>                 RINFO(2) (local estimated flops for the assembly after 
> factorization): 
>                   [0]  143924 
>                 RINFO(3) (local estimated flops for the elimination after 
> factorization): 
>                   [0]  1.95943e+06 
>                 INFO(15) (estimated size of (in MB) MUMPS internal data for 
> running numerical factorization): 
>                 [0] 7 
>                 INFO(16) (size of (in MB) MUMPS internal data used during 
> numerical factorization): 
>                   [0] 7 
>                 INFO(23) (num of pivots eliminated on this processor after 
> factorization): 
>                   [0] 15991 
>                 RINFOG(1) (global estimated flops for the elimination after 
> analysis): 1.95838e+06 
>                 RINFOG(2) (global estimated flops for the assembly after 
> factorization): 143924 
>                 RINFOG(3) (global estimated flops for the elimination after 
> factorization): 1.95943e+06 
>                 (RINFOG(12) RINFOG(13))*2^INFOG(34) (determinant): (0,0)*(2^0)
>                 INFOG(3) (estimated real workspace for factors on all 
> processors after analysis): 255801 
>                 INFOG(4) (estimated integer workspace for factors on all 
> processors after analysis): 127874 
>                 INFOG(5) (estimated maximum front size in the complete tree): 
> 11 
>                 INFOG(6) (number of nodes in the complete tree): 3996 
>                 INFOG(7) (ordering option effectively use after analysis): 6 
>                 INFOG(8) (structural symmetry in percent of the permuted 
> matrix after analysis): 86 
>                 INFOG(9) (total real/complex workspace to store the matrix 
> factors after factorization): 255865 
>                 INFOG(10) (total integer space store the matrix factors after 
> factorization): 127890 
>                 INFOG(11) (order of largest frontal matrix after 
> factorization): 11 
>                 INFOG(12) (number of off-diagonal pivots): 19 
>                 INFOG(13) (number of delayed pivots after factorization): 8 
>                 INFOG(14) (number of memory compress after factorization): 0 
>                 INFOG(15) (number of steps of iterative refinement after 
> solution): 0 
>                 INFOG(16) (estimated size (in MB) of all MUMPS internal data 
> for factorization after analysis: value on the most memory consuming 
> processor): 7 
>                 INFOG(17) (estimated size of all MUMPS internal data for 
> factorization after analysis: sum over all processors): 7 
>                 INFOG(18) (size of all MUMPS internal data allocated during 
> factorization: value on the most memory consuming processor): 7 
>                 INFOG(19) (size of all MUMPS internal data allocated during 
> factorization: sum over all processors): 7 
>                 INFOG(20) (estimated number of entries in the factors): 
> 255801 
>                 INFOG(21) (size in MB of memory effectively used during 
> factorization - value on the most memory consuming processor): 7 
>                 INFOG(22) (size in MB of memory effectively used during 
> factorization - sum over all processors): 7 
>                 INFOG(23) (after analysis: value of ICNTL(6) effectively 
> used): 0 
>                 INFOG(24) (after analysis: value of ICNTL(12) effectively 
> used): 1 
>                 INFOG(25) (after factorization: number of pivots modified by 
> static pivoting): 0 
>                 INFOG(28) (after factorization: number of null pivots 
> encountered): 0
>                 INFOG(29) (after factorization: effective number of entries 
> in the factors (sum over all processors)): 255865
>                 INFOG(30, 31) (after solution: size in Mbytes of memory used 
> during solution phase): 5, 5
>                 INFOG(32) (after analysis: type of analysis done): 1
>                 INFOG(33) (value used for ICNTL(8)): 7
>                 INFOG(34) (exponent of the determinant if determinant is 
> requested): 0
>     linear system matrix = precond matrix:
>     Mat Object:     1 MPI processes
>       type: seqaij
>       rows=15991, cols=15991
>       total: nonzeros=223820, allocated nonzeros=431698
>       total number of mallocs used during MatSetValues calls =15991
>         using I-node routines: found 4000 nodes, limit used is 5
> 
> 
> 
> 
> -gideon
> 
>> On Sep 7, 2015, at 8:40 PM, Matthew Knepley <[email protected]> wrote:
>> 
>> On Mon, Sep 7, 2015 at 7:32 PM, Gideon Simpson <[email protected]> 
>> wrote:
>> Barry,
>> 
>> I finally got a chance to really try using the grid sequencing within my 
>> code.  I find that, in some cases, even if it can solve successfully on the 
>> coarsest mesh, the SNES fails, usually due to a line search failure, when it 
>> tries to compute along the grid sequence.  Would you have any suggestions?
>> 
>> I apologize if I have asked before, but can you give me -snes_view for the 
>> solver? I could not find it in the email thread.
>> 
>> I would suggest trying to fiddle with the line search, or precondition it 
>> with Richardson. It would be nice to see -snes_monitor
>> for the runs that fail, and then we can break down the residual into fields 
>> and look at it again (if my custom residual monitor
>> does not work we can write one easily). Seeing which part of the residual 
>> does not converge is key to designing the NASM
>> for the problem. I have just seen the virtuoso of this, Xiao-Chuan Cai, 
>> present it. We need better monitoring in PETSc.
>> 
>>   Thanks,
>> 
>>     Matt
>>  
>> -gideon
>> 
>>> On Aug 28, 2015, at 4:21 PM, Barry Smith <[email protected]> wrote:
>>> 
>>> 
>>>> On Aug 28, 2015, at 3:04 PM, Gideon Simpson <[email protected]> 
>>>> wrote:
>>>> 
>>>> Yes, if I continue in this parameter on the coarse mesh, I can generally 
>>>> solve at all values. I do find that I need to do some amount of 
>>>> continuation to solve near the endpoint.  The problem is that on the 
>>>> coarse mesh, things are not fully resolved at all the values along the 
>>>> continuation parameter, and I would like to do refinement.  
>>>> 
>>>> One subtlety is that I actually want the intermediate continuation 
>>>> solutions  too.  Currently, without doing any grid sequence, I compute 
>>>> each, write it to disk, and then go on to the next one.  So I now need to 
>>>> go back and refine them.  I was thinking that perhaps I could refine them 
>>>> on the fly, dump them to disk, and use the coarse solution as the starting 
>>>> guess at the next iteration, but that would seem to require resetting the 
>>>> snes back to the coarse grid.
>>>> 
>>>> The alternative would be to just script the mesh refinement in a 
>>>> post-processing stage, where each value of the continuation parameter is 
>>>> loaded on the coarse mesh and refined.  Perhaps that’s the most practical 
>>>> thing to do.
>>> 
>>>   I would do the following. Create your DM and create a SNES that will do 
>>> the continuation
>>> 
>>>   loop over continuation parameter
>>> 
>>>        SNESSolve(snes,NULL,Ucoarse);
>>> 
>>>        if (you decide you want to see the refined solution at this 
>>> continuation point) {
>>>             SNESCreate(comm,&snesrefine);
>>>             SNESSetDM()
>>>             etc
>>>             SNESSetGridSequence(snesrefine,)
>>>             SNESSolve(snesrefine,NULL,Ucoarse);
>>>             SNESGetSolution(snesrefine,&Ufine);
>>>             VecView(Ufine,viewer), or do whatever you want to do with Ufine 
>>> at that continuation point
>>>             SNESDestroy(&snesrefine);
>>>        }
>>> 
>>>   end loop over continuation parameter.
>>> 
>>>   Barry
>>> 
>>>> 
>>>> -gideon
>>>> 
>>>>> On Aug 28, 2015, at 3:55 PM, Barry Smith <[email protected]> wrote:
>>>>> 
>>>>>> 
>>>>>> 
>>>>>> 3.  This problem is actually part of a continuation problem that roughly 
>>>>>> looks like this 
>>>>>> 
>>>>>> for( continuation parameter p = 0 to 1){
>>>>>> 
>>>>>>  solve with parameter p_i using solution from p_{i-1},
>>>>>> }
>>>>>> 
>>>>>> What I would like to do is to start the solver, for each value of 
>>>>>> parameter p_i on the coarse mesh, and then do grid sequencing on that.  
>>>>>> But it appears that after doing grid sequencing on the initial p_0 = 0, 
>>>>>> the SNES is set to use the finer mesh.
>>>>> 
>>>>>  So you are using continuation to give you a good enough initial guess on 
>>>>> the coarse level to even get convergence on the coarse level? First I 
>>>>> would check if you even need the continuation (or can you not even solve 
>>>>> the coarse problem without it).
>>>>> 
>>>>>  If you do need the continuation then you will need to tweak how you do 
>>>>> the grid sequencing. I think this will work: 
>>>>> 
>>>>> Do not use -snes_grid_sequence  
>>>>> 
>>>>> Run SNESSolve() as many times as you want with your continuation 
>>>>> parameter. This will all happen on the coarse mesh.
>>>>> 
>>>>> Call SNESSetGridSequence()
>>>>> 
>>>>> Then call SNESSolve() again and it will do one solve on the coarse level 
>>>>> and then interpolate to the next level etc.
>>>> 
>>> 
>> 
>> 
>> 
>> 
>> -- 
>> What most experimenters take for granted before they begin their experiments 
>> is infinitely more interesting than any results to which their experiments 
>> lead.
>> -- Norbert Wiener
>
