Re: [petsc-users] sources of floating point randomness in JFNK in serial

Dave May Thu, 04 May 2023 06:03:14 -0700

Is your code valgrind clean?

On Thu 4. May 2023 at 05:54, Mark Lohry <[email protected]> wrote:


> Try -pc_type none.
>>
>
> With -pc_type none the 0 KSP residual looks identical. But *sometimes*
> it's producing exactly the same history and others it's gradually
> changing.  I'm reasonably confident my residual evaluation has no
> randomness, see info after the petsc output.
>
> solve history 1:
>
>   0 SNES Function norm 3.424003312857e+04
>     0 KSP Residual norm 3.424003312857e+04
>     1 KSP Residual norm 2.871734444536e+04
>     2 KSP Residual norm 2.490276931041e+04
> ...
>    20 KSP Residual norm 7.449686034356e+03
>   Linear solve converged due to CONVERGED_ITS iterations 20
>   1 SNES Function norm 1.085015821006e+04
>
> solve history 2, identical to 1:
>
>   0 SNES Function norm 3.424003312857e+04
>     0 KSP Residual norm 3.424003312857e+04
>     1 KSP Residual norm 2.871734444536e+04
>     2 KSP Residual norm 2.490276931041e+04
> ...
>    20 KSP Residual norm 7.449686034356e+03
>   Linear solve converged due to CONVERGED_ITS iterations 20
>   1 SNES Function norm 1.085015821006e+04
>
> solve history 3, identical KSP at 0 and 1, slight change at 2, growing
> difference to the end:
>   0 SNES Function norm 3.424003312857e+04
>     0 KSP Residual norm 3.424003312857e+04
>     1 KSP Residual norm 2.871734444536e+04
>     2 KSP Residual norm 2.490276930242e+04
> ...
>  20 KSP Residual norm 7.449686095424e+03
>   Linear solve converged due to CONVERGED_ITS iterations 20
>   1 SNES Function norm 1.085015646971e+04
>
>
> Ths is using a standard explicit 3-stage Runge-Kutta smoother for 10
> iterations, so 30 calls of the same residual evaluation, identical
> residuals every time
>
> run 1:
>
> # iteration            rho                 rhou                rhov
>          rhoE                abs_res             rel_res             umin
>              vmax                vmin                elapsed_time
> #
>
>
>           1.00000e+00  1.086860616292e+00  2.782316758416e+02
>  4.482867643761e+00  2.993435920340e+02         2.04353e+02
> 1.00000e+00        -8.23945e-15        -6.15326e-15        -1.35563e-14
>     6.34834e-01
>           2.00000e+00  2.310547487017e+00  1.079059352425e+02
>  3.958323921837e+00  5.058927165686e+02         2.58647e+02
> 1.26568e+00        -1.02539e-14        -9.35368e-15        -1.69925e-14
>     6.40063e-01
>           3.00000e+00  2.361005867444e+00  5.706213331683e+01
>  6.130016323357e+00  4.688968362579e+02         2.36201e+02
> 1.15585e+00        -1.19370e-14        -1.15216e-14        -1.59733e-14
>     6.45166e-01
>           4.00000e+00  2.167518999963e+00  3.757541401594e+01
>  6.313917437428e+00  4.054310291628e+02         2.03612e+02
> 9.96372e-01        -1.81831e-14        -1.28312e-14        -1.46238e-14
>     6.50494e-01
>           5.00000e+00  1.941443738676e+00  2.884190334049e+01
>  6.237106158479e+00  3.539201037156e+02         1.77577e+02
> 8.68970e-01         3.56633e-14        -8.74089e-15        -1.06666e-14
>     6.55656e-01
>           6.00000e+00  1.736947124693e+00  2.429485695670e+01
>  5.996962200407e+00  3.148280178142e+02         1.57913e+02
> 7.72745e-01        -8.98634e-14        -2.41152e-14        -1.39713e-14
>     6.60872e-01
>           7.00000e+00  1.564153212635e+00  2.149609219810e+01
>  5.786910705204e+00  2.848717011033e+02         1.42872e+02
> 6.99144e-01        -2.95352e-13        -2.48158e-14        -2.39351e-14
>     6.66041e-01
>           8.00000e+00  1.419280815384e+00  1.950619804089e+01
>  5.627281158306e+00  2.606623371229e+02         1.30728e+02
> 6.39715e-01         8.98941e-13         1.09674e-13         3.78905e-14
>     6.71316e-01
>           9.00000e+00  1.296115915975e+00  1.794843530745e+01
>  5.514933264437e+00  2.401524522393e+02         1.20444e+02
> 5.89394e-01         1.70717e-12         1.38762e-14         1.09825e-13
>     6.76447e-01
>           1.00000e+01  1.189639693918e+00  1.665381754953e+01
>  5.433183087037e+00  2.222572900473e+02         1.11475e+02
> 5.45501e-01        -4.22462e-12        -7.15206e-13        -2.28736e-13
>     6.81716e-01
>
> run N:
>
>
> #
>
>
> # iteration            rho                 rhou                rhov
>          rhoE                abs_res             rel_res             umin
>              vmax                vmin                elapsed_time
> #
>
>
>           1.00000e+00  1.086860616292e+00  2.782316758416e+02
>  4.482867643761e+00  2.993435920340e+02         2.04353e+02
> 1.00000e+00        -8.23945e-15        -6.15326e-15        -1.35563e-14
>     6.23316e-01
>           2.00000e+00  2.310547487017e+00  1.079059352425e+02
>  3.958323921837e+00  5.058927165686e+02         2.58647e+02
> 1.26568e+00        -1.02539e-14        -9.35368e-15        -1.69925e-14
>     6.28510e-01
>           3.00000e+00  2.361005867444e+00  5.706213331683e+01
>  6.130016323357e+00  4.688968362579e+02         2.36201e+02
> 1.15585e+00        -1.19370e-14        -1.15216e-14        -1.59733e-14
>     6.33558e-01
>           4.00000e+00  2.167518999963e+00  3.757541401594e+01
>  6.313917437428e+00  4.054310291628e+02         2.03612e+02
> 9.96372e-01        -1.81831e-14        -1.28312e-14        -1.46238e-14
>     6.38773e-01
>           5.00000e+00  1.941443738676e+00  2.884190334049e+01
>  6.237106158479e+00  3.539201037156e+02         1.77577e+02
> 8.68970e-01         3.56633e-14        -8.74089e-15        -1.06666e-14
>     6.43887e-01
>           6.00000e+00  1.736947124693e+00  2.429485695670e+01
>  5.996962200407e+00  3.148280178142e+02         1.57913e+02
> 7.72745e-01        -8.98634e-14        -2.41152e-14        -1.39713e-14
>     6.49073e-01
>           7.00000e+00  1.564153212635e+00  2.149609219810e+01
>  5.786910705204e+00  2.848717011033e+02         1.42872e+02
> 6.99144e-01        -2.95352e-13        -2.48158e-14        -2.39351e-14
>     6.54167e-01
>           8.00000e+00  1.419280815384e+00  1.950619804089e+01
>  5.627281158306e+00  2.606623371229e+02         1.30728e+02
> 6.39715e-01         8.98941e-13         1.09674e-13         3.78905e-14
>     6.59394e-01
>           9.00000e+00  1.296115915975e+00  1.794843530745e+01
>  5.514933264437e+00  2.401524522393e+02         1.20444e+02
> 5.89394e-01         1.70717e-12         1.38762e-14         1.09825e-13
>     6.64516e-01
>           1.00000e+01  1.189639693918e+00  1.665381754953e+01
>  5.433183087037e+00  2.222572900473e+02         1.11475e+02
> 5.45501e-01        -4.22462e-12        -7.15206e-13        -2.28736e-13
>     6.69677e-01
>
>
>
>
>
> On Thu, May 4, 2023 at 8:41 AM Mark Adams <[email protected]> wrote:
>
>> ASM is just the sub PC with one proc but gets weaker with more procs
>> unless you use jacobi. (maybe I am missing something).
>>
>> On Thu, May 4, 2023 at 8:31 AM Mark Lohry <[email protected]> wrote:
>>
>>>  Please send the output of -snes_view.
>>>>
>>> pasted below. anything stand out?
>>>
>>>
>>> SNES Object: 1 MPI process
>>>   type: newtonls
>>>   maximum iterations=1, maximum function evaluations=-1
>>>   tolerances: relative=0.1, absolute=1e-15, solution=1e-15
>>>   total number of linear solver iterations=20
>>>   total number of function evaluations=22
>>>   norm schedule ALWAYS
>>>   Jacobian is never rebuilt
>>>   Jacobian is applied matrix-free with differencing
>>>   Preconditioning Jacobian is built using finite differences with
>>> coloring
>>>   SNESLineSearch Object: 1 MPI process
>>>     type: basic
>>>     maxstep=1.000000e+08, minlambda=1.000000e-12
>>>     tolerances: relative=1.000000e-08, absolute=1.000000e-15,
>>> lambda=1.000000e-08
>>>     maximum iterations=40
>>>   KSP Object: 1 MPI process
>>>     type: gmres
>>>       restart=30, using Classical (unmodified) Gram-Schmidt
>>> Orthogonalization with no iterative refinement
>>>       happy breakdown tolerance 1e-30
>>>     maximum iterations=20, initial guess is zero
>>>     tolerances:  relative=0.1, absolute=1e-15, divergence=10.
>>>     left preconditioning
>>>     using PRECONDITIONED norm type for convergence test
>>>   PC Object: 1 MPI process
>>>     type: asm
>>>       total subdomain blocks = 1, amount of overlap = 0
>>>       restriction/interpolation type - RESTRICT
>>>       Local solver information for first block is in the following KSP
>>> and PC objects on rank 0:
>>>       Use -ksp_view ::ascii_info_detail to display information for all
>>> blocks
>>>     KSP Object: (sub_) 1 MPI process
>>>       type: preonly
>>>       maximum iterations=10000, initial guess is zero
>>>       tolerances:  relative=1e-05, absolute=1e-50, divergence=10000.
>>>       left preconditioning
>>>       using NONE norm type for convergence test
>>>     PC Object: (sub_) 1 MPI process
>>>       type: ilu
>>>         out-of-place factorization
>>>         0 levels of fill
>>>         tolerance for zero pivot 2.22045e-14
>>>         matrix ordering: natural
>>>         factor fill ratio given 1., needed 1.
>>>           Factored matrix follows:
>>>             Mat Object: (sub_) 1 MPI process
>>>               type: seqbaij
>>>               rows=16384, cols=16384, bs=16
>>>               package used to perform factorization: petsc
>>>               total: nonzeros=1277952, allocated nonzeros=1277952
>>>                   block size is 16
>>>       linear system matrix = precond matrix:
>>>       Mat Object: (sub_) 1 MPI process
>>>         type: seqbaij
>>>         rows=16384, cols=16384, bs=16
>>>         total: nonzeros=1277952, allocated nonzeros=1277952
>>>         total number of mallocs used during MatSetValues calls=0
>>>             block size is 16
>>>     linear system matrix followed by preconditioner matrix:
>>>     Mat Object: 1 MPI process
>>>       type: mffd
>>>       rows=16384, cols=16384
>>>         Matrix-free approximation:
>>>           err=1.49012e-08 (relative error in function evaluation)
>>>           Using wp compute h routine
>>>               Does not compute normU
>>>     Mat Object: 1 MPI process
>>>       type: seqbaij
>>>       rows=16384, cols=16384, bs=16
>>>       total: nonzeros=1277952, allocated nonzeros=1277952
>>>       total number of mallocs used during MatSetValues calls=0
>>>           block size is 16
>>>
>>> On Thu, May 4, 2023 at 8:30 AM Mark Adams <[email protected]> wrote:
>>>
>>>> If you are using MG what is the coarse grid solver?
>>>> -snes_view might give you that.
>>>>
>>>> On Thu, May 4, 2023 at 8:25 AM Matthew Knepley <[email protected]>
>>>> wrote:
>>>>
>>>>> On Thu, May 4, 2023 at 8:21 AM Mark Lohry <[email protected]> wrote:
>>>>>
>>>>>> Do they start very similarly and then slowly drift further apart?
>>>>>>
>>>>>>
>>>>>> Yes, this. I take it this sounds familiar?
>>>>>>
>>>>>> See these two examples with 20 fixed iterations pasted at the end.
>>>>>> The difference for one solve is slight (final SNES norm is identical to 5
>>>>>> digits), but in the context I'm using it in (repeated applications to 
>>>>>> solve
>>>>>> a steady state multigrid problem, though here just one level) the
>>>>>> differences add up such that I might reach global convergence in 35
>>>>>> iterations or 38. It's not the end of the world, but I was expecting that
>>>>>> with -np 1 these would be identical and I'm not sure where the root cause
>>>>>> would be.
>>>>>>
>>>>>
>>>>> The initial KSP residual is different, so its the PC. Please send the
>>>>> output of -snes_view. If your ASM is using direct factorization, then it
>>>>> could be randomness in whatever LU you are using.
>>>>>
>>>>>   Thanks,
>>>>>
>>>>>     Matt
>>>>>
>>>>>
>>>>>>   0 SNES Function norm 2.801842107848e+04
>>>>>>     0 KSP Residual norm 4.045639499595e+01
>>>>>>     1 KSP Residual norm 1.917999809040e+01
>>>>>>     2 KSP Residual norm 1.616048521958e+01
>>>>>> [...]
>>>>>>    19 KSP Residual norm 8.788043518111e-01
>>>>>>    20 KSP Residual norm 6.570851270214e-01
>>>>>>   Linear solve converged due to CONVERGED_ITS iterations 20
>>>>>>   1 SNES Function norm 1.801309983345e+03
>>>>>> Nonlinear solve converged due to CONVERGED_ITS iterations 1
>>>>>>
>>>>>>
>>>>>> Same system, identical initial 0 SNES norm, 0 KSP is slightly
>>>>>> different
>>>>>>
>>>>>>   0 SNES Function norm 2.801842107848e+04
>>>>>>     0 KSP Residual norm 4.045639473002e+01
>>>>>>     1 KSP Residual norm 1.917999883034e+01
>>>>>>     2 KSP Residual norm 1.616048572016e+01
>>>>>> [...]
>>>>>>    19 KSP Residual norm 8.788046348957e-01
>>>>>>    20 KSP Residual norm 6.570859588610e-01
>>>>>>   Linear solve converged due to CONVERGED_ITS iterations 20
>>>>>>   1 SNES Function norm 1.801311320322e+03
>>>>>> Nonlinear solve converged due to CONVERGED_ITS iterations 1
>>>>>>
>>>>>> On Wed, May 3, 2023 at 11:05 PM Barry Smith <[email protected]> wrote:
>>>>>>
>>>>>>>
>>>>>>>   Do they start very similarly and then slowly drift further apart?
>>>>>>> That is the first couple of KSP iterations they are almost identical but
>>>>>>> then for each iteration get a bit further. Similar for the SNES 
>>>>>>> iterations,
>>>>>>> starting close and then for more iterations and more solves they start
>>>>>>> moving apart. Or do they suddenly jump to be very different? You can run
>>>>>>> with -snes_monitor -ksp_monitor
>>>>>>>
>>>>>>> On May 3, 2023, at 9:07 PM, Mark Lohry <[email protected]> wrote:
>>>>>>>
>>>>>>> This is on a single MPI rank. I haven't checked the coloring, was
>>>>>>> just guessing there. But the solutions/residuals are slightly different
>>>>>>> from run to run.
>>>>>>>
>>>>>>> Fair to say that for serial JFNK/asm ilu0/gmres we should expect
>>>>>>> bitwise identical results?
>>>>>>>
>>>>>>>
>>>>>>> On Wed, May 3, 2023, 8:50 PM Barry Smith <[email protected]> wrote:
>>>>>>>
>>>>>>>>
>>>>>>>>   No, the coloring should be identical every time. Do you see
>>>>>>>> differences with 1 MPI rank? (Or much smaller ones?).
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>> > On May 3, 2023, at 8:42 PM, Mark Lohry <[email protected]> wrote:
>>>>>>>> >
>>>>>>>> > I'm running multiple iterations of newtonls with an MFFD/JFNK
>>>>>>>> nonlinear solver where I give it the sparsity. PC asm, KSP gmres, with
>>>>>>>> SNESSetLagJacobian -2 (compute once and then frozen jacobian).
>>>>>>>> >
>>>>>>>> > I'm seeing slight (<1%) but nonzero differences in residuals from
>>>>>>>> run to run. I'm wondering where randomness might enter here -- does the
>>>>>>>> jacobian coloring use a random seed?
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>
>>>>> --
>>>>> What most experimenters take for granted before they begin their
>>>>> experiments is infinitely more interesting than any results to which their
>>>>> experiments lead.
>>>>> -- Norbert Wiener
>>>>>
>>>>> https://www.cse.buffalo.edu/~knepley/
>>>>> <http://www.cse.buffalo.edu/~knepley/>
>>>>>
>>>>

Re: [petsc-users] sources of floating point randomness in JFNK in serial

Reply via email to