Do the ASM runs for thousands of time steps produce the same final "physical results" as the MUMPS runs for thousands of time steps, while with SuperLU_DIST you get very different "physical results"?
Barry

> On Nov 15, 2017, at 4:52 PM, Kong, Fande <fande.k...@inl.gov> wrote:
>
> On Wed, Nov 15, 2017 at 3:35 PM, Smith, Barry F. <bsm...@mcs.anl.gov> wrote:
>
> Since the convergence labeled linear does not converge to 14 digits in one
> iteration I am assuming you are using lagged preconditioning and/or lagged
> Jacobian?
>
> We are using Jacobian-free Newton. So the Jacobian is different from the
> preconditioning matrix.
>
> What happens if you do no lagging and solve each linear solve with a new
> LU factorization?
>
> We have the following results without using Jacobian-free Newton. Again,
> superlu_dist produces differences, while MUMPS gives the same results in
> terms of the residual norms.
>
> Fande,
>
> Superlu_dist run1:
>
> 0 Nonlinear |R| = 9.447423e+03
>     0 Linear |R| = 9.447423e+03
>     1 Linear |R| = 1.322285e-11
> 1 Nonlinear |R| = 1.666987e-11
>
> Superlu_dist run2:
>
> 0 Nonlinear |R| = 9.447423e+03
>     0 Linear |R| = 9.447423e+03
>     1 Linear |R| = 1.322171e-11
> 1 Nonlinear |R| = 1.666977e-11
>
> Superlu_dist run3:
>
> 0 Nonlinear |R| = 9.447423e+03
>     0 Linear |R| = 9.447423e+03
>     1 Linear |R| = 1.321964e-11
> 1 Nonlinear |R| = 1.666959e-11
>
> Superlu_dist run4:
>
> 0 Nonlinear |R| = 9.447423e+03
>     0 Linear |R| = 9.447423e+03
>     1 Linear |R| = 1.321978e-11
> 1 Nonlinear |R| = 1.668688e-11
>
> MUMPS run1:
>
> 0 Nonlinear |R| = 9.447423e+03
>     0 Linear |R| = 9.447423e+03
>     1 Linear |R| = 1.360637e-11
> 1 Nonlinear |R| = 1.654334e-11
>
> MUMPS run2:
>
> 0 Nonlinear |R| = 9.447423e+03
>     0 Linear |R| = 9.447423e+03
>     1 Linear |R| = 1.360637e-11
> 1 Nonlinear |R| = 1.654334e-11
>
> MUMPS run3:
>
> 0 Nonlinear |R| = 9.447423e+03
>     0 Linear |R| = 9.447423e+03
>     1 Linear |R| = 1.360637e-11
> 1 Nonlinear |R| = 1.654334e-11
>
> MUMPS run4:
>
> 0 Nonlinear |R| = 9.447423e+03
>     0 Linear |R| = 9.447423e+03
>     1 Linear |R| = 1.360637e-11
> 1 Nonlinear |R| = 1.654334e-11
>
> Barry
>
> > On Nov 15, 2017, at 4:24 PM, Kong, Fande <fande.k...@inl.gov> wrote:
> >
> > On Wed, Nov 15, 2017 at 2:52 PM, Smith, Barry F. <bsm...@mcs.anl.gov> wrote:
> >
> > > On Nov 15, 2017, at 3:36 PM, Kong, Fande <fande.k...@inl.gov> wrote:
> > >
> > > Hi Barry,
> > >
> > > Thanks for your reply. I was wondering why this happens only when we use
> > > superlu_dist. I am trying to understand the algorithm in superlu_dist. If
> > > we use ASM or MUMPS, we do not produce these differences.
> > >
> > > The differences actually are NOT meaningless. In fact, we have a real
> > > transient application that presents this issue. When we run the
> > > simulation with superlu_dist in parallel for thousands of time steps, the
> > > final physics solution looks totally different from one run to another. The
> > > differences are not acceptable any more. For a steady problem, the
> > > difference may be meaningless. But it is significant for the transient
> > > problem.
> >
> > I submit that the "physics solution" of all of these runs is equally
> > right and equally wrong. If the solutions are very different due to a small
> > perturbation then something is wrong with the model or the integrator; I
> > don't think you can blame the linear solver (see below)
> >
> > > This makes the solution not reproducible, and we can not even set a
> > > target solution in the test system because the solution is so
> > > different from one run to another. I guess there might be a tiny
> > > bug in superlu_dist or the PETSc interface to superlu_dist.
> >
> > This is possible but it is also possible this is due to normal round off
> > inside of SuperLU_Dist.
> >
> > Since you have SuperLU_Dist inside a nonlinear iteration it shouldn't
> > really matter exactly how well SuperLU_Dist does. The nonlinear iteration
> > does essentially defect correction for you; are you making sure that the
> > nonlinear iteration always works for every timestep? For example confirm
> > that SNESGetConvergedReason() is always positive.
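The defect-correction argument above can be illustrated with a toy problem. What follows is a minimal Python sketch, not PETSc code: a scalar Newton iteration whose "linear solve" is deliberately perturbed by a small random relative error, standing in for run-to-run round-off in a factorization. The residual function, the noise level, and the names `newton` and `solve_noise` are assumptions made purely for illustration; the tolerances mirror the `relative=1e-08, absolute=1e-11` settings in the solver details quoted in this thread. The returned `converged` flag plays the role that a positive SNESGetConvergedReason() plays in PETSc.

```python
import random


def residual(u):
    # Toy nonlinear residual F(u) = u^3 + 2u - 5.
    # Hypothetical stand-in; this is not the heat-conduction problem.
    return u ** 3 + 2.0 * u - 5.0


def jacobian(u):
    # Exact derivative dF/du of the toy residual.
    return 3.0 * u ** 2 + 2.0


def newton(u0, solve_noise, seed, rtol=1e-8, atol=1e-11, max_it=15):
    """Newton iteration with a perturbed linear solve.

    Returns (u, iterations, converged).
    """
    rng = random.Random(seed)
    u = u0
    r0 = abs(residual(u))
    for it in range(max_it):
        r = abs(residual(u))
        if r <= atol or r <= rtol * r0:
            return u, it, True
        # "Linear solve": exact Newton step, then a small relative
        # perturbation that differs from run to run (assumed noise model).
        du = -residual(u) / jacobian(u)
        du *= 1.0 + solve_noise * rng.uniform(-1.0, 1.0)
        u += du
    r = abs(residual(u))
    return u, max_it, (r <= atol or r <= rtol * r0)


if __name__ == "__main__":
    for seed in range(4):
        u, its, ok = newton(1.0, solve_noise=1e-6, seed=seed)
        print(f"run {seed + 1}: u = {u:.15e}, iterations = {its}, converged = {ok}")
```

In this sketch each run should report convergence, and the solutions should agree to about the nonlinear tolerance even though their trailing digits may differ between seeds. If a run failed to converge, the differences could no longer be attributed to harmless perturbations of the linear solve.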
> > Definitely it could be something wrong on my side. But let us focus on the
> > simple question first.
> >
> > To make the discussion a little simpler, let us go back to the simple problem
> > (heat conduction). Now I want to understand why this happens to
> > superlu_dist only. When we are using ASM or MUMPS, why can we not see the
> > differences from one run to another? I posted the residual histories for
> > MUMPS and ASM. We can not see any differences in terms of the residual
> > norms when using MUMPS or ASM. Does superlu_dist have higher round off than
> > other solvers?
> >
> > MUMPS run1:
> >
> > 0 Nonlinear |R| = 9.447423e+03
> >     0 Linear |R| = 9.447423e+03
> >     1 Linear |R| = 1.013384e-02
> >     2 Linear |R| = 4.020993e-08
> > 1 Nonlinear |R| = 1.404678e-02
> >     0 Linear |R| = 1.404678e-02
> >     1 Linear |R| = 4.836162e-08
> >     2 Linear |R| = 7.055620e-14
> > 2 Nonlinear |R| = 4.836392e-08
> >
> > MUMPS run2:
> >
> > 0 Nonlinear |R| = 9.447423e+03
> >     0 Linear |R| = 9.447423e+03
> >     1 Linear |R| = 1.013384e-02
> >     2 Linear |R| = 4.020993e-08
> > 1 Nonlinear |R| = 1.404678e-02
> >     0 Linear |R| = 1.404678e-02
> >     1 Linear |R| = 4.836162e-08
> >     2 Linear |R| = 7.055620e-14
> > 2 Nonlinear |R| = 4.836392e-08
> >
> > MUMPS run3:
> >
> > 0 Nonlinear |R| = 9.447423e+03
> >     0 Linear |R| = 9.447423e+03
> >     1 Linear |R| = 1.013384e-02
> >     2 Linear |R| = 4.020993e-08
> > 1 Nonlinear |R| = 1.404678e-02
> >     0 Linear |R| = 1.404678e-02
> >     1 Linear |R| = 4.836162e-08
> >     2 Linear |R| = 7.055620e-14
> > 2 Nonlinear |R| = 4.836392e-08
> >
> > MUMPS run4:
> >
> > 0 Nonlinear |R| = 9.447423e+03
> >     0 Linear |R| = 9.447423e+03
> >     1 Linear |R| = 1.013384e-02
> >     2 Linear |R| = 4.020993e-08
> > 1 Nonlinear |R| = 1.404678e-02
> >     0 Linear |R| = 1.404678e-02
> >     1 Linear |R| = 4.836162e-08
> >     2 Linear |R| = 7.055620e-14
> > 2 Nonlinear |R| = 4.836392e-08
> >
> > ASM run1:
> >
> > 0 Nonlinear |R| = 9.447423e+03
> >     0 Linear |R| = 9.447423e+03
> >     1 Linear |R| = 6.189229e+03
> >     2 Linear |R| = 3.252487e+02
> >     3 Linear |R| = 3.485174e+01
> >     4 Linear |R| = 8.600695e+00
> >     5 Linear |R| = 3.333942e+00
> >     6 Linear |R| = 1.706112e+00
> >     7 Linear |R| = 5.047863e-01
> >     8 Linear |R| = 2.337297e-01
> >     9 Linear |R| = 1.071627e-01
> >     10 Linear |R| = 4.692177e-02
> >     11 Linear |R| = 1.340717e-02
> >     12 Linear |R| = 4.753951e-03
> > 1 Nonlinear |R| = 2.320271e-02
> >     0 Linear |R| = 2.320271e-02
> >     1 Linear |R| = 4.367880e-03
> >     2 Linear |R| = 1.407852e-03
> >     3 Linear |R| = 6.036360e-04
> >     4 Linear |R| = 1.867661e-04
> >     5 Linear |R| = 8.760076e-05
> >     6 Linear |R| = 3.260519e-05
> >     7 Linear |R| = 1.435418e-05
> >     8 Linear |R| = 4.532875e-06
> >     9 Linear |R| = 2.439053e-06
> >     10 Linear |R| = 7.998549e-07
> >     11 Linear |R| = 2.428064e-07
> >     12 Linear |R| = 4.766918e-08
> >     13 Linear |R| = 1.713748e-08
> > 2 Nonlinear |R| = 3.671573e-07
> >
> > ASM run2:
> >
> > 0 Nonlinear |R| = 9.447423e+03
> >     0 Linear |R| = 9.447423e+03
> >     1 Linear |R| = 6.189229e+03
> >     2 Linear |R| = 3.252487e+02
> >     3 Linear |R| = 3.485174e+01
> >     4 Linear |R| = 8.600695e+00
> >     5 Linear |R| = 3.333942e+00
> >     6 Linear |R| = 1.706112e+00
> >     7 Linear |R| = 5.047863e-01
> >     8 Linear |R| = 2.337297e-01
> >     9 Linear |R| = 1.071627e-01
> >     10 Linear |R| = 4.692177e-02
> >     11 Linear |R| = 1.340717e-02
> >     12 Linear |R| = 4.753951e-03
> > 1 Nonlinear |R| = 2.320271e-02
> >     0 Linear |R| = 2.320271e-02
> >     1 Linear |R| = 4.367880e-03
> >     2 Linear |R| = 1.407852e-03
> >     3 Linear |R| = 6.036360e-04
> >     4 Linear |R| = 1.867661e-04
> >     5 Linear |R| = 8.760076e-05
> >     6 Linear |R| = 3.260519e-05
> >     7 Linear |R| = 1.435418e-05
> >     8 Linear |R| = 4.532875e-06
> >     9 Linear |R| = 2.439053e-06
> >     10 Linear |R| = 7.998549e-07
> >     11 Linear |R| = 2.428064e-07
> >     12 Linear |R| = 4.766918e-08
> >     13 Linear |R| = 1.713748e-08
> > 2 Nonlinear |R| = 3.671573e-07
> >
> > ASM run3:
> >
> > 0 Nonlinear |R| = 9.447423e+03
> >     0 Linear |R| = 9.447423e+03
> >     1 Linear |R| = 6.189229e+03
> >     2 Linear |R| = 3.252487e+02
> >     3 Linear |R| = 3.485174e+01
> >     4 Linear |R| = 8.600695e+00
> >     5 Linear |R| = 3.333942e+00
> >     6 Linear |R| = 1.706112e+00
> >     7 Linear |R| = 5.047863e-01
> >     8 Linear |R| = 2.337297e-01
> >     9 Linear |R| = 1.071627e-01
> >     10 Linear |R| = 4.692177e-02
> >     11 Linear |R| = 1.340717e-02
> >     12 Linear |R| = 4.753951e-03
> > 1 Nonlinear |R| = 2.320271e-02
> >     0 Linear |R| = 2.320271e-02
> >     1 Linear |R| = 4.367880e-03
> >     2 Linear |R| = 1.407852e-03
> >     3 Linear |R| = 6.036360e-04
> >     4 Linear |R| = 1.867661e-04
> >     5 Linear |R| = 8.760076e-05
> >     6 Linear |R| = 3.260519e-05
> >     7 Linear |R| = 1.435418e-05
> >     8 Linear |R| = 4.532875e-06
> >     9 Linear |R| = 2.439053e-06
> >     10 Linear |R| = 7.998549e-07
> >     11 Linear |R| = 2.428064e-07
> >     12 Linear |R| = 4.766918e-08
> >     13 Linear |R| = 1.713748e-08
> > 2 Nonlinear |R| = 3.671573e-07
> >
> > ASM run4:
> >
> > 0 Nonlinear |R| = 9.447423e+03
> >     0 Linear |R| = 9.447423e+03
> >     1 Linear |R| = 6.189229e+03
> >     2 Linear |R| = 3.252487e+02
> >     3 Linear |R| = 3.485174e+01
> >     4 Linear |R| = 8.600695e+00
> >     5 Linear |R| = 3.333942e+00
> >     6 Linear |R| = 1.706112e+00
> >     7 Linear |R| = 5.047863e-01
> >     8 Linear |R| = 2.337297e-01
> >     9 Linear |R| = 1.071627e-01
> >     10 Linear |R| = 4.692177e-02
> >     11 Linear |R| = 1.340717e-02
> >     12 Linear |R| = 4.753951e-03
> > 1 Nonlinear |R| = 2.320271e-02
> >     0 Linear |R| = 2.320271e-02
> >     1 Linear |R| = 4.367880e-03
> >     2 Linear |R| = 1.407852e-03
> >     3 Linear |R| = 6.036360e-04
> >     4 Linear |R| = 1.867661e-04
> >     5 Linear |R| = 8.760076e-05
> >     6 Linear |R| = 3.260519e-05
> >     7 Linear |R| = 1.435418e-05
> >     8 Linear |R| = 4.532875e-06
> >     9 Linear |R| = 2.439053e-06
> >     10 Linear |R| = 7.998549e-07
> >     11 Linear |R| = 2.428064e-07
> >     12 Linear |R| = 4.766918e-08
> >     13 Linear |R| = 1.713748e-08
> > 2 Nonlinear |R| = 3.671573e-07
> >
> > > Fande,
> > >
> > > On Wed, Nov 15, 2017 at 1:59 PM, Smith, Barry F. <bsm...@mcs.anl.gov>
> > > wrote:
> > >
> > > Meaningless differences
> > >
> > > > On Nov 15, 2017, at 2:26 PM, Kong, Fande <fande.k...@inl.gov> wrote:
> > > >
> > > > Hi,
> > > >
> > > > There is a heat conduction problem. When superlu_dist is used as a
> > > > preconditioner, we have random results from different runs. Is there a
> > > > random algorithm in superlu_dist? If we use ASM or MUMPS as the
> > > > preconditioner, we then don't have this issue.
> > > >
> > > > run 1:
> > > >
> > > > 0 Nonlinear |R| = 9.447423e+03
> > > >     0 Linear |R| = 9.447423e+03
> > > >     1 Linear |R| = 1.013384e-02
> > > >     2 Linear |R| = 4.020995e-08
> > > > 1 Nonlinear |R| = 1.404678e-02
> > > >     0 Linear |R| = 1.404678e-02
> > > >     1 Linear |R| = 5.104757e-08
> > > >     2 Linear |R| = 7.699637e-14
> > > > 2 Nonlinear |R| = 5.106418e-08
> > > >
> > > > run 2:
> > > >
> > > > 0 Nonlinear |R| = 9.447423e+03
> > > >     0 Linear |R| = 9.447423e+03
> > > >     1 Linear |R| = 1.013384e-02
> > > >     2 Linear |R| = 4.020995e-08
> > > > 1 Nonlinear |R| = 1.404678e-02
> > > >     0 Linear |R| = 1.404678e-02
> > > >     1 Linear |R| = 5.109913e-08
> > > >     2 Linear |R| = 7.189091e-14
> > > > 2 Nonlinear |R| = 5.111591e-08
> > > >
> > > > run 3:
> > > >
> > > > 0 Nonlinear |R| = 9.447423e+03
> > > >     0 Linear |R| = 9.447423e+03
> > > >     1 Linear |R| = 1.013384e-02
> > > >     2 Linear |R| = 4.020995e-08
> > > > 1 Nonlinear |R| = 1.404678e-02
> > > >     0 Linear |R| = 1.404678e-02
> > > >     1 Linear |R| = 5.104942e-08
> > > >     2 Linear |R| = 7.465572e-14
> > > > 2 Nonlinear |R| = 5.106642e-08
> > > >
> > > > run 4:
> > > >
> > > > 0 Nonlinear |R| = 9.447423e+03
> > > >     0 Linear |R| = 9.447423e+03
> > > >     1 Linear |R| = 1.013384e-02
> > > >     2 Linear |R| = 4.020995e-08
> > > > 1 Nonlinear |R| = 1.404678e-02
> > > >     0 Linear |R| = 1.404678e-02
> > > >     1 Linear |R| = 5.102730e-08
> > > >     2 Linear |R| = 7.132220e-14
> > > > 2 Nonlinear |R| = 5.104442e-08
> > > >
> > > > Solver details:
> > > >
> > > > SNES Object: 8 MPI processes
> > > >   type: newtonls
> > > >   maximum iterations=15, maximum function evaluations=10000
> > > >   tolerances: relative=1e-08, absolute=1e-11, solution=1e-50
> > > >   total number of linear solver iterations=4
> > > >   total number of function evaluations=7
> > > >   norm schedule ALWAYS
> > > >   SNESLineSearch Object: 8 MPI processes
> > > >     type: basic
> > > >     maxstep=1.000000e+08, minlambda=1.000000e-12
> > > >     tolerances: relative=1.000000e-08, absolute=1.000000e-15, lambda=1.000000e-08
> > > >     maximum iterations=40
> > > >   KSP Object: 8 MPI processes
> > > >     type: gmres
> > > >       restart=30, using Classical (unmodified) Gram-Schmidt Orthogonalization with no iterative refinement
> > > >       happy breakdown tolerance 1e-30
> > > >     maximum iterations=100, initial guess is zero
> > > >     tolerances: relative=1e-06, absolute=1e-50, divergence=10000.
> > > >     right preconditioning
> > > >     using UNPRECONDITIONED norm type for convergence test
> > > >   PC Object: 8 MPI processes
> > > >     type: lu
> > > >       out-of-place factorization
> > > >       tolerance for zero pivot 2.22045e-14
> > > >       matrix ordering: natural
> > > >       factor fill ratio given 0., needed 0.
> > > >         Factored matrix follows:
> > > >           Mat Object: 8 MPI processes
> > > >             type: superlu_dist
> > > >             rows=7925, cols=7925
> > > >             package used to perform factorization: superlu_dist
> > > >             total: nonzeros=0, allocated nonzeros=0
> > > >             total number of mallocs used during MatSetValues calls =0
> > > >               SuperLU_DIST run parameters:
> > > >                 Process grid nprow 4 x npcol 2
> > > >                 Equilibrate matrix TRUE
> > > >                 Matrix input mode 1
> > > >                 Replace tiny pivots FALSE
> > > >                 Use iterative refinement TRUE
> > > >                 Processors in row 4 col partition 2
> > > >                 Row permutation LargeDiag
> > > >                 Column permutation METIS_AT_PLUS_A
> > > >                 Parallel symbolic factorization FALSE
> > > >                 Repeated factorization SamePattern
> > > >     linear system matrix followed by preconditioner matrix:
> > > >     Mat Object: 8 MPI processes
> > > >       type: mffd
> > > >       rows=7925, cols=7925
> > > >         Matrix-free approximation:
> > > >           err=1.49012e-08 (relative error in function evaluation)
> > > >           Using wp compute h routine
> > > >           Does not compute normU
> > > >     Mat Object: () 8 MPI processes
> > > >       type: mpiaij
> > > >       rows=7925, cols=7925
> > > >       total: nonzeros=63587, allocated nonzeros=63865
> > > >       total number of mallocs used during MatSetValues calls =0
> > > >         not using I-node (on process 0) routines
> > > >
> > > > Fande,
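On the original question of whether a random algorithm is needed to explain run-to-run variation: floating-point addition is not associative, so a parallel code that accumulates contributions in whatever order they happen to arrive can produce slightly different results from identical inputs. The sketch below, in plain Python with no PETSc or SuperLU_DIST calls, sums the same numbers in several shuffled orders and reports the spread. It illustrates the general effect only; this thread does not establish that this is what happens inside superlu_dist, and the function names `sum_in_order` and `sum_spread` and the test data are made up for the example.

```python
import random


def sum_in_order(values, order):
    # Accumulate values[i] left to right in the given index order.
    total = 0.0
    for i in order:
        total += values[i]
    return total


def sum_spread(values, n_orders, seed=0):
    """Sum the same values in n_orders shuffled orders; return (min, max) of the sums."""
    rng = random.Random(seed)
    order = list(range(len(values)))
    sums = []
    for _ in range(n_orders):
        rng.shuffle(order)
        sums.append(sum_in_order(values, order))
    return min(sums), max(sums)


if __name__ == "__main__":
    # Smallest classic case: regrouping three terms changes the last bit.
    print((0.1 + 0.2) + 0.3 == 0.1 + (0.2 + 0.3))

    # Hypothetical data with widely varying magnitudes.
    rng = random.Random(1)
    values = [rng.uniform(-1.0, 1.0) * 10.0 ** rng.randint(-4, 4) for _ in range(10000)]
    lo, hi = sum_spread(values, n_orders=8)
    scale = sum(abs(v) for v in values)
    print(f"spread = {hi - lo:.3e}, relative to sum of magnitudes = {(hi - lo) / scale:.3e}")
```

Any spread that appears is tiny relative to the magnitude of the data, which is the same order of effect as residual norms that agree in their leading digits but differ in the trailing ones. A solver that always accumulates in a fixed order would show no spread at all, without being any more accurate.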