Hi Barry and Mark,
Thank you for looking into my problem. The two equations I am solving with
PETSc are equations 6 and 7 from this paper:
https://ris.utwente.nl/ws/portalfiles/portal/5676495/Roghair+Paper_final_draft_v1.pdf
I just used MUMPS and SuperLU_DIST on my full-size problem (with 3,000,000
unknowns). To clarify, I did a direct solve with -ksp_type preonly. They take a
very long time, about 30 minutes for MUMPS and 18 minutes for SuperLU_DIST, see
attached output. For reference, the same matrix took 658 iterations of GMRES
with BoomerAMG and about 20 seconds of walltime. Maybe I am already getting a
great deal with BoomerAMG!
I'll try removing some terms from my solve (e.g. removing the second equation,
then making the second equation just the elliptic portion of the equation,
etc.) and try with a simpler geometry. I'll keep you updated as I run into
troubles with that route. I wasn't aware of Field Split preconditioners; I'll
do some reading on them and give them a try as well.
Thank you again,
Joshua
________________________________
From: Barry Smith <[email protected]>
Sent: Thursday, March 2, 2023 7:47 AM
To: Christopher, Joshua <[email protected]>
Cc: [email protected] <[email protected]>
Subject: Re: [petsc-users] Overcoming slow convergence with GMRES+Hypre BoomerAMG
Have you tried MUMPS (or SuperLU_DIST) on the full-size problem with the
5,000,000 unknowns? It is at the high end of problem sizes you can do with
direct solvers but is worth comparing with BoomerAMG. You likely want to use
more nodes and fewer cores per node with MUMPS to be able to access more
memory. If you need to solve multiple right-hand sides with the same matrix,
the factors will be reused, so the second and later solves will be much
faster.
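
A rough way to reason about the factor-reuse point is a break-even count: how
many solves with the same matrix are needed before a one-time factorization
pays off against an iterative solve. The sketch below is only illustrative.
The 18 minutes (SuperLU_DIST) and 20 seconds (GMRES with BoomerAMG) are the
timings quoted in this thread, and the 18 minutes is used as an upper bound on
the factorization cost because it also includes the first solve. The time for
each later direct solve is not reported anywhere in the thread, so the 2
seconds used here is an assumed placeholder, and the function name is
hypothetical.

```python
import math


def break_even_solves(factor_seconds, direct_solve_seconds, iterative_solve_seconds):
    """Smallest n with factor + n * direct < n * iterative, or None if never."""
    saving_per_solve = iterative_solve_seconds - direct_solve_seconds
    if saving_per_solve <= 0:
        # Each direct solve is no cheaper than an iterative one: no break-even.
        return None
    return math.floor(factor_seconds / saving_per_solve) + 1


superlu_dist_seconds = 18 * 60      # reported in the thread (factor + first solve)
boomeramg_seconds = 20              # reported in the thread (658-iteration solve)
assumed_direct_solve_seconds = 2.0  # ASSUMPTION: not measured in the thread

n = break_even_solves(superlu_dist_seconds, assumed_direct_solve_seconds,
                      boomeramg_seconds)
print("solves needed before the direct solver wins:", n)
```

The answer is very sensitive to the assumed per-solve time and to the
iterative solve time, which varied between 600 and 4000 iterations in the
production runs, so measured values should replace the placeholders.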
I agree with Mark, with iterative solvers you are likely to end up with
PCFIELDSPLIT.
Barry
On Mar 1, 2023, at 7:17 PM, Christopher, Joshua via petsc-users
<[email protected]> wrote:
Hello,
I am trying to solve the leaky-dielectric model equations with PETSc using a
second-order discretization scheme (with limiting to first order as needed)
using the finite volume method. The leaky dielectric model is a coupled system
of two equations, consisting of a Poisson equation and a convection-diffusion
equation. I have tested on small problems with simple geometry (~1000 DoFs)
using:
-ksp_type gmres
-pc_type hypre
-pc_hypre_type boomeramg
and I get RTOL convergence to 1.e-5 in about 4 iterations. I tested this in
parallel with 2 cores, and was previously also able to successfully use a
direct solver in serial to solve this problem. When I scale up to my production
problem, I get significantly worse convergence. My production problem has ~3
million DoFs, more complex geometry, and is solved on ~100 cores across two
nodes. The boundary conditions change a little because of the geometry, but are
of the same types (only Dirichlet and Neumann). On the production case, I need
600-4000 iterations to converge. I've attached the output from the first solve,
which took 658 iterations to converge, using the following output options:
-ksp_view_pre
-ksp_view
-ksp_converged_reason
-ksp_monitor_true_residual
-ksp_test_null_space
My matrix is non-symmetric, the condition number can be around 10e6, and the
eigenvalues reported by PETSc have been real and positive (using
-ksp_view_eigenvalues).
I have tried using other preconditioners (superlu, mumps, gamg, mg) but
hypre+boomeramg has performed the best so far. The literature seems to indicate
that AMG is the best approach for solving these equations in a coupled fashion.
Do you have any advice on speeding up the convergence of this system?
Thank you,
Joshua
<petsc_gmres_boomeramg.txt>
Residual norms for cs_ solve.
0 KSP none resid norm 1.254940857906e+01 true resid norm 1.158447123888e-14
||r(i)||/||b|| 9.231089390304e-16
1 KSP none resid norm 1.158447123888e-14 true resid norm 1.158447123888e-14
||r(i)||/||b|| 9.231089390304e-16
Linear cs_ solve converged due to CONVERGED_ITS iterations 1
KSP Object: (cs_) 108 MPI processes
type: preonly
maximum iterations=10000, initial guess is zero
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using NONE norm type for convergence test
PC Object: (cs_) 108 MPI processes
type: lu
out-of-place factorization
tolerance for zero pivot 2.22045e-14
matrix ordering: external
factor fill ratio given 0., needed 0.
Factored matrix follows:
Mat Object: 108 MPI processes
type: mumps
rows=5351238, cols=5351238
package used to perform factorization: mumps
total: nonzeros=-11493, allocated nonzeros=-11493
MUMPS run parameters:
SYM (matrix type): 0
PAR (host participation): 1
ICNTL(1) (output for error): 6
ICNTL(2) (output of diagnostic msg): 0
ICNTL(3) (output for global info): 0
ICNTL(4) (level of printing): 0
ICNTL(5) (input mat struct): 0
ICNTL(6) (matrix prescaling): 7
ICNTL(7) (sequential matrix ordering):7
ICNTL(8) (scaling strategy): 77
ICNTL(10) (max num of refinements): 0
ICNTL(11) (error analysis): 0
ICNTL(12) (efficiency control): 1
ICNTL(13) (sequential factorization of the root node): 0
ICNTL(14) (percentage of estimated workspace increase): 35
ICNTL(18) (input mat struct): 3
ICNTL(19) (Schur complement info): 0
ICNTL(20) (RHS sparse pattern): 10
ICNTL(21) (solution struct): 1
ICNTL(22) (in-core/out-of-core facility): 0
ICNTL(23) (max size of memory can be allocated locally):0
ICNTL(24) (detection of null pivot rows): 0
ICNTL(25) (computation of a null space basis): 0
ICNTL(26) (Schur options for RHS or solution): 0
ICNTL(27) (blocking size for multiple RHS): -32
ICNTL(28) (use parallel or sequential ordering): 1
ICNTL(29) (parallel ordering): 0
ICNTL(30) (user-specified set of entries in inv(A)): 0
ICNTL(31) (factors is discarded in the solve phase): 0
ICNTL(33) (compute determinant): 0
ICNTL(35) (activate BLR based factorization): 0
ICNTL(36) (choice of BLR factorization variant): 0
ICNTL(38) (estimated compression rate of LU factors): 333
CNTL(1) (relative pivoting threshold): 0.01
CNTL(2) (stopping criterion of refinement): 1.49012e-08
CNTL(3) (absolute pivoting threshold): 0.
CNTL(4) (value of static pivoting): -1.
CNTL(5) (fixation for null pivots): 0.
CNTL(7) (dropping parameter for BLR): 0.
RINFO(1) (local estimated flops for the elimination after
analysis):
[0] 1.66073e+12
[1] 1.6585e+12
[2] 1.63203e+12
[3] 1.66098e+12
[4] 1.66093e+12
[5] 1.67157e+12
[6] 1.63749e+12
[7] 1.66218e+12
[8] 1.66996e+12
[9] 1.6724e+12
[10] 1.66066e+12
[11] 1.66488e+12
[12] 1.66376e+12
[13] 1.67345e+12
[14] 1.66895e+12
[15] 1.65963e+12
[16] 1.80688e+12
[17] 1.81292e+12
[18] 1.84546e+12
[19] 1.84133e+12
[20] 1.81582e+12
[21] 1.81373e+12
[22] 1.82058e+12
[23] 1.81928e+12
[24] 1.78549e+12
[25] 1.81829e+12
[26] 1.81991e+12
[27] 1.81214e+12
[28] 1.80272e+12
[29] 1.83771e+12
[30] 1.78353e+12
[31] 1.81475e+12
[32] 1.9671e+12
[33] 2.01986e+12
[34] 1.71875e+12
[35] 1.66608e+12
[36] 1.68529e+12
[37] 1.6605e+12
[38] 1.64795e+12
[39] 1.65916e+12
[40] 1.66158e+12
[41] 1.66343e+12
[42] 1.66994e+12
[43] 1.65919e+12
[44] 1.72425e+12
[45] 1.88805e+12
[46] 1.75515e+12
[47] 1.7273e+12
[48] 1.71609e+12
[49] 1.73189e+12
[50] 1.73073e+12
[51] 1.7049e+12
[52] 1.72524e+12
[53] 1.73553e+12
[54] 1.74309e+12
[55] 1.66744e+12
[56] 1.71165e+12
[57] 1.70487e+12
[58] 1.72586e+12
[59] 1.72037e+12
[60] 1.75284e+12
[61] 1.74677e+12
[62] 1.71936e+12
[63] 1.72901e+12
[64] 1.69258e+12
[65] 1.52824e+12
[66] 1.69267e+12
[67] 1.72771e+12
[68] 1.72308e+12
[69] 1.73153e+12
[70] 1.70773e+12
[71] 1.71084e+12
[72] 1.7297e+12
[73] 1.73028e+12
[74] 1.74747e+12
[75] 1.77166e+12
[76] 1.70912e+12
[77] 1.74161e+12
[78] 1.7246e+12
[79] 1.7157e+12
[80] 1.70691e+12
[81] 1.74236e+12
[82] 1.72341e+12
[83] 1.72029e+12
[84] 1.73416e+12
[85] 1.71799e+12
[86] 1.74188e+12
[87] 1.74702e+12
[88] 1.74342e+12
[89] 1.74041e+12
[90] 1.72573e+12
[91] 1.86275e+12
[92] 1.86433e+12
[93] 1.85934e+12
[94] 1.85806e+12
[95] 1.83803e+12
[96] 1.86591e+12
[97] 1.8614e+12
[98] 1.81489e+12
[99] 1.86052e+12
[100] 1.85587e+12
[101] 2.21322e+12
[102] 1.89808e+12
[103] 1.86218e+12
[104] 1.85395e+12
[105] 1.73799e+12
[106] 1.65875e+12
[107] 1.68549e+12
RINFO(2) (local estimated flops for the assembly after
factorization):
[0] 1.23324e+09
[1] 1.26652e+09
[2] 1.29086e+09
[3] 1.28862e+09
[4] 1.29277e+09
[5] 1.13907e+09
[6] 1.24423e+09
[7] 1.22431e+09
[8] 1.23427e+09
[9] 1.21851e+09
[10] 1.3006e+09
[11] 1.22163e+09
[12] 1.3956e+09
[13] 1.3435e+09
[14] 1.13975e+09
[15] 1.2994e+09
[16] 1.17168e+09
[17] 1.36141e+09
[18] 1.27683e+09
[19] 1.36749e+09
[20] 1.35887e+09
[21] 1.17786e+09
[22] 1.31688e+09
[23] 1.23037e+09
[24] 1.38063e+09
[25] 1.28078e+09
[26] 1.34857e+09
[27] 1.3585e+09
[28] 1.30495e+09
[29] 1.33338e+09
[30] 1.27015e+09
[31] 1.27686e+09
[32] 1.22159e+09
[33] 1.07075e+09
[34] 1.07552e+09
[35] 1.19348e+09
[36] 1.16066e+09
[37] 1.25166e+09
[38] 1.21874e+09
[39] 1.31041e+09
[40] 1.35048e+09
[41] 1.16192e+09
[42] 1.16826e+09
[43] 1.3497e+09
[44] 1.21962e+09
[45] 8.19742e+08
[46] 1.22407e+09
[47] 1.32461e+09
[48] 1.28711e+09
[49] 1.35711e+09
[50] 1.21803e+09
[51] 1.28077e+09
[52] 1.2183e+09
[53] 1.41845e+09
[54] 1.27162e+09
[55] 1.30681e+09
[56] 1.22733e+09
[57] 1.11684e+09
[58] 1.24733e+09
[59] 1.20164e+09
[60] 1.22904e+09
[61] 1.17636e+09
[62] 1.23171e+09
[63] 1.22975e+09
[64] 1.29833e+09
[65] 1.36646e+09
[66] 1.19033e+09
[67] 1.28285e+09
[68] 1.30905e+09
[69] 1.2874e+09
[70] 1.20752e+09
[71] 1.32472e+09
[72] 1.20872e+09
[73] 1.23065e+09
[74] 1.31336e+09
[75] 1.38972e+09
[76] 1.1689e+09
[77] 1.3065e+09
[78] 1.30035e+09
[79] 1.31215e+09
[80] 1.32861e+09
[81] 1.2647e+09
[82] 1.4236e+09
[83] 1.32676e+09
[84] 1.24456e+09
[85] 1.36916e+09
[86] 1.30353e+09
[87] 1.42703e+09
[88] 1.25465e+09
[89] 1.2578e+09
[90] 1.33372e+09
[91] 1.38357e+09
[92] 1.46306e+09
[93] 1.42037e+09
[94] 1.38921e+09
[95] 1.4006e+09
[96] 1.40985e+09
[97] 1.41777e+09
[98] 1.18292e+09
[99] 1.22771e+09
[100] 1.2416e+09
[101] 1.02753e+09
[102] 1.13616e+09
[103] 1.19053e+09
[104] 1.23898e+09
[105] 1.25436e+09
[106] 1.19131e+09
[107] 1.21628e+09
RINFO(3) (local estimated flops for the elimination after
factorization):
[0] 1.60244e+12
[1] 1.74294e+12
[2] 1.69196e+12
[3] 1.695e+12
[4] 1.80765e+12
[5] 1.56607e+12
[6] 1.76597e+12
[7] 1.74922e+12
[8] 1.58864e+12
[9] 1.66604e+12
[10] 1.76361e+12
[11] 1.68972e+12
[12] 1.92682e+12
[13] 1.70654e+12
[14] 1.62132e+12
[15] 1.77676e+12
[16] 1.63572e+12
[17] 1.89394e+12
[18] 1.75041e+12
[19] 1.90894e+12
[20] 1.83295e+12
[21] 1.60131e+12
[22] 1.72745e+12
[23] 1.7096e+12
[24] 1.78206e+12
[25] 1.81382e+12
[26] 1.77662e+12
[27] 1.89438e+12
[28] 1.81961e+12
[29] 1.87533e+12
[30] 1.6349e+12
[31] 1.75706e+12
[32] 1.75646e+12
[33] 1.76596e+12
[34] 1.70886e+12
[35] 1.65914e+12
[36] 1.71771e+12
[37] 1.63745e+12
[38] 1.64457e+12
[39] 1.7803e+12
[40] 1.77006e+12
[41] 1.61189e+12
[42] 1.66595e+12
[43] 1.84219e+12
[44] 1.62509e+12
[45] 1.26457e+12
[46] 1.66727e+12
[47] 1.74436e+12
[48] 1.67484e+12
[49] 1.82673e+12
[50] 1.63331e+12
[51] 1.63468e+12
[52] 1.73827e+12
[53] 1.85881e+12
[54] 1.76778e+12
[55] 1.80892e+12
[56] 1.67656e+12
[57] 1.59896e+12
[58] 1.70286e+12
[59] 1.66236e+12
[60] 1.70077e+12
[61] 1.64734e+12
[62] 1.74903e+12
[63] 1.72385e+12
[64] 1.78797e+12
[65] 1.83683e+12
[66] 1.68984e+12
[67] 1.92741e+12
[68] 1.78981e+12
[69] 1.73937e+12
[70] 1.64734e+12
[71] 1.81129e+12
[72] 1.76783e+12
[73] 1.75595e+12
[74] 1.81577e+12
[75] 1.87818e+12
[76] 1.6395e+12
[77] 1.78884e+12
[78] 1.74558e+12
[79] 1.79986e+12
[80] 1.77469e+12
[81] 1.80164e+12
[82] 1.91219e+12
[83] 1.79518e+12
[84] 1.65173e+12
[85] 1.84183e+12
[86] 1.70264e+12
[87] 1.92076e+12
[88] 1.72546e+12
[89] 1.76506e+12
[90] 1.77798e+12
[91] 1.80609e+12
[92] 1.98728e+12
[93] 1.88273e+12
[94] 1.84458e+12
[95] 1.8155e+12
[96] 1.86762e+12
[97] 1.8405e+12
[98] 1.69509e+12
[99] 1.70999e+12
[100] 1.81556e+12
[101] 1.63428e+12
[102] 1.70419e+12
[103] 1.7708e+12
[104] 1.80242e+12
[105] 1.81926e+12
[106] 1.7836e+12
[107] 1.80131e+12
INFO(15) (estimated size of (in MB) MUMPS internal data for
running numerical factorization):
[0] 2300
[1] 1916
[2] 2013
[3] 2104
[4] 1855
[5] 2194
[6] 1920
[7] 2262
[8] 2078
[9] 1980
[10] 1843
[11] 2071
[12] 1933
[13] 2043
[14] 2564
[15] 1877
[16] 2369
[17] 2143
[18] 2200
[19] 2065
[20] 2058
[21] 2405
[22] 1915
[23] 2176
[24] 2075
[25] 2101
[26] 1920
[27] 1816
[28] 2079
[29] 1800
[30] 2212
[31] 2085
[32] 1901
[33] 2173
[34] 1904
[35] 1992
[36] 1955
[37] 2375
[38] 2147
[39] 1864
[40] 1784
[41] 1973
[42] 2236
[43] 1938
[44] 1889
[45] 2638
[46] 2163
[47] 2094
[48] 2086
[49] 1888
[50] 2170
[51] 2179
[52] 2055
[53] 1967
[54] 1995
[55] 1946
[56] 2166
[57] 2296
[58] 1958
[59] 1921
[60] 2118
[61] 2227
[62] 2273
[63] 2296
[64] 2093
[65] 2181
[66] 2025
[67] 1844
[68] 1919
[69] 2018
[70] 2169
[71] 2135
[72] 1854
[73] 2091
[74] 2056
[75] 2199
[76] 2139
[77] 2158
[78] 2157
[79] 2219
[80] 2284
[81] 1779
[82] 1890
[83] 1958
[84] 2150
[85] 2043
[86] 2325
[87] 1912
[88] 2000
[89] 1906
[90] 2056
[91] 1812
[92] 1914
[93] 2078
[94] 1914
[95] 2190
[96] 2211
[97] 2022
[98] 3255
[99] 3384
[100] 3480
[101] 4184
[102] 3364
[103] 3486
[104] 3348
[105] 3573
[106] 3379
[107] 3444
INFO(16) (size of (in MB) MUMPS internal data used during
numerical factorization):
[0] 2300
[1] 1916
[2] 2013
[3] 2104
[4] 1855
[5] 2194
[6] 1920
[7] 2262
[8] 2078
[9] 1980
[10] 1843
[11] 2071
[12] 1933
[13] 2043
[14] 2564
[15] 1877
[16] 2369
[17] 2143
[18] 2200
[19] 2065
[20] 2058
[21] 2405
[22] 1915
[23] 2176
[24] 2075
[25] 2101
[26] 1920
[27] 1816
[28] 2079
[29] 1800
[30] 2212
[31] 2085
[32] 1901
[33] 2173
[34] 1904
[35] 1992
[36] 1955
[37] 2375
[38] 2147
[39] 1864
[40] 1784
[41] 1973
[42] 2236
[43] 1938
[44] 1889
[45] 2638
[46] 2163
[47] 2094
[48] 2086
[49] 1888
[50] 2170
[51] 2179
[52] 2055
[53] 1967
[54] 1995
[55] 1946
[56] 2166
[57] 2296
[58] 1958
[59] 1921
[60] 2118
[61] 2227
[62] 2273
[63] 2296
[64] 2093
[65] 2181
[66] 2025
[67] 1844
[68] 1919
[69] 2018
[70] 2169
[71] 2135
[72] 1854
[73] 2091
[74] 2056
[75] 2199
[76] 2139
[77] 2158
[78] 2157
[79] 2219
[80] 2284
[81] 1779
[82] 1890
[83] 1958
[84] 2150
[85] 2043
[86] 2325
[87] 1912
[88] 2000
[89] 1906
[90] 2056
[91] 1812
[92] 1914
[93] 2078
[94] 1914
[95] 2190
[96] 2211
[97] 2022
[98] 3255
[99] 3384
[100] 3480
[101] 4184
[102] 3364
[103] 3486
[104] 3348
[105] 3573
[106] 3379
[107] 3444
INFO(23) (num of pivots eliminated on this processor after
factorization):
[0] 177460
[1] 35826
[2] 36168
[3] 21562
[4] 29864
[5] 30190
[6] 32434
[7] 32186
[8] 32828
[9] 57560
[10] 32870
[11] 38961
[12] 60784
[13] 52132
[14] 23454
[15] 41060
[16] 26563
[17] 54658
[18] 36806
[19] 30702
[20] 28316
[21] 34792
[22] 15746
[23] 16978
[24] 21104
[25] 23388
[26] 20728
[27] 21698
[28] 40120
[29] 26260
[30] 58238
[31] 21424
[32] 25210
[33] 2728
[34] 1784
[35] 18258
[36] 17222
[37] 25140
[38] 28888
[39] 29554
[40] 34856
[41] 17960
[42] 24771
[43] 72424
[44] 33690
[45] 39812
[46] 23474
[47] 41682
[48] 27212
[49] 29338
[50] 59726
[51] 39712
[52] 78260
[53] 57154
[54] 31556
[55] 2332
[56] 2427
[57] 3001
[58] 20384
[59] 23604
[60] 26968
[61] 23986
[62] 60086
[63] 51186
[64] 40922
[65] 32312
[66] 2220
[67] 11216
[68] 14900
[69] 16532
[70] 15728
[71] 20326
[72] 18738
[73] 22016
[74] 40980
[75] 38930
[76] 28656
[77] 29244
[78] 41942
[79] 21374
[80] 31886
[81] 24676
[82] 36636
[83] 30114
[84] 22164
[85] 31014
[86] 41380
[87] 25650
[88] 3469
[89] 2216
[90] 16120
[91] 14990
[92] 17696
[93] 26000
[94] 22886
[95] 41160
[96] 22980
[97] 29318
[98] 196482
[99] 197634
[100] 286590
[101] 279818
[102] 203248
[103] 232626
[104] 245590
[105] 217912
[106] 252344
[107] 215358
RINFOG(1) (global estimated flops for the elimination after
analysis): 1.88847e+14
RINFOG(2) (global estimated flops for the assembly after
factorization): 1.3691e+11
RINFOG(3) (global estimated flops for the elimination after
factorization): 1.88713e+14
(RINFOG(12) RINFOG(13))*2^INFOG(34) (determinant): (0.,0.)*(2^0)
INFOG(3) (estimated real workspace for factors on all processors
after analysis): -11765
INFOG(4) (estimated integer workspace for factors on all
processors after analysis): 114029512
INFOG(5) (estimated maximum front size in the complete tree):
40552
INFOG(6) (number of nodes in the complete tree): 1135950
INFOG(7) (ordering option effectively used after analysis): 5
INFOG(8) (structural symmetry in percent of the permuted matrix
after analysis): -1
INFOG(9) (total real/complex workspace to store the matrix
factors after factorization): -11491
INFOG(10) (total integer space store the matrix factors after
factorization): 112027552
INFOG(11) (order of largest frontal matrix after factorization):
40546
INFOG(12) (number of off-diagonal pivots): 648
INFOG(13) (number of delayed pivots after factorization): 22
INFOG(14) (number of memory compress after factorization): 855
INFOG(15) (number of steps of iterative refinement after
solution): 0
INFOG(16) (estimated size (in MB) of all MUMPS internal data for
factorization after analysis: value on the most memory consuming processor):
4184
INFOG(17) (estimated size of all MUMPS internal data for
factorization after analysis: sum over all processors): 237537
INFOG(18) (size of all MUMPS internal data allocated during
factorization: value on the most memory consuming processor): 4184
INFOG(19) (size of all MUMPS internal data allocated during
factorization: sum over all processors): 237537
INFOG(20) (estimated number of entries in the factors): -11493
INFOG(21) (size in MB of memory effectively used during
factorization - value on the most memory consuming processor): 3087
INFOG(22) (size in MB of memory effectively used during
factorization - sum over all processors): 188082
INFOG(23) (after analysis: value of ICNTL(6) effectively used): 0
INFOG(24) (after analysis: value of ICNTL(12) effectively used): 1
INFOG(25) (after factorization: number of pivots modified by
static pivoting): 0
INFOG(28) (after factorization: number of null pivots
encountered): 0
INFOG(29) (after factorization: effective number of entries in
the factors (sum over all processors)): -11491
INFOG(30, 31) (after solution: size in Mbytes of memory used
during solution phase): 4030, 222811
INFOG(32) (after analysis: type of analysis done): 1
INFOG(33) (value used for ICNTL(8)): 7
INFOG(34) (exponent of the determinant if determinant is
requested): 0
INFOG(35) (after factorization: number of entries taking into
account BLR factor compression - sum over all processors): -11491
INFOG(36) (after analysis: estimated size of all MUMPS internal
data for running BLR in-core - value on the most memory consuming processor): 0
INFOG(37) (after analysis: estimated size of all MUMPS internal
data for running BLR in-core - sum over all processors): 0
INFOG(38) (after analysis: estimated size of all MUMPS internal
data for running BLR out-of-core - value on the most memory consuming
processor): 0
INFOG(39) (after analysis: estimated size of all MUMPS internal
data for running BLR out-of-core - sum over all processors): 0
linear system matrix = precond matrix:
Mat Object: (cs_) 108 MPI processes
type: mpiaij
rows=5351238, cols=5351238
total: nonzeros=74533580, allocated nonzeros=149067160
total number of mallocs used during MatSetValues calls=0
not using I-node (on process 0) routines
Residual norms for cs_ solve.
0 KSP none resid norm 1.254940857906e+01 true resid norm 6.348257807283e-15
||r(i)||/||b|| 5.058611142740e-16
1 KSP none resid norm 6.348257807283e-15 true resid norm 6.348257807283e-15
||r(i)||/||b|| 5.058611142740e-16
Linear cs_ solve converged due to CONVERGED_ITS iterations 1
KSP Object: (cs_) 108 MPI processes
type: preonly
maximum iterations=10000, initial guess is zero
tolerances: relative=1e-05, absolute=1e-50, divergence=10000.
left preconditioning
using NONE norm type for convergence test
PC Object: (cs_) 108 MPI processes
type: lu
out-of-place factorization
tolerance for zero pivot 2.22045e-14
matrix ordering: external
factor fill ratio given 0., needed 0.
Factored matrix follows:
Mat Object: 108 MPI processes
type: superlu_dist
rows=5351238, cols=5351238
package used to perform factorization: superlu_dist
total: nonzeros=0, allocated nonzeros=0
SuperLU_DIST run parameters:
Process grid nprow 9 x npcol 12
Equilibrate matrix TRUE
Replace tiny pivots FALSE
Use iterative refinement FALSE
Processors in row 9 col partition 12
Row permutation LargeDiag_MC64
Column permutation METIS_AT_PLUS_A
Parallel symbolic factorization FALSE
Repeated factorization SamePattern
linear system matrix = precond matrix:
Mat Object: (cs_) 108 MPI processes
type: mpiaij
rows=5351238, cols=5351238
total: nonzeros=74533580, allocated nonzeros=149067160
total number of mallocs used during MatSetValues calls=0
not using I-node (on process 0) routines
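
The per-rank lists in the MUMPS part of the output above are long, and the
imbalance is easy to miss: ranks 98 to 107 report noticeably more memory in
INFO(16) than the others. A small script can reduce such a list to its maximum,
sum, and max-to-mean ratio. This is a minimal sketch, assuming the log has the
`[rank] value` layout shown above. The function names are hypothetical, and the
sample string holds only four of the 108 INFO(16) entries, copied from the
listing.

```python
import re

# Matches per-rank lines such as "[101] 4184" or "[0] 1.66073e+12".
RANK_LINE = re.compile(r"^\[(\d+)\]\s+(\S+)$")


def per_rank_values(log_text, header_prefix):
    """Return {rank: value} for the list following the first matching header."""
    values = {}
    in_section = False
    for raw in log_text.splitlines():
        line = raw.strip()
        if not in_section:
            in_section = line.startswith(header_prefix)
            continue
        match = RANK_LINE.match(line)
        if match:
            values[int(match.group(1))] = float(match.group(2))
        elif values:
            break  # first non-rank line after the list ends the section
        # otherwise this is a wrapped header continuation; keep scanning
    return values


def summarize(values):
    """Maximum, sum, and load imbalance (max / mean) of per-rank values."""
    top_rank = max(values, key=values.get)
    total = sum(values.values())
    mean = total / len(values)
    return {"ranks": len(values), "max_rank": top_rank,
            "max": values[top_rank], "sum": total,
            "max_over_mean": values[top_rank] / mean}


# Four entries copied from the INFO(16) listing above (not the full list).
SAMPLE = """\
INFO(16) (size of (in MB) MUMPS internal data used during
numerical factorization):
[0] 2300
[1] 1916
[98] 3255
[101] 4184
INFO(23) (num of pivots eliminated on this processor after
"""

print(summarize(per_rank_values(SAMPLE, "INFO(16)")))
```

Run on the full saved output, the same call with "INFO(23)" or "RINFO(1)" as
the header prefix would summarize the pivot counts or the estimated flops per
rank.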