Danyang:
Your matrices are ill-conditioned, numerically singular, with
Recip. condition number = 6.000846e-16
Recip. condition number = 2.256434e-27
Recip. condition number = 1.256452e-18
i.e., condition numbers = O(1.e16 - 1.e27), so there is no accuracy in the computed solution.
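As a rough reading aid for the numbers above, here is a minimal Python sketch that inverts the reported reciprocal condition numbers and applies the usual rule of thumb that roughly log10(cond) decimal digits are lost in double precision. The helper name `digits_left` and the dictionary labels are illustrative assumptions, and the estimate is a heuristic, not an error bound computed by SuperLU or PETSc. The labels assume the three values appear in the order of the run quoted further down (initial matrix 1, then 140, then 141).

```python
import math
import sys

# Reciprocal condition numbers printed with -mat_superlu_conditionnumber in this thread.
# Labels are an assumption based on the order of the quoted output.
rcond = {
    "matrix 1": 6.000846e-16,
    "matrix 140": 2.256434e-27,
    "matrix 141": 1.256452e-18,
    "matrices 168 - 172": 1.548816e-12,
}

def digits_left(rc, eps=sys.float_info.epsilon):
    """Heuristic count of trustworthy decimal digits: -log10(eps * cond), floored at 0."""
    cond = 1.0 / rc
    return max(0.0, -math.log10(eps * cond))

for name, rc in rcond.items():
    print(f"{name}: cond ~ {1.0 / rc:.1e}, roughly {digits_left(rc):.1f} digits left")
```

Under this heuristic, matrices 140 and 141 leave no reliable digits at all, while the matrices with a reciprocal condition number near 1e-12 still keep a few.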
I checked your matrices 168 - 172 and got Recip. condition number = 1.548816e-12.
You need to check your model to understand why the matrices are so ill-conditioned.

Hong

> Hi Hong,
>
> Sorry to bother you again. The modified code works much better than before
> using both superlu or mumps. However, it still encounters failure. The case
> is similar to the previous one, ill-conditioned matrices.
>
> The code crashed after a long time simulation if I use superlu_dist, but
> will not fail if I use superlu. I restart the simulation before the time it
> crashes and can reproduce the following error
>
> timestep: 22 time: 1.750E+04 years delt: 2.500E+00 years iter:
> 1 max.sia: 5.053E-03 tol.sia: 1.000E-01
> Newton Iteration Convergence Summary:
> Newton maximum maximum solver
> iteration updatePa updateTemp residual iterations maxvolpa
> maxvoltemp nexvolpa nexvoltemp
> 1 0.1531E+08 0.1755E+04 0.6920E-05 1 5585
> 4402 5814 5814
>
> *** Error in `../program_test': malloc(): memory corruption:
> 0x0000000003a70d50 ***
> Program received signal SIGABRT: Process abort signal.
> Backtrace for this error:
>
> The solver failed at timestep 22, Newton iteration 2. I exported the
> matrices at timestep 1 (matrix 1) and timestep 22 (matrix 140 and 141).
> Matrix 141 is where it failed. The three matrices here are not
> ill-conditioned from the estimated value.
>
> I did the same using the new modified ex52f code and found pretty
> different results for matrix 141. The norm by superlu is much more acceptable
> than superlu_dist. In this test, memory corruption was not detected. The
> codes and example data can be downloaded from the link below.
>
> https://www.dropbox.com/s/i1ls0bg0vt7gu0v/petsc-superlu-test2.tar.gz?dl=0
>
>
> ****************More test on matrix_and_rhs_bin2*******************
> mpiexec.hydra -n 1 ./ex52f -f0 ./matrix_and_rhs_bin2/a_flow_check_1.bin
> -rhs ./matrix_and_rhs_bin2/b_flow_check_1.bin -loop_matrices flow_check
> -loop_folder ./matrix_and_rhs_bin2 -matrix_index_start 140
> -matrix_index_end 141 -pc_type lu -pc_factor_mat_solver_package superlu
> -ksp_monitor_true_residual -mat_superlu_conditionnumber
> -->loac matrix a
> -->load rhs b
> size l,m,n,mm 90000 90000 90000 90000
> Recip. condition number = 6.000846e-16
> 0 KSP preconditioned resid norm 1.146871454377e+08 true resid norm
> 4.711091037809e+03 ||r(i)||/||b|| 1.000000000000e+00
> 1 KSP preconditioned resid norm 2.071118508260e-06 true resid norm
> 3.363767171515e-08 ||r(i)||/||b|| 7.140102249181e-12
> Norm of error 3.3638E-08 iterations 1
> -->Test for matrix 140
> Recip. condition number = 2.256434e-27
> 0 KSP preconditioned resid norm 2.084372893355e+14 true resid norm
> 4.711091037809e+03 ||r(i)||/||b|| 1.000000000000e+00
> 1 KSP preconditioned resid norm 4.689629276419e+00 true resid norm
> 1.037236635337e-01 ||r(i)||/||b|| 2.201690918330e-05
> Norm of error 1.0372E-01 iterations 1
> -->Test for matrix 141
> Recip. condition number = 1.256452e-18
> 0 KSP preconditioned resid norm 1.055488964519e+08 true resid norm
> 4.711091037809e+03 ||r(i)||/||b|| 1.000000000000e+00
> 1 KSP preconditioned resid norm 2.998827511681e-04 true resid norm
> 4.805214542776e-04 ||r(i)||/||b|| 1.019979130994e-07
> Norm of error 4.8052E-04 iterations 1
> --> End of test, bye
>
>
> mpiexec.hydra -n 1 ./ex52f -f0 ./matrix_and_rhs_bin2/a_flow_check_1.bin
> -rhs ./matrix_and_rhs_bin2/b_flow_check_1.bin -loop_matrices flow_check
> -loop_folder ./matrix_and_rhs_bin2 -matrix_index_start 140
> -matrix_index_end 141 -pc_type lu -pc_factor_mat_solver_package
> superlu_dist
> -->loac matrix a
> -->load rhs b
> size l,m,n,mm 90000 90000 90000 90000
> Norm of error 3.6752E-08 iterations 1
> -->Test for matrix 140
> Norm of error 1.6335E-01 iterations 1
> -->Test for matrix 141
> Norm of error 3.4345E+01 iterations 1
> --> End of test, bye
>
> Thanks,
>
> Danyang
>
> On 15-12-07 12:01 PM, Hong wrote:
>
> Danyang:
> Add 'call MatSetFromOptions(A,ierr)' to your code.
> Attached below is ex52f.F modified from your ex52f.F to be compatible with
> petsc-dev.
>
> Hong
>
> Hello Hong,
>>
>> Thanks for the quick reply and the option "-mat_superlu_dist_fact
>> SamePattern" works like a charm, if I use this option from the command
>> line.
>>
>> How can I add this option as the default? I tried using
>> PetscOptionsInsertString("-mat_superlu_dist_fact SamePattern",ierr) in my
>> code but this does not work.
>>
>> Thanks,
>>
>> Danyang
>>
>>
>> On 15-12-07 10:42 AM, Hong wrote:
>>
>> Danyang :
>>
>> Adding '-mat_superlu_dist_fact SamePattern' fixed the problem. Below is
>> how I figured it out.
>>
>> 1. Reading ex52f.F, I see '-superlu_default' =
>> '-pc_factor_mat_solver_package superlu_dist', the latter enables runtime
>> options for other packages. I use superlu_dist-4.2 and superlu-4.1 for the
>> tests below.
>>
>> 2. Use the Matrix 168 to set up KSP solver and factorization, all
>> packages, petsc, superlu_dist and mumps give same correct results:
>>
>> ./ex52f -f0 matrix_and_rhs_bin/a_flow_check_168.bin -rhs
>> matrix_and_rhs_bin/b_flow_check_168.bin -loop_matrices flow_check
>> -loop_folder matrix_and_rhs_bin -pc_type lu -pc_factor_mat_solver_package
>> petsc
>> -->loac matrix a
>> -->load rhs b
>> size l,m,n,mm 90000 90000 90000 90000
>> Norm of error 7.7308E-11 iterations 1
>> -->Test for matrix 168
>> ..
>> -->Test for matrix 172
>> Norm of error 3.8461E-11 iterations 1
>>
>> ./ex52f -f0 matrix_and_rhs_bin/a_flow_check_168.bin -rhs
>> matrix_and_rhs_bin/b_flow_check_168.bin -loop_matrices flow_check
>> -loop_folder matrix_and_rhs_bin -pc_type lu -pc_factor_mat_solver_package
>> superlu_dist
>> Norm of error 9.4073E-11 iterations 1
>> -->Test for matrix 168
>> ...
>> -->Test for matrix 172
>> Norm of error 3.8187E-11 iterations 1
>>
>> 3. Use superlu, I get
>> ./ex52f -f0 matrix_and_rhs_bin/a_flow_check_168.bin -rhs
>> matrix_and_rhs_bin/b_flow_check_168.bin -loop_matrices flow_check
>> -loop_folder matrix_and_rhs_bin -pc_type lu -pc_factor_mat_solver_package
>> superlu
>> Norm of error 1.0191E-06 iterations 1
>> -->Test for matrix 168
>> ...
>> -->Test for matrix 172
>> Norm of error 9.7858E-07 iterations 1
>>
>> Replacing default DiagPivotThresh: 1. to 0.0, I get same solutions as
>> other packages:
>>
>> ./ex52f -f0 matrix_and_rhs_bin/a_flow_check_168.bin -rhs
>> matrix_and_rhs_bin/b_flow_check_168.bin -loop_matrices flow_check
>> -loop_folder matrix_and_rhs_bin -pc_type lu -pc_factor_mat_solver_package
>> superlu -mat_superlu_diagpivotthresh 0.0
>>
>> Norm of error 8.3614E-11 iterations 1
>> -->Test for matrix 168
>> ...
>> -->Test for matrix 172
>> Norm of error 3.7098E-11 iterations 1
>>
>> 4. Using '-mat_view ascii::ascii_info', I found that a_flow_check_1.bin and
>> a_flow_check_168.bin seem to have same structure:
>>
>> -->loac matrix a
>> Mat Object: 1 MPI processes
>> type: seqaij
>> rows=90000, cols=90000
>> total: nonzeros=895600, allocated nonzeros=895600
>> total number of mallocs used during MatSetValues calls =0
>> using I-node routines: found 45000 nodes, limit used is 5
>>
>> 5. Using a_flow_check_1.bin, I am able to reproduce the error you reported:
>> all packages give correct results except superlu_dist:
>> ./ex52f -f0 matrix_and_rhs_bin/a_flow_check_1.bin -rhs
>> matrix_and_rhs_bin/b_flow_check_168.bin -loop_matrices flow_check
>> -loop_folder matrix_and_rhs_bin -pc_type lu -pc_factor_mat_solver_package
>> superlu_dist
>> Norm of error 2.5970E-12 iterations 1
>> -->Test for matrix 168
>> Norm of error 1.3936E-01 iterations 34
>> -->Test for matrix 169
>>
>> I guess the error might come from reuse of matrix factor. Replacing
>> default
>> -mat_superlu_dist_fact <SamePattern_SameRowPerm> with
>> -mat_superlu_dist_fact SamePattern, I get
>>
>> ./ex52f -f0 matrix_and_rhs_bin/a_flow_check_1.bin -rhs
>> matrix_and_rhs_bin/b_flow_check_168.bin -loop_matrices flow_check
>> -loop_folder matrix_and_rhs_bin -pc_type lu -pc_factor_mat_solver_package
>> superlu_dist -mat_superlu_dist_fact SamePattern
>>
>> Norm of error 2.5970E-12 iterations 1
>> -->Test for matrix 168
>> Norm of error 9.4073E-11 iterations 1
>> -->Test for matrix 169
>> Norm of error 6.4303E-11 iterations 1
>> -->Test for matrix 170
>> Norm of error 7.4327E-11 iterations 1
>> -->Test for matrix 171
>> Norm of error 5.4162E-11 iterations 1
>> -->Test for matrix 172
>> Norm of error 3.4440E-11 iterations 1
>> --> End of test, bye
>>
>> Sherry may tell you why SamePattern_SameRowPerm causes the difference here.
>> Based on the above experiments, I would set following as default
>> '-mat_superlu_diagpivotthresh 0.0' in petsc/superlu interface.
>> '-mat_superlu_dist_fact SamePattern' in petsc/superlu_dist interface.
>>
>> Hong
>>
>> Hi Hong,
>>>
>>> I did more tests today and finally found that the solution accuracy
>>> depends on the initial (first) matrix quality. I modified the ex52f.F to do
>>> the test. There are 6 matrices and right-hand-side vectors. All these
>>> matrices and rhs are from my reactive transport simulation. Results will be
>>> quite different depending on which one you use to do factorization. Results
>>> will also be different if you run with different options. My code is
>>> similar to the First or the Second test below. When the matrix is well
>>> conditioned, it works fine. But if the initial matrix is well conditioned,
>>> it is likely to crash when the matrix becomes ill-conditioned. Since most of
>>> my cases are well conditioned, I didn't detect the problem before. This case
>>> is a special one.
>>>
>>>
>>> How can I avoid this problem? Shall I redo factorization? Can PETSc
>>> automatically detect this problem or is there any option available to do
>>> this?
>>>
>>> All the data and test code (modified ex52f) can be found via the dropbox
>>> link below.
>>>
>>> https://www.dropbox.com/s/4al1a60creogd8m/petsc-superlu-test.tar.gz?dl=0
>>>
>>>
>>> Summary of my test is shown below.
>>>
>>> First, use the Matrix 1 to set up KSP solver and factorization, then
>>> solve 168 to 172
>>>
>>> mpiexec.hydra -n 1 ./ex52f -f0
>>> /home/dsu/work/petsc-superlu-test/matrix_and_rhs_bin/a_flow_check_1.bin
>>> -rhs
>>> /home/dsu/work/petsc-superlu-test/matrix_and_rhs_bin/b_flow_check_1.bin
>>> -loop_matrices flow_check -loop_folder
>>> /home/dsu/work/petsc-superlu-test/matrix_and_rhs_bin -pc_type lu
>>> -pc_factor_mat_solver_package superlu_dist
>>>
>>> Norm of error 3.8815E-11 iterations 1
>>> -->Test for matrix 168
>>> Norm of error 4.2307E-01 iterations 32
>>> -->Test for matrix 169
>>> Norm of error 3.0528E-01 iterations 32
>>> -->Test for matrix 170
>>> Norm of error 3.1177E-01 iterations 32
>>> -->Test for matrix 171
>>> Norm of error 3.2793E-01 iterations 32
>>> -->Test for matrix 172
>>> Norm of error 3.1251E-01 iterations 31
>>>
>>> Second, use the Matrix 1 to set up KSP solver and factorization using the
>>> implemented SuperLU-related code. I thought this would generate the same
>>> results as the First test, but it actually does not.
>>>
>>> mpiexec.hydra -n 1 ./ex52f -f0
>>> /home/dsu/work/petsc-superlu-test/matrix_and_rhs_bin/a_flow_check_1.bin
>>> -rhs
>>> /home/dsu/work/petsc-superlu-test/matrix_and_rhs_bin/b_flow_check_1.bin
>>> -loop_matrices flow_check -loop_folder
>>> /home/dsu/work/petsc-superlu-test/matrix_and_rhs_bin -superlu_default
>>>
>>> Norm of error 2.2632E-12 iterations 1
>>> -->Test for matrix 168
>>> Norm of error 1.0817E+04 iterations 1
>>> -->Test for matrix 169
>>> Norm of error 1.0786E+04 iterations 1
>>> -->Test for matrix 170
>>> Norm of error 1.0792E+04 iterations 1
>>> -->Test for matrix 171
>>> Norm of error 1.0792E+04 iterations 1
>>> -->Test for matrix 172
>>> Norm of error 1.0792E+04 iterations 1
>>>
>>>
>>> Third, use the Matrix 168 to set up KSP solver and factorization, then
>>> solve 168 to 172
>>>
>>> mpiexec.hydra -n 1 ./ex52f -f0
>>> /home/dsu/work/petsc-superlu-test/matrix_and_rhs_bin/a_flow_check_168.bin
>>> -rhs
>>> /home/dsu/work/petsc-superlu-test/matrix_and_rhs_bin/b_flow_check_168.bin
>>> -loop_matrices flow_check -loop_folder
>>> /home/dsu/work/petsc-superlu-test/matrix_and_rhs_bin -pc_type lu
>>> -pc_factor_mat_solver_package superlu_dist
>>>
>>> Norm of error 9.5528E-10 iterations 1
>>> -->Test for matrix 168
>>> Norm of error 9.4945E-10 iterations 1
>>> -->Test for matrix 169
>>> Norm of error 6.4279E-10 iterations 1
>>> -->Test for matrix 170
>>> Norm of error 7.4633E-10 iterations 1
>>> -->Test for matrix 171
>>> Norm of error 7.4863E-10 iterations 1
>>> -->Test for matrix 172
>>> Norm of error 8.9701E-10 iterations 1
>>>
>>> Fourth, use the Matrix 168 to set up KSP solver and factorization using
>>> the implemented SuperLU-related code. I thought this would generate the
>>> same results as the Third test, but it actually does not.
>>>
>>> mpiexec.hydra -n 1 ./ex52f -f0
>>> /home/dsu/work/petsc-superlu-test/matrix_and_rhs_bin/a_flow_check_168.bin
>>> -rhs
>>> /home/dsu/work/petsc-superlu-test/matrix_and_rhs_bin/b_flow_check_168.bin
>>> -loop_matrices flow_check -loop_folder
>>> /home/dsu/work/petsc-superlu-test/matrix_and_rhs_bin -superlu_default
>>>
>>> Norm of error 3.7017E-11 iterations 1
>>> -->Test for matrix 168
>>> Norm of error 3.6420E-11 iterations 1
>>> -->Test for matrix 169
>>> Norm of error 3.7184E-11 iterations 1
>>> -->Test for matrix 170
>>> Norm of error 3.6847E-11 iterations 1
>>> -->Test for matrix 171
>>> Norm of error 3.7883E-11 iterations 1
>>> -->Test for matrix 172
>>> Norm of error 3.8805E-11 iterations 1
>>>
>>> Thanks very much,
>>>
>>> Danyang
>>>
>>> On 15-12-03 01:59 PM, Hong wrote:
>>>
>>> Danyang :
>>> Further testing a_flow_check_168.bin,
>>> ./ex10 -f0 /Users/Hong/Downloads/matrix_and_rhs_bin/a_flow_check_168.bin
>>> -rhs /Users/Hong/Downloads/matrix_and_rhs_bin/x_flow_check_168.bin -pc_type
>>> lu -pc_factor_mat_solver_package superlu -ksp_monitor_true_residual
>>> -mat_superlu_conditionnumber
>>> Recip. condition number = 1.610480e-12
>>> 0 KSP preconditioned resid norm 6.873340313547e+09 true resid norm
>>> 7.295020990196e+03 ||r(i)||/||b|| 1.000000000000e+00
>>> 1 KSP preconditioned resid norm 2.051833296449e-02 true resid norm
>>> 2.976859070118e-02 ||r(i)||/||b|| 4.080672384793e-06
>>> Number of iterations = 1
>>> Residual norm 0.0297686
>>>
>>> condition number of this matrix = 1/1.610480e-12 = 1.e+12,
>>> i.e., this matrix is ill-conditioned.
>>>
>>> Hong
>>>
>>>
>>> Hi Hong,
>>>>
>>>> The binary format of matrix, rhs and solution can be downloaded via the
>>>> link below.
>>>>
>>>> https://www.dropbox.com/s/cl3gfi0s0kjlktf/matrix_and_rhs_bin.tar.gz?dl=0
>>>>
>>>> Thanks,
>>>>
>>>> Danyang
>>>>
>>>>
>>>> On 15-12-03 10:50 AM, Hong wrote:
>>>>
>>>> Danyang:
>>>>
>>>>>
>>>>>
>>>>> To my surprise, solutions from SuperLU at timestep 29 seem not
>>>>> correct for the first 4 Newton iterations, but the solutions from
>>>>> iterative
>>>>> solver and MUMPS are correct.
>>>>>
>>>>> Please find all the matrices, rhs and solutions at timestep 29 via the
>>>>> link below. The data is a bit large so I just share it through
>>>>> Dropbox. A piece of matlab code to read these data and then compute the
>>>>> norm has also been attached.
>>>>> https://www.dropbox.com/s/rr8ueysgflmxs7h/results-check.tar.gz?dl=0
>>>>>
>>>>
>>>> Can you send us matrix in petsc binary format?
>>>>
>>>> e.g., call MatView(M, PETSC_VIEWER_BINARY_(PETSC_COMM_WORLD))
>>>> or '-ksp_view_mat binary'
>>>>
>>>> Hong
>>>>
>>>>>
>>>>>
>>>>> Below is a summary of the norm from the three solvers at timestep 29,
>>>>> newton iteration 1 to 5.
>>>>>
>>>>> Timestep 29
>>>>> Norm of residual seq 1.661321e-09, superlu 1.657103e+04, mumps 3.731225e-11
>>>>> Norm of residual seq 1.753079e-09, superlu 6.675467e+02, mumps 1.509919e-13
>>>>> Norm of residual seq 4.914971e-10, superlu 1.236362e-01, mumps 2.139303e-17
>>>>> Norm of residual seq 3.532769e-10, superlu 1.304670e-04, mumps 5.387000e-20
>>>>> Norm of residual seq 3.885629e-10, superlu 2.754876e-07, mumps 4.108675e-21
>>>>>
>>>>> Would anybody please check if SuperLU can solve these matrices?
>>>>> Another possibility is that something is wrong in my own code. But so far,
>>>>> I cannot find any problem in my code since the same code works fine if I
>>>>> use iterative solver or direct solver MUMPS.
>>>>> But for other cases I have tested, all these solvers work fine.
>>>>>
>>>>> Please let me know if I did not write down the problem clearly.
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Danyang
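For readers who want to repeat the kind of check summarized in the original report (the norm of the residual for each solver's solution), here is a minimal Python sketch. It is not the Matlab script mentioned in the thread; the function name `residual_norm` and the tiny 2x2 system are made up for illustration and do not come from the posted data.

```python
import math

def residual_norm(A, x, b):
    """Return the 2-norm of r = b - A x, with A given as a dense list of rows."""
    r = [bi - sum(aij * xj for aij, xj in zip(row, x)) for row, bi in zip(A, b)]
    return math.sqrt(sum(ri * ri for ri in r))

# Toy system (hypothetical values): the exact solution is x = [1, 2].
A = [[4.0, 1.0],
     [2.0, 3.0]]
b = [6.0, 8.0]

x_good = [1.0, 2.0]  # exact solution, residual is zero
x_bad = [1.0, 2.5]   # deliberately perturbed stand-in for a poor solver result

print("good:", residual_norm(A, x_good, b))
print("bad: ", residual_norm(A, x_bad, b))
```

Keep in mind that for ill-conditioned matrices like the ones discussed here, a small residual does not by itself guarantee a small error in the solution, so the residual is best read together with the condition number estimate.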
