Sherry,

We set '-mat_superlu_dist_fact SamePattern' as the default in petsc/superlu_dist on 12/6/15 (see attached email below).

However, Anton must set 'SamePattern_SameRowPerm' to avoid a crash in his code. Checking
http://crd-legacy.lbl.gov/~xiaoye/SuperLU/superlu_dist_code_html/pzgssvx___a_bglobal_8c.html
I see a detailed description of how to use SamePattern_SameRowPerm, which requires more from the user than SamePattern. I guess these flags are used for efficiency. The library sets a default, then has users switch it for their own applications. The default setting should not cause a crash. If a crash occurs, a meaningful error message would be helpful.

Do you have a suggestion for how we should set the default in petsc for this flag?

Hong

-------------------
Hong <[email protected]>
12/7/15
to Danyang, petsc-maint, PETSc, Xiaoye

Danyang :

Adding '-mat_superlu_dist_fact SamePattern' fixed the problem. Below is how I figured it out.

1. Reading ex52f.F, I see '-superlu_default' = '-pc_factor_mat_solver_package superlu_dist'; the latter enables runtime options for other packages. I use superlu_dist-4.2 and superlu-4.1 for the tests below.
...
5. Using a_flow_check_1.bin, I am able to reproduce the error you reported: all packages give correct results except superlu_dist:

./ex52f -f0 matrix_and_rhs_bin/a_flow_check_1.bin -rhs matrix_and_rhs_bin/b_flow_check_168.bin -loop_matrices flow_check -loop_folder matrix_and_rhs_bin -pc_type lu -pc_factor_mat_solver_package superlu_dist
Norm of error 2.5970E-12 iterations 1
-->Test for matrix 168
Norm of error 1.3936E-01 iterations 34
-->Test for matrix 169

I guess the error might come from reuse of the matrix factor. Replacing the default
-mat_superlu_dist_fact <SamePattern_SameRowPerm>
with
-mat_superlu_dist_fact SamePattern, I get

./ex52f -f0 matrix_and_rhs_bin/a_flow_check_1.bin -rhs matrix_and_rhs_bin/b_flow_check_168.bin -loop_matrices flow_check -loop_folder matrix_and_rhs_bin -pc_type lu -pc_factor_mat_solver_package superlu_dist -mat_superlu_dist_fact SamePattern
Norm of error 2.5970E-12 iterations 1
-->Test for matrix 168
...
Sherry may tell you why SamePattern_SameRowPerm causes the difference here.
Based on the above experiments, I would set the following as defaults:
'-mat_superlu_diagpivotthresh 0.0' in the petsc/superlu interface.
'-mat_superlu_dist_fact SamePattern' in the petsc/superlu_dist interface.

Hong

On Tue, Oct 25, 2016 at 10:38 AM, Hong <[email protected]> wrote:

> Anton,
> I guess, when you reuse matrix and its symbolic factor with updated
> numerical values, superlu_dist requires this option. I'm cc'ing Sherry to
> confirm it.
>
> I'll check petsc/superlu-dist interface to set this flag for this case.
>
> Hong
>
>
> On Tue, Oct 25, 2016 at 8:20 AM, Anton Popov <[email protected]> wrote:
>
>> Hong,
>>
>> I get all the problems gone and valgrind-clean output if I specify this:
>>
>> -mat_superlu_dist_fact SamePattern_SameRowPerm
>> What does SamePattern_SameRowPerm actually mean?
>> Row permutations are for large diagonal, column permutations are for
>> sparsity, right?
>> Will it skip subsequent matrix permutations for large diagonal even if
>> matrix values change significantly?
>>
>> Surprisingly everything works even with:
>>
>> -mat_superlu_dist_colperm PARMETIS
>> -mat_superlu_dist_parsymbfact TRUE
>>
>> Thanks,
>> Anton
>>
>> On 10/24/2016 09:06 PM, Hong wrote:
>>
>> Anton:
>>>
>>> If replacing superlu_dist with mumps, does your code work?
>>>
>>> yes
>>>
>>
>> You may use mumps in your code, or test different options for
>> superlu_dist:
>>
>> -mat_superlu_dist_equil: <TRUE> Equilibrate matrix (None)
>> -mat_superlu_dist_rowperm <LargeDiag> Row permutation (choose one of)
>> LargeDiag NATURAL (None)
>> -mat_superlu_dist_colperm <METIS_AT_PLUS_A> Column permutation (choose
>> one of) NATURAL MMD_AT_PLUS_A MMD_ATA METIS_AT_PLUS_A PARMETIS (None)
>> -mat_superlu_dist_replacetinypivot: <FALSE> Replace tiny pivots (None)
>> -mat_superlu_dist_parsymbfact: <FALSE> Parallel symbolic factorization
>> (None)
>> -mat_superlu_dist_fact <SamePattern> Sparsity pattern for repeated
>> matrix factorization (choose one of) SamePattern SamePattern_SameRowPerm
>> (None)
>>
>> The options inside <> are defaults. You may try others. This might help
>> narrow down the bug.
>>
>> Hong
>>
>>>
>>> Hong
>>>>
>>>> On 10/24/2016 05:47 PM, Hong wrote:
>>>>
>>>> Barry,
>>>> Your change indeed fixed the error of his testing code.
>>>> As Satish tested, on your branch, ex16 runs smoothly.
>>>>
>>>> I do not understand why on maint or master branch, ex16 crashes inside
>>>> superlu_dist, but not with mumps.
>>>>
>>>>
>>>> I also confirm that ex16 runs fine with latest fix, but unfortunately
>>>> not my code.
>>>>
>>>> This is something to be expected, since my code preallocates once in
>>>> the beginning. So there is no way it can be affected by multiple
>>>> preallocations. Subsequently I only do matrix assembly, that makes sure
>>>> structure doesn't change (set to get error otherwise).
>>>>
>>>> Summary: we don't have a simple test code to debug superlu issue
>>>> anymore.
>>>>
>>>> Anton
>>>>
>>>> Hong
>>>>
>>>> On Mon, Oct 24, 2016 at 9:34 AM, Satish Balay <[email protected]>
>>>> wrote:
>>>>
>>>>> On Mon, 24 Oct 2016, Barry Smith wrote:
>>>>>
>>>>> >
>>>>> > > [Or perhaps Hong is using a different test code and is observing
>>>>> bugs
>>>>> > > with superlu_dist interface..]
>>>>> >
>>>>> > She states that her test does a NEW MatCreate() for each matrix
>>>>> load (I cut and pasted it in the email I just sent). The bug I fixed was
>>>>> only related to using the SAME matrix from one MatLoad() in another
>>>>> MatLoad().
>>>>>
>>>>> Ah - ok.. Sorry - wasn't thinking clearly :(
>>>>>
>>>>> Satish
>>>>>
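
On Anton's question in the thread about whether a reused row permutation still suits a matrix whose values have changed significantly: the following is a toy sketch in Python, not SuperLU_DIST's actual algorithm. It assumes that SamePattern_SameRowPerm reuses the row permutation from the first factorization while SamePattern recomputes it. The helper names (`large_diag_perm`, `permuted_diag`) and the two 3x3 matrices are hypothetical and chosen only for illustration; the brute-force search stands in for the LargeDiag row permutation.

```python
from itertools import permutations


def large_diag_perm(a):
    """Brute-force the row permutation that maximizes the product of
    |diagonal| entries (a stand-in for a LargeDiag-style permutation).
    perm[j] is the original row placed at position j."""
    n = len(a)

    def score(perm):
        prod = 1.0
        for j in range(n):
            prod *= abs(a[perm[j]][j])
        return prod

    return max(permutations(range(n)), key=score)


def permuted_diag(a, perm):
    """Diagonal of the matrix after applying the row permutation."""
    return [a[perm[j]][j] for j in range(len(a))]


# Hypothetical first matrix.
a1 = [[0.0, 4.0, 1.0],
      [5.0, 0.0, 0.0],
      [0.0, 1.0, 3.0]]

# Same nonzero pattern, but the values have changed significantly.
a2 = [[0.0, 1e-8, 2.0],
      [5.0, 0.0, 0.0],
      [0.0, 6.0, 1e-8]]

perm1 = large_diag_perm(a1)          # permutation computed for a1
reused = permuted_diag(a2, perm1)    # a2 with the old permutation reused
perm2 = large_diag_perm(a2)          # permutation recomputed for a2
fresh = permuted_diag(a2, perm2)     # a2 with its own permutation

print("reused permutation", perm1, "-> diagonal", reused)
print("recomputed permutation", perm2, "-> diagonal", fresh)
```

In this toy case the reused permutation leaves tiny entries on the diagonal of the second matrix, while the recomputed one does not. Whether this is what happens inside superlu_dist for a given application is for Sherry to confirm.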
