Some graph preprocessing steps can be skipped ONLY IF a previous factorization was done, and the information can be reused (AS INPUT) to the new factorization.
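To make that rule concrete, here is a minimal sketch in C. It is only an illustration: the type and function names (fact_mode, fact_plan, plan_for, plan_is_valid) are hypothetical and are not SuperLU_DIST declarations. It models the three options.Fact settings discussed in this message, which ordering steps each one skips, and which permutations must then be supplied from a previous factorization. Equilibration is left out on purpose, since this message does not say how it is handled when reusing.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the three options.Fact settings discussed in
 * this message. These are NOT the SuperLU_DIST declarations. */
typedef enum {
    MODE_DOFACT,                    /* factor from scratch */
    MODE_SAME_PATTERN,              /* reuse Pc from a previous factorization */
    MODE_SAME_PATTERN_SAME_ROWPERM  /* reuse both Pc and Pr */
} fact_mode;

/* What a factorization in a given mode computes and what it expects as input. */
typedef struct {
    bool compute_col_perm; /* sparsity ordering (Pc) is computed */
    bool compute_row_perm; /* pivot ordering (Pr) is computed */
    bool needs_prev_pc;    /* Pc must be supplied as input */
    bool needs_prev_pr;    /* Pr must be supplied as input */
} fact_plan;

/* Map a mode to the steps that run and the inputs that are required. */
fact_plan plan_for(fact_mode mode)
{
    /* Start from the from-scratch case: compute everything, reuse nothing. */
    fact_plan p = { true, true, false, false };

    switch (mode) {
    case MODE_DOFACT:
        break;
    case MODE_SAME_PATTERN:
        p.compute_col_perm = false;
        p.needs_prev_pc = true;
        break;
    case MODE_SAME_PATTERN_SAME_ROWPERM:
        p.compute_col_perm = false;
        p.compute_row_perm = false;
        p.needs_prev_pc = true;
        p.needs_prev_pr = true;
        break;
    default:
        assert(0 && "unknown mode");
    }
    return p;
}

/* A step may be skipped only if the data it would have produced is available
 * from a previous factorization. */
bool plan_is_valid(fact_plan p, bool have_prev_pc, bool have_prev_pr)
{
    return (!p.needs_prev_pc || have_prev_pc) &&
           (!p.needs_prev_pr || have_prev_pr);
}
```

For example, plan_is_valid(plan_for(MODE_SAME_PATTERN), false, false) is false: asking for SamePattern without a Pc from an earlier factorization is exactly the misuse this rule forbids.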
In general, the driver routine pdgssvx() in SRC/pdgssvx.c performs the LU factorization of the following (preprocessed) matrix:

    Pc*Pr*diag(R)*A*diag(C)*Pc^T = L*U

The default is to do LU from scratch, including all the steps to compute equilibration (R, C), pivot ordering (Pr), and sparsity ordering (Pc).

-- The default should be set as options.Fact = DOFACT.
-- When you set options.Fact = SamePattern, the sparsity ordering step is skipped, but you need to input Pc, which was obtained from a previous factorization.
-- When you set options.Fact = SamePattern_SameRowPerm, both the sparsity ordering and pivot ordering steps are skipped, but you need to input both Pr and Pc.

Please see the comments at Lines 258 - 307 in SRC/pdgssvx.c for details regarding which data structures should be inputs and which are outputs. The Users Guide also explains this.

In the EXAMPLE/ directory, I have various examples of these usage situations; see EXAMPLE/README.

I am a little puzzled why in PETSc, the default is set to SamePattern ??

Sherry

On Tue, Oct 25, 2016 at 9:18 AM, Hong <[email protected]> wrote:

> Sherry,
>
> We set '-mat_superlu_dist_fact SamePattern' as default in
> petsc/superlu_dist on 12/6/15 (see attached email below).
>
> However, Anton must set 'SamePattern_SameRowPerm' to avoid crash in his
> code. Checking
> http://crd-legacy.lbl.gov/~xiaoye/SuperLU/superlu_dist_code_html/pzgssvx___a_bglobal_8c.html
> I see detailed description on using SamePattern_SameRowPerm, which
> requires more from user than SamePattern. I guess these flags are used
> for efficiency. The library sets a default, then has users switch for
> their own applications. The default setting should not cause crash. If
> crash occurs, giving a meaningful error message would be helpful.
>
> Do you have suggestion how should we set default in petsc for this flag?
>
> Hong
>
> -------------------
> Hong <[email protected]>
> 12/7/15
> to Danyang, petsc-maint, PETSc, Xiaoye
> Danyang :
>
> Adding '-mat_superlu_dist_fact SamePattern' fixed the problem. Below is
> how I figured it out.
>
> 1. Reading ex52f.F, I see '-superlu_default' =
> '-pc_factor_mat_solver_package superlu_dist', the latter enables runtime
> options for other packages. I use superlu_dist-4.2 and superlu-4.1 for the
> tests below.
> ...
> 5.
> Using a_flow_check_1.bin, I am able to reproduce the error you reported:
> all packages give correct results except superlu_dist:
> ./ex52f -f0 matrix_and_rhs_bin/a_flow_check_1.bin -rhs
> matrix_and_rhs_bin/b_flow_check_168.bin -loop_matrices flow_check
> -loop_folder matrix_and_rhs_bin -pc_type lu -pc_factor_mat_solver_package
> superlu_dist
> Norm of error 2.5970E-12 iterations 1
> -->Test for matrix 168
> Norm of error 1.3936E-01 iterations 34
> -->Test for matrix 169
>
> I guess the error might come from reuse of matrix factor. Replacing default
> -mat_superlu_dist_fact <SamePattern_SameRowPerm> with
> -mat_superlu_dist_fact SamePattern, I get
>
> ./ex52f -f0 matrix_and_rhs_bin/a_flow_check_1.bin -rhs
> matrix_and_rhs_bin/b_flow_check_168.bin -loop_matrices flow_check
> -loop_folder matrix_and_rhs_bin -pc_type lu -pc_factor_mat_solver_package
> superlu_dist -mat_superlu_dist_fact SamePattern
>
> Norm of error 2.5970E-12 iterations 1
> -->Test for matrix 168
> ...
> Sherry may tell you why SamePattern_SameRowPerm causes the difference here.
> Based on the above experiments, I would set following as default
> '-mat_superlu_diagpivotthresh 0.0' in petsc/superlu interface.
> '-mat_superlu_dist_fact SamePattern' in petsc/superlu_dist interface.
>
> Hong
>
> On Tue, Oct 25, 2016 at 10:38 AM, Hong <[email protected]> wrote:
>
>> Anton,
>> I guess, when you reuse matrix and its symbolic factor with updated
>> numerical values, superlu_dist requires this option. I'm cc'ing Sherry to
>> confirm it.
>>
>> I'll check petsc/superlu-dist interface to set this flag for this case.
>>
>> Hong
>>
>>
>> On Tue, Oct 25, 2016 at 8:20 AM, Anton Popov <[email protected]> wrote:
>>
>>> Hong,
>>>
>>> I get all the problems gone and valgrind-clean output if I specify this:
>>>
>>> -mat_superlu_dist_fact SamePattern_SameRowPerm
>>> What does SamePattern_SameRowPerm actually mean?
>>> Row permutations are for large diagonal, column permutations are for
>>> sparsity, right?
>>> Will it skip subsequent matrix permutations for large diagonal even if
>>> matrix values change significantly?
>>>
>>> Surprisingly everything works even with:
>>>
>>> -mat_superlu_dist_colperm PARMETIS
>>> -mat_superlu_dist_parsymbfact TRUE
>>>
>>> Thanks,
>>> Anton
>>>
>>> On 10/24/2016 09:06 PM, Hong wrote:
>>>
>>> Anton:
>>>>
>>>> If replacing superlu_dist with mumps, does your code work?
>>>>
>>>> yes
>>>>
>>>
>>> You may use mumps in your code, or test different options for
>>> superlu_dist:
>>>
>>> -mat_superlu_dist_equil: <TRUE> Equilibrate matrix (None)
>>> -mat_superlu_dist_rowperm <LargeDiag> Row permutation (choose one of)
>>> LargeDiag NATURAL (None)
>>> -mat_superlu_dist_colperm <METIS_AT_PLUS_A> Column permutation (choose
>>> one of) NATURAL MMD_AT_PLUS_A MMD_ATA METIS_AT_PLUS_A PARMETIS (None)
>>> -mat_superlu_dist_replacetinypivot: <FALSE> Replace tiny pivots (None)
>>> -mat_superlu_dist_parsymbfact: <FALSE> Parallel symbolic factorization
>>> (None)
>>> -mat_superlu_dist_fact <SamePattern> Sparsity pattern for repeated
>>> matrix factorization (choose one of) SamePattern SamePattern_SameRowPerm
>>> (None)
>>>
>>> The options inside <> are defaults. You may try others. This might help
>>> narrow down the bug.
>>>
>>> Hong
>>>
>>>>
>>>> Hong
>>>>>
>>>>> On 10/24/2016 05:47 PM, Hong wrote:
>>>>>
>>>>> Barry,
>>>>> Your change indeed fixed the error of his testing code.
>>>>> As Satish tested, on your branch, ex16 runs smoothly.
>>>>>
>>>>> I do not understand why on maint or master branch, ex16 crashes inside
>>>>> superlu_dist, but not with mumps.
>>>>>
>>>>>
>>>>> I also confirm that ex16 runs fine with latest fix, but unfortunately
>>>>> not my code.
>>>>>
>>>>> This is something to be expected, since my code preallocates once in
>>>>> the beginning. So there is no way it can be affected by multiple
>>>>> preallocations. Subsequently I only do matrix assembly, that makes sure
>>>>> structure doesn't change (set to get error otherwise).
>>>>>
>>>>> Summary: we don't have a simple test code to debug superlu issue
>>>>> anymore.
>>>>>
>>>>> Anton
>>>>>
>>>>> Hong
>>>>>
>>>>> On Mon, Oct 24, 2016 at 9:34 AM, Satish Balay <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> On Mon, 24 Oct 2016, Barry Smith wrote:
>>>>>>
>>>>>> >
>>>>>> > > [Or perhaps Hong is using a different test code and is observing bugs
>>>>>> > > with superlu_dist interface..]
>>>>>> >
>>>>>> > She states that her test does a NEW MatCreate() for each matrix load (I cut and pasted it in the email I just sent). The bug I fixed was only related to using the SAME matrix from one MatLoad() in another MatLoad().
>>>>>>
>>>>>> Ah - ok.. Sorry - wasn't thinking clearly :(
>>>>>>
>>>>>> Satish
