Sherry,

Thanks for the detailed explanation. We use options.Fact = DOFACT as the default for the first factorization. When a user reuses the matrix factor, we must provide a default, either 'options.Fact = SamePattern' or 'options.Fact = SamePattern_SameRowPerm'. We previously set 'SamePattern_SameRowPerm'. After one user reported an error, we switched to 'SamePattern', which now causes a problem for a second user.

I'll check our interface to see if we can add flag-checking for Pr and Pc, then set default accordingly.

Hong

On Wed, Oct 26, 2016 at 3:23 PM, Xiaoye S. Li <[email protected]> wrote:

> Some graph preprocessing steps can be skipped ONLY IF a previous
> factorization was done, and the information can be reused (AS INPUT) to the
> new factorization.
>
> In general, the driver routine SRC/pdgssvx.c() performs the LU
> factorization of the following (preprocessed) matrix:
>     Pc*Pr*diag(R)*A*diag(C)*Pc^T = L*U
>
> The default is to do LU from scratch, including all the steps to compute
> equilibration (R, C), pivot ordering (Pr), and sparsity ordering (Pc).
>
> -- The default should be set as options.Fact = DOFACT.
>
> -- When you set options.Fact = SamePattern, the sparsity ordering step is
> skipped, but you need to input Pc which was obtained from a previous
> factorization.
>
> -- When you set options.Fact = SamePattern_SameRowPerm, both sparsity
> reordering and pivoting ordering steps are skipped, but you need to input
> both Pr and Pc.
>
> Please see Lines 258 - 307 comments in SRC/pdgssvx.c for details,
> regarding which data structures should be inputs and which are outputs.
> The Users Guide also explains this.
>
> In EXAMPLE/ directory, I have various examples of these usage situations,
> see EXAMPLE/README.
>
> I am a little puzzled why in PETSc, the default is set to SamePattern ??
>
> Sherry
>
>
> On Tue, Oct 25, 2016 at 9:18 AM, Hong <[email protected]> wrote:
>
>> Sherry,
>>
>> We set '-mat_superlu_dist_fact SamePattern' as default in
>> petsc/superlu_dist on 12/6/15 (see attached email below).
>>
>> However, Anton must set 'SamePattern_SameRowPerm' to avoid crash in his
>> code. Checking
>> http://crd-legacy.lbl.gov/~xiaoye/SuperLU/superlu_dist_code_html/pzgssvx___a_bglobal_8c.html
>> I see detailed description on using SamePattern_SameRowPerm, which
>> requires more from user than SamePattern. I guess these flags are used
>> for efficiency.
>> The library sets a default, then has users switch it for
>> their own applications. The default setting should not cause a crash. If a
>> crash occurs, giving a meaningful error message would help.
>>
>> Do you have a suggestion for how we should set the default in petsc for
>> this flag?
>>
>> Hong
>>
>> -------------------
>> Hong <[email protected]>
>> 12/7/15
>> to Danyang, petsc-maint, PETSc, Xiaoye
>> Danyang :
>>
>> Adding '-mat_superlu_dist_fact SamePattern' fixed the problem. Below is
>> how I figured it out.
>>
>> 1. Reading ex52f.F, I see '-superlu_default' =
>> '-pc_factor_mat_solver_package superlu_dist', the latter enables runtime
>> options for other packages. I use superlu_dist-4.2 and superlu-4.1 for the
>> tests below.
>> ...
>> 5.
>> Using a_flow_check_1.bin, I am able to reproduce the error you reported:
>> all packages give correct results except superlu_dist:
>> ./ex52f -f0 matrix_and_rhs_bin/a_flow_check_1.bin -rhs
>> matrix_and_rhs_bin/b_flow_check_168.bin -loop_matrices flow_check
>> -loop_folder matrix_and_rhs_bin -pc_type lu -pc_factor_mat_solver_package
>> superlu_dist
>> Norm of error 2.5970E-12 iterations 1
>> -->Test for matrix 168
>> Norm of error 1.3936E-01 iterations 34
>> -->Test for matrix 169
>>
>> I guess the error might come from reuse of matrix factor. Replacing
>> default
>> -mat_superlu_dist_fact <SamePattern_SameRowPerm> with
>> -mat_superlu_dist_fact SamePattern, I get
>>
>> ./ex52f -f0 matrix_and_rhs_bin/a_flow_check_1.bin -rhs
>> matrix_and_rhs_bin/b_flow_check_168.bin -loop_matrices flow_check
>> -loop_folder matrix_and_rhs_bin -pc_type lu -pc_factor_mat_solver_package
>> superlu_dist -mat_superlu_dist_fact SamePattern
>>
>> Norm of error 2.5970E-12 iterations 1
>> -->Test for matrix 168
>> ...
>> Sherry may tell you why SamePattern_SameRowPerm causes the difference
>> here.
>> Based on the above experiments, I would set the following as defaults:
>> '-mat_superlu_diagpivotthresh 0.0' in petsc/superlu interface.
>> '-mat_superlu_dist_fact SamePattern' in petsc/superlu_dist interface.
>>
>> Hong
>>
>> On Tue, Oct 25, 2016 at 10:38 AM, Hong <[email protected]> wrote:
>>
>>> Anton,
>>> I guess, when you reuse matrix and its symbolic factor with updated
>>> numerical values, superlu_dist requires this option. I'm cc'ing Sherry to
>>> confirm it.
>>>
>>> I'll check petsc/superlu-dist interface to set this flag for this case.
>>>
>>> Hong
>>>
>>>
>>> On Tue, Oct 25, 2016 at 8:20 AM, Anton Popov <[email protected]> wrote:
>>>
>>>> Hong,
>>>>
>>>> I get all the problems gone and valgrind-clean output if I specify this:
>>>>
>>>> -mat_superlu_dist_fact SamePattern_SameRowPerm
>>>> What does SamePattern_SameRowPerm actually mean?
>>>> Row permutations are for large diagonal, column permutations are for
>>>> sparsity, right?
>>>> Will it skip subsequent matrix permutations for large diagonal even if
>>>> matrix values change significantly?
>>>>
>>>> Surprisingly everything works even with:
>>>>
>>>> -mat_superlu_dist_colperm PARMETIS
>>>> -mat_superlu_dist_parsymbfact TRUE
>>>>
>>>> Thanks,
>>>> Anton
>>>>
>>>> On 10/24/2016 09:06 PM, Hong wrote:
>>>>
>>>> Anton:
>>>>>
>>>>> If replacing superlu_dist with mumps, does your code work?
>>>>>
>>>>> yes
>>>>>
>>>>
>>>> You may use mumps in your code, or test different options for
>>>> superlu_dist:
>>>>
>>>> -mat_superlu_dist_equil: <TRUE> Equilibrate matrix (None)
>>>> -mat_superlu_dist_rowperm <LargeDiag> Row permutation (choose one of)
>>>> LargeDiag NATURAL (None)
>>>> -mat_superlu_dist_colperm <METIS_AT_PLUS_A> Column permutation
>>>> (choose one of) NATURAL MMD_AT_PLUS_A MMD_ATA METIS_AT_PLUS_A PARMETIS
>>>> (None)
>>>> -mat_superlu_dist_replacetinypivot: <FALSE> Replace tiny pivots
>>>> (None)
>>>> -mat_superlu_dist_parsymbfact: <FALSE> Parallel symbolic
>>>> factorization (None)
>>>> -mat_superlu_dist_fact <SamePattern> Sparsity pattern for repeated
>>>> matrix factorization (choose one of) SamePattern SamePattern_SameRowPerm
>>>> (None)
>>>>
>>>> The options inside <> are defaults. You may try others. This might help
>>>> narrow down the bug.
>>>>
>>>> Hong
>>>>
>>>>>
>>>>> Hong
>>>>>>
>>>>>> On 10/24/2016 05:47 PM, Hong wrote:
>>>>>>
>>>>>> Barry,
>>>>>> Your change indeed fixed the error of his testing code.
>>>>>> As Satish tested, on your branch, ex16 runs smoothly.
>>>>>>
>>>>>> I do not understand why on maint or master branch, ex16 crashes
>>>>>> inside superlu_dist, but not with mumps.
>>>>>>
>>>>>>
>>>>>> I also confirm that ex16 runs fine with latest fix, but unfortunately
>>>>>> not my code.
>>>>>>
>>>>>> This is something to be expected, since my code preallocates once in
>>>>>> the beginning. So there is no way it can be affected by multiple
>>>>>> preallocations. Subsequently I only do matrix assembly, that makes sure
>>>>>> structure doesn't change (set to get error otherwise).
>>>>>>
>>>>>> Summary: we don't have a simple test code to debug superlu issue
>>>>>> anymore.
>>>>>>
>>>>>> Anton
>>>>>>
>>>>>> Hong
>>>>>>
>>>>>> On Mon, Oct 24, 2016 at 9:34 AM, Satish Balay <[email protected]>
>>>>>> wrote:
>>>>>>
>>>>>>> On Mon, 24 Oct 2016, Barry Smith wrote:
>>>>>>>
>>>>>>> >
>>>>>>> > > [Or perhaps Hong is using a different test code and is observing
>>>>>>> > > bugs with superlu_dist interface..]
>>>>>>> >
>>>>>>> > She states that her test does a NEW MatCreate() for each matrix
>>>>>>> > load (I cut and pasted it in the email I just sent). The bug I fixed
>>>>>>> > was only related to using the SAME matrix from one MatLoad() in
>>>>>>> > another MatLoad().
>>>>>>>
>>>>>>> Ah - ok.. Sorry - wasn't thinking clearly :(
>>>>>>>
>>>>>>> Satish
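
Note on the flag-checking idea from the top of this thread: choosing options.Fact according to whether Pr and Pc from a previous factorization are actually available could be sketched roughly as follows. This is only an illustration in plain C under stated assumptions. The enum is a local stand-in that mirrors the option names discussed here, not the SuperLU_DIST header, and reuse_state and choose_fact are hypothetical names that do not exist in the PETSc or SuperLU_DIST interfaces.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Local stand-in mirroring the option names used in this thread.
 * Assumption: this is NOT the SuperLU_DIST header definition. */
typedef enum { DOFACT, SamePattern, SamePattern_SameRowPerm } fact_mode;

/* Hypothetical bookkeeping kept by an interface layer. */
typedef struct {
    bool factored_once; /* a previous factorization has completed        */
    bool have_Pc;       /* its column permutation (Pc) is still stored   */
    bool have_Pr;       /* its row permutation (Pr) is still stored      */
} reuse_state;

/* Return the requested mode only if the inputs it needs are available;
 * otherwise fall back to a mode that needs fewer inputs. */
fact_mode choose_fact(const reuse_state *s, fact_mode requested)
{
    assert(s != NULL);
    if (requested == DOFACT || !s->factored_once || !s->have_Pc)
        return DOFACT;                  /* nothing reusable: factor from scratch */
    if (requested == SamePattern_SameRowPerm && s->have_Pr)
        return SamePattern_SameRowPerm; /* both Pr and Pc can be passed as input */
    return SamePattern;                 /* only Pc is reused */
}
```

With this policy, a request for SamePattern_SameRowPerm without a stored Pr degrades to SamePattern, and any reuse request before a first factorization degrades to DOFACT. Whether reusing Pr is numerically safe when the matrix values change significantly (Anton's question above) is a separate issue that this sketch does not address.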
