On 10/27/2016 04:51 PM, Hong wrote:
Sherry,
Thanks for detailed explanation.
We use options.Fact = DOFACT as the default for the first factorization. When the user reuses the matrix factor, we must provide a default,
either 'options.Fact = SamePattern' or 'SamePattern_SameRowPerm'.
We previously set 'SamePattern_SameRowPerm'. After a user reported an error, we switched to 'SamePattern', which causes problems for a second user.

Hong,

Setting options.Fact = DOFACT for all factorizations is currently impossible via the PETSc interface.
The user is expected to choose some kind of reuse model.
If you could add it, I (and probably other users too) would really appreciate it.

Thanks a lot,
Anton


I'll check our interface to see if we can add flag-checking for Pr and Pc, and then set the default accordingly.

Hong

On Wed, Oct 26, 2016 at 3:23 PM, Xiaoye S. Li <[email protected] <mailto:[email protected]>> wrote:

    Some graph preprocessing steps can be skipped ONLY IF a previous
    factorization was done, and the information can be reused (AS
    INPUT) to the new factorization.

    In general, the driver routine SRC/pdgssvx.c() performs the LU
    factorization of the following (preprocessed) matrix:
     Pc*Pr*diag(R)*A*diag(C)*Pc^T = L*U
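To make the order of the preprocessing steps concrete, here is a minimal C sketch that forms Pc*Pr*diag(R)*A*diag(C)*Pc^T explicitly for a small dense matrix. It is illustrative only and is not how SuperLU_DIST stores or applies its permutations. The function name `preprocess_dense` is hypothetical, and the sketch assumes the convention perm[i] = j meaning row (or column) i of the input lands in position j, which as far as we know matches the convention SuperLU documents for perm_r.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: form B = Pc*Pr*diag(R)*A*diag(C)*Pc^T for a dense
 * n-by-n matrix A stored row-major. Assumed convention: perm[i] = j means
 * row (or column) i of the input ends up in position j of the output. */
void preprocess_dense(size_t n, const double *A, const double *R,
                      const double *C, const int *perm_r,
                      const int *perm_c, double *B)
{
    for (size_t i = 0; i < n; ++i) {
        assert(perm_r[i] >= 0 && (size_t)perm_r[i] < n);
        /* Pr sends row i to perm_r[i]; Pc then sends that row to perm_c[perm_r[i]]. */
        size_t row = (size_t)perm_c[perm_r[i]];
        assert(row < n);
        for (size_t j = 0; j < n; ++j) {
            assert(perm_c[j] >= 0 && (size_t)perm_c[j] < n);
            /* Right-multiplying by Pc^T sends column j to perm_c[j]. */
            size_t col = (size_t)perm_c[j];
            /* Entry (i, j) of diag(R)*A*diag(C) is R[i]*A[i][j]*C[j]. */
            B[row * n + col] = R[i] * A[i * n + j] * C[j];
        }
    }
}
```

Because Pc is applied symmetrically, it only relabels rows and columns together; which entries of A end up on the diagonal is decided by Pr alone.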

    The default is to do LU from scratch, including all the steps to
    compute equilibration (R, C), pivot ordering (Pr), and sparsity
    ordering (Pc).

    -- The default should be set as options.Fact = DOFACT.

    -- When you set options.Fact = SamePattern, the sparsity ordering
    step is skipped, but you need to input Pc which was obtained from
    a previous factorization.

    -- When you set options.Fact = SamePattern_SameRowPerm, both the
    sparsity reordering and pivot ordering steps are skipped, but
    you need to input both Pr and Pc.
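These three rules amount to a small decision table. A minimal C sketch of the default-selection logic Hong proposes (choose Fact according to which of Pr and Pc are available from a previous factorization) might look like this. The `fact_choice` enum and `reuse_state` struct are hypothetical local stand-ins, not SuperLU_DIST's own enum type or PETSc's interface code.

```c
#include <stdbool.h>

/* Hypothetical local mirror of the options.Fact values discussed above. */
typedef enum {
    FACT_DOFACT,
    FACT_SAME_PATTERN,
    FACT_SAME_PATTERN_SAME_ROWPERM
} fact_choice;

/* Hypothetical record of what a previous factorization left behind. */
typedef struct {
    bool have_perm_c; /* sparsity ordering Pc is available */
    bool have_perm_r; /* pivot (row) ordering Pr is available */
} reuse_state;

/* Pick the most aggressive reuse mode whose required inputs are present. */
fact_choice choose_fact(reuse_state s)
{
    if (s.have_perm_c && s.have_perm_r)
        return FACT_SAME_PATTERN_SAME_ROWPERM; /* skip both orderings */
    if (s.have_perm_c)
        return FACT_SAME_PATTERN;              /* skip sparsity ordering only */
    return FACT_DOFACT;                        /* factor from scratch */
}
```

Having Pr available does not by itself make reusing it safe: if the numerical values change a lot, a row permutation chosen for the old values may put small entries on the diagonal, so a real default may also need to consider whether the values are expected to change.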

    Please see the comments in lines 258-307 of SRC/pdgssvx.c for
    details on which data structures are inputs and which are
    outputs. The Users' Guide also explains this.

    In the EXAMPLE/ directory, I have various examples of these usage
    situations; see EXAMPLE/README.

    I am a little puzzled why the default in PETSc is set to
    SamePattern.

    Sherry


    On Tue, Oct 25, 2016 at 9:18 AM, Hong <[email protected]
    <mailto:[email protected]>> wrote:

        Sherry,

        We set '-mat_superlu_dist_fact SamePattern' as the default in
        petsc/superlu_dist on 12/6/15 (see the attached email below).

        However, Anton must set 'SamePattern_SameRowPerm' to avoid a
        crash in his code. Checking
        http://crd-legacy.lbl.gov/~xiaoye/SuperLU/superlu_dist_code_html/pzgssvx___a_bglobal_8c.html
        I see a detailed description of using SamePattern_SameRowPerm,
        which requires more from the user than SamePattern. I guess these
        flags are used for efficiency: the library sets a default, and
        users then switch it for their own applications. The default
        setting should not cause a crash. If a crash does occur, a
        meaningful error message would help.
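Along the lines of this suggestion, a check like the following could turn a missing input into an error message instead of a crash. This is a hypothetical C sketch; `check_fact_request` and the `REQ_*` names are illustrative and do not come from PETSc or SuperLU_DIST.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical local mirror of the requested options.Fact value. */
typedef enum {
    REQ_DOFACT,
    REQ_SAME_PATTERN,
    REQ_SAME_PATTERN_SAME_ROWPERM
} fact_request;

/* Return NULL if the requested reuse mode has the inputs it needs,
 * otherwise a human-readable message describing what is missing. */
const char *check_fact_request(fact_request req, bool have_perm_r,
                               bool have_perm_c)
{
    switch (req) {
    case REQ_DOFACT:
        return NULL; /* computes R, C, Pr and Pc itself */
    case REQ_SAME_PATTERN:
        return have_perm_c
                   ? NULL
                   : "SamePattern requires Pc from a previous factorization";
    case REQ_SAME_PATTERN_SAME_ROWPERM:
        return (have_perm_c && have_perm_r)
                   ? NULL
                   : "SamePattern_SameRowPerm requires both Pr and Pc "
                     "from a previous factorization";
    }
    return "unknown Fact value";
}
```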

        Do you have a suggestion for how we should set the default for
        this flag in PETSc?

        Hong

        -------------------


        Hong <[email protected]>
        12/7/15
        to Danyang, petsc-maint, PETSc, Xiaoye

        Danyang :

        Adding '-mat_superlu_dist_fact SamePattern' fixed the problem.
        Below is how I figured it out.

        1. Reading ex52f.F, I see that '-superlu_default' is equivalent to
        '-pc_factor_mat_solver_package superlu_dist'; the latter
        enables runtime options for other packages. I use
        superlu_dist-4.2 and superlu-4.1 for the tests below.
        ...
        5. Using a_flow_check_1.bin, I am able to reproduce the error you
        reported: all packages give correct results except superlu_dist:
        ./ex52f -f0 matrix_and_rhs_bin/a_flow_check_1.bin -rhs
        matrix_and_rhs_bin/b_flow_check_168.bin -loop_matrices
        flow_check -loop_folder matrix_and_rhs_bin -pc_type lu
        -pc_factor_mat_solver_package superlu_dist
        Norm of error  2.5970E-12 iterations     1
         -->Test for matrix          168
        Norm of error  1.3936E-01 iterations    34
         -->Test for matrix          169

        I guess the error might come from reuse of the matrix factor.
        Replacing the default
        -mat_superlu_dist_fact <SamePattern_SameRowPerm> with
        -mat_superlu_dist_fact SamePattern, I get

        ./ex52f -f0 matrix_and_rhs_bin/a_flow_check_1.bin -rhs
        matrix_and_rhs_bin/b_flow_check_168.bin -loop_matrices
        flow_check -loop_folder matrix_and_rhs_bin -pc_type lu
        -pc_factor_mat_solver_package superlu_dist
        -mat_superlu_dist_fact SamePattern

        Norm of error  2.5970E-12 iterations     1
         -->Test for matrix          168
        ...
        Sherry may be able to tell you why SamePattern_SameRowPerm causes
        the difference here.
        Based on the above experiments, I would set the following as
        defaults:
        '-mat_superlu_diagpivotthresh 0.0' in the petsc/superlu interface;
        '-mat_superlu_dist_fact SamePattern' in the petsc/superlu_dist
        interface.

        Hong

        On Tue, Oct 25, 2016 at 10:38 AM, Hong <[email protected]
        <mailto:[email protected]>> wrote:

            Anton,
            I guess that when you reuse the matrix and its symbolic
            factor with updated numerical values, superlu_dist requires
            this option. I'm cc'ing Sherry to confirm it.

            I'll check the petsc/superlu_dist interface to set this flag
            for this case.

            Hong


            On Tue, Oct 25, 2016 at 8:20 AM, Anton Popov
            <[email protected] <mailto:[email protected]>> wrote:

                Hong,

                All the problems are gone, and the valgrind output is
                clean, if I specify this:

                -mat_superlu_dist_fact SamePattern_SameRowPerm

                What does SamePattern_SameRowPerm actually mean?
                Row permutations are for a large diagonal, column
                permutations are for sparsity, right?
                Will it skip subsequent row permutations for a large
                diagonal even if the matrix values change significantly?
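This question can be illustrated with a toy 2x2 case. The C sketch below uses a brute-force choice between the identity and a row swap as a stand-in for the LargeDiag row permutation (SuperLU_DIST does not compute it this way), and shows that a Pr chosen for one set of values can leave tiny pivots when the values change but the sparsity pattern does not. The function names are hypothetical, and the convention perm[i] = j (row i moves to row j) is an assumption of the sketch.

```c
#include <math.h>
#include <stddef.h>

/* Smallest |diagonal entry| of Pr*A for a dense row-major n-by-n A,
 * where perm[i] = j means row i of A moves to row j of Pr*A. */
double min_abs_diag(size_t n, const double *A, const int *perm)
{
    double m = INFINITY;
    for (size_t i = 0; i < n; ++i) {
        /* Row i sits at row perm[i], so its diagonal entry is column perm[i]. */
        double d = fabs(A[i * n + (size_t)perm[i]]);
        if (d < m)
            m = d;
    }
    return m;
}

/* Hypothetical brute-force stand-in for a large-diagonal row permutation,
 * 2x2 only: keep the rows or swap them, whichever gives the larger
 * product of |diagonal| entries. */
void large_diag_perm_2x2(const double *A, int perm[2])
{
    double keep = fabs(A[0] * A[3]); /* diagonal product without a swap */
    double swap = fabs(A[1] * A[2]); /* diagonal product after swapping rows */
    perm[0] = (keep >= swap) ? 0 : 1;
    perm[1] = 1 - perm[0];
}
```

With A1 = [[1e-8, 1], [1, 1e-8]] the swap is chosen and both pivots are 1. Reusing that swap for A2 = [[1, 1e-8], [1e-8, 1]] (same pattern, different values) leaves both pivots at 1e-8, whereas recomputing Pr for A2 keeps the identity. This matches Sherry's description that SamePattern_SameRowPerm skips the pivot ordering step, but it is only a toy illustration, not a diagnosis of the failures reported in this thread.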

                Surprisingly, everything works even with:

                -mat_superlu_dist_colperm PARMETIS
                -mat_superlu_dist_parsymbfact TRUE

                Thanks,
                Anton

                On 10/24/2016 09:06 PM, Hong wrote:
                Anton:

                    If replacing superlu_dist with mumps, does your
                    code work?
                    yes

                You may use mumps in your code, or test different
                options for superlu_dist:

                -mat_superlu_dist_equil: <TRUE> Equilibrate matrix (None)
                -mat_superlu_dist_rowperm <LargeDiag> Row permutation (choose one of) LargeDiag NATURAL (None)
                -mat_superlu_dist_colperm <METIS_AT_PLUS_A> Column permutation (choose one of) NATURAL MMD_AT_PLUS_A MMD_ATA METIS_AT_PLUS_A PARMETIS (None)
                -mat_superlu_dist_replacetinypivot: <FALSE> Replace tiny pivots (None)
                -mat_superlu_dist_parsymbfact: <FALSE> Parallel symbolic factorization (None)
                -mat_superlu_dist_fact <SamePattern> Sparsity pattern for repeated matrix factorization (choose one of) SamePattern SamePattern_SameRowPerm (None)

                The options inside <> are defaults. You may try
                others. This might help narrow down the bug.

                Hong


                    Hong

                        On 10/24/2016 05:47 PM, Hong wrote:
                        Barry,
                        Your change indeed fixed the error of his
                        testing code.
                        As Satish tested, on your branch ex16 runs
                        smoothly.

                        I do not understand why, on the maint or master
                        branch, ex16 crashes inside superlu_dist
                        but not with mumps.


                        I also confirm that ex16 runs fine with the
                        latest fix, but unfortunately my code does not.

                        This is to be expected, since my code
                        preallocates once at the beginning, so there is
                        no way it can be affected by multiple
                        preallocations. Subsequently I only do matrix
                        assembly, which makes sure the structure doesn't
                        change (set to raise an error otherwise).

                        Summary: we don't have a simple test code to
                        debug superlu issue anymore.

                        Anton

                        Hong

                        On Mon, Oct 24, 2016 at 9:34 AM, Satish
                        Balay <[email protected]
                        <mailto:[email protected]>> wrote:

                            On Mon, 24 Oct 2016, Barry Smith wrote:

                            >
                            > > [Or perhaps Hong is using a different test code and
                            > > is observing bugs with superlu_dist interface..]
                            >
                            >    She states that her test does a NEW MatCreate()
                            >    for each matrix load (I cut and pasted it in the
                            >    email I just sent). The bug I fixed was only related
                            >    to using the SAME matrix from one MatLoad() in
                            >    another MatLoad().

                            Ah - ok.. Sorry - wasn't thinking
                            clearly :(

                            Satish