Re: [petsc-users] SuperLU_dist issue in 3.7.4

Xiaoye S. Li Wed, 26 Oct 2016 13:26:52 -0700

Some graph preprocessing steps can be skipped ONLY IF a previous
factorization was done and its information can be reused (AS INPUT) by the
new factorization.


In general, the driver routine pdgssvx() in SRC/pdgssvx.c performs the LU
factorization of the following (preprocessed) matrix:
 Pc*Pr*diag(R)*A*diag(C)*Pc^T = L*U

The default is to do LU from scratch, including all the steps to compute
equilibration (R, C), pivot ordering (Pr), and sparsity ordering (Pc).
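
As a small illustration of the identity above, the following pure-Python
sketch builds each factor for a 3x3 dense matrix and checks that an unpivoted
LU of the preprocessed matrix reproduces it. The helper names, the test
matrix, and the hand-picked permutations are all assumptions made for this
example; SuperLU_DIST computes R, C, Pr and Pc with its own algorithms on
distributed sparse data.

```python
# Toy dense check of  Pc*Pr*diag(R)*A*diag(C)*Pc^T = L*U.
# Everything here is illustrative; none of it is SuperLU_DIST code.

def matmul(X, Y):
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]

def transpose(X):
    return [list(col) for col in zip(*X)]

def diag(v):
    return [[v[i] if i == j else 0.0 for j in range(len(v))]
            for i in range(len(v))]

def perm_matrix(p):
    # Left-multiplying by this matrix puts old row p[i] into new row i.
    return [[1.0 if j == p[i] else 0.0 for j in range(len(p))]
            for i in range(len(p))]

def lu_no_pivoting(M):
    # Doolittle LU without row exchanges; assumes nonzero pivots.
    n = len(M)
    L = diag([1.0] * n)
    U = [row[:] for row in M]
    for k in range(n):
        for i in range(k + 1, n):
            L[i][k] = U[i][k] / U[k][k]
            for j in range(k, n):
                U[i][j] -= L[i][k] * U[k][j]
    return L, U

A = [[0.0, 200.0, 0.0],
     [4.0,   0.0, 1.0],
     [0.0,   3.0, 6.0]]

# Equilibration: scale each row by 1/max|entry|, then each column likewise.
R = [1.0 / max(abs(x) for x in row) for row in A]
RA = matmul(diag(R), A)
C = [1.0 / max(abs(x) for x in col) for col in transpose(RA)]

# Hand-picked permutations (assumptions, not computed orderings):
Pr = perm_matrix([1, 0, 2])   # moves large entries onto the diagonal
Pc = perm_matrix([2, 0, 1])   # stands in for a sparsity ordering

scaled = matmul(matmul(diag(R), A), diag(C))
M = matmul(matmul(Pc, matmul(Pr, scaled)), transpose(Pc))

L, U = lu_no_pivoting(M)
LU = matmul(L, U)
residual = max(abs(LU[i][j] - M[i][j]) for i in range(3) for j in range(3))
print("max |L*U - M| =", residual)
```

Because Pr has already placed large entries on the diagonal, the unpivoted
elimination in this toy case never divides by a small number.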

-- The default should be set as options.Fact = DOFACT.

-- When you set options.Fact = SamePattern, the sparsity ordering step is
skipped, but you need to input Pc which was obtained from a previous
factorization.

-- When you set options.Fact = SamePattern_SameRowPerm, both sparsity
reordering and pivoting ordering steps are skipped, but you need to input
both Pr and Pc.
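
The three settings above can be summarized as a small lookup table. The
sketch below is a hypothetical Python model of the rules stated in this
message, not part of SuperLU_DIST or PETSc; the names FACT_RULES,
missing_inputs and choose_fact are invented for illustration. It also shows
the kind of up-front check that could report a missing input instead of
crashing. The fallback to DOFACT is one possible policy assumed here, not
something this message prescribes.

```python
# Hypothetical model of the options.Fact rules described above.
# Labels and function names are illustrative, not a real API.

FACT_RULES = {
    "DOFACT": {
        "skipped": [],
        "required_inputs": [],
    },
    "SamePattern": {
        "skipped": ["sparsity ordering"],
        "required_inputs": ["Pc"],
    },
    "SamePattern_SameRowPerm": {
        "skipped": ["sparsity ordering", "pivot ordering"],
        "required_inputs": ["Pr", "Pc"],
    },
}

def missing_inputs(fact, available):
    """Return the required inputs for `fact` that are not in `available`."""
    if fact not in FACT_RULES:
        raise ValueError("unknown Fact setting: %r" % (fact,))
    return [name for name in FACT_RULES[fact]["required_inputs"]
            if name not in available]

def choose_fact(requested, available):
    """Fall back to DOFACT when a reuse mode lacks its inputs."""
    return requested if not missing_inputs(requested, available) else "DOFACT"

# First factorization: nothing saved yet, so a reuse mode cannot be honored.
print(choose_fact("SamePattern", set()))                  # DOFACT
# Later factorization: Pc was kept from the earlier run.
print(choose_fact("SamePattern", {"Pc"}))                 # SamePattern
print(missing_inputs("SamePattern_SameRowPerm", {"Pc"}))  # ['Pr']
```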

Please see the comments at lines 258 - 307 of SRC/pdgssvx.c for details
regarding which data structures should be inputs and which are outputs.  The
Users Guide also explains this.

In the EXAMPLE/ directory, I have various examples of these usage situations;
see EXAMPLE/README.

I am a little puzzled why in PETSc, the default is set to SamePattern ??

Sherry


On Tue, Oct 25, 2016 at 9:18 AM, Hong <[email protected]> wrote:

> Sherry,
>
> We set '-mat_superlu_dist_fact SamePattern'  as default in
> petsc/superlu_dist on 12/6/15 (see attached email below).
>
> However, Anton must set 'SamePattern_SameRowPerm' to avoid a crash in his
> code. Checking
> http://crd-legacy.lbl.gov/~xiaoye/SuperLU/superlu_dist_code_html/pzgssvx___a_bglobal_8c.html
> I see a detailed description of using SamePattern_SameRowPerm, which
> requires more from the user than SamePattern. I guess these flags are used
> for efficiency. The library sets a default, then has users switch it for
> their own applications. The default setting should not cause a crash. If
> a crash occurs, giving a meaningful error message would be helpful.
>
> Do you have a suggestion for how we should set the default in petsc for
> this flag?
>
> Hong
>
> -------------------
> Hong <[email protected]>
> 12/7/15
> to Danyang, petsc-maint, PETSc, Xiaoye
> Danyang :
>
> Adding '-mat_superlu_dist_fact SamePattern' fixed the problem. Below is
> how I figured it out.
>
> 1. Reading ex52f.F, I see '-superlu_default' =
> '-pc_factor_mat_solver_package superlu_dist', the latter enables runtime
> options for other packages. I use superlu_dist-4.2 and superlu-4.1 for the
> tests below.
> ...
> 5.
> Using a_flow_check_1.bin, I am able to reproduce the error you reported:
> all packages give correct results except superlu_dist:
> ./ex52f -f0 matrix_and_rhs_bin/a_flow_check_1.bin -rhs
> matrix_and_rhs_bin/b_flow_check_168.bin -loop_matrices flow_check
> -loop_folder matrix_and_rhs_bin -pc_type lu -pc_factor_mat_solver_package
> superlu_dist
> Norm of error  2.5970E-12 iterations     1
>  -->Test for matrix          168
> Norm of error  1.3936E-01 iterations    34
>  -->Test for matrix          169
>
> I guess the error might come from reuse of matrix factor. Replacing default
> -mat_superlu_dist_fact <SamePattern_SameRowPerm> with
> -mat_superlu_dist_fact SamePattern, I get
>
> ./ex52f -f0 matrix_and_rhs_bin/a_flow_check_1.bin -rhs
> matrix_and_rhs_bin/b_flow_check_168.bin -loop_matrices flow_check
> -loop_folder matrix_and_rhs_bin -pc_type lu -pc_factor_mat_solver_package
> superlu_dist -mat_superlu_dist_fact SamePattern
>
> Norm of error  2.5970E-12 iterations     1
>  -->Test for matrix          168
> ...
> Sherry may tell you why SamePattern_SameRowPerm causes the difference here.
> Based on the above experiments, I would set the following as defaults:
> '-mat_superlu_diagpivotthresh 0.0' in petsc/superlu interface.
> '-mat_superlu_dist_fact SamePattern' in petsc/superlu_dist interface.
>
> Hong
>
> On Tue, Oct 25, 2016 at 10:38 AM, Hong <[email protected]> wrote:
>
>> Anton,
>> I guess, when you reuse matrix and its symbolic factor with updated
>> numerical values, superlu_dist requires this option. I'm cc'ing Sherry to
>> confirm it.
>>
>> I'll check petsc/superlu-dist interface to set this flag for this case.
>>
>> Hong
>>
>>
>> On Tue, Oct 25, 2016 at 8:20 AM, Anton Popov <[email protected]> wrote:
>>
>>> Hong,
>>>
>>> I get all the problems gone and valgrind-clean output if I specify this:
>>>
>>> -mat_superlu_dist_fact SamePattern_SameRowPerm
>>> What does SamePattern_SameRowPerm actually mean?
>>> Row permutations are for large diagonal, column permutations are for
>>> sparsity, right?
>>> Will it skip subsequent matrix permutations for large diagonal even if
>>> matrix values change significantly?
>>>
>>> Surprisingly everything works even with:
>>>
>>> -mat_superlu_dist_colperm PARMETIS
>>> -mat_superlu_dist_parsymbfact TRUE
>>>
>>> Thanks,
>>> Anton
>>>
>>> On 10/24/2016 09:06 PM, Hong wrote:
>>>
>>> Anton:
>>>>
>>>> If replacing superlu_dist with mumps, does your code work?
>>>>
>>>> yes
>>>>
>>>
>>> You may use mumps in your code, or test different options for
>>> superlu_dist:
>>>
>>>   -mat_superlu_dist_equil: <TRUE> Equilibrate matrix (None)
>>>   -mat_superlu_dist_rowperm <LargeDiag> Row permutation (choose one of)
>>> LargeDiag NATURAL (None)
>>>   -mat_superlu_dist_colperm <METIS_AT_PLUS_A> Column permutation (choose
>>> one of) NATURAL MMD_AT_PLUS_A MMD_ATA METIS_AT_PLUS_A PARMETIS (None)
>>>   -mat_superlu_dist_replacetinypivot: <FALSE> Replace tiny pivots (None)
>>>   -mat_superlu_dist_parsymbfact: <FALSE> Parallel symbolic factorization
>>> (None)
>>>   -mat_superlu_dist_fact <SamePattern> Sparsity pattern for repeated
>>> matrix factorization (choose one of) SamePattern SamePattern_SameRowPerm
>>> (None)
>>>
>>> The options inside <> are defaults. You may try others. This might help
>>> narrow down the bug.
>>>
>>> Hong
>>>
>>>>
>>>> Hong
>>>>>
>>>>> On 10/24/2016 05:47 PM, Hong wrote:
>>>>>
>>>>> Barry,
>>>>> Your change indeed fixed the error of his testing code.
>>>>> As Satish tested, on your branch, ex16 runs smoothly.
>>>>>
>>>>> I do not understand why on maint or master branch, ex16 crashes inside
>>>>> superlu_dist, but not with mumps.
>>>>>
>>>>>
>>>>> I also confirm that ex16 runs fine with latest fix, but unfortunately
>>>>> not my code.
>>>>>
>>>>> This is something to be expected, since my code preallocates once in
>>>>> the beginning. So there is no way it can be affected by multiple
>>>>> preallocations. Subsequently I only do matrix assembly, that makes sure
>>>>> structure doesn't change (set to get error otherwise).
>>>>>
>>>>> Summary: we don't have a simple test code to debug superlu issue
>>>>> anymore.
>>>>>
>>>>> Anton
>>>>>
>>>>> Hong
>>>>>
>>>>> On Mon, Oct 24, 2016 at 9:34 AM, Satish Balay <[email protected]>
>>>>> wrote:
>>>>>
>>>>>> On Mon, 24 Oct 2016, Barry Smith wrote:
>>>>>>
>>>>>> >
>>>>>> > > [Or perhaps Hong is using a different test code and is observing
>>>>>> bugs
>>>>>> > > with superlu_dist interface..]
>>>>>> >
>>>>>> >    She states that her test does a NEW MatCreate() for each matrix
>>>>>> load (I cut and pasted it in the email I just sent). The bug I fixed was
>>>>>> only related to using the SAME matrix from one MatLoad() in another
>>>>>> MatLoad().
>>>>>>
>>>>>> Ah - ok.. Sorry - wasn't thinking clearly :(
>>>>>>
>>>>>> Satish
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>
>>>>
>>>
>>>
>>
>
