On 10/27/2016 04:51 PM, Hong wrote:
Sherry,
Thanks for the detailed explanation.
We use options.Fact = DOFACT as the default for the first factorization.
When a user reuses the matrix factor, we must provide a default,
either 'options.Fact = SamePattern' or 'SamePattern_SameRowPerm'.
We previously set 'SamePattern_SameRowPerm'. After a user reported an
error, we switched to 'SamePattern', which causes a problem for a second user.
Hong,
Setting Options.Fact = DOFACT for all factorizations is currently
impossible via PETSc interface.
The user is expected to choose some kind of reuse model.
If you could add it, I (and other users probably too) would really
appreciate that.
Thanks a lot,
Anton
I'll check our interface to see if we can add flag-checking for Pr and
Pc, then set default accordingly.
Hong
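The flag-checking idea above could look roughly like the following. This is only a sketch under stated assumptions, not the PETSc interface code: the enum, the function name effective_fact and the have_Pr/have_Pc flags are all hypothetical, and it assumes (per the SuperLU_DIST notes quoted further down this thread) that SamePattern needs Pc as input and SamePattern_SameRowPerm needs both Pr and Pc.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the three options.Fact values in this thread;
   the real definitions live in the SuperLU_DIST headers. */
typedef enum {
    FACT_DOFACT,
    FACT_SAME_PATTERN,
    FACT_SAME_PATTERN_SAME_ROWPERM
} fact_mode;

/* Downgrade the requested reuse mode until every permutation it needs as
   input is actually available, so a default never asks for missing data. */
fact_mode effective_fact(fact_mode requested, bool have_Pr, bool have_Pc)
{
    assert(requested == FACT_DOFACT ||
           requested == FACT_SAME_PATTERN ||
           requested == FACT_SAME_PATTERN_SAME_ROWPERM);

    /* Full reuse only when both Pr and Pc exist from an earlier factorization. */
    if (requested == FACT_SAME_PATTERN_SAME_ROWPERM && have_Pr && have_Pc)
        return FACT_SAME_PATTERN_SAME_ROWPERM;

    /* Any reuse request can still fall back to SamePattern if Pc exists. */
    if (requested != FACT_DOFACT && have_Pc)
        return FACT_SAME_PATTERN;

    /* Otherwise factor from scratch. */
    return FACT_DOFACT;
}
```

With this policy a request for SamePattern_SameRowPerm without a stored Pr quietly becomes SamePattern, and any request without a stored Pc becomes DOFACT. Whether a silent downgrade or an error message is the better behaviour is a design choice this sketch does not settle.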
On Wed, Oct 26, 2016 at 3:23 PM, Xiaoye S. Li wrote:
Some graph preprocessing steps can be skipped ONLY IF a previous
factorization was done, and the information can be reused (AS
INPUT) to the new factorization.
In general, the driver routine pdgssvx() in SRC/pdgssvx.c performs the LU
factorization of the following (preprocessed) matrix:
Pc*Pr*diag(R)*A*diag(C)*Pc^T = L*U
The default is to do LU from scratch, including all the steps to
compute equilibration (R, C), pivot ordering (Pr), and sparsity
ordering (Pc).
-- The default should be set as options.Fact = DOFACT.
-- When you set options.Fact = SamePattern, the sparsity ordering
step is skipped, but you need to input Pc which was obtained from
a previous factorization.
-- When you set options.Fact = SamePattern_SameRowPerm, both
sparsity reordering and pivoting ordering steps are skipped, but
you need to input both Pr and Pc.
Please see Lines 258 - 307 comments in SRC/pdgssvx.c for details,
regarding which data structures should be inputs and which are
outputs. The Users Guide also explains this.
In EXAMPLE/ directory, I have various examples of these usage
situations, see EXAMPLE/README.
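The three cases above can be summarized in a small lookup. This is a minimal sketch, not SuperLU_DIST code: the enum constants, the fact_plan struct and plan_for() are hypothetical names introduced for illustration, and the sketch records only what is stated above about Pr and Pc (it says nothing about equilibration).

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical mirror of the three options.Fact values described above. */
typedef enum {
    FACT_DOFACT,
    FACT_SAME_PATTERN,
    FACT_SAME_PATTERN_SAME_ROWPERM
} fact_mode;

/* Which ordering steps are computed, and which must be supplied as input. */
typedef struct {
    bool computes_Pc;  /* sparsity ordering is computed in this call */
    bool computes_Pr;  /* pivot ordering is computed in this call */
    bool needs_Pc_in;  /* Pc must come from a previous factorization */
    bool needs_Pr_in;  /* Pr must come from a previous factorization */
} fact_plan;

fact_plan plan_for(fact_mode mode)
{
    /* DOFACT: everything from scratch, nothing required as input. */
    fact_plan p = { true, true, false, false };

    assert(mode == FACT_DOFACT ||
           mode == FACT_SAME_PATTERN ||
           mode == FACT_SAME_PATTERN_SAME_ROWPERM);

    /* Both reuse modes skip the sparsity ordering and take Pc as input. */
    if (mode != FACT_DOFACT) {
        p.computes_Pc = false;
        p.needs_Pc_in = true;
    }
    /* SamePattern_SameRowPerm additionally skips pivoting and takes Pr. */
    if (mode == FACT_SAME_PATTERN_SAME_ROWPERM) {
        p.computes_Pr = false;
        p.needs_Pr_in = true;
    }
    return p;
}
```

For example, plan_for(FACT_SAME_PATTERN) reports that Pc is required as input while Pr is still recomputed.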
I am a little puzzled why in PETSc, the default is set to
SamePattern ??
Sherry
On Tue, Oct 25, 2016 at 9:18 AM, Hong wrote:
Sherry,
We set '-mat_superlu_dist_fact SamePattern' as default in
petsc/superlu_dist on 12/6/15 (see attached email below).
However, Anton must set 'SamePattern_SameRowPerm' to avoid a
crash in his code. Checking
http://crd-legacy.lbl.gov/~xiaoye/SuperLU/superlu_dist_code_html/pzgssvx___a_bglobal_8c.html
I see a detailed description of using SamePattern_SameRowPerm,
which requires more from the user than SamePattern. I guess these
flags are used for efficiency. The library sets a default,
then lets users switch it for their own applications. The
default setting should not cause a crash. If a crash occurs, giving
a meaningful error message would be helpful.
Do you have a suggestion for how we should set the default in petsc for
this flag?
Hong
-------------------
Hong
12/7/15
to Danyang, petsc-maint, PETSc, Xiaoye
Danyang :
Adding '-mat_superlu_dist_fact SamePattern' fixed the problem.
Below is how I figured it out.
1. Reading ex52f.F, I see '-superlu_default' =
'-pc_factor_mat_solver_package superlu_dist'; the latter
enables runtime options for other packages. I use
superlu_dist-4.2 and superlu-4.1 for the tests below.
...
5. Using a_flow_check_1.bin, I am able to reproduce the error you
reported: all packages give correct results except superlu_dist:
./ex52f -f0 matrix_and_rhs_bin/a_flow_check_1.bin -rhs
matrix_and_rhs_bin/b_flow_check_168.bin -loop_matrices
flow_check -loop_folder matrix_and_rhs_bin -pc_type lu
-pc_factor_mat_solver_package superlu_dist
Norm of error 2.5970E-12 iterations 1
-->Test for matrix 168
Norm of error 1.3936E-01 iterations 34
-->Test for matrix 169
I guess the error might come from reuse of the matrix factor.
Replacing the default
-mat_superlu_dist_fact <SamePattern_SameRowPerm> with
-mat_superlu_dist_fact SamePattern, I get
./ex52f -f0 matrix_and_rhs_bin/a_flow_check_1.bin -rhs
matrix_and_rhs_bin/b_flow_check_168.bin -loop_matrices
flow_check -loop_folder matrix_and_rhs_bin -pc_type lu
-pc_factor_mat_solver_package superlu_dist
-mat_superlu_dist_fact SamePattern
Norm of error 2.5970E-12 iterations 1
-->Test for matrix 168
...
Sherry may tell you why SamePattern_SameRowPerm causes the
difference here.
Based on the above experiments, I would set the following as defaults:
'-mat_superlu_diagpivotthresh 0.0' in the petsc/superlu interface.
'-mat_superlu_dist_fact SamePattern' in the petsc/superlu_dist
interface.
Hong
On Tue, Oct 25, 2016 at 10:38 AM, Hong wrote:
Anton,
I guess, when you reuse matrix and its symbolic factor
with updated numerical values, superlu_dist requires this
option. I'm cc'ing Sherry to confirm it.
I'll check petsc/superlu-dist interface to set this flag
for this case.
Hong
On Tue, Oct 25, 2016 at 8:20 AM, Anton Popov wrote:
Hong,
All the problems are gone and I get valgrind-clean output
if I specify this:
-mat_superlu_dist_fact SamePattern_SameRowPerm
What does SamePattern_SameRowPerm actually mean?
Row permutations are for large diagonal, column
permutations are for sparsity, right?
Will it skip subsequent matrix permutations for large
diagonal even if matrix values change significantly?
Surprisingly everything works even with:
-mat_superlu_dist_colperm PARMETIS
-mat_superlu_dist_parsymbfact TRUE
Thanks,
Anton
On 10/24/2016 09:06 PM, Hong wrote:
Anton:
If you replace superlu_dist with mumps, does your
code work?
yes
You may use mumps in your code, or test different
options for superlu_dist:
-mat_superlu_dist_equil: <TRUE> Equilibrate matrix (None)
-mat_superlu_dist_rowperm <LargeDiag> Row permutation
(choose one of) LargeDiag NATURAL (None)
-mat_superlu_dist_colperm <METIS_AT_PLUS_A> Column
permutation (choose one of) NATURAL MMD_AT_PLUS_A
MMD_ATA METIS_AT_PLUS_A PARMETIS (None)
-mat_superlu_dist_replacetinypivot: <FALSE> Replace
tiny pivots (None)
-mat_superlu_dist_parsymbfact: <FALSE> Parallel
symbolic factorization (None)
-mat_superlu_dist_fact <SamePattern> Sparsity pattern
for repeated matrix factorization (choose one of)
SamePattern SamePattern_SameRowPerm (None)
The options inside <> are defaults. You may try
others. This might help narrow down the bug.
Hong
Hong
On 10/24/2016 05:47 PM, Hong wrote:
Barry,
Your change indeed fixed the error of his
testing code.
As Satish tested, on your branch, ex16 runs
smoothly.
I do not understand why, on the maint or master
branch, ex16 crashes inside superlu_dist,
but not with mumps.
I also confirm that ex16 runs fine with the
latest fix, but unfortunately my code does not.
This is to be expected, since my
code preallocates once in the beginning, so
there is no way it can be affected by
multiple preallocations. Subsequently I only
do matrix assembly, which makes sure the
structure doesn't change (it is set to raise an error
otherwise).
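Reusing a factorization with SamePattern presumes that the nonzero structure really is unchanged between assemblies. A minimal, library-independent sketch of such a check on two CSR matrices follows; same_csr_pattern() is a hypothetical helper, not a PETSc or SuperLU_DIST routine, and it assumes both matrices store the column indices of each row in the same order.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Return true when two CSR matrices with n rows have an identical nonzero
   pattern. rowptr_* hold n+1 entries; colind_* hold rowptr_*[n] entries.
   Assumption: column indices within each row are stored in the same order
   in both matrices (e.g. both sorted). */
bool same_csr_pattern(int n,
                      const int *rowptr_a, const int *colind_a,
                      const int *rowptr_b, const int *colind_b)
{
    assert(n >= 0);

    /* Different row pointers mean different per-row counts or offsets. */
    if (memcmp(rowptr_a, rowptr_b, (size_t)(n + 1) * sizeof(int)) != 0)
        return false;

    /* Row pointers agree, so both matrices have rowptr_a[n] nonzeros. */
    return memcmp(colind_a, colind_b,
                  (size_t)rowptr_a[n] * sizeof(int)) == 0;
}
```

A caller could run this before choosing a reuse mode and fall back to a full factorization when it returns false.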
Summary: we don't have a simple test code to
debug superlu issue anymore.
Anton
Hong
On Mon, Oct 24, 2016 at 9:34 AM, Satish Balay wrote:
On Mon, 24 Oct 2016, Barry Smith wrote:
>
> > [Or perhaps Hong is using a different test code and is observing bugs
> > with superlu_dist interface..]
>
> She states that her test does a NEW MatCreate() for each matrix load (I
> cut and pasted it in the email I just sent). The bug I fixed was only
> related to using the SAME matrix from one MatLoad() in another MatLoad().
Ah - ok.. Sorry - wasn't thinking
clearly :(
Satish