[petsc-users] ML and -pc_factor_shift_nonzero

2010-04-20 Thread tri...@vision.ee.ethz.ch
Hi Jed and Matt,

thanks a lot for your help and the interesting discussion.

Kathrin


Quoting Jed Brown jed at 59a2.org:

 On Mon, 19 Apr 2010 07:23:01 -0500, Matthew Knepley  
 knepley at gmail.com wrote:
 So, to see if I understand correctly: you are saying that you can get
 away with more approximate solves if you do not do full reduction? I
 know the theory for the case of Stokes, but can you prove this in a
 general sense?

 The theory is relatively general (as much as preconditioned GMRES is) if
 you iterate in the full space with either block-diagonal or
 block-triangular preconditioners.  Note that this formulation *never*
 involves explicit application of a Schur complement.  Sometimes I get
 better convergence with one subcycle on the Schur complement with a very
 approximate inner solve (FGMRES outer).  I'm not sure if Dave sees this;
 he seems to like doing a couple of subcycles in multigrid smoothers.

 The folks doing Q1-Q1 with ML are not doing *anything* with a Schur
 complement (approximate or otherwise).  They just coarsen on the full
 indefinite system and use ASM (overlap 0 or 1) with ILU to precondition
 the coupled system.  This makes a certain amount of sense because for
 those stabilized formulations, this is similar in spirit to a Vanka
 smoother (block SOR is a more precise analogue).

 This sounds like the black magic I expect :)

 Yeah, this involves some sort of very local solve to produce the
 aggregates and interpolations that are not transposes of each other (if
 I understood Ray and Eric correctly).

 I still maintain that aggregation is a really crappy way to generate
 coarse systems, especially for mixed elements. We should be generating
 coarse systems geometrically, and then using a nice (maybe Black-Box)
 framework for calculating good projectors.

 This whole framework doesn't work for mixed discretizations.

 Jed






[petsc-users] ML and -pc_factor_shift_nonzero

2010-04-19 Thread tri...@vision.ee.ethz.ch
Hi Jed,


 ML works now using, e.g., -mg_coarse_redundant_pc_factor_shift_type
 POSITIVE_DEFINITE. However, it converges very slowly using the default
 REDUNDANT for the coarse solve.

 Converges slowly or the coarse-level solve is expensive?

Hm, rather converges slowly. I use ML inside a preconditioner for
the Schur complement system, and the overall outer system, preconditioned
with this approximate Schur complement preconditioner, converges
slowly, if you understand what I mean.

My particular problem is that the convergence rate depends strongly on
the number of processors. With one processor, using ML to precondition
the innermost system, the outer system converges in, e.g., 39 iterations.
With np=10, however, it needs 69 iterations.

This number of iterations is independent of the number of processes
when using HYPRE (at least for np < 80), but the latter is (applied to
this inner system, not in general) slower and scales very badly. That's
why I would like to use ML.

Thinking about it, none of this should have anything to do with the
choice of the direct solver for the coarse system inside ML (MUMPS or
PETSc's own), should it? The direct solver solves the coarse system
completely, independently of the number of processes, and shouldn't
influence the effectiveness of ML, or am I wrong?
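
For comparing the 1- and 10-process runs, a minimal set of diagnostic
options (assuming the outer KSP uses the default prefix) would be:

  # print the full nested solver configuration and the true residual history
  -ksp_view -ksp_converged_reason -ksp_monitor_true_residual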

 I suggest
 starting with

 -mg_coarse_pc_type lu -mg_coarse_pc_factor_mat_solver_package mumps

 or varying parameters in ML to see if you can make the coarse level
 problem smaller without hurting convergence rate.  You can do
 semi-redundant solves if you scale processor counts beyond what MUMPS
 works well with.

Thanks. So MUMPS is generally supposed to be the fastest parallel
direct solver?

 Depending on what problem you are solving, ML could be producing a
 (nearly) singular coarse level operator in which case you can expect
 very confusing and inconsistent behavior.

Could that also be the reason for the slower convergence when going
from 1 to 10 processors, even though the equation system remains
the same?


Thanks a lot,

Kathrin




[petsc-users] ML and -pc_factor_shift_nonzero

2010-04-19 Thread Matthew Knepley
On Mon, Apr 19, 2010 at 6:29 AM, tribur at vision.ee.ethz.ch wrote:

 Hi Jed,


  ML works now using, e.g., -mg_coarse_redundant_pc_factor_shift_type
 POSITIVE_DEFINITE. However, it converges very slowly using the default
 REDUNDANT for the coarse solve.


 Converges slowly or the coarse-level solve is expensive?


 Hm, rather converges slowly. I use ML inside a preconditioner for the
 Schur complement system, and the overall outer system, preconditioned with
 this approximate Schur complement preconditioner, converges slowly, if you
 understand what I mean.

 My particular problem is that the convergence rate depends strongly on the
 number of processors. With one processor, using ML to precondition the
 innermost system, the outer system converges in, e.g., 39 iterations.
 With np=10, however, it needs 69 iterations.


For Schur complement methods, the inner system usually has to be solved very
accurately. Are you accelerating a Krylov method for A^{-1}, or just using ML
itself? I would expect that, for the same linear system tolerance, you would
get identical convergence for the same system, independent of the number of
processors.
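
For illustration only (the inner_ prefix is hypothetical and depends on how
your inner solve is set up), the two cases would look like:

  # ML preconditioning an inner Krylov solve, iterated to a tight tolerance
  -inner_ksp_type gmres -inner_ksp_rtol 1e-10 -inner_pc_type ml
  # versus a single ML application per outer iteration
  -inner_ksp_type preonly -inner_pc_type ml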

   Matt


 This number of iterations is independent of the number of processes using
 HYPRE (at least for np < 80), but the latter is (applied to this inner system,
 not in general) slower and scales very badly. That's why I would like to use
 ML.

 Thinking about it, none of this should have anything to do with the choice
 of the direct solver for the coarse system inside ML (MUMPS or PETSc's own),
 should it? The direct solver solves the coarse system completely,
 independently of the number of processes, and shouldn't influence the
 effectiveness of ML, or am I wrong?

  I suggest
 starting with

 -mg_coarse_pc_type lu -mg_coarse_pc_factor_mat_solver_package mumps

 or varying parameters in ML to see if you can make the coarse level
 problem smaller without hurting convergence rate.  You can do
 semi-redundant solves if you scale processor counts beyond what MUMPS
 works well with.


 Thanks. So MUMPS is generally supposed to be the fastest parallel direct
 solver?

  Depending on what problem you are solving, ML could be producing a
 (nearly) singular coarse level operator in which case you can expect
 very confusing and inconsistent behavior.


 Could that also be the reason for the slower convergence when going from 1
 to 10 processors, even though the equation system remains the same?


 Thanks a lot,

 Kathrin





-- 
What most experimenters take for granted before they begin their experiments
is infinitely more interesting than any results to which their experiments
lead.
-- Norbert Wiener


[petsc-users] ML and -pc_factor_shift_nonzero

2010-04-19 Thread Jed Brown
On Mon, 19 Apr 2010 13:29:40 +0200, tribur at vision.ee.ethz.ch wrote:
  ML works now using, e.g., -mg_coarse_redundant_pc_factor_shift_type
  POSITIVE_DEFINITE. However, it converges very slowly using the default
  REDUNDANT for the coarse solve.
 
  Converges slowly or the coarse-level solve is expensive?
 
 Hm, rather converges slowly. I use ML inside a preconditioner for
 the Schur complement system, and the overall outer system, preconditioned
 with this approximate Schur complement preconditioner, converges
 slowly, if you understand what I mean.

Sure, but the redundant coarse solve is a direct solve.  It may be that
the shift (to make it nonsingular) makes it ineffective (and thus the outer
system converges slowly), but this is the same behavior you would get
with a non-redundant solve.  That is, it is the shift that causes the
problem, not the REDUNDANT.

I don't know which flavor of Schur complement iteration you are
currently using.  It is true that pure Schur complement reduction
requires high-accuracy inner solves; you may of course get away with
inexact inner solves if they are part of a full-space iteration.  It's
worth comparing the number of iterations required to solve the inner
(advection-diffusion) block to a given tolerance in parallel and serial.
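
As a sketch of the full-space alternative (assuming a two-field
velocity/pressure splitting set up through PCFieldSplit; split names and
tolerances here are only placeholders):

  # the outer Krylov method iterates in the full space, so inner solves can be inexact
  -ksp_type fgmres -pc_type fieldsplit -pc_fieldsplit_type schur
  -fieldsplit_0_ksp_type gmres -fieldsplit_0_ksp_rtol 1e-2 -fieldsplit_0_pc_type ml
  -fieldsplit_1_ksp_type gmres -fieldsplit_1_ksp_max_it 5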

 My particular problem is that the convergence rate depends strongly on
 the number of processors. With one processor, using ML to precondition
 the innermost system, the outer system converges in,
 e.g., 39 iterations. With np=10, however, it needs 69 iterations.

ML with defaults has a significant difference between serial and
parallel.  Usually the scalability is acceptable from 2 processors up,
but the difference between one and two can be quite significant.  You
can make it stronger, e.g. with

  -mg_levels_ksp_type gmres -mg_levels_ksp_max_it 1 -mg_levels_pc_type asm
  -mg_levels_sub_pc_type ilu
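
For example, combined with the coarse-level setting you already use (a
sketch, assuming -pc_type ml is selected on this solve):

  -pc_type ml
  -mg_levels_ksp_type gmres -mg_levels_ksp_max_it 1
  -mg_levels_pc_type asm -mg_levels_sub_pc_type ilu
  -mg_coarse_redundant_pc_factor_shift_type POSITIVE_DEFINITE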

 This number of iterations is independent of the number of processes
 using HYPRE (at least for np < 80), but the latter is (applied to this
 inner system, not in general) slower and scales very badly. That's why
 I would like to use ML.
 
 Thinking about it, none of this should have anything to do with the
 choice of the direct solver for the coarse system inside ML (MUMPS or
 PETSc's own), should it? The direct solver solves the coarse system
 completely, independently of the number of processes, and shouldn't
 influence the effectiveness of ML, or am I wrong?

A shift makes it solve a somewhat different system.  How different that
perturbed system is depends on the problem and the size of the shift.
MUMPS has more sophisticated ordering/pivoting schemes, so you should use
it if the coarse system demands it (you can also try different ordering
schemes for PETSc's own factorization via
-mg_coarse_redundant_pc_factor_mat_ordering_type).
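
Concretely, something along these lines (the ordering names are PETSc's
built-in ones, e.g. nd or rcm; run with -help to see the full list):

  # keep the PETSc LU but change the fill-reducing ordering of the redundant coarse factorization
  -mg_coarse_redundant_pc_factor_mat_ordering_type nd
  # or hand the coarse factorization to MUMPS, whose pivoting is more robust
  -mg_coarse_pc_type lu -mg_coarse_pc_factor_mat_solver_package mumps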

 Thanks. So MUMPS is generally supposed to be the fastest parallel
 direct solver?

Usually.

  Depending on what problem you are solving, ML could be producing a
  (nearly) singular coarse level operator in which case you can expect
  very confusing and inconsistent behavior.
 
 Could that also be the reason for the slower convergence when going
 from 1 to 10 processors, even though the equation system
 remains the same?

ML's aggregates change somewhat in parallel (I don't know how much; I
haven't investigated precisely what is different) and the smoothers are
all different.  With a normal discretization of an elliptic system, it
would seem surprising for ML to produce nearly singular coarse-level
operators, in parallel or otherwise.  But snes/tutorials/examples/ex48
exhibits pretty bad ML behavior: the coarse level isn't singular, but
the parallel aggregates with default smoothers don't converge even though
the system is SPD.  ML is informed of translations but not rigid body
modes; I haven't investigated ML's troublesome modes for this problem, so
I don't know whether they are rigid body modes or something else.
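
If you want to experiment with how ML coarsens, the PETSc interface exposes
a few knobs (names as in the ML interface; check -help under -pc_ml_ for the
exact spellings and admissible values):

  -pc_ml_maxNlevels 4 -pc_ml_maxCoarseSize 500
  -pc_ml_CoarsenScheme Uncoupled -pc_ml_Threshold 0.02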

Jed


[petsc-users] ML and -pc_factor_shift_nonzero

2010-04-19 Thread Jed Brown
On Mon, 19 Apr 2010 06:34:08 -0500, Matthew Knepley knepley at gmail.com 
wrote:
 For Schur complement methods, the inner system usually has to be
 solved very accurately.  Are you accelerating a Krylov method for
 A^{-1}, or just using ML itself? I would expect that, for the same linear
 system tolerance, you would get identical convergence for the same system,
 independent of the number of processors.

Matt, run ex48 with ML in parallel and serial: the aggregates are quite
different and the parallel case doesn't converge with SOR.  Also, from
talking with Ray, Eric Cyr, and John Shadid two weeks ago, they are
currently using ML on coupled Navier-Stokes systems and usually beating
block factorization (i.e. full-space iterations with
approximate-commutator Schur-complement preconditioners, PCD or LSC
variants, which in turn beat full Schur-complement reduction).  They are
using Q1-Q1 with PSPG or Bochev stabilization and SUPG for advection.
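
For reference, the LSC flavor of that block factorization can be sketched
with PCFieldSplit and PCLSC (split names are placeholders, and this assumes
a PETSc version where PCLSC is available):

  -ksp_type fgmres -pc_type fieldsplit -pc_fieldsplit_type schur
  -fieldsplit_0_pc_type ml
  -fieldsplit_1_ksp_type preonly -fieldsplit_1_pc_type lsc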

The trouble is that this method occasionally runs into problems where
convergence completely falls apart, despite not having extreme parameter
choices.  ML has an "energy minimization" option which they are using
(PETSc's interface doesn't currently support this, I'll add it if
someone doesn't beat me to it) and which is apparently crucial for
generating reasonable coarse levels for these systems.

They always coarsen all the degrees of freedom together, which is not
possible with mixed finite element spaces.  So you have to trade the
quality answers produced by a stable approximation (along with the need
to make subdomain and coarse-level problems compatible with inf-sup)
against the wiggle room you get with stabilized non-mixed
discretizations, which come with possible artifacts and significant
divergence error.

Jed


[petsc-users] ML and -pc_factor_shift_nonzero

2010-04-19 Thread Matthew Knepley
On Mon, Apr 19, 2010 at 7:12 AM, Jed Brown jed at 59a2.org wrote:

 On Mon, 19 Apr 2010 06:34:08 -0500, Matthew Knepley knepley at gmail.com
 wrote:
  For Schur complement methods, the inner system usually has to be
  solved very accurately.  Are you accelerating a Krylov method for
  A^{-1}, or just using ML itself? I would expect that, for the same linear
  system tolerance, you would get identical convergence for the same system,
  independent of the number of processors.

 Matt, run ex48 with ML in parallel and serial: the aggregates are quite
 different and the parallel case doesn't converge with SOR.  Also, from
 talking with Ray, Eric Cyr, and John Shadid two weeks ago, they are
 currently using ML on coupled Navier-Stokes systems and usually beating
 block factorization (i.e. full-space iterations with
 approximate-commutator Schur-complement preconditioners, PCD or LSC
 variants, which in turn beat full Schur-complement reduction).  They are
 using Q1-Q1 with PSPG or Bochev stabilization and SUPG for advection.


So, to see if I understand correctly: you are saying that you can get away
with more approximate solves if you do not do full reduction? I know the
theory for the case of Stokes, but can you prove this in a general sense?


 The trouble is that this method occasionally runs into problems where
 convergence completely falls apart, despite not having extreme parameter
 choices.  ML has an "energy minimization" option which they are using
 (PETSc's interface doesn't currently support this, I'll add it if
 someone doesn't beat me to it) and which is apparently crucial for
 generating reasonable coarse levels for these systems.


This sounds like the black magic I expect :)


 They always coarsen all the degrees of freedom together, which is not
 possible with mixed finite element spaces.  So you have to trade the
 quality answers produced by a stable approximation (along with the need
 to make subdomain and coarse-level problems compatible with inf-sup)
 against the wiggle room you get with stabilized non-mixed
 discretizations, which come with possible artifacts and significant
 divergence error.


I still maintain that aggregation is a really crappy way to generate coarse
systems, especially for mixed elements. We should be generating coarse
systems geometrically, and then using a nice (maybe Black-Box) framework for
calculating good projectors.

   Matt



 Jed




-- 
What most experimenters take for granted before they begin their experiments
is infinitely more interesting than any results to which their experiments
lead.
-- Norbert Wiener


[petsc-users] ML and -pc_factor_shift_nonzero

2010-04-19 Thread Jed Brown
On Mon, 19 Apr 2010 07:23:01 -0500, Matthew Knepley knepley at gmail.com 
wrote:
 So, to see if I understand correctly: you are saying that you can get
 away with more approximate solves if you do not do full reduction? I
 know the theory for the case of Stokes, but can you prove this in a
 general sense?

The theory is relatively general (as much as preconditioned GMRES is) if
you iterate in the full space with either block-diagonal or
block-triangular preconditioners.  Note that this formulation *never*
involves explicit application of a Schur complement.  Sometimes I get
better convergence with one subcycle on the Schur complement with a very
approximate inner solve (FGMRES outer).  I'm not sure if Dave sees this;
he seems to like doing a couple of subcycles in multigrid smoothers.
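
In PETSc terms, the block-diagonal and block-triangular variants correspond
roughly to the additive and multiplicative fieldsplit types (a sketch with
placeholder split setup, not a recipe for any particular problem):

  # block-diagonal preconditioner applied in the full space
  -ksp_type fgmres -pc_type fieldsplit -pc_fieldsplit_type additive
  # block-triangular preconditioner applied in the full space
  -ksp_type fgmres -pc_type fieldsplit -pc_fieldsplit_type multiplicative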

The folks doing Q1-Q1 with ML are not doing *anything* with a Schur
complement (approximate or otherwise).  They just coarsen on the full
indefinite system and use ASM (overlap 0 or 1) with ILU to precondition
the coupled system.  This makes a certain amount of sense because for
those stabilized formulations, this is similar in spirit to a Vanka
smoother (block SOR is a more precise analogue).
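
That coupled setup might look roughly like the following (a sketch, not
their exact configuration):

  # ML coarsens the full indefinite system; ASM(overlap 1)/ILU smoothing on each level
  -ksp_type fgmres -pc_type ml
  -mg_levels_pc_type asm -mg_levels_pc_asm_overlap 1 -mg_levels_sub_pc_type ilu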

 This sounds like the black magic I expect :)

Yeah, this involves some sort of very local solve to produce the
aggregates and interpolations that are not transposes of each other (if
I understood Ray and Eric correctly).

 I still maintain that aggregation is a really crappy way to generate
 coarse systems, especially for mixed elements. We should be generating
 coarse systems geometrically, and then using a nice (maybe Black-Box)
 framework for calculating good projectors.

This whole framework doesn't work for mixed discretizations.

Jed


[petsc-users] ML and -pc_factor_shift_nonzero

2010-04-16 Thread tri...@vision.ee.ethz.ch
Dear Barry and Matt,

thanks for your helpful response.
ML works now using, e.g., -mg_coarse_redundant_pc_factor_shift_type  
POSITIVE_DEFINITE. However, it converges very slowly using the default  
REDUNDANT for the coarse solve. On 10 processors, e.g., even bjacobi  
plus -mg_coarse_ksp_max_it 10 works better. What solver do you  
recommend for the coarse solve? Maybe superlu?
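
The alternatives I have in mind would be along these lines (a sketch;
superlu_dist is the parallel SuperLU package):

  # block Jacobi with a few coarse iterations instead of a redundant direct solve
  -mg_coarse_pc_type bjacobi -mg_coarse_ksp_type gmres -mg_coarse_ksp_max_it 10
  # or a parallel direct solve through SuperLU_DIST
  -mg_coarse_pc_type lu -mg_coarse_pc_factor_mat_solver_package superlu_dist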

Best regards,

Kathrin


Quoting Barry Smith bsmith at mcs.anl.gov:


 -mg_coarse_pc_factor_shift_nonzero since it is the coarse level of   
 the multigrid that is producing the zero pivot.

 Barry

 On Apr 13, 2010, at 8:51 AM, Matthew Knepley wrote:

 On Tue, Apr 13, 2010 at 2:49 PM, tribur at vision.ee.ethz.ch wrote:
 Hi,

 using ML I got the error

 [0]PETSC ERROR: Detected zero pivot in LU factorization

 As recommended at
 http://www.mcs.anl.gov/petsc/petsc-as/documentation/troubleshooting.html,
 I tried -pc_factor_shift_nonzero but it doesn't have the desired
 effect using ML.

 How do I have to formulate the command line option? What does
 -[level]_pc_factor_shift_nonzero mean? What other parallel
 preconditioner could I try besides Hypre/Boomeramg or ML?

 This means the MG level, like 2. You can see all available options   
 using -help.

 Matt

 Thanks in advance for your precious help,

 Kathrin





 -- 
 What most experimenters take for granted before they begin their   
 experiments is infinitely more interesting than any results to  
 which  their experiments lead.
 -- Norbert Wiener







[petsc-users] ML and -pc_factor_shift_nonzero

2010-04-16 Thread Jed Brown
On Fri, 16 Apr 2010 13:51:13 +0200, tribur at vision.ee.ethz.ch wrote:
 ML works now using, e.g., -mg_coarse_redundant_pc_factor_shift_type  
 POSITIVE_DEFINITE. However, it converges very slowly using the default  
 REDUNDANT for the coarse solve.

Converges slowly or the coarse-level solve is expensive?  I suggest
starting with

  -mg_coarse_pc_type lu -mg_coarse_pc_factor_mat_solver_package mumps

or varying parameters in ML to see if you can make the coarse level
problem smaller without hurting convergence rate.  You can do
semi-redundant solves if you scale processor counts beyond what MUMPS
works well with.
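
A semi-redundant coarse solve would look something like this (the count of
sub-communicators is just an example):

  # factor the coarse problem on 4 sub-communicators instead of on every process
  -mg_coarse_pc_type redundant -mg_coarse_pc_redundant_number 4
  -mg_coarse_redundant_pc_type lu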

Depending on what problem you are solving, ML could be producing a
(nearly) singular coarse level operator in which case you can expect
very confusing and inconsistent behavior.

Jed


[petsc-users] ML and -pc_factor_shift_nonzero

2010-04-13 Thread tri...@vision.ee.ethz.ch
Hi,

using ML I got the error

[0]PETSC ERROR: Detected zero pivot in LU factorization

As recommended at  
http://www.mcs.anl.gov/petsc/petsc-as/documentation/troubleshooting.html, I  
tried -pc_factor_shift_nonzero but it doesn't have the desired effect  
using ML.

How do I have to formulate the command line option? What does  
-[level]_pc_factor_shift_nonzero mean? What other parallel  
preconditioner could I try besides Hypre/Boomeramg or ML?

Thanks in advance for your precious help,

Kathrin




[petsc-users] ML and -pc_factor_shift_nonzero

2010-04-13 Thread Matthew Knepley
On Tue, Apr 13, 2010 at 2:49 PM, tribur at vision.ee.ethz.ch wrote:

 Hi,

 using ML I got the error

 [0]PETSC ERROR: Detected zero pivot in LU factorization

 As recommended at
 http://www.mcs.anl.gov/petsc/petsc-as/documentation/troubleshooting.html,
 I tried -pc_factor_shift_nonzero but it doesn't have the desired effect
 using ML.

 How do I have to formulate the command line option? What does
 -[level]_pc_factor_shift_nonzero mean? What other parallel preconditioner
 could I try besides Hypre/Boomeramg or ML?


This means the MG level, like 2. You can see all available options using
-help.
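
In this thread the relevant prefixed forms are the coarse-level ones (the
redundant_ piece appears when the coarse PC is REDUNDANT wrapping an LU):

  -mg_coarse_pc_factor_shift_nonzero
  -mg_coarse_redundant_pc_factor_shift_nonzero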

  Matt


 Thanks in advance for your precious help,

 Kathrin





-- 
What most experimenters take for granted before they begin their experiments
is infinitely more interesting than any results to which their experiments
lead.
-- Norbert Wiener


[petsc-users] ML and -pc_factor_shift_nonzero

2010-04-13 Thread Barry Smith

   -mg_coarse_pc_factor_shift_nonzero since it is the coarse level of  
the multigrid that is producing the zero pivot.
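
That is, together with the ML preconditioner already selected on this solve,
something like:

  -pc_type ml -mg_coarse_pc_factor_shift_nonzero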

Barry

On Apr 13, 2010, at 8:51 AM, Matthew Knepley wrote:

 On Tue, Apr 13, 2010 at 2:49 PM, tribur at vision.ee.ethz.ch wrote:
 Hi,

 using ML I got the error

 [0]PETSC ERROR: Detected zero pivot in LU factorization

 As recommended at
 http://www.mcs.anl.gov/petsc/petsc-as/documentation/troubleshooting.html,
 I tried -pc_factor_shift_nonzero but it doesn't have the desired
 effect using ML.

 How do I have to formulate the command line option? What does
 -[level]_pc_factor_shift_nonzero mean? What other parallel
 preconditioner could I try besides Hypre/Boomeramg or ML?

 This means the MG level, like 2. You can see all available options  
 using -help.

   Matt

 Thanks in advance for your precious help,

 Kathrin





 -- 
 What most experimenters take for granted before they begin their  
 experiments is infinitely more interesting than any results to which  
 their experiments lead.
 -- Norbert Wiener
