Thanks for your advice. Using the different boomeramg options helps to trade off between speed and memory.
Right now, I've distributed the matrix across 2 processors. However, the parallel solve takes longer and needs more iterations than the serial one. I enabled -ksp_monitor and the two runs seem to converge at different rates, even though they use the same options. Is there a reason for this? The matrices formed by the serial and parallel runs are the same.

Jerome

On Thu, Jan 15, 2009 at 5:29 PM, Jed Brown <jed at 59a2.org> wrote:
> On Wed, Jan 14, 2009 at 20:04, jerome ho <jerome.snho at gmail.com> wrote:
>> boomeramg+minres: 388MB in 1min (8 iterations)
>> icc+cg: 165MB in 30min (>5000 iterations)
>> bjacobi+cg: 201MB in 50min (>5000 iterations)
>
> Note that in serial, bjacobi is just whatever -sub_pc_type is (ilu by
> default). In parallel, it's always worth trying -pc_type asm as an
> alternative to bjacobi. You can frequently make the incomplete
> factorization stronger by using multiple levels (-pc_factor_levels N),
> but it will use more memory. It looks like multigrid works well for
> your problem so it will likely be very hard for a traditional method
> to compete.
>
> To reduce memory usage in BoomerAMG, try these options
>
> -pc_hypre_boomeramg_truncfactor <0>: Truncation factor for
> interpolation (0=no truncation) (None)
> -pc_hypre_boomeramg_P_max <0>: Max elements per row for
> interpolation operator ( 0=unlimited ) (None)
> -pc_hypre_boomeramg_agg_nl <0>: Number of levels of aggressive
> coarsening (None)
> -pc_hypre_boomeramg_agg_num_paths <1>: Number of paths for
> aggressive coarsening (None)
> -pc_hypre_boomeramg_strong_threshold <0.25>: Threshold for being
> strongly connected (None)
>
> For 3D problems, the manual suggests setting strong_threshold to 0.5.
>
> It's also worth trying ML, especially for vector problems.
>
> Jed
>
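
One possible contributor to the serial/parallel difference, sketched here as an illustration and not taken from the thread: with bjacobi, the preconditioner uses one block per process by default, so the same matrix gets a different preconditioner on 1 and on 2 processes. The toy below is plain Python, not PETSc. It assumes a 1D Laplacian with stencil (-1, 2, -1) and exact block solves standing in for the ILU sub-solver; the names laplacian_matvec, solve_laplacian_block and pcg_block_jacobi are hypothetical helpers introduced only for this sketch.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def laplacian_matvec(x):
    """Return A x for the 1D Laplacian with stencil (-1, 2, -1)."""
    n = len(x)
    y = []
    for i in range(n):
        left = x[i - 1] if i > 0 else 0.0
        right = x[i + 1] if i < n - 1 else 0.0
        y.append(2.0 * x[i] - left - right)
    return y

def solve_laplacian_block(r):
    """Solve one diagonal block exactly (Thomas algorithm).

    Every diagonal block of this matrix is itself a (-1, 2, -1)
    tridiagonal matrix, just smaller.
    """
    n = len(r)
    c = [0.0] * n  # modified superdiagonal
    d = [0.0] * n  # modified right-hand side
    c[0] = -0.5
    d[0] = r[0] / 2.0
    for i in range(1, n):
        denom = 2.0 + c[i - 1]
        c[i] = -1.0 / denom
        d[i] = (r[i] + d[i - 1]) / denom
    z = [0.0] * n
    z[-1] = d[-1]
    for i in range(n - 2, -1, -1):
        z[i] = d[i] - c[i] * z[i + 1]
    return z

def pcg_block_jacobi(b, nblocks, tol=1e-8, maxit=200):
    """Preconditioned CG with nblocks contiguous Jacobi blocks.

    Returns (solution, iteration count). nblocks plays the role of
    the number of processes in this toy model.
    """
    n = len(b)
    bounds = [n * k // nblocks for k in range(nblocks + 1)]

    def apply_pc(r):
        # Each block is solved on its own; coupling between blocks is ignored.
        z = []
        for s, e in zip(bounds[:-1], bounds[1:]):
            z.extend(solve_laplacian_block(r[s:e]))
        return z

    x = [0.0] * n
    r = list(b)
    z = apply_pc(r)
    p = list(z)
    rz = dot(r, z)
    stop = tol * dot(b, b) ** 0.5
    for k in range(1, maxit + 1):
        Ap = laplacian_matvec(p)
        alpha = rz / dot(p, Ap)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * api for ri, api in zip(r, Ap)]
        if dot(r, r) ** 0.5 <= stop:
            return x, k
        z = apply_pc(r)
        rz_new = dot(r, z)
        beta = rz_new / rz
        p = [zi + beta * pi for zi, pi in zip(z, p)]
        rz = rz_new
    return x, maxit

b = [1.0] * 40
x_one, iters_one = pcg_block_jacobi(b, nblocks=1)  # "serial": one block
x_two, iters_two = pcg_block_jacobi(b, nblocks=2)  # "2 processes": two blocks
print("1 block :", iters_one, "iterations")
print("2 blocks:", iters_two, "iterations")
```

Both runs solve the same system to the same tolerance, but the two-block run discards the coupling across the partition boundary and needs more iterations. Whether this is what happens in the runs described above depends on the preconditioner actually in use; comparing the -ksp_view output of the serial and parallel runs should show how the preconditioner is set up in each case.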
