On Tue, Jan 20, 2009 at 21:52, jerome ho <jerome.snho at gmail.com> wrote:
> Right now, I've distribute the matrix to 2 processors. However, when
> solving, the parallel version takes a longer time
> with more iteration counts.
> I enabled -ksp_monitor and they seems to converge at a different rate,
> although using the same options.
> Is there a reason for this?

Most preconditioners are not the same algorithm in parallel as in serial, including these implementations of AMG. At a minimum, the smoother uses a block Jacobi version of SOR or ILU, so the coupling between processes is dropped from the smoother. That is why the iteration count changes even with identical options. As you add processes beyond 2, the further increase in iteration count is usually very minor.

If you are using multiple cores that share memory bandwidth, the per-core floating point performance will also be worse due to the memory bandwidth bottleneck. That may contribute to the poor parallel performance you are seeing.

Jed
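
A minimal sketch of the block Jacobi effect described above, in plain NumPy rather than PETSc. Everything in it is an assumption made for illustration: the 2D Laplacian test matrix, the helper names (laplacian_2d, block_jacobi, pcg_iterations), and the use of exact solves on each block in place of SOR or ILU. It only shows that splitting the same matrix into more diagonal blocks changes the preconditioner and hence the iteration count.

```python
import numpy as np


def laplacian_2d(n):
    """Dense 5-point Laplacian on an n-by-n grid (symmetric positive definite)."""
    T = 2.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
    I = np.eye(n)
    return np.kron(I, T) + np.kron(T, I)


def block_jacobi(A, nblocks):
    """Return a function applying M^{-1}, where M keeps only the diagonal
    blocks of A. Each block stands in for the rows owned by one process.
    Blocks are solved exactly here (an assumption; not SOR or ILU)."""
    bounds = np.linspace(0, A.shape[0], nblocks + 1).astype(int)
    ranges = list(zip(bounds[:-1], bounds[1:]))
    inverses = [np.linalg.inv(A[lo:hi, lo:hi]) for lo, hi in ranges]

    def apply(r):
        z = np.empty_like(r)
        for (lo, hi), Binv in zip(ranges, inverses):
            z[lo:hi] = Binv @ r[lo:hi]
        return z

    return apply


def pcg_iterations(A, b, apply_pc, rtol=1e-8, maxit=500):
    """Preconditioned conjugate gradients; returns the iteration count
    needed to reduce the residual norm by rtol relative to ||b||."""
    x = np.zeros_like(b)
    r = b.copy()
    z = apply_pc(r)
    p = z.copy()
    rz = r @ z
    bnorm = np.linalg.norm(b)
    for it in range(1, maxit + 1):
        Ap = A @ p
        alpha = rz / (p @ Ap)
        x += alpha * p
        r -= alpha * Ap
        if np.linalg.norm(r) <= rtol * bnorm:
            return it
        z = apply_pc(r)
        rz_new = r @ z
        p = z + (rz_new / rz) * p
        rz = rz_new
    return maxit


A = laplacian_2d(16)
b = np.random.default_rng(0).standard_normal(A.shape[0])

# Same matrix, same solver, same tolerance; only the number of blocks changes.
counts = {nb: pcg_iterations(A, b, block_jacobi(A, nb)) for nb in (1, 2, 4)}
print(counts)
```

With one block the preconditioner is the exact inverse, so the solve finishes almost immediately; with two or more blocks the dropped coupling costs extra iterations. In PETSc the per-process blocks use SOR or ILU rather than exact solves, so the actual counts will differ from this toy.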
