On Wed, Jan 21, 2009 at 3:14 PM, Jed Brown <jed at 59a2.org> wrote:
> Most preconditioners are not the same in parallel, including these
> implementations of AMG. At a minimum, the smoother is using a block
> Jacobi version of SOR or ILU. As you add processes beyond 2, the
> increase in iteration count is usually very minor.
>
> If you are using multiple cores, the per-core floating point
> performance will also be worse due to the memory bandwidth bottleneck.
> That may contribute to the poor parallel performance you are seeing.
>
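
To make the quoted point about block Jacobi concrete, here is a small toy sketch in plain Python. It is not PETSc code and not the setup discussed in this thread: it assumes a 16 x 16 1D Laplacian, a right-hand side of ones, and a preconditioned Richardson iteration with exact solves on each diagonal block. All function names are made up for illustration. With one block the preconditioner is an exact solve. With several blocks the coupling between blocks is dropped, which is roughly what happens when a serial smoother is applied per process.

```python
import math

def laplacian_apply(x):
    """y = A x for A = tridiag(-1, 2, -1) (toy 1D Laplacian, an assumption)."""
    n = len(x)
    y = []
    for i in range(n):
        left = x[i - 1] if i > 0 else 0.0
        right = x[i + 1] if i < n - 1 else 0.0
        y.append(2.0 * x[i] - left - right)
    return y

def laplacian_solve(rhs):
    """Solve tridiag(-1, 2, -1) x = rhs with the Thomas algorithm."""
    m = len(rhs)
    c = [0.0] * m
    d = [0.0] * m
    c[0] = -0.5
    d[0] = rhs[0] / 2.0
    for i in range(1, m):
        denom = 2.0 + c[i - 1]          # diag - sub * c[i-1], with sub = -1
        c[i] = -1.0 / denom
        d[i] = (rhs[i] + d[i - 1]) / denom
    x = [0.0] * m
    x[-1] = d[-1]
    for i in range(m - 2, -1, -1):
        x[i] = d[i] - c[i] * x[i + 1]
    return x

def block_jacobi_apply(r, nblocks):
    """Apply M^{-1}: solve each diagonal block exactly, ignore block coupling."""
    m = len(r) // nblocks               # assumes len(r) is divisible by nblocks
    z = []
    for b in range(nblocks):
        z.extend(laplacian_solve(r[b * m:(b + 1) * m]))
    return z

def norm(v):
    return math.sqrt(sum(t * t for t in v))

def richardson_iterations(n, nblocks, rtol=1e-8, max_it=10000):
    """Count iterations of x <- x + M^{-1}(b - A x) until ||r|| <= rtol * ||b||."""
    b = [1.0] * n
    x = [0.0] * n
    r = b[:]
    norm0 = norm(b)
    for it in range(1, max_it + 1):
        z = block_jacobi_apply(r, nblocks)
        x = [xi + zi for xi, zi in zip(x, z)]
        ax = laplacian_apply(x)
        r = [bi - ai for bi, ai in zip(b, ax)]
        if norm(r) <= rtol * norm0:
            return it
    return max_it

for nblocks in (1, 2, 4):
    print(nblocks, "block(s):", richardson_iterations(16, nblocks), "iterations")
```

This only shows that splitting the preconditioner into blocks changes the iteration count. It uses a much weaker outer iteration than a Krylov method, so the numbers it prints say nothing about what to expect from AMG on a real problem.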
Hi,

I'm getting strange results. In parallel (on 2 processors), the result
doesn't seem to be able to converge further but appears to fluctuate
between 1e-9 and 1e-8 (after 100+ iterations), whereas it solves in 8
iterations on a single machine. I decreased the rtol (from 1e-7) for
the parallel simulation because I was getting a 20% difference in the
result. When I split across more (6) processors, it reports divergence.

Am I doing something wrong here? Should I be switching to the DMMG
method instead? The matrix size is about 1 million x 1 million.

Jerome
