Hi PETSc users,

when solving an implicit equation with KSPSolve() in 3D (communication via a 7-point stencil) on a 64 x 64 x 64 domain, I observed the following:

- Partitioning across n CPUs in the last direction only (each CPU owns a 64 x 64 x 64/n subdomain) gives a parallel efficiency of roughly 90%, which is fine for us.
- Partitioning across n CPUs in more than one direction (each CPU owns e.g. a 64 x 64/sqrt(n) x 64/sqrt(n) subdomain) gives a parallel efficiency of roughly 10%, which is absolutely unusable.
Is this behavior generally true for this kind of solver? If so, why? If not, what did I most likely do wrong? Has anyone had the same experience and/or could help me? Thanks in advance, Rolf
