It looks like you are using the default preconditioner: block Jacobi with one block per process and ILU on each block.
"Normally" with block Jacobi as you use more blocks the convergence rate gets worse in a reasonably monotonic way. But if the particular decomposition of the blocks "drops" important parts of the operator the convergence can be very different with slightly changes in the blocks. It looks like your problem has this kind of structure, in spades; or there is something wrong with your parallel construction of the matrix entries resulting in very different (and wrong) linear systems for different number of processors. I suggest you run the following experiment; run with ONE process but use -pc_type bjacobi -sub_pc_type ilu -pc_bjacobi_blocks <blocks> where you use for <blocks> 1 up to 24 and then get the number of iterations needed for each (don't worry about the time it takes, this is done for understanding of the convergence). Send the table of Blocks Iterations 1 a1 2 a2 .... 24 a24 and from this you'll be able to see if your matrix does indeed have the special "sensitivity" to the blocks. Till then no speculation. Barry > On Jul 1, 2015, at 4:29 PM, Jose A. Abell M. <[email protected]> wrote: > > Dear PETSc-users, > > I'm running the same dummy simulation (which involves solving a 10000 x 10000 > linear system of equations 10 times) using 12 and 18 processors on a SMP > machine. With 18 processors I spend 3.5s on PETsc calls, with 12 I spend > ~260s. > > Again, the matrix is the same, the only difference is the number of > processors, which would affect the ordering of the matrix rows and columns as > the domain gets partitioned differently. > > When looking at the performance log I see: > > For 12 processors: > > ------------------------------------------------------------------------------------------------------------------------ > Event Count Time (sec) Flops > --- Global --- --- Stage --- Total > Max Ratio Max Ratio Max Ratio Mess Avg len > Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s > ------------------------------------------------------------------------------------------------------------------------ > MatSolve 103340 1.0 8.6910e+01 1.2 7.54e+10 1.0 0.0e+00 0.0e+00 > 0.0e+00 31 34 0 0 0 31 34 0 0 0 10113 > > and for 18 processors: > > ------------------------------------------------------------------------------------------------------------------------ > Event Count Time (sec) Flops > --- Global --- --- Stage --- Total > Max Ratio Max Ratio Max Ratio Mess Avg len > Reduct %T %F %M %L %R %T %F %M %L %R Mflop/s > ------------------------------------------------------------------------------------------------------------------------ > MatSolve 108 1.0 6.9855e-02 1.4 5.25e+07 1.1 0.0e+00 0.0e+00 > 0.0e+00 2 32 0 0 0 2 32 0 0 0 13136 > > > > The MatSolve count is soo large in the slow case. It is similar for other > operations like MatMult and all the vector-oriented operations. I've included > the complete logs for these cases. > > What is the main driver behind the number of calls to these functions being > so high? Is it only the matrix ordering to blame or maybe there is something > else I'm missing? > > Regards and thanks! > > > -- > > José Abell > PhD Candidate > Computational Geomechanics Group > Dept. of Civil and Environmental Engineering > UC Davis > > <petsc_log_slow.txt><petsc_log_fast.txt>
