It looks like you are using the default preconditioner: block Jacobi with 
one block per process and ILU on each block.  

  "Normally" with block Jacobi as you use more blocks the convergence rate gets 
worse  in a reasonably monotonic way. But if the particular decomposition of 
the blocks "drops" important parts of the operator the convergence can be very 
different with slightly changes in the blocks. It looks like your problem has 
this kind of structure, in spades; or there is something wrong with your 
parallel construction of the matrix entries resulting in very different (and 
wrong) linear systems for different number of processors.
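  One way to rule out the second possibility is to compare a simple "fingerprint" 
of the assembled system across process counts, for example the Frobenius norm of 
the matrix and the 2-norm of the right-hand side; if those differ between the 
12- and 18-process runs you are not solving the same system. A minimal sketch in 
C (the function name is something I made up; call it right after your assembly):

#include <petscmat.h>

/* Hedged sketch: print norms of the assembled system so that runs with
   different process counts can be compared.  If the norms differ, the
   parallel assembly is producing different linear systems.             */
static PetscErrorCode CheckSystemFingerprint(Mat A, Vec b)
{
  PetscErrorCode ierr;
  PetscReal      anorm, bnorm;

  PetscFunctionBeginUser;
  ierr = MatNorm(A, NORM_FROBENIUS, &anorm);CHKERRQ(ierr);
  ierr = VecNorm(b, NORM_2, &bnorm);CHKERRQ(ierr);
  ierr = PetscPrintf(PETSC_COMM_WORLD, "||A||_F = %g  ||b||_2 = %g\n",
                     (double)anorm, (double)bnorm);CHKERRQ(ierr);
  PetscFunctionReturn(0);
}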

  I suggest you run the following experiment: run with ONE process but use 
-pc_type bjacobi -sub_pc_type ilu -pc_bjacobi_blocks <blocks>, varying <blocks> 
from 1 up to 24, and record the number of iterations needed for each 
(don't worry about the time it takes; this is done only to understand the 
convergence). A sketch of one way to script this loop follows below. Send the 
table of 

Blocks      Iterations
1                 a1
2                 a2
....
24               a24 

and from this you'll be able to see whether your matrix really does have this 
special "sensitivity" to the blocks. Until then, no speculation.

  Barry



> On Jul 1, 2015, at 4:29 PM, Jose A. Abell M. <[email protected]> wrote:
> 
> Dear PETSc-users,
> 
> I'm running the same dummy simulation (which involves solving a 10000 x 10000 
> linear system of equations 10 times) using 12 and 18 processors on an SMP 
> machine. With 18 processors I spend 3.5 s on PETSc calls; with 12 I spend 
> ~260 s. 
> 
> Again, the matrix is the same; the only difference is the number of 
> processors, which affects the ordering of the matrix rows and columns as 
> the domain gets partitioned differently.
> 
> When looking at the performance log I see:
> 
> For 12 processors:
> 
> ------------------------------------------------------------------------------------------------------------------------
> Event                Count      Time (sec)     Flops                             --- Global ---  --- Stage ---   Total
>                    Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
> ------------------------------------------------------------------------------------------------------------------------
> MatSolve          103340 1.0 8.6910e+01 1.2 7.54e+10 1.0 0.0e+00 0.0e+00 0.0e+00 31 34  0  0  0  31 34  0  0  0 10113
> 
> and for 18 processors:
> 
> ------------------------------------------------------------------------------------------------------------------------
> Event                Count      Time (sec)     Flops                             --- Global ---  --- Stage ---   Total
>                    Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
> ------------------------------------------------------------------------------------------------------------------------
> MatSolve             108 1.0 6.9855e-02 1.4 5.25e+07 1.1 0.0e+00 0.0e+00 0.0e+00  2 32  0  0  0   2 32  0  0  0 13136
> 
> 
> 
> The MatSolve count is so large in the slow case, and it is similar for other 
> operations like MatMult and all the vector-oriented operations. I've included 
> the complete logs for both cases.
> 
> What is the main driver behind the number of calls to these functions being 
> so high? Is only the matrix ordering to blame, or is there something else 
> I'm missing?
> 
> Regards and thanks!
> 
> 
> --
> 
> José Abell 
> PhD Candidate
> Computational Geomechanics Group
> Dept. of Civil and Environmental Engineering
> UC Davis
> 
> <petsc_log_slow.txt><petsc_log_fast.txt>
