Hi all,

   I am trying to minimize the time needed to solve a linear system with a large 
sparse matrix. The problem size is m=321, n=321 and p=321. I am attacking the 
solve time from two directions: (1) finding a preconditioner that reduces the 
number of iterations, and therefore the time, and (2) requesting more cores.
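To put the problem size in perspective, here is a small calculation. It assumes (the post does not say so) that m, n and p are the dimensions of a 3D structured grid with one unknown per grid point. Under that assumption the system has 321^3 = 33,076,161 rows, or roughly 258,000 rows per process at 128 cores.

```python
# Assumption: m, n, p are the dimensions of a 3D structured grid with one
# unknown per grid point, so the matrix has m*n*p rows.
m = n = p = 321
unknowns = m * n * p


def rows_per_core(cores: int) -> float:
    """Average number of matrix rows owned by each MPI process."""
    return unknowns / cores


for cores in (1, 32, 128, 256):
    print(f"{cores:4d} cores: {rows_per_core(cores):14,.0f} rows per core")
```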

----For the first approach, I tried several solver configurations:
 1 default KSP and PC,
 2 -ksp_type fgmres -ksp_gmres_restart 30 -pc_type ksp -ksp_pc_type jacobi,
 3 -ksp_type lgmres  -ksp_gmres_restart 40 -ksp_lgmres_augment 10,
 4 -ksp_type lgmres  -ksp_gmres_restart 50 -ksp_lgmres_augment 10,
 5 -ksp_type lgmres -ksp_gmres_restart 40 -ksp_lgmres_augment 10 -pc_type asm 
(PCASM)
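To compare the five configurations under identical conditions, a small driver can generate one command line per case. This is a sketch: the executable name ./solver and the core count are hypothetical placeholders, and -ksp_converged_reason and -log_view are standard PETSc options added here (older PETSc releases use -log_summary instead of -log_view). The -log_view summary breaks the run time down by event, for example MatMult and PCApply, which would show where the extra time in case 2 is spent.

```python
import shlex

# Hypothetical executable name and core count; replace with the real ones.
EXECUTABLE = "./solver"
NCORES = 128

# The five solver configurations listed above (case 1 uses the defaults).
CASES = {
    1: "",
    2: "-ksp_type fgmres -ksp_gmres_restart 30 -pc_type ksp -ksp_pc_type jacobi",
    3: "-ksp_type lgmres -ksp_gmres_restart 40 -ksp_lgmres_augment 10",
    4: "-ksp_type lgmres -ksp_gmres_restart 50 -ksp_lgmres_augment 10",
    5: "-ksp_type lgmres -ksp_gmres_restart 40 -ksp_lgmres_augment 10 -pc_type asm",
}


def command(case: int) -> list[str]:
    """Build the mpiexec command line for one case, with profiling enabled."""
    return (
        ["mpiexec", "-n", str(NCORES), EXECUTABLE]
        + shlex.split(CASES[case])
        + ["-ksp_converged_reason", "-log_view"]
    )


if __name__ == "__main__":
    for case in CASES:
        # Dry run: only print the commands.
        print(shlex.join(command(case)))
```

Running it prints the five commands; passing each list to subprocess.run would execute them.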

The iteration counts and timings with 128 cores are:
case  iterations  time (s)
1           1436    816
2              3  12658
3           1069    669.64
4            872    768.12
5            927    513.14

Increasing -ksp_gmres_restart from 40 to 50 with the same -ksp_lgmres_augment 
(case 3 vs. case 4) reduces the number of iterations but increases the time. 
Second, PCASM helps a lot (case 5 vs. case 3). Although option 2 reduces the 
number of iterations drastically, the time increases enormously. Is that 
because many more operations are needed in the PC?
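Dividing each time by its iteration count gives the average cost of one outer iteration, which separates "fewer iterations" from "cheaper iterations". A short sketch using only the numbers in the 128-core table: case 4 costs about 0.88 s per iteration against about 0.63 s for case 3, which outweighs its smaller iteration count, and each outer iteration of case 2 costs over 4000 s. That is consistent with PCKSP running an entire inner Krylov solve every time the preconditioner is applied.

```python
# (case, iterations, total solve time in seconds) from the 128-core table above
RESULTS = [
    (1, 1436, 816.0),
    (2, 3, 12658.0),
    (3, 1069, 669.64),
    (4, 872, 768.12),
    (5, 927, 513.14),
]


def seconds_per_iteration(iterations: int, seconds: float) -> float:
    """Average wall-clock cost of one outer Krylov iteration."""
    return seconds / iterations


for case, iters, secs in RESULTS:
    print(f"case {case}: {seconds_per_iteration(iters, secs):10.3f} s/iteration")
```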

My first question: how should I choose -ksp_gmres_restart and 
-ksp_lgmres_augment? For example, is a larger restart with a larger augment 
better, or a larger restart with a smaller augment?
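One way to reason about the restart length is a rough operation count for standard restarted GMRES(m) on N unknowns with Gram-Schmidt orthogonalization. Iteration k of a cycle orthogonalizes the new vector against k basis vectors, at one dot product (2N flops) plus one vector update (2N flops) each. A sketch that ignores the matrix-vector product and the preconditioner:

```latex
% Standard restarted GMRES(m), N unknowns, Gram-Schmidt orthogonalization.
W_{\mathrm{orth}}(m) \approx \sum_{k=1}^{m} 4kN = 2m(m+1)N,
\qquad
\frac{W_{\mathrm{orth}}(m)}{m} \approx 2(m+1)N \ \text{flops per iteration},
\qquad
\text{storage} \approx (m+1)N .
```

With m=40 this is about 82N flops per iteration and with m=50 about 102N, roughly 24% more. In parallel every inner product also requires a global reduction. A larger restart therefore pays off only if it removes enough iterations. LGMRES augmentation vectors enlarge the space further and add to this cost; how PETSc's LGMRES counts them against the restart length is not worked out here.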

----For the second approach, I ran -ksp_type lgmres -ksp_gmres_restart 40 
-ksp_lgmres_augment 10 -pc_type asm on different numbers of cores. The speedup 
grows only slowly once more than 32 to 64 cores are used. From searching the 
mailing list archives, I suspect I am running into the memory bandwidth 
bottleneck described in
http://www.mail-archive.com/[email protected]/msg19152.html
The timings are:

cores  iterations  time (s)
    1         923  19541.83
    4         929   5897.06
    8         932   4854.72
   16         924   1494.33
   32         924   1480.88
   64         928    686.89
  128         927    627.33
  256         926    552.93
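The scaling table can be summarized by the speedup S_p = T_1/T_p and the parallel efficiency E_p = S_p/p. The short sketch below computes both from the numbers above. The iteration count stays between 923 and 932, so the flattening comes from the cost per iteration rather than from extra iterations. By this measure the speedup is about 31 at 128 cores and 35 at 256 cores, which is a parallel efficiency of roughly 24% and 14%.

```python
# (cores, iterations, total solve time in seconds) from the scaling table above
SCALING = [
    (1, 923, 19541.83),
    (4, 929, 5897.06),
    (8, 932, 4854.72),
    (16, 924, 1494.33),
    (32, 924, 1480.88),
    (64, 928, 686.89),
    (128, 927, 627.33),
    (256, 926, 552.93),
]


def speedup(t1: float, tp: float) -> float:
    """Speedup relative to the single-core run."""
    return t1 / tp


def efficiency(t1: float, tp: float, cores: int) -> float:
    """Parallel efficiency: speedup divided by the number of cores."""
    return speedup(t1, tp) / cores


t1 = SCALING[0][2]
for cores, iters, tp in SCALING:
    print(f"{cores:4d} cores: {iters} its, speedup {speedup(t1, tp):6.2f}, "
          f"efficiency {efficiency(t1, tp, cores):6.1%}")
```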

My second question: is there another PC that can both reduce the number of 
iterations and improve scalability? Thanks.
