Hi all,

I am trying to minimize the computing time needed to solve a large sparse linear system. The problem dimensions are m=321, n=321, and p=321. I am trying to reduce the computing time in two ways: (1) finding a preconditioner that reduces the number of iterations, and (2) requesting more cores.
---- For the first method, I tried several option sets:

  1. default KSP and PC
  2. -ksp_type fgmres -ksp_gmres_restart 30 -pc_type ksp -ksp_pc_type jacobi
  3. -ksp_type lgmres -ksp_gmres_restart 40 -ksp_lgmres_augment 10
  4. -ksp_type lgmres -ksp_gmres_restart 50 -ksp_lgmres_augment 10
  5. -ksp_type lgmres -ksp_gmres_restart 40 -ksp_lgmres_augment 10 -pc_type asm (PCASM)

The iterations and timings with 128 cores requested are as follows:

  case#   iter   timing (s)
  1       1436     816
  2          3   12658
  3       1069     669.64
  4        872     768.12
  5        927     513.14

It can be seen that changing -ksp_gmres_restart and -ksp_lgmres_augment can reduce the iterations but not the timing (comparing cases 3 and 4). Second, PCASM helps a lot. Although the second option set reduces the iteration count, the timing increases greatly. Is it because more operations are needed in the PC?

My question here is: in which direction should I adjust -ksp_gmres_restart and -ksp_lgmres_augment? For example, is a larger restart with a larger augment better, or a larger restart with a smaller augment?

---- For the second method, I ran -ksp_type lgmres -ksp_gmres_restart 40 -ksp_lgmres_augment 10 -pc_type asm with different numbers of cores. I found that the speedup increases only slowly once more than 32 to 64 cores are requested. I searched the mailing list archives and found that I am very likely running into the memory bandwidth bottleneck:
http://www.mail-archive.com/[email protected]/msg19152.html

  # of cores   iter   timing (s)
    1          923    19541.83
    4          929     5897.06
    8          932     4854.72
   16          924     1494.33
   32          924     1480.88
   64          928      686.89
  128          927      627.33
  256          926      552.93

My question here is: is there any other PC that can help both reduce the iterations and improve scalability?

Thanks.
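
P.S. To make the two tables easier to read, here is a small Python sketch that derives the time per iteration for each case and the speedup and parallel efficiency for each core count. The numbers are copied from the tables in this message; the function and variable names are only illustrative, and the efficiency is taken relative to the 1-core run, which is my own choice of baseline.

```python
# Case table at 128 cores: case number -> (iterations, wall time in seconds).
cases = {
    1: (1436, 816.0),
    2: (3, 12658.0),
    3: (1069, 669.64),
    4: (872, 768.12),
    5: (927, 513.14),
}

# Core-count table for lgmres + asm: cores -> wall time in seconds.
timings = {
    1: 19541.83,
    4: 5897.06,
    8: 4854.72,
    16: 1494.33,
    32: 1480.88,
    64: 686.89,
    128: 627.33,
    256: 552.93,
}


def time_per_iteration(cases):
    """Return {case: seconds per (outer) iteration}."""
    return {case: time / iters for case, (iters, time) in cases.items()}


def scaling_table(timings):
    """Return (cores, speedup, efficiency) rows relative to the smallest run."""
    base_cores = min(timings)
    base_time = timings[base_cores]
    rows = []
    for cores in sorted(timings):
        speedup = base_time / timings[cores]
        # Ideal speedup is cores / base_cores; efficiency is the fraction achieved.
        efficiency = speedup / (cores / base_cores)
        rows.append((cores, speedup, efficiency))
    return rows


per_iter = time_per_iteration(cases)
rows = scaling_table(timings)

if __name__ == "__main__":
    print("case  s/iter")
    for case in sorted(per_iter):
        print(f"{case:>4d}  {per_iter[case]:>8.3f}")

    print()
    print("cores  speedup  efficiency")
    for cores, speedup, efficiency in rows:
        print(f"{cores:>5d}  {speedup:>7.2f}  {efficiency:>9.1%}")
```

Looking at seconds per iteration rather than the raw iteration count separates "fewer iterations" from "cheaper iterations", which is the distinction behind my question about cases 3 and 4.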
