This is pretty typical. You can see that the factorization time improves 
significantly (because it is more compute-limited), but MatMult and MatSolve 
stay about the same because they are limited by memory bandwidth. On most 
modern architectures, the bandwidth is saturated with about 16 cores.
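You can see this in your own logs: PCSetUp (which contains the ILU 
factorization) drops from about 22.5 s on 16 cores to 12.4 s on 32 cores 
(roughly 1.8x), while MatSolve only goes from about 199 s to 178 s and MatMult 
from 51 s to 48 s (about 1.1x).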

https://petsc.org/release/faq/#what-kind-of-parallel-computers-or-clusters-are-needed-to-use-petsc-or-why-do-i-get-little-speedup

If you haven't yet, I recommend trying AMG for this problem. Call 
MatSetNearNullSpace() to set the rigid body modes and then use -pc_type gamg 
(or, with external packages, -pc_type ml or -pc_type hypre). The iteration 
count should be much lower and the solve reasonably fast.
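
Here is a minimal sketch of attaching the rigid body modes. It assumes your 
nodal coordinates are already in a Vec (called coords below) laid out to match 
the operator A; both names are placeholders for whatever your code uses:

  MatNullSpace   nearnull;
  PetscErrorCode ierr;

  /* Build the translational and rotational rigid body modes from the
     nodal coordinates */
  ierr = MatNullSpaceCreateRigidBody(coords, &nearnull);CHKERRQ(ierr);
  /* Attach them to the operator so GAMG/ML/hypre can build good coarse spaces */
  ierr = MatSetNearNullSpace(A, nearnull);CHKERRQ(ierr);
  ierr = MatNullSpaceDestroy(&nearnull);CHKERRQ(ierr);

Then run with -pc_type gamg (or -pc_type ml / -pc_type hypre if PETSc was 
configured with those packages).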

If you're interested in using different data structures, our experience is that 
we can solve similar problem sizes using Q2 elements in a few seconds (2-10) on 
a single node. 

Gong Yujie <[email protected]> writes:

> Hi,
>
> I'm using GMRES with an ASM preconditioner and ILU(2) as the sub-domain 
> solver for an elasticity problem. First I ran the code on 16 cores to measure 
> the computation time, then ran the same code with the same parameters on 32 
> cores, but I only get about a 10% speedup. From the log files I found that 
> the times for KSPSolve() and MatSolve() only decrease a little. My PETSc 
> version is 3.16.0, configured with --with-debugging=0. The matrix size is 
> about 7*10^6. Some details from the logs are shown below:
>
> 16-cores:
> ------------------------------------------------------------------------------------------------------------------------
> Event                Count      Time (sec)     Flop                              --- Global ---  --- Stage ----  Total
>                    Max Ratio  Max     Ratio   Max  Ratio  Mess   AvgLen  Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
> ------------------------------------------------------------------------------------------------------------------------
> MatMult              664 1.0 5.0794e+01 1.6 2.70e+10 1.1 7.1e+04 4.8e+04 1.0e+00  7 13 49 20  0   7 13 49 20  0  8010
> MatSolve             663 1.0 1.9868e+02 1.1 1.43e+11 1.1 0.0e+00 0.0e+00 0.0e+00 33 70  0  0  0  33 70  0  0  0 10932
> MatLUFactorNum         1 1.0 6.1501e+00 1.1 1.40e+10 1.1 0.0e+00 0.0e+00 0.0e+00  1  7  0  0  0   1  7  0  0  0 35056
> MatILUFactorSym        1 1.0 1.5566e+01 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  2  0  0  0  0   2  0  0  0  0     0
> KSPSetUp               2 1.0 5.9627e-03 1.9 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00  0  0  0  0  0   0  0  0  0  0     0
> KSPSolve               1 1.0 2.5168e+02 1.0 1.90e+11 1.1 1.4e+05 4.8e+04 1.3e+03 44 93 98 40 89  44 93 98 40 90 11437
> KSPGMRESOrthog       641 1.0 1.8980e+01 1.7 1.82e+10 1.1 0.0e+00 0.0e+00 6.4e+02  3  9  0  0 43   3  9  0  0 44 14578
> PCSetUp                2 1.0 2.2480e+01 1.1 1.40e+10 1.1 5.3e+02 6.5e+05 7.0e+00  4  7  0  2  0   4  7  0  2  0  9591
> PCSetUpOnBlocks        1 1.0 2.1555e+01 1.1 1.40e+10 1.1 0.0e+00 0.0e+00 0.0e+00  3  7  0  0  0   3  7  0  0  0 10002
> PCApply              663 1.0 2.0296e+02 1.1 1.43e+11 1.1 7.0e+04 4.8e+04 1.0e+00 33 70 49 20  0  33 70 49 20  0 10701
> PCApplyOnBlocks      663 1.0 1.9908e+02 1.1 1.43e+11 1.1 0.0e+00 0.0e+00 0.0e+00 33 70  0  0  0  33 70  0  0  0 10910
>
> 32-cores:
> ------------------------------------------------------------------------------------------------------------------------
> Event                Count      Time (sec)     Flop                              --- Global ---  --- Stage ----  Total
>                    Max Ratio  Max     Ratio   Max  Ratio  Mess   AvgLen  Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
> ------------------------------------------------------------------------------------------------------------------------
> MatMult              671 1.0 4.7602e+01 2.0 1.39e+10 1.1 1.7e+05 2.8e+04 1.0e+00  7 13 49 23  0   7 13 49 23  0  8637
> MatSolve             670 1.0 1.7800e+02 1.1 7.56e+10 1.1 0.0e+00 0.0e+00 0.0e+00 33 71  0  0  0  33 71  0  0  0 12544
> MatLUFactorNum         1 1.0 3.5714e+00 1.1 7.16e+09 1.1 0.0e+00 0.0e+00 0.0e+00  1  7  0  0  0   1  7  0  0  0 60743
> MatILUFactorSym        1 1.0 8.4088e+00 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  2  0  0  0  0   2  0  0  0  0     0
> KSPSetUp               2 1.0 3.8060e-03 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00  0  0  0  0  0   0  0  0  0  0     0
> KSPSolve               1 1.0 2.1680e+02 1.0 9.95e+10 1.1 3.5e+05 2.8e+04 1.3e+03 44 93 98 47 89  44 93 98 47 90 13592
> KSPGMRESOrthog       648 1.0 1.6999e+01 2.0 9.39e+09 1.1 0.0e+00 0.0e+00 6.5e+02  2  9  0  0 43   2  9  0  0 44 16450
> PCSetUp                2 1.0 1.2439e+01 1.1 7.16e+09 1.1 1.3e+03 3.7e+05 7.0e+00  2  7  0  2  0   2  7  0  2  0 17440
> PCSetUpOnBlocks        1 1.0 1.1876e+01 1.1 7.16e+09 1.1 0.0e+00 0.0e+00 0.0e+00  2  7  0  0  0   2  7  0  0  0 18267
> PCApply              670 1.0 1.8235e+02 1.1 7.56e+10 1.1 1.7e+05 2.7e+04 1.0e+00 34 71 49 23  0  34 71 49 23  0 12245
> PCApplyOnBlocks      670 1.0 1.7838e+02 1.1 7.56e+10 1.1 0.0e+00 0.0e+00 0.0e+00 33 71  0  0  0  33 71  0  0  0 12517
>
> Hope you can help me!
>
> Best Regards,
> Yujie
