This is pretty typical. You can see that the factorization time is significantly better (because it is more compute-bound), but MatMult and MatSolve are about the same because they are limited by memory bandwidth. On most modern architectures, the bandwidth is saturated at around 16 cores.
https://petsc.org/release/faq/#what-kind-of-parallel-computers-or-clusters-are-needed-to-use-petsc-or-why-do-i-get-little-speedup

If you haven't yet, I recommend trying AMG for this problem. You should call MatSetNearNullSpace() to set the rigid body modes and then use -pc_type gamg (or, with external packages, -pc_type ml or -pc_type hypre). The iteration count should be much lower and the solves reasonably fast. If you're interested in using different data structures, our experience is that we can solve similar problem sizes using Q2 elements in a few seconds (2-10) on a single node.

Gong Yujie <[email protected]> writes:

> Hi,
>
> I'm using GMRES with an ASM preconditioner and ILU(2) as the sub-domain solver to
> solve an elasticity problem. First, I used 16 cores to measure the computation
> time, then ran the same code with the same parameters on 32 cores, but I only
> got about a 10% speedup. From the log file I found that the computation
> times of KSPSolve() and MatSolve() decrease only slightly. My PETSc
> version is 3.16.0, configured with --with-debugging=0. The matrix
> size is about 7*10^6.
> Some detail of the log is shown below:
>
> 16-cores:
> ------------------------------------------------------------------------------------------------------------------------
> Event                Count      Time (sec)     Flop                             --- Global ---  --- Stage ----  Total
>                        Max Ratio  Max     Ratio   Max  Ratio  Mess   AvgLen  Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
> ------------------------------------------------------------------------------------------------------------------------
> MatMult              664 1.0 5.0794e+01 1.6 2.70e+10 1.1 7.1e+04 4.8e+04 1.0e+00  7 13 49 20  0   7 13 49 20  0  8010
> MatSolve             663 1.0 1.9868e+02 1.1 1.43e+11 1.1 0.0e+00 0.0e+00 0.0e+00 33 70  0  0  0  33 70  0  0  0 10932
> MatLUFactorNum         1 1.0 6.1501e+00 1.1 1.40e+10 1.1 0.0e+00 0.0e+00 0.0e+00  1  7  0  0  0   1  7  0  0  0 35056
> MatILUFactorSym        1 1.0 1.5566e+01 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  2  0  0  0  0   2  0  0  0  0     0
> KSPSetUp               2 1.0 5.9627e-03 1.9 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00  0  0  0  0  0   0  0  0  0  0     0
> KSPSolve               1 1.0 2.5168e+02 1.0 1.90e+11 1.1 1.4e+05 4.8e+04 1.3e+03 44 93 98 40 89  44 93 98 40 90 11437
> KSPGMRESOrthog       641 1.0 1.8980e+01 1.7 1.82e+10 1.1 0.0e+00 0.0e+00 6.4e+02  3  9  0  0 43   3  9  0  0 44 14578
> PCSetUp                2 1.0 2.2480e+01 1.1 1.40e+10 1.1 5.3e+02 6.5e+05 7.0e+00  4  7  0  2  0   4  7  0  2  0  9591
> PCSetUpOnBlocks        1 1.0 2.1555e+01 1.1 1.40e+10 1.1 0.0e+00 0.0e+00 0.0e+00  3  7  0  0  0   3  7  0  0  0 10002
> PCApply              663 1.0 2.0296e+02 1.1 1.43e+11 1.1 7.0e+04 4.8e+04 1.0e+00 33 70 49 20  0  33 70 49 20  0 10701
> PCApplyOnBlocks      663 1.0 1.9908e+02 1.1 1.43e+11 1.1 0.0e+00 0.0e+00 0.0e+00 33 70  0  0  0  33 70  0  0  0 10910
>
> 32-cores:
> ------------------------------------------------------------------------------------------------------------------------
> Event                Count      Time (sec)     Flop                             --- Global ---  --- Stage ----  Total
>                        Max Ratio  Max     Ratio   Max  Ratio  Mess   AvgLen  Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
> ------------------------------------------------------------------------------------------------------------------------
> MatMult              671 1.0 4.7602e+01 2.0 1.39e+10 1.1 1.7e+05 2.8e+04 1.0e+00  7 13 49 23  0   7 13 49 23  0  8637
> MatSolve             670 1.0 1.7800e+02 1.1 7.56e+10 1.1 0.0e+00 0.0e+00 0.0e+00 33 71  0  0  0  33 71  0  0  0 12544
> MatLUFactorNum         1 1.0 3.5714e+00 1.1 7.16e+09 1.1 0.0e+00 0.0e+00 0.0e+00  1  7  0  0  0   1  7  0  0  0 60743
> MatILUFactorSym        1 1.0 8.4088e+00 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  2  0  0  0  0   2  0  0  0  0     0
> KSPSetUp               2 1.0 3.8060e-03 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00  0  0  0  0  0   0  0  0  0  0     0
> KSPSolve               1 1.0 2.1680e+02 1.0 9.95e+10 1.1 3.5e+05 2.8e+04 1.3e+03 44 93 98 47 89  44 93 98 47 90 13592
> KSPGMRESOrthog       648 1.0 1.6999e+01 2.0 9.39e+09 1.1 0.0e+00 0.0e+00 6.5e+02  2  9  0  0 43   2  9  0  0 44 16450
> PCSetUp                2 1.0 1.2439e+01 1.1 7.16e+09 1.1 1.3e+03 3.7e+05 7.0e+00  2  7  0  2  0   2  7  0  2  0 17440
> PCSetUpOnBlocks        1 1.0 1.1876e+01 1.1 7.16e+09 1.1 0.0e+00 0.0e+00 0.0e+00  2  7  0  0  0   2  7  0  0  0 18267
> PCApply              670 1.0 1.8235e+02 1.1 7.56e+10 1.1 1.7e+05 2.7e+04 1.0e+00 34 71 49 23  0  34 71 49 23  0 12245
> PCApplyOnBlocks      670 1.0 1.7838e+02 1.1 7.56e+10 1.1 0.0e+00 0.0e+00 0.0e+00 33 71  0  0  0  33 71  0  0  0 12517
>
> Hope you can help me!
>
> Best Regards,
> Yujie
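The near-null-space setup suggested above could look roughly like the following in a PETSc 3.16-era C application. This is only a sketch: the function name AttachRigidBodyModes is made up, and the matrix A and the interlaced nodal coordinate vector coords are assumed to come from the user's existing assembly code.

```c
/* Sketch: attach the rigid body modes as the near null space of the
 * elasticity operator so that -pc_type gamg (or ml/hypre) can build
 * good coarse spaces. Uses the PETSc 3.16-style error handling. */
#include <petscmat.h>

PetscErrorCode AttachRigidBodyModes(Mat A, Vec coords)
{
  PetscErrorCode ierr;
  MatNullSpace   nullsp;

  /* Build the rigid body modes (3 translations + 3 rotations in 3D)
     from the interlaced (x,y,z) nodal coordinates. */
  ierr = MatNullSpaceCreateRigidBody(coords, &nullsp);CHKERRQ(ierr);
  ierr = MatSetNearNullSpace(A, nullsp);CHKERRQ(ierr);
  ierr = MatNullSpaceDestroy(&nullsp);CHKERRQ(ierr);
  return 0;
}
```

With this in place, the preconditioner can be switched at run time with -pc_type gamg, or with -pc_type ml / -pc_type hypre if those packages were configured, and the iteration counts compared against the ASM/ILU(2) runs in the logs above.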
