Ah, "cores". Jed is right if these are cores on one socket. On Sun, Feb 27, 2022 at 10:16 AM Jed Brown <[email protected]> wrote:
On Sun, Feb 27, 2022 at 10:16 AM Jed Brown <[email protected]> wrote:

> This is pretty typical. You see that the factorization time is
> significantly better (because the factorization is more compute-limited),
> but MatMult and MatSolve are about the same because they are limited by
> memory bandwidth. On most modern architectures, the bandwidth is saturated
> with 16 cores or so.
>
> https://petsc.org/release/faq/#what-kind-of-parallel-computers-or-clusters-are-needed-to-use-petsc-or-why-do-i-get-little-speedup
>
> If you haven't yet, I recommend trying AMG for this problem. You should
> call MatSetNearNullSpace() to set the rigid body modes and then use
> -pc_type gamg or (with external packages) -pc_type ml or -pc_type hypre.
> The iteration count should be much lower and the solves reasonably fast.
> [A sketch of this setup follows after the quoted logs below.]
>
> If you're interested in using different data structures, our experience is
> that we can solve similar problem sizes using Q2 elements in a few seconds
> (2-10) on a single node.
>
> Gong Yujie <[email protected]> writes:
>
> > Hi,
> >
> > I'm using GMRES with an ASM preconditioner and ILU(2) as the subdomain
> > solver for an elasticity problem. First I used 16 cores to measure the
> > computation time, then ran the same code with the same parameters on 32
> > cores, but got only about a 10% speedup. From the log file I found that
> > the times for KSPSolve() and MatSolve() decrease only a little. My PETSc
> > version is 3.16.0, configured with --with-debugging=0. The matrix size
> > is about 7*10^6. Part of the log is shown below:
> >
> > 16 cores:
> > ------------------------------------------------------------------------------------------------------------------------
> > Event                Count      Time (sec)     Flop                             --- Global ---  --- Stage ----  Total
> >                        Max Ratio  Max     Ratio    Max  Ratio  Mess   AvgLen  Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
> > ------------------------------------------------------------------------------------------------------------------------
> > MatMult              664 1.0 5.0794e+01 1.6 2.70e+10 1.1 7.1e+04 4.8e+04 1.0e+00  7 13 49 20  0   7 13 49 20  0  8010
> > MatSolve             663 1.0 1.9868e+02 1.1 1.43e+11 1.1 0.0e+00 0.0e+00 0.0e+00 33 70  0  0  0  33 70  0  0  0 10932
> > MatLUFactorNum         1 1.0 6.1501e+00 1.1 1.40e+10 1.1 0.0e+00 0.0e+00 0.0e+00  1  7  0  0  0   1  7  0  0  0 35056
> > MatILUFactorSym        1 1.0 1.5566e+01 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  2  0  0  0  0   2  0  0  0  0     0
> > KSPSetUp               2 1.0 5.9627e-03 1.9 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00  0  0  0  0  0   0  0  0  0  0     0
> > KSPSolve               1 1.0 2.5168e+02 1.0 1.90e+11 1.1 1.4e+05 4.8e+04 1.3e+03 44 93 98 40 89  44 93 98 40 90 11437
> > KSPGMRESOrthog       641 1.0 1.8980e+01 1.7 1.82e+10 1.1 0.0e+00 0.0e+00 6.4e+02  3  9  0  0 43   3  9  0  0 44 14578
> > PCSetUp                2 1.0 2.2480e+01 1.1 1.40e+10 1.1 5.3e+02 6.5e+05 7.0e+00  4  7  0  2  0   4  7  0  2  0  9591
> > PCSetUpOnBlocks        1 1.0 2.1555e+01 1.1 1.40e+10 1.1 0.0e+00 0.0e+00 0.0e+00  3  7  0  0  0   3  7  0  0  0 10002
> > PCApply              663 1.0 2.0296e+02 1.1 1.43e+11 1.1 7.0e+04 4.8e+04 1.0e+00 33 70 49 20  0  33 70 49 20  0 10701
> > PCApplyOnBlocks      663 1.0 1.9908e+02 1.1 1.43e+11 1.1 0.0e+00 0.0e+00 0.0e+00 33 70  0  0  0  33 70  0  0  0 10910
> >
> > 32 cores:
> > ------------------------------------------------------------------------------------------------------------------------
> > Event                Count      Time (sec)     Flop                             --- Global ---  --- Stage ----  Total
> >                        Max Ratio  Max     Ratio    Max  Ratio  Mess   AvgLen  Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s
> > ------------------------------------------------------------------------------------------------------------------------
> > MatMult              671 1.0 4.7602e+01 2.0 1.39e+10 1.1 1.7e+05 2.8e+04 1.0e+00  7 13 49 23  0   7 13 49 23  0  8637
> > MatSolve             670 1.0 1.7800e+02 1.1 7.56e+10 1.1 0.0e+00 0.0e+00 0.0e+00 33 71  0  0  0  33 71  0  0  0 12544
> > MatLUFactorNum         1 1.0 3.5714e+00 1.1 7.16e+09 1.1 0.0e+00 0.0e+00 0.0e+00  1  7  0  0  0   1  7  0  0  0 60743
> > MatILUFactorSym        1 1.0 8.4088e+00 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  2  0  0  0  0   2  0  0  0  0     0
> > KSPSetUp               2 1.0 3.8060e-03 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00  0  0  0  0  0   0  0  0  0  0     0
> > KSPSolve               1 1.0 2.1680e+02 1.0 9.95e+10 1.1 3.5e+05 2.8e+04 1.3e+03 44 93 98 47 89  44 93 98 47 90 13592
> > KSPGMRESOrthog       648 1.0 1.6999e+01 2.0 9.39e+09 1.1 0.0e+00 0.0e+00 6.5e+02  2  9  0  0 43   2  9  0  0 44 16450
> > PCSetUp                2 1.0 1.2439e+01 1.1 7.16e+09 1.1 1.3e+03 3.7e+05 7.0e+00  2  7  0  2  0   2  7  0  2  0 17440
> > PCSetUpOnBlocks        1 1.0 1.1876e+01 1.1 7.16e+09 1.1 0.0e+00 0.0e+00 0.0e+00  2  7  0  0  0   2  7  0  0  0 18267
> > PCApply              670 1.0 1.8235e+02 1.1 7.56e+10 1.1 1.7e+05 2.7e+04 1.0e+00 34 71 49 23  0  34 71 49 23  0 12245
> > PCApplyOnBlocks      670 1.0 1.7838e+02 1.1 7.56e+10 1.1 0.0e+00 0.0e+00 0.0e+00 33 71  0  0  0  33 71  0  0  0 12517
> >
> > Hope you can help me!
> >
> > Best Regards,
> > Yujie
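A minimal sketch of the near-null-space setup Jed suggests, assuming "A" is the assembled stiffness matrix (block size 3 for 3D elasticity) and "coords" is a Vec of interleaved nodal coordinates; both names are hypothetical, and the error-checking idiom matches PETSc 3.16:

    #include <petscmat.h>

    /* Attach the rigid-body modes (3 translations + 3 rotations in 3D)
       as a near null space so AMG can build good coarse spaces. */
    PetscErrorCode SetElasticityNearNullSpace(Mat A, Vec coords)
    {
      MatNullSpace   nns;
      PetscErrorCode ierr;

      PetscFunctionBeginUser;
      /* Build the six rigid-body modes from the nodal coordinates. */
      ierr = MatNullSpaceCreateRigidBody(coords, &nns);CHKERRQ(ierr);
      /* Tell the preconditioner (e.g. -pc_type gamg) about them. */
      ierr = MatSetNearNullSpace(A, nns);CHKERRQ(ierr);
      ierr = MatNullSpaceDestroy(&nns);CHKERRQ(ierr);
      PetscFunctionReturn(0);
    }

With that attached, run with -pc_type gamg (or -pc_type ml / -pc_type hypre if those packages are installed) and compare iteration counts via -ksp_monitor.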
