There is a problem here. The -log_summary output doesn't show all the events associated with the -pc_type gamg preconditioner; it should have rows like the following:
VecDot 2 1.0 6.1989e-06 1.0 1.00e+04 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1613
VecMDot 134 1.0 5.4145e-04 1.0 1.64e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 3 0 0 0 0 3 0 0 0 3025
VecNorm 154 1.0 2.4176e-04 1.0 3.82e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 1578
VecScale 148 1.0 1.6928e-04 1.0 1.76e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1039
VecCopy 106 1.0 1.2255e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecSet 474 1.0 5.1236e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAXPY 54 1.0 1.3471e-04 1.0 2.35e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1742
VecAYPX 384 1.0 5.7459e-04 1.0 4.94e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 860
VecAXPBYCZ 192 1.0 4.7398e-04 1.0 9.88e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 2 0 0 0 0 2 0 0 0 2085
VecWAXPY 2 1.0 7.8678e-06 1.0 5.00e+03 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 636
VecMAXPY 148 1.0 8.1539e-04 1.0 1.96e+06 1.0 0.0e+00 0.0e+00 0.0e+00 1 3 0 0 0 1 3 0 0 0 2399
VecPointwiseMult 66 1.0 1.1253e-04 1.0 6.79e+04 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 604
VecScatterBegin 45 1.0 6.3419e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecSetRandom 6 1.0 3.0994e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecReduceArith 4 1.0 1.3113e-05 1.0 2.00e+04 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1525
VecReduceComm 2 1.0 0.0000e+00 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecNormalize 148 1.0 4.4799e-04 1.0 5.27e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 1177
MatMult 424 1.0 8.9276e-03 1.0 2.09e+07 1.0 0.0e+00 0.0e+00 0.0e+00 7 37 0 0 0 7 37 0 0 0 2343
MatMultAdd 48 1.0 5.0926e-04 1.0 1.05e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 2 0 0 0 0 2 0 0 0 2069
MatMultTranspose 48 1.0 9.8586e-04 1.0 1.05e+06 1.0 0.0e+00 0.0e+00 0.0e+00 1 2 0 0 0 1 2 0 0 0 1069
MatSolve 16 1.0 2.2173e-05 1.0 1.02e+04 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 460
MatSOR 354 1.0 1.0547e-02 1.0 1.72e+07 1.0 0.0e+00 0.0e+00 0.0e+00 9 31 0 0 0 9 31 0 0 0 1631
MatLUFactorSym 2 1.0 4.7922e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatLUFactorNum 2 1.0 2.5272e-05 1.0 7.75e+03 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 307
MatScale 18 1.0 1.7142e-04 1.0 1.50e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 874
MatResidual 48 1.0 1.0548e-03 1.0 2.33e+06 1.0 0.0e+00 0.0e+00 0.0e+00 1 4 0 0 0 1 4 0 0 0 2212
MatAssemblyBegin 57 1.0 4.7684e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatAssemblyEnd 57 1.0 1.9786e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 0
MatGetRow 21616 1.0 1.8497e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 0
MatGetRowIJ 2 1.0 6.9141e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetOrdering 2 1.0 6.0797e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatCoarsen 6 1.0 9.3222e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatZeroEntries 2 1.0 3.9101e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatAXPY 6 1.0 1.7998e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 0
MatFDColorCreate 1 1.0 3.2902e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatFDColorSetUp 1 1.0 1.6739e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0
MatFDColorApply 2 1.0 1.3199e-03 1.0 2.41e+06 1.0 0.0e+00 0.0e+00 0.0e+00 1 4 0 0 0 1 4 0 0 0 1826
MatFDColorFunc 42 1.0 7.4601e-04 1.0 2.20e+06 1.0 0.0e+00 0.0e+00 0.0e+00 1 4 0 0 0 1 4 0 0 0 2956
MatMatMult 6 1.0 5.1048e-03 1.0 1.23e+06 1.0 0.0e+00 0.0e+00 0.0e+00 4 2 0 0 0 4 2 0 0 0 241
MatMatMultSym 6 1.0 3.2601e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 3 0 0 0 0 3 0 0 0 0 0
MatMatMultNum 6 1.0 1.8158e-03 1.0 1.23e+06 1.0 0.0e+00 0.0e+00 0.0e+00 2 2 0 0 0 2 2 0 0 0 679
MatPtAP 6 1.0 2.1328e-02 1.0 6.03e+06 1.0 0.0e+00 0.0e+00 0.0e+00 18 11 0 0 0 18 11 0 0 0 283
MatPtAPSymbolic 6 1.0 1.0073e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 8 0 0 0 0 8 0 0 0 0 0
MatPtAPNumeric 6 1.0 1.1230e-02 1.0 6.03e+06 1.0 0.0e+00 0.0e+00 0.0e+00 9 11 0 0 0 9 11 0 0 0 537
MatTrnMatMult 2 1.0 7.2789e-04 1.0 5.44e+04 1.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 75
MatTrnMatMultSym 2 1.0 5.7006e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatTrnMatMultNum 2 1.0 1.5473e-04 1.0 5.44e+04 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 352
MatGetSymTrans 8 1.0 3.1638e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
KSPGMRESOrthog 134 1.0 1.3156e-03 1.0 3.28e+06 1.0 0.0e+00 0.0e+00 0.0e+00 1 6 0 0 0 1 6 0 0 0 2491
KSPSetUp 24 1.0 4.6754e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
KSPSolve 2 1.0 1.1291e-01 1.0 5.32e+07 1.0 0.0e+00 0.0e+00 0.0e+00 94 95 0 0 0 94 95 0 0 0 471
PCGAMGGraph_AGG 6 1.0 1.2108e-02 1.0 1.82e+04 1.0 0.0e+00 0.0e+00 0.0e+00 10 0 0 0 0 10 0 0 0 0 2
PCGAMGCoarse_AGG 6 1.0 1.1127e-03 1.0 5.44e+04 1.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 49
PCGAMGProl_AGG 6 1.0 4.1062e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 34 0 0 0 0 34 0 0 0 0 0
PCGAMGPOpt_AGG 6 1.0 1.1200e-02 1.0 5.98e+06 1.0 0.0e+00 0.0e+00 0.0e+00 9 11 0 0 0 9 11 0 0 0 534
GAMG: createProl 6 1.0 6.5530e-02 1.0 6.06e+06 1.0 0.0e+00 0.0e+00 0.0e+00 55 11 0 0 0 55 11 0 0 0 92
Graph 12 1.0 1.1692e-02 1.0 1.82e+04 1.0 0.0e+00 0.0e+00 0.0e+00 10 0 0 0 0 10 0 0 0 0 2
MIS/Agg 6 1.0 1.4496e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
SA: col data 6 1.0 7.1526e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
SA: frmProl0 6 1.0 4.0917e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 34 0 0 0 0 34 0 0 0 0 0
SA: smooth 6 1.0 1.1198e-02 1.0 5.98e+06 1.0 0.0e+00 0.0e+00 0.0e+00 9 11 0 0 0 9 11 0 0 0 534
GAMG: partLevel 6 1.0 2.1341e-02 1.0 6.03e+06 1.0 0.0e+00 0.0e+00 0.0e+00 18 11 0 0 0 18 11 0 0 0 283
PCSetUp 4 1.0 8.8020e-02 1.0 1.21e+07 1.0 0.0e+00 0.0e+00 0.0e+00 74 22 0 0 0 74 22 0 0 0 137
PCSetUpOnBlocks 16 1.0 1.8382e-04 1.0 7.75e+03 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 42
PCApply 16 1.0 2.3858e-02 1.0 3.91e+07 1.0 0.0e+00 0.0e+00 0.0e+00 20 70 0 0 0 20 70 0 0 0 1637

Are you sure you ran with -pc_type gamg? What about running with -info: does it print anything about gamg? What about -ksp_view: does it indicate it is using the gamg preconditioner?

> On Nov 4, 2015, at 9:30 PM, TAY wee-beng <[email protected]> wrote:
>
> Hi,
>
> I have attached the 2 logs.
>
> Thank you
>
> Yours sincerely,
>
> TAY wee-beng
>
> On 4/11/2015 1:11 AM, Barry Smith wrote:
>> Ok, the convergence looks good. Now run on 8 and 64 processes as before with -log_summary and not -ksp_monitor to see how it scales.
>>
>> Barry
>>
>>> On Nov 3, 2015, at 6:49 AM, TAY wee-beng <[email protected]> wrote:
>>>
>>> Hi,
>>>
>>> I tried and have attached the log.
>>>
>>> Ya, my Poisson eqn has Neumann boundary conditions. Do I need to specify some null space stuff? Like KSPSetNullSpace or MatNullSpaceCreate?
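A note on the null space question above: a Poisson problem with pure Neumann boundary conditions has a singular matrix whose null space is the constant vector, and the usual approach is to attach that null space to the matrix so the Krylov solver can project it out. A minimal sketch, assuming A is the already-assembled Poisson matrix (the function name is only illustrative):

    #include <petscksp.h>

    /* Sketch: attach the constant null space of a pure-Neumann Poisson matrix.
       "A" is assumed to be the already-assembled Poisson matrix. */
    PetscErrorCode AttachConstantNullSpace(Mat A)
    {
      MatNullSpace   nsp;
      PetscErrorCode ierr;

      /* PETSC_TRUE => the null space consists of the constant vector */
      ierr = MatNullSpaceCreate(PETSC_COMM_WORLD, PETSC_TRUE, 0, NULL, &nsp);CHKERRQ(ierr);
      ierr = MatSetNullSpace(A, nsp);CHKERRQ(ierr);
      ierr = MatNullSpaceDestroy(&nsp);CHKERRQ(ierr);
      return 0;
    }

The KSPSetNullSpace route mentioned in the question achieves the same thing on PETSc versions that still provide it.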
>>>
>>> Thank you
>>>
>>> Yours sincerely,
>>>
>>> TAY wee-beng
>>>
>>> On 3/11/2015 12:45 PM, Barry Smith wrote:
>>>>> On Nov 2, 2015, at 10:37 PM, TAY wee-beng <[email protected]> wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> I tried:
>>>>>
>>>>> 1. -poisson_pc_gamg_agg_nsmooths 1 -poisson_pc_type gamg
>>>>>
>>>>> 2. -poisson_pc_type gamg
>>>> Run with -poisson_ksp_monitor_true_residual -poisson_ksp_monitor_converged_reason
>>>> Does your poisson have Neumann boundary conditions? Do you have any zeros on the diagonal of the matrix (you shouldn't).
>>>>
>>>> There may be something wrong with your poisson discretization that was also messing up hypre.
>>>>
>>>>> Both options give:
>>>>>
>>>>> 1 0.00150000 0.00000000 0.00000000 1.00000000
>>>>> NaN NaN NaN
>>>>> M Diverged but why?, time = 2
>>>>> reason = -9
>>>>>
>>>>> How can I check what's wrong?
>>>>>
>>>>> Thank you
>>>>>
>>>>> Yours sincerely,
>>>>>
>>>>> TAY wee-beng
>>>>>
>>>>> On 3/11/2015 3:18 AM, Barry Smith wrote:
>>>>>> hypre is just not scaling well here. I do not know why. Since hypre is a black box for us, there is no way to determine why it scales poorly.
>>>>>>
>>>>>> If you make the same two runs with -pc_type gamg there will be a lot more information in the log summary about which routines scale well or poorly.
>>>>>>
>>>>>> Barry
>>>>>>
>>>>>>> On Nov 2, 2015, at 3:17 AM, TAY wee-beng <[email protected]> wrote:
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I have attached the 2 files.
>>>>>>>
>>>>>>> Thank you
>>>>>>>
>>>>>>> Yours sincerely,
>>>>>>>
>>>>>>> TAY wee-beng
>>>>>>>
>>>>>>> On 2/11/2015 2:55 PM, Barry Smith wrote:
>>>>>>>> Run a (158/2)x(266/2)x(150/2) grid on 8 processes and then (158)x(266)x(150) on 64 processes and send the two -log_summary results.
>>>>>>>>
>>>>>>>> Barry
>>>>>>>>
>>>>>>>>> On Nov 2, 2015, at 12:19 AM, TAY wee-beng <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> I have attached the new results.
>>>>>>>>>
>>>>>>>>> Thank you
>>>>>>>>>
>>>>>>>>> Yours sincerely,
>>>>>>>>>
>>>>>>>>> TAY wee-beng
>>>>>>>>>
>>>>>>>>> On 2/11/2015 12:27 PM, Barry Smith wrote:
>>>>>>>>>> Run without the -momentum_ksp_view -poisson_ksp_view and send the new results.
>>>>>>>>>>
>>>>>>>>>> You can see from the log summary that the PCSetUp is taking a much smaller percentage of the time, meaning that it is reusing the preconditioner and not rebuilding it each time.
>>>>>>>>>>
>>>>>>>>>> Barry
>>>>>>>>>>
>>>>>>>>>> Something makes no sense with the output: it gives
>>>>>>>>>>
>>>>>>>>>> KSPSolve 199 1.0 2.3298e+03 1.0 5.20e+09 1.8 3.8e+04 9.9e+05 5.0e+02 90100 66100 24 90100 66100 24 165
>>>>>>>>>>
>>>>>>>>>> 90% of the time is in the solve, but there is no significant amount of time in other events of the code, which is just not possible. I hope it is due to your IO.
>>>>>>>>>>
>>>>>>>>>>> On Nov 1, 2015, at 10:02 PM, TAY wee-beng <[email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> I have attached the new run with 100 time steps for 48 and 96 cores.
>>>>>>>>>>>
>>>>>>>>>>> Only the Poisson eqn's RHS changes, the LHS doesn't. So if I want to reuse the preconditioner, what must I do? Or what must I not do?
>>>>>>>>>>>
>>>>>>>>>>> Why does the number of processes increase so much? Is there something wrong with my coding? Seems to be so too for my new run.
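A note on reusing the preconditioner when only the RHS changes: if the matrix passed to KSPSetOperators never changes, the preconditioner is built during the first KSPSolve and reused for every later solve; KSPSetReusePreconditioner can additionally keep the existing preconditioner even when the operator is flagged as modified. A minimal sketch, with purely illustrative names:

    #include <petscksp.h>

    /* Sketch: solve the same Poisson matrix A every time step with a new rhs,
       building the preconditioner only once. ksp, A, rhs, phi are assumed to
       already exist; filling rhs each step is left to the application code. */
    PetscErrorCode SolveEachStep(KSP ksp, Mat A, Vec rhs, Vec phi, PetscInt nsteps)
    {
      PetscErrorCode ierr;
      PetscInt       step;

      ierr = KSPSetOperators(ksp, A, A);CHKERRQ(ierr);                 /* set once; A never changes */
      ierr = KSPSetReusePreconditioner(ksp, PETSC_TRUE);CHKERRQ(ierr); /* keep the PC across solves */
      for (step = 0; step < nsteps; step++) {
        /* ... fill rhs for this time step ... */
        ierr = KSPSolve(ksp, rhs, phi);CHKERRQ(ierr);                  /* AMG setup happens only on the first solve */
      }
      return 0;
    }

What not to do is destroy and recreate the KSP, or hand KSPSetOperators a freshly assembled matrix, every time step; either forces a new (expensive) AMG setup.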
>>>>>>>>>>>
>>>>>>>>>>> Thank you
>>>>>>>>>>>
>>>>>>>>>>> Yours sincerely,
>>>>>>>>>>>
>>>>>>>>>>> TAY wee-beng
>>>>>>>>>>>
>>>>>>>>>>> On 2/11/2015 9:49 AM, Barry Smith wrote:
>>>>>>>>>>>> If you are doing many time steps with the same linear solver then you MUST do your weak scaling studies with MANY time steps, since the setup time of AMG only takes place in the first timestep. So run both 48 and 96 processes with the same large number of time steps.
>>>>>>>>>>>>
>>>>>>>>>>>> Barry
>>>>>>>>>>>>
>>>>>>>>>>>>> On Nov 1, 2015, at 7:35 PM, TAY wee-beng <[email protected]> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>
>>>>>>>>>>>>> Sorry, I forgot and used the old a.out. I have attached the new log for 48 cores (log48), together with the 96 cores log (log96).
>>>>>>>>>>>>>
>>>>>>>>>>>>> Why does the number of processes increase so much? Is there something wrong with my coding?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Only the Poisson eqn's RHS changes, the LHS doesn't. So if I want to reuse the preconditioner, what must I do? Or what must I not do?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Lastly, I only simulated 2 time steps previously. Now I run for 10 timesteps (log48_10). Is it building the preconditioner at every timestep?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Also, what about the momentum eqn? Is it working well?
>>>>>>>>>>>>>
>>>>>>>>>>>>> I will try the gamg later too.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thank you
>>>>>>>>>>>>>
>>>>>>>>>>>>> Yours sincerely,
>>>>>>>>>>>>>
>>>>>>>>>>>>> TAY wee-beng
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 2/11/2015 12:30 AM, Barry Smith wrote:
>>>>>>>>>>>>>> You used gmres with 48 processes but richardson with 96. You need to be careful and make sure you don't change the solvers when you change the number of processors, since you can get very different, inconsistent results.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Anyways, all the time is being spent in the BoomerAMG algebraic multigrid setup and it is scaling badly. When you double the problem size and number of processes it went from 3.2445e+01 to 4.3599e+02 seconds.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> PCSetUp 3 1.0 3.2445e+01 1.0 9.58e+06 2.0 0.0e+00 0.0e+00 4.0e+00 62 8 0 0 4 62 8 0 0 5 11
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> PCSetUp 3 1.0 4.3599e+02 1.0 9.58e+06 2.0 0.0e+00 0.0e+00 4.0e+00 85 18 0 0 6 85 18 0 0 6 2
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Now, is the Poisson problem changing at each timestep, or can you use the same preconditioner built with BoomerAMG for all the time steps? Algebraic multigrid has a large setup time that often doesn't matter if you have many time steps, but if you have to rebuild it each timestep it is too large.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> You might also try -pc_type gamg and see how PETSc's algebraic multigrid scales for your problem/machine.
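A note related to the gamg suggestion and to the missing PCGAMG events in the log at the top: if the Poisson KSP uses an options prefix (the runs above pass -poisson_pc_type gamg), the prefix must be set on the KSP before KSPSetFromOptions, otherwise the option never reaches that solver and the default preconditioner is used silently. A minimal sketch, assuming the poisson_ prefix and an illustrative function name:

    #include <petscksp.h>

    /* Sketch: give the Poisson KSP the "poisson_" prefix so -poisson_pc_type gamg
       (and related options) apply to it, with GAMG as the in-code default. */
    PetscErrorCode ConfigurePoissonKSP(KSP ksp)
    {
      PC             pc;
      PetscErrorCode ierr;

      ierr = KSPSetOptionsPrefix(ksp, "poisson_");CHKERRQ(ierr);
      ierr = KSPGetPC(ksp, &pc);CHKERRQ(ierr);
      ierr = PCSetType(pc, PCGAMG);CHKERRQ(ierr);      /* default; command line can override */
      ierr = KSPSetFromOptions(ksp);CHKERRQ(ierr);     /* must come after the prefix is set */
      return 0;
    }

Running with -poisson_ksp_view should then show a PC object of type gamg rather than hypre or the default.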
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Barry
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Nov 1, 2015, at 7:30 AM, TAY wee-beng <[email protected]> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On 1/11/2015 10:00 AM, Barry Smith wrote:
>>>>>>>>>>>>>>>>> On Oct 31, 2015, at 8:43 PM, TAY wee-beng <[email protected]> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On 1/11/2015 12:47 AM, Matthew Knepley wrote:
>>>>>>>>>>>>>>>>>> On Sat, Oct 31, 2015 at 11:34 AM, TAY wee-beng <[email protected]> wrote:
>>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> I understand that, as mentioned in the FAQ, due to the limitations in memory the scaling is not linear. So I am trying to write a proposal to use a supercomputer. Its specs are:
>>>>>>>>>>>>>>>>>> Compute nodes: 82,944 nodes (SPARC64 VIIIfx; 16 GB of memory per node)
>>>>>>>>>>>>>>>>>> 8 cores / processor
>>>>>>>>>>>>>>>>>> Interconnect: Tofu (6-dimensional mesh/torus)
>>>>>>>>>>>>>>>>>> Each cabinet contains 96 computing nodes.
>>>>>>>>>>>>>>>>>> One of the requirements is to give the performance of my current code with my current set of data, and there is a formula to calculate the estimated parallel efficiency when using the new large set of data.
>>>>>>>>>>>>>>>>>> There are 2 ways to give performance:
>>>>>>>>>>>>>>>>>> 1. Strong scaling, which is defined as how the elapsed time varies with the number of processors for a fixed problem size.
>>>>>>>>>>>>>>>>>> 2. Weak scaling, which is defined as how the elapsed time varies with the number of processors for a fixed problem size per processor.
>>>>>>>>>>>>>>>>>> I ran my cases with 48 and 96 cores on my current cluster, giving 140 and 90 mins respectively. This is classified as strong scaling.
>>>>>>>>>>>>>>>>>> Cluster specs:
>>>>>>>>>>>>>>>>>> CPU: AMD 6234 2.4 GHz
>>>>>>>>>>>>>>>>>> 8 cores / processor (CPU)
>>>>>>>>>>>>>>>>>> 6 CPUs / node
>>>>>>>>>>>>>>>>>> So 48 cores / node
>>>>>>>>>>>>>>>>>> Not sure about the memory / node.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> The parallel efficiency ‘En’ for a given degree of parallelism ‘n’ indicates how efficiently the program is accelerated by parallel processing. ‘En’ is given by the following formulae. Although their derivation processes are different depending on strong and weak scaling, the derived formulae are the same.
>>>>>>>>>>>>>>>>>> From the estimated time, my parallel efficiency using Amdahl's law on the current old cluster was 52.7%. So are my results acceptable?
>>>>>>>>>>>>>>>>>> For the large data set, if using 2205 nodes (2205 x 8 cores), my expected parallel efficiency is only 0.5%. The proposal recommends a value of > 50%.
>>>>>>>>>>>>>>>>>> The problem with this analysis is that the estimated serial fraction from Amdahl's Law changes as a function of problem size, so you cannot take the strong scaling from one problem and apply it to another without a model of this dependence.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Weak scaling does model changes with problem size, so I would measure weak scaling on your current cluster, and extrapolate to the big machine. I realize that this does not make sense for many scientific applications, but neither does requiring a certain parallel efficiency.
>>>>>>>>>>>>>>>>> Ok, I checked the results for my weak scaling and it is even worse for the expected parallel efficiency. From the formula used, it's obvious it's doing some sort of exponential extrapolation decrease. So unless I can achieve a near > 90% speed up when I double the cores and problem size for my current 48/96 cores setup, extrapolating from about 96 nodes to 10,000 nodes will give a much lower expected parallel efficiency for the new case.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> However, it's mentioned in the FAQ that due to memory requirements, it's impossible to get >90% speed up when I double the cores and problem size (i.e. a linear increase in performance), which means that I can't get >90% speed up when I double the cores and problem size for my current 48/96 cores setup. Is that so?
>>>>>>>>>>>>>>>> What is the output of -ksp_view -log_summary on the problem and then on the problem doubled in size and number of processors?
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> Barry
>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I have attached the output:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> 48 cores: log48
>>>>>>>>>>>>>>> 96 cores: log96
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> There are 2 solvers - the momentum linear eqn uses bcgs, while the Poisson eqn uses hypre BoomerAMG.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Problem size doubled from 158x266x150 to 158x266x300.
>>>>>>>>>>>>>>>>> So is it fair to say that the main problem does not lie in my programming skills, but rather in the way the linear equations are solved?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Thanks.
>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Matt
>>>>>>>>>>>>>>>>>> Is it possible for this type of scaling in PETSc (>50%), when using 17640 (2205 x 8) cores?
>>>>>>>>>>>>>>>>>> Btw, I do not have access to the system.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> Sent using CloudMagic Email
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
>>>>>>>>>>>>>>>>>> -- Norbert Wiener
>>>>>>>>>>>>>>> <log48.txt><log96.txt>
>>>>>>>>>>>>> <log48_10.txt><log48.txt><log96.txt>
>>>>>>>>>>> <log96_100.txt><log48_100.txt>
>>>>>>>>> <log96_100_2.txt><log48_100_2.txt>
>>>>>>> <log64_100.txt><log8_100.txt>
>>> <log.txt>
>
> <log64_100_2.txt><log8_100_2.txt>
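For reference on the scaling discussion above, and using only the generic textbook definitions (the proposal's own formula is not quoted in the thread): with T_n the elapsed time on n processes, strong-scaling efficiency is E(n) = T_1 / (n * T_n) for a fixed total problem size, weak-scaling efficiency is E(n) = T_1 / T_n for a fixed problem size per process, and Amdahl's law gives the speedup S(n) = 1 / ((1 - p) + p/n) for a parallelizable fraction p. Using only the timings quoted above, 140 min on 48 cores and 90 min on 96 cores, the relative strong-scaling efficiency of that one doubling is (48 * 140) / (96 * 90), roughly 78%; the 52.7% figure is the proposal's Amdahl-based extrapolation, which is a different quantity.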
