> On Nov 5, 2015, at 9:58 AM, TAY wee-beng <[email protected]> wrote:
>
> Sorry I realised that I didn't use gamg and that's why. But if I use gamg, the 8 core case worked, but the 64 core case shows p diverged.
>
> Why is this so? Btw, I have also added nullspace in my code.
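The divergence described here can also be queried from the code, not just from the command-line monitors suggested below. A small, hypothetical C sketch (CheckSolve is an illustrative name, not code from this thread) using KSPGetConvergedReason; the "reason = -9" printed further down in the thread corresponds to KSP_DIVERGED_NANORINF, i.e. a NaN or Inf appeared during the iteration:

    #include <petscksp.h>

    /* Query why a KSP solve stopped; negative reasons mean divergence.
       Reason -9 is KSP_DIVERGED_NANORINF (a NaN or Inf was produced). */
    PetscErrorCode CheckSolve(KSP ksp)
    {
      PetscErrorCode     ierr;
      KSPConvergedReason reason;

      ierr = KSPGetConvergedReason(ksp, &reason);CHKERRQ(ierr);
      if (reason < 0) {
        ierr = PetscPrintf(PETSC_COMM_WORLD, "Solve diverged, reason = %d\n", (int)reason);CHKERRQ(ierr);
      }
      return 0;
    }

On the command line the equivalent information comes from -poisson_ksp_converged_reason, as suggested further down in the thread.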
   You don't need the null space and should not add it.

> Thank you.
>
> Yours sincerely,
>
> TAY wee-beng
>
> On 5/11/2015 12:03 PM, Barry Smith wrote:
>>    There is a problem here. The -log_summary doesn't show all the events associated with the -pc_type gamg preconditioner; it should have rows like
>>
>> VecDot                 2 1.0 6.1989e-06 1.0 1.00e+04 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  1613
>> VecMDot              134 1.0 5.4145e-04 1.0 1.64e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  3  0  0  0   0  3  0  0  0  3025
>> VecNorm              154 1.0 2.4176e-04 1.0 3.82e+05 1.0 0.0e+00 0.0e+00 0.0e+00  0  1  0  0  0   0  1  0  0  0  1578
>> VecScale             148 1.0 1.6928e-04 1.0 1.76e+05 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  1039
>> VecCopy              106 1.0 1.2255e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>> VecSet               474 1.0 5.1236e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>> VecAXPY               54 1.0 1.3471e-04 1.0 2.35e+05 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  1742
>> VecAYPX              384 1.0 5.7459e-04 1.0 4.94e+05 1.0 0.0e+00 0.0e+00 0.0e+00  0  1  0  0  0   0  1  0  0  0   860
>> VecAXPBYCZ           192 1.0 4.7398e-04 1.0 9.88e+05 1.0 0.0e+00 0.0e+00 0.0e+00  0  2  0  0  0   0  2  0  0  0  2085
>> VecWAXPY               2 1.0 7.8678e-06 1.0 5.00e+03 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   636
>> VecMAXPY             148 1.0 8.1539e-04 1.0 1.96e+06 1.0 0.0e+00 0.0e+00 0.0e+00  1  3  0  0  0   1  3  0  0  0  2399
>> VecPointwiseMult      66 1.0 1.1253e-04 1.0 6.79e+04 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   604
>> VecScatterBegin       45 1.0 6.3419e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>> VecSetRandom           6 1.0 3.0994e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>> VecReduceArith         4 1.0 1.3113e-05 1.0 2.00e+04 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0  1525
>> VecReduceComm          2 1.0 0.0000e+00 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>> VecNormalize         148 1.0 4.4799e-04 1.0 5.27e+05 1.0 0.0e+00 0.0e+00 0.0e+00  0  1  0  0  0   0  1  0  0  0  1177
>> MatMult              424 1.0 8.9276e-03 1.0 2.09e+07 1.0 0.0e+00 0.0e+00 0.0e+00  7 37  0  0  0   7 37  0  0  0  2343
>> MatMultAdd            48 1.0 5.0926e-04 1.0 1.05e+06 1.0 0.0e+00 0.0e+00 0.0e+00  0  2  0  0  0   0  2  0  0  0  2069
>> MatMultTranspose      48 1.0 9.8586e-04 1.0 1.05e+06 1.0 0.0e+00 0.0e+00 0.0e+00  1  2  0  0  0   1  2  0  0  0  1069
>> MatSolve              16 1.0 2.2173e-05 1.0 1.02e+04 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   460
>> MatSOR               354 1.0 1.0547e-02 1.0 1.72e+07 1.0 0.0e+00 0.0e+00 0.0e+00  9 31  0  0  0   9 31  0  0  0  1631
>> MatLUFactorSym         2 1.0 4.7922e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>> MatLUFactorNum         2 1.0 2.5272e-05 1.0 7.75e+03 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   307
>> MatScale              18 1.0 1.7142e-04 1.0 1.50e+05 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   874
>> MatResidual           48 1.0 1.0548e-03 1.0 2.33e+06 1.0 0.0e+00 0.0e+00 0.0e+00  1  4  0  0  0   1  4  0  0  0  2212
>> MatAssemblyBegin      57 1.0 4.7684e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>> MatAssemblyEnd        57 1.0 1.9786e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  2  0  0  0  0   2  0  0  0  0     0
>> MatGetRow          21616 1.0 1.8497e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  2  0  0  0  0   2  0  0  0  0     0
>> MatGetRowIJ            2 1.0 6.9141e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>> MatGetOrdering         2 1.0 6.0797e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>> MatCoarsen             6 1.0 9.3222e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>> MatZeroEntries         2 1.0 3.9101e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>> MatAXPY                6 1.0 1.7998e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  2  0  0  0  0   2  0  0  0  0     0
>> MatFDColorCreate       1 1.0 3.2902e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>> MatFDColorSetUp        1 1.0 1.6739e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0     0
>> MatFDColorApply        2 1.0 1.3199e-03 1.0 2.41e+06 1.0 0.0e+00 0.0e+00 0.0e+00  1  4  0  0  0   1  4  0  0  0  1826
>> MatFDColorFunc        42 1.0 7.4601e-04 1.0 2.20e+06 1.0 0.0e+00 0.0e+00 0.0e+00  1  4  0  0  0   1  4  0  0  0  2956
>> MatMatMult             6 1.0 5.1048e-03 1.0 1.23e+06 1.0 0.0e+00 0.0e+00 0.0e+00  4  2  0  0  0   4  2  0  0  0   241
>> MatMatMultSym          6 1.0 3.2601e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  3  0  0  0  0   3  0  0  0  0     0
>> MatMatMultNum          6 1.0 1.8158e-03 1.0 1.23e+06 1.0 0.0e+00 0.0e+00 0.0e+00  2  2  0  0  0   2  2  0  0  0   679
>> MatPtAP                6 1.0 2.1328e-02 1.0 6.03e+06 1.0 0.0e+00 0.0e+00 0.0e+00 18 11  0  0  0  18 11  0  0  0   283
>> MatPtAPSymbolic        6 1.0 1.0073e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  8  0  0  0  0   8  0  0  0  0     0
>> MatPtAPNumeric         6 1.0 1.1230e-02 1.0 6.03e+06 1.0 0.0e+00 0.0e+00 0.0e+00  9 11  0  0  0   9 11  0  0  0   537
>> MatTrnMatMult          2 1.0 7.2789e-04 1.0 5.44e+04 1.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0    75
>> MatTrnMatMultSym       2 1.0 5.7006e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>> MatTrnMatMultNum       2 1.0 1.5473e-04 1.0 5.44e+04 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0   352
>> MatGetSymTrans         8 1.0 3.1638e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>> KSPGMRESOrthog       134 1.0 1.3156e-03 1.0 3.28e+06 1.0 0.0e+00 0.0e+00 0.0e+00  1  6  0  0  0   1  6  0  0  0  2491
>> KSPSetUp              24 1.0 4.6754e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>> KSPSolve               2 1.0 1.1291e-01 1.0 5.32e+07 1.0 0.0e+00 0.0e+00 0.0e+00 94 95  0  0  0  94 95  0  0  0   471
>> PCGAMGGraph_AGG        6 1.0 1.2108e-02 1.0 1.82e+04 1.0 0.0e+00 0.0e+00 0.0e+00 10  0  0  0  0  10  0  0  0  0     2
>> PCGAMGCoarse_AGG       6 1.0 1.1127e-03 1.0 5.44e+04 1.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0    49
>> PCGAMGProl_AGG         6 1.0 4.1062e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 34  0  0  0  0  34  0  0  0  0     0
>> PCGAMGPOpt_AGG         6 1.0 1.1200e-02 1.0 5.98e+06 1.0 0.0e+00 0.0e+00 0.0e+00  9 11  0  0  0   9 11  0  0  0   534
>> GAMG: createProl       6 1.0 6.5530e-02 1.0 6.06e+06 1.0 0.0e+00 0.0e+00 0.0e+00 55 11  0  0  0  55 11  0  0  0    92
>> Graph                 12 1.0 1.1692e-02 1.0 1.82e+04 1.0 0.0e+00 0.0e+00 0.0e+00 10  0  0  0  0  10  0  0  0  0     2
>> MIS/Agg                6 1.0 1.4496e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>> SA: col data           6 1.0 7.1526e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
>> SA: frmProl0           6 1.0 4.0917e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 34  0  0  0  0  34  0  0  0  0     0
>> SA: smooth             6 1.0 1.1198e-02 1.0 5.98e+06 1.0 0.0e+00 0.0e+00 0.0e+00  9 11  0  0  0   9 11  0  0  0   534
>> GAMG: partLevel        6 1.0 2.1341e-02 1.0 6.03e+06 1.0 0.0e+00 0.0e+00 0.0e+00 18 11  0  0  0  18 11  0  0  0   283
>> PCSetUp                4 1.0 8.8020e-02 1.0 1.21e+07 1.0 0.0e+00 0.0e+00 0.0e+00 74 22  0  0  0  74 22  0  0  0   137
>> PCSetUpOnBlocks       16 1.0 1.8382e-04 1.0 7.75e+03 1.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0    42
>> PCApply               16 1.0 2.3858e-02 1.0 3.91e+07 1.0 0.0e+00 0.0e+00 0.0e+00 20 70  0  0  0  20 70  0  0  0  1637
>>
>>
>>    Are you sure you ran with -pc_type gamg? What about running with -info, does it print anything about gamg? What about -ksp_view, does it indicate it is using the gamg preconditioner?
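Since the options in this thread carry the "poisson_" prefix, one common reason for gamg silently not being used is that the prefix was never attached to the KSP, or KSPSetFromOptions() is never called, in which case the command-line option is simply ignored. A minimal sketch of the wiring, assuming a C application and an illustrative routine name (this is not the poster's actual code):

    #include <petscksp.h>

    /* Attach the "poisson_" prefix so that -poisson_pc_type gamg,
       -poisson_ksp_monitor_true_residual, etc. are actually read.
       -poisson_ksp_view will then confirm which preconditioner is in use. */
    PetscErrorCode SetupPoissonSolver(MPI_Comm comm, Mat A, KSP *ksp)
    {
      PetscErrorCode ierr;

      ierr = KSPCreate(comm, ksp);CHKERRQ(ierr);
      ierr = KSPSetOperators(*ksp, A, A);CHKERRQ(ierr);
      ierr = KSPSetOptionsPrefix(*ksp, "poisson_");CHKERRQ(ierr);
      ierr = KSPSetFromOptions(*ksp);CHKERRQ(ierr);   /* without this call, command-line options are ignored */
      return 0;
    }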
>>
>>
>>> On Nov 4, 2015, at 9:30 PM, TAY wee-beng <[email protected]> wrote:
>>>
>>> Hi,
>>>
>>> I have attached the 2 logs.
>>>
>>> Thank you
>>> Yours sincerely,
>>> TAY wee-beng
>>>
>>> On 4/11/2015 1:11 AM, Barry Smith wrote:
>>>>    Ok, the convergence looks good. Now run on 8 and 64 processes as before with -log_summary and not -ksp_monitor to see how it scales.
>>>>
>>>>    Barry
>>>>
>>>>> On Nov 3, 2015, at 6:49 AM, TAY wee-beng <[email protected]> wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> I tried and have attached the log.
>>>>>
>>>>> Ya, my Poisson eqn has Neumann boundary conditions. Do I need to specify some null space stuff? Like KSPSetNullSpace or MatNullSpaceCreate?
>>>>>
>>>>> Thank you
>>>>> Yours sincerely,
>>>>> TAY wee-beng
>>>>>
>>>>> On 3/11/2015 12:45 PM, Barry Smith wrote:
>>>>>>> On Nov 2, 2015, at 10:37 PM, TAY wee-beng <[email protected]> wrote:
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I tried:
>>>>>>>
>>>>>>> 1. -poisson_pc_gamg_agg_nsmooths 1 -poisson_pc_type gamg
>>>>>>>
>>>>>>> 2. -poisson_pc_type gamg
>>>>>>
>>>>>>    Run with -poisson_ksp_monitor_true_residual -poisson_ksp_converged_reason.
>>>>>>
>>>>>>    Does your Poisson equation have Neumann boundary conditions? Do you have any zeros on the diagonal of the matrix (you shouldn't)?
>>>>>>
>>>>>>    There may be something wrong with your Poisson discretization that was also messing up hypre.
>>>>>>
>>>>>>> Both options give:
>>>>>>>
>>>>>>> 1      0.00150000      0.00000000      0.00000000      1.00000000
>>>>>>> NaN NaN NaN
>>>>>>> M Diverged but why?, time = 2
>>>>>>> reason = -9
>>>>>>>
>>>>>>> How can I check what's wrong?
>>>>>>>
>>>>>>> Thank you
>>>>>>> Yours sincerely,
>>>>>>> TAY wee-beng
>>>>>>>
>>>>>>> On 3/11/2015 3:18 AM, Barry Smith wrote:
>>>>>>>>    hypre is just not scaling well here. I do not know why. Since hypre is a black box for us, there is no way to determine why the scaling is poor.
>>>>>>>>
>>>>>>>>    If you make the same two runs with -pc_type gamg, there will be a lot more information in the log summary about which routines are scaling well or poorly.
>>>>>>>>
>>>>>>>>    Barry
>>>>>>>>
>>>>>>>>> On Nov 2, 2015, at 3:17 AM, TAY wee-beng <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> I have attached the 2 files.
>>>>>>>>>
>>>>>>>>> Thank you
>>>>>>>>> Yours sincerely,
>>>>>>>>> TAY wee-beng
>>>>>>>>>
>>>>>>>>> On 2/11/2015 2:55 PM, Barry Smith wrote:
>>>>>>>>>>    Run the (158/2)x(266/2)x(150/2) grid on 8 processes and then (158)x(266)x(150) on 64 processes and send the two -log_summary results.
>>>>>>>>>>
>>>>>>>>>>    Barry
>>>>>>>>>>
>>>>>>>>>>> On Nov 2, 2015, at 12:19 AM, TAY wee-beng <[email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> I have attached the new results.
>>>>>>>>>>>
>>>>>>>>>>> Thank you
>>>>>>>>>>> Yours sincerely,
>>>>>>>>>>> TAY wee-beng
>>>>>>>>>>>
>>>>>>>>>>> On 2/11/2015 12:27 PM, Barry Smith wrote:
>>>>>>>>>>>>    Run without the -momentum_ksp_view -poisson_ksp_view and send the new results.
>>>>>>>>>>>>
>>>>>>>>>>>>    You can see from the log summary that PCSetUp is taking a much smaller percentage of the time, meaning that it is reusing the preconditioner and not rebuilding it each time.
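For reference, the MatNullSpaceCreate/MatSetNullSpace calls asked about above look roughly like the sketch below for the constant null space of a pure-Neumann Poisson operator (illustrative names, not code from this thread; note that Barry's reply at the top of the thread is that it should not be added for this case):

    #include <petscksp.h>

    /* Tell PETSc that A has the constant vector in its null space,
       as a pure-Neumann Poisson operator does. */
    PetscErrorCode AttachConstantNullSpace(Mat A)
    {
      PetscErrorCode ierr;
      MatNullSpace   nullsp;

      /* PETSC_TRUE: the null space contains the constant vector */
      ierr = MatNullSpaceCreate(PetscObjectComm((PetscObject)A), PETSC_TRUE, 0, NULL, &nullsp);CHKERRQ(ierr);
      ierr = MatSetNullSpace(A, nullsp);CHKERRQ(ierr);   /* the KSP picks this up during the solve */
      ierr = MatNullSpaceDestroy(&nullsp);CHKERRQ(ierr);
      return 0;
    }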
>>>>>>>>>>>>
>>>>>>>>>>>>    Barry
>>>>>>>>>>>>
>>>>>>>>>>>>    Something makes no sense with the output: it gives
>>>>>>>>>>>>
>>>>>>>>>>>> KSPSolve             199 1.0 2.3298e+03 1.0 5.20e+09 1.8 3.8e+04 9.9e+05 5.0e+02 90100 66100 24 90100 66100 24   165
>>>>>>>>>>>>
>>>>>>>>>>>>    90% of the time is in the solve, but there is no significant amount of time in other events of the code, which is just not possible. I hope it is due to your IO.
>>>>>>>>>>>>
>>>>>>>>>>>>> On Nov 1, 2015, at 10:02 PM, TAY wee-beng <[email protected]> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>
>>>>>>>>>>>>> I have attached the new run with 100 time steps for 48 and 96 cores.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Only the Poisson eqn's RHS changes, the LHS doesn't. So if I want to reuse the preconditioner, what must I do? Or what must I not do?
>>>>>>>>>>>>>
>>>>>>>>>>>>> Why does the number of processes increase so much? Is there something wrong with my coding? Seems to be so too for my new run.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thank you
>>>>>>>>>>>>> Yours sincerely,
>>>>>>>>>>>>> TAY wee-beng
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 2/11/2015 9:49 AM, Barry Smith wrote:
>>>>>>>>>>>>>>    If you are doing many time steps with the same linear solver, then you MUST do your weak scaling studies with MANY time steps, since the setup time of AMG only takes place in the first timestep. So run both 48 and 96 processes with the same large number of time steps.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>    Barry
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Nov 1, 2015, at 7:35 PM, TAY wee-beng <[email protected]> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Sorry, I forgot and used the old a.out. I have attached the new log for 48 cores (log48), together with the 96 cores log (log96).
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Why does the number of processes increase so much? Is there something wrong with my coding?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Only the Poisson eqn's RHS changes, the LHS doesn't. So if I want to reuse the preconditioner, what must I do? Or what must I not do?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Lastly, I only simulated 2 time steps previously. Now I run for 10 timesteps (log48_10). Is it building the preconditioner at every timestep?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Also, what about the momentum eqn? Is it working well?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I will try the gamg later too.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thank you
>>>>>>>>>>>>>>> Yours sincerely,
>>>>>>>>>>>>>>> TAY wee-beng
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On 2/11/2015 12:30 AM, Barry Smith wrote:
>>>>>>>>>>>>>>>>    You used gmres with 48 processes but richardson with 96. You need to be careful and make sure you don't change the solvers when you change the number of processors, since you can get very different, inconsistent results.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>    Anyways, all the time is being spent in the BoomerAMG algebraic multigrid setup and it is scaling badly. When you double the problem size and number of processes it went from 3.2445e+01 to 4.3599e+02 seconds.
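On the question of reusing the preconditioner when only the right-hand side changes: as long as the KSP is created once, given the (unchanging) operator once, and only KSPSolve() is called each step, PETSc builds the AMG hierarchy during the first solve and reuses it afterwards. A minimal sketch, assuming a C application and an illustrative routine name (not the poster's code):

    #include <petscksp.h>

    /* ksp was created once and given the (unchanging) Poisson matrix once,
       e.g. as in the SetupPoissonSolver sketch earlier; nothing else is needed.
       The AMG hierarchy is built inside the first KSPSolve() and reused. */
    PetscErrorCode PoissonTimeLoop(KSP ksp, Vec b, Vec x, PetscInt nsteps)
    {
      PetscErrorCode ierr;
      PetscInt       step;

      for (step = 0; step < nsteps; step++) {
        /* ... update the entries of b for this time step ... */
        ierr = KSPSolve(ksp, b, x);CHKERRQ(ierr);   /* PCSetUp only does real work on the first call */
      }
      return 0;
    }

If the matrix did change but one still wanted to keep the old preconditioner, KSPSetReusePreconditioner() is the explicit switch; here it is not needed because the LHS is constant.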
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> PCSetUp                3 1.0 3.2445e+01 1.0 9.58e+06 2.0 0.0e+00 0.0e+00 4.0e+00 62  8  0  0  4  62  8  0  0  5    11
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>> PCSetUp                3 1.0 4.3599e+02 1.0 9.58e+06 2.0 0.0e+00 0.0e+00 4.0e+00 85 18  0  0  6  85 18  0  0  6     2
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>    Now, is the Poisson problem changing at each timestep, or can you use the same preconditioner built with BoomerAMG for all the time steps? Algebraic multigrid has a large setup time that often doesn't matter if you have many time steps, but if you have to rebuild it each timestep it may be too large.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>    You might also try -pc_type gamg and see how PETSc's algebraic multigrid scales for your problem/machine.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>    Barry
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On Nov 1, 2015, at 7:30 AM, TAY wee-beng <[email protected]> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On 1/11/2015 10:00 AM, Barry Smith wrote:
>>>>>>>>>>>>>>>>>>> On Oct 31, 2015, at 8:43 PM, TAY wee-beng <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On 1/11/2015 12:47 AM, Matthew Knepley wrote:
>>>>>>>>>>>>>>>>>>>> On Sat, Oct 31, 2015 at 11:34 AM, TAY wee-beng <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> I understand that, as mentioned in the FAQ, due to the limitations in memory the scaling is not linear. So, I am trying to write a proposal to use a supercomputer. Its specs are:
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Compute nodes: 82,944 nodes (SPARC64 VIIIfx; 16GB of memory per node)
>>>>>>>>>>>>>>>>>>>> 8 cores / processor
>>>>>>>>>>>>>>>>>>>> Interconnect: Tofu (6-dimensional mesh/torus)
>>>>>>>>>>>>>>>>>>>> Each cabinet contains 96 computing nodes.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> One of the requirements is to give the performance of my current code with my current set of data, and there is a formula to calculate the estimated parallel efficiency when using the new large set of data.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> There are 2 ways to give performance:
>>>>>>>>>>>>>>>>>>>> 1. Strong scaling, which is defined as how the elapsed time varies with the number of processors for a fixed problem.
>>>>>>>>>>>>>>>>>>>> 2. Weak scaling, which is defined as how the elapsed time varies with the number of processors for a fixed problem size per processor.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> I ran my cases with 48 and 96 cores on my current cluster, giving 140 and 90 mins respectively. This is classified as strong scaling.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Cluster specs:
>>>>>>>>>>>>>>>>>>>> CPU: AMD 6234 2.4GHz
>>>>>>>>>>>>>>>>>>>> 8 cores / processor (CPU)
>>>>>>>>>>>>>>>>>>>> 6 CPU / node
>>>>>>>>>>>>>>>>>>>> So 48 cores / node
>>>>>>>>>>>>>>>>>>>> Not sure about the memory / node
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> The parallel efficiency ‘En’ for a given degree of parallelism ‘n’ indicates how efficiently the program is accelerated by parallel processing.
>>>>>>>>>>>>>>>>>>>> ‘En’ is given by the following formulae. Although their derivation processes are different depending on strong and weak scaling, the derived formulae are the same.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> From the estimated time, my parallel efficiency using Amdahl's law on the current old cluster was 52.7%.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> So are my results acceptable?
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> For the large data set, if using 2205 nodes (2205 x 8 cores), my expected parallel efficiency is only 0.5%. The proposal recommends a value of > 50%.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> The problem with this analysis is that the estimated serial fraction from Amdahl's Law changes as a function of problem size, so you cannot take the strong scaling from one problem and apply it to another without a model of this dependence.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Weak scaling does model changes with problem size, so I would measure weak scaling on your current cluster, and extrapolate to the big machine. I realize that this does not make sense for many scientific applications, but neither does requiring a certain parallel efficiency.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Ok, I checked the results for my weak scaling; it is even worse for the expected parallel efficiency. From the formula used, it's obvious it's doing some sort of exponential decrease in the extrapolation. So unless I can achieve a nearly > 90% speed-up when I double the cores and problem size for my current 48/96 cores setup, extrapolating from about 96 nodes to 10,000 nodes will give a much lower expected parallel efficiency for the new case.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> However, it's mentioned in the FAQ that due to memory requirements it's impossible to get a > 90% speed-up when I double the cores and problem size (i.e. a linear increase in performance), which means that I can't get a > 90% speed-up when I double the cores and problem size for my current 48/96 cores setup. Is that so?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>    What is the output of -ksp_view -log_summary on the problem, and then on the problem doubled in size and number of processors?
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>    Barry
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I have attached the output:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> 48 cores: log48
>>>>>>>>>>>>>>>>> 96 cores: log96
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> There are 2 solvers - the momentum linear eqn uses bcgs, while the Poisson eqn uses hypre BoomerAMG.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Problem size doubled from 158x266x150 to 158x266x300.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> So is it fair to say that the main problem does not lie in my programming skills, but rather in the way the linear equations are solved?
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Thanks.
>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>    Matt
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Is it possible for this type of scaling in PETSc (> 50%), when using 17640 (2205 x 8) cores?
>>>>>>>>>>>>>>>>>>>> Btw, I do not have access to the system.
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> Sent using CloudMagic Email
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
>>>>>>>>>>>>>>>>>>>> -- Norbert Wiener
>>>>>>>>>>>>>>>>> <log48.txt><log96.txt>
>>>>>>>>>>>>>>> <log48_10.txt><log48.txt><log96.txt>
>>>>>>>>>>>>> <log96_100.txt><log48_100.txt>
>>>>>>>>>>> <log96_100_2.txt><log48_100_2.txt>
>>>>>>>>> <log64_100.txt><log8_100.txt>
>>>>> <log.txt>
>>> <log64_100_2.txt><log8_100_2.txt>
>
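As a rough illustration of the kind of extrapolation discussed above (plain Amdahl's law with illustrative code, not the proposal's own formula, which is not quoted in this thread, so it will not reproduce the 52.7% or 0.5% figures exactly), the serial fraction can be fitted from the two measured timings, 140 min on 48 cores and 90 min on 96 cores, and then used to project strong-scaling efficiency at larger core counts:

    #include <stdio.h>

    int main(void)
    {
      double t48 = 140.0, t96 = 90.0;   /* measured wall-clock minutes */
      /* Model t(m) = t48 * (f + (1 - f)/m), where m is the core count relative
         to 48 cores; fit the serial fraction f from the m = 2 measurement. */
      double f = 2.0 * (t96 / t48 - 0.5);
      int    m;

      printf("estimated serial fraction f = %.3f\n", f);
      for (m = 2; m <= 512; m *= 4) {
        double speedup    = 1.0 / (f + (1.0 - f) / m);
        double efficiency = speedup / m;
        printf("m = %4d (%6d cores): projected efficiency = %5.1f%%\n", m, 48 * m, 100.0 * efficiency);
      }
      return 0;
    }

Even a modest serial fraction makes the projected efficiency collapse at thousands of cores, which is why measuring weak scaling and extrapolating from that, as suggested in the thread, is the more informative exercise.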
