   Ok, the 64-process case not converging makes no sense. Run it with -ksp_monitor and -ksp_converged_reason turned on for the pressure solve, and also with -info.
You need to figure out why it is not converging.

  Barry

> On Nov 5, 2015, at 8:47 PM, TAY wee-beng <[email protected]> wrote:
>
> Hi,
>
> I have removed the nullspace and attached the new logs.
>
> Thank you
> Yours sincerely,
> TAY wee-beng
>
> On 6/11/2015 12:07 AM, Barry Smith wrote:
>>> On Nov 5, 2015, at 9:58 AM, TAY wee-beng <[email protected]> wrote:
>>>
>>> Sorry, I realised that I didn't use gamg and that's why. But if I use gamg, the 8 core case worked, but the 64 core case shows p diverged.
>>
>>   Where is the log file for the 8 core case? And where is all the output from where it fails with 64 cores? Include -ksp_monitor_true_residual and -ksp_converged_reason
>>
>>   Barry
>>
>>> Why is this so? Btw, I have also added the nullspace in my code.
>>>
>>> Thank you.
>>> Yours sincerely,
>>> TAY wee-beng
>>>
>>> On 5/11/2015 12:03 PM, Barry Smith wrote:
>>>>   There is a problem here. The -log_summary doesn't show all the events associated with the -pc_type gamg preconditioner; it should have rows like
>>>>
>>>> VecDot                 2 1.0 6.1989e-06 1.0 1.00e+04 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1613
>>>> VecMDot              134 1.0 5.4145e-04 1.0 1.64e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 3 0 0 0 0 3 0 0 0 3025
>>>> VecNorm              154 1.0 2.4176e-04 1.0 3.82e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 1578
>>>> VecScale             148 1.0 1.6928e-04 1.0 1.76e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1039
>>>> VecCopy              106 1.0 1.2255e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>>>> VecSet               474 1.0 5.1236e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>>>> VecAXPY               54 1.0 1.3471e-04 1.0 2.35e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1742
>>>> VecAYPX              384 1.0 5.7459e-04 1.0 4.94e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 860
>>>> VecAXPBYCZ           192 1.0 4.7398e-04 1.0 9.88e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 2 0 0 0 0 2 0 0 0 2085
>>>> VecWAXPY               2 1.0 7.8678e-06 1.0 5.00e+03 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 636
>>>> VecMAXPY             148 1.0 8.1539e-04 1.0 1.96e+06 1.0 0.0e+00 0.0e+00 0.0e+00 1 3 0 0 0 1 3 0 0 0 2399
>>>> VecPointwiseMult      66 1.0 1.1253e-04 1.0 6.79e+04 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 604
>>>> VecScatterBegin       45 1.0 6.3419e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>>>> VecSetRandom           6 1.0 3.0994e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>>>> VecReduceArith         4 1.0 1.3113e-05 1.0 2.00e+04 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 1525
>>>> VecReduceComm          2 1.0 0.0000e+00 0.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>>>> VecNormalize         148 1.0 4.4799e-04 1.0 5.27e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 1 0 0 0 0 1 0 0 0 1177
>>>> MatMult              424 1.0 8.9276e-03 1.0 2.09e+07 1.0 0.0e+00 0.0e+00 0.0e+00 7 37 0 0 0 7 37 0 0 0 2343
>>>> MatMultAdd            48 1.0 5.0926e-04 1.0 1.05e+06 1.0 0.0e+00 0.0e+00 0.0e+00 0 2 0 0 0 0 2 0 0 0 2069
>>>> MatMultTranspose      48 1.0 9.8586e-04 1.0 1.05e+06 1.0 0.0e+00 0.0e+00 0.0e+00 1 2 0 0 0 1 2 0 0 0 1069
>>>> MatSolve              16 1.0 2.2173e-05 1.0 1.02e+04 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 460
>>>> MatSOR               354 1.0 1.0547e-02 1.0 1.72e+07 1.0 0.0e+00 0.0e+00 0.0e+00 9 31 0 0 0 9 31 0 0 0 1631
>>>> MatLUFactorSym         2 1.0 4.7922e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>>>> MatLUFactorNum         2 1.0 2.5272e-05 1.0 7.75e+03 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 307
>>>> MatScale              18 1.0 1.7142e-04 1.0 1.50e+05 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 874
>>>> MatResidual           48 1.0 1.0548e-03 1.0 2.33e+06 1.0 0.0e+00 0.0e+00 0.0e+00 1 4 0 0 0 1 4 0 0 0 2212
>>>> MatAssemblyBegin      57 1.0 4.7684e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>>>> MatAssemblyEnd        57 1.0 1.9786e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 0
>>>> MatGetRow          21616 1.0 1.8497e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 0
>>>> MatGetRowIJ            2 1.0 6.9141e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>>>> MatGetOrdering         2 1.0 6.0797e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>>>> MatCoarsen             6 1.0 9.3222e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>>>> MatZeroEntries         2 1.0 3.9101e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>>>> MatAXPY                6 1.0 1.7998e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 2 0 0 0 0 2 0 0 0 0 0
>>>> MatFDColorCreate       1 1.0 3.2902e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>>>> MatFDColorSetUp        1 1.0 1.6739e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 0
>>>> MatFDColorApply        2 1.0 1.3199e-03 1.0 2.41e+06 1.0 0.0e+00 0.0e+00 0.0e+00 1 4 0 0 0 1 4 0 0 0 1826
>>>> MatFDColorFunc        42 1.0 7.4601e-04 1.0 2.20e+06 1.0 0.0e+00 0.0e+00 0.0e+00 1 4 0 0 0 1 4 0 0 0 2956
>>>> MatMatMult             6 1.0 5.1048e-03 1.0 1.23e+06 1.0 0.0e+00 0.0e+00 0.0e+00 4 2 0 0 0 4 2 0 0 0 241
>>>> MatMatMultSym          6 1.0 3.2601e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 3 0 0 0 0 3 0 0 0 0 0
>>>> MatMatMultNum          6 1.0 1.8158e-03 1.0 1.23e+06 1.0 0.0e+00 0.0e+00 0.0e+00 2 2 0 0 0 2 2 0 0 0 679
>>>> MatPtAP                6 1.0 2.1328e-02 1.0 6.03e+06 1.0 0.0e+00 0.0e+00 0.0e+00 18 11 0 0 0 18 11 0 0 0 283
>>>> MatPtAPSymbolic        6 1.0 1.0073e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 8 0 0 0 0 8 0 0 0 0 0
>>>> MatPtAPNumeric         6 1.0 1.1230e-02 1.0 6.03e+06 1.0 0.0e+00 0.0e+00 0.0e+00 9 11 0 0 0 9 11 0 0 0 537
>>>> MatTrnMatMult          2 1.0 7.2789e-04 1.0 5.44e+04 1.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 75
>>>> MatTrnMatMultSym       2 1.0 5.7006e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>>>> MatTrnMatMultNum       2 1.0 1.5473e-04 1.0 5.44e+04 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 352
>>>> MatGetSymTrans         8 1.0 3.1638e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>>>> KSPGMRESOrthog       134 1.0 1.3156e-03 1.0 3.28e+06 1.0 0.0e+00 0.0e+00 0.0e+00 1 6 0 0 0 1 6 0 0 0 2491
>>>> KSPSetUp              24 1.0 4.6754e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>>>> KSPSolve               2 1.0 1.1291e-01 1.0 5.32e+07 1.0 0.0e+00 0.0e+00 0.0e+00 94 95 0 0 0 94 95 0 0 0 471
>>>> PCGAMGGraph_AGG        6 1.0 1.2108e-02 1.0 1.82e+04 1.0 0.0e+00 0.0e+00 0.0e+00 10 0 0 0 0 10 0 0 0 0 2
>>>> PCGAMGCoarse_AGG       6 1.0 1.1127e-03 1.0 5.44e+04 1.0 0.0e+00 0.0e+00 0.0e+00 1 0 0 0 0 1 0 0 0 0 49
>>>> PCGAMGProl_AGG         6 1.0 4.1062e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 34 0 0 0 0 34 0 0 0 0 0
>>>> PCGAMGPOpt_AGG         6 1.0 1.1200e-02 1.0 5.98e+06 1.0 0.0e+00 0.0e+00 0.0e+00 9 11 0 0 0 9 11 0 0 0 534
>>>> GAMG: createProl       6 1.0 6.5530e-02 1.0 6.06e+06 1.0 0.0e+00 0.0e+00 0.0e+00 55 11 0 0 0 55 11 0 0 0 92
>>>> Graph                 12 1.0 1.1692e-02 1.0 1.82e+04 1.0 0.0e+00 0.0e+00 0.0e+00 10 0 0 0 0 10 0 0 0 0 2
>>>> MIS/Agg                6 1.0 1.4496e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>>>> SA: col data           6 1.0 7.1526e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 0
>>>> SA: frmProl0           6 1.0 4.0917e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00 34 0 0 0 0 34 0 0 0 0 0
>>>> SA: smooth             6 1.0 1.1198e-02 1.0 5.98e+06 1.0 0.0e+00 0.0e+00 0.0e+00 9 11 0 0 0 9 11 0 0 0 534
>>>> GAMG: partLevel        6 1.0 2.1341e-02 1.0 6.03e+06 1.0 0.0e+00 0.0e+00 0.0e+00 18 11 0 0 0 18 11 0 0 0 283
>>>> PCSetUp                4 1.0 8.8020e-02 1.0 1.21e+07 1.0 0.0e+00 0.0e+00 0.0e+00 74 22 0 0 0 74 22 0 0 0 137
>>>> PCSetUpOnBlocks       16 1.0 1.8382e-04 1.0 7.75e+03 1.0 0.0e+00 0.0e+00 0.0e+00 0 0 0 0 0 0 0 0 0 0 42
>>>> PCApply               16 1.0 2.3858e-02 1.0 3.91e+07 1.0 0.0e+00 0.0e+00 0.0e+00 20 70 0 0 0 20 70 0 0 0 1637
>>>>
>>>>   Are you sure you ran with -pc_type gamg? What about running with -info: does it print anything about gamg? What about -ksp_view: does it indicate it is using the gamg preconditioner?
>>>>
>>>>> On Nov 4, 2015, at 9:30 PM, TAY wee-beng <[email protected]> wrote:
>>>>>
>>>>> Hi,
>>>>>
>>>>> I have attached the 2 logs.
>>>>>
>>>>> Thank you
>>>>> Yours sincerely,
>>>>> TAY wee-beng
>>>>>
>>>>> On 4/11/2015 1:11 AM, Barry Smith wrote:
>>>>>>   Ok, the convergence looks good. Now run on 8 and 64 processes as before with -log_summary and not -ksp_monitor to see how it scales.
>>>>>>
>>>>>>   Barry
>>>>>>
>>>>>>> On Nov 3, 2015, at 6:49 AM, TAY wee-beng <[email protected]> wrote:
>>>>>>>
>>>>>>> Hi,
>>>>>>>
>>>>>>> I tried and have attached the log.
>>>>>>>
>>>>>>> Ya, my Poisson eqn has Neumann boundary conditions. Do I need to specify some null space stuff? Like KSPSetNullSpace or MatNullSpaceCreate?
>>>>>>>
>>>>>>> Thank you
>>>>>>> Yours sincerely,
>>>>>>> TAY wee-beng
>>>>>>>
>>>>>>> On 3/11/2015 12:45 PM, Barry Smith wrote:
>>>>>>>>> On Nov 2, 2015, at 10:37 PM, TAY wee-beng <[email protected]> wrote:
>>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> I tried:
>>>>>>>>>
>>>>>>>>> 1. -poisson_pc_gamg_agg_nsmooths 1 -poisson_pc_type gamg
>>>>>>>>>
>>>>>>>>> 2. -poisson_pc_type gamg
>>>>>>>>
>>>>>>>>   Run with -poisson_ksp_monitor_true_residual -poisson_ksp_converged_reason
>>>>>>>>
>>>>>>>>   Does your Poisson have Neumann boundary conditions? Do you have any zeros on the diagonal of the matrix (you shouldn't)?
>>>>>>>>
>>>>>>>>   There may be something wrong with your Poisson discretization that was also messing up hypre.
>>>>>>>>
>>>>>>>>> Both options give:
>>>>>>>>>
>>>>>>>>> 1 0.00150000 0.00000000 0.00000000 1.00000000 NaN NaN NaN
>>>>>>>>> M Diverged but why?, time = 2
>>>>>>>>> reason = -9
>>>>>>>>>
>>>>>>>>> How can I check what's wrong?
>>>>>>>>>
>>>>>>>>> Thank you
>>>>>>>>> Yours sincerely,
>>>>>>>>> TAY wee-beng
>>>>>>>>>
>>>>>>>>> On 3/11/2015 3:18 AM, Barry Smith wrote:
>>>>>>>>>>   hypre is just not scaling well here. I do not know why. Since hypre is a black box for us, there is no way to determine the cause of the poor scaling.
>>>>>>>>>>
>>>>>>>>>>   If you make the same two runs with -pc_type gamg there will be a lot more information in the log summary about which routines are scaling well or poorly.
>>>>>>>>>>
>>>>>>>>>>   Barry
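On the null-space question quoted above (a Poisson equation with pure Neumann boundary conditions is singular, with the constant vector in its null space), a minimal C sketch of the usual way to tell PETSc about it is below. The matrix name A and the wrapper function are illustrative, and whether MatSetNullSpace or the older KSPSetNullSpace applies depends on the PETSc version in use; the right-hand side must also be made consistent (MatNullSpaceRemove can be used for that), otherwise the iteration can stagnate or blow up.

    #include <petscksp.h>

    /* Attach the constant null space of a Neumann-Poisson operator to the
       assembled pressure matrix so the Krylov solver can project it out. */
    PetscErrorCode AttachConstantNullSpace(Mat A)
    {
      MatNullSpace   nsp;
      PetscErrorCode ierr;

      PetscFunctionBegin;
      ierr = MatNullSpaceCreate(PetscObjectComm((PetscObject)A), PETSC_TRUE, 0, NULL, &nsp);CHKERRQ(ierr);
      ierr = MatSetNullSpace(A, nsp);CHKERRQ(ierr);
      ierr = MatNullSpaceDestroy(&nsp);CHKERRQ(ierr);
      PetscFunctionReturn(0);
    }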
>>>>>>>>>>> On Nov 2, 2015, at 3:17 AM, TAY wee-beng <[email protected]> wrote:
>>>>>>>>>>>
>>>>>>>>>>> Hi,
>>>>>>>>>>>
>>>>>>>>>>> I have attached the 2 files.
>>>>>>>>>>>
>>>>>>>>>>> Thank you
>>>>>>>>>>> Yours sincerely,
>>>>>>>>>>> TAY wee-beng
>>>>>>>>>>>
>>>>>>>>>>> On 2/11/2015 2:55 PM, Barry Smith wrote:
>>>>>>>>>>>>   Run the (158/2)x(266/2)x(150/2) grid on 8 processes and then (158)x(266)x(150) on 64 processors and send the two -log_summary results.
>>>>>>>>>>>>
>>>>>>>>>>>>   Barry
>>>>>>>>>>>>
>>>>>>>>>>>>> On Nov 2, 2015, at 12:19 AM, TAY wee-beng <[email protected]> wrote:
>>>>>>>>>>>>>
>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>
>>>>>>>>>>>>> I have attached the new results.
>>>>>>>>>>>>>
>>>>>>>>>>>>> Thank you
>>>>>>>>>>>>> Yours sincerely,
>>>>>>>>>>>>> TAY wee-beng
>>>>>>>>>>>>>
>>>>>>>>>>>>> On 2/11/2015 12:27 PM, Barry Smith wrote:
>>>>>>>>>>>>>>   Run without the -momentum_ksp_view -poisson_ksp_view and send the new results.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>   You can see from the log summary that the PCSetUp is taking a much smaller percentage of the time, meaning that it is reusing the preconditioner and not rebuilding it each time.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>   Barry
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>   Something makes no sense with the output: it gives
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> KSPSolve 199 1.0 2.3298e+03 1.0 5.20e+09 1.8 3.8e+04 9.9e+05 5.0e+02 90100 66100 24 90100 66100 24 165
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>   90% of the time is in the solve but there is no significant amount of time in other events of the code, which is just not possible. I hope it is due to your IO.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On Nov 1, 2015, at 10:02 PM, TAY wee-beng <[email protected]> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I have attached the new run with 100 time steps for 48 and 96 cores.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Only the Poisson eqn's RHS changes, the LHS doesn't. So if I want to reuse the preconditioner, what must I do? Or what must I not do?
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Why does the number of processes increase so much? Is there something wrong with my coding? It seems to be so too for my new run.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Thank you
>>>>>>>>>>>>>>> Yours sincerely,
>>>>>>>>>>>>>>> TAY wee-beng
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> On 2/11/2015 9:49 AM, Barry Smith wrote:
>>>>>>>>>>>>>>>>   If you are doing many time steps with the same linear solver then you MUST do your weak scaling studies with MANY time steps, since the setup time of AMG only takes place in the first timestep. So run both 48 and 96 processes with the same large number of time steps.
>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>   Barry
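On the question quoted above about reusing the preconditioner when only the right-hand side of the Poisson equation changes: a minimal sketch, assuming the Poisson solver object is called ksp_poisson and its matrix A is assembled once (all names here are illustrative). As long as KSPSetOperators is not called again with a modified matrix, the expensive AMG setup happens only in the first step; KSPSetReusePreconditioner just makes that intent explicit.

    #include <petscksp.h>

    /* Solve the same Poisson operator with a new RHS every time step,
       reusing the preconditioner built in the first step. */
    PetscErrorCode SolvePressureEachStep(KSP ksp_poisson, Mat A, Vec b, Vec x, PetscInt nsteps)
    {
      PetscErrorCode ierr;
      PetscInt       step;

      PetscFunctionBegin;
      ierr = KSPSetOperators(ksp_poisson, A, A);CHKERRQ(ierr);            /* set once, before the loop */
      ierr = KSPSetReusePreconditioner(ksp_poisson, PETSC_TRUE);CHKERRQ(ierr);
      for (step = 0; step < nsteps; step++) {
        /* ... update only the entries of b here ... */
        ierr = KSPSolve(ksp_poisson, b, x);CHKERRQ(ierr);                 /* PC is not rebuilt */
      }
      PetscFunctionReturn(0);
    }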
>>>>>>>>>>>>>>>>> On Nov 1, 2015, at 7:35 PM, TAY wee-beng <[email protected]> wrote:
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Sorry, I forgot and used the old a.out. I have attached the new log for 48 cores (log48), together with the 96 cores log (log96).
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Why does the number of processes increase so much? Is there something wrong with my coding?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Only the Poisson eqn's RHS changes, the LHS doesn't. So if I want to reuse the preconditioner, what must I do? Or what must I not do?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Lastly, I only simulated 2 time steps previously. Now I run for 10 timesteps (log48_10). Is it building the preconditioner at every timestep?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Also, what about the momentum eqn? Is it working well?
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> I will try the gamg later too.
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> Thank you
>>>>>>>>>>>>>>>>> Yours sincerely,
>>>>>>>>>>>>>>>>> TAY wee-beng
>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>> On 2/11/2015 12:30 AM, Barry Smith wrote:
>>>>>>>>>>>>>>>>>>   You used gmres with 48 processes but richardson with 96. You need to be careful and make sure you don't change the solvers when you change the number of processors, since you can get very different, inconsistent results.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>   Anyway, all the time is being spent in the BoomerAMG algebraic multigrid setup and it is scaling badly. When you double the problem size and the number of processes it went from 3.2445e+01 to 4.3599e+02 seconds.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>> PCSetUp 3 1.0 3.2445e+01 1.0 9.58e+06 2.0 0.0e+00 0.0e+00 4.0e+00 62 8 0 0 4 62 8 0 0 5 11
>>>>>>>>>>>>>>>>>> PCSetUp 3 1.0 4.3599e+02 1.0 9.58e+06 2.0 0.0e+00 0.0e+00 4.0e+00 85 18 0 0 6 85 18 0 0 6 2
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>   Now, is the Poisson problem changing at each timestep, or can you use the same preconditioner built with BoomerAMG for all the time steps? Algebraic multigrid has a large setup time that often doesn't matter if you have many time steps, but if you have to rebuild it each timestep it is too large.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>   You might also try -pc_type gamg and see how PETSc's algebraic multigrid scales for your problem/machine.
>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>   Barry
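Since the Poisson options in this thread are given in the prefixed form (-poisson_pc_type gamg, -poisson_pc_gamg_agg_nsmooths 1), here is a minimal sketch of how such a prefix is typically wired up, in case a later run shows (via -ksp_view or -info) that the option was not picked up. The function and object names are illustrative; the two points that matter are that KSPSetOptionsPrefix comes before KSPSetFromOptions, and that no hard-coded KSPSetType/PCSetType call follows KSPSetFromOptions, or it silently overrides the command-line choice.

    #include <petscksp.h>

    /* Create the pressure solver so that options prefixed with "poisson_"
       (e.g. -poisson_pc_type gamg -poisson_ksp_view) apply to it. */
    PetscErrorCode CreatePressureKSP(MPI_Comm comm, Mat A, KSP *ksp_poisson)
    {
      PetscErrorCode ierr;

      PetscFunctionBegin;
      ierr = KSPCreate(comm, ksp_poisson);CHKERRQ(ierr);
      ierr = KSPSetOperators(*ksp_poisson, A, A);CHKERRQ(ierr);
      ierr = KSPSetOptionsPrefix(*ksp_poisson, "poisson_");CHKERRQ(ierr);  /* prefix first */
      ierr = KSPSetFromOptions(*ksp_poisson);CHKERRQ(ierr);                /* then read the options */
      PetscFunctionReturn(0);
    }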
>>>>>>>>>>>>>>>>>>> On Nov 1, 2015, at 7:30 AM, TAY wee-beng <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> On 1/11/2015 10:00 AM, Barry Smith wrote:
>>>>>>>>>>>>>>>>>>>>> On Oct 31, 2015, at 8:43 PM, TAY wee-beng <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> On 1/11/2015 12:47 AM, Matthew Knepley wrote:
>>>>>>>>>>>>>>>>>>>>>> On Sat, Oct 31, 2015 at 11:34 AM, TAY wee-beng <[email protected]> wrote:
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> I understand that, as mentioned in the FAQ, due to the limitations in memory the scaling is not linear. So I am trying to write a proposal to use a supercomputer. Its specs are:
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Compute nodes: 82,944 nodes (SPARC64 VIIIfx; 16 GB of memory per node)
>>>>>>>>>>>>>>>>>>>>>> 8 cores / processor
>>>>>>>>>>>>>>>>>>>>>> Interconnect: Tofu (6-dimensional mesh/torus)
>>>>>>>>>>>>>>>>>>>>>> Each cabinet contains 96 computing nodes.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> One of the requirements is to give the performance of my current code with my current set of data, and there is a formula to calculate the estimated parallel efficiency when using the new large set of data.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> There are 2 ways to give performance:
>>>>>>>>>>>>>>>>>>>>>> 1. Strong scaling, which is defined as how the elapsed time varies with the number of processors for a fixed problem size.
>>>>>>>>>>>>>>>>>>>>>> 2. Weak scaling, which is defined as how the elapsed time varies with the number of processors for a fixed problem size per processor.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> I ran my cases with 48 and 96 cores on my current cluster, giving 140 and 90 mins respectively. This is classified as strong scaling.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Cluster specs:
>>>>>>>>>>>>>>>>>>>>>> CPU: AMD 6234 2.4GHz
>>>>>>>>>>>>>>>>>>>>>> 8 cores / processor (CPU)
>>>>>>>>>>>>>>>>>>>>>> 6 CPU / node
>>>>>>>>>>>>>>>>>>>>>> So 48 cores / node
>>>>>>>>>>>>>>>>>>>>>> Not sure about the memory / node
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> The parallel efficiency 'En' for a given degree of parallelism 'n' indicates how efficiently the program is accelerated by parallel processing. 'En' is given by the following formulae. Although their derivation processes differ for strong and weak scaling, the derived formulae are the same.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> From the estimated time, my parallel efficiency using Amdahl's law on the current old cluster was 52.7%. So are my results acceptable?
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> For the large data set, if using 2205 nodes (2205x8 cores), my expected parallel efficiency is only 0.5%. The proposal recommends a value of > 50%.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> The problem with this analysis is that the estimated serial fraction from Amdahl's Law changes as a function of problem size, so you cannot take the strong scaling from one problem and apply it to another without a model of this dependence.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Weak scaling does model changes with problem size, so I would measure weak scaling on your current cluster, and extrapolate to the big machine. I realize that this does not make sense for many scientific applications, but neither does requiring a certain parallel efficiency.
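As a rough cross-check of the numbers quoted above (140 min on 48 cores vs. 90 min on 96 cores for the same problem), using the standard strong-scaling definitions rather than the proposal's own formula, which is not reproduced here:

    S_{48 \to 96} = T_{48} / T_{96} = 140 / 90 \approx 1.56
    E_{48 \to 96} = S_{48 \to 96} / 2 \approx 0.78

Fitting Amdahl's law, S(m) = 1 / ( f + (1 - f)/m ), to that single doubling gives a non-scaling fraction f = 2 / S_{48 \to 96} - 1 \approx 0.29. Extrapolating the same fit to m = 17640 / 48 \approx 368 times the resources gives S \approx 3.5, i.e. an efficiency of roughly 1%, the same order as the 0.5% figure quoted above (the proposal's formula and the larger data set will shift the exact value).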
>>>>>>>>>>>>>>>>>>>>> Ok, I checked the results for my weak scaling and it is even worse for the expected parallel efficiency. From the formula used, it's obvious it's doing some sort of exponential decrease in the extrapolation. So unless I can achieve a near >90% speed-up when I double the cores and problem size for my current 48/96 cores setup, extrapolating from about 96 nodes to 10,000 nodes will give a much lower expected parallel efficiency for the new case.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> However, it's mentioned in the FAQ that due to the memory requirement, it's impossible to get >90% speed-up when I double the cores and problem size (i.e. a linear increase in performance), which means that I can't get >90% speed-up when I double the cores and problem size for my current 48/96 cores setup. Is that so?
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>   What is the output of -ksp_view -log_summary on the problem, and then on the problem doubled in size and number of processors?
>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>   Barry
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> I have attached the output:
>>>>>>>>>>>>>>>>>>> 48 cores: log48
>>>>>>>>>>>>>>>>>>> 96 cores: log96
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> There are 2 solvers - the momentum linear eqn uses bcgs, while the Poisson eqn uses hypre BoomerAMG.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> Problem size doubled from 158x266x150 to 158x266x300.
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> So is it fair to say that the main problem does not lie in my programming skills, but rather in the way the linear equations are solved?
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>> Thanks.
>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Thanks,
>>>>>>>>>>>>>>>>>>>>>> Matt
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Is it possible for this type of scaling in PETSc (>50%), when using 17640 (2205x8) cores?
>>>>>>>>>>>>>>>>>>>>>> Btw, I do not have access to the system.
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> Sent using CloudMagic Email
>>>>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>>>>> --
>>>>>>>>>>>>>>>>>>>>>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
>>>>>>>>>>>>>>>>>>>>>> -- Norbert Wiener
>>>>>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>>>>>> <log48.txt><log96.txt>
>>>>>>>>>>>>>>>>> <log48_10.txt><log48.txt><log96.txt>
>>>>>>>>>>>>>>> <log96_100.txt><log48_100.txt>
>>>>>>>>>>>>> <log96_100_2.txt><log48_100_2.txt>
>>>>>>>>>>> <log64_100.txt><log8_100.txt>
>>>>>>> <log.txt>
>>>>> <log64_100_2.txt><log8_100_2.txt>
>
> <log8_100_3.txt><log64_100_3.txt>
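For the "reason = -9" quoted earlier in the thread: in PETSc's KSPConvergedReason enum that value corresponds to KSP_DIVERGED_NANORINF, i.e. a NaN or Inf appeared in the residual, which matches the NaN values printed. A minimal sketch of checking and reporting this in code after the pressure solve, to complement the command-line monitors suggested at the top of this message (ksp_poisson and the function name are illustrative):

    #include <petscksp.h>

    /* After the pressure solve, report why the KSP stopped; with the NaN
       output quoted above, reason -9 (KSP_DIVERGED_NANORINF) is expected. */
    PetscErrorCode ReportPressureSolve(KSP ksp_poisson)
    {
      KSPConvergedReason reason;
      PetscInt           its;
      PetscErrorCode     ierr;

      PetscFunctionBegin;
      ierr = KSPGetConvergedReason(ksp_poisson, &reason);CHKERRQ(ierr);
      ierr = KSPGetIterationNumber(ksp_poisson, &its);CHKERRQ(ierr);
      if (reason < 0) {
        ierr = PetscPrintf(PetscObjectComm((PetscObject)ksp_poisson),
                           "Pressure solve diverged: reason %d after %D iterations\n",
                           (int)reason, its);CHKERRQ(ierr);
      }
      PetscFunctionReturn(0);
    }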
