Run without the -momentum_ksp_view and -poisson_ksp_view options and send the new results.
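The per-solver options above suggest the code uses two KSP contexts distinguished by options prefixes. A minimal sketch of that pattern in C follows; the application is assumed to do something equivalent (possibly in Fortran), and the function and matrix names here are illustrative, not taken from the user's code:

    #include <petscksp.h>

    /* Sketch: two KSP contexts with distinct options prefixes, so that runtime options
       such as -momentum_ksp_view or -poisson_pc_type hypre apply only to the intended
       solver. A_mom and A_poisson are assumed to be assembled elsewhere. */
    static PetscErrorCode CreateSolvers(MPI_Comm comm, Mat A_mom, Mat A_poisson,
                                        KSP *ksp_mom, KSP *ksp_poisson)
    {
      PetscErrorCode ierr;

      ierr = KSPCreate(comm, ksp_mom);CHKERRQ(ierr);
      ierr = KSPSetOptionsPrefix(*ksp_mom, "momentum_");CHKERRQ(ierr);
      ierr = KSPSetOperators(*ksp_mom, A_mom, A_mom);CHKERRQ(ierr);
      ierr = KSPSetFromOptions(*ksp_mom);CHKERRQ(ierr);      /* reads -momentum_* options */

      ierr = KSPCreate(comm, ksp_poisson);CHKERRQ(ierr);
      ierr = KSPSetOptionsPrefix(*ksp_poisson, "poisson_");CHKERRQ(ierr);
      ierr = KSPSetOperators(*ksp_poisson, A_poisson, A_poisson);CHKERRQ(ierr);
      ierr = KSPSetFromOptions(*ksp_poisson);CHKERRQ(ierr);  /* reads -poisson_* options */
      return 0;
    }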
You can see from the log summary that the PCSetUp is taking a much smaller percentage of the time, meaning that it is reusing the preconditioner and not rebuilding it each time.

   Barry

Something makes no sense with the output: it gives

KSPSolve             199 1.0 2.3298e+03 1.0 5.20e+09 1.8 3.8e+04 9.9e+05 5.0e+02 90 100 66 100 24  90 100 66 100 24   165

90% of the time is in the solve, but there is no significant amount of time in other events of the code, which is just not possible. I hope it is due to your IO.

> On Nov 1, 2015, at 10:02 PM, TAY wee-beng <[email protected]> wrote:
>
> Hi,
>
> I have attached the new run with 100 time steps for 48 and 96 cores.
>
> Only the Poisson eqn's RHS changes, the LHS doesn't. So if I want to reuse the preconditioner, what must I do? Or what must I not do?
>
> Why does the number of processes increase so much? Is there something wrong with my coding? It seems to be so for my new run too.
>
> Thank you
>
> Yours sincerely,
>
> TAY wee-beng
>
> On 2/11/2015 9:49 AM, Barry Smith wrote:
>> If you are doing many time steps with the same linear solver then you MUST do your weak scaling studies with MANY time steps, since the setup time of AMG only takes place in the first timestep. So run both 48 and 96 processes with the same large number of time steps.
>>
>> Barry
>>
>>> On Nov 1, 2015, at 7:35 PM, TAY wee-beng <[email protected]> wrote:
>>>
>>> Hi,
>>>
>>> Sorry, I forgot and used the old a.out. I have attached the new log for 48 cores (log48), together with the 96 cores log (log96).
>>>
>>> Why does the number of processes increase so much? Is there something wrong with my coding?
>>>
>>> Only the Poisson eqn's RHS changes, the LHS doesn't. So if I want to reuse the preconditioner, what must I do? Or what must I not do?
>>>
>>> Lastly, I only simulated 2 time steps previously. Now I run for 10 timesteps (log48_10). Is it building the preconditioner at every timestep?
>>>
>>> Also, what about the momentum eqn? Is it working well?
>>>
>>> I will try gamg later too.
>>>
>>> Thank you
>>>
>>> Yours sincerely,
>>>
>>> TAY wee-beng
>>>
>>> On 2/11/2015 12:30 AM, Barry Smith wrote:
>>>> You used gmres with 48 processes but richardson with 96. You need to be careful and make sure you don't change the solvers when you change the number of processors, since you can get very different, inconsistent results.
>>>>
>>>> Anyway, all the time is being spent in the BoomerAMG algebraic multigrid setup, and it is scaling badly. When you double the problem size and number of processes it went from 3.2445e+01 to 4.3599e+02 seconds.
>>>>
>>>> PCSetUp                3 1.0 3.2445e+01 1.0 9.58e+06 2.0 0.0e+00 0.0e+00 4.0e+00 62  8  0  0  4  62  8  0  0  5    11
>>>>
>>>> PCSetUp                3 1.0 4.3599e+02 1.0 9.58e+06 2.0 0.0e+00 0.0e+00 4.0e+00 85 18  0  0  6  85 18  0  0  6     2
>>>>
>>>> Now, is the Poisson problem changing at each timestep, or can you use the same preconditioner built with BoomerAMG for all the time steps? Algebraic multigrid has a large setup time that often doesn't matter if you have many time steps, but if you have to rebuild it each timestep it may be too large.
>>>>
>>>> You might also try -pc_type gamg and see how PETSc's algebraic multigrid scales for your problem/machine.
>>>>
>>>> Barry
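For reference, a minimal sketch (in C, using the standard KSP interface) of what reusing the BoomerAMG preconditioner can look like when only the Poisson right-hand side changes between time steps. The names ksp_poisson, rhs, phi and nsteps are illustrative, not taken from the user's code:

    #include <petscksp.h>

    /* Sketch: reuse the preconditioner across time steps when only the RHS changes. */
    static PetscErrorCode PoissonTimeLoop(KSP ksp_poisson, Vec rhs, Vec phi, PetscInt nsteps)
    {
      PetscErrorCode ierr;
      PetscInt       step;

      /* If the matrix never changes, calling KSPSetOperators() once before the loop is
         enough: the preconditioner is built in the first KSPSolve() and reused afterwards.
         If the code calls KSPSetOperators() every step, the call below asks PETSc to keep
         the existing preconditioner anyway (also available as -ksp_reuse_preconditioner). */
      ierr = KSPSetReusePreconditioner(ksp_poisson, PETSC_TRUE);CHKERRQ(ierr);

      for (step = 0; step < nsteps; step++) {
        /* ... update rhs for this time step ... */
        ierr = KSPSolve(ksp_poisson, rhs, phi);CHKERRQ(ierr); /* PCSetUp runs only on the first pass */
      }
      return 0;
    }

Trying -pc_type gamg, as suggested above, is then just a runtime change (e.g. -poisson_pc_type gamg if the Poisson solver uses the "poisson_" options prefix).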
>>>>
>>>>> On Nov 1, 2015, at 7:30 AM, TAY wee-beng <[email protected]> wrote:
>>>>>
>>>>>
>>>>> On 1/11/2015 10:00 AM, Barry Smith wrote:
>>>>>>> On Oct 31, 2015, at 8:43 PM, TAY wee-beng <[email protected]> wrote:
>>>>>>>
>>>>>>>
>>>>>>> On 1/11/2015 12:47 AM, Matthew Knepley wrote:
>>>>>>>> On Sat, Oct 31, 2015 at 11:34 AM, TAY wee-beng <[email protected]> wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I understand that, as mentioned in the FAQ, due to the limitations in memory, the scaling is not linear. So, I am trying to write a proposal to use a supercomputer.
>>>>>>>> Its specs are:
>>>>>>>> Compute nodes: 82,944 nodes (SPARC64 VIIIfx; 16 GB of memory per node)
>>>>>>>> 8 cores / processor
>>>>>>>> Interconnect: Tofu (6-dimensional mesh/torus)
>>>>>>>> Each cabinet contains 96 computing nodes.
>>>>>>>> One of the requirements is to give the performance of my current code with my current set of data, and there is a formula to calculate the estimated parallel efficiency when using the new large set of data.
>>>>>>>> There are 2 ways to give performance:
>>>>>>>> 1. Strong scaling, which is defined as how the elapsed time varies with the number of processors for a fixed problem size.
>>>>>>>> 2. Weak scaling, which is defined as how the elapsed time varies with the number of processors for a fixed problem size per processor.
>>>>>>>> I ran my cases with 48 and 96 cores on my current cluster, giving 140 and 90 mins respectively. This is classified as strong scaling.
>>>>>>>> Cluster specs:
>>>>>>>> CPU: AMD 6234 2.4GHz
>>>>>>>> 8 cores / processor (CPU)
>>>>>>>> 6 CPU / node
>>>>>>>> So 48 cores / node
>>>>>>>> Not sure about the memory / node
>>>>>>>>
>>>>>>>> The parallel efficiency ‘En’ for a given degree of parallelism ‘n’ indicates how efficiently the program is accelerated by parallel processing. ‘En’ is given by the following formulae. Although their derivation processes differ for strong and weak scaling, the derived formulae are the same.
>>>>>>>> From the estimated time, my parallel efficiency using Amdahl's law on the current old cluster was 52.7%.
>>>>>>>> So are my results acceptable?
>>>>>>>> For the large data set, if using 2205 nodes (2205 x 8 cores), my expected parallel efficiency is only 0.5%. The proposal recommends a value of > 50%.
>>>>>>>> The problem with this analysis is that the estimated serial fraction from Amdahl's Law changes as a function of problem size, so you cannot take the strong scaling from one problem and apply it to another without a model of this dependence.
>>>>>>>>
>>>>>>>> Weak scaling does model changes with problem size, so I would measure weak scaling on your current cluster, and extrapolate to the big machine. I realize that this does not make sense for many scientific applications, but neither does requiring a certain parallel efficiency.
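As a rough illustration of why the strong-scaling extrapolation collapses so quickly: the serial fraction implied by the 140-minute and 90-minute runs on 48 and 96 cores can be estimated with Amdahl's law and then pushed out to 17640 cores. The sketch below uses the textbook Amdahl formula, not the proposal's own (unshown) formula, so its numbers only land in the same ballpark as the 52.7% and 0.5% figures quoted above:

    #include <stdio.h>

    /* Rough illustration only: estimate the serial fraction s from the two strong-scaling
       timings via Amdahl's law, T(n) = T1*(s + (1-s)/n), then extrapolate the parallel
       efficiency E(n) = T1/(n*T(n)) = 1/(s*n + 1 - s) to a larger core count. */
    int main(void)
    {
      const double n1 = 48.0, t1 = 140.0;  /* minutes, from the thread */
      const double n2 = 96.0, t2 = 90.0;

      /* Solve t1 = T1*(s + (1-s)/n1) and t2 = T1*(s + (1-s)/n2) for s. */
      const double Tpar = (t1 - t2) / (1.0 / n1 - 1.0 / n2); /* = T1*(1-s) */
      const double Tser = t1 - Tpar / n1;                    /* = T1*s     */
      const double s    = Tser / (Tser + Tpar);

      printf("serial fraction s         ~ %.4f\n", s);
      printf("efficiency at 96 cores    ~ %.0f%%\n", 100.0 / (s * n2 + 1.0 - s));
      printf("efficiency at 17640 cores ~ %.1f%%\n", 100.0 / (s * 17640.0 + 1.0 - s));
      return 0;
    }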
>>>>>>> OK, I checked the results for my weak scaling and the expected parallel efficiency is even worse. From the formula used, it's obvious it extrapolates to some sort of exponential decrease. So unless I can achieve a near > 90% speedup when I double the cores and problem size for my current 48/96 cores setup, extrapolating from about 96 nodes to 10,000 nodes will give a much lower expected parallel efficiency for the new case.
>>>>>>>
>>>>>>> However, it's mentioned in the FAQ that due to memory requirements, it's impossible to get > 90% speedup when I double the cores and problem size (i.e. a linear increase in performance), which means that I can't get > 90% speedup when I double the cores and problem size for my current 48/96 cores setup. Is that so?
>>>>>> What is the output of -ksp_view -log_summary on the problem, and then on the problem doubled in size and number of processors?
>>>>>>
>>>>>> Barry
>>>>> Hi,
>>>>>
>>>>> I have attached the output.
>>>>>
>>>>> 48 cores: log48
>>>>> 96 cores: log96
>>>>>
>>>>> There are 2 solvers - the momentum linear eqn uses bcgs, while the Poisson eqn uses hypre BoomerAMG.
>>>>>
>>>>> Problem size doubled from 158x266x150 to 158x266x300.
>>>>>>> So is it fair to say that the main problem does not lie in my programming skills, but rather in the way the linear equations are solved?
>>>>>>>
>>>>>>> Thanks.
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>>    Matt
>>>>>>>> Is it possible to get this type of scaling in PETSc (> 50%) when using 17640 (2205 x 8) cores?
>>>>>>>> Btw, I do not have access to the system.
>>>>>>>>
>>>>>>>> Sent using CloudMagic Email
>>>>>>>>
>>>>>>>> --
>>>>>>>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
>>>>>>>> -- Norbert Wiener
>>>>> <log48.txt><log96.txt>
>>> <log48_10.txt><log48.txt><log96.txt>
>
> <log96_100.txt><log48_100.txt>
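Following the advice above, the quantity to extrapolate is weak-scaling efficiency measured over many time steps: the ratio of wall-clock times when the problem size and the number of processes are both doubled (100% would be ideal). A minimal sketch, using the PCSetUp timings quoted earlier in the thread purely as an illustration; the total times of the 100-step runs would be used in the same way:

    #include <stdio.h>

    /* Rough illustration: weak-scaling efficiency = T(small run) / T(doubled run)
       when the problem size and process count are both doubled. */
    int main(void)
    {
      const double t48_pcsetup = 3.2445e+01; /* seconds, 48 cores, 158x266x150 */
      const double t96_pcsetup = 4.3599e+02; /* seconds, 96 cores, 158x266x300 */

      printf("weak-scaling efficiency of PCSetUp: %.1f%%\n",
             100.0 * t48_pcsetup / t96_pcsetup); /* ~7.4%: the AMG setup dominates the loss */
      return 0;
    }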
