Run without the -momentum_ksp_view and -poisson_ksp_view options and send the new results.
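The per-solver options above suggest the code uses two KSP contexts distinguished by options prefixes. A minimal sketch of that pattern in C follows; the application is assumed to do something equivalent (possibly in Fortran), and the function and matrix names here are illustrative, not taken from the user's code:

    #include <petscksp.h>

    /* Sketch: two KSP contexts with distinct options prefixes, so that runtime options
       such as -momentum_ksp_view or -poisson_pc_type hypre apply only to the intended
       solver. A_mom and A_poisson are assumed to be assembled elsewhere. */
    static PetscErrorCode CreateSolvers(MPI_Comm comm, Mat A_mom, Mat A_poisson,
                                        KSP *ksp_mom, KSP *ksp_poisson)
    {
      PetscErrorCode ierr;

      ierr = KSPCreate(comm, ksp_mom);CHKERRQ(ierr);
      ierr = KSPSetOptionsPrefix(*ksp_mom, "momentum_");CHKERRQ(ierr);
      ierr = KSPSetOperators(*ksp_mom, A_mom, A_mom);CHKERRQ(ierr);
      ierr = KSPSetFromOptions(*ksp_mom);CHKERRQ(ierr);      /* reads -momentum_* options */

      ierr = KSPCreate(comm, ksp_poisson);CHKERRQ(ierr);
      ierr = KSPSetOptionsPrefix(*ksp_poisson, "poisson_");CHKERRQ(ierr);
      ierr = KSPSetOperators(*ksp_poisson, A_poisson, A_poisson);CHKERRQ(ierr);
      ierr = KSPSetFromOptions(*ksp_poisson);CHKERRQ(ierr);  /* reads -poisson_* options */
      return 0;
    }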
You can see from the log summary that the PCSetUp is taking a much smaller percentage of the time, meaning that it is reusing the preconditioner and not rebuilding it each time.

   Barry

Something makes no sense with the output: it gives

KSPSolve             199 1.0 2.3298e+03 1.0 5.20e+09 1.8 3.8e+04 9.9e+05 5.0e+02 90 100 66 100 24  90 100 66 100 24   165

90% of the time is in the solve, but there is no significant amount of time in other events of the code, which is just not possible. I hope it is due to your IO.

> On Nov 1, 2015, at 10:02 PM, TAY wee-beng <[email protected]> wrote:
>
> Hi,
>
> I have attached the new run with 100 time steps for 48 and 96 cores.
>
> Only the Poisson eqn's RHS changes, the LHS doesn't. So if I want to reuse the preconditioner, what must I do? Or what must I not do?
>
> Why does the number of processes increase so much? Is there something wrong with my coding? It seems to be so for my new run too.
>
> Thank you
>
> Yours sincerely,
>
> TAY wee-beng
>
> On 2/11/2015 9:49 AM, Barry Smith wrote:
>> If you are doing many time steps with the same linear solver then you MUST do your weak scaling studies with MANY time steps, since the setup time of AMG only takes place in the first timestep. So run both 48 and 96 processes with the same large number of time steps.
>>
>> Barry
>>
>>> On Nov 1, 2015, at 7:35 PM, TAY wee-beng <[email protected]> wrote:
>>>
>>> Hi,
>>>
>>> Sorry, I forgot and used the old a.out. I have attached the new log for 48 cores (log48), together with the 96 cores log (log96).
>>>
>>> Why does the number of processes increase so much? Is there something wrong with my coding?
>>>
>>> Only the Poisson eqn's RHS changes, the LHS doesn't. So if I want to reuse the preconditioner, what must I do? Or what must I not do?
>>>
>>> Lastly, I only simulated 2 time steps previously. Now I run for 10 timesteps (log48_10). Is it building the preconditioner at every timestep?
>>>
>>> Also, what about the momentum eqn? Is it working well?
>>>
>>> I will try gamg later too.
>>>
>>> Thank you
>>>
>>> Yours sincerely,
>>>
>>> TAY wee-beng
>>>
>>> On 2/11/2015 12:30 AM, Barry Smith wrote:
>>>> You used gmres with 48 processes but richardson with 96. You need to be careful and make sure you don't change the solvers when you change the number of processors, since you can get very different, inconsistent results.
>>>>
>>>> Anyway, all the time is being spent in the BoomerAMG algebraic multigrid setup, and it is scaling badly. When you double the problem size and number of processes it went from 3.2445e+01 to 4.3599e+02 seconds.
>>>>
>>>> PCSetUp                3 1.0 3.2445e+01 1.0 9.58e+06 2.0 0.0e+00 0.0e+00 4.0e+00 62  8  0  0  4  62  8  0  0  5    11
>>>>
>>>> PCSetUp                3 1.0 4.3599e+02 1.0 9.58e+06 2.0 0.0e+00 0.0e+00 4.0e+00 85 18  0  0  6  85 18  0  0  6     2
>>>>
>>>> Now, is the Poisson problem changing at each timestep, or can you use the same preconditioner built with BoomerAMG for all the time steps? Algebraic multigrid has a large setup time that often doesn't matter if you have many time steps, but if you have to rebuild it each timestep it may be too large.
>>>>
>>>> You might also try -pc_type gamg and see how PETSc's algebraic multigrid scales for your problem/machine.
>>>>
>>>> Barry
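For reference, a minimal sketch (in C, using the standard KSP interface) of what reusing the BoomerAMG preconditioner can look like when only the Poisson right-hand side changes between time steps. The names ksp_poisson, rhs, phi and nsteps are illustrative, not taken from the user's code:

    #include <petscksp.h>

    /* Sketch: reuse the preconditioner across time steps when only the RHS changes. */
    static PetscErrorCode PoissonTimeLoop(KSP ksp_poisson, Vec rhs, Vec phi, PetscInt nsteps)
    {
      PetscErrorCode ierr;
      PetscInt       step;

      /* If the matrix never changes, calling KSPSetOperators() once before the loop is
         enough: the preconditioner is built in the first KSPSolve() and reused afterwards.
         If the code calls KSPSetOperators() every step, the call below asks PETSc to keep
         the existing preconditioner anyway (also available as -ksp_reuse_preconditioner). */
      ierr = KSPSetReusePreconditioner(ksp_poisson, PETSC_TRUE);CHKERRQ(ierr);

      for (step = 0; step < nsteps; step++) {
        /* ... update rhs for this time step ... */
        ierr = KSPSolve(ksp_poisson, rhs, phi);CHKERRQ(ierr); /* PCSetUp runs only on the first pass */
      }
      return 0;
    }

Trying -pc_type gamg, as suggested above, is then just a runtime change (e.g. -poisson_pc_type gamg if the Poisson solver uses the "poisson_" options prefix).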
>>>>
>>>>> On Nov 1, 2015, at 7:30 AM, TAY wee-beng <[email protected]> wrote:
>>>>>
>>>>>
>>>>> On 1/11/2015 10:00 AM, Barry Smith wrote:
>>>>>>> On Oct 31, 2015, at 8:43 PM, TAY wee-beng <[email protected]> wrote:
>>>>>>>
>>>>>>>
>>>>>>> On 1/11/2015 12:47 AM, Matthew Knepley wrote:
>>>>>>>> On Sat, Oct 31, 2015 at 11:34 AM, TAY wee-beng <[email protected]> wrote:
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I understand that, as mentioned in the FAQ, due to the limitations in memory, the scaling is not linear. So, I am trying to write a proposal to use a supercomputer.
>>>>>>>> Its specs are:
>>>>>>>> Compute nodes: 82,944 nodes (SPARC64 VIIIfx; 16 GB of memory per node)
>>>>>>>> 8 cores / processor
>>>>>>>> Interconnect: Tofu (6-dimensional mesh/torus)
>>>>>>>> Each cabinet contains 96 computing nodes.
>>>>>>>> One of the requirements is to give the performance of my current code with my current set of data, and there is a formula to calculate the estimated parallel efficiency when using the new large set of data.
>>>>>>>> There are 2 ways to give performance:
>>>>>>>> 1. Strong scaling, which is defined as how the elapsed time varies with the number of processors for a fixed problem size.
>>>>>>>> 2. Weak scaling, which is defined as how the elapsed time varies with the number of processors for a fixed problem size per processor.
>>>>>>>> I ran my cases with 48 and 96 cores on my current cluster, giving 140 and 90 mins respectively. This is classified as strong scaling.
>>>>>>>> Cluster specs:
>>>>>>>> CPU: AMD 6234 2.4GHz
>>>>>>>> 8 cores / processor (CPU)
>>>>>>>> 6 CPU / node
>>>>>>>> So 48 cores / node
>>>>>>>> Not sure about the memory / node
>>>>>>>>
>>>>>>>> The parallel efficiency ‘En’ for a given degree of parallelism ‘n’ indicates how efficiently the program is accelerated by parallel processing. ‘En’ is given by the following formulae. Although their derivation processes differ for strong and weak scaling, the derived formulae are the same.
>>>>>>>> From the estimated time, my parallel efficiency using Amdahl's law on the current old cluster was 52.7%.
>>>>>>>> So are my results acceptable?
>>>>>>>> For the large data set, if using 2205 nodes (2205 x 8 cores), my expected parallel efficiency is only 0.5%. The proposal recommends a value of > 50%.
>>>>>>>> The problem with this analysis is that the estimated serial fraction from Amdahl's Law changes as a function of problem size, so you cannot take the strong scaling from one problem and apply it to another without a model of this dependence.
>>>>>>>>
>>>>>>>> Weak scaling does model changes with problem size, so I would measure weak scaling on your current cluster, and extrapolate to the big machine. I realize that this does not make sense for many scientific applications, but neither does requiring a certain parallel efficiency.
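As a rough illustration of why the strong-scaling extrapolation collapses so quickly: the serial fraction implied by the 140-minute and 90-minute runs on 48 and 96 cores can be estimated with Amdahl's law and then pushed out to 17640 cores. The sketch below uses the textbook Amdahl formula, not the proposal's own (unshown) formula, so its numbers only land in the same ballpark as the 52.7% and 0.5% figures quoted above:

    #include <stdio.h>

    /* Rough illustration only: estimate the serial fraction s from the two strong-scaling
       timings via Amdahl's law, T(n) = T1*(s + (1-s)/n), then extrapolate the parallel
       efficiency E(n) = T1/(n*T(n)) = 1/(s*n + 1 - s) to a larger core count. */
    int main(void)
    {
      const double n1 = 48.0, t1 = 140.0;  /* minutes, from the thread */
      const double n2 = 96.0, t2 = 90.0;

      /* Solve t1 = T1*(s + (1-s)/n1) and t2 = T1*(s + (1-s)/n2) for s. */
      const double Tpar = (t1 - t2) / (1.0 / n1 - 1.0 / n2); /* = T1*(1-s) */
      const double Tser = t1 - Tpar / n1;                    /* = T1*s     */
      const double s    = Tser / (Tser + Tpar);

      printf("serial fraction s         ~ %.4f\n", s);
      printf("efficiency at 96 cores    ~ %.0f%%\n", 100.0 / (s * n2 + 1.0 - s));
      printf("efficiency at 17640 cores ~ %.1f%%\n", 100.0 / (s * 17640.0 + 1.0 - s));
      return 0;
    }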
>>>>>>> OK, I checked the results for my weak scaling and the expected parallel efficiency is even worse. From the formula used, it's obvious it extrapolates to some sort of exponential decrease. So unless I can achieve a near > 90% speedup when I double the cores and problem size for my current 48/96 cores setup, extrapolating from about 96 nodes to 10,000 nodes will give a much lower expected parallel efficiency for the new case.
>>>>>>>
>>>>>>> However, it's mentioned in the FAQ that due to memory requirements, it's impossible to get > 90% speedup when I double the cores and problem size (i.e. a linear increase in performance), which means that I can't get > 90% speedup when I double the cores and problem size for my current 48/96 cores setup. Is that so?
>>>>>> What is the output of -ksp_view -log_summary on the problem, and then on the problem doubled in size and number of processors?
>>>>>>
>>>>>> Barry
>>>>> Hi,
>>>>>
>>>>> I have attached the output.
>>>>>
>>>>> 48 cores: log48
>>>>> 96 cores: log96
>>>>>
>>>>> There are 2 solvers - the momentum linear eqn uses bcgs, while the Poisson eqn uses hypre BoomerAMG.
>>>>>
>>>>> Problem size doubled from 158x266x150 to 158x266x300.
>>>>>>> So is it fair to say that the main problem does not lie in my programming skills, but rather in the way the linear equations are solved?
>>>>>>>
>>>>>>> Thanks.
>>>>>>>> Thanks,
>>>>>>>>
>>>>>>>>    Matt
>>>>>>>> Is it possible to get this type of scaling in PETSc (> 50%) when using 17640 (2205 x 8) cores?
>>>>>>>> Btw, I do not have access to the system.
>>>>>>>>
>>>>>>>> Sent using CloudMagic Email
>>>>>>>>
>>>>>>>> --
>>>>>>>> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
>>>>>>>> -- Norbert Wiener
>>>>> <log48.txt><log96.txt>
>>> <log48_10.txt><log48.txt><log96.txt>
>
> <log96_100.txt><log48_100.txt>
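Following the advice above, the quantity to extrapolate is weak-scaling efficiency measured over many time steps: the ratio of wall-clock times when the problem size and the number of processes are both doubled (100% would be ideal). A minimal sketch, using the PCSetUp timings quoted earlier in the thread purely as an illustration; the total times of the 100-step runs would be used in the same way:

    #include <stdio.h>

    /* Rough illustration: weak-scaling efficiency = T(small run) / T(doubled run)
       when the problem size and process count are both doubled. */
    int main(void)
    {
      const double t48_pcsetup = 3.2445e+01; /* seconds, 48 cores, 158x266x150 */
      const double t96_pcsetup = 4.3599e+02; /* seconds, 96 cores, 158x266x300 */

      printf("weak-scaling efficiency of PCSetUp: %.1f%%\n",
             100.0 * t48_pcsetup / t96_pcsetup); /* ~7.4%: the AMG setup dominates the loss */
      return 0;
    }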
