> On Nov 3, 2015, at 9:04 AM, TAY wee-beng <[email protected]> wrote:
>
>
> On 3/11/2015 9:01 PM, Matthew Knepley wrote:
>> On Tue, Nov 3, 2015 at 6:58 AM, TAY wee-beng <[email protected]> wrote:
>>
>> On 3/11/2015 8:52 PM, Matthew Knepley wrote:
>>> On Tue, Nov 3, 2015 at 6:49 AM, TAY wee-beng <[email protected]> wrote:
>>> Hi,
>>>
>>> I tried and have attached the log.
>>>
>>> Yes, my Poisson eqn has a Neumann boundary condition. Do I need to specify
>>> some null space, e.g. with KSPSetNullSpace or MatNullSpaceCreate?
>>>
>>> Yes, you need to attach the constant null space to the matrix.
>>>
>>> Thanks,
>>>
>>> Matt
>> Ok, so can you point me to a suitable example so that I know which one to
>> use specifically?
>>
>> https://bitbucket.org/petsc/petsc/src/9ae8fd060698c4d6fc0d13188aca8a1828c138ab/src/snes/examples/tutorials/ex12.c?at=master&fileviewer=file-view-default#ex12.c-761
>>
>> Matt
> Hi,
>
> Actually, I realised that for my Poisson eqn I have both Neumann and
> Dirichlet BCs. The Dirichlet BC is at the output grids, where I specify
> pressure = 0. So do I still need the null space?
No,

> My Poisson eqn LHS is fixed but the RHS changes with every timestep.
>
> If I need to use a null space, how do I know if the null space contains the
> constant vector and what the number of vectors is? I followed the example
> given and added:
>
> call MatNullSpaceCreate(MPI_COMM_WORLD,PETSC_TRUE,0,NULL,nullsp,ierr)
>
> call MatSetNullSpace(A,nullsp,ierr)
>
> call MatNullSpaceDestroy(nullsp,ierr)
>
> Is that all?
>
> Before this, I was using the HYPRE geometric solver, and the matrix / vector
> in that subroutine were written based on HYPRE. It worked pretty well and
> fast.
>
> However, it's a black box and it's hard to diagnose problems.
>
> I always had the PETSc subroutine to solve my Poisson eqn, but I used KSPBCGS
> or KSPGMRES with HYPRE's BoomerAMG as the PC. It worked but was slow.
>
> Matt: Thanks, I will see how it goes using the null space and may try
> "-mg_coarse_pc_type svd" later.
>>
>> Thanks.
>>>
>>>
>>> Thank you
>>>
>>> Yours sincerely,
>>>
>>> TAY wee-beng
>>>
>>> On 3/11/2015 12:45 PM, Barry Smith wrote:
>>> On Nov 2, 2015, at 10:37 PM, TAY wee-beng <[email protected]> wrote:
>>>
>>> Hi,
>>>
>>> I tried:
>>>
>>> 1. -poisson_pc_gamg_agg_nsmooths 1 -poisson_pc_type gamg
>>>
>>> 2. -poisson_pc_type gamg
>>> Run with -poisson_ksp_monitor_true_residual
>>> -poisson_ksp_monitor_converged_reason
>>> Does your Poisson problem have Neumann boundary conditions? Do you have any
>>> zeros on the diagonal of the matrix (you shouldn't)?
>>>
>>> There may be something wrong with your Poisson discretization that was
>>> also messing up hypre.
>>>
>>>
>>>
>>> Both options give:
>>>
>>> 1 0.00150000 0.00000000 0.00000000 1.00000000 NaN NaN NaN
>>> M Diverged but why?, time = 2
>>> reason = -9
>>>
>>> How can I check what's wrong?
>>>
>>> Thank you
>>>
>>> Yours sincerely,
>>>
>>> TAY wee-beng
>>>
>>> On 3/11/2015 3:18 AM, Barry Smith wrote:
>>> hypre is just not scaling well here. I do not know why. Since hypre is
>>> a black box for us, there is no way to determine why the scaling is poor.
>>>
>>> If you make the same two runs with -pc_type gamg there will be a lot
>>> more information in the log summary about in which routines it is scaling
>>> well or poorly.
>>>
>>> Barry
>>>
>>>
>>>
>>> On Nov 2, 2015, at 3:17 AM, TAY wee-beng <[email protected]> wrote:
>>>
>>> Hi,
>>>
>>> I have attached the 2 files.
>>>
>>> Thank you
>>>
>>> Yours sincerely,
>>>
>>> TAY wee-beng
>>>
>>> On 2/11/2015 2:55 PM, Barry Smith wrote:
>>> Run the (158/2)x(266/2)x(150/2) grid on 8 processes and then
>>> (158)x(266)x(150) on 64 processes and send the two -log_summary results.
>>>
>>> Barry
>>>
>>>
>>> On Nov 2, 2015, at 12:19 AM, TAY wee-beng <[email protected]> wrote:
>>>
>>> Hi,
>>>
>>> I have attached the new results.
>>>
>>> Thank you
>>>
>>> Yours sincerely,
>>>
>>> TAY wee-beng
>>>
>>> On 2/11/2015 12:27 PM, Barry Smith wrote:
>>> Run without the -momentum_ksp_view -poisson_ksp_view and send the new
>>> results.
>>>
>>>
>>> You can see from the log summary that PCSetUp is taking a much
>>> smaller percentage of the time, meaning that it is reusing the
>>> preconditioner and not rebuilding it each time.
>>>
>>> Barry
>>>
>>> Something makes no sense with the output: it gives
>>>
>>> KSPSolve 199 1.0 2.3298e+03 1.0 5.20e+09 1.8 3.8e+04 9.9e+05
>>> 5.0e+02 90100 66100 24 90100 66100 24 165
>>>
>>> 90% of the time is in the solve, but there is no significant amount of time
>>> in the other events of the code, which is just not possible. I hope it is
>>> due to your IO.
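For reference, a minimal Fortran sketch of attaching the constant null space, in the spirit of the snippet quoted above and the example Matt linked; the names A and nullsp are assumed, and the Fortran placeholder for the (unused) vector array depends on the PETSc version. With a Dirichlet pressure point the matrix is already nonsingular and none of this is needed, which is the point of the "No" above.

    ! Sketch only: attach the constant null space to the Poisson matrix A
    ! for a pure-Neumann problem.  "A" and "nullsp" are assumed names.
    MatNullSpace nullsp
    PetscErrorCode ierr

    ! PETSC_NULL_OBJECT is the 3.6-era Fortran placeholder for the unused
    ! array of null-space vectors; newer releases use PETSC_NULL_VEC.
    call MatNullSpaceCreate(PETSC_COMM_WORLD,PETSC_TRUE,0,PETSC_NULL_OBJECT,nullsp,ierr)
    call MatSetNullSpace(A,nullsp,ierr)
    call MatNullSpaceDestroy(nullsp,ierr)

After this, KSPSolve with A projects the constant mode out of the solution.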
>>>
>>>
>>>
>>> On Nov 1, 2015, at 10:02 PM, TAY wee-beng <[email protected]> wrote:
>>>
>>> Hi,
>>>
>>> I have attached the new run with 100 time steps for 48 and 96 cores.
>>>
>>> Only the Poisson eqn's RHS changes; the LHS doesn't. So if I want to reuse
>>> the preconditioner, what must I do? Or what must I not do?
>>>
>>> Why does the number of processes increase so much? Is there something wrong
>>> with my coding? It seems to be so for my new run too.
>>>
>>> Thank you
>>>
>>> Yours sincerely,
>>>
>>> TAY wee-beng
>>>
>>> On 2/11/2015 9:49 AM, Barry Smith wrote:
>>> If you are doing many time steps with the same linear solver then you
>>> MUST do your weak scaling studies with MANY time steps, since the setup time
>>> of AMG only takes place in the first timestep. So run both 48 and 96
>>> processes with the same large number of time steps.
>>>
>>> Barry
>>>
>>>
>>>
>>> On Nov 1, 2015, at 7:35 PM, TAY wee-beng <[email protected]> wrote:
>>>
>>> Hi,
>>>
>>> Sorry, I forgot and used the old a.out. I have attached the new log for
>>> 48 cores (log48), together with the 96-core log (log96).
>>>
>>> Why does the number of processes increase so much? Is there something wrong
>>> with my coding?
>>>
>>> Only the Poisson eqn's RHS changes; the LHS doesn't. So if I want to reuse
>>> the preconditioner, what must I do? Or what must I not do?
>>>
>>> Lastly, I only simulated 2 time steps previously. Now I run for 10
>>> timesteps (log48_10). Is it building the preconditioner at every timestep?
>>>
>>> Also, what about the momentum eqn? Is it working well?
>>>
>>> I will try the gamg later too.
>>>
>>> Thank you
>>>
>>> Yours sincerely,
>>>
>>> TAY wee-beng
>>>
>>> On 2/11/2015 12:30 AM, Barry Smith wrote:
>>> You used gmres with 48 processes but richardson with 96. You need to be
>>> careful and make sure you don't change the solvers when you change the
>>> number of processors, since you can get very different, inconsistent results.
>>>
>>> Anyway, all the time is being spent in the BoomerAMG algebraic
>>> multigrid setup, and it is scaling badly. When you doubled the problem
>>> size and number of processes it went from 3.2445e+01 to 4.3599e+02 seconds.
>>>
>>> PCSetUp 3 1.0 3.2445e+01 1.0 9.58e+06 2.0 0.0e+00 0.0e+00
>>> 4.0e+00 62 8 0 0 4 62 8 0 0 5 11
>>>
>>> PCSetUp 3 1.0 4.3599e+02 1.0 9.58e+06 2.0 0.0e+00 0.0e+00
>>> 4.0e+00 85 18 0 0 6 85 18 0 0 6 2
>>>
>>> Now, is the Poisson problem changing at each timestep, or can you use the
>>> same preconditioner built with BoomerAMG for all the time steps? Algebraic
>>> multigrid has a large setup time that often doesn't matter if you have many
>>> time steps, but if you have to rebuild it each timestep it may be too large.
>>>
>>> You might also try -pc_type gamg and see how PETSc's algebraic multigrid
>>> scales for your problem/machine.
>>>
>>> Barry
>>>
>>>
>>>
>>> On Nov 1, 2015, at 7:30 AM, TAY wee-beng <[email protected]> wrote:
>>>
>>>
>>> On 1/11/2015 10:00 AM, Barry Smith wrote:
>>> On Oct 31, 2015, at 8:43 PM, TAY wee-beng <[email protected]> wrote:
>>>
>>>
>>> On 1/11/2015 12:47 AM, Matthew Knepley wrote:
>>> On Sat, Oct 31, 2015 at 11:34 AM, TAY wee-beng <[email protected]> wrote:
>>> Hi,
>>>
>>> I understand that, as mentioned in the FAQ, due to the limitations in
>>> memory, the scaling is not linear. So, I am trying to write a proposal to
>>> use a supercomputer.
>>> Its specs are:
>>> Compute nodes: 82,944 nodes (SPARC64 VIIIfx; 16 GB of memory per node)
>>> 8 cores / processor
>>> Interconnect: Tofu (6-dimensional mesh/torus)
>>> Each cabinet contains 96 computing nodes.
>>> One of the requirements is to give the performance of my current code with
>>> my current set of data, and there is a formula to calculate the estimated
>>> parallel efficiency when using the new large set of data.
>>> There are 2 ways to give performance:
>>> 1. Strong scaling, which is defined as how the elapsed time varies with the
>>> number of processors for a fixed problem size.
>>> 2. Weak scaling, which is defined as how the elapsed time varies with the
>>> number of processors for a fixed problem size per processor.
>>> I ran my cases with 48 and 96 cores on my current cluster, giving 140 and
>>> 90 mins respectively. This is classified as strong scaling.
>>> Cluster specs:
>>> CPU: AMD 6234 2.4 GHz
>>> 8 cores / processor (CPU)
>>> 6 CPUs / node
>>> So 48 cores / node
>>> Not sure about the memory / node
>>>
>>> The parallel efficiency 'En' for a given degree of parallelism 'n'
>>> indicates how efficiently the program is accelerated by parallel
>>> processing. 'En' is given by the following formulae. Although their
>>> derivations differ between strong and weak scaling, the derived formulae
>>> are the same.
>>> From the estimated time, my parallel efficiency using Amdahl's law on the
>>> current old cluster was 52.7%.
>>> So are my results acceptable?
>>> For the large data set, if using 2205 nodes (2205 x 8 cores), my expected
>>> parallel efficiency is only 0.5%. The proposal recommends a value of > 50%.
>>> The problem with this analysis is that the estimated serial fraction from
>>> Amdahl's Law changes as a function of problem size, so you cannot take the
>>> strong scaling from one problem and apply it to another without a model of
>>> this dependence.
>>>
>>> Weak scaling does model changes with problem size, so I would measure weak
>>> scaling on your current cluster, and extrapolate to the big machine. I
>>> realize that this does not make sense for many scientific applications,
>>> but neither does requiring a certain parallel efficiency.
>>> Ok, I checked the results for my weak scaling and the expected parallel
>>> efficiency is even worse. From the formula used, it's obvious that it's
>>> doing some sort of exponentially decreasing extrapolation. So unless I can
>>> achieve a near-ideal (> 90%) speed-up when I double the cores and problem
>>> size for my current 48/96-core setup, extrapolating from about 96 nodes to
>>> 10,000 nodes will give a much lower expected parallel efficiency for the
>>> new case.
>>>
>>> However, it's mentioned in the FAQ that, due to memory requirements, it's
>>> impossible to get > 90% speed-up when I double the cores and problem size
>>> (i.e. a linear increase in performance), which means that I can't get > 90%
>>> speed-up when I double the cores and problem size for my current 48/96-core
>>> setup. Is that so?
>>> What is the output of -ksp_view -log_summary on the problem and then on
>>> the problem doubled in size and number of processors?
>>>
>>> Barry
>>> Hi,
>>>
>>> I have attached the output:
>>>
>>> 48 cores: log48
>>> 96 cores: log96
>>>
>>> There are 2 solvers - the momentum linear eqn uses bcgs, while the Poisson
>>> eqn uses hypre BoomerAMG.
>>>
>>> Problem size doubled from 158x266x150 to 158x266x300.
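As a rough illustration of why the extrapolated efficiency collapses, here is a standard Amdahl's-law estimate based only on the two timings quoted earlier in this message (140 min on 48 cores, 90 min on 96 cores). This is the textbook form, not necessarily the formula in the proposal, and the 48-core run is taken as the baseline.

    % Standard Amdahl's-law estimate (illustration only, not the proposal's formula).
    S_2 = \frac{T_{48}}{T_{96}} = \frac{140}{90} \approx 1.56
    % Fitting the non-scaling ("serial") fraction f from S_k = 1/(f + (1-f)/k) with k = 2:
    1.56 = \frac{1}{f + (1-f)/2} \quad\Rightarrow\quad f \approx 0.29
    % Extrapolating to 2205 \times 8 = 17640 cores, i.e. k = 17640/48 \approx 368:
    S_{368} \approx \frac{1}{0.29 + 0.71/368} \approx 3.4, \qquad
    E_{368} = \frac{S_{368}}{368} \approx 0.9\%

That lands in the same sub-1% range as the 0.5% figure quoted above, which is exactly Matt's point: a serial fraction fitted at one problem size dominates any large-core extrapolation unless that fraction itself shrinks as the problem grows.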
>>> So is it fair to say that the main problem does not lie in my programming
>>> skills, but rather in the way the linear equations are solved?
>>>
>>> Thanks.
>>> Thanks,
>>>
>>> Matt
>>> Is it possible for this type of scaling in PETSc (> 50%) when using 17640
>>> (2205 x 8) cores?
>>> Btw, I do not have access to the system.
>>>
>>>
>>>
>>> Sent using CloudMagic Email
>>>
>>>
>>>
>>> --
>>> What most experimenters take for granted before they begin their
>>> experiments is infinitely more interesting than any results to which their
>>> experiments lead.
>>> -- Norbert Wiener
>>> <log48.txt><log96.txt>
>>> <log48_10.txt><log48.txt><log96.txt>
>>> <log96_100.txt><log48_100.txt>
>>> <log96_100_2.txt><log48_100_2.txt>
>>> <log64_100.txt><log8_100.txt>
>>>
>>>
>>>
>>>
>>> --
>>> What most experimenters take for granted before they begin their
>>> experiments is infinitely more interesting than any results to which their
>>> experiments lead.
>>> -- Norbert Wiener
>>
>>
>>
>>
>> --
>> What most experimenters take for granted before they begin their experiments
>> is infinitely more interesting than any results to which their experiments
>> lead.
>> -- Norbert Wiener
>
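Since the question "if I want to reuse the preconditioner, what must I do?" comes up repeatedly above, here is a minimal sketch of the usual PETSc idiom when only the RHS changes between timesteps. The names (ksp, A, b, x, nsteps) and the loop structure are assumptions for illustration, not taken from the poster's code.

    ! Sketch only: assemble the fixed Poisson matrix A once, set up the KSP
    ! once, and solve with a new RHS b each timestep.  A, b, x are assumed
    ! to be an already-assembled Mat and Vecs.
    KSP ksp
    PetscErrorCode ierr
    PetscInt step

    call KSPCreate(PETSC_COMM_WORLD,ksp,ierr)
    call KSPSetOperators(ksp,A,A,ierr)
    call KSPSetFromOptions(ksp,ierr)

    do step = 1, nsteps
       ! Rebuild only the RHS b here.  Because A is not modified, PETSc skips
       ! PCSetUp after the first solve, so the BoomerAMG / gamg setup cost is
       ! paid once rather than at every timestep.
       call KSPSolve(ksp,b,x,ierr)
    end do

    ! If A did have to be re-assembled but the old preconditioner is still
    ! acceptable, KSPSetReusePreconditioner(ksp,PETSC_TRUE,ierr) keeps it.

This matches the PCSetUp behaviour noted in the log discussion above: once the operator stops changing, the setup cost should appear only in the first timestep.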
