The slurm/sbatch man page is cumbersome, but you may think of:
1. tasks as "MPI processes"
2. cpus as "threads"
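For instance, here is a minimal sketch of that mapping in an sbatch header (the node count, the 32-core node and the executable name are illustrative assumptions, not values taken from the runs quoted below):

#!/bin/bash
#SBATCH --nodes=2              # 2 nodes
#SBATCH --ntasks-per-node=16   # 16 tasks = 16 MPI processes per node
#SBATCH --cpus-per-task=2      # 2 cpus = 2 threads (cores) per MPI process
# 16 * 2 = 32 cores used per node, assuming a 32-core node

export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK   # only relevant if the code is MPI+OpenMP
srun ./my_app                                 # hypothetical executable name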
You should always request resources as precisely as possible, that is, never use --ntasks alone but prefer to:

1. use --nodes=n
2. use --ntasks-per-node=t
3. use --cpus-per-task=c
4. for a start, make sure that t*c = the number of cores you have per node
5. use --exclusive, otherwise you may get VERY different timings if you run the same job twice
6. make sure MPI is configured correctly (run the same single-threaded application twice [or more]: do you get the same timing?)
7. if using OpenMP or other multithreaded applications, make sure you have set affinity properly (GOMP_CPU_AFFINITY with GNU, KMP_AFFINITY with Intel); see the sketch after the note below
8. make sure you have enough memory (--mem), otherwise performance may be degraded (swapping)

Rule of thumb 4 may NOT be respected, but if so, you need to be aware WHY you want to do that (for KNL it may [or may not] make sense, depending on cache modes).

Remember that any multi-threaded (OpenMP or not) application may be a victim of false sharing (https://en.wikipedia.org/wiki/False_sharing): in that case, profiling with cache metrics may help you understand whether this is the problem, and track it down if so (you may use perf record for that; see the sketch below).

Understanding the hardware is not easy: you really need to go step by step, otherwise you have no chance of understanding anything in the end.

Hope this may help!...

Franck

Note: activating/deactivating hyper-threading (if available - generally in the BIOS when possible) may also change performance.
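For point 7 and the false-sharing remark, a sketch of what this can look like inside the batch script (the core list, the KMP_AFFINITY value and my_app are illustrative assumptions; adapt them to your compiler and node layout):

# Pin threads explicitly - pick ONE of the two, matching your OpenMP runtime:
export GOMP_CPU_AFFINITY="0-31"                  # GNU OpenMP: list of allowed cores
export KMP_AFFINITY="granularity=fine,compact"   # Intel OpenMP

# Check for false sharing with cache metrics (one rank shown; with several
# ranks you would want one output file per rank, e.g. via $SLURM_PROCID):
srun -n 1 perf record -e cache-misses,cache-references -o perf.data ./my_app
perf report -i perf.data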
----- Original Message -----
> From: "Barry Smith" <[email protected]>
> To: "Damian Kaliszan" <[email protected]>
> Cc: "PETSc" <[email protected]>
> Sent: Tuesday, 4 July 2017, 19:04:36
> Subject: Re: [petsc-users] Is OpenMP still available for PETSc?
>
>   You may need to ask a slurm expert. I have no idea what cpus-per-task
>   means
>
> > On Jul 4, 2017, at 4:16 AM, Damian Kaliszan <[email protected]> wrote:
> >
> > Hi,
> >
> > Yes, this is exactly what I meant.
> > Please find attached output for 2 input datasets and for 2 various slurm
> > configs each:
> >
> > A/ Matrix size=8000000x8000000
> >
> > 1/ slurm-14432809.out, 930 ksp steps, ~90 secs
> >
> > #SBATCH --nodes=2
> > #SBATCH --ntasks=32
> > #SBATCH --ntasks-per-node=16
> > #SBATCH --cpus-per-task=4
> >
> > 2/ slurm-14432810.out, 100.000 ksp steps, ~9700 secs
> >
> > #SBATCH --nodes=2
> > #SBATCH --ntasks=32
> > #SBATCH --ntasks-per-node=16
> > #SBATCH --cpus-per-task=2
> >
> > B/ Matrix size=1000x1000
> >
> > 1/ slurm-23716.out, 511 ksp steps, ~28 secs
> >
> > #SBATCH --nodes=1
> > #SBATCH --ntasks=64
> > #SBATCH --ntasks-per-node=64
> > #SBATCH --cpus-per-task=4
> >
> > 2/ slurm-23718.out, 94 ksp steps, ~4 secs
> >
> > #SBATCH --nodes=1
> > #SBATCH --ntasks=4
> > #SBATCH --ntasks-per-node=4
> > #SBATCH --cpus-per-task=4
> >
> > I would really appreciate any help...:)
> >
> > Best,
> > Damian
> >
> > In a message dated 3 July 2017 (16:29:15), it was written:
> >
> > On Mon, Jul 3, 2017 at 9:23 AM, Damian Kaliszan <[email protected]> wrote:
> > Hi,
> >
> > >> 1) You can call Bcast on PETSC_COMM_WORLD
> >
> > To be honest I can't find a Bcast method in petsc4py.PETSc.Comm (I'm
> > using petsc4py)
> >
> > >> 2) If you are using WORLD, the number of iterates will be the same on
> > >> each process since iteration is collective.
> >
> > Yes, this is how it should be. But what I noticed is that for
> > different --cpus-per-task numbers in the slurm script I get a different
> > number of solver iterations, which is in turn related to timings. The
> > disparity is huge.
> > For example, for some configurations where --cpus-per-task=1 I receive 900
> > iterations, and for --cpus-per-task=2 I receive the valid number of 100.000,
> > which is set as the max iter number when setting solver tolerances.
> >
> > I am trying to understand what you are saying. You mean that you make 2
> > different runs and get a different number of iterates with a KSP? In order
> > to answer questions about convergence, we need to see the output of
> >
> > -ksp_view -ksp_monitor_true_residual -ksp_converged_reason
> >
> > for all cases.
> >
> > Thanks,
> >
> > Matt
> >
> > Best,
> > Damian
> >
> > --
> > What most experimenters take for granted before they begin their
> > experiments is infinitely more interesting than any results to which their
> > experiments lead.
> > -- Norbert Wiener
> >
> > http://www.caam.rice.edu/~mk51/
> >
> > -------------------------------------------------------
> > Damian Kaliszan
> >
> > Poznan Supercomputing and Networking Center
> > HPC and Data Centres Technologies
> > ul. Jana Pawła II 10
> > 61-139 Poznan
> > POLAND
> >
> > phone (+48 61) 858 5109
> > e-mail [email protected]
> > www - http://www.man.poznan.pl/
> > -------------------------------------------------------
> > <slum_output.zip>
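P.S. To produce the -ksp_view / -ksp_monitor_true_residual / -ksp_converged_reason output requested above, the options can simply be appended to the run line, assuming the script hands sys.argv to PETSc at initialization (solver.py is a hypothetical name):

srun python solver.py -ksp_view -ksp_monitor_true_residual -ksp_converged_reason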
