On Mon, Jun 19, 2017 at 7:56 AM, Damian Kaliszan <[email protected]> wrote:
> Hi,
>
> Please find attached 2 output files from the 64 MPI/1 OMP vs. 64 MPI/2 OMP
> examples, SLURM task IDs 23321 vs. 23325.

This is on 1 KNL? Then aren't you oversubscribing by using 2 threads? That
produces horrible performance, like you see in this log.

   Matt

> Best,
> Damian
>
> In the message dated 19 June 2017 (15:39:53), it was written:
>
> > On Mon, Jun 19, 2017 at 7:32 AM, Damian Kaliszan <[email protected]> wrote:
> > > Hi,
> > > Thank you for the answer and the article.
> > > I use SLURM (srun) for job submission, running the
> > > 'srun script.py script_parameters' command inside a batch script, so
> > > this is the SPMD model.
> > > What I noticed is that the problems I'm having now didn't happen
> > > before on CPU E5-2697 v3 nodes (28 cores; the best performance I had
> > > was with 14 MPI/2 OMP per node). The problems started to appear when
> > > I moved to KNLs.
> > > The funny thing is that switching OMP on/off (by setting
> > > OMP_NUM_THREADS to 1) doesn't help for all #NODES/#MPI/#OMP
> > > combinations. For example, for 2 nodes and 16 MPIs, the timings are
> > > huge for OMP=1 and 2, and OK for 4.
> >
> > Let's narrow this down to MPI_Barrier(). What memory mode is the KNL in?
> > Did you require KNL to use only MCDRAM? Please show the
> > MPI_Barrier()/MPI_Send() numbers for the different configurations.
> > This measures just latency. We could also look at VecScale() to see the
> > memory bandwidth achieved.
> >
> >   Thanks,
> >
> >      Matt
> >
> > > Playing with affinity hasn't helped so far.
> > > In other words, at first glance the results look completely random
> > > (I can provide more such examples).
> > >
> > > Best,
> > > Damian
> > >
> > > In the message dated 19 June 2017 (14:50:25), it was written:
> > >
> > > > On Mon, Jun 19, 2017 at 6:42 AM, Damian Kaliszan <[email protected]> wrote:
> > > > > Hi,
> > > > >
> > > > > Regarding my previous post:
> > > > > I looked into both logs, 64 MPI/1 OMP vs. 64 MPI/2 OMP.
> > > > >
> > > > > What attracted my attention is the huge difference in MPI timings
> > > > > in the following places:
> > > > >
> > > > > Average time to get PetscTime(): 2.14577e-07
> > > > > Average time for MPI_Barrier(): 3.9196e-05
> > > > > Average time for zero size MPI_Send(): 5.45382e-06
> > > > >
> > > > > vs.
> > > > >
> > > > > Average time to get PetscTime(): 4.05312e-07
> > > > > Average time for MPI_Barrier(): 0.348399
> > > > > Average time for zero size MPI_Send(): 0.029937
> > > > >
> > > > > Isn't something wrong with the PETSc library itself?
> > > >
> > > > I don't think so. This is a bad interaction between MPI and your
> > > > threading mechanism. MPI_Barrier() and MPI_Send() are lower level
> > > > than PETSc. What threading mode did you choose for MPI? This can
> > > > have a performance impact.
> > > >
> > > > Also, the justifications for threading in this context are weak (or
> > > > non-existent):
> > > > http://www.orau.gov/hpcor2015/whitepapers/Exascale_Computing_without_Threads-Barry_Smith.pdf
> > > >
> > > >   Thanks,
> > > >
> > > >      Matt
> > > >
> > > > > Best,
> > > > > Damian
> > > > >
> > > > > Forwarded message
> > > > > From: Damian Kaliszan <[email protected]>
> > > > > To: PETSc users list <[email protected]>
> > > > > Date: 16 June 2017, 14:57:10
> > > > > Subject: [petsc-users] strange PETSc/KSP GMRES timings for MPI+OMP
> > > > > configuration on KNLs
> > > > >
> > > > > ===8<===============Original message content===============
> > > > > Hi,
> > > > >
> > > > > For several days I've been trying to figure out what is going
> > > > > wrong with the timings of my Python app, which solves Ax=b with
> > > > > the KSP (GMRES) solver, when running on Intel's KNL 7210/7230.
> > > > >
> > > > > I downsized the problem to a 1000x1000 matrix A and a single node
> > > > > and observed the following:
> > > > >
> > > > > I'm attaching 2 extreme timings where the configurations differ
> > > > > only by 1 OMP thread (64 MPI/1 OMP vs. 64 MPI/2 OMP),
> > > > > SLURM task IDs 23321 vs. 23325.
> > > > >
> > > > > Any help will be appreciated.
> > > > >
> > > > > Best,
> > > > > Damian
> > > > >
> > > > > ===8<===========End of original message content===========
> > > > >
> > > > > -------------------------------------------------------
> > > > > Damian Kaliszan
> > > > >
> > > > > Poznan Supercomputing and Networking Center
> > > > > HPC and Data Centres Technologies
> > > > > ul. Jana Pawła II 10
> > > > > 61-139 Poznan
> > > > > POLAND
> > > > >
> > > > > phone (+48 61) 858 5109
> > > > > e-mail [email protected]
> > > > > www - http://www.man.poznan.pl/
> > > > > -------------------------------------------------------

-- 
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which their
experiments lead.
-- Norbert Wiener

http://www.caam.rice.edu/~mk51/
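
The oversubscription question raised in the reply comes down to simple
arithmetic, which can be sketched as follows. This is only an illustration
under stated assumptions: the core and hardware-thread counts for the KNL
7210/7230 are not given in the thread, the function names are hypothetical,
and the number of CPUs actually available to the job depends on how SLURM
allocated and bound the tasks, which the thread does not say.

```python
# Assumption (not stated in the thread): a KNL 7210/7230 node has 64 physical
# cores with 4 hardware threads each.
KNL_CORES = 64
KNL_HW_THREADS_PER_CORE = 4


def is_oversubscribed(mpi_ranks, omp_threads, cpus_available):
    """Return True if the job starts more software threads than it has CPUs."""
    return mpi_ranks * omp_threads > cpus_available


def slowdown(slow_seconds, fast_seconds):
    """Return the ratio of two timings, e.g. MPI_Barrier() in the two logs."""
    return slow_seconds / fast_seconds


if __name__ == "__main__":
    # Hypothetical case: the job is restricted to one CPU per physical core.
    for omp in (1, 2):
        print(f"64 MPI x {omp} OMP on {KNL_CORES} CPUs oversubscribed:",
              is_oversubscribed(64, omp, KNL_CORES))
    # Timings quoted in the thread (64 MPI/2 OMP vs. 64 MPI/1 OMP).
    print(f"MPI_Barrier() slowdown: {slowdown(0.348399, 3.9196e-05):.0f}x")
    print(f"MPI_Send() slowdown: {slowdown(0.029937, 5.45382e-06):.0f}x")
```

If the job were instead allowed to use all hardware threads
(KNL_CORES * KNL_HW_THREADS_PER_CORE), the same check would report no
oversubscription for 64 MPI x 2 OMP, so the answer depends on the allocation
and binding actually in effect.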
