Re: [petsc-users] Poor speed up for KSP example 45

2020-03-25 Thread Fande Kong
In case someone wants to learn more about the hierarchical partitioning algorithm, here is a reference: https://arxiv.org/pdf/1809.02666.pdf Thanks, Fande > On Mar 25, 2020, at 5:18 PM, Mark Adams wrote: >> On Wed, Mar 25, 2020 at 6:40 PM Fande Kong wrote: >>> On

Re: [petsc-users] Poor speed up for KSP example 45

2020-03-25 Thread Mark Adams
On Wed, Mar 25, 2020 at 6:40 PM Fande Kong wrote: > On Wed, Mar 25, 2020 at 12:18 PM Mark Adams wrote: >> Also, a better test is to see where streams pretty much saturates, then run >> that many processors per node and do the same test by increasing the nodes. >> This will tell you how well

Re: [petsc-users] Poor speed up for KSP example 45

2020-03-25 Thread Zhang, Junchao via petsc-users
MPI rank distribution (e.g., 8 or 16 ranks per node) is usually managed by workload managers like Slurm or PBS through your job scripts, and is outside PETSc's control. From: Amin Sadeghi Date: Wednesday, March 25, 2020 at 4:40 PM To: Junchao Zhang Cc: Mark Adams, PETSc users
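The rank layout Junchao describes is set in the job script rather than in PETSc. A minimal Slurm sketch, assuming a hypothetical `ex45` binary and illustrative node/core counts (adapt partition, time limit, and paths to your cluster):

```shell
#!/bin/bash
#SBATCH --nodes=4             # spread the job across 4 nodes
#SBATCH --ntasks-per-node=8   # 8 MPI ranks on each node -> 32 ranks total
#SBATCH --time=00:30:00

# srun launches one MPI rank per task; the binary name is a placeholder
srun ./ex45 -log_view
```

With `--ntasks-per-node` you trade ranks per node for node count at a fixed total rank count, which is exactly the experiment suggested elsewhere in this thread.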

Re: [petsc-users] Poor speed up for KSP example 45

2020-03-25 Thread Fande Kong
On Wed, Mar 25, 2020 at 12:18 PM Mark Adams wrote: > Also, a better test is to see where streams pretty much saturates, then run > that many processors per node and do the same test by increasing the nodes. > This will tell you how well your network communication is doing. > > But this result has a

Re: [petsc-users] Poor speed up for KSP example 45

2020-03-25 Thread Amin Sadeghi
That's great. Thanks for creating this great piece of software! Amin On Wed, Mar 25, 2020 at 5:56 PM Matthew Knepley wrote: > On Wed, Mar 25, 2020 at 5:41 PM Amin Sadeghi > wrote: > >> Junchao, thank you for doing the experiment, I guess TACC Frontera nodes >> have higher memory bandwidth

Re: [petsc-users] Poor speed up for KSP example 45

2020-03-25 Thread Matthew Knepley
On Wed, Mar 25, 2020 at 5:41 PM Amin Sadeghi wrote: > Junchao, thank you for doing the experiment, I guess TACC Frontera nodes > have higher memory bandwidth (maybe a more modern CPU architecture, although > I'm not familiar with which hardware factors affect memory bandwidth) than Compute > Canada's

Re: [petsc-users] Poor speed up for KSP example 45

2020-03-25 Thread Amin Sadeghi
Junchao, thank you for doing the experiment. I guess TACC Frontera nodes have higher memory bandwidth (maybe a more modern CPU architecture, although I'm not familiar with which hardware factors affect memory bandwidth) than Compute Canada's Graham. Mark, I did as you suggested. As you suspected, running

Re: [petsc-users] Poor speed up for KSP example 45

2020-03-25 Thread Junchao Zhang
I repeated your experiment on one node of TACC Frontera:

- 1 rank: 85.0s
- 16 ranks: 8.2s, 10x speedup
- 32 ranks: 5.7s, 15x speedup

--Junchao Zhang On Wed, Mar 25, 2020 at 1:18 PM Mark Adams wrote: > Also, a better test is to see where streams pretty much saturates, then run > that many processors
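For reference, the speedup and parallel efficiency implied by these timings can be checked with a few lines (numbers taken from the message above):

```python
def speedup_and_efficiency(t1, tn, n):
    # speedup = serial time / parallel time; efficiency = speedup / rank count
    s = t1 / tn
    return s, s / n

t1 = 85.0  # 1-rank time quoted above
for n, tn in [(16, 8.2), (32, 5.7)]:
    s, e = speedup_and_efficiency(t1, tn, n)
    print(f"{n} ranks: speedup {s:.1f}x, efficiency {e:.0%}")
```

The efficiency drop (roughly 65% at 16 ranks, under 50% at 32) is the usual signature of memory-bandwidth saturation rather than poor network scaling.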

Re: [petsc-users] Poor speed up for KSP example 45

2020-03-25 Thread Mark Adams
Also, a better test is to see where streams pretty much saturates, then run that many processors per node and do the same test by increasing the nodes. This will tell you how well your network communication is doing. But this result has a lot of stuff in "network communication" that can be further

Re: [petsc-users] Poor speed up for KSP example 45

2020-03-25 Thread Matthew Knepley
On Wed, Mar 25, 2020 at 2:11 PM Amin Sadeghi wrote: > Thank you Matt and Mark for the explanation. That makes sense. Please > correct me if I'm wrong: instead of asking for the whole node with > 32 cores, if I ask for more nodes, say 4 or 8, but each with 8 cores, then > I should see

Re: [petsc-users] Poor speed up for KSP example 45

2020-03-25 Thread Amin Sadeghi
Thank you Matt and Mark for the explanation. That makes sense. Please correct me if I'm wrong: instead of asking for the whole node with 32 cores, if I ask for more nodes, say 4 or 8, but each with 8 cores, then I should see much better speedups. Is that correct? On Wed, Mar 25, 2020 at

Re: [petsc-users] Poor speed up for KSP example 45

2020-03-25 Thread Mark Adams
I would guess that you are saturating the memory bandwidth. After you make PETSc (make all), it will suggest that you test it (make test) and that you run streams (make streams). I see Matt answered, but let me add that when you make streams you will see the memory rate for 1, 2, 3, ... NP
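The streams test Mark refers to is run from the PETSc build tree; a sketch of the usual invocation (the NPMAX value here is illustrative):

```shell
# From $PETSC_DIR, after ./configure and make all:
make streams NPMAX=32   # run the STREAMS memory-bandwidth benchmark for 1..32 MPI ranks
# The reported triad rate typically stops scaling well before the core count;
# that saturation point is a good choice for ranks-per-node.
```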

Re: [petsc-users] Poor speed up for KSP example 45

2020-03-25 Thread Matthew Knepley
On Wed, Mar 25, 2020 at 1:01 PM Amin Sadeghi wrote: > Hi, > > I ran KSP example 45 on a single node with 32 cores and 125GB memory using > 1, 16 and 32 MPI processes. Here's a comparison of the time spent during > KSP.solve: > > - 1 MPI process: ~98 sec, speedup: 1X > - 16 MPI processes: ~12

[petsc-users] Poor speed up for KSP example 45

2020-03-25 Thread Amin Sadeghi
Hi, I ran KSP example 45 on a single node with 32 cores and 125GB memory using 1, 16 and 32 MPI processes. Here's a comparison of the time spent during KSP.solve:

- 1 MPI process: ~98 sec, speedup: 1X
- 16 MPI processes: ~12 sec, speedup: ~8X
- 32 MPI processes: ~11 sec, speedup: ~9X

Since the
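For anyone reproducing this, the runs correspond to something like the following (the tutorials path matches the PETSc 3.x source layout of the time; options are illustrative):

```shell
cd $PETSC_DIR/src/ksp/ksp/examples/tutorials
make ex45
# Same solve at three MPI process counts; -log_view breaks down where time goes
mpirun -np 1  ./ex45 -log_view
mpirun -np 16 ./ex45 -log_view
mpirun -np 32 ./ex45 -log_view
```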

Re: [petsc-users] [petsc4py] Assembly fails

2020-03-25 Thread Matthew Knepley
On Wed, Mar 25, 2020 at 12:29 PM Alejandro Aragon - 3ME < a.m.ara...@tudelft.nl> wrote: > Dear everyone, > > I’m new to petsc4py and I’m trying to run a simple finite element code > that uses DMPLEX to load a .msh file (created by Gmsh). In version 3.10 the > code was working but I recently

[petsc-users] [petsc4py] Assembly fails

2020-03-25 Thread Alejandro Aragon - 3ME
Dear everyone, I’m new to petsc4py and I’m trying to run a simple finite element code that uses DMPLEX to load a .msh file (created by Gmsh). In version 3.10 the code was working, but I recently upgraded to 3.12 and now I get the following error: (.pydev) ➜ testmodule git:(e0bc9ae) ✗ mpirun -np 2
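Since the error itself is truncated here, a minimal petsc4py mesh-loading sketch of the kind described may help frame the problem (the mesh file name is a placeholder; this assumes petsc4py 3.12 and a PETSc build with Gmsh reading enabled):

```python
from petsc4py import PETSc

# Read a Gmsh-generated mesh into a DMPlex; "mesh.msh" is a placeholder
dm = PETSc.DMPlex().createFromFile("mesh.msh")

# Under `mpirun -np 2`, the mesh must then be distributed across ranks
dm.distribute(overlap=0)
dm.view()
```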