On Wed, Mar 25, 2020 at 2:11 PM Amin Sadeghi <aminthefr...@gmail.com> wrote:
> Thank you Matt and Mark for the explanation. That makes sense. Please
> correct me if I'm wrong: I think instead of asking for the whole node with
> 32 cores, if I ask for more nodes, say 4 or 8, but each with 8 cores, then
> I should see much better speedups. Is that correct?

Yes, exactly.

   Matt

> On Wed, Mar 25, 2020 at 2:04 PM Mark Adams <mfad...@lbl.gov> wrote:
>
>> I would guess that you are saturating the memory bandwidth. After
>> you make PETSc (make all), it will suggest that you test it (make test)
>> and suggest that you run streams (make streams).
>>
>> I see Matt answered, but let me add that when you make streams you will
>> see the memory rate for 1, 2, 3, ..., NP processes. If your machine is
>> decent, you should see very good speedup at the beginning, and then it
>> will start to saturate. You are seeing about 50% of perfect speedup at
>> 16 processes. I would expect that you will see something similar with
>> streams. Without knowing your machine, your results look typical.
>>
>> On Wed, Mar 25, 2020 at 1:05 PM Amin Sadeghi <aminthefr...@gmail.com>
>> wrote:
>>
>>> Hi,
>>>
>>> I ran KSP example 45 on a single node with 32 cores and 125GB memory
>>> using 1, 16, and 32 MPI processes. Here's a comparison of the time
>>> spent during KSP.solve:
>>>
>>> - 1 MPI process: ~98 sec, speedup: 1X
>>> - 16 MPI processes: ~12 sec, speedup: ~8X
>>> - 32 MPI processes: ~11 sec, speedup: ~9X
>>>
>>> Since the problem size is large enough (8M unknowns), I expected a
>>> speedup much closer to 32X rather than 9X. Is this expected? If yes,
>>> how can it be improved?
>>>
>>> I've attached three log files for more details.
>>>
>>> Sincerely,
>>> Amin

--
What most experimenters take for granted before they begin their
experiments is infinitely more interesting than any results to which
their experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/
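Mark's "about 50% of perfect speedup at 16 processes" can be checked directly from the timings Amin reported. A quick sketch of that arithmetic (plain Python, no PETSc required; "efficiency" here simply means speedup divided by the number of processes):

```python
# Timings for KSP ex45 as reported in the thread (seconds, KSPSolve time).
timings = {1: 98.0, 16: 12.0, 32: 11.0}

base = timings[1]  # single-process baseline
for nprocs, t in timings.items():
    speedup = base / t              # how much faster than 1 process
    efficiency = speedup / nprocs   # fraction of ideal linear speedup
    print(f"{nprocs:2d} procs: speedup {speedup:4.1f}x, efficiency {efficiency:4.0%}")
```

This prints roughly 8.2x (51% efficiency) at 16 processes and 8.9x (28%) at 32, which matches Mark's reading: once memory bandwidth saturates, adding ranks on the same node buys almost nothing, whereas spreading the same ranks across more nodes adds bandwidth.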