Thank you for your feedback.
@Stefano: the use of PETSc.COMM_SELF was intentional, since I later
intend to distribute M independent calculations over N processes, so
that each process only needs to do M/N of them. Of course I don't
expect a speed-up in my example, since the number of calculations is
constant and does not depend on N, but I would hope that the time each
process takes does not increase too drastically with N.
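To make the intended pattern concrete, here is a minimal sketch (not my actual script; the task count M, the matrix size and the tridiagonal fill are placeholders): each rank builds its own sequential matrix on PETSc.COMM_SELF and works through its share of the M independent multiplications.

from petsc4py import PETSc

M = 16            # total number of independent calculations (placeholder)
MATSIZE = 1000    # matrix dimension (placeholder, much smaller than my real case)

rank = PETSc.COMM_WORLD.getRank()
nprocs = PETSc.COMM_WORLD.getSize()

# Sequential AIJ matrix, local to this process only.
A = PETSc.Mat().createAIJ([MATSIZE, MATSIZE], nnz=3, comm=PETSc.COMM_SELF)
for i in range(MATSIZE):                 # simple tridiagonal fill as a stand-in
    for j in (i - 1, i, i + 1):
        if 0 <= j < MATSIZE:
            A.setValue(i, j, 1.0)
A.assemble()

x, y = A.createVecs()
x.set(1.0)

# Round-robin split: rank r handles tasks r, r + nprocs, r + 2*nprocs, ...
for task in range(rank, M, nprocs):
    A.mult(x, y)                         # y = A*x, done independently on each rank

Since every matrix lives on COMM_SELF, the ranks should not need to communicate at all during the multiplications.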
@Barry: I ran the STREAMS benchmark; these are my results (number of processes, measured rate, and speedup relative to one process):
Processes    Rate (MB/s)    Speedup
    1        23467.9961     1
    2        26852.0536     1.1442
    3        29715.4762     1.26621
    4        34132.2490     1.45442
    5        34924.3020     1.48817
    6        34315.5290     1.46223
    7        33134.9545     1.41192
    8        33234.9141     1.41618
    9        32584.3349     1.38846
   10        32582.3962     1.38838
   11        32098.2903     1.36775
   12        32064.8779     1.36632
   13        31692.0541     1.35044
   14        31274.2421     1.33263
   15        31574.0196     1.34541
   16        30906.7773     1.31698
I also attached the resulting plot. It seems that I get very bad MPI
speedup (the red curve, right?), which even decreases when I use too
many processes. I don't fully understand the reasons given in the
discussion you linked, since this is all very new to me, but I take it
that this is a limitation of my computer which I can't easily fix, right?
----- Message from Barry Smith <[email protected]> ---------
Date: Thu, 11 Jan 2024 11:56:24 -0500
From: Barry Smith <[email protected]>
Subject: Re: [petsc-users] Parallel processes run significantly slower
To: Steffen Wilksen | Universitaet Bremen <[email protected]>
Cc: PETSc users list <[email protected]>
Take a look at the discussion
in https://petsc.gitlab.io/-/petsc/-/jobs/5814862879/artifacts/public/html/manual/streams.html and I suggest you run the streams benchmark from the branch barry/2023-09-15/fix-log-pcmpi on your machine to get a baseline for what kind of speedup you can expect.
Then let us know your thoughts.
Barry
On Jan 11, 2024, at 11:37 AM, Stefano Zampini
<[email protected]> wrote:
You are creating the matrix on the wrong communicator
if you want it parallel. You are using PETSc.COMM_SELF
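(For illustration, a minimal sketch of the distributed alternative Stefano refers to: creating the matrix on the global communicator, so that a single MatMult is carried out cooperatively by all ranks. The dimension and the diagonal fill are placeholders, not taken from the attached script.)

from petsc4py import PETSc

MATSIZE = 1000  # placeholder dimension

# With the global communicator the rows are distributed across all ranks.
A = PETSc.Mat().createAIJ([MATSIZE, MATSIZE], nnz=1, comm=PETSc.COMM_WORLD)
rstart, rend = A.getOwnershipRange()     # rows owned by this rank
for i in range(rstart, rend):
    A.setValue(i, i, 2.0)                # stand-in diagonal fill
A.assemble()

x, y = A.createVecs()
x.set(1.0)
A.mult(x, y)                             # one parallel matrix-vector product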
On Thu, Jan 11, 2024, 19:28 Steffen Wilksen |
Universitaet Bremen <[email protected]> wrote:
Hi all,
I'm trying to do repeated matrix-vector multiplication of large
sparse matrices in Python using petsc4py. Even the simplest
method of parallelization, dividing up the calculations to run
independently on multiple processes, does not seem to give a
significant speed-up for large matrices. I constructed a minimal
working example, which I run using
working example, which I run using
mpiexec -n N python parallel_example.py,
where N is the number of processes. Instead of taking
approximately the same time irrespective of the number of
processes used, the calculation is much slower when starting more
MPI processes. This translates to little to no speed-up when
splitting up a fixed number of calculations over N processes. As
an example, running with N=1 takes 9s, while running with N=4
takes 34s. When running with smaller matrices, the problem is not
as severe (only slower by a factor of 1.5 when setting
MATSIZE=1e+5 instead of MATSIZE=1e+6). I get the same problems
when just starting the script four times manually without using MPI.
I attached both the script and the log file for running the script
with N=4. Any help would be greatly appreciated. Calculations are
done on my laptop, running Arch Linux (kernel 6.6.8) and PETSc
version 3.20.2.
Kind Regards
Steffen
----- End message from Barry Smith <[email protected]> -----
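(The attached parallel_example.py is not reproduced in this thread. Purely as an illustration, a script along the lines described in the quoted message, i.e. a fixed number of sequential matrix-vector products per process with each rank timing itself, might look roughly like this; the matrix size, fill pattern and repeat count are placeholder assumptions.)

import time
from petsc4py import PETSc

MATSIZE = 1000   # placeholder; the report above uses 1e+5 and 1e+6
REPEATS = 100    # placeholder number of repeated multiplications

# Each process builds and multiplies its own sequential matrix.
A = PETSc.Mat().createAIJ([MATSIZE, MATSIZE], nnz=1, comm=PETSc.COMM_SELF)
for i in range(MATSIZE):
    A.setValue(i, i, 2.0)                # stand-in diagonal fill
A.assemble()
x, y = A.createVecs()
x.set(1.0)

t0 = time.perf_counter()
for _ in range(REPEATS):                 # the repeated matrix-vector products
    A.mult(x, y)
elapsed = time.perf_counter() - t0

print(f"rank {PETSc.COMM_WORLD.getRank()}: {elapsed:.3f} s for {REPEATS} MatMults")

Run as above with mpiexec -n N python parallel_example.py; the per-rank times printed at the end are what grow with N in the problematic case.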