Hi Junchao,
I tried it out, but unfortunately this does not seem to give any
improvement; the code is still much slower when starting more
processes.
----- Message from Junchao Zhang <[email protected]> ---------
Date: Fri, 12 Jan 2024 09:41:39 -0600
From: Junchao Zhang <[email protected]>
Subject: Re: [petsc-users] Parallel processes run significantly slower
To: Steffen Wilksen | Universitaet Bremen <[email protected]>
Cc: Barry Smith <[email protected]>, PETSc users list
<[email protected]>
Hi, Steffen,
Would it be an MPI process binding issue? Could you try running with
mpiexec --bind-to core -n N python parallel_example.py
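To check whether the binding actually takes effect, a minimal sketch
(assuming Linux and mpi4py; the script name is just an example) that
prints each rank's CPU affinity could look like this:

# check_binding.py -- print the CPU set each MPI rank may run on
# (sketch; os.sched_getaffinity is Linux-only)
import os
from mpi4py import MPI

comm = MPI.COMM_WORLD
cores = sorted(os.sched_getaffinity(0))
print(f"rank {comm.Get_rank()} of {comm.Get_size()} bound to cores {cores}")

run it the same way: mpiexec --bind-to core -n N python check_binding.py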
--Junchao Zhang
On Fri, Jan 12, 2024 at 8:52 AM Steffen Wilksen | Universitaet
Bremen <[email protected]> wrote:
Thank you for your feedback.
@Stefano: the use of my communicator was intentional, since I later
intend to distribute M independent calculations to N processes, each
process then only needing to do M/N calculations. Of course I don't
expect a speedup in my example, since the number of calculations is
constant and does not depend on N, but I would hope that the time each
process takes does not increase too drastically with N.
@Barry: I ran the STREAMS benchmark; these are my results:

 np   Rate (MB/s)   Speedup
  1   23467.9961    1
  2   26852.0536    1.1442
  3   29715.4762    1.26621
  4   34132.2490    1.45442
  5   34924.3020    1.48817
  6   34315.5290    1.46223
  7   33134.9545    1.41192
  8   33234.9141    1.41618
  9   32584.3349    1.38846
 10   32582.3962    1.38838
 11   32098.2903    1.36775
 12   32064.8779    1.36632
 13   31692.0541    1.35044
 14   31274.2421    1.33263
 15   31574.0196    1.34541
 16   30906.7773    1.31698
I also attached the resulting plot. It seems I get very bad MPI
speedup (the red curve, right?), which even decreases if I use too
many processes. I don't fully understand the reasons given in the
discussion you linked, since this is all very new to me, but I take it
that this is a problem with my computer which I can't easily fix,
right?
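If I assume the matrix-vector products are purely memory-bandwidth
bound, a rough back-of-the-envelope sketch using the STREAMS rates
above would be:

# rough estimate, assuming MatMult is limited only by memory bandwidth
rate_1 = 23467.9961   # MB/s with 1 process (from the STREAMS run above)
rate_4 = 34132.2490   # MB/s with 4 processes
per_process = rate_4 / 4        # bandwidth each of the 4 ranks effectively gets
print(rate_1 / per_process)     # ~2.8x expected slowdown per process
# the observed slowdown below (34s with N=4 vs. 9s with N=1) is ~3.8x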
----- Message from Barry Smith <[email protected]> ---------
Date: Thu, 11 Jan 2024 11:56:24 -0500
From: Barry Smith <[email protected]>
Subject: Re: [petsc-users] Parallel processes run significantly slower
To: Steffen Wilksen | Universitaet Bremen <[email protected]>
Cc: PETSc users list <[email protected]>

   Take a look at the discussion in
https://petsc.gitlab.io/-/petsc/-/jobs/5814862879/artifacts/public/html/manual/streams.html
and I suggest you run the streams benchmark from the branch
barry/2023-09-15/fix-log-pcmpi on your machine to get a baseline for
what kind of speedup you can expect.

   Then let us know your thoughts.

   Barry
On Jan 11, 2024, at 11:37 AM, Stefano Zampini
<[email protected]> wrote:

You are creating the matrix on the wrong communicator if you want it
parallel. You are using PETSc.COMM_SELF.
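A minimal sketch of the difference (illustrative size, not your actual
script) would be:

from petsc4py import PETSc

n = 1000  # illustrative global size
# COMM_SELF: every rank owns its own full, independent copy of the matrix
A_self = PETSc.Mat().createAIJ([n, n], comm=PETSc.COMM_SELF)
# COMM_WORLD: the rows are distributed across the ranks
A_world = PETSc.Mat().createAIJ([n, n], comm=PETSc.COMM_WORLD)
A_world.setUp()
print(A_world.getOwnershipRange())  # the block of rows this rank owns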
On Thu, Jan 11, 2024, 19:28 Steffen Wilksen | Universitaet Bremen
<[email protected]> wrote:
Hi all,
I'm trying to do repeated matrix-vector multiplication of large sparse
matrices in Python using petsc4py. Even the simplest method of
parallelization, dividing the calculation up to run on multiple
processes independently, does not seem to give a significant speedup
for large matrices. I constructed a minimal working example, which I
run using
mpiexec -n N python parallel_example.py,
where N is the number of processes. Instead of taking
approximately the same time irrespective of the number of
processes used, the calculation is much slower when starting
more MPI processes. This translates to little to no speed up
when splitting up a fixed number of calculations over N
processes. As an example, running with N=1 takes 9s, while
running with N=4 takes 34s. When running with smaller matrices,
the problem is not as severe (only slower by a factor of 1.5
when setting MATSIZE=1e+5 instead of MATSIZE=1e+6). I get the
same problems when just starting the script four times manually
without using MPI.
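For reference, a simplified sketch of the kind of loop described above
(not the attached script itself; the matrix structure and the number
of repetitions here are placeholders):

# parallel_example.py (simplified sketch -- the attached script differs in detail)
from petsc4py import PETSc

MATSIZE = 10**6   # placeholder, as mentioned above
NREPEAT = 100     # placeholder number of repeated multiplications

n = int(MATSIZE)
# each process builds its own full copy of a sparse (here tridiagonal) matrix
A = PETSc.Mat().createAIJ([n, n], nnz=3, comm=PETSc.COMM_SELF)
rstart, rend = A.getOwnershipRange()   # (0, n) on COMM_SELF
for i in range(rstart, rend):
    cols = [j for j in (i - 1, i, i + 1) if 0 <= j < n]
    A.setValues(i, cols, [1.0] * len(cols))
A.assemblyBegin()
A.assemblyEnd()

x, y = A.createVecs()
x.set(1.0)
for _ in range(NREPEAT):
    A.mult(x, y)   # repeated matrix-vector multiplication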
I attached both the script and the log file for running the
script with N=4. Any help would be greatly appreciated.
Calculations are done on my laptop, running Arch Linux (kernel 6.6.8)
and PETSc version 3.20.2.
Kind Regards
Steffen
----- End message from Barry Smith <[email protected]> -----
----- End message from Junchao Zhang <[email protected]> -----