Hi Steffen,

Could it be an MPI process binding issue? Could you try running with
  mpiexec --bind-to core -n N python parallel_example.py

--Junchao Zhang

On Fri, Jan 12, 2024 at 8:52 AM Steffen Wilksen | Universitaet Bremen <swilk...@itp.uni-bremen.de> wrote:

> Thank you for your feedback.
>
> @Stefano: the use of my communicator was intentional, since I later intend
> to distribute M independent calculations to N processes, each process then
> only needing to do M/N calculations. Of course I don't expect a speedup in
> my example, since the number of calculations is constant and does not
> depend on N, but I would hope that the time each process takes does not
> increase too drastically with N.
>
> @Barry: I tried the STREAMS benchmark; these are my results:
>
>   np   Rate (MB/s)   Speedup
>    1   23467.9961    1
>    2   26852.0536    1.1442
>    3   29715.4762    1.26621
>    4   34132.2490    1.45442
>    5   34924.3020    1.48817
>    6   34315.5290    1.46223
>    7   33134.9545    1.41192
>    8   33234.9141    1.41618
>    9   32584.3349    1.38846
>   10   32582.3962    1.38838
>   11   32098.2903    1.36775
>   12   32064.8779    1.36632
>   13   31692.0541    1.35044
>   14   31274.2421    1.33263
>   15   31574.0196    1.34541
>   16   30906.7773    1.31698
>
> I also attached the resulting plot. As it seems, I get very bad MPI
> speedup (red curve, right?), even decreasing if I use too many processes.
> I don't fully understand the reasons given in the discussion you linked,
> since this is all very new to me, but I take it that this is a problem
> with my computer which I can't easily fix, right?
>
> ----- Message from Barry Smith <bsm...@petsc.dev> ---------
>    Date: Thu, 11 Jan 2024 11:56:24 -0500
>    From: Barry Smith <bsm...@petsc.dev>
> Subject: Re: [petsc-users] Parallel processes run significantly slower
>      To: Steffen Wilksen | Universitaet Bremen <swilk...@itp.uni-bremen.de>
>      Cc: PETSc users list <petsc-users@mcs.anl.gov>
>
> Take a look at the discussion in
> https://petsc.gitlab.io/-/petsc/-/jobs/5814862879/artifacts/public/html/manual/streams.html
> and I suggest you run the streams benchmark from the branch
> barry/2023-09-15/fix-log-pcmpi on your machine to get a baseline for what
> kind of speedup you can expect. Then let us know your thoughts.
>
> Barry
>
> On Jan 11, 2024, at 11:37 AM, Stefano Zampini <stefano.zamp...@gmail.com> wrote:
>
> You are creating the matrix on the wrong communicator if you want it
> parallel. You are using PETSc.COMM_SELF.
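To make the communicator point concrete, here is a minimal sketch of the two
variants; the size n and the variable names are placeholders, not taken from
the attached script:

    from petsc4py import PETSc

    n = 1000  # placeholder global size

    # Sequential: every MPI rank builds and multiplies its own full copy of the matrix.
    A_seq = PETSc.Mat().createAIJ([n, n], comm=PETSc.COMM_SELF)
    A_seq.setUp()

    # Parallel: one matrix distributed over all ranks started by mpiexec;
    # each rank owns a contiguous block of rows.
    A_par = PETSc.Mat().createAIJ([n, n], comm=PETSc.COMM_WORLD)
    A_par.setUp()
    rstart, rend = A_par.getOwnershipRange()  # rows [rstart, rend) live on this rank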
> On Thu, Jan 11, 2024, 19:28 Steffen Wilksen | Universitaet Bremen <swilk...@itp.uni-bremen.de> wrote:
>
>> Hi all,
>>
>> I'm trying to do repeated matrix-vector multiplication of large sparse
>> matrices in Python using petsc4py. Even the simplest method of
>> parallelization, dividing the calculation up to run on multiple processes
>> independently, does not seem to give a significant speedup for large
>> matrices. I constructed a minimal working example, which I run using
>> mpiexec -n N python parallel_example.py, where N is the number of
>> processes. Instead of taking approximately the same time irrespective of
>> the number of processes used, the calculation is much slower when starting
>> more MPI processes. This translates to little to no speedup when splitting
>> a fixed number of calculations over N processes. As an example, running
>> with N=1 takes 9 s, while running with N=4 takes 34 s. When running with
>> smaller matrices, the problem is not as severe (only slower by a factor of
>> 1.5 when setting MATSIZE=1e+5 instead of MATSIZE=1e+6). I get the same
>> problem when just starting the script four times manually without using
>> MPI. I attached both the script and the log file for running the script
>> with N=4. Any help would be greatly appreciated.
>>
>> Calculations are done on my laptop, Arch Linux (kernel 6.6.8), PETSc
>> version 3.20.2.
>>
>> Kind Regards
>> Steffen
>
> ----- End message from Barry Smith <bsm...@petsc.dev> -----
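Since the attached parallel_example.py is not included in this archive, here
is a rough sketch of the kind of independent per-process matrix-vector
benchmark described above. The MATSIZE value, the repetition count, and the
tridiagonal test matrix are assumptions for illustration, not the actual
script:

    import time
    from petsc4py import PETSc

    MATSIZE = int(1e5)  # the report compares MATSIZE=1e+5 and 1e+6
    NREPEAT = 100       # placeholder repetition count

    comm = PETSc.COMM_SELF  # each MPI process works independently, no communication

    # Simple tridiagonal sparse matrix, used only to exercise MatMult.
    A = PETSc.Mat().createAIJ([MATSIZE, MATSIZE], nnz=3, comm=comm)
    for i in range(MATSIZE):
        cols = [j for j in (i - 1, i, i + 1) if 0 <= j < MATSIZE]
        A.setValues(i, cols, [1.0] * len(cols))
    A.assemble()

    x = A.createVecRight()
    y = A.createVecLeft()
    x.set(1.0)

    t0 = time.time()
    for _ in range(NREPEAT):
        A.mult(x, y)   # y = A * x
        y.copy(x)      # reuse the result as the next input
        x.normalize()  # keep the entries bounded across iterations
    rank = PETSc.COMM_WORLD.getRank()
    print(f"[rank {rank}] {NREPEAT} MatMults took {time.time() - t0:.2f} s")

Since MatMult is memory-bandwidth bound, several such independent processes
on one laptop end up sharing roughly the same ~35 GB/s total bandwidth
measured by STREAMS above, which is the effect discussed in the linked
streams page.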