Hi Junchao,
I tried it out, but unfortunately this does not seem to give any
improvement; the code is still much slower when starting more
processes.
----- Message from Junchao Zhang <[email protected]> ---------
Date: Fri, 12 Jan 2024 09:41:39 -0600
From: Junchao Zhang <[email protected]>
Subject: Re: [petsc-users] Parallel processes run significantly slower
To: Steffen Wilksen | Universitaet Bremen <[email protected]>
Cc: Barry Smith <[email protected]>, PETSc users list
<[email protected]>
Hi, Steffen,
Would it be an MPI process binding issue? Could you try running with
mpiexec --bind-to core -n N python parallel_example.py
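To check whether the binding actually takes effect, a minimal sketch
(assuming Linux and mpi4py; the script name is just an example) that
prints each rank's CPU affinity could look like this:

# check_binding.py -- print the CPU set each MPI rank may run on
# (sketch; os.sched_getaffinity is Linux-only)
import os
from mpi4py import MPI

comm = MPI.COMM_WORLD
cores = sorted(os.sched_getaffinity(0))
print(f"rank {comm.Get_rank()} of {comm.Get_size()} bound to cores {cores}")

run it the same way: mpiexec --bind-to core -n N python check_binding.py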
--Junchao Zhang
On Fri, Jan 12, 2024 at 8:52 AM Steffen Wilksen | Universitaet
Bremen <[email protected]> wrote:
Thank you for your feedback.
@Stefano: the use of my communicator was intentional, since I later
intend to distribute M independent calculations to N processes, each
process then only needing to do M/N calculations. Of course I don't
expect a speedup in my example, since the number of calculations is
constant and does not depend on N, but I would hope that the time each
process takes does not increase too drastically with N.
@Barry: I ran the STREAMS benchmark; these are my results:

 np   Rate (MB/s)   Speedup
  1   23467.9961    1
  2   26852.0536    1.1442
  3   29715.4762    1.26621
  4   34132.2490    1.45442
  5   34924.3020    1.48817
  6   34315.5290    1.46223
  7   33134.9545    1.41192
  8   33234.9141    1.41618
  9   32584.3349    1.38846
 10   32582.3962    1.38838
 11   32098.2903    1.36775
 12   32064.8779    1.36632
 13   31692.0541    1.35044
 14   31274.2421    1.33263
 15   31574.0196    1.34541
 16   30906.7773    1.31698
I also attached the resulting plot. It seems I get very bad MPI
speedup (the red curve, right?), which even decreases if I use too
many processes. I don't fully understand the reasons given in the
discussion you linked, since this is all very new to me, but I take it
that this is a problem with my computer which I can't easily fix,
right?
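If I assume the matrix-vector products are purely memory-bandwidth
bound, a rough back-of-the-envelope sketch using the STREAMS rates
above would be:

# rough estimate, assuming MatMult is limited only by memory bandwidth
rate_1 = 23467.9961   # MB/s with 1 process (from the STREAMS run above)
rate_4 = 34132.2490   # MB/s with 4 processes
per_process = rate_4 / 4        # bandwidth each of the 4 ranks effectively gets
print(rate_1 / per_process)     # ~2.8x expected slowdown per process
# the observed slowdown below (34s with N=4 vs. 9s with N=1) is ~3.8x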
----- Message from Barry Smith <[email protected]> ---------
Date: Thu, 11 Jan 2024 11:56:24 -0500
From: Barry Smith <[email protected]>
Subject: Re: [petsc-users] Parallel processes run significantly slower
To: Steffen Wilksen | Universitaet Bremen <[email protected]>
Cc: PETSc users list <[email protected]>

   Take a look at the discussion in
https://petsc.gitlab.io/-/petsc/-/jobs/5814862879/artifacts/public/html/manual/streams.html
and I suggest you run the streams benchmark from the branch
barry/2023-09-15/fix-log-pcmpi on your machine to get a baseline for
what kind of speedup you can expect.

   Then let us know your thoughts.

   Barry
On Jan 11, 2024, at 11:37 AM, Stefano Zampini
<[email protected]> wrote:

You are creating the matrix on the wrong communicator if you want it
parallel. You are using PETSc.COMM_SELF.
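A minimal sketch of the difference (illustrative size, not your actual
script) would be:

from petsc4py import PETSc

n = 1000  # illustrative global size
# COMM_SELF: every rank owns its own full, independent copy of the matrix
A_self = PETSc.Mat().createAIJ([n, n], comm=PETSc.COMM_SELF)
# COMM_WORLD: the rows are distributed across the ranks
A_world = PETSc.Mat().createAIJ([n, n], comm=PETSc.COMM_WORLD)
A_world.setUp()
print(A_world.getOwnershipRange())  # the block of rows this rank owns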
On Thu, Jan 11, 2024, 19:28 Steffen Wilksen | Universitaet Bremen
<[email protected]> wrote:
Hi all,
I'm trying to do repeated matrix-vector multiplication of large sparse
matrices in Python using petsc4py. Even the simplest method of
parallelization, dividing the calculation up to run on multiple
processes independently, does not seem to give a significant speedup
for large matrices. I constructed a minimal working example, which I
run using
mpiexec -n N python parallel_example.py,
where N is the number of processes. Instead of taking
approximately the same time irrespective of the number of
processes used, the calculation is much slower when starting
more MPI processes. This translates to little to no speed up
when splitting up a fixed number of calculations over N
processes. As an example, running with N=1 takes 9s, while
running with N=4 takes 34s. When running with smaller matrices,
the problem is not as severe (only slower by a factor of 1.5
when setting MATSIZE=1e+5 instead of MATSIZE=1e+6). I get the
same problems when just starting the script four times manually
without using MPI.
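For reference, a simplified sketch of the kind of loop described above
(not the attached script itself; the matrix structure and the number
of repetitions here are placeholders):

# parallel_example.py (simplified sketch -- the attached script differs in detail)
from petsc4py import PETSc

MATSIZE = 10**6   # placeholder, as mentioned above
NREPEAT = 100     # placeholder number of repeated multiplications

n = int(MATSIZE)
# each process builds its own full copy of a sparse (here tridiagonal) matrix
A = PETSc.Mat().createAIJ([n, n], nnz=3, comm=PETSc.COMM_SELF)
rstart, rend = A.getOwnershipRange()   # (0, n) on COMM_SELF
for i in range(rstart, rend):
    cols = [j for j in (i - 1, i, i + 1) if 0 <= j < n]
    A.setValues(i, cols, [1.0] * len(cols))
A.assemblyBegin()
A.assemblyEnd()

x, y = A.createVecs()
x.set(1.0)
for _ in range(NREPEAT):
    A.mult(x, y)   # repeated matrix-vector multiplication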
I attached both the script and the log file for running the
script with N=4. Any help would be greatly appreciated.
Calculations are done on my laptop, running Arch Linux (kernel 6.6.8)
and PETSc version 3.20.2.
Kind Regards
Steffen
----- End message from Barry Smith <[email protected]> -----
----- End message from Junchao Zhang <[email protected]> -----