Hi all,
I'm trying to do repeated matrix-vector multiplications of large sparse
matrices in Python using petsc4py. Even the simplest method of
parallelization, dividing the calculation up to run independently on
multiple processes, does not seem to give a significant speed-up
for large matrices. I constructed a minimal working example, which I
run using
mpiexec -n N python parallel_example.py,
where N is the number of processes. Instead of taking approximately
the same time irrespective of the number of processes used, the
calculation is much slower when starting more MPI processes. This
translates to little to no speed-up when splitting a fixed number
of calculations over N processes. As an example, running with N=1
takes 9s, while running with N=4 takes 34s. With smaller matrices
the problem is not as severe (only a factor of 1.5 slower when
setting MATSIZE=1e+5 instead of MATSIZE=1e+6). I get the same
problems when just starting the script four times manually without
using MPI.
I attached both the script and the log file for running the script
with N=4. Any help would be greatly appreciated. Calculations were done
on my laptop, running Arch Linux (kernel 6.6.8) and PETSc 3.20.2.
Kind Regards
Steffen
import sys
import time

import petsc4py
petsc4py.init(sys.argv)
from petsc4py import PETSc

MATSIZE = int(1e+6)  # matrix sizes must be integers, not floats

# Each process builds its own sequential matrix (COMM_SELF), so the
# runs are completely independent -- no communication between ranks.
mat = PETSc.Mat().createAIJ((MATSIZE, MATSIZE), comm=PETSc.COMM_SELF)
mat.setPreallocationNNZ(50)  # preallocate 50 nonzeros per row
mat.setRandom()
mat.assemble()

x, b = mat.createVecs()
x.setRandom()

time_start = time.time()
for _ in range(100):
    mat.mult(x, b)  # b = mat @ x
if PETSc.COMM_WORLD.rank == 0:
    print(f"{time.time() - time_start:.2f}s")
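For completeness, the kind of static splitting I mean (each process
handling its own share of the independent multiplications) can be
sketched like this; tasks_for_rank is a hypothetical helper, and in the
real script rank and size would come from PETSc.COMM_WORLD:

```python
def tasks_for_rank(total_tasks, rank, size):
    # Contiguous block partition: every rank gets roughly
    # total_tasks / size tasks; the first (total_tasks % size)
    # ranks each get one extra task.
    base, extra = divmod(total_tasks, size)
    start = rank * base + min(rank, extra)
    count = base + (1 if rank < extra else 0)
    return range(start, start + count)

# Example: 10 independent mat-vec runs split over 4 ranks;
# each rank would loop over its own task indices only.
for rank in range(4):
    print(rank, list(tasks_for_rank(10, rank, 4)))
```

Since the ranks never communicate, I would expect the wall time for a
fixed total task count to shrink roughly with N, which is what I am not
seeing.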