Hi,

> I have a question on the speedup of PETSc when using OpenMP. I get
> good speedup when using MPI, but no speedup when using OpenMP.
> The example is ex2f with m=100 and n=100. The number of available
> processors is 16 (32 threads) and the OS is Windows Server 2012. The log
> files for 4 and 8 processors are attached.
>
> The commands I used to run with 4 processors are as follows:
>
> Run using MPI:
> mpiexec -n 4 Petsc-windows-ex2f.exe -m 100 -n 100 -log_summary log_100x100_mpi_p4.log
>
> Run using OpenMP:
> Petsc-windows-ex2f.exe -threadcomm_type openmp -threadcomm_nthreads 4 -m 100 -n 100 -log_summary log_100x100_openmp_p4.log
>
> The PETSc used for this test is PETSc for Windows
> http://www.mic-tc.ch/downloads/PETScForWindows.zip, but I guess this is
> not the problem, because the same problem exists when I use PETSc-dev in
> Cygwin. I don't know whether this problem also exists on Linux; could
> anybody help test it?

For the 100x100 case considered, the execution times are in the millisecond to sub-millisecond range (e.g. 1.3 ms in total for 68 calls to VecScale with 4 processors, i.e. roughly 19 µs per call). I'd say this is too small to see any reasonable performance gain from running multiple threads; consider problem sizes of about 1000x1000 instead.
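A quick back-of-the-envelope check of these numbers, as a sketch: it takes the 1.3 ms / 68 calls VecScale figure from the log and assumes one vector entry per grid point in ex2f, so an m x n grid gives a vector of length m*n split evenly across the workers. The helper names are made up for illustration.

```python
# Rough arithmetic for the ex2f runs discussed above.
# Assumption: one unknown per grid point, distributed evenly over workers.

def time_per_call_us(total_ms: float, calls: int) -> float:
    """Average wall time per call, in microseconds."""
    return total_ms * 1000.0 / calls

def entries_per_worker(m: int, n: int, workers: int) -> float:
    """Vector entries owned by each MPI process or thread."""
    return m * n / workers

if __name__ == "__main__":
    # 1.3 ms total for 68 VecScale calls on 4 processors (from the log)
    print(f"VecScale: {time_per_call_us(1.3, 68):.1f} us per call")
    for size in (100, 1000):
        share = entries_per_worker(size, size, 4)
        print(f"{size}x{size} grid, 4 workers: {share:.0f} entries each")
```

With 100x100 each of the 4 workers owns only 2500 entries, so any fixed per-call threading overhead is large relative to the useful work. Going to 1000x1000 gives each worker 100 times more work per call.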

Moreover, keep in mind that you typically won't get perfectly linear scaling with the number of processor cores, because memory bandwidth is ultimately the limiting factor for standard vector operations.
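To illustrate the bandwidth argument, here is a simplified saturation model, not a measurement. Aggregate bandwidth is assumed to grow linearly with the number of cores until it hits the memory system's peak. The 8 GB/s per core and 32 GB/s peak figures are hypothetical placeholders. A kernel such as VecScale (x <- alpha*x) reads and writes each 8-byte double once, so it moves 16 bytes per entry. Its runtime is then bounded below by bytes moved divided by peak bandwidth, at least once the vector no longer fits in cache.

```python
# Simplified model of a purely bandwidth-bound vector kernel.
# Hypothetical hardware numbers; real values depend on the machine.

def bandwidth_limited_speedup(cores: int, per_core_gbs: float, peak_gbs: float) -> float:
    """Speedup over one core when aggregate bandwidth saturates at peak_gbs."""
    achieved = min(cores * per_core_gbs, peak_gbs)
    return achieved / per_core_gbs

def vecscale_min_time_ms(entries: int, peak_gbs: float) -> float:
    """Lower bound on VecScale time: 16 bytes (8 read + 8 written) per entry."""
    bytes_moved = 16 * entries
    return bytes_moved / (peak_gbs * 1e9) * 1e3

if __name__ == "__main__":
    for cores in (1, 2, 4, 8, 16):
        s = bandwidth_limited_speedup(cores, 8.0, 32.0)
        print(f"{cores:2d} cores: speedup {s:.1f}")
    # A 1000x1000 grid gives a vector of 10^6 doubles
    print(f"VecScale, 10^6 entries: >= {vecscale_min_time_ms(1_000_000, 32.0):.2f} ms")
```

Under these assumed numbers the speedup stops at 4 no matter how many of the 16 cores are used. Extra cores only help until the memory system is saturated.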

Best regards,
Karli
