Hi Danyang,
> This does not make any difference. I have scaled up the matrix but the
> performance does not change. If I run with OpenMP, the iteration number
> is always the same no matter how many processors are used. This seems
> quite strange, as the iteration number usually increases as the number of
> processors increases when running with MPI. I think I should move to the
> Ubuntu system for further tests, to see if this is a Windows problem.
OpenMP and MPI are two different parallelization approaches:
- With MPI, we split the system matrix into strips, where each strip is
assigned to one MPI process. This leads (among other things) to
block-Jacobi preconditioner techniques, for which you usually see an
increase in iteration counts. In the ex2 case, however, this even leads
to a reduction in iteration counts.
- With OpenMP, the system matrix is contiguous in memory, so one still
computes preconditioners for the full matrix (as is, for example, the case
with ILU). Thus, the use of OpenMP is transparent with respect to the
algorithms employed, so you don't see any change in iteration counts
(see the runs below).
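As a quick check, your own commands with -ksp_monitor added (which prints
the residual norm at every Krylov iteration) should show the iteration
count changing with the number of MPI processes, but staying the same for
any number of OpenMP threads:

mpiexec -n 4 Petsc-windows-ex2f.exe -m 100 -n 100 -ksp_monitor
Petsc-windows-ex2f.exe -threadcomm_type openmp -threadcomm_nthreads 4 -m 100 -n 100 -ksp_monitor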
The typical vector operations like VecScale() (should) make use of
OpenMP, but apparently this is not the case here. I'm double-checking on
my machine (Linux Mint Maya, based on Ubuntu 12.04 LTS) and will let you
know.
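In case you want to poke at this yourself in the meantime, here is a
minimal sketch (the vector length and the repetition count are arbitrary
choices of mine) that isolates VecScale() so that its row in -log_summary
is easy to read:

static char help[] = "Calls VecScale() repeatedly to expose threading behavior in -log_summary.\n";

#include <petscvec.h>

int main(int argc, char **argv)
{
  Vec            x;
  PetscInt       i, n = 10000000;  /* large enough that one VecScale() is not sub-millisecond */
  PetscErrorCode ierr;

  PetscInitialize(&argc, &argv, (char*)0, help);
  ierr = VecCreate(PETSC_COMM_WORLD, &x);CHKERRQ(ierr);
  ierr = VecSetSizes(x, PETSC_DECIDE, n);CHKERRQ(ierr);
  ierr = VecSetFromOptions(x);CHKERRQ(ierr);   /* allow the vector type to be set from the command line */
  ierr = VecSet(x, 1.0);CHKERRQ(ierr);
  for (i = 0; i < 100; i++) {                  /* 100 repetitions, again an arbitrary choice */
    ierr = VecScale(x, 1.0000001);CHKERRQ(ierr);
  }
  ierr = VecDestroy(&x);CHKERRQ(ierr);
  PetscFinalize();
  return 0;
}

Running this once with and once without -threadcomm_type openmp
-threadcomm_nthreads 4 (plus -log_summary) and comparing the VecScale row
should tell us quickly whether the threads are used at all.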
Best regards,
Karli
On 04/11/2013 6:51 AM, Karl Rupp wrote:
Hi,
> I have a question on the speedup of PETSc when using OpenMP. I can get
> good speedup when using MPI, but no speedup when using OpenMP.
> The example is ex2f with m=100 and n=100. The number of available
> processors is 16 (32 threads) and the OS is Windows Server 2012. The log
> files for 4 and 8 processors are attached.
> The commands I used to run with 4 processors are as follows:
> Run using MPI:
> mpiexec -n 4 Petsc-windows-ex2f.exe -m 100 -n 100 -log_summary log_100x100_mpi_p4.log
> Run using OpenMP:
> Petsc-windows-ex2f.exe -threadcomm_type openmp -threadcomm_nthreads 4 -m 100 -n 100 -log_summary log_100x100_openmp_p4.log
> The PETSc used for this test is PETSc for Windows
> (http://www.mic-tc.ch/downloads/PETScForWindows.zip), but I guess this is
> not the problem, because the same problem exists when I use PETSc-dev in
> Cygwin. I don't know whether this problem exists on Linux; could anybody
> help to test?
For the 100x100 case considered, the execution times per call are
somewhere in the millisecond to sub-millisecond range (e.g. 1.3 ms for
68 calls to VecScale with 4 processors). I'd say this is too small to
see any reasonable performance gain when running multiple threads;
consider problem sizes of about 1000x1000 instead.
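For example, simply scale up your own OpenMP run (the log file name below
is just adapted from yours):

Petsc-windows-ex2f.exe -threadcomm_type openmp -threadcomm_nthreads 4 -m 1000 -n 1000 -log_summary log_1000x1000_openmp_p4.log

and likewise for the MPI run, so that each vector and matrix operation is
long enough to amortize the threading overhead.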
Moreover, keep in mind that you typically won't get perfectly linear
scaling with the number of processor cores, because ultimately the
memory bandwidth is the limiting factor for standard vector operations.
Best regards,
Karli