[petsc-users] Slow speed when using PETSc multigrid

TAY wee-beng Wed, 06 Jun 2012 22:04:07 +0200

Hi,

I have used 3 KSPs: 2 to solve the momentum equations and 1 for the multigrid (Poisson) solve. For the multigrid KSP I call

call KSPSetOptionsPrefix(ksp,"mg_",ierr)

I run with

-log_summary -mg_ksp_view

to single out the multigrid KSP, but I'm not sure it's really working...
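One way to act on Barry's suggestion quoted further down is to select multigrid explicitly on the prefixed solver and let it print its own configuration. This is a sketch, assuming the Poisson KSP carries exactly the prefix mg_ (so every runtime option for that solver needs the same prefix) and that KSPSetOptionsPrefix is called before KSPSetFromOptions; otherwise the prefixed options are never read. The level count of 3 is a hypothetical value: PCMG can only build that many levels if the DMDA grid can be coarsened that many times.

```
mpiexec -n 4 ./a.out -log_summary -mg_ksp_view -mg_pc_type mg -mg_pc_mg_levels 3
```

If the prefix is being picked up, the -mg_ksp_view output should report the preconditioner type as mg and list the smoother used on each level.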

Here's the output:

---------------------------------------------- PETSc Performance Summary: ----------------------------------------------

./a.out on a petsc-3.2 named n12-50 with 4 processors, by wtay Wed Jun  6 21:57:33 2012
Using Petsc Development HG revision: c76fb3cac2a4ad0dfc9436df80f678898c867e86  HG Date: Thu May 31 00:33:26 2012 -0500

                          Max       Max/Min        Avg      Total
Time (sec):           1.064e+01      1.00000   1.064e+01
Objects:              2.700e+01      1.00000   2.700e+01
Flops:                4.756e+08      1.00811   4.744e+08  1.897e+09
Flops/sec:            4.468e+07      1.00811   4.457e+07  1.783e+08
MPI Messages:         4.080e+02      2.00000   3.060e+02  1.224e+03
MPI Message Lengths:  2.328e+06      2.00000   5.706e+03  6.984e+06
MPI Reductions:       8.750e+02      1.00000

Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
                            e.g., VecAXPY() for real vectors of length N --> 2N flops
                            and VecAXPY() for complex vectors of length N --> 8N flops

Summary of Stages:   ----- Time ------  ----- Flops -----  --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total   counts   %Total     Avg         %Total   counts   %Total
 0:      Main Stage: 1.0644e+01 100.0%  1.8975e+09 100.0%  1.224e+03 100.0%  5.706e+03      100.0%  8.740e+02  99.9%

------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
    Count: number of times phase was executed
    Time and Flops: Max - maximum over all processors
                    Ratio - ratio of maximum to minimum over all processors
    Mess: number of messages sent
    Avg. len: average message length
    Reduct: number of global reductions
    Global: entire computation
    Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
       %T - percent time in this phase         %f - percent flops in this phase
       %M - percent messages in this phase     %L - percent message lengths in this phase
       %R - percent reductions in this phase
    Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event            Count     Time (sec)       Flops                                --- Global ---        --- Stage ---      Total
                 Max Ratio Max     Ratio Max     Ratio   Mess Avg len  Reduct   %T  %f  %M  %L  %R   %T  %f  %M  %L  %R Mflop/s
------------------------------------------------------------------------------------------------------------------------
--- Event Stage 0: Main Stage

MatMult          202 1.0 5.5096e-01  1.0 1.38e+08 1.0 1.2e+03 5.7e+03 0.0e+00    5  29  99 100   0    5  29  99 100   0     998
MatSolve         252 1.0 6.9136e-01  1.1 1.71e+08 1.0 0.0e+00 0.0e+00 0.0e+00    6  36   0   0   0    6  36   0   0   0     986
MatLUFactorNum    50 1.0 4.6002e-01  1.0 7.31e+07 1.0 0.0e+00 0.0e+00 0.0e+00    4  15   0   0   0    4  15   0   0   0     634
MatILUFactorSym    1 1.0 9.5899e-03  1.1 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+00    0   0   0   0   0    0   0   0   0   0       0
MatAssemblyBegin  50 1.0 1.6270e-03  1.1 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+02    0   0   0   0  11    0   0   0   0  11       0
MatAssemblyEnd    50 1.0 1.0896e-01  1.0 0.00e+00 0.0 1.2e+01 1.4e+03 8.0e+00    1   0   1   0   1    1   0   1   0   1       0
MatGetRowIJ        1 1.0 2.8610e-06  1.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00    0   0   0   0   0    0   0   0   0   0       0
MatGetOrdering     1 1.0 7.2002e-04  1.0 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00    0   0   0   0   0    0   0   0   0   0       0
KSPSetUp         100 1.0 2.9130e-03  1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00    0   0   0   0   0    0   0   0   0   0       0
KSPSolve          50 1.0 2.0737e+00  1.0 4.76e+08 1.0 1.2e+03 5.7e+03 4.6e+02   19 100  99 100  52   19 100  99 100  53     915
VecDot           202 1.0 7.3588e-02  1.1 1.63e+07 1.0 0.0e+00 0.0e+00 2.0e+02    1   3   0   0  23    1   3   0   0  23     885
VecDotNorm2      101 1.0 3.9155e-02  1.7 1.63e+07 1.0 0.0e+00 0.0e+00 1.0e+02    0   3   0   0  12    0   3   0   0  12    1664
VecNorm          151 1.0 5.8769e-02  1.7 1.22e+07 1.0 0.0e+00 0.0e+00 1.5e+02    0   3   0   0  17    0   3   0   0  17     829
VecCopy          100 1.0 2.3459e-02  1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00    0   0   0   0   0    0   0   0   0   0       0
VecSet           403 1.0 5.9994e-02  1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00    1   0   0   0   0    1   0   0   0   0       0
VecAXPBYCZ       202 1.0 6.6376e-02  1.0 3.26e+07 1.0 0.0e+00 0.0e+00 0.0e+00    1   7   0   0   0    1   7   0   0   0    1963
VecWAXPY         202 1.0 6.9311e-02  1.0 1.63e+07 1.0 0.0e+00 0.0e+00 0.0e+00    1   3   0   0   0    1   3   0   0   0     940
VecAssemblyBegin 100 1.0 4.0355e-02 14.1 0.00e+00 0.0 0.0e+00 0.0e+00 3.0e+02    0   0   0   0  34    0   0   0   0  34       0
VecAssemblyEnd   100 1.0 5.0378e-04  1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00    0   0   0   0   0    0   0   0   0   0       0
VecScatterBegin  202 1.0 6.2275e-03  1.5 0.00e+00 0.0 1.2e+03 5.7e+03 0.0e+00    0   0  99 100   0    0   0  99 100   0       0
VecScatterEnd    202 1.0 2.0878e-02  1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00    0   0   0   0   0    0   0   0   0   0       0
PCSetUp          100 1.0 4.7225e-01  1.0 7.31e+07 1.0 0.0e+00 0.0e+00 5.0e+00    4  15   0   0   1    4  15   0   0   1     617
PCSetUpOnBlocks   50 1.0 4.7191e-01  1.0 7.31e+07 1.0 0.0e+00 0.0e+00 3.0e+00    4  15   0   0   0    4  15   0   0   0     618
PCApply          252 1.0 7.3425e-01  1.1 1.71e+08 1.0 0.0e+00 0.0e+00 0.0e+00    7  36   0   0   0    7  36   0   0   0     928
------------------------------------------------------------------------------------------------------------------------

Memory usage is given in bytes:

Object Type          Creations   Destructions     Memory  Descendants' Mem.
Reports information only for process 0.

--- Event Stage 0: Main Stage

               Matrix     4              4     16900896     0
        Krylov Solver     2              2         2168     0
               Vector    12             12      2604080     0
       Vector Scatter     1              1         1060     0
            Index Set     5              5       167904     0
       Preconditioner     2              2         1800     0
               Viewer     1              0            0     0
========================================================================================================================
Average time to get PetscTime(): 1.09673e-06
Average time for MPI_Barrier(): 4.00543e-06
Average time for zero size MPI_Send(): 1.22786e-05
#PETSc Option Table entries:

-log_summary
-mg_ksp_view
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure run at: Thu May 31 09:53:43 2012
Configure options: --with-mpi-dir=/opt/openmpi-1.5.3/ 
--with-blas-lapack-dir=/opt/intelcpro-11.1.059/mkl/lib/em64t/ 
--with-debugging=0 --download-hypre=1 
--prefix=/home/wtay/Lib/petsc-3.2-dev_shared_rel --known-mpi-shared=1 
--with-shared-libraries
-----------------------------------------
Libraries compiled on Thu May 31 09:53:43 2012 on hpc12
Machine characteristics: Linux-2.6.32-220.2.1.el6.x86_64-x86_64-with-centos-6.2-Final
Using PETSc directory: /home/wtay/Codes/petsc-dev
Using PETSc arch: petsc-3.2-dev_shared_rel
-----------------------------------------

Using C compiler: /opt/openmpi-1.5.3/bin/mpicc  -fPIC -wd1572 -Qoption,cpp,--extended_float_type -O3  ${COPTFLAGS} ${CFLAGS}
Using Fortran compiler: /opt/openmpi-1.5.3/bin/mpif90  -fPIC -O3   ${FOPTFLAGS} ${FFLAGS}
-----------------------------------------

Using include paths: 
-I/home/wtay/Codes/petsc-dev/petsc-3.2-dev_shared_rel/include 
-I/home/wtay/Codes/petsc-dev/include 
-I/home/wtay/Codes/petsc-dev/include 
-I/home/wtay/Codes/petsc-dev/petsc-3.2-dev_shared_rel/include 
-I/opt/openmpi-1.5.3/include
-----------------------------------------

Using C linker: /opt/openmpi-1.5.3/bin/mpicc
Using Fortran linker: /opt/openmpi-1.5.3/bin/mpif90
Using libraries: 
-Wl,-rpath,/home/wtay/Codes/petsc-dev/petsc-3.2-dev_shared_rel/lib 
-L/home/wtay/Codes/petsc-dev/petsc-3.2-dev_shared_rel/lib -lpetsc -lX11 
-lpthread 
-Wl,-rpath,/home/wtay/Codes/petsc-dev/petsc-3.2-dev_shared_rel/lib 
-L/home/wtay/Codes/petsc-dev/petsc-3.2-dev_shared_rel/lib -lHYPRE 
-lmpi_cxx -Wl,-rpath,/opt/openmpi-1.5.3/lib 
-Wl,-rpath,/opt/intelcpro-11.1.059/lib/intel64 
-Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/4.4.6 -lstdc++ 
-Wl,-rpath,/opt/intelcpro-11.1.059/mkl/lib/em64t 
-L/opt/intelcpro-11.1.059/mkl/lib/em64t -lmkl_intel_lp64 
-lmkl_intel_thread -lmkl_core -liomp5 -lpthread -ldl 
-L/opt/openmpi-1.5.3/lib -lmpi -lnsl -lutil 
-L/opt/intelcpro-11.1.059/lib/intel64 -limf 
-L/usr/lib/gcc/x86_64-redhat-linux/4.4.6 -lsvml -lipgo -ldecimal -lgcc_s 
-lirc -lpthread -lirc_s -lmpi_f90 -lmpi_f77 -lm -lm -lifport -lifcore 
-lm -lm -lm -lmpi_cxx -lstdc++ -lmpi_cxx -lstdc++ -ldl -lmpi -lnsl 
-lutil -limf -lsvml -lipgo -ldecimal -lgcc_s -lirc -lpthread -lirc_s -ldl
-----------------------------------------
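As a worked check on reading the event table, the Total Mflop/s column follows the legend's formula, where the factor printed as "10e-6" is in effect 10^{-6}. KSPSolve accounts for 100% of the flops, so its flops summed over the 4 processes equal the global total of 1.897e+09 from the header, and the slowest process spent 2.0737 s in it:

$$
\text{Mflop/s}_{\text{KSPSolve}} = 10^{-6}\,\frac{\sum_{p} \text{flops}_p}{\max_{p} t_p}
\approx 10^{-6} \times \frac{1.897\times 10^{9}}{2.0737\ \text{s}} \approx 915
$$

which matches the 915 reported for KSPSolve. The same row shows KSPSolve at 19% of the 10.64 s run (%T = 19), so about 8.6 s of this run is spent outside KSPSolve.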

Yours sincerely,

TAY wee-beng


On 5/6/2012 1:34 AM, Barry Smith wrote:
>     Also run with -ksp_view to see exactly what solver options it is using,
> for example the number of levels, the smoother on each level, etc. My guess is
> that the below is running on one level (because I don't see you supplying
> options to control the number of levels etc).
>
>     Barry
>
> On Jun 4, 2012, at 4:15 PM, Jed Brown wrote:
>
>> Always send -log_summary when asking about performance.
>>
>> On Mon, Jun 4, 2012 at 4:11 PM, TAY wee-beng<zonexo at gmail.com>  wrote:
>> Hi,
>>
>> I tried using PETSc multigrid on my 2D CFD code. I converted the KSP example
>> ex29 to Fortran and added it to my code to solve the Poisson equation.
>>
>> The main subroutines are:
>>
>> call KSPCreate(PETSC_COMM_WORLD,ksp,ierr)
>>
>> call DMDACreate2d(PETSC_COMM_WORLD,DMDA_BOUNDARY_NONE,DMDA_BOUNDARY_NONE, &
>>      DMDA_STENCIL_STAR,i3,i3,PETSC_DECIDE,PETSC_DECIDE,i1,i1, &
>>      PETSC_NULL_INTEGER,PETSC_NULL_INTEGER,da,ierr)
>> call DMSetFunction(da,ComputeRHS,ierr)
>> call DMSetJacobian(da,ComputeMatrix,ierr)
>> call KSPSetDM(ksp,da,ierr)
>>
>> call KSPSetFromOptions(ksp,ierr)
>> call KSPSetUp(ksp,ierr)
>> call KSPSolve(ksp,PETSC_NULL_OBJECT,PETSC_NULL_OBJECT,ierr)
>> call KSPGetSolution(ksp,x,ierr)
>> call VecView(x,PETSC_VIEWER_STDOUT_WORLD,ierr)
>> call KSPDestroy(ksp,ierr)
>> call DMDestroy(da,ierr)
>> call PetscFinalize(ierr)
>>
>>
>> Since the LHS matrix doesn't change, I only set it up at the first time step;
>> thereafter I only call ComputeRHS at each time step.
>>
>> Previously I used HYPRE's geometric multigrid, and that was much faster.
>>
>> What other options can I tweak to improve the speed? Or should I call the 
>> subroutines above at every timestep?
>>
>> Thanks!
>>
>>
>> -- 
>> Yours sincerely,
>>
>> TAY wee-beng
>>
>>
