Hi,
I have used 3 KSPs: 2 to solve the momentum equations and 1 for the multigrid.
For the multigrid KSP I called
call KSPSetOptionsPrefix(ksp,"mg_",ierr)
and I ran with:
-log_summary -mg_ksp_view
so as to single out the multigrid KSP, but I'm not sure if it's really working...
Here's the output:
---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
./a.out on a petsc-3.2 named n12-50 with 4 processors, by wtay Wed Jun  6 21:57:33 2012
Using Petsc Development HG revision: c76fb3cac2a4ad0dfc9436df80f678898c867e86  HG Date: Thu May 31 00:33:26 2012 -0500
                         Max       Max/Min        Avg      Total
Time (sec):           1.064e+01      1.00000   1.064e+01
Objects:              2.700e+01      1.00000   2.700e+01
Flops:                4.756e+08      1.00811   4.744e+08  1.897e+09
Flops/sec:            4.468e+07      1.00811   4.457e+07  1.783e+08
MPI Messages:         4.080e+02      2.00000   3.060e+02  1.224e+03
MPI Message Lengths:  2.328e+06      2.00000   5.706e+03  6.984e+06
MPI Reductions:       8.750e+02      1.00000
Flop counting convention: 1 flop = 1 real number operation of type (multiply/divide/add/subtract)
                            e.g., VecAXPY() for real vectors of length N --> 2N flops
                            and VecAXPY() for complex vectors of length N --> 8N flops
Summary of Stages:   ----- Time ------  ----- Flops -----  --- Messages ---  -- Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total   counts   %Total     Avg         %Total   counts   %Total
 0:      Main Stage: 1.0644e+01 100.0%  1.8975e+09 100.0%  1.224e+03 100.0%  5.706e+03      100.0%  8.740e+02  99.9%
------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flops: Max - maximum over all processors
                   Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   Avg. len: average message length
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and PetscLogStagePop().
      %T - percent time in this phase         %f - percent flops in this phase
      %M - percent messages in this phase     %L - percent message lengths in this phase
      %R - percent reductions in this phase
   Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all processors)
------------------------------------------------------------------------------------------------------------------------
Event            Count      Time (sec)     Flops                                --- Global ---  --- Stage ---  Total
                   Max Ratio   Max   Ratio   Max  Ratio   Mess  Avg len Reduct  %T %f %M %L %R  %T %f %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------
--- Event Stage 0: Main Stage
MatMult             202 1.0 5.5096e-01 1.0 1.38e+08 1.0 1.2e+03 5.7e+03 0.0e+00  5 29 99100  0   5 29 99100  0   998
MatSolve            252 1.0 6.9136e-01 1.1 1.71e+08 1.0 0.0e+00 0.0e+00 0.0e+00  6 36  0  0  0   6 36  0  0  0   986
MatLUFactorNum       50 1.0 4.6002e-01 1.0 7.31e+07 1.0 0.0e+00 0.0e+00 0.0e+00  4 15  0  0  0   4 15  0  0  0   634
MatILUFactorSym       1 1.0 9.5899e-03 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatAssemblyBegin     50 1.0 1.6270e-03 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 1.0e+02  0  0  0  0 11   0  0  0  0 11     0
MatAssemblyEnd       50 1.0 1.0896e-01 1.0 0.00e+00 0.0 1.2e+01 1.4e+03 8.0e+00  1  0  1  0  1   1  0  1  0  1     0
MatGetRowIJ           1 1.0 2.8610e-06 1.5 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatGetOrdering        1 1.0 7.2002e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 2.0e+00  0  0  0  0  0   0  0  0  0  0     0
KSPSetUp            100 1.0 2.9130e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
KSPSolve             50 1.0 2.0737e+00 1.0 4.76e+08 1.0 1.2e+03 5.7e+03 4.6e+02 19100 99100 52  19100 99100 53   915
VecDot              202 1.0 7.3588e-02 1.1 1.63e+07 1.0 0.0e+00 0.0e+00 2.0e+02  1  3  0  0 23   1  3  0  0 23   885
VecDotNorm2         101 1.0 3.9155e-02 1.7 1.63e+07 1.0 0.0e+00 0.0e+00 1.0e+02  0  3  0  0 12   0  3  0  0 12  1664
VecNorm             151 1.0 5.8769e-02 1.7 1.22e+07 1.0 0.0e+00 0.0e+00 1.5e+02  0  3  0  0 17   0  3  0  0 17   829
VecCopy             100 1.0 2.3459e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecSet              403 1.0 5.9994e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  1  0  0  0  0   1  0  0  0  0     0
VecAXPBYCZ          202 1.0 6.6376e-02 1.0 3.26e+07 1.0 0.0e+00 0.0e+00 0.0e+00  1  7  0  0  0   1  7  0  0  0  1963
VecWAXPY            202 1.0 6.9311e-02 1.0 1.63e+07 1.0 0.0e+00 0.0e+00 0.0e+00  1  3  0  0  0   1  3  0  0  0   940
VecAssemblyBegin    100 1.0 4.0355e-0214.1 0.00e+00 0.0 0.0e+00 0.0e+00 3.0e+02  0  0  0  0 34   0  0  0  0 34     0
VecAssemblyEnd      100 1.0 5.0378e-04 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecScatterBegin     202 1.0 6.2275e-03 1.5 0.00e+00 0.0 1.2e+03 5.7e+03 0.0e+00  0  0 99100  0   0  0 99100  0     0
VecScatterEnd       202 1.0 2.0878e-02 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 0.0e+00  0  0  0  0  0   0  0  0  0  0     0
PCSetUp             100 1.0 4.7225e-01 1.0 7.31e+07 1.0 0.0e+00 0.0e+00 5.0e+00  4 15  0  0  1   4 15  0  0  1   617
PCSetUpOnBlocks      50 1.0 4.7191e-01 1.0 7.31e+07 1.0 0.0e+00 0.0e+00 3.0e+00  4 15  0  0  0   4 15  0  0  0   618
PCApply             252 1.0 7.3425e-01 1.1 1.71e+08 1.0 0.0e+00 0.0e+00 0.0e+00  7 36  0  0  0   7 36  0  0  0   928
------------------------------------------------------------------------------------------------------------------------
Memory usage is given in bytes:
Object Type          Creations   Destructions     Memory  Descendants' Mem.
Reports information only for process 0.
--- Event Stage 0: Main Stage
              Matrix     4              4     16900896     0
       Krylov Solver     2              2         2168     0
              Vector    12             12      2604080     0
      Vector Scatter     1              1         1060     0
           Index Set     5              5       167904     0
      Preconditioner     2              2         1800     0
              Viewer     1              0            0     0
========================================================================================================================
Average time to get PetscTime(): 1.09673e-06
Average time for MPI_Barrier(): 4.00543e-06
Average time for zero size MPI_Send(): 1.22786e-05
#PETSc Option Table entries:
-log_summary
-mg_ksp_view
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure run at: Thu May 31 09:53:43 2012
Configure options: --with-mpi-dir=/opt/openmpi-1.5.3/ --with-blas-lapack-dir=/opt/intelcpro-11.1.059/mkl/lib/em64t/ --with-debugging=0 --download-hypre=1 --prefix=/home/wtay/Lib/petsc-3.2-dev_shared_rel --known-mpi-shared=1 --with-shared-libraries
-----------------------------------------
Libraries compiled on Thu May 31 09:53:43 2012 on hpc12
Machine characteristics:
Linux-2.6.32-220.2.1.el6.x86_64-x86_64-with-centos-6.2-Final
Using PETSc directory: /home/wtay/Codes/petsc-dev
Using PETSc arch: petsc-3.2-dev_shared_rel
-----------------------------------------
Using C compiler: /opt/openmpi-1.5.3/bin/mpicc -fPIC -wd1572 -Qoption,cpp,--extended_float_type -O3 ${COPTFLAGS} ${CFLAGS}
Using Fortran compiler: /opt/openmpi-1.5.3/bin/mpif90 -fPIC -O3 ${FOPTFLAGS} ${FFLAGS}
-----------------------------------------
Using include paths:
-I/home/wtay/Codes/petsc-dev/petsc-3.2-dev_shared_rel/include
-I/home/wtay/Codes/petsc-dev/include
-I/home/wtay/Codes/petsc-dev/include
-I/home/wtay/Codes/petsc-dev/petsc-3.2-dev_shared_rel/include
-I/opt/openmpi-1.5.3/include
-----------------------------------------
Using C linker: /opt/openmpi-1.5.3/bin/mpicc
Using Fortran linker: /opt/openmpi-1.5.3/bin/mpif90
Using libraries:
-Wl,-rpath,/home/wtay/Codes/petsc-dev/petsc-3.2-dev_shared_rel/lib
-L/home/wtay/Codes/petsc-dev/petsc-3.2-dev_shared_rel/lib -lpetsc -lX11
-lpthread
-Wl,-rpath,/home/wtay/Codes/petsc-dev/petsc-3.2-dev_shared_rel/lib
-L/home/wtay/Codes/petsc-dev/petsc-3.2-dev_shared_rel/lib -lHYPRE
-lmpi_cxx -Wl,-rpath,/opt/openmpi-1.5.3/lib
-Wl,-rpath,/opt/intelcpro-11.1.059/lib/intel64
-Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/4.4.6 -lstdc++
-Wl,-rpath,/opt/intelcpro-11.1.059/mkl/lib/em64t
-L/opt/intelcpro-11.1.059/mkl/lib/em64t -lmkl_intel_lp64
-lmkl_intel_thread -lmkl_core -liomp5 -lpthread -ldl
-L/opt/openmpi-1.5.3/lib -lmpi -lnsl -lutil
-L/opt/intelcpro-11.1.059/lib/intel64 -limf
-L/usr/lib/gcc/x86_64-redhat-linux/4.4.6 -lsvml -lipgo -ldecimal -lgcc_s
-lirc -lpthread -lirc_s -lmpi_f90 -lmpi_f77 -lm -lm -lifport -lifcore
-lm -lm -lm -lmpi_cxx -lstdc++ -lmpi_cxx -lstdc++ -ldl -lmpi -lnsl
-lutil -limf -lsvml -lipgo -ldecimal -lgcc_s -lirc -lpthread -lirc_s -ldl
-----------------------------------------
Yours sincerely,
TAY wee-beng
On 5/6/2012 1:34 AM, Barry Smith wrote:
> Also run with -ksp_view to see exactly what solver options it is using.
> For example the number of levels, smoother on each level etc. My guess is
> that the below is running on one level (because I don't see you supplying
> options to control the number of levels etc).
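>
> As a sketch only, options along these lines would request multigrid
> explicitly and print the resulting solver. The level count of 3 is an
> illustrative assumption (whether it is usable depends on how many times the
> grid can be coarsened), and this assumes the KSP picks up its options through
> KSPSetFromOptions. If the KSP was given a prefix with KSPSetOptionsPrefix,
> that prefix goes in front of each solver option name (for example
> -mg_pc_type instead of -pc_type):
>
> ```
> ./a.out -log_summary -ksp_view -pc_type mg -pc_mg_levels 3
> ```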
>
> Barry
>
> On Jun 4, 2012, at 4:15 PM, Jed Brown wrote:
>
>> Always send -log_summary when asking about performance.
>>
>> On Mon, Jun 4, 2012 at 4:11 PM, TAY wee-beng <zonexo at gmail.com> wrote:
>> Hi,
>>
>> I tried using PETSc multigrid on my 2D CFD code. I converted the KSP example
>> ex29 to Fortran and then added it to my code to solve the Poisson equation.
>>
>> The main subroutines are:
>>
>> call KSPCreate(PETSC_COMM_WORLD,ksp,ierr)
>>
>> call DMDACreate2d(PETSC_COMM_WORLD,DMDA_BOUNDARY_NONE,DMDA_BOUNDARY_NONE,DMDA_STENCIL_STAR,i3,i3,PETSC_DECIDE,PETSC_DECIDE,i1,i1,PETSC_NULL_INTEGER,PETSC_NULL_INTEGER,da,ierr)
>> call DMSetFunction(da,ComputeRHS,ierr)
>> call DMSetJacobian(da,ComputeMatrix,ierr)
>> call KSPSetDM(ksp,da,ierr)
>>
>> call KSPSetFromOptions(ksp,ierr)
>> call KSPSetUp(ksp,ierr)
>> call KSPSolve(ksp,PETSC_NULL_OBJECT,PETSC_NULL_OBJECT,ierr)
>> call KSPGetSolution(ksp,x,ierr)
>> call VecView(x,PETSC_VIEWER_STDOUT_WORLD,ierr)
>> call KSPDestroy(ksp,ierr)
>> call DMDestroy(da,ierr)
>> call PetscFinalize(ierr)
>>
>>
>> Since the LHS matrix doesn't change, I only set up at the 1st time step,
>> thereafter I only called ComputeRHS every time step.
>>
>> I was using HYPRE's geometric multigrid and the speed was much faster.
>>
>> What other options can I tweak to improve the speed? Or should I call the
>> subroutines above at every timestep?
>>
>> Thanks!
>>
>>
>> --
>> Yours sincerely,
>>
>> TAY wee-beng
>>
>>