Date: Tue, 12 Nov 2013 14:22:35 -0600
Subject: Re: [petsc-users] approaches to reduce computing time
From: [email protected]
To: [email protected]
CC: [email protected]; [email protected]
On Tue, Nov 12, 2013 at 2:14 PM, Roc Wang <[email protected]> wrote:
Thanks, Jed.
I have questions about load balancing and the PC type below.
> From: [email protected]
> To: [email protected]; [email protected]
> Subject: Re: [petsc-users] approaches to reduce computing time
> Date: Sun, 10 Nov 2013 12:20:18 -0700
>
> Roc Wang <[email protected]> writes:
>
> > Hi all,
> >
> > I am trying to minimize the computing time to solve a large sparse
> > linear system. The matrix comes from a 3-D grid with m=321, n=321, and
> > p=321. I am attacking the computing time from two directions: (1) finding a
> > preconditioner that reduces the number of iterations, and thus the time,
> > and (2) requesting more cores.
> >
> > ----For the first direction, I tried several option sets:
> > 1 default KSP and PC,
> > 2 -ksp_type fgmres -ksp_gmres_restart 30 -pc_type ksp -ksp_pc_type
> > jacobi,
> > 3 -ksp_type lgmres -ksp_gmres_restart 40 -ksp_lgmres_augment 10,
> > 4 -ksp_type lgmres -ksp_gmres_restart 50 -ksp_lgmres_augment 10,
> > 5 -ksp_type lgmres -ksp_gmres_restart 40 -ksp_lgmres_augment 10 -pc_type
> > asm (PCASM)
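> > For concreteness, option set 5 was launched roughly as follows (the exact
> > mpirun invocation is from memory and may differ slightly):
> >
> >   mpirun -np 128 ./x.r -ksp_type lgmres -ksp_gmres_restart 40 \
> >          -ksp_lgmres_augment 10 -pc_type asm -log_summary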
> >
> > The iteration counts and timings with 128 cores requested are as follows:
> > case# iter timing (s)
> > 1 1436 816
> > 2 3 12658
> > 3 1069 669.64
> > 4 872 768.12
> > 5 927 513.14
> >
> > It can be seen that changing -ksp_gmres_restart and -ksp_lgmres_augment
> > helps to reduce the iterations but not the timing (compare cases 3 and 4).
> > Second, PCASM helps a lot. Although option set 2 reduces the iterations
> > dramatically, the timing increases enormously. Is that because many more
> > operations are needed inside the PC?
> >
> > My questions here are: 1. In which direction should I adjust
> > -ksp_gmres_restart and -ksp_lgmres_augment? For example, is a larger
> > restart with a large augment better, or a larger restart with a smaller
> > augment?
>
> Look at the -log_summary. By increasing the restart, the work in
> KSPGMRESOrthog will increase linearly, but the number of iterations
> might decrease enough to compensate. There is no general rule here
> since it depends on the relative expense of operations for your problem
> on your machine.
>
> > ----For the second direction, I ran with -ksp_type lgmres -ksp_gmres_restart
> > 40 -ksp_lgmres_augment 10 -pc_type asm on different numbers of cores. I
> > found that the speedup ratio increases only slowly once more than 32 to 64
> > cores are requested. I searched the mailing list archives and found that I
> > am very likely running into the memory bandwidth bottleneck:
> > http://www.mail-archive.com/[email protected]/msg19152.html
> >
> > # of cores iter timing
> > 1 923 19541.83
> > 4 929 5897.06
> > 8 932 4854.72
> > 16 924 1494.33
> > 32 924 1480.88
> > 64 928 686.89
> > 128 927 627.33
> > 256 926 552.93
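(Working the numbers in that table with parallel efficiency T_1/(p*T_p):
roughly 83% on 4 cores, 82% on 16, 44% on 64, 24% on 128 and only 14% on 256,
so most of the scaling is already gone beyond 64 cores.)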
>
> The bandwidth issue has more to do with using multiple cores within a
> node rather than between nodes. Likely the above is a load balancing
> problem or bad communication.
I use a DM to manage the distributed data. The DM was created by calling
DMDACreate3d() and letting PETSc decide the local number of nodes in each
direction. To my understanding, the load on each core is determined at this
stage. Is the load balancing done when DMDACreate3d() is called with the
PETSC_DECIDE option? Or how should I balance the load after the DM is created?
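For reference, the creation call looks roughly like this (a sketch from
memory; the boundary type, stencil type, dof and stencil width here are
illustrative, the PETSC_DECIDE arguments are the point):

  #include <petscdmda.h>

  int main(int argc, char **argv)
  {
    DM             da;
    PetscErrorCode ierr;

    ierr = PetscInitialize(&argc, &argv, PETSC_NULL, PETSC_NULL); CHKERRQ(ierr);
    /* 321^3 global grid; PETSC_DECIDE lets PETSc choose the process grid
       and therefore the local number of nodes in each direction. */
    ierr = DMDACreate3d(PETSC_COMM_WORLD,
                        DMDA_BOUNDARY_NONE, DMDA_BOUNDARY_NONE, DMDA_BOUNDARY_NONE,
                        DMDA_STENCIL_STAR,
                        321, 321, 321,                            /* global M, N, P */
                        PETSC_DECIDE, PETSC_DECIDE, PETSC_DECIDE, /* procs m, n, p  */
                        1, 1,                                     /* dof, stencil width */
                        PETSC_NULL, PETSC_NULL, PETSC_NULL,       /* lx, ly, lz */
                        &da); CHKERRQ(ierr);
    ierr = DMDestroy(&da); CHKERRQ(ierr);
    ierr = PetscFinalize();
    return 0;
  }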
We do not have a way to do fine-grained load balancing for the DMDA since it is
intended for very simple topologies. You can see whether it is load imbalance
from the division by running a cube that is evenly divisible on a cube number
of processes.
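For example, something like this (untested sketch, assuming your DMDA is in a
variable da) prints each rank's local subdomain size so you can see the
division directly:

  PetscInt    xs, ys, zs, xm, ym, zm;
  PetscMPIInt rank;

  ierr = MPI_Comm_rank(PETSC_COMM_WORLD, &rank); CHKERRQ(ierr);
  ierr = DMDAGetCorners(da, &xs, &ys, &zs, &xm, &ym, &zm); CHKERRQ(ierr);
  ierr = PetscSynchronizedPrintf(PETSC_COMM_WORLD,
           "[%d] local grid %D x %D x %D\n", rank, xm, ym, zm); CHKERRQ(ierr);
  ierr = PetscSynchronizedFlush(PETSC_COMM_WORLD); CHKERRQ(ierr);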
Matt
So there is nothing I can do to balance the load if I use a DMDA? Would you
please take a look at the attached log summary files and give me some
suggestions on how to improve the speedup ratio? Thanks.
>
> > My question here is: Is there any other PC that can help with both reducing
> > iterations and increasing scalability? Thanks.
>
> Always send -log_summary with questions like this, but algebraic multigrid is
> a good place to start.
Please take a look at the attached log files; they are for 128 cores and 256
cores, respectively. Based on the log files, what should be done to increase
the scalability? Thanks.
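For reference, I read the algebraic multigrid suggestion as option sets along
these lines (a sketch only; I have not run them, the GAMG option assumes it is
available in this PETSc build, and the hypre options rely on the
--download-hypre=1 configure shown in the logs):

  -ksp_type gmres -pc_type gamg
  -ksp_type gmres -pc_type hypre -pc_hypre_type boomeramg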
--
What most experimenters take for granted before they begin their experiments is
infinitely more interesting than any results to which their experiments lead.
-- Norbert Wiener
---------------------------------------------- PETSc Performance Summary:
----------------------------------------------
./x.r on a arch-linux2-c-opt named node11.cocoa5 with 128 processors, by pzw2
Sun Nov 10 11:31:45 2013
Using Petsc Release Version 3.3.0, Patch 6, Mon Feb 11 12:26:34 CST 2013
Max Max/Min Avg Total
Time (sec): 2.507e+02 1.00188 2.504e+02
Objects: 1.370e+02 1.00000 1.370e+02
Flops: 2.651e+10 1.05073 2.556e+10 3.272e+12
Flops/sec: 1.059e+08 1.05109 1.021e+08 1.307e+10
MPI Messages: 9.515e+03 1.96510 7.573e+03 9.693e+05
MPI Message Lengths: 1.821e+09 8.81374 4.244e+04 4.114e+10
MPI Reductions: 2.143e+03 1.00000
Flop counting convention: 1 flop = 1 real number operation of type
(multiply/divide/add/subtract)
e.g., VecAXPY() for real vectors of length N --> 2N
flops
and VecAXPY() for complex vectors of length N -->
8N flops
Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- --
Message Lengths -- -- Reductions --
Avg %Total Avg %Total counts %Total
Avg %Total counts %Total
0: Main Stage: 2.5038e+02 100.0% 3.2720e+12 100.0% 9.693e+05 100.0%
4.244e+04 100.0% 2.142e+03 100.0%
------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting
output.
Phase summary info:
Count: number of times phase was executed
Time and Flops: Max - maximum over all processors
Ratio - ratio of maximum to minimum over all processors
Mess: number of messages sent
Avg. len: average message length
Reduct: number of global reductions
Global: entire computation
Stage: stages of a computation. Set stages with PetscLogStagePush() and
PetscLogStagePop().
%T - percent time in this phase %f - percent flops in this phase
%M - percent messages in this phase %L - percent message lengths in
this phase
%R - percent reductions in this phase
Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all
processors)
------------------------------------------------------------------------------------------------------------------------
Event Count Time (sec) Flops
--- Global --- --- Stage --- Total
Max Ratio Max Ratio Max Ratio Mess Avg len Reduct
%T %f %M %L %R %T %f %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------
--- Event Stage 0: Main Stage
VecView 8 1.0 1.9149e+01 1.0 0.00e+00 0.0 2.5e+04 3.1e+05
1.6e+01 8 0 3 19 1 8 0 3 19 1 0
VecMDot 927 1.0 9.2982e+01 1.3 9.63e+09 1.1 0.0e+00 0.0e+00
9.3e+02 34 36 0 0 43 34 36 0 0 43 12733
VecNorm 976 1.0 3.6295e+01 1.6 5.25e+08 1.1 0.0e+00 0.0e+00
9.8e+02 12 2 0 0 46 12 2 0 0 46 1779
VecScale 1000 1.0 1.5538e+00 2.7 2.69e+08 1.1 0.0e+00 0.0e+00
0.0e+00 0 1 0 0 0 0 1 0 0 0 21287
VecCopy 283 1.0 9.1090e-01 1.9 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecSet 1587 1.0 2.6465e+00 2.1 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 1 0 0 0 0 1 0 0 0 0 0
VecAXPY 234 1.0 1.0124e+00 3.4 1.26e+08 1.1 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 15290
VecMAXPY 976 1.0 4.6520e+01 1.4 1.05e+10 1.1 0.0e+00 0.0e+00
0.0e+00 16 40 0 0 0 16 40 0 0 0 27827
VecScatterBegin 2308 1.0 6.4895e+00 2.2 0.00e+00 0.0 9.5e+05 3.8e+04
0.0e+00 2 0 98 87 0 2 0 98 87 0 0
VecScatterEnd 2308 1.0 2.2144e+01 4.9 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 5 0 0 0 0 5 0 0 0 0 0
MatMult 766 1.0 3.0886e+01 1.7 2.66e+09 1.1 4.7e+05 3.5e+04
0.0e+00 9 10 48 40 0 9 10 48 40 0 10633
MatSolve 767 1.0 2.1649e+01 1.8 2.77e+09 1.1 0.0e+00 0.0e+00
0.0e+00 7 11 0 0 0 7 11 0 0 0 16223
MatLUFactorNum 1 1.0 1.1517e-01 2.5 6.05e+06 1.1 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 6599
MatILUFactorSym 1 1.0 1.0871e-01 3.7 0.00e+00 0.0 0.0e+00 0.0e+00
1.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatAssemblyBegin 3 1.0 5.6961e-01 3.2 0.00e+00 0.0 0.0e+00 0.0e+00
4.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatAssemblyEnd 3 1.0 2.4171e-01 1.3 0.00e+00 0.0 1.2e+03 8.8e+03
8.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetRowIJ 1 1.0 1.0014e-05 5.2 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetSubMatrice 1 1.0 1.2215e+00 2.0 0.00e+00 0.0 3.0e+03 8.1e+04
7.0e+00 0 0 0 1 0 0 0 0 1 0 0
MatGetOrdering 1 1.0 1.4469e-02 4.2 0.00e+00 0.0 0.0e+00 0.0e+00
2.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatIncreaseOvrlp 1 1.0 4.1109e-01 1.9 0.00e+00 0.0 0.0e+00 0.0e+00
2.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatView 3 3.0 2.9133e-02153.3 0.00e+00 0.0 0.0e+00 0.0e+00
1.0e+00 0 0 0 0 0 0 0 0 0 0 0
KSPGMRESOrthog 927 1.0 1.2676e+02 1.1 1.93e+10 1.1 0.0e+00 0.0e+00
9.3e+02 48 72 0 0 43 48 72 0 0 43 18681
KSPSetUp 2 1.0 9.2526e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
5.2e+01 0 0 0 0 2 0 0 0 0 2 0
KSPSolve 1 1.0 2.1155e+02 1.0 2.65e+10 1.1 9.4e+05 3.5e+04
2.1e+03 84100 97 80 96 84100 97 80 96 15467
PCSetUp 2 1.0 1.8918e+00 1.5 6.05e+06 1.1 4.3e+03 6.0e+04
2.3e+01 1 0 0 1 1 1 0 0 1 1 402
PCSetUpOnBlocks 1 1.0 2.2074e-01 2.7 6.05e+06 1.1 0.0e+00 0.0e+00
3.0e+00 0 0 0 0 0 0 0 0 0 0 3443
PCApply 767 1.0 3.5912e+01 1.6 2.77e+09 1.1 4.7e+05 3.5e+04
0.0e+00 12 11 48 40 0 12 11 48 40 0 9779
Generate Vector 1 1.0 3.4500e+01 1.8 0.00e+00 0.0 2.5e+04 3.1e+05
2.4e+01 8 0 3 19 1 8 0 3 19 1 0
------------------------------------------------------------------------------------------------------------------------
Memory usage is given in bytes:
Object Type Creations Destructions Memory Descendants' Mem.
Reports information only for process 0.
--- Event Stage 0: Main Stage
Vector 86 86 176910552 0
Vector Scatter 8 8 8480 0
Matrix 5 5 86898828 0
Distributed Mesh 2 2 6787792 0
Bipartite Graph 4 4 2800 0
Index Set 22 22 4585380 0
IS L to G Mapping 3 3 4520244 0
Krylov Solver 2 2 31856 0
Preconditioner 2 2 1840 0
Viewer 3 2 1448 0
========================================================================================================================
Average time to get PetscTime(): 1.90735e-07
Average time for MPI_Barrier(): 0.0152486
Average time for zero size MPI_Send(): 0.00012082
#PETSc Option Table entries:
-ksp_gmres_restart 40
-ksp_lgmres_augment 10
-ksp_type lgmres
-ksp_view
-log_summary
-my_ksp_monitor true
-pc_type asm
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8
sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure run at: Sat Nov 9 12:01:53 2013
Configure options: --download-f-blas-lapack
--with-mpi-dir=/usr/local/OpenMPI-1.6.4_Intel --download-hypre=1
--download-hdf5=1 --download-superlu_dist --download-parmetis -download-metis
--with-debugging=no
-----------------------------------------
Libraries compiled on Sat Nov 9 12:01:53 2013 on cocoa5.aero.psu.edu
Machine characteristics:
Linux-2.6.32-279.5.1.el6.x86_64-x86_64-with-centos-6.3-Final
Using PETSc directory: /home/pzw2/ZSoft/petsc-3.3-p6
Using PETSc arch: arch-linux2-c-opt
-----------------------------------------
Using C compiler: /usr/local/OpenMPI-1.6.4_Intel/bin/mpicc -Wall
-Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -O ${COPTFLAGS}
${CFLAGS}
Using Fortran compiler: /usr/local/OpenMPI-1.6.4_Intel/bin/mpif90 -O3
${FOPTFLAGS} ${FFLAGS}
-----------------------------------------
Using include paths: -I/home/pzw2/ZSoft/petsc-3.3-p6/arch-linux2-c-opt/include
-I/home/pzw2/ZSoft/petsc-3.3-p6/include -I/home/pzw2/ZSoft/petsc-3.3-p6/include
-I/home/pzw2/ZSoft/petsc-3.3-p6/arch-linux2-c-opt/include
-I/usr/local/OpenMPI-1.6.4_Intel/include
-I/usr/local/OpenMPI-1.6.4_Intel/include/openmpi/opal/mca/hwloc/hwloc132/hwloc/include
-I/usr/local/OpenMPI-1.6.4_Intel/include/openmpi
-----------------------------------------
Using C linker: /usr/local/OpenMPI-1.6.4_Intel/bin/mpicc
Using Fortran linker: /usr/local/OpenMPI-1.6.4_Intel/bin/mpif90
Using libraries: -Wl,-rpath,/home/pzw2/ZSoft/petsc-3.3-p6/arch-linux2-c-opt/lib
-L/home/pzw2/ZSoft/petsc-3.3-p6/arch-linux2-c-opt/lib -lpetsc -lX11
-Wl,-rpath,/home/pzw2/ZSoft/petsc-3.3-p6/arch-linux2-c-opt/lib
-L/home/pzw2/ZSoft/petsc-3.3-p6/arch-linux2-c-opt/lib -lsuperlu_dist_3.1
-lparmetis -lmetis -lpthread -lHYPRE -L/usr/local/OpenMPI-1.6.4_Intel/lib
-L/usr/lib/gcc/x86_64-redhat-linux/4.4.6 -lmpi_cxx -lstdc++ -lflapack -lfblas
-lhdf5_fortran -lhdf5 -lhdf5hl_fortran -lhdf5_hl -lz -lmpi_f90 -lmpi_f77 -lm
-lm -L/opt/intel/composer_xe_2011_sp1.10.319/compiler/lib/intel64 -limf -lm -lm
-lifport -lifcore -lsvml -lm -lipgo -lirc -lirc_s -lm -lm -lm -lmpi_cxx
-lstdc++ -lmpi_cxx -lstdc++ -ldl -lmpi -lrt -lnsl -lutil -lgcc_s -lpthread -ldl
-----------------------------------------
---------------------------------------------- PETSc Performance Summary:
----------------------------------------------
./x.r on a arch-linux2-c-opt named node31.cocoa5 with 256 processors, by pzw2
Sun Nov 10 11:30:07 2013
Using Petsc Release Version 3.3.0, Patch 6, Mon Feb 11 12:26:34 CST 2013
Max Max/Min Avg Total
Time (sec): 1.724e+02 1.00082 1.724e+02
Objects: 1.370e+02 1.00000 1.370e+02
Flops: 1.341e+10 1.06368 1.279e+10 3.275e+12
Flops/sec: 7.782e+07 1.06388 7.422e+07 1.900e+10
MPI Messages: 9.695e+03 1.92820 8.145e+03 2.085e+06
MPI Message Lengths: 1.770e+09 14.32685 2.471e+04 5.151e+10
MPI Reductions: 2.141e+03 1.00000
Flop counting convention: 1 flop = 1 real number operation of type
(multiply/divide/add/subtract)
e.g., VecAXPY() for real vectors of length N --> 2N
flops
and VecAXPY() for complex vectors of length N -->
8N flops
Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- --
Message Lengths -- -- Reductions --
Avg %Total Avg %Total counts %Total
Avg %Total counts %Total
0: Main Stage: 1.7236e+02 100.0% 3.2749e+12 100.0% 2.085e+06 100.0%
2.471e+04 100.0% 2.140e+03 100.0%
------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting
output.
Phase summary info:
Count: number of times phase was executed
Time and Flops: Max - maximum over all processors
Ratio - ratio of maximum to minimum over all processors
Mess: number of messages sent
Avg. len: average message length
Reduct: number of global reductions
Global: entire computation
Stage: stages of a computation. Set stages with PetscLogStagePush() and
PetscLogStagePop().
%T - percent time in this phase %f - percent flops in this phase
%M - percent messages in this phase %L - percent message lengths in
this phase
%R - percent reductions in this phase
Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all
processors)
------------------------------------------------------------------------------------------------------------------------
Event Count Time (sec) Flops
--- Global --- --- Stage --- Total
Max Ratio Max Ratio Max Ratio Mess Avg len Reduct
%T %f %M %L %R %T %f %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------
--- Event Stage 0: Main Stage
VecView 8 1.0 2.0219e+01 1.0 0.00e+00 0.0 9.8e+04 8.0e+04
1.6e+01 12 0 5 15 1 12 0 5 15 1 0
VecMDot 926 1.0 6.7355e+01 1.3 4.87e+09 1.1 0.0e+00 0.0e+00
9.3e+02 35 36 0 0 43 35 36 0 0 43 17557
VecNorm 975 1.0 2.8405e+01 1.6 2.66e+08 1.1 0.0e+00 0.0e+00
9.8e+02 14 2 0 0 46 14 2 0 0 46 2271
VecScale 999 1.0 6.5648e-01 3.2 1.36e+08 1.1 0.0e+00 0.0e+00
0.0e+00 0 1 0 0 0 0 1 0 0 0 50334
VecCopy 283 1.0 4.4710e-01 2.8 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecSet 1585 1.0 1.4419e+00 2.8 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAXPY 234 1.0 4.2862e-01 3.6 6.37e+07 1.1 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 36115
VecMAXPY 975 1.0 2.3674e+01 1.8 5.32e+09 1.1 0.0e+00 0.0e+00
0.0e+00 11 39 0 0 0 11 39 0 0 0 54618
VecScatterBegin 2305 1.0 3.9411e+00 3.2 0.00e+00 0.0 2.0e+06 2.3e+04
0.0e+00 1 0 97 89 0 1 0 97 89 0 0
VecScatterEnd 2305 1.0 1.7004e+01 4.1 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 6 0 0 0 0 6 0 0 0 0 0
MatMult 765 1.0 1.5970e+01 2.0 1.34e+09 1.1 9.8e+05 2.2e+04
0.0e+00 7 10 47 42 0 7 10 47 42 0 20539
MatSolve 766 1.0 1.4117e+01 3.1 1.41e+09 1.1 0.0e+00 0.0e+00
0.0e+00 5 11 0 0 0 5 11 0 0 0 25326
MatLUFactorNum 1 1.0 6.7332e-02 2.7 3.07e+06 1.1 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 11481
MatILUFactorSym 1 1.0 5.3601e-02 3.5 0.00e+00 0.0 0.0e+00 0.0e+00
1.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatAssemblyBegin 3 1.0 3.5198e-01 3.2 0.00e+00 0.0 0.0e+00 0.0e+00
4.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatAssemblyEnd 3 1.0 2.0012e-01 1.4 0.00e+00 0.0 2.6e+03 5.5e+03
8.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetRowIJ 1 1.0 1.2159e-05 6.4 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetSubMatrice 1 1.0 8.8131e-01 2.4 0.00e+00 0.0 6.4e+03 5.0e+04
7.0e+00 0 0 0 1 0 0 0 0 1 0 0
MatGetOrdering 1 1.0 1.2251e-02 6.7 0.00e+00 0.0 0.0e+00 0.0e+00
2.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatIncreaseOvrlp 1 1.0 3.9644e-01 2.2 0.00e+00 0.0 0.0e+00 0.0e+00
2.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatView 3 3.0 4.4928e-02198.8 0.00e+00 0.0 0.0e+00 0.0e+00
1.0e+00 0 0 0 0 0 0 0 0 0 0 0
KSPGMRESOrthog 926 1.0 8.4192e+01 1.2 9.74e+09 1.1 0.0e+00 0.0e+00
9.3e+02 45 72 0 0 43 45 72 0 0 43 28091
KSPSetUp 2 1.0 1.1193e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
5.2e+01 1 0 0 0 2 1 0 0 0 2 0
KSPSolve 1 1.0 1.3453e+02 1.0 1.34e+10 1.1 2.0e+06 2.2e+04
2.1e+03 78100 94 84 96 78100 94 84 96 24342
PCSetUp 2 1.0 1.4820e+00 1.5 3.07e+06 1.1 9.0e+03 3.7e+04
2.3e+01 1 0 0 1 1 1 0 0 1 1 522
PCSetUpOnBlocks 1 1.0 1.1506e-01 2.7 3.07e+06 1.1 0.0e+00 0.0e+00
3.0e+00 0 0 0 0 0 0 0 0 0 0 6718
PCApply 766 1.0 2.2657e+01 2.2 1.41e+09 1.1 9.8e+05 2.2e+04
0.0e+00 9 11 47 42 0 9 11 47 42 0 15780
Generate Vector 1 1.0 3.2557e+01 1.6 0.00e+00 0.0 9.8e+04 8.0e+04
2.4e+01 12 0 5 15 1 12 0 5 15 1 0
------------------------------------------------------------------------------------------------------------------------
Memory usage is given in bytes:
Object Type Creations Destructions Memory Descendants' Mem.
Reports information only for process 0.
--- Event Stage 0: Main Stage
Vector 86 86 89650392 0
Vector Scatter 8 8 8480 0
Matrix 5 5 44222508 0
Distributed Mesh 2 2 3481552 0
Bipartite Graph 4 4 2800 0
Index Set 22 22 2362180 0
IS L to G Mapping 3 3 2316084 0
Krylov Solver 2 2 31856 0
Preconditioner 2 2 1840 0
Viewer 3 2 1448 0
========================================================================================================================
Average time to get PetscTime(): 9.53674e-08
Average time for MPI_Barrier(): 0.0160028
Average time for zero size MPI_Send(): 0.000165327
#PETSc Option Table entries:
-ksp_gmres_restart 40
-ksp_lgmres_augment 10
-ksp_type lgmres
-ksp_view
-log_summary
-my_ksp_monitor true
-pc_type asm
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8
sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure run at: Sat Nov 9 12:01:53 2013
Configure options: --download-f-blas-lapack
--with-mpi-dir=/usr/local/OpenMPI-1.6.4_Intel --download-hypre=1
--download-hdf5=1 --download-superlu_dist --download-parmetis -download-metis
--with-debugging=no
-----------------------------------------
Libraries compiled on Sat Nov 9 12:01:53 2013 on cocoa5.aero.psu.edu
Machine characteristics:
Linux-2.6.32-279.5.1.el6.x86_64-x86_64-with-centos-6.3-Final
Using PETSc directory: /home/pzw2/ZSoft/petsc-3.3-p6
Using PETSc arch: arch-linux2-c-opt
-----------------------------------------
Using C compiler: /usr/local/OpenMPI-1.6.4_Intel/bin/mpicc -Wall
-Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -O ${COPTFLAGS}
${CFLAGS}
Using Fortran compiler: /usr/local/OpenMPI-1.6.4_Intel/bin/mpif90 -O3
${FOPTFLAGS} ${FFLAGS}
-----------------------------------------
Using include paths: -I/home/pzw2/ZSoft/petsc-3.3-p6/arch-linux2-c-opt/include
-I/home/pzw2/ZSoft/petsc-3.3-p6/include -I/home/pzw2/ZSoft/petsc-3.3-p6/include
-I/home/pzw2/ZSoft/petsc-3.3-p6/arch-linux2-c-opt/include
-I/usr/local/OpenMPI-1.6.4_Intel/include
-I/usr/local/OpenMPI-1.6.4_Intel/include/openmpi/opal/mca/hwloc/hwloc132/hwloc/include
-I/usr/local/OpenMPI-1.6.4_Intel/include/openmpi
-----------------------------------------
Using C linker: /usr/local/OpenMPI-1.6.4_Intel/bin/mpicc
Using Fortran linker: /usr/local/OpenMPI-1.6.4_Intel/bin/mpif90
Using libraries: -Wl,-rpath,/home/pzw2/ZSoft/petsc-3.3-p6/arch-linux2-c-opt/lib
-L/home/pzw2/ZSoft/petsc-3.3-p6/arch-linux2-c-opt/lib -lpetsc -lX11
-Wl,-rpath,/home/pzw2/ZSoft/petsc-3.3-p6/arch-linux2-c-opt/lib
-L/home/pzw2/ZSoft/petsc-3.3-p6/arch-linux2-c-opt/lib -lsuperlu_dist_3.1
-lparmetis -lmetis -lpthread -lHYPRE -L/usr/local/OpenMPI-1.6.4_Intel/lib
-L/usr/lib/gcc/x86_64-redhat-linux/4.4.6 -lmpi_cxx -lstdc++ -lflapack -lfblas
-lhdf5_fortran -lhdf5 -lhdf5hl_fortran -lhdf5_hl -lz -lmpi_f90 -lmpi_f77 -lm
-lm -L/opt/intel/composer_xe_2011_sp1.10.319/compiler/lib/intel64 -limf -lm -lm
-lifport -lifcore -lsvml -lm -lipgo -lirc -lirc_s -lm -lm -lm -lmpi_cxx
-lstdc++ -lmpi_cxx -lstdc++ -ldl -lmpi -lrt -lnsl -lutil -lgcc_s -lpthread -ldl
-----------------------------------------