Date: Tue, 12 Nov 2013 14:22:35 -0600
Subject: Re: [petsc-users] approaches to reduce computing time
From: [email protected]
To: [email protected]
CC: [email protected]; [email protected]
On Tue, Nov 12, 2013 at 2:14 PM, Roc Wang <[email protected]> wrote:
Thanks, Jed.
I have questions about load balancing and the PC type below.
> From: [email protected]
> To: [email protected]; [email protected]
> Subject: Re: [petsc-users] approaches to reduce computing time
> Date: Sun, 10 Nov 2013 12:20:18 -0700
>
> Roc Wang <[email protected]> writes:
>
> > Hi all,
> >
> > I am trying to minimize the computing time to solve a large sparse
> > linear system. The matrix comes from a 3-D grid with m=321, n=321, and
> > p=321. I am attacking the computing time from two directions: (1) finding a
> > preconditioner that reduces the number of iterations, and thus the time,
> > and (2) requesting more cores.
> >
> > ----For the first direction, I tried several option sets:
> > 1 default KSP and PC,
> > 2 -ksp_type fgmres -ksp_gmres_restart 30 -pc_type ksp -ksp_pc_type
> > jacobi,
> > 3 -ksp_type lgmres -ksp_gmres_restart 40 -ksp_lgmres_augment 10,
> > 4 -ksp_type lgmres -ksp_gmres_restart 50 -ksp_lgmres_augment 10,
> > 5 -ksp_type lgmres -ksp_gmres_restart 40 -ksp_lgmres_augment 10 -pc_type
> > asm (PCASM)
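> > For concreteness, option set 5 was launched roughly as follows (the exact
> > mpirun invocation is from memory and may differ slightly):
> >
> >   mpirun -np 128 ./x.r -ksp_type lgmres -ksp_gmres_restart 40 \
> >          -ksp_lgmres_augment 10 -pc_type asm -log_summary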
> >
> > The iteration counts and timings with 128 cores requested are as follows:
> > case# iter timing (s)
> > 1 1436 816
> > 2 3 12658
> > 3 1069 669.64
> > 4 872 768.12
> > 5 927 513.14
> >
> > It can be seen that changing -ksp_gmres_restart and -ksp_lgmres_augment
> > helps to reduce the iterations but not the timing (compare cases 3 and 4).
> > Second, PCASM helps a lot. Although option set 2 reduces the iterations
> > dramatically, the timing increases enormously. Is that because many more
> > operations are needed inside the PC?
> >
> > My questions here are: 1. In which direction should I adjust
> > -ksp_gmres_restart and -ksp_lgmres_augment? For example, is a larger
> > restart with a large augment better, or a larger restart with a smaller
> > augment?
>
> Look at the -log_summary. By increasing the restart, the work in
> KSPGMRESOrthog will increase linearly, but the number of iterations
> might decrease enough to compensate. There is no general rule here
> since it depends on the relative expense of operations for your problem
> on your machine.
>
> > ----For the second direction, I ran with -ksp_type lgmres -ksp_gmres_restart
> > 40 -ksp_lgmres_augment 10 -pc_type asm on different numbers of cores. I
> > found that the speedup ratio increases only slowly once more than 32 to 64
> > cores are requested. I searched the mailing list archives and found that I
> > am very likely running into the memory bandwidth bottleneck:
> > http://www.mail-archive.com/[email protected]/msg19152.html
> >
> > # of cores iter timing
> > 1 923 19541.83
> > 4 929 5897.06
> > 8 932 4854.72
> > 16 924 1494.33
> > 32 924 1480.88
> > 64 928 686.89
> > 128 927 627.33
> > 256 926 552.93
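(Working the numbers in that table with parallel efficiency T_1/(p*T_p):
roughly 83% on 4 cores, 82% on 16, 44% on 64, 24% on 128 and only 14% on 256,
so most of the scaling is already gone beyond 64 cores.)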
>
> The bandwidth issue has more to do with using multiple cores within a
> node rather than between nodes. Likely the above is a load balancing
> problem or bad communication.
I use a DM to manage the distributed data. The DM was created by calling
DMDACreate3d() and letting PETSc decide the local number of nodes in each
direction. To my understanding, the load on each core is determined at this
stage. Is the load balancing done when DMDACreate3d() is called with the
PETSC_DECIDE option? Or how should I balance the load after the DM is created?
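For reference, the creation call looks roughly like this (a sketch from
memory; the boundary type, stencil type, dof and stencil width here are
illustrative, the PETSC_DECIDE arguments are the point):

  #include <petscdmda.h>

  int main(int argc, char **argv)
  {
    DM             da;
    PetscErrorCode ierr;

    ierr = PetscInitialize(&argc, &argv, PETSC_NULL, PETSC_NULL); CHKERRQ(ierr);
    /* 321^3 global grid; PETSC_DECIDE lets PETSc choose the process grid
       and therefore the local number of nodes in each direction. */
    ierr = DMDACreate3d(PETSC_COMM_WORLD,
                        DMDA_BOUNDARY_NONE, DMDA_BOUNDARY_NONE, DMDA_BOUNDARY_NONE,
                        DMDA_STENCIL_STAR,
                        321, 321, 321,                            /* global M, N, P */
                        PETSC_DECIDE, PETSC_DECIDE, PETSC_DECIDE, /* procs m, n, p  */
                        1, 1,                                     /* dof, stencil width */
                        PETSC_NULL, PETSC_NULL, PETSC_NULL,       /* lx, ly, lz */
                        &da); CHKERRQ(ierr);
    ierr = DMDestroy(&da); CHKERRQ(ierr);
    ierr = PetscFinalize();
    return 0;
  }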
We do not have a way to do fine-grained load balancing for the DMDA since it is
intended for very simple topologies. You can see whether it is load imbalance
from the division by running a cube that is evenly divisible on a cube number
of processes.
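For example, something like this (untested sketch, assuming your DMDA is in a
variable da) prints each rank's local subdomain size so you can see the
division directly:

  PetscInt    xs, ys, zs, xm, ym, zm;
  PetscMPIInt rank;

  ierr = MPI_Comm_rank(PETSC_COMM_WORLD, &rank); CHKERRQ(ierr);
  ierr = DMDAGetCorners(da, &xs, &ys, &zs, &xm, &ym, &zm); CHKERRQ(ierr);
  ierr = PetscSynchronizedPrintf(PETSC_COMM_WORLD,
           "[%d] local grid %D x %D x %D\n", rank, xm, ym, zm); CHKERRQ(ierr);
  ierr = PetscSynchronizedFlush(PETSC_COMM_WORLD); CHKERRQ(ierr);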
Matt
So there is nothing I can do to balance the load if I use a DMDA? Would you
please take a look at the attached log summary files and give me some
suggestions on how to improve the speedup ratio? Thanks.
>
> > My question here is: Is there any other PC that can help with both reducing
> > iterations and increasing scalability? Thanks.
>
> Always send -log_summary with questions like this, but algebraic multigrid is
> a good place to start.
Please take a look at the attached log files; they are for 128 cores and 256
cores, respectively. Based on the log files, what should be done to increase
the scalability? Thanks.
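For reference, I read the algebraic multigrid suggestion as option sets along
these lines (a sketch only; I have not run them, the GAMG option assumes it is
available in this PETSc build, and the hypre options rely on the
--download-hypre=1 configure shown in the logs):

  -ksp_type gmres -pc_type gamg
  -ksp_type gmres -pc_type hypre -pc_hypre_type boomeramg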
--
What most experimenters take for granted before they begin their experiments is
infinitely more interesting than any results to which their experiments lead.
-- Norbert Wiener
---------------------------------------------- PETSc Performance Summary:
----------------------------------------------
./x.r on a arch-linux2-c-opt named node11.cocoa5 with 128 processors, by pzw2
Sun Nov 10 11:31:45 2013
Using Petsc Release Version 3.3.0, Patch 6, Mon Feb 11 12:26:34 CST 2013
Max Max/Min Avg Total
Time (sec): 2.507e+02 1.00188 2.504e+02
Objects: 1.370e+02 1.00000 1.370e+02
Flops: 2.651e+10 1.05073 2.556e+10 3.272e+12
Flops/sec: 1.059e+08 1.05109 1.021e+08 1.307e+10
MPI Messages: 9.515e+03 1.96510 7.573e+03 9.693e+05
MPI Message Lengths: 1.821e+09 8.81374 4.244e+04 4.114e+10
MPI Reductions: 2.143e+03 1.00000
Flop counting convention: 1 flop = 1 real number operation of type
(multiply/divide/add/subtract)
e.g., VecAXPY() for real vectors of length N --> 2N
flops
and VecAXPY() for complex vectors of length N -->
8N flops
Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- --
Message Lengths -- -- Reductions --
Avg %Total Avg %Total counts %Total
Avg %Total counts %Total
0: Main Stage: 2.5038e+02 100.0% 3.2720e+12 100.0% 9.693e+05 100.0%
4.244e+04 100.0% 2.142e+03 100.0%
------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting
output.
Phase summary info:
Count: number of times phase was executed
Time and Flops: Max - maximum over all processors
Ratio - ratio of maximum to minimum over all processors
Mess: number of messages sent
Avg. len: average message length
Reduct: number of global reductions
Global: entire computation
Stage: stages of a computation. Set stages with PetscLogStagePush() and
PetscLogStagePop().
%T - percent time in this phase %f - percent flops in this phase
%M - percent messages in this phase %L - percent message lengths in
this phase
%R - percent reductions in this phase
Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all
processors)
------------------------------------------------------------------------------------------------------------------------
Event Count Time (sec) Flops
--- Global --- --- Stage --- Total
Max Ratio Max Ratio Max Ratio Mess Avg len Reduct
%T %f %M %L %R %T %f %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------
--- Event Stage 0: Main Stage
VecView 8 1.0 1.9149e+01 1.0 0.00e+00 0.0 2.5e+04 3.1e+05
1.6e+01 8 0 3 19 1 8 0 3 19 1 0
VecMDot 927 1.0 9.2982e+01 1.3 9.63e+09 1.1 0.0e+00 0.0e+00
9.3e+02 34 36 0 0 43 34 36 0 0 43 12733
VecNorm 976 1.0 3.6295e+01 1.6 5.25e+08 1.1 0.0e+00 0.0e+00
9.8e+02 12 2 0 0 46 12 2 0 0 46 1779
VecScale 1000 1.0 1.5538e+00 2.7 2.69e+08 1.1 0.0e+00 0.0e+00
0.0e+00 0 1 0 0 0 0 1 0 0 0 21287
VecCopy 283 1.0 9.1090e-01 1.9 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecSet 1587 1.0 2.6465e+00 2.1 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 1 0 0 0 0 1 0 0 0 0 0
VecAXPY 234 1.0 1.0124e+00 3.4 1.26e+08 1.1 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 15290
VecMAXPY 976 1.0 4.6520e+01 1.4 1.05e+10 1.1 0.0e+00 0.0e+00
0.0e+00 16 40 0 0 0 16 40 0 0 0 27827
VecScatterBegin 2308 1.0 6.4895e+00 2.2 0.00e+00 0.0 9.5e+05 3.8e+04
0.0e+00 2 0 98 87 0 2 0 98 87 0 0
VecScatterEnd 2308 1.0 2.2144e+01 4.9 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 5 0 0 0 0 5 0 0 0 0 0
MatMult 766 1.0 3.0886e+01 1.7 2.66e+09 1.1 4.7e+05 3.5e+04
0.0e+00 9 10 48 40 0 9 10 48 40 0 10633
MatSolve 767 1.0 2.1649e+01 1.8 2.77e+09 1.1 0.0e+00 0.0e+00
0.0e+00 7 11 0 0 0 7 11 0 0 0 16223
MatLUFactorNum 1 1.0 1.1517e-01 2.5 6.05e+06 1.1 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 6599
MatILUFactorSym 1 1.0 1.0871e-01 3.7 0.00e+00 0.0 0.0e+00 0.0e+00
1.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatAssemblyBegin 3 1.0 5.6961e-01 3.2 0.00e+00 0.0 0.0e+00 0.0e+00
4.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatAssemblyEnd 3 1.0 2.4171e-01 1.3 0.00e+00 0.0 1.2e+03 8.8e+03
8.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetRowIJ 1 1.0 1.0014e-05 5.2 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetSubMatrice 1 1.0 1.2215e+00 2.0 0.00e+00 0.0 3.0e+03 8.1e+04
7.0e+00 0 0 0 1 0 0 0 0 1 0 0
MatGetOrdering 1 1.0 1.4469e-02 4.2 0.00e+00 0.0 0.0e+00 0.0e+00
2.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatIncreaseOvrlp 1 1.0 4.1109e-01 1.9 0.00e+00 0.0 0.0e+00 0.0e+00
2.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatView 3 3.0 2.9133e-02153.3 0.00e+00 0.0 0.0e+00 0.0e+00
1.0e+00 0 0 0 0 0 0 0 0 0 0 0
KSPGMRESOrthog 927 1.0 1.2676e+02 1.1 1.93e+10 1.1 0.0e+00 0.0e+00
9.3e+02 48 72 0 0 43 48 72 0 0 43 18681
KSPSetUp 2 1.0 9.2526e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
5.2e+01 0 0 0 0 2 0 0 0 0 2 0
KSPSolve 1 1.0 2.1155e+02 1.0 2.65e+10 1.1 9.4e+05 3.5e+04
2.1e+03 84100 97 80 96 84100 97 80 96 15467
PCSetUp 2 1.0 1.8918e+00 1.5 6.05e+06 1.1 4.3e+03 6.0e+04
2.3e+01 1 0 0 1 1 1 0 0 1 1 402
PCSetUpOnBlocks 1 1.0 2.2074e-01 2.7 6.05e+06 1.1 0.0e+00 0.0e+00
3.0e+00 0 0 0 0 0 0 0 0 0 0 3443
PCApply 767 1.0 3.5912e+01 1.6 2.77e+09 1.1 4.7e+05 3.5e+04
0.0e+00 12 11 48 40 0 12 11 48 40 0 9779
Generate Vector 1 1.0 3.4500e+01 1.8 0.00e+00 0.0 2.5e+04 3.1e+05
2.4e+01 8 0 3 19 1 8 0 3 19 1 0
------------------------------------------------------------------------------------------------------------------------
Memory usage is given in bytes:
Object Type Creations Destructions Memory Descendants' Mem.
Reports information only for process 0.
--- Event Stage 0: Main Stage
Vector 86 86 176910552 0
Vector Scatter 8 8 8480 0
Matrix 5 5 86898828 0
Distributed Mesh 2 2 6787792 0
Bipartite Graph 4 4 2800 0
Index Set 22 22 4585380 0
IS L to G Mapping 3 3 4520244 0
Krylov Solver 2 2 31856 0
Preconditioner 2 2 1840 0
Viewer 3 2 1448 0
========================================================================================================================
Average time to get PetscTime(): 1.90735e-07
Average time for MPI_Barrier(): 0.0152486
Average time for zero size MPI_Send(): 0.00012082
#PETSc Option Table entries:
-ksp_gmres_restart 40
-ksp_lgmres_augment 10
-ksp_type lgmres
-ksp_view
-log_summary
-my_ksp_monitor true
-pc_type asm
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8
sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure run at: Sat Nov 9 12:01:53 2013
Configure options: --download-f-blas-lapack
--with-mpi-dir=/usr/local/OpenMPI-1.6.4_Intel --download-hypre=1
--download-hdf5=1 --download-superlu_dist --download-parmetis -download-metis
--with-debugging=no
-----------------------------------------
Libraries compiled on Sat Nov 9 12:01:53 2013 on cocoa5.aero.psu.edu
Machine characteristics:
Linux-2.6.32-279.5.1.el6.x86_64-x86_64-with-centos-6.3-Final
Using PETSc directory: /home/pzw2/ZSoft/petsc-3.3-p6
Using PETSc arch: arch-linux2-c-opt
-----------------------------------------
Using C compiler: /usr/local/OpenMPI-1.6.4_Intel/bin/mpicc -Wall
-Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -O ${COPTFLAGS}
${CFLAGS}
Using Fortran compiler: /usr/local/OpenMPI-1.6.4_Intel/bin/mpif90 -O3
${FOPTFLAGS} ${FFLAGS}
-----------------------------------------
Using include paths: -I/home/pzw2/ZSoft/petsc-3.3-p6/arch-linux2-c-opt/include
-I/home/pzw2/ZSoft/petsc-3.3-p6/include -I/home/pzw2/ZSoft/petsc-3.3-p6/include
-I/home/pzw2/ZSoft/petsc-3.3-p6/arch-linux2-c-opt/include
-I/usr/local/OpenMPI-1.6.4_Intel/include
-I/usr/local/OpenMPI-1.6.4_Intel/include/openmpi/opal/mca/hwloc/hwloc132/hwloc/include
-I/usr/local/OpenMPI-1.6.4_Intel/include/openmpi
-----------------------------------------
Using C linker: /usr/local/OpenMPI-1.6.4_Intel/bin/mpicc
Using Fortran linker: /usr/local/OpenMPI-1.6.4_Intel/bin/mpif90
Using libraries: -Wl,-rpath,/home/pzw2/ZSoft/petsc-3.3-p6/arch-linux2-c-opt/lib
-L/home/pzw2/ZSoft/petsc-3.3-p6/arch-linux2-c-opt/lib -lpetsc -lX11
-Wl,-rpath,/home/pzw2/ZSoft/petsc-3.3-p6/arch-linux2-c-opt/lib
-L/home/pzw2/ZSoft/petsc-3.3-p6/arch-linux2-c-opt/lib -lsuperlu_dist_3.1
-lparmetis -lmetis -lpthread -lHYPRE -L/usr/local/OpenMPI-1.6.4_Intel/lib
-L/usr/lib/gcc/x86_64-redhat-linux/4.4.6 -lmpi_cxx -lstdc++ -lflapack -lfblas
-lhdf5_fortran -lhdf5 -lhdf5hl_fortran -lhdf5_hl -lz -lmpi_f90 -lmpi_f77 -lm
-lm -L/opt/intel/composer_xe_2011_sp1.10.319/compiler/lib/intel64 -limf -lm -lm
-lifport -lifcore -lsvml -lm -lipgo -lirc -lirc_s -lm -lm -lm -lmpi_cxx
-lstdc++ -lmpi_cxx -lstdc++ -ldl -lmpi -lrt -lnsl -lutil -lgcc_s -lpthread -ldl
-----------------------------------------
---------------------------------------------- PETSc Performance Summary:
----------------------------------------------
./x.r on a arch-linux2-c-opt named node31.cocoa5 with 256 processors, by pzw2
Sun Nov 10 11:30:07 2013
Using Petsc Release Version 3.3.0, Patch 6, Mon Feb 11 12:26:34 CST 2013
Max Max/Min Avg Total
Time (sec): 1.724e+02 1.00082 1.724e+02
Objects: 1.370e+02 1.00000 1.370e+02
Flops: 1.341e+10 1.06368 1.279e+10 3.275e+12
Flops/sec: 7.782e+07 1.06388 7.422e+07 1.900e+10
MPI Messages: 9.695e+03 1.92820 8.145e+03 2.085e+06
MPI Message Lengths: 1.770e+09 14.32685 2.471e+04 5.151e+10
MPI Reductions: 2.141e+03 1.00000
Flop counting convention: 1 flop = 1 real number operation of type
(multiply/divide/add/subtract)
e.g., VecAXPY() for real vectors of length N --> 2N
flops
and VecAXPY() for complex vectors of length N -->
8N flops
Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- --
Message Lengths -- -- Reductions --
Avg %Total Avg %Total counts %Total
Avg %Total counts %Total
0: Main Stage: 1.7236e+02 100.0% 3.2749e+12 100.0% 2.085e+06 100.0%
2.471e+04 100.0% 2.140e+03 100.0%
------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting
output.
Phase summary info:
Count: number of times phase was executed
Time and Flops: Max - maximum over all processors
Ratio - ratio of maximum to minimum over all processors
Mess: number of messages sent
Avg. len: average message length
Reduct: number of global reductions
Global: entire computation
Stage: stages of a computation. Set stages with PetscLogStagePush() and
PetscLogStagePop().
%T - percent time in this phase %f - percent flops in this phase
%M - percent messages in this phase %L - percent message lengths in
this phase
%R - percent reductions in this phase
Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all
processors)
------------------------------------------------------------------------------------------------------------------------
Event Count Time (sec) Flops
--- Global --- --- Stage --- Total
Max Ratio Max Ratio Max Ratio Mess Avg len Reduct
%T %f %M %L %R %T %f %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------
--- Event Stage 0: Main Stage
VecView 8 1.0 2.0219e+01 1.0 0.00e+00 0.0 9.8e+04 8.0e+04
1.6e+01 12 0 5 15 1 12 0 5 15 1 0
VecMDot 926 1.0 6.7355e+01 1.3 4.87e+09 1.1 0.0e+00 0.0e+00
9.3e+02 35 36 0 0 43 35 36 0 0 43 17557
VecNorm 975 1.0 2.8405e+01 1.6 2.66e+08 1.1 0.0e+00 0.0e+00
9.8e+02 14 2 0 0 46 14 2 0 0 46 2271
VecScale 999 1.0 6.5648e-01 3.2 1.36e+08 1.1 0.0e+00 0.0e+00
0.0e+00 0 1 0 0 0 0 1 0 0 0 50334
VecCopy 283 1.0 4.4710e-01 2.8 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecSet 1585 1.0 1.4419e+00 2.8 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAXPY 234 1.0 4.2862e-01 3.6 6.37e+07 1.1 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 36115
VecMAXPY 975 1.0 2.3674e+01 1.8 5.32e+09 1.1 0.0e+00 0.0e+00
0.0e+00 11 39 0 0 0 11 39 0 0 0 54618
VecScatterBegin 2305 1.0 3.9411e+00 3.2 0.00e+00 0.0 2.0e+06 2.3e+04
0.0e+00 1 0 97 89 0 1 0 97 89 0 0
VecScatterEnd 2305 1.0 1.7004e+01 4.1 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 6 0 0 0 0 6 0 0 0 0 0
MatMult 765 1.0 1.5970e+01 2.0 1.34e+09 1.1 9.8e+05 2.2e+04
0.0e+00 7 10 47 42 0 7 10 47 42 0 20539
MatSolve 766 1.0 1.4117e+01 3.1 1.41e+09 1.1 0.0e+00 0.0e+00
0.0e+00 5 11 0 0 0 5 11 0 0 0 25326
MatLUFactorNum 1 1.0 6.7332e-02 2.7 3.07e+06 1.1 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 11481
MatILUFactorSym 1 1.0 5.3601e-02 3.5 0.00e+00 0.0 0.0e+00 0.0e+00
1.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatAssemblyBegin 3 1.0 3.5198e-01 3.2 0.00e+00 0.0 0.0e+00 0.0e+00
4.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatAssemblyEnd 3 1.0 2.0012e-01 1.4 0.00e+00 0.0 2.6e+03 5.5e+03
8.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetRowIJ 1 1.0 1.2159e-05 6.4 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetSubMatrice 1 1.0 8.8131e-01 2.4 0.00e+00 0.0 6.4e+03 5.0e+04
7.0e+00 0 0 0 1 0 0 0 0 1 0 0
MatGetOrdering 1 1.0 1.2251e-02 6.7 0.00e+00 0.0 0.0e+00 0.0e+00
2.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatIncreaseOvrlp 1 1.0 3.9644e-01 2.2 0.00e+00 0.0 0.0e+00 0.0e+00
2.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatView 3 3.0 4.4928e-02198.8 0.00e+00 0.0 0.0e+00 0.0e+00
1.0e+00 0 0 0 0 0 0 0 0 0 0 0
KSPGMRESOrthog 926 1.0 8.4192e+01 1.2 9.74e+09 1.1 0.0e+00 0.0e+00
9.3e+02 45 72 0 0 43 45 72 0 0 43 28091
KSPSetUp 2 1.0 1.1193e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
5.2e+01 1 0 0 0 2 1 0 0 0 2 0
KSPSolve 1 1.0 1.3453e+02 1.0 1.34e+10 1.1 2.0e+06 2.2e+04
2.1e+03 78100 94 84 96 78100 94 84 96 24342
PCSetUp 2 1.0 1.4820e+00 1.5 3.07e+06 1.1 9.0e+03 3.7e+04
2.3e+01 1 0 0 1 1 1 0 0 1 1 522
PCSetUpOnBlocks 1 1.0 1.1506e-01 2.7 3.07e+06 1.1 0.0e+00 0.0e+00
3.0e+00 0 0 0 0 0 0 0 0 0 0 6718
PCApply 766 1.0 2.2657e+01 2.2 1.41e+09 1.1 9.8e+05 2.2e+04
0.0e+00 9 11 47 42 0 9 11 47 42 0 15780
Generate Vector 1 1.0 3.2557e+01 1.6 0.00e+00 0.0 9.8e+04 8.0e+04
2.4e+01 12 0 5 15 1 12 0 5 15 1 0
------------------------------------------------------------------------------------------------------------------------
Memory usage is given in bytes:
Object Type Creations Destructions Memory Descendants' Mem.
Reports information only for process 0.
--- Event Stage 0: Main Stage
Vector 86 86 89650392 0
Vector Scatter 8 8 8480 0
Matrix 5 5 44222508 0
Distributed Mesh 2 2 3481552 0
Bipartite Graph 4 4 2800 0
Index Set 22 22 2362180 0
IS L to G Mapping 3 3 2316084 0
Krylov Solver 2 2 31856 0
Preconditioner 2 2 1840 0
Viewer 3 2 1448 0
========================================================================================================================
Average time to get PetscTime(): 9.53674e-08
Average time for MPI_Barrier(): 0.0160028
Average time for zero size MPI_Send(): 0.000165327
#PETSc Option Table entries:
-ksp_gmres_restart 40
-ksp_lgmres_augment 10
-ksp_type lgmres
-ksp_view
-log_summary
-my_ksp_monitor true
-pc_type asm
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8
sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure run at: Sat Nov 9 12:01:53 2013
Configure options: --download-f-blas-lapack
--with-mpi-dir=/usr/local/OpenMPI-1.6.4_Intel --download-hypre=1
--download-hdf5=1 --download-superlu_dist --download-parmetis -download-metis
--with-debugging=no
-----------------------------------------
Libraries compiled on Sat Nov 9 12:01:53 2013 on cocoa5.aero.psu.edu
Machine characteristics:
Linux-2.6.32-279.5.1.el6.x86_64-x86_64-with-centos-6.3-Final
Using PETSc directory: /home/pzw2/ZSoft/petsc-3.3-p6
Using PETSc arch: arch-linux2-c-opt
-----------------------------------------
Using C compiler: /usr/local/OpenMPI-1.6.4_Intel/bin/mpicc -Wall
-Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -O ${COPTFLAGS}
${CFLAGS}
Using Fortran compiler: /usr/local/OpenMPI-1.6.4_Intel/bin/mpif90 -O3
${FOPTFLAGS} ${FFLAGS}
-----------------------------------------
Using include paths: -I/home/pzw2/ZSoft/petsc-3.3-p6/arch-linux2-c-opt/include
-I/home/pzw2/ZSoft/petsc-3.3-p6/include -I/home/pzw2/ZSoft/petsc-3.3-p6/include
-I/home/pzw2/ZSoft/petsc-3.3-p6/arch-linux2-c-opt/include
-I/usr/local/OpenMPI-1.6.4_Intel/include
-I/usr/local/OpenMPI-1.6.4_Intel/include/openmpi/opal/mca/hwloc/hwloc132/hwloc/include
-I/usr/local/OpenMPI-1.6.4_Intel/include/openmpi
-----------------------------------------
Using C linker: /usr/local/OpenMPI-1.6.4_Intel/bin/mpicc
Using Fortran linker: /usr/local/OpenMPI-1.6.4_Intel/bin/mpif90
Using libraries: -Wl,-rpath,/home/pzw2/ZSoft/petsc-3.3-p6/arch-linux2-c-opt/lib
-L/home/pzw2/ZSoft/petsc-3.3-p6/arch-linux2-c-opt/lib -lpetsc -lX11
-Wl,-rpath,/home/pzw2/ZSoft/petsc-3.3-p6/arch-linux2-c-opt/lib
-L/home/pzw2/ZSoft/petsc-3.3-p6/arch-linux2-c-opt/lib -lsuperlu_dist_3.1
-lparmetis -lmetis -lpthread -lHYPRE -L/usr/local/OpenMPI-1.6.4_Intel/lib
-L/usr/lib/gcc/x86_64-redhat-linux/4.4.6 -lmpi_cxx -lstdc++ -lflapack -lfblas
-lhdf5_fortran -lhdf5 -lhdf5hl_fortran -lhdf5_hl -lz -lmpi_f90 -lmpi_f77 -lm
-lm -L/opt/intel/composer_xe_2011_sp1.10.319/compiler/lib/intel64 -limf -lm -lm
-lifport -lifcore -lsvml -lm -lipgo -lirc -lirc_s -lm -lm -lm -lmpi_cxx
-lstdc++ -lmpi_cxx -lstdc++ -ldl -lmpi -lrt -lnsl -lutil -lgcc_s -lpthread -ldl
-----------------------------------------