Hi,
I have attached the 2 files.
Thank you
Yours sincerely,
TAY wee-beng
On 2/11/2015 2:55 PM, Barry Smith wrote:
Run a (158/2)x(266/2)x(150/2) grid on 8 processes and then (158)x(266)x(150)
on 64 processes and send the two -log_summary results
Barry
On Nov 2, 2015, at 12:19 AM, TAY wee-beng <[email protected]> wrote:
Hi,
I have attached the new results.
Thank you
Yours sincerely,
TAY wee-beng
On 2/11/2015 12:27 PM, Barry Smith wrote:
Run without the -momentum_ksp_view -poisson_ksp_view and send the new results
You can see from the log summary that the PCSetUp is taking a much smaller
percentage of the time, meaning that it is reusing the preconditioner and not
rebuilding it each time.
Barry
Something makes no sense with the output: it gives
KSPSolve 199 1.0 2.3298e+03 1.0 5.20e+09 1.8 3.8e+04 9.9e+05
5.0e+02 90100 66100 24 90100 66100 24 165
90% of the time is in the solve, but there is no significant amount of time in
the other events of the code, which is just not possible. I hope it is due to
your I/O.
On Nov 1, 2015, at 10:02 PM, TAY wee-beng <[email protected]> wrote:
Hi,
I have attached the new run with 100 time steps for 48 and 96 cores.
Only the Poisson equation's RHS changes; the LHS doesn't. So if I want to reuse
the preconditioner, what must I do? Or what must I not do?
Why does the time increase so much with the number of processes? Is there
something wrong with my coding? It seems to be so for my new run too.
Thank you
Yours sincerely,
TAY wee-beng
On 2/11/2015 9:49 AM, Barry Smith wrote:
If you are doing many time steps with the same linear solver then you MUST
do your weak scaling studies with MANY time steps since the setup time of AMG
only takes place in the first timestep. So run both 48 and 96 processes with
the same large number of time steps.
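Barry's point can be put in numbers with a quick back-of-envelope sketch; the setup and solve times below are illustrative placeholders, not taken from these logs:

```python
# Back-of-envelope: why scaling studies need many time steps when the
# AMG setup happens only in the first timestep. With few steps the
# one-off setup dominates the average cost; with many steps it is
# amortized away. The times are hypothetical, for illustration only.

def avg_cost_per_step(setup, solve, nsteps):
    """Average wall time per step when setup is paid once up front."""
    return (setup + nsteps * solve) / nsteps

setup, solve = 430.0, 12.0   # hypothetical seconds: one-off setup, per-step solve

few  = avg_cost_per_step(setup, solve, 2)    # setup dominates: 227.0 s/step
many = avg_cost_per_step(setup, solve, 100)  # setup amortized: 16.3 s/step
print(few, many)
```

With only 2 steps the run mostly measures the setup, so it says little about how the per-step solve scales.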
Barry
On Nov 1, 2015, at 7:35 PM, TAY wee-beng <[email protected]> wrote:
Hi,
Sorry, I forgot and used the old a.out. I have attached the new log for 48
cores (log48), together with the 96-core log (log96).
Why does the time increase so much with the number of processes? Is there
something wrong with my coding?
Only the Poisson equation's RHS changes; the LHS doesn't. So if I want to reuse
the preconditioner, what must I do? Or what must I not do?
Lastly, I only simulated 2 time steps previously. Now I have run for 10
timesteps (log48_10). Is it building the preconditioner at every timestep?
Also, what about the momentum equation? Is it working well?
I will try the gamg later too.
Thank you
Yours sincerely,
TAY wee-beng
On 2/11/2015 12:30 AM, Barry Smith wrote:
You used gmres with 48 processes but richardson with 96. You need to be
careful and make sure you don't change the solvers when you change the number
of processors, since you can get very different, inconsistent results.
Anyway, all the time is being spent in the BoomerAMG algebraic multigrid
setup, and it is scaling badly. When you doubled the problem size and number
of processes it went from 3.2445e+01 to 4.3599e+02 seconds.
PCSetUp 3 1.0 3.2445e+01 1.0 9.58e+06 2.0 0.0e+00 0.0e+00
4.0e+00 62 8 0 0 4 62 8 0 0 5 11
PCSetUp 3 1.0 4.3599e+02 1.0 9.58e+06 2.0 0.0e+00 0.0e+00
4.0e+00 85 18 0 0 6 85 18 0 0 6 2
Now, is the Poisson problem changing at each timestep, or can you use the same
preconditioner built with BoomerAMG for all the time steps? Algebraic
multigrid has a large setup time that often doesn't matter if you have many
time steps, but if you have to rebuild it at each timestep it may be too
large.
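The setup/solve split being described can be sketched with a toy dense LU in plain Python: the factorization plays the role of the expensive AMG setup and is paid once, while each new RHS only needs the cheap triangular solves. This is purely illustrative, not PETSc or BoomerAMG code:

```python
# Toy stand-in for "setup once, solve many": factor A once (the expensive
# step, analogous to the AMG setup), then reuse the factors for every new
# RHS. Pure Python, no pivoting (fine for this diagonally dominant A).

def lu_factor(A):
    """Doolittle LU: returns unit-lower-triangular L and upper-triangular U."""
    n = len(A)
    L = [[0.0] * n for _ in range(n)]
    U = [row[:] for row in A]
    for i in range(n):
        L[i][i] = 1.0
        for j in range(i + 1, n):
            m = U[j][i] / U[i][i]
            L[j][i] = m
            for k in range(i, n):
                U[j][k] -= m * U[i][k]
    return L, U

def lu_solve(L, U, b):
    """Forward substitution Ly = b, then back substitution Ux = y."""
    n = len(b)
    y = [0.0] * n
    for i in range(n):
        y[i] = b[i] - sum(L[i][j] * y[j] for j in range(i))
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (y[i] - sum(U[i][j] * x[j] for j in range(i + 1, n))) / U[i][i]
    return x

A = [[4.0, 1.0, 0.0],
     [1.0, 4.0, 1.0],
     [0.0, 1.0, 4.0]]
L, U = lu_factor(A)                              # "PCSetUp": done once
for b in ([5.0, 6.0, 5.0], [4.0, 6.0, 4.0]):     # RHS changes each "timestep"
    x = lu_solve(L, U, b)                        # "KSPSolve": reuses the factors
    print(x)
```

In PETSc itself the analogous move is to keep the same matrix and preconditioner across KSPSolve calls (see also KSPSetReusePreconditioner) rather than rebuilding them every timestep.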
You might also try -pc_type gamg and see how PETSc's algebraic multigrid
scales for your problem/machine.
Barry
On Nov 1, 2015, at 7:30 AM, TAY wee-beng <[email protected]> wrote:
On 1/11/2015 10:00 AM, Barry Smith wrote:
On Oct 31, 2015, at 8:43 PM, TAY wee-beng <[email protected]> wrote:
On 1/11/2015 12:47 AM, Matthew Knepley wrote:
On Sat, Oct 31, 2015 at 11:34 AM, TAY wee-beng <[email protected]> wrote:
Hi,
I understand that, as mentioned in the FAQ, due to the limitations in memory
the scaling is not linear. So I am trying to write a proposal to use a
supercomputer.
Its specs are:
Compute nodes: 82,944 nodes (SPARC64 VIIIfx; 16GB of memory per node)
8 cores / processor
Interconnect: Tofu (6-dimensional mesh/torus) Interconnect
Each cabinet contains 96 computing nodes.
One of the requirements is to give the performance of my current code with my
current set of data, and there is a formula to calculate the estimated
parallel efficiency when using the new large data set.
There are 2 ways to give performance:
1. Strong scaling, which is defined as how the elapsed time varies with the
number of processors for a fixed total problem size.
2. Weak scaling, which is defined as how the elapsed time varies with the
number of processors for a fixed problem size per processor.
I ran my cases with 48 and 96 cores on my current cluster, giving 140 and 90
mins respectively. This is classified as strong scaling.
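For what it's worth, the speedup and strong-scaling efficiency implied by those two timings can be computed directly; the formulas below are just the standard definitions:

```python
# Strong scaling from the two measured runs: speedup relative to the
# 48-core baseline, and parallel efficiency for the 2x core increase.
t48, t96 = 140.0, 90.0             # minutes, from the runs above
speedup = t48 / t96                # ~1.56 for 2x the cores
efficiency = speedup / (96 / 48)   # ~0.78, i.e. ~78% over this one doubling
print(speedup, efficiency)
```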
Cluster specs:
CPU: AMD 6234 2.4GHz
8 cores / processor (CPU)
6 CPU / node
So 48 cores / node
Not sure about the memory / node
The parallel efficiency ‘En’ for a given degree of parallelism ‘n’ indicates
how efficiently the program is accelerated by parallel processing. ‘En’ is
given by the following formulae. Although the derivations differ between
strong and weak scaling, the derived formulae are the same.
From the estimated time, my parallel efficiency using Amdahl's law on the
current old cluster was 52.7%.
So are my results acceptable?
For the large data set, if using 2205 nodes (2205x8 cores), my expected
parallel efficiency is only 0.5%. The proposal recommends a value of > 50%.
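The exact formula used in the proposal isn't shown here, but an Amdahl-style fit to the two timings above lands in the same ballpark as the 52.7% and 0.5% figures; the model T(n) = T1*(s + (1-s)/n) is an assumption on my part:

```python
# Fit the serial fraction s of Amdahl's law, T(n) = T1*(s + (1 - s)/n),
# to the two measured runs, then extrapolate the parallel efficiency
# E(n) = 1/(n*s + (1 - s)) to the proposed core count.

def amdahl_serial_fraction(n1, t1, n2, t2):
    """Solve t1*(s + (1-s)/n2) = t2*(s + (1-s)/n1) for s."""
    return (t2 / n1 - t1 / n2) / (t1 - t1 / n2 - t2 + t2 / n1)

def efficiency(n, s):
    return 1.0 / (n * s + (1.0 - s))

s = amdahl_serial_fraction(48, 140.0, 96, 90.0)   # ~0.8% serial fraction
print(efficiency(96, s))      # ~0.56 at 96 cores
print(efficiency(17640, s))   # ~0.007 at 2205 x 8 cores
```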
The problem with this analysis is that the estimated serial fraction from
Amdahl's Law changes as a function
of problem size, so you cannot take the strong scaling from one problem and
apply it to another without a
model of this dependence.
Weak scaling does model changes with problem size, so I would measure weak
scaling on your current
cluster, and extrapolate to the big machine. I realize that this does not make
sense for many scientific
applications, but neither does requiring a certain parallel efficiency.
Ok, I checked the results for my weak scaling; it is even worse for the
expected parallel efficiency. From the formula used, it's obvious it's doing
some sort of exponential extrapolation decrease. So unless I can achieve a
near >90% speedup when I double the cores and problem size for my current
48/96-core setup, extrapolating from about 96 nodes to 10,000 nodes will give
a much lower expected parallel efficiency for the new case.
However, it's mentioned in the FAQ that, due to memory requirements, it's
impossible to get >90% speedup when I double the cores and problem size (i.e.
a linear increase in performance), which means that I can't get >90% speedup
when I double the cores and problem size for my current 48/96-core setup. Is
that so?
What is the output of -ksp_view -log_summary on the problem and then on the
problem doubled in size and number of processors?
Barry
Hi,
I have attached the output
48 cores: log48
96 cores: log96
There are 2 solvers: the momentum linear equation uses bcgs, while the Poisson
equation uses hypre BoomerAMG.
The problem size doubled from 158x266x150 to 158x266x300.
So is it fair to say that the main problem does not lie in my programming
skills, but rather in the way the linear equations are solved?
Thanks.
Thanks,
Matt
Is it possible to get this type of scaling in PETSc (>50%) when using 17640
(2205x8) cores?
Btw, I do not have access to the system.
Sent using CloudMagic Email
--
What most experimenters take for granted before they begin their experiments is
infinitely more interesting than any results to which their experiments lead.
-- Norbert Wiener
<log48.txt><log96.txt>
<log48_10.txt><log48.txt><log96.txt>
<log96_100.txt><log48_100.txt>
<log96_100_2.txt><log48_100_2.txt>
0.000000000000000E+000 0.353000000000000 0.000000000000000E+000
90.0000000000000 0.000000000000000E+000 0.000000000000000E+000
1.00000000000000 0.400000000000000 0 -400000
z grid divid too small!
myid,each procs z size 32 2
z grid divid too small!
myid,each procs z size 50 2
z grid divid too small!
myid,each procs z size 34 2
AB,AA,BB -2.47900002275128 2.50750002410496
3.46600006963126 3.40250006661518
size_x,size_y,size_z 158 266 150
z grid divid too small!
myid,each procs z size 41 2
z grid divid too small!
myid,each procs z size 52 2
z grid divid too small!
myid,each procs z size 60 2
z grid divid too small!
myid,each procs z size 27 2
z grid divid too small!
myid,each procs z size 29 2
z grid divid too small!
myid,each procs z size 39 2
z grid divid too small!
myid,each procs z size 23 2
z grid divid too small!
myid,each procs z size 26 2
z grid divid too small!
myid,each procs z size 24 2
z grid divid too small!
myid,each procs z size 25 2
z grid divid too small!
myid,each procs z size 49 2
z grid divid too small!
myid,each procs z size 57 2
z grid divid too small!
myid,each procs z size 37 2
z grid divid too small!
myid,each procs z size 61 2
z grid divid too small!
myid,each procs z size 28 2
z grid divid too small!
myid,each procs z size 31 2
z grid divid too small!
myid,each procs z size 54 2
z grid divid too small!
myid,each procs z size 35 2
z grid divid too small!
myid,each procs z size 51 2
z grid divid too small!
myid,each procs z size 53 2
z grid divid too small!
myid,each procs z size 22 2
z grid divid too small!
myid,each procs z size 33 2
z grid divid too small!
myid,each procs z size 48 2
z grid divid too small!
myid,each procs z size 44 2
z grid divid too small!
myid,each procs z size 43 2
z grid divid too small!
myid,each procs z size 30 2
z grid divid too small!
myid,each procs z size 62 2
z grid divid too small!
myid,each procs z size 45 2
z grid divid too small!
myid,each procs z size 47 2
z grid divid too small!
myid,each procs z size 40 2
z grid divid too small!
myid,each procs z size 42 2
z grid divid too small!
myid,each procs z size 59 2
z grid divid too small!
myid,each procs z size 46 2
z grid divid too small!
myid,each procs z size 55 2
z grid divid too small!
myid,each procs z size 58 2
z grid divid too small!
myid,each procs z size 36 2
z grid divid too small!
myid,each procs z size 38 2
z grid divid too small!
myid,each procs z size 56 2
z grid divid too small!
myid,each procs z size 63 2
body_cg_ini 0.523700833348298 0.778648765134454
7.03282656467989
Warning - length difference between element and cell
max_element_length,min_element_length,min_delta
0.000000000000000E+000 10000000000.0000 1.800000000000000E-002
maximum ngh_surfaces and ngh_vertics are 42 22
minimum ngh_surfaces and ngh_vertics are 28 10
body_cg_ini 0.896813342835977 -0.976707581163755
7.03282656467989
Warning - length difference between element and cell
max_element_length,min_element_length,min_delta
0.000000000000000E+000 10000000000.0000 1.800000000000000E-002
maximum ngh_surfaces and ngh_vertics are 42 22
minimum ngh_surfaces and ngh_vertics are 28 10
min IIB_cell_no 0
max IIB_cell_no 429
final initial IIB_cell_no 2145
min I_cell_no 0
max I_cell_no 460
final initial I_cell_no 2300
size(IIB_cell_u),size(I_cell_u),size(IIB_equal_cell_u),size(I_equal_cell_u)
2145 2300 2145 2300
IIB_I_cell_no_uvw_total1 3090 3094 3078 3080
3074 3073
IIB_I_cell_no_uvw_total2 3102 3108 3089 3077
3060 3086
1 0.00150000 0.26454057 0.26151125 1.18591343
-0.76697946E+03 -0.32604327E+02 0.62972429E+07
escape_time reached, so abort
body 1
implicit forces and moment 1
0.862588119656401 -0.514914325828415 0.188666046906171
0.478398501406518 0.368390136470159 -1.05426803582325
body 2
implicit forces and moment 2
0.527317340758098 0.731529687675724 0.148470913323249
-0.515187332360951 0.158119801327539 0.961551576757635
************************************************************************************************************************
*** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r
-fCourier9' to print this document ***
************************************************************************************************************************
---------------------------------------------- PETSc Performance Summary:
----------------------------------------------
./a.out on a petsc-3.6.2_shared_rel named n12-04 with 64 processors, by wtay
Mon Nov 2 08:09:14 2015
Using Petsc Release Version 3.6.2, Oct, 02, 2015
Max Max/Min Avg Total
Time (sec): 6.462e+02 1.00000 6.462e+02
Objects: 4.300e+01 1.00000 4.300e+01
Flops: 3.832e+09 2.41599 2.918e+09 1.867e+11
Flops/sec: 5.930e+06 2.41599 4.515e+06 2.889e+08
MPI Messages: 4.040e+02 2.00000 3.977e+02 2.545e+04
MPI Message Lengths: 3.953e+08 2.00000 9.784e+05 2.490e+10
MPI Reductions: 1.922e+03 1.00000
Flop counting convention: 1 flop = 1 real number operation of type
(multiply/divide/add/subtract)
e.g., VecAXPY() for real vectors of length N --> 2N
flops
and VecAXPY() for complex vectors of length N -->
8N flops
Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- --
Message Lengths -- -- Reductions --
Avg %Total Avg %Total counts %Total
Avg %Total counts %Total
0: Main Stage: 6.4623e+02 100.0% 1.8673e+11 100.0% 2.545e+04 100.0%
9.784e+05 100.0% 1.921e+03 99.9%
------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting
output.
Phase summary info:
Count: number of times phase was executed
Time and Flops: Max - maximum over all processors
Ratio - ratio of maximum to minimum over all processors
Mess: number of messages sent
Avg. len: average message length (bytes)
Reduct: number of global reductions
Global: entire computation
Stage: stages of a computation. Set stages with PetscLogStagePush() and
PetscLogStagePop().
%T - percent time in this phase %F - percent flops in this phase
%M - percent messages in this phase %L - percent message lengths in
this phase
%R - percent reductions in this phase
Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all
processors)
------------------------------------------------------------------------------------------------------------------------
Event Count Time (sec) Flops
--- Global --- --- Stage --- Total
Max Ratio Max Ratio Max Ratio Mess Avg len Reduct
%T %F %M %L %R %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------
--- Event Stage 0: Main Stage
MatMult 198 1.0 6.0898e+00 1.5 9.62e+08 2.8 2.5e+04 9.9e+05
0.0e+00 1 26 98100 0 1 26 98100 0 7839
MatSolve 297 1.0 5.0697e+00 2.5 1.30e+09 2.9 0.0e+00 0.0e+00
0.0e+00 1 33 0 0 0 1 33 0 0 0 12289
MatLUFactorNum 99 1.0 6.1544e+00 2.6 6.77e+08 3.4 0.0e+00 0.0e+00
0.0e+00 1 17 0 0 0 1 17 0 0 0 5159
MatILUFactorSym 1 1.0 5.6852e-02 4.5 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatConvert 1 1.0 4.6493e-02 1.4 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatAssemblyBegin 100 1.0 1.5075e+0110.6 0.00e+00 0.0 0.0e+00 0.0e+00
2.0e+02 1 0 0 0 10 1 0 0 0 10 0
MatAssemblyEnd 100 1.0 1.7887e+00 1.6 0.00e+00 0.0 5.0e+02 1.7e+05
1.6e+01 0 0 2 0 1 0 0 2 0 1 0
MatGetRowIJ 3 1.0 1.1921e-05 0.0 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetOrdering 1 1.0 7.5250e-03 4.7 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
KSPSetUp 199 1.0 3.5067e-02 9.5 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
KSPSolve 199 1.0 4.9862e+02 1.0 3.83e+09 2.4 2.5e+04 9.9e+05
5.0e+02 77100 98100 26 77100 98100 26 374
VecDot 198 1.0 2.5283e+00 3.7 1.50e+08 1.5 0.0e+00 0.0e+00
2.0e+02 0 4 0 0 10 0 4 0 0 10 2962
VecDotNorm2 99 1.0 2.1805e+00 5.6 1.50e+08 1.5 0.0e+00 0.0e+00
9.9e+01 0 4 0 0 5 0 4 0 0 5 3435
VecNorm 198 1.0 5.7988e+00 8.6 1.50e+08 1.5 0.0e+00 0.0e+00
2.0e+02 0 4 0 0 10 0 4 0 0 10 1292
VecCopy 198 1.0 2.7041e-01 1.9 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecSet 696 1.0 6.4512e-01 1.8 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAXPBYCZ 198 1.0 7.5918e-01 2.2 3.00e+08 1.5 0.0e+00 0.0e+00
0.0e+00 0 8 0 0 0 0 8 0 0 0 19730
VecWAXPY 198 1.0 7.3240e-01 2.1 1.50e+08 1.5 0.0e+00 0.0e+00
0.0e+00 0 4 0 0 0 0 4 0 0 0 10226
VecAssemblyBegin 398 1.0 2.3003e+00 3.3 0.00e+00 0.0 0.0e+00 0.0e+00
1.2e+03 0 0 0 0 62 0 0 0 0 62 0
VecAssemblyEnd 398 1.0 1.9789e-03 2.9 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecScatterBegin 198 1.0 5.3828e-01 3.6 0.00e+00 0.0 2.5e+04 9.9e+05
0.0e+00 0 0 98100 0 0 0 98100 0 0
VecScatterEnd 198 1.0 2.7654e+00 4.1 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
PCSetUp 199 1.0 9.0926e+01 1.0 6.77e+08 3.4 0.0e+00 0.0e+00
4.0e+00 14 17 0 0 0 14 17 0 0 0 349
PCSetUpOnBlocks 99 1.0 6.2079e+00 2.6 6.77e+08 3.4 0.0e+00 0.0e+00
0.0e+00 1 17 0 0 0 1 17 0 0 0 5115
PCApply 297 1.0 5.3325e+00 2.3 1.30e+09 2.9 0.0e+00 0.0e+00
0.0e+00 1 33 0 0 0 1 33 0 0 0 11683
------------------------------------------------------------------------------------------------------------------------
Memory usage is given in bytes:
Object Type Creations Destructions Memory Descendants' Mem.
Reports information only for process 0.
--- Event Stage 0: Main Stage
Matrix 7 7 136037064 0
Krylov Solver 3 3 3464 0
Vector 20 20 31622728 0
Vector Scatter 2 2 2176 0
Index Set 7 7 3696940 0
Preconditioner 3 3 3208 0
Viewer 1 0 0 0
========================================================================================================================
Average time to get PetscTime(): 2.14577e-07
Average time for MPI_Barrier(): 0.000201178
Average time for zero size MPI_Send(): 3.05139e-05
#PETSc Option Table entries:
-log_summary
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8
sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure options: --with-mpi-dir=/opt/ud/openmpi-1.8.8/
--with-blas-lapack-dir=/opt/ud/intel_xe_2013sp1/mkl/lib/intel64/
--with-debugging=0 --download-hypre=1
--prefix=/home/wtay/Lib/petsc-3.6.2_shared_rel --known-mpi-shared=1
--with-shared-libraries --with-fortran-interfaces=1
-----------------------------------------
Libraries compiled on Sun Oct 18 17:34:07 2015 on hpc12
Machine characteristics:
Linux-3.10.0-123.20.1.el7.x86_64-x86_64-with-centos-7.1.1503-Core
Using PETSc directory: /home/wtay/Codes/petsc-3.6.2
Using PETSc arch: petsc-3.6.2_shared_rel
-----------------------------------------
Using C compiler: /opt/ud/openmpi-1.8.8/bin/mpicc -fPIC -wd1572 -O3
${COPTFLAGS} ${CFLAGS}
Using Fortran compiler: /opt/ud/openmpi-1.8.8/bin/mpif90 -fPIC -O3
${FOPTFLAGS} ${FFLAGS}
-----------------------------------------
Using include paths:
-I/home/wtay/Codes/petsc-3.6.2/petsc-3.6.2_shared_rel/include
-I/home/wtay/Codes/petsc-3.6.2/include -I/home/wtay/Codes/petsc-3.6.2/include
-I/home/wtay/Codes/petsc-3.6.2/petsc-3.6.2_shared_rel/include
-I/home/wtay/Lib/petsc-3.6.2_shared_rel/include -I/opt/ud/openmpi-1.8.8/include
-----------------------------------------
Using C linker: /opt/ud/openmpi-1.8.8/bin/mpicc
Using Fortran linker: /opt/ud/openmpi-1.8.8/bin/mpif90
Using libraries:
-Wl,-rpath,/home/wtay/Codes/petsc-3.6.2/petsc-3.6.2_shared_rel/lib
-L/home/wtay/Codes/petsc-3.6.2/petsc-3.6.2_shared_rel/lib -lpetsc
-Wl,-rpath,/home/wtay/Lib/petsc-3.6.2_shared_rel/lib
-L/home/wtay/Lib/petsc-3.6.2_shared_rel/lib -lHYPRE
-Wl,-rpath,/opt/ud/openmpi-1.8.8/lib -L/opt/ud/openmpi-1.8.8/lib
-Wl,-rpath,/opt/ud/intel_xe_2013sp1/composer_xe_2013_sp1.2.144/compiler/lib/intel64
-L/opt/ud/intel_xe_2013sp1/composer_xe_2013_sp1.2.144/compiler/lib/intel64
-Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/4.8.3
-L/usr/lib/gcc/x86_64-redhat-linux/4.8.3 -lmpi_cxx
-Wl,-rpath,/opt/ud/intel_xe_2013sp1/mkl/lib/intel64
-L/opt/ud/intel_xe_2013sp1/mkl/lib/intel64 -lmkl_intel_lp64 -lmkl_sequential
-lmkl_core -lpthread -lm -lX11 -lhwloc -lssl -lcrypto -lmpi_usempi -lmpi_mpifh
-lifport -lifcore -lm -lmpi_cxx -ldl -Wl,-rpath,/opt/ud/openmpi-1.8.8/lib
-L/opt/ud/openmpi-1.8.8/lib -lmpi -Wl,-rpath,/opt/ud/openmpi-1.8.8/lib
-L/opt/ud/openmpi-1.8.8/lib
-Wl,-rpath,/opt/ud/intel_xe_2013sp1/composer_xe_2013_sp1.2.144/compiler/lib/intel64
-L/opt/ud/intel_xe_2013sp1/composer_xe_2013_sp1.2.144/compiler/lib/intel64
-Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/4.8.3
-L/usr/lib/gcc/x86_64-redhat-linux/4.8.3 -Wl,-rpath,/opt/ud/openmpi-1.8.8/lib
-limf -lsvml -lirng -lipgo -ldecimal -lcilkrts -lstdc++ -lgcc_s -lirc -lpthread
-lirc_s -Wl,-rpath,/opt/ud/openmpi-1.8.8/lib -L/opt/ud/openmpi-1.8.8/lib
-Wl,-rpath,/opt/ud/intel_xe_2013sp1/composer_xe_2013_sp1.2.144/compiler/lib/intel64
-L/opt/ud/intel_xe_2013sp1/composer_xe_2013_sp1.2.144/compiler/lib/intel64
-Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/4.8.3
-L/usr/lib/gcc/x86_64-redhat-linux/4.8.3 -ldl
-----------------------------------------
0.000000000000000E+000 0.353000000000000 0.000000000000000E+000
90.0000000000000 0.000000000000000E+000 0.000000000000000E+000
1.00000000000000 0.400000000000000 0 -400000
AB,AA,BB -2.00050000002375 2.00050000002375
2.61200002906844 2.53550002543489
size_x,size_y,size_z 79 133 75
body_cg_ini 0.523700833348298 0.778648765134454
7.03282656467989
Warning - length difference between element and cell
max_element_length,min_element_length,min_delta
0.000000000000000E+000 10000000000.0000 4.300000000000000E-002
maximum ngh_surfaces and ngh_vertics are 149 68
minimum ngh_surfaces and ngh_vertics are 54 22
body_cg_ini 0.896813342835977 -0.976707581163755
7.03282656467989
Warning - length difference between element and cell
max_element_length,min_element_length,min_delta
0.000000000000000E+000 10000000000.0000 4.300000000000000E-002
maximum ngh_surfaces and ngh_vertics are 149 68
minimum ngh_surfaces and ngh_vertics are 54 22
min IIB_cell_no 0
max IIB_cell_no 265
final initial IIB_cell_no 1325
min I_cell_no 0
max I_cell_no 94
final initial I_cell_no 470
size(IIB_cell_u),size(I_cell_u),size(IIB_equal_cell_u),size(I_equal_cell_u)
1325 470 1325 470
IIB_I_cell_no_uvw_total1 265 270 255 94
91 95
IIB_I_cell_no_uvw_total2 273 280 267 97
94 98
1 0.00150000 0.14647307 0.14738629 1.08799982
0.19042331E+02 0.17694812E+00 0.78750669E+06
escape_time reached, so abort
body 1
implicit forces and moment 1
0.869079152284549 -0.476901507812372 8.158446867754350E-002
0.428147709668946 0.558124898859503 -0.928673788206044
body 2
implicit forces and moment 2
0.551071794231021 0.775546442990061 0.135476527830159
-0.634587379905926 0.290234735051080 0.936523173830761
************************************************************************************************************************
*** WIDEN YOUR WINDOW TO 120 CHARACTERS. Use 'enscript -r
-fCourier9' to print this document ***
************************************************************************************************************************
---------------------------------------------- PETSc Performance Summary:
----------------------------------------------
./a.out on a petsc-3.6.2_shared_rel named n12-04 with 8 processors, by wtay Mon
Nov 2 08:08:29 2015
Using Petsc Release Version 3.6.2, Oct, 02, 2015
Max Max/Min Avg Total
Time (sec): 1.687e+02 1.00000 1.687e+02
Objects: 4.300e+01 1.00000 4.300e+01
Flops: 3.326e+09 1.20038 3.085e+09 2.468e+10
Flops/sec: 1.971e+07 1.20038 1.828e+07 1.463e+08
MPI Messages: 4.040e+02 2.00000 3.535e+02 2.828e+03
MPI Message Lengths: 9.744e+07 2.00000 2.412e+05 6.821e+08
MPI Reductions: 1.922e+03 1.00000
Flop counting convention: 1 flop = 1 real number operation of type
(multiply/divide/add/subtract)
e.g., VecAXPY() for real vectors of length N --> 2N
flops
and VecAXPY() for complex vectors of length N -->
8N flops
Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- --
Message Lengths -- -- Reductions --
Avg %Total Avg %Total counts %Total
Avg %Total counts %Total
0: Main Stage: 1.6872e+02 100.0% 2.4679e+10 100.0% 2.828e+03 100.0%
2.412e+05 100.0% 1.921e+03 99.9%
------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting
output.
Phase summary info:
Count: number of times phase was executed
Time and Flops: Max - maximum over all processors
Ratio - ratio of maximum to minimum over all processors
Mess: number of messages sent
Avg. len: average message length (bytes)
Reduct: number of global reductions
Global: entire computation
Stage: stages of a computation. Set stages with PetscLogStagePush() and
PetscLogStagePop().
%T - percent time in this phase %F - percent flops in this phase
%M - percent messages in this phase %L - percent message lengths in
this phase
%R - percent reductions in this phase
Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all
processors)
------------------------------------------------------------------------------------------------------------------------
Event Count Time (sec) Flops
--- Global --- --- Stage --- Total
Max Ratio Max Ratio Max Ratio Mess Avg len Reduct
%T %F %M %L %R %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------
--- Event Stage 0: Main Stage
MatMult 198 1.0 3.7159e+00 1.3 7.91e+08 1.2 2.8e+03 2.5e+05
0.0e+00 2 24 98100 0 2 24 98100 0 1575
MatSolve 297 1.0 3.5614e+00 1.4 1.15e+09 1.2 0.0e+00 0.0e+00
0.0e+00 2 35 0 0 0 2 35 0 0 0 2393
MatLUFactorNum 99 1.0 6.4595e+00 1.4 6.34e+08 1.2 0.0e+00 0.0e+00
0.0e+00 3 19 0 0 0 3 19 0 0 0 726
MatILUFactorSym 1 1.0 3.9131e-02 1.4 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatConvert 1 1.0 2.7447e-02 1.2 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatAssemblyBegin 100 1.0 6.4550e+0011.7 0.00e+00 0.0 0.0e+00 0.0e+00
2.0e+02 2 0 0 0 10 2 0 0 0 10 0
MatAssemblyEnd 100 1.0 1.4923e+00 1.3 0.00e+00 0.0 5.6e+01 4.1e+04
1.6e+01 1 0 2 0 1 1 0 2 0 1 0
MatGetRowIJ 3 1.0 3.3379e-06 3.5 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetOrdering 1 1.0 5.4829e-03 2.3 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
KSPSetUp 199 1.0 1.3117e-02 2.1 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
KSPSolve 199 1.0 1.2467e+02 1.0 3.33e+09 1.2 2.8e+03 2.5e+05
5.0e+02 74100 98100 26 74100 98100 26 198
VecDot 198 1.0 1.1893e+00 2.8 1.25e+08 1.1 0.0e+00 0.0e+00
2.0e+02 0 4 0 0 10 0 4 0 0 10 787
VecDotNorm2 99 1.0 1.0476e+00 3.6 1.25e+08 1.1 0.0e+00 0.0e+00
9.9e+01 0 4 0 0 5 0 4 0 0 5 894
VecNorm 198 1.0 2.6889e+00 5.4 1.25e+08 1.1 0.0e+00 0.0e+00
2.0e+02 1 4 0 0 10 1 4 0 0 10 348
VecCopy 198 1.0 1.6091e-01 1.2 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecSet 696 1.0 4.0666e-01 1.3 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAXPBYCZ 198 1.0 4.8916e-01 1.5 2.50e+08 1.1 0.0e+00 0.0e+00
0.0e+00 0 8 0 0 0 0 8 0 0 0 3828
VecWAXPY 198 1.0 4.7945e-01 1.5 1.25e+08 1.1 0.0e+00 0.0e+00
0.0e+00 0 4 0 0 0 0 4 0 0 0 1953
VecAssemblyBegin 398 1.0 8.6470e-01 4.5 0.00e+00 0.0 0.0e+00 0.0e+00
1.2e+03 0 0 0 0 62 0 0 0 0 62 0
VecAssemblyEnd 398 1.0 1.1375e-03 1.3 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecScatterBegin 198 1.0 7.4661e-02 2.5 0.00e+00 0.0 2.8e+03 2.5e+05
0.0e+00 0 0 98100 0 0 0 98100 0 0
VecScatterEnd 198 1.0 8.5547e-01 5.5 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
PCSetUp 199 1.0 1.1193e+01 1.2 6.34e+08 1.2 0.0e+00 0.0e+00
4.0e+00 6 19 0 0 0 6 19 0 0 0 419
PCSetUpOnBlocks 99 1.0 6.5059e+00 1.4 6.34e+08 1.2 0.0e+00 0.0e+00
0.0e+00 3 19 0 0 0 3 19 0 0 0 721
PCApply 297 1.0 3.7292e+00 1.4 1.15e+09 1.2 0.0e+00 0.0e+00
0.0e+00 2 35 0 0 0 2 35 0 0 0 2285
------------------------------------------------------------------------------------------------------------------------
Memory usage is given in bytes:
Object Type Creations Destructions Memory Descendants' Mem.
Reports information only for process 0.
--- Event Stage 0: Main Stage
Matrix 7 7 114426392 0
Krylov Solver 3 3 3464 0
Vector 20 20 25577680 0
Vector Scatter 2 2 2176 0
Index Set 7 7 2691760 0
Preconditioner 3 3 3208 0
Viewer 1 0 0 0
========================================================================================================================
Average time to get PetscTime(): 1.19209e-07
Average time for MPI_Barrier(): 3.8147e-06
Average time for zero size MPI_Send(): 2.77162e-06
#PETSc Option Table entries:
-log_summary
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8
sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure options: --with-mpi-dir=/opt/ud/openmpi-1.8.8/
--with-blas-lapack-dir=/opt/ud/intel_xe_2013sp1/mkl/lib/intel64/
--with-debugging=0 --download-hypre=1
--prefix=/home/wtay/Lib/petsc-3.6.2_shared_rel --known-mpi-shared=1
--with-shared-libraries --with-fortran-interfaces=1
-----------------------------------------
Libraries compiled on Sun Oct 18 17:34:07 2015 on hpc12
Machine characteristics:
Linux-3.10.0-123.20.1.el7.x86_64-x86_64-with-centos-7.1.1503-Core
Using PETSc directory: /home/wtay/Codes/petsc-3.6.2
Using PETSc arch: petsc-3.6.2_shared_rel
-----------------------------------------
Using C compiler: /opt/ud/openmpi-1.8.8/bin/mpicc -fPIC -wd1572 -O3
${COPTFLAGS} ${CFLAGS}
Using Fortran compiler: /opt/ud/openmpi-1.8.8/bin/mpif90 -fPIC -O3
${FOPTFLAGS} ${FFLAGS}
-----------------------------------------
Using include paths:
-I/home/wtay/Codes/petsc-3.6.2/petsc-3.6.2_shared_rel/include
-I/home/wtay/Codes/petsc-3.6.2/include -I/home/wtay/Codes/petsc-3.6.2/include
-I/home/wtay/Codes/petsc-3.6.2/petsc-3.6.2_shared_rel/include
-I/home/wtay/Lib/petsc-3.6.2_shared_rel/include -I/opt/ud/openmpi-1.8.8/include
-----------------------------------------
Using C linker: /opt/ud/openmpi-1.8.8/bin/mpicc
Using Fortran linker: /opt/ud/openmpi-1.8.8/bin/mpif90
Using libraries:
-Wl,-rpath,/home/wtay/Codes/petsc-3.6.2/petsc-3.6.2_shared_rel/lib
-L/home/wtay/Codes/petsc-3.6.2/petsc-3.6.2_shared_rel/lib -lpetsc
-Wl,-rpath,/home/wtay/Lib/petsc-3.6.2_shared_rel/lib
-L/home/wtay/Lib/petsc-3.6.2_shared_rel/lib -lHYPRE
-Wl,-rpath,/opt/ud/openmpi-1.8.8/lib -L/opt/ud/openmpi-1.8.8/lib
-Wl,-rpath,/opt/ud/intel_xe_2013sp1/composer_xe_2013_sp1.2.144/compiler/lib/intel64
-L/opt/ud/intel_xe_2013sp1/composer_xe_2013_sp1.2.144/compiler/lib/intel64
-Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/4.8.3
-L/usr/lib/gcc/x86_64-redhat-linux/4.8.3 -lmpi_cxx
-Wl,-rpath,/opt/ud/intel_xe_2013sp1/mkl/lib/intel64
-L/opt/ud/intel_xe_2013sp1/mkl/lib/intel64 -lmkl_intel_lp64 -lmkl_sequential
-lmkl_core -lpthread -lm -lX11 -lhwloc -lssl -lcrypto -lmpi_usempi -lmpi_mpifh
-lifport -lifcore -lm -lmpi_cxx -ldl -Wl,-rpath,/opt/ud/openmpi-1.8.8/lib
-L/opt/ud/openmpi-1.8.8/lib -lmpi -Wl,-rpath,/opt/ud/openmpi-1.8.8/lib
-L/opt/ud/openmpi-1.8.8/lib
-Wl,-rpath,/opt/ud/intel_xe_2013sp1/composer_xe_2013_sp1.2.144/compiler/lib/intel64
-L/opt/ud/intel_xe_2013sp1/composer_xe_2013_sp1.2.144/compiler/lib/intel64
-Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/4.8.3
-L/usr/lib/gcc/x86_64-redhat-linux/4.8.3 -Wl,-rpath,/opt/ud/openmpi-1.8.8/lib
-limf -lsvml -lirng -lipgo -ldecimal -lcilkrts -lstdc++ -lgcc_s -lirc -lpthread
-lirc_s -Wl,-rpath,/opt/ud/openmpi-1.8.8/lib -L/opt/ud/openmpi-1.8.8/lib
-Wl,-rpath,/opt/ud/intel_xe_2013sp1/composer_xe_2013_sp1.2.144/compiler/lib/intel64
-L/opt/ud/intel_xe_2013sp1/composer_xe_2013_sp1.2.144/compiler/lib/intel64
-Wl,-rpath,/usr/lib/gcc/x86_64-redhat-linux/4.8.3
-L/usr/lib/gcc/x86_64-redhat-linux/4.8.3 -ldl
-----------------------------------------