Hi.
I managed to finish the re-implementation. I ran the program with 1, 2, 3, 4, 5, and 6 machines and saved the -log_summary output of each run; all of them are included in this email.
In these executions, the program performs matrix-vector products (MatMult, MatMultAdd) and vector-vector operations. From what I understand of the logs, the program spends most of its time in "VecScatterEnd".
In this example, the matrix taking part in the matrix-vector products is not very "diagonal heavy".
The following table gives, for each run, the percentage of nnz values that fall in the diagonal block on each machine, together with the execution time.
NMachines            %NNZ    ExecTime
1         machine0   100%    16min08sec
2         machine0   91.1%   24min58sec
          machine1   69.2%
3         machine0   90.9%   25min42sec
          machine1   82.8%
          machine2   51.6%
4         machine0   91.9%   26min27sec
          machine1   82.4%
          machine2   73.1%
          machine3   39.9%
5         machine0   93.2%   39min23sec
          machine1   82.8%
          machine2   74.4%
          machine3   64.6%
          machine4   31.6%
6         machine0   94.2%   54min54sec
          machine1   82.6%
          machine2   73.1%
          machine3   65.2%
          machine4   55.9%
          machine5   25.4%
In this implementation I'm using MatCreate and VecCreate, and I'm leaving the partition sizes as PETSC_DECIDE.
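For reference, the setup looks roughly like the sketch below; this is only an illustration, assuming an MPIAIJ matrix with global dimensions M x N (the variable names are illustrative, not copied from my code), with error checking omitted:

    /* sketch only: matrix and vector creation with PETSC_DECIDE local sizes */
    Mat G;
    Vec m;
    MatCreate(PETSC_COMM_WORLD, &G);
    MatSetSizes(G, PETSC_DECIDE, PETSC_DECIDE, M, N);  /* PETSc chooses the row/column split */
    MatSetType(G, MATMPIAIJ);
    MatSetUp(G);

    VecCreate(PETSC_COMM_WORLD, &m);
    VecSetSizes(m, PETSC_DECIDE, N);                   /* local size left to PETSC_DECIDE */
    VecSetFromOptions(m);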
Finally, to run the application, I'm using mpirun.hydra from MPICH, as downloaded by the PETSc configure script.
I'm checking the process assignment as suggested in the last email.
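In case it helps, the launch line I have in mind looks roughly like the following; a sketch only, assuming a hostfile named "hosts" listing the nodes (exact flag spellings can differ between MPICH/hydra versions):

    # one process per node, bound to a core
    mpirun.hydra -f hosts -n 6 -ppn 1 -bind-to core ./bin/balance -log_summary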
Am I missing anything?
Regards,
Nelson
On 2015-08-20 16:17, Matthew Knepley wrote:
> On Thu, Aug 20, 2015 at 6:30 AM, Nelson Filipe Lopes da Silva <[email protected] [3]> wrote:
>
>> Hello.
>>
>> I am sorry for the long time without a response. I decided to rewrite my application in a different way and will send the log_summary output when done reimplementing.
>>
>> As for the machine, I am using mpirun to run jobs in an 8-node cluster. I modified the makefile in the streams folder so it would run using my hostfile.
>> The output is attached to this email. It seems reasonable for a cluster with 8 machines. From "lscpu", each machine's CPU has 4 cores and 1 socket.
>
> 1) Your launcher is placing processes haphazardly. I would figure out how to assign them to certain nodes.
> 2) Each node has enough bandwidth for 1 core, so it does not make much sense to use more than 1.
>
> Thanks,
>
> Matt
>
>> Cheers,
>> Nelson
>>
>> On 2015-07-24 16:50, Barry Smith wrote:
>>
>>> It would be very helpful if you ran the code on say 1, 2, 4, 8, 16
>>> ... processes with the option -log_summary and send (as attachments)
>>> the log summary information.
>>>
>>> Also on the same machine run the streams benchmark; with recent
>>> releases of PETSc you only need to do
>>>
>>> cd $PETSC_DIR
>>> make streams NPMAX=16 (or whatever your largest process count is)
>>>
>>> and send the output.
>>>
>>> I suspect that you are doing everything fine and it is more an issue
>>> with the configuration of your machine. Also read the information at
>>> http://www.mcs.anl.gov/petsc/documentation/faq.html#computers [2] on
>>> "binding"
>>>
>>> Barry
>>>
>>>> On Jul 24, 2015, at 10:41 AM, Nelson Filipe Lopes da Silva <[email protected] [1]> wrote:
>>>>
>>>> Hello,
>>>>
>>>> I have been using PETSc for a few months now, and it truly is a fantastic piece of software.
>>>>
>>>> In my particular example I am working with a large, sparse, distributed (MPI AIJ) matrix we can refer to as 'G'.
>>>> G is a horizontal, rectangular matrix (for example, 1.1 million rows by 2.1 million columns). This matrix is commonly very sparse and not diagonal 'heavy' (for example, 5.2 million nnz of which ~50% are in the diagonal block of the MPI AIJ representation).
>>>> To work with this matrix, I also have a few parallel vectors (created using MatCreateVecs), which we can refer to as 'm' and 'k'.
>>>> I am trying to parallelize an iterative algorithm in which the most computationally heavy operations are:
>>>>
>>>> -> Matrix-vector multiplication, more precisely G * m + k = b (MatMultAdd). From what I have been reading, to achieve a good speedup in this operation, G should have as much of its nnz as possible in the diagonal block, due to the overlapping of communication and computation. But even when using a G matrix in which the diagonal block holds ~95% of the nnz, I cannot get a decent speedup. Most of the time, the performance even gets worse.
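For concreteness, that product corresponds to a single PETSc call; a minimal sketch, assuming G, m, k and b already exist with compatible layouts:

    /* b = G*m + k; MatMultAdd(A, v1, v2, v3) computes v3 = v2 + A*v1 */
    MatMultAdd(G, m, k, b);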
>>>>
>>>> -> Matrix-matrix multiplication; in this case I need to perform G * G' = A, where A is later used in the linear solver and G' is the transpose of G. The speedup in this operation is not worse, although it is not very good.
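One way to form this product in PETSc is an explicit transpose followed by a matrix-matrix multiply; a sketch only, assuming an explicit copy of G' is acceptable memory-wise:

    Mat Gt, A;
    MatTranspose(G, MAT_INITIAL_MATRIX, &Gt);                  /* Gt = G' */
    MatMatMult(G, Gt, MAT_INITIAL_MATRIX, PETSC_DEFAULT, &A);  /* A = G * Gt */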
>>>>
>>>> -> Linear problem solving. Lastly, in this operation I solve "Ax = b" using the results of the two previous operations. I tried to apply an RCM permutation to A to make it more diagonal, for better performance. However, the problem I faced was that the permutation is performed locally on each processor, and thus the final result differs with different numbers of processors. I assume this was intended to reduce communication. The solution I found was:
>>>> 1- calculate A
>>>> 2- calculate, locally on one machine, the RCM permutation IS using A
>>>> 3- apply this permutation to the rows of G.
>>>> This works well, and A is generated as if RCM permuted. It is fine to do this operation on one machine because it is only done once while reading the input. However, the nnz of G become more spread out and less diagonal, causing problems when calculating G * m + k = b.
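As a sketch of step 2, computing and applying the RCM ordering on a sequential copy of A could look roughly like this (Aseq is an assumed sequential version of A gathered on one process; the resulting row IS would then be applied to the rows of G):

    IS  rperm, cperm;
    Mat Aperm;
    MatGetOrdering(Aseq, MATORDERINGRCM, &rperm, &cperm);  /* RCM ordering of the sequential A */
    MatPermute(Aseq, rperm, cperm, &Aperm);                /* symmetric permutation of A */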
>>>>
>>>> These 3 operations (except the permutation) are performed in each iteration of my algorithm.
>>>>
>>>> So, my questions are:
>>>> - What are the characteristics of G that lead to a good speedup in the operations I described? Am I missing something, or am I too obsessed with the diagonal block?
>>>>
>>>> - Is there a better way to permute A, without permuting G, and still get the same result using 1 or N machines?
>>>>
>>>> I have been avoiding asking for help for a while. I'm very sorry for the long email.
>>>> Thank you very much for your time.
>>>> Best Regards,
>>>> Nelson
>
> --
>
> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
> -- Norbert Wiener
Links:
------
[1] mailto:[email protected]
[2] http://www.mcs.anl.gov/petsc/documentation/faq.html#computers
[3] mailto:[email protected]
---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
./bin/balance on a arch-linux2-c-opt named g03 with 1 processor, by u06189 Sat
Aug 22 14:28:02 2015
Using Petsc Release Version 3.6.1, Jul, 22, 2015
Max Max/Min Avg Total
Time (sec): 9.684e+02 1.00000 9.684e+02
Objects: 8.700e+01 1.00000 8.700e+01
Flops: 1.667e+11 1.00000 1.667e+11 1.667e+11
Flops/sec: 1.721e+08 1.00000 1.721e+08 1.721e+08
MPI Messages: 0.000e+00 0.00000 0.000e+00 0.000e+00
MPI Message Lengths: 0.000e+00 0.00000 0.000e+00 0.000e+00
MPI Reductions: 0.000e+00 0.00000
Flop counting convention: 1 flop = 1 real number operation of type
(multiply/divide/add/subtract)
e.g., VecAXPY() for real vectors of length N --> 2N
flops
and VecAXPY() for complex vectors of length N -->
8N flops
Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- --
Message Lengths -- -- Reductions --
Avg %Total Avg %Total counts %Total
Avg %Total counts %Total
0: Main Stage: 9.6837e+02 100.0% 1.6666e+11 100.0% 0.000e+00 0.0%
0.000e+00 0.0% 0.000e+00 0.0%
------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting
output.
Phase summary info:
Count: number of times phase was executed
Time and Flops: Max - maximum over all processors
Ratio - ratio of maximum to minimum over all processors
Mess: number of messages sent
Avg. len: average message length (bytes)
Reduct: number of global reductions
Global: entire computation
Stage: stages of a computation. Set stages with PetscLogStagePush() and
PetscLogStagePop().
%T - percent time in this phase %F - percent flops in this phase
%M - percent messages in this phase %L - percent message lengths in
this phase
%R - percent reductions in this phase
Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all
processors)
------------------------------------------------------------------------------------------------------------------------
Event Count Time (sec) Flops
--- Global --- --- Stage --- Total
Max Ratio Max Ratio Max Ratio Mess Avg len Reduct
%T %F %M %L %R %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------
--- Event Stage 0: Main Stage
VecMax 10510 1.0 3.6155e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 4 0 0 0 0 4 0 0 0 0 0
VecScale 21258 1.0 6.9022e+01 1.0 2.49e+10 1.0 0.0e+00 0.0e+00
0.0e+00 7 15 0 0 0 7 15 0 0 0 360
VecSet 39 1.0 1.9766e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAYPX 6694 1.0 3.5559e+01 1.0 1.73e+10 1.0 0.0e+00 0.0e+00
0.0e+00 4 10 0 0 0 4 10 0 0 0 486
VecSwap 7622 1.0 5.6704e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 6 0 0 0 0 6 0 0 0 0 0
VecAssemblyBegin 9 1.0 2.6226e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAssemblyEnd 9 1.0 1.4305e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecPointwiseMult 10509 1.0 7.2255e+01 1.0 1.35e+10 1.0 0.0e+00 0.0e+00
0.0e+00 7 8 0 0 0 7 8 0 0 0 187
VecScatterBegin 18133 1.0 6.0826e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatMult 10506 1.0 1.9713e+02 1.0 5.43e+10 1.0 0.0e+00 0.0e+00
0.0e+00 20 33 0 0 0 20 33 0 0 0 276
MatMultAdd 7624 1.0 1.8330e+02 1.0 4.80e+10 1.0 0.0e+00 0.0e+00
0.0e+00 19 29 0 0 0 19 29 0 0 0 262
MatConvert 2 1.0 1.1073e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatAssemblyBegin 9 1.0 1.6880e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatAssemblyEnd 9 1.0 5.2771e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatGetRow 2197878 1.0 3.9172e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatTranspose 2 1.0 9.3352e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
SFSetGraph 2 1.0 2.2173e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
SFReduceBegin 2 1.0 1.5450e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
SFReduceEnd 2 1.0 6.1989e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
------------------------------------------------------------------------------------------------------------------------
Memory usage is given in bytes:
Object Type Creations Destructions Memory Descendants' Mem.
Reports information only for process 0.
--- Event Stage 0: Main Stage
Vector 43 43 333312256 0
Vector Scatter 10 10 6480 0
Index Set 12 12 9216 0
Matrix 19 19 900814592 0
Star Forest Bipartite Graph 2 2 1680 0
Viewer 1 0 0 0
========================================================================================================================
Average time to get PetscTime(): 1.19209e-07
#PETSc Option Table entries:
-log_summary
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8
sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure options: --with-fc=0 --with-cxx=0 --with-debugging=0
--download-mpich=1 --download-f2cblaslapack=1
-----------------------------------------
Libraries compiled on Thu Jul 30 15:55:55 2015 on g03
Machine characteristics: Linux-3.16.7-21-desktop-x86_64-with-SuSE-13.2-x86_64
Using PETSc directory: /ffs/u/u06189/petsc-3.6.1
Using PETSc arch: arch-linux2-c-opt
-----------------------------------------
Using C compiler: /ffs/u/u06189/petsc-3.6.1/arch-linux2-c-opt/bin/mpicc -fPIC
-Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -O
${COPTFLAGS} ${CFLAGS}
-----------------------------------------
Using include paths: -I/ffs/u/u06189/petsc-3.6.1/arch-linux2-c-opt/include
-I/ffs/u/u06189/petsc-3.6.1/include -I/ffs/u/u06189/petsc-3.6.1/include
-I/ffs/u/u06189/petsc-3.6.1/arch-linux2-c-opt/include
-----------------------------------------
Using C linker: /ffs/u/u06189/petsc-3.6.1/arch-linux2-c-opt/bin/mpicc
Using libraries: -Wl,-rpath,/ffs/u/u06189/petsc-3.6.1/arch-linux2-c-opt/lib
-L/ffs/u/u06189/petsc-3.6.1/arch-linux2-c-opt/lib -lpetsc
-Wl,-rpath,/ffs/u/u06189/petsc-3.6.1/arch-linux2-c-opt/lib
-L/ffs/u/u06189/petsc-3.6.1/arch-linux2-c-opt/lib -lf2clapack -lf2cblas -lm
-lX11 -lpthread -lssl -lcrypto -lm -ldl
-----------------------------------------
---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
./bin/balance on a arch-linux2-c-opt named g03 with 2 processors, by u06189 Sat
Aug 22 14:53:02 2015
Using Petsc Release Version 3.6.1, Jul, 22, 2015
Max Max/Min Avg Total
Time (sec): 1.498e+03 1.00000 1.498e+03
Objects: 8.700e+01 1.03571 8.550e+01
Flops: 9.640e+10 1.36633 8.348e+10 1.670e+11
Flops/sec: 6.434e+07 1.36633 5.572e+07 1.114e+08
MPI Messages: 1.816e+04 1.00000 1.816e+04 3.632e+04
MPI Message Lengths: 3.461e+10 1.00013 1.906e+06 6.921e+10
MPI Reductions: 1.469e+04 1.00000
Flop counting convention: 1 flop = 1 real number operation of type
(multiply/divide/add/subtract)
e.g., VecAXPY() for real vectors of length N --> 2N
flops
and VecAXPY() for complex vectors of length N -->
8N flops
Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- --
Message Lengths -- -- Reductions --
Avg %Total Avg %Total counts %Total
Avg %Total counts %Total
0: Main Stage: 1.4983e+03 100.0% 1.6696e+11 100.0% 3.632e+04 100.0%
1.906e+06 100.0% 1.468e+04 100.0%
------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting
output.
Phase summary info:
Count: number of times phase was executed
Time and Flops: Max - maximum over all processors
Ratio - ratio of maximum to minimum over all processors
Mess: number of messages sent
Avg. len: average message length (bytes)
Reduct: number of global reductions
Global: entire computation
Stage: stages of a computation. Set stages with PetscLogStagePush() and
PetscLogStagePop().
%T - percent time in this phase %F - percent flops in this phase
%M - percent messages in this phase %L - percent message lengths in
this phase
%R - percent reductions in this phase
Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all
processors)
------------------------------------------------------------------------------------------------------------------------
Event Count Time (sec) Flops
--- Global --- --- Stage --- Total
Max Ratio Max Ratio Max Ratio Mess Avg len Reduct
%T %F %M %L %R %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------
--- Event Stage 0: Main Stage
VecMax 10509 1.0 2.0602e+02 4.2 0.00e+00 0.0 0.0e+00 0.0e+00
1.1e+04 9 0 0 0 72 9 0 0 0 72 0
VecScale 21256 1.0 3.7198e+01 1.4 1.24e+10 1.0 0.0e+00 0.0e+00
0.0e+00 2 15 0 0 0 2 15 0 0 0 669
VecSet 12 1.2 2.1827e-02 3.3 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAYPX 6693 1.0 1.6525e+01 1.0 8.64e+09 1.0 0.0e+00 0.0e+00
0.0e+00 1 10 0 0 0 1 10 0 0 0 1045
VecSwap 7622 1.0 2.8392e+01 1.1 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 2 0 0 0 0 2 0 0 0 0 0
VecAssemblyBegin 9 1.0 2.7049e-01 2.1 0.00e+00 0.0 1.4e+01 4.6e+06
2.7e+01 0 0 0 0 0 0 0 0 0 0 0
VecAssemblyEnd 9 1.0 8.3750e-01 1.6 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecPointwiseMult 10508 1.0 3.7821e+01 1.0 6.77e+09 1.0 0.0e+00 0.0e+00
0.0e+00 3 8 0 0 0 3 8 0 0 0 358
VecScatterBegin 18132 1.0 5.1113e+01 6.7 0.00e+00 0.0 3.6e+04 1.9e+06
4.0e+00 2 0100100 0 2 0100100 0 0
VecScatterEnd 18128 1.0 8.9404e+02 1.3 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 53 0 0 0 0 53 0 0 0 0 0
MatMult 10505 1.0 6.5591e+02 1.4 3.16e+10 1.4 2.1e+04 1.2e+06
0.0e+00 37 33 58 38 0 37 33 58 38 0 83
MatMultAdd 7624 1.0 7.0028e+02 2.3 3.26e+10 2.1 1.5e+04 2.8e+06
0.0e+00 34 29 42 62 0 34 29 42 62 0 69
MatConvert 2 1.0 8.1860e-02 1.6 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatAssemblyBegin 9 1.1 9.5876e-01 1.1 0.00e+00 0.0 1.8e+01 6.7e+06
1.6e+01 0 0 0 0 0 0 0 0 0 0 0
MatAssemblyEnd 9 1.1 3.3645e+00 1.0 0.00e+00 0.0 1.6e+01 4.9e+05
3.2e+01 0 0 0 0 0 0 0 0 0 0 0
MatGetRow 1098940 1.0 2.0392e-01 1.1 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatTranspose 2 1.0 1.5261e+00 1.0 0.00e+00 0.0 3.0e+01 1.5e+06
2.4e+01 0 0 0 0 0 0 0 0 0 0 0
SFSetGraph 2 1.0 4.3559e-0210.9 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
SFReduceBegin 2 1.0 1.8891e-01 2.0 0.00e+00 0.0 1.0e+01 1.4e+06
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
SFReduceEnd 2 1.0 1.0176e-01 1.2 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
------------------------------------------------------------------------------------------------------------------------
Memory usage is given in bytes:
Object Type Creations Destructions Memory Descendants' Mem.
Reports information only for process 0.
--- Event Stage 0: Main Stage
Vector 43 43 193899440 0
Vector Scatter 10 10 8976 0
Index Set 12 12 1020188 0
Matrix 19 19 456622536 0
Star Forest Bipartite Graph 2 2 1680 0
Viewer 1 0 0 0
========================================================================================================================
Average time to get PetscTime(): 1.19209e-07
Average time for MPI_Barrier(): 7.267e-05
Average time for zero size MPI_Send(): 5.07832e-05
#PETSc Option Table entries:
-log_summary
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8
sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure options: --with-fc=0 --with-cxx=0 --with-debugging=0
--download-mpich=1 --download-f2cblaslapack=1
-----------------------------------------
Libraries compiled on Thu Jul 30 15:55:55 2015 on g03
Machine characteristics: Linux-3.16.7-21-desktop-x86_64-with-SuSE-13.2-x86_64
Using PETSc directory: /ffs/u/u06189/petsc-3.6.1
Using PETSc arch: arch-linux2-c-opt
-----------------------------------------
Using C compiler: /ffs/u/u06189/petsc-3.6.1/arch-linux2-c-opt/bin/mpicc -fPIC
-Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -O
${COPTFLAGS} ${CFLAGS}
-----------------------------------------
Using include paths: -I/ffs/u/u06189/petsc-3.6.1/arch-linux2-c-opt/include
-I/ffs/u/u06189/petsc-3.6.1/include -I/ffs/u/u06189/petsc-3.6.1/include
-I/ffs/u/u06189/petsc-3.6.1/arch-linux2-c-opt/include
-----------------------------------------
Using C linker: /ffs/u/u06189/petsc-3.6.1/arch-linux2-c-opt/bin/mpicc
Using libraries: -Wl,-rpath,/ffs/u/u06189/petsc-3.6.1/arch-linux2-c-opt/lib
-L/ffs/u/u06189/petsc-3.6.1/arch-linux2-c-opt/lib -lpetsc
-Wl,-rpath,/ffs/u/u06189/petsc-3.6.1/arch-linux2-c-opt/lib
-L/ffs/u/u06189/petsc-3.6.1/arch-linux2-c-opt/lib -lf2clapack -lf2cblas -lm
-lX11 -lpthread -lssl -lcrypto -lm -ldl
-----------------------------------------
---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
./bin/balance on a arch-linux2-c-opt named g03 with 3 processors, by u06189 Sat
Aug 22 15:18:45 2015
Using Petsc Release Version 3.6.1, Jul, 22, 2015
Max Max/Min Avg Total
Time (sec): 1.542e+03 1.00003 1.542e+03
Objects: 8.700e+01 1.03571 8.500e+01
Flops: 7.317e+10 1.55489 5.581e+10 1.674e+11
Flops/sec: 4.743e+07 1.55485 3.618e+07 1.086e+08
MPI Messages: 2.723e+04 1.49871 2.421e+04 7.263e+04
MPI Message Lengths: 4.612e+10 1.98761 1.344e+06 9.759e+10
MPI Reductions: 1.468e+04 1.00000
Flop counting convention: 1 flop = 1 real number operation of type
(multiply/divide/add/subtract)
e.g., VecAXPY() for real vectors of length N --> 2N
flops
and VecAXPY() for complex vectors of length N -->
8N flops
Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- --
Message Lengths -- -- Reductions --
Avg %Total Avg %Total counts %Total
Avg %Total counts %Total
0: Main Stage: 1.5424e+03 100.0% 1.6743e+11 100.0% 7.263e+04 100.0%
1.344e+06 100.0% 1.468e+04 100.0%
------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting
output.
Phase summary info:
Count: number of times phase was executed
Time and Flops: Max - maximum over all processors
Ratio - ratio of maximum to minimum over all processors
Mess: number of messages sent
Avg. len: average message length (bytes)
Reduct: number of global reductions
Global: entire computation
Stage: stages of a computation. Set stages with PetscLogStagePush() and
PetscLogStagePop().
%T - percent time in this phase %F - percent flops in this phase
%M - percent messages in this phase %L - percent message lengths in
this phase
%R - percent reductions in this phase
Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all
processors)
------------------------------------------------------------------------------------------------------------------------
Event Count Time (sec) Flops
--- Global --- --- Stage --- Total
Max Ratio Max Ratio Max Ratio Mess Avg len Reduct
%T %F %M %L %R %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------
--- Event Stage 0: Main Stage
VecMax 10508 1.0 3.6979e+0212.4 0.00e+00 0.0 0.0e+00 0.0e+00
1.1e+04 10 0 0 0 72 10 0 0 0 72 0
VecScale 21254 1.0 1.9557e+01 1.1 8.29e+09 1.0 0.0e+00 0.0e+00
0.0e+00 1 15 0 0 0 1 15 0 0 0 1272
VecSet 12 1.2 2.2308e-0217.0 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAYPX 6692 1.0 1.0413e+01 1.0 5.76e+09 1.0 0.0e+00 0.0e+00
0.0e+00 1 10 0 0 0 1 10 0 0 0 1659
VecSwap 7622 1.0 1.8338e+01 1.1 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 1 0 0 0 0 1 0 0 0 0 0
VecAssemblyBegin 9 1.0 2.9303e-01 1.3 0.00e+00 0.0 2.8e+01 3.1e+06
2.7e+01 0 0 0 0 0 0 0 0 0 0 0
VecAssemblyEnd 9 1.0 1.0393e+00 1.2 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecPointwiseMult 10507 1.0 2.6305e+01 1.0 4.51e+09 1.0 0.0e+00 0.0e+00
0.0e+00 2 8 0 0 0 2 8 0 0 0 515
VecScatterBegin 18131 1.0 3.1569e+01 3.5 0.00e+00 0.0 7.3e+04 1.3e+06
4.0e+00 2 0100100 0 2 0100100 0 0
VecScatterEnd 18127 1.0 1.0855e+03 1.4 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 64 0 0 0 0 64 0 0 0 0 0
MatMult 10504 1.0 8.0156e+02 1.5 2.41e+10 1.6 4.2e+04 8.9e+05
0.0e+00 41 33 58 38 0 41 33 58 38 0 69
MatMultAdd 7624 1.0 7.9189e+02 5.0 2.76e+10 2.7 3.0e+04 2.0e+06
0.0e+00 35 29 42 61 0 35 29 42 61 0 61
MatConvert 2 1.0 7.2936e-02 1.9 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatAssemblyBegin 9 1.1 1.7575e+00 1.9 0.00e+00 0.0 3.6e+01 4.1e+06
1.6e+01 0 0 0 0 0 0 0 0 0 0 0
MatAssemblyEnd 9 1.1 3.9437e+00 1.0 0.00e+00 0.0 3.2e+01 3.5e+05
3.2e+01 0 0 0 0 0 0 0 0 0 0 0
MatGetRow 732626 1.0 1.4379e-01 1.2 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatTranspose 2 1.0 1.6155e+00 1.0 0.00e+00 0.0 6.0e+01 1.1e+06
2.4e+01 0 0 0 0 0 0 0 0 0 0 0
SFSetGraph 2 1.0 6.3762e-0222.1 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
SFReduceBegin 2 1.0 2.5095e-01 1.9 0.00e+00 0.0 2.0e+01 1.0e+06
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
SFReduceEnd 2 1.0 1.3723e-01 1.1 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
------------------------------------------------------------------------------------------------------------------------
Memory usage is given in bytes:
Object Type Creations Destructions Memory Descendants' Mem.
Reports information only for process 0.
--- Event Stage 0: Main Stage
Vector 43 43 145379600 0
Vector Scatter 10 10 8976 0
Index Set 12 12 713284 0
Matrix 19 19 331129708 0
Star Forest Bipartite Graph 2 2 1680 0
Viewer 1 0 0 0
========================================================================================================================
Average time to get PetscTime(): 1.19209e-07
Average time for MPI_Barrier(): 8.69274e-05
Average time for zero size MPI_Send(): 4.64122e-05
#PETSc Option Table entries:
-log_summary
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8
sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure options: --with-fc=0 --with-cxx=0 --with-debugging=0
--download-mpich=1 --download-f2cblaslapack=1
-----------------------------------------
Libraries compiled on Thu Jul 30 15:55:55 2015 on g03
Machine characteristics: Linux-3.16.7-21-desktop-x86_64-with-SuSE-13.2-x86_64
Using PETSc directory: /ffs/u/u06189/petsc-3.6.1
Using PETSc arch: arch-linux2-c-opt
-----------------------------------------
Using C compiler: /ffs/u/u06189/petsc-3.6.1/arch-linux2-c-opt/bin/mpicc -fPIC
-Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -O
${COPTFLAGS} ${CFLAGS}
-----------------------------------------
Using include paths: -I/ffs/u/u06189/petsc-3.6.1/arch-linux2-c-opt/include
-I/ffs/u/u06189/petsc-3.6.1/include -I/ffs/u/u06189/petsc-3.6.1/include
-I/ffs/u/u06189/petsc-3.6.1/arch-linux2-c-opt/include
-----------------------------------------
Using C linker: /ffs/u/u06189/petsc-3.6.1/arch-linux2-c-opt/bin/mpicc
Using libraries: -Wl,-rpath,/ffs/u/u06189/petsc-3.6.1/arch-linux2-c-opt/lib
-L/ffs/u/u06189/petsc-3.6.1/arch-linux2-c-opt/lib -lpetsc
-Wl,-rpath,/ffs/u/u06189/petsc-3.6.1/arch-linux2-c-opt/lib
-L/ffs/u/u06189/petsc-3.6.1/arch-linux2-c-opt/lib -lf2clapack -lf2cblas -lm
-lX11 -lpthread -lssl -lcrypto -lm -ldl
-----------------------------------------
---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
./bin/balance on a arch-linux2-c-opt named g03 with 4 processors, by u06189 Sat
Aug 22 15:45:14 2015
Using Petsc Release Version 3.6.1, Jul, 22, 2015
Max Max/Min Avg Total
Time (sec): 1.587e+03 1.00003 1.587e+03
Objects: 8.700e+01 1.03571 8.475e+01
Flops: 6.145e+10 1.75253 4.198e+10 1.679e+11
Flops/sec: 3.871e+07 1.75248 2.644e+07 1.058e+08
MPI Messages: 3.630e+04 1.99664 2.723e+04 1.089e+05
MPI Message Lengths: 5.202e+10 3.00589 1.060e+06 1.155e+11
MPI Reductions: 1.468e+04 1.00000
Flop counting convention: 1 flop = 1 real number operation of type
(multiply/divide/add/subtract)
e.g., VecAXPY() for real vectors of length N --> 2N
flops
and VecAXPY() for complex vectors of length N -->
8N flops
Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- --
Message Lengths -- -- Reductions --
Avg %Total Avg %Total counts %Total
Avg %Total counts %Total
0: Main Stage: 1.5874e+03 100.0% 1.6790e+11 100.0% 1.089e+05 100.0%
1.060e+06 100.0% 1.468e+04 100.0%
------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting
output.
Phase summary info:
Count: number of times phase was executed
Time and Flops: Max - maximum over all processors
Ratio - ratio of maximum to minimum over all processors
Mess: number of messages sent
Avg. len: average message length (bytes)
Reduct: number of global reductions
Global: entire computation
Stage: stages of a computation. Set stages with PetscLogStagePush() and
PetscLogStagePop().
%T - percent time in this phase %F - percent flops in this phase
%M - percent messages in this phase %L - percent message lengths in
this phase
%R - percent reductions in this phase
Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all
processors)
------------------------------------------------------------------------------------------------------------------------
Event Count Time (sec) Flops
--- Global --- --- Stage --- Total
Max Ratio Max Ratio Max Ratio Mess Avg len Reduct
%T %F %M %L %R %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------
--- Event Stage 0: Main Stage
VecMax 10506 1.0 4.3514e+0214.7 0.00e+00 0.0 0.0e+00 0.0e+00
1.1e+04 11 0 0 0 72 11 0 0 0 72 0
VecScale 21250 1.0 1.3338e+01 1.2 6.22e+09 1.0 0.0e+00 0.0e+00
0.0e+00 1 15 0 0 0 1 15 0 0 0 1864
VecSet 12 1.2 2.2970e-0232.2 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAYPX 6691 1.0 7.2026e+00 1.0 4.32e+09 1.0 0.0e+00 0.0e+00
0.0e+00 0 10 0 0 0 0 10 0 0 0 2397
VecSwap 7620 1.0 1.2976e+01 1.1 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 1 0 0 0 0 1 0 0 0 0 0
VecAssemblyBegin 9 1.0 3.0734e-01 1.7 0.00e+00 0.0 4.2e+01 2.3e+06
2.7e+01 0 0 0 0 0 0 0 0 0 0 0
VecAssemblyEnd 9 1.0 1.2512e+00 1.2 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecPointwiseMult 10505 1.0 2.0223e+01 1.0 3.38e+09 1.0 0.0e+00 0.0e+00
0.0e+00 1 8 0 0 0 1 8 0 0 0 669
VecScatterBegin 18127 1.0 2.6524e+01 2.8 0.00e+00 0.0 1.1e+05 1.1e+06
4.0e+00 1 0100100 0 1 0100100 0 0
VecScatterEnd 18123 1.0 1.2040e+03 1.4 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 69 0 0 0 0 69 0 0 0 0 0
MatMult 10502 1.0 8.6774e+02 1.6 2.03e+10 1.8 6.3e+04 7.2e+05
0.0e+00 42 33 58 39 0 42 33 58 39 0 64
MatMultAdd 7622 1.0 8.5206e+02 7.1 2.50e+10 3.3 4.6e+04 1.5e+06
0.0e+00 37 29 42 61 0 37 29 42 61 0 56
MatConvert 2 1.0 6.6689e-02 2.2 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatAssemblyBegin 9 1.1 1.8324e+00 1.9 0.00e+00 0.0 5.4e+01 3.0e+06
1.6e+01 0 0 0 0 0 0 0 0 0 0 0
MatAssemblyEnd 9 1.1 4.1303e+00 1.0 0.00e+00 0.0 4.8e+01 2.8e+05
3.2e+01 0 0 0 0 0 0 0 0 0 0 0
MatGetRow 549470 1.0 1.1548e-01 1.3 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatTranspose 2 1.0 1.6093e+00 1.0 0.00e+00 0.0 9.0e+01 8.6e+05
2.4e+01 0 0 0 0 0 0 0 0 0 0 0
SFSetGraph 2 1.0 8.1864e-0234.0 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
SFReduceBegin 2 1.0 2.9066e-01 2.1 0.00e+00 0.0 3.0e+01 7.8e+05
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
SFReduceEnd 2 1.0 1.4956e-01 1.1 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
------------------------------------------------------------------------------------------------------------------------
Memory usage is given in bytes:
Object Type Creations Destructions Memory Descendants' Mem.
Reports information only for process 0.
--- Event Stage 0: Main Stage
Vector 43 43 121254824 0
Vector Scatter 10 10 8976 0
Index Set 12 12 586356 0
Matrix 19 19 268302788 0
Star Forest Bipartite Graph 2 2 1680 0
Viewer 1 0 0 0
========================================================================================================================
Average time to get PetscTime(): 1.19209e-07
Average time for MPI_Barrier(): 9.54151e-05
Average time for zero size MPI_Send(): 4.47631e-05
#PETSc Option Table entries:
-log_summary
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8
sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure options: --with-fc=0 --with-cxx=0 --with-debugging=0
--download-mpich=1 --download-f2cblaslapack=1
-----------------------------------------
Libraries compiled on Thu Jul 30 15:55:55 2015 on g03
Machine characteristics: Linux-3.16.7-21-desktop-x86_64-with-SuSE-13.2-x86_64
Using PETSc directory: /ffs/u/u06189/petsc-3.6.1
Using PETSc arch: arch-linux2-c-opt
-----------------------------------------
Using C compiler: /ffs/u/u06189/petsc-3.6.1/arch-linux2-c-opt/bin/mpicc -fPIC
-Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -O
${COPTFLAGS} ${CFLAGS}
-----------------------------------------
Using include paths: -I/ffs/u/u06189/petsc-3.6.1/arch-linux2-c-opt/include
-I/ffs/u/u06189/petsc-3.6.1/include -I/ffs/u/u06189/petsc-3.6.1/include
-I/ffs/u/u06189/petsc-3.6.1/arch-linux2-c-opt/include
-----------------------------------------
Using C linker: /ffs/u/u06189/petsc-3.6.1/arch-linux2-c-opt/bin/mpicc
Using libraries: -Wl,-rpath,/ffs/u/u06189/petsc-3.6.1/arch-linux2-c-opt/lib
-L/ffs/u/u06189/petsc-3.6.1/arch-linux2-c-opt/lib -lpetsc
-Wl,-rpath,/ffs/u/u06189/petsc-3.6.1/arch-linux2-c-opt/lib
-L/ffs/u/u06189/petsc-3.6.1/arch-linux2-c-opt/lib -lf2clapack -lf2cblas -lm
-lX11 -lpthread -lssl -lcrypto -lm -ldl
-----------------------------------------
---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
./bin/balance on a arch-linux2-c-opt named g03 with 5 processors, by u06189 Sat
Aug 22 16:24:39 2015
Using Petsc Release Version 3.6.1, Jul, 22, 2015
Max Max/Min Avg Total
Time (sec): 2.364e+03 1.00003 2.364e+03
Objects: 8.700e+01 1.03571 8.460e+01
Flops: 5.436e+10 1.94379 3.369e+10 1.685e+11
Flops/sec: 2.300e+07 1.94375 1.425e+07 7.127e+07
MPI Messages: 4.538e+04 2.49398 2.905e+04 1.453e+05
MPI Message Lengths: 5.561e+10 4.03642 8.890e+05 1.291e+11
MPI Reductions: 1.469e+04 1.00000
Flop counting convention: 1 flop = 1 real number operation of type
(multiply/divide/add/subtract)
e.g., VecAXPY() for real vectors of length N --> 2N
flops
and VecAXPY() for complex vectors of length N -->
8N flops
Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- --
Message Lengths -- -- Reductions --
Avg %Total Avg %Total counts %Total
Avg %Total counts %Total
0: Main Stage: 2.3637e+03 100.0% 1.6845e+11 100.0% 1.453e+05 100.0%
8.890e+05 100.0% 1.468e+04 100.0%
------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting
output.
Phase summary info:
Count: number of times phase was executed
Time and Flops: Max - maximum over all processors
Ratio - ratio of maximum to minimum over all processors
Mess: number of messages sent
Avg. len: average message length (bytes)
Reduct: number of global reductions
Global: entire computation
Stage: stages of a computation. Set stages with PetscLogStagePush() and
PetscLogStagePop().
%T - percent time in this phase %F - percent flops in this phase
%M - percent messages in this phase %L - percent message lengths in
this phase
%R - percent reductions in this phase
Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all
processors)
------------------------------------------------------------------------------------------------------------------------
Event Count Time (sec) Flops
--- Global --- --- Stage --- Total
Max Ratio Max Ratio Max Ratio Mess Avg len Reduct
%T %F %M %L %R %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------
--- Event Stage 0: Main Stage
VecMax 10509 1.0 1.0526e+03 2.5 0.00e+00 0.0 0.0e+00 0.0e+00
1.1e+04 35 0 0 0 72 35 0 0 0 72 0
VecScale 21256 1.0 1.0645e+01 1.1 4.97e+09 1.0 0.0e+00 0.0e+00
0.0e+00 0 15 0 0 0 0 15 0 0 0 2337
VecSet 12 1.2 2.0848e-0226.8 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAYPX 6693 1.0 5.6817e+00 1.1 3.45e+09 1.0 0.0e+00 0.0e+00
0.0e+00 0 10 0 0 0 0 10 0 0 0 3040
VecSwap 7622 1.0 9.7368e+00 1.1 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAssemblyBegin 9 1.0 4.0851e-01 2.0 0.00e+00 0.0 5.6e+01 1.8e+06
2.7e+01 0 0 0 0 0 0 0 0 0 0 0
VecAssemblyEnd 9 1.0 1.3150e+00 1.4 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecPointwiseMult 10508 1.0 1.6680e+01 1.1 2.71e+09 1.0 0.0e+00 0.0e+00
0.0e+00 1 8 0 0 0 1 8 0 0 0 812
VecScatterBegin 18132 1.0 2.1869e+01 2.3 0.00e+00 0.0 1.5e+05 8.9e+05
4.0e+00 1 0100100 0 1 0100100 0 0
VecScatterEnd 18128 1.0 1.6711e+03 1.7 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 53 0 0 0 0 53 0 0 0 0 0
MatMult 10505 1.0 1.2782e+03 2.8 1.80e+10 2.0 8.4e+04 6.1e+05
0.0e+00 35 33 58 40 0 35 33 58 40 0 44
MatMultAdd 7624 1.0 7.3081e+02 2.5 2.35e+10 3.9 6.1e+04 1.3e+06
0.0e+00 24 28 42 60 0 24 28 42 60 0 66
MatConvert 2 1.0 6.2242e-02 2.4 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatAssemblyBegin 9 1.1 2.0017e+00 2.0 0.00e+00 0.0 7.2e+01 2.4e+06
1.6e+01 0 0 0 0 0 0 0 0 0 0 0
MatAssemblyEnd 9 1.1 4.5218e+00 1.0 0.00e+00 0.0 6.4e+01 2.4e+05
3.2e+01 0 0 0 0 0 0 0 0 0 0 0
MatGetRow 439576 1.0 9.8525e-02 1.3 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatTranspose 2 1.0 1.5088e+00 1.0 0.00e+00 0.0 1.2e+02 7.3e+05
2.4e+01 0 0 0 0 0 0 0 0 0 0 0
SFSetGraph 2 1.0 9.2637e-0242.2 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
SFReduceBegin 2 1.0 2.7788e-01 2.6 0.00e+00 0.0 4.0e+01 6.6e+05
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
SFReduceEnd 2 1.0 1.1188e-01 1.2 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
------------------------------------------------------------------------------------------------------------------------
Memory usage is given in bytes:
Object Type Creations Destructions Memory Descendants' Mem.
Reports information only for process 0.
--- Event Stage 0: Main Stage
Vector 43 43 106685440 0
Vector Scatter 10 10 8976 0
Index Set 12 12 495732 0
Matrix 19 19 230596360 0
Star Forest Bipartite Graph 2 2 1680 0
Viewer 1 0 0 0
========================================================================================================================
Average time to get PetscTime(): 9.53674e-08
Average time for MPI_Barrier(): 0.000231123
Average time for zero size MPI_Send(): 4.99725e-05
#PETSc Option Table entries:
-log_summary
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8
sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure options: --with-fc=0 --with-cxx=0 --with-debugging=0
--download-mpich=1 --download-f2cblaslapack=1
-----------------------------------------
Libraries compiled on Thu Jul 30 15:55:55 2015 on g03
Machine characteristics: Linux-3.16.7-21-desktop-x86_64-with-SuSE-13.2-x86_64
Using PETSc directory: /ffs/u/u06189/petsc-3.6.1
Using PETSc arch: arch-linux2-c-opt
-----------------------------------------
Using C compiler: /ffs/u/u06189/petsc-3.6.1/arch-linux2-c-opt/bin/mpicc -fPIC
-Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -O
${COPTFLAGS} ${CFLAGS}
-----------------------------------------
Using include paths: -I/ffs/u/u06189/petsc-3.6.1/arch-linux2-c-opt/include
-I/ffs/u/u06189/petsc-3.6.1/include -I/ffs/u/u06189/petsc-3.6.1/include
-I/ffs/u/u06189/petsc-3.6.1/arch-linux2-c-opt/include
-----------------------------------------
Using C linker: /ffs/u/u06189/petsc-3.6.1/arch-linux2-c-opt/bin/mpicc
Using libraries: -Wl,-rpath,/ffs/u/u06189/petsc-3.6.1/arch-linux2-c-opt/lib
-L/ffs/u/u06189/petsc-3.6.1/arch-linux2-c-opt/lib -lpetsc
-Wl,-rpath,/ffs/u/u06189/petsc-3.6.1/arch-linux2-c-opt/lib
-L/ffs/u/u06189/petsc-3.6.1/arch-linux2-c-opt/lib -lf2clapack -lf2cblas -lm
-lX11 -lpthread -lssl -lcrypto -lm -ldl
-----------------------------------------
---------------------------------------------- PETSc Performance Summary: ----------------------------------------------
./bin/balance on a arch-linux2-c-opt named g03 with 6 processors, by u06189 Sat
Aug 22 17:19:35 2015
Using Petsc Release Version 3.6.1, Jul, 22, 2015
Max Max/Min Avg Total
Time (sec): 3.294e+03 1.00003 3.294e+03
Objects: 8.700e+01 1.03571 8.450e+01
Flops: 4.963e+10 2.13897 2.817e+10 1.690e+11
Flops/sec: 1.507e+07 2.13896 8.551e+06 5.130e+07
MPI Messages: 5.445e+04 2.99066 3.027e+04 1.816e+05
MPI Message Lengths: 5.802e+10 5.10574 7.772e+05 1.411e+11
MPI Reductions: 1.469e+04 1.00000
Flop counting convention: 1 flop = 1 real number operation of type
(multiply/divide/add/subtract)
e.g., VecAXPY() for real vectors of length N --> 2N
flops
and VecAXPY() for complex vectors of length N -->
8N flops
Summary of Stages: ----- Time ------ ----- Flops ----- --- Messages --- --
Message Lengths -- -- Reductions --
Avg %Total Avg %Total counts %Total
Avg %Total counts %Total
0: Main Stage: 3.2942e+03 100.0% 1.6900e+11 100.0% 1.816e+05 100.0%
7.772e+05 100.0% 1.469e+04 100.0%
------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting
output.
Phase summary info:
Count: number of times phase was executed
Time and Flops: Max - maximum over all processors
Ratio - ratio of maximum to minimum over all processors
Mess: number of messages sent
Avg. len: average message length (bytes)
Reduct: number of global reductions
Global: entire computation
Stage: stages of a computation. Set stages with PetscLogStagePush() and
PetscLogStagePop().
%T - percent time in this phase %F - percent flops in this phase
%M - percent messages in this phase %L - percent message lengths in
this phase
%R - percent reductions in this phase
Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all
processors)
------------------------------------------------------------------------------------------------------------------------
Event Count Time (sec) Flops
--- Global --- --- Stage --- Total
Max Ratio Max Ratio Max Ratio Mess Avg len Reduct
%T %F %M %L %R %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------
--- Event Stage 0: Main Stage
VecMax 10510 1.0 1.5516e+0318.1 0.00e+00 0.0 0.0e+00 0.0e+00
1.1e+04 32 0 0 0 72 32 0 0 0 72 0
VecScale 21258 1.0 8.6332e+00 1.2 4.15e+09 1.0 0.0e+00 0.0e+00
0.0e+00 0 15 0 0 0 0 15 0 0 0 2881
VecSet 12 1.2 2.0830e-0237.0 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAYPX 6694 1.0 4.8097e+00 1.2 2.88e+09 1.0 0.0e+00 0.0e+00
0.0e+00 0 10 0 0 0 0 10 0 0 0 3592
VecSwap 7622 1.0 7.2560e+00 1.1 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecAssemblyBegin 9 1.0 4.3204e-01 1.9 0.00e+00 0.0 7.0e+01 1.5e+06
2.7e+01 0 0 0 0 0 0 0 0 0 0 0
VecAssemblyEnd 9 1.0 1.3781e+00 1.4 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
VecPointwiseMult 10509 1.0 1.4113e+01 1.1 2.26e+09 1.0 0.0e+00 0.0e+00
0.0e+00 0 8 0 0 0 0 8 0 0 0 960
VecScatterBegin 18133 1.0 2.1887e+01 1.7 0.00e+00 0.0 1.8e+05 7.8e+05
4.0e+00 1 0100100 0 1 0100100 0 0
VecScatterEnd 18129 1.0 2.9113e+03 1.9 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 61 0 0 0 0 61 0 0 0 0 0
MatMult 10506 1.0 1.6013e+03 1.3 1.65e+10 2.2 1.1e+05 5.4e+05
0.0e+00 42 34 58 40 0 42 34 58 40 0 35
MatMultAdd 7624 1.0 1.4946e+0330.0 2.24e+10 4.5 7.6e+04 1.1e+06
0.0e+00 22 28 42 60 0 22 28 42 60 0 32
MatConvert 2 1.0 5.6268e-02 2.5 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatAssemblyBegin 9 1.1 2.0523e+00 2.0 0.00e+00 0.0 9.0e+01 2.0e+06
1.6e+01 0 0 0 0 0 0 0 0 0 0 0
MatAssemblyEnd 9 1.1 4.9848e+00 1.0 0.00e+00 0.0 8.0e+01 2.1e+05
3.2e+01 0 0 0 0 0 0 0 0 0 0 0
MatGetRow 366314 1.0 8.8142e-02 1.4 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
MatTranspose 2 1.0 2.4237e+00 1.0 0.00e+00 0.0 1.5e+02 6.4e+05
2.4e+01 0 0 0 0 0 0 0 0 0 0 0
SFSetGraph 2 1.0 1.0412e-0148.4 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
SFReduceBegin 2 1.0 3.3849e-01 2.1 0.00e+00 0.0 5.0e+01 5.7e+05
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
SFReduceEnd 2 1.0 5.4796e-01 3.4 0.00e+00 0.0 0.0e+00 0.0e+00
0.0e+00 0 0 0 0 0 0 0 0 0 0 0
------------------------------------------------------------------------------------------------------------------------
Memory usage is given in bytes:
Object Type Creations Destructions Memory Descendants' Mem.
Reports information only for process 0.
--- Event Stage 0: Main Stage
Vector 43 43 96945944 0
Vector Scatter 10 10 8976 0
Index Set 12 12 415656 0
Matrix 19 19 205360064 0
Star Forest Bipartite Graph 2 2 1680 0
Viewer 1 0 0 0
========================================================================================================================
Average time to get PetscTime(): 1.19209e-07
Average time for MPI_Barrier(): 0.000215769
Average time for zero size MPI_Send(): 5.94854e-05
#PETSc Option Table entries:
-log_summary
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8
sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure options: --with-fc=0 --with-cxx=0 --with-debugging=0
--download-mpich=1 --download-f2cblaslapack=1
-----------------------------------------
Libraries compiled on Thu Jul 30 15:55:55 2015 on g03
Machine characteristics: Linux-3.16.7-21-desktop-x86_64-with-SuSE-13.2-x86_64
Using PETSc directory: /ffs/u/u06189/petsc-3.6.1
Using PETSc arch: arch-linux2-c-opt
-----------------------------------------
Using C compiler: /ffs/u/u06189/petsc-3.6.1/arch-linux2-c-opt/bin/mpicc -fPIC
-Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -O
${COPTFLAGS} ${CFLAGS}
-----------------------------------------
Using include paths: -I/ffs/u/u06189/petsc-3.6.1/arch-linux2-c-opt/include
-I/ffs/u/u06189/petsc-3.6.1/include -I/ffs/u/u06189/petsc-3.6.1/include
-I/ffs/u/u06189/petsc-3.6.1/arch-linux2-c-opt/include
-----------------------------------------
Using C linker: /ffs/u/u06189/petsc-3.6.1/arch-linux2-c-opt/bin/mpicc
Using libraries: -Wl,-rpath,/ffs/u/u06189/petsc-3.6.1/arch-linux2-c-opt/lib
-L/ffs/u/u06189/petsc-3.6.1/arch-linux2-c-opt/lib -lpetsc
-Wl,-rpath,/ffs/u/u06189/petsc-3.6.1/arch-linux2-c-opt/lib
-L/ffs/u/u06189/petsc-3.6.1/arch-linux2-c-opt/lib -lf2clapack -lf2cblas -lm
-lX11 -lpthread -lssl -lcrypto -lm -ldl
-----------------------------------------