Hi.

I managed to finish the re-implementation. I ran the program with 1, 2, 3, 4, 5, and 6 machines and saved the -log_summary output of each run; all of them are included in this email.
In these executions, the program performs matrix-vector products (MatMult, MatMultAdd) and vector-vector operations. From what I understand while reading the logs, the program spends most of its time in "VecScatterEnd".
In this example, the matrix taking part in the matrix-vector products is not very "diagonal heavy".
The following table gives the percentage of nnz values in the matrix diagonal block for each machine, along with the execution time of each run.
NMachines  %NNZ in diagonal block   ExecTime
1          machine0 100%            16min08sec

2          machine0 91.1%           24min58sec
           machine1 69.2%

3          machine0 90.9%           25min42sec
           machine1 82.8%
           machine2 51.6%

4          machine0 91.9%           26min27sec
           machine1 82.4%
           machine2 73.1%
           machine3 39.9%

5          machine0 93.2%           39min23sec
           machine1 82.8%
           machine2 74.4%
           machine3 64.6%
           machine4 31.6%

6          machine0 94.2%           54min54sec
           machine1 82.6%
           machine2 73.1%
           machine3 65.2%
           machine4 55.9%
           machine5 25.4%


In this implementation I'm using MatCreate and VecCreate. I'm also leaving the partition sizes as PETSC_DECIDE.
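Roughly, the creation code follows the pattern below (a simplified sketch with placeholder global sizes, not my exact code):

  /* Sketch: MPI AIJ matrix and parallel vector with PETSC_DECIDE local sizes. */
  #include <petscmat.h>

  int main(int argc, char **argv)
  {
    Mat            G;
    Vec            m;
    PetscInt       M = 1100000, N = 2100000;   /* placeholder global sizes */
    PetscErrorCode ierr;

    ierr = PetscInitialize(&argc, &argv, NULL, NULL); if (ierr) return ierr;

    ierr = MatCreate(PETSC_COMM_WORLD, &G);CHKERRQ(ierr);
    ierr = MatSetSizes(G, PETSC_DECIDE, PETSC_DECIDE, M, N);CHKERRQ(ierr);
    ierr = MatSetType(G, MATAIJ);CHKERRQ(ierr);   /* becomes MPIAIJ on >1 process */
    ierr = MatSetUp(G);CHKERRQ(ierr);             /* no explicit preallocation here */
    /* ... MatSetValues() calls ... */
    ierr = MatAssemblyBegin(G, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);
    ierr = MatAssemblyEnd(G, MAT_FINAL_ASSEMBLY);CHKERRQ(ierr);

    ierr = VecCreate(PETSC_COMM_WORLD, &m);CHKERRQ(ierr);
    ierr = VecSetSizes(m, PETSC_DECIDE, N);CHKERRQ(ierr);
    ierr = VecSetFromOptions(m);CHKERRQ(ierr);

    ierr = VecDestroy(&m);CHKERRQ(ierr);
    ierr = MatDestroy(&G);CHKERRQ(ierr);
    ierr = PetscFinalize();
    return 0;
  }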

Finally, to run the application I'm using mpirun.hydra from the MPICH that the PETSc configure script downloaded.
I'm checking the process assignment as suggested in the last email.
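Concretely, I verify the assignment by printing which node each rank lands on right after PetscInitialize(), along these lines (a small sketch, not my exact code):

  /* Print which node each MPI rank is running on, to check process placement. */
  char        host[MPI_MAX_PROCESSOR_NAME];
  int         len;
  PetscMPIInt rank;

  MPI_Comm_rank(PETSC_COMM_WORLD, &rank);
  MPI_Get_processor_name(host, &len);
  PetscSynchronizedPrintf(PETSC_COMM_WORLD, "rank %d on %s\n", rank, host);
  PetscSynchronizedFlush(PETSC_COMM_WORLD, PETSC_STDOUT);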

Am I missing anything?

Regards,
Nelson 

On 2015-08-20 16:17, Matthew Knepley wrote:

> On Thu, Aug 20, 2015 at 6:30 AM, Nelson Filipe Lopes da Silva <[email protected] [3]> wrote:
> 
>> Hello.
>> 
>> I am sorry for the long time without a response. I decided to rewrite my application in a different way and will send the log_summary output when done reimplementing.
>> 
>> As for the machine, I am using mpirun to run jobs in an 8-node cluster. I modified the makefile in the streams folder so it would run using my hostfile.
>> The output is attached to this email. It seems reasonable for a cluster with 8 machines. From "lscpu", each machine's CPU has 4 cores and 1 socket.
> 
> 1) Your launcher is placing processes haphazardly. I would figure out how to assign them to certain nodes.
> 2) Each node has enough bandwidth for 1 core, so it does not make much sense to use more than 1.
> 
> Thanks,
> Matt
> 
>> Cheers,
>> Nelson
>> 
>> On 2015-07-24 16:50, Barry Smith wrote:
>> 
>>> It would be very helpful if you ran the code on say 1, 2, 4, 8, 16
>>> ... processes with the option -log_summary and send (as attachments)
>>> the log summary information.
>>> 
>>> Also on the same machine run the streams benchmark; with recent
>>> releases of PETSc you only need to do
>>> 
>>> cd $PETSC_DIR
>>> make streams NPMAX=16 (or whatever your largest process count is)
>>> 
>>> and send the output.
>>> 
>>> I suspect that you are doing everything fine and it is more an issue
>>> with the configuration of your machine. Also read the information at
>>> http://www.mcs.anl.gov/petsc/documentation/faq.html#computers [2] on
>>> "binding".
>>> 
>>> Barry
>>> 
>>>> On Jul 24, 2015, at 10:41 AM, Nelson Filipe Lopes da Silva <[email protected] [1]> wrote:
>>>> 
>>>> Hello,
>>>> 
>>>> I have been using PETSc for a few months now, and it truly is a fantastic piece of software.
>>>> 
>>>> In my particular example I am working with a large, sparse, distributed (MPI AIJ) matrix we can refer to as 'G'.
>>>> G is a horizontal, rectangular matrix (for example, 1.1 million rows by 2.1 million columns). This matrix is commonly very sparse and not diagonal 'heavy' (for example, 5.2 million nnz of which ~50% are in the diagonal block of the MPI AIJ representation).
>>>> To work with this matrix, I also have a few parallel vectors (created using MatCreateVecs), which we can refer to as 'm' and 'k'.
>>>> I am trying to parallelize an iterative algorithm in which the most computationally heavy operations are:
>>>> 
>>>> -> Matrix-vector multiplication, more precisely G * m + k = b (MatMultAdd). From what I have been reading, to achieve a good speedup in this operation, G should be as diagonal as possible, due to overlapping communication and computation. But even when using a G matrix in which the diagonal block has ~95% of the nnz, I cannot get a decent speedup. Most of the time, the performance even gets worse.
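For clarity, that operation is the single PETSc call below (a sketch using the vector names above):

  /* b = k + G*m. With an MPI AIJ matrix, the off-diagonal part of the product
     needs remote entries of m, so this call drives the VecScatter whose
     wait time shows up as VecScatterEnd in the logs. */
  ierr = MatMultAdd(G, m, k, b);CHKERRQ(ierr);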
>>>> 
>>>> -> Matrix-matrix multiplication; in this case I need to perform G * G' = A, where A is later used in the linear solver and G' is the transpose of G. The speedup in this operation is not worse, although it is not very good.
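One way to form that product in PETSc (a sketch of the idea only, not necessarily how the code does it) is to build the transpose explicitly and then multiply:

  /* A = G * G' via an explicit transpose followed by a matrix-matrix product. */
  Mat Gt, A;
  ierr = MatTranspose(G, MAT_INITIAL_MATRIX, &Gt);CHKERRQ(ierr);
  ierr = MatMatMult(G, Gt, MAT_INITIAL_MATRIX, PETSC_DEFAULT, &A);CHKERRQ(ierr);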
>>>> 
>>>> -> Linear problem solving. Lastly, in this operation I compute "Ax=b" using the results of the last two operations. I tried to apply an RCM permutation to A to make it more diagonal, for better performance. However, the problem I faced was that the permutation is performed locally in each processor and thus the final result is different with a different number of processors. I assume this was intended to reduce communication. The solution I found was:
>>>> 1 - calculate A
>>>> 2 - calculate, locally on one machine, the RCM permutation IS using A (see the sketch after this paragraph)
>>>> 3 - apply this permutation to the rows of G.
>>>> This works well, and A is generated as if RCM permuted. It is fine to do this operation on one machine because it is only done once while reading the input. The nnz of G become more spread out and less diagonal, causing problems when calculating G * m + k = b.
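For step 2, the ordering itself comes from MatGetOrdering; roughly (a sketch, assuming A is available as a sequential matrix on the one process doing this step):

  /* Step 2 sketch: obtain the RCM ordering of A as row/column index sets.
     Gathering A onto one process and applying rperm to the rows of G
     (step 3) are not shown. */
  IS rperm, cperm;
  ierr = MatGetOrdering(A, MATORDERINGRCM, &rperm, &cperm);CHKERRQ(ierr);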
>>>> 
>>>> These 3 operations (except the permutation) are performed in each iteration of my algorithm.
>>>> 
>>>> So, my questions are:
>>>> - What are the characteristics of G that lead to a good speedup in the operations I described? Am I missing something and too obsessed with the diagonal block?
>>>> 
>>>> - Is there a better way to permute A without permuting G and still get the same result using 1 or N machines?
>>>> 
>>>> I have been avoiding asking for help for a while. I'm very sorry for the long email.
>>>> Thank you very much for your time.
>>>> Best Regards,
>>>> Nelson
> 
> -- 
> 
> What most experimenters take for granted before they begin their experiments is infinitely more interesting than any results to which their experiments lead.
> -- Norbert Wiener

 

Links:
------
[1] mailto:[email protected]
[2] http://www.mcs.anl.gov/petsc/documentation/faq.html#computers
[3] mailto:[email protected]
---------------------------------------------- PETSc Performance Summary: 
----------------------------------------------

./bin/balance on a arch-linux2-c-opt named g03 with 1 processor, by u06189 Sat 
Aug 22 14:28:02 2015
Using Petsc Release Version 3.6.1, Jul, 22, 2015 

                         Max       Max/Min        Avg      Total 
Time (sec):           9.684e+02      1.00000   9.684e+02
Objects:              8.700e+01      1.00000   8.700e+01
Flops:                1.667e+11      1.00000   1.667e+11  1.667e+11
Flops/sec:            1.721e+08      1.00000   1.721e+08  1.721e+08
MPI Messages:         0.000e+00      0.00000   0.000e+00  0.000e+00
MPI Message Lengths:  0.000e+00      0.00000   0.000e+00  0.000e+00
MPI Reductions:       0.000e+00      0.00000

Flop counting convention: 1 flop = 1 real number operation of type 
(multiply/divide/add/subtract)
                            e.g., VecAXPY() for real vectors of length N --> 2N 
flops
                            and VecAXPY() for complex vectors of length N --> 
8N flops

Summary of Stages:   ----- Time ------  ----- Flops -----  --- Messages ---  -- 
Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total   counts   %Total     
Avg         %Total   counts   %Total 
 0:      Main Stage: 9.6837e+02 100.0%  1.6666e+11 100.0%  0.000e+00   0.0%  
0.000e+00        0.0%  0.000e+00   0.0% 

------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting 
output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flops: Max - maximum over all processors
                   Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   Avg. len: average message length (bytes)
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and 
PetscLogStagePop().
      %T - percent time in this phase         %F - percent flops in this phase
      %M - percent messages in this phase     %L - percent message lengths in 
this phase
      %R - percent reductions in this phase
   Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all 
processors)
------------------------------------------------------------------------------------------------------------------------
Event                Count      Time (sec)     Flops                            
 --- Global ---  --- Stage ---   Total
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct 
 %T %F %M %L %R  %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------

--- Event Stage 0: Main Stage

VecMax             10510 1.0 3.6155e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  4  0  0  0  0   4  0  0  0  0     0
VecScale           21258 1.0 6.9022e+01 1.0 2.49e+10 1.0 0.0e+00 0.0e+00 
0.0e+00  7 15  0  0  0   7 15  0  0  0   360
VecSet                39 1.0 1.9766e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecAYPX             6694 1.0 3.5559e+01 1.0 1.73e+10 1.0 0.0e+00 0.0e+00 
0.0e+00  4 10  0  0  0   4 10  0  0  0   486
VecSwap             7622 1.0 5.6704e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  6  0  0  0  0   6  0  0  0  0     0
VecAssemblyBegin       9 1.0 2.6226e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecAssemblyEnd         9 1.0 1.4305e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecPointwiseMult   10509 1.0 7.2255e+01 1.0 1.35e+10 1.0 0.0e+00 0.0e+00 
0.0e+00  7  8  0  0  0   7  8  0  0  0   187
VecScatterBegin    18133 1.0 6.0826e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatMult            10506 1.0 1.9713e+02 1.0 5.43e+10 1.0 0.0e+00 0.0e+00 
0.0e+00 20 33  0  0  0  20 33  0  0  0   276
MatMultAdd          7624 1.0 1.8330e+02 1.0 4.80e+10 1.0 0.0e+00 0.0e+00 
0.0e+00 19 29  0  0  0  19 29  0  0  0   262
MatConvert             2 1.0 1.1073e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatAssemblyBegin       9 1.0 1.6880e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatAssemblyEnd         9 1.0 5.2771e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatGetRow        2197878 1.0 3.9172e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatTranspose           2 1.0 9.3352e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  0  0  0  0  0   0  0  0  0  0     0
SFSetGraph             2 1.0 2.2173e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  0  0  0  0  0   0  0  0  0  0     0
SFReduceBegin          2 1.0 1.5450e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  0  0  0  0  0   0  0  0  0  0     0
SFReduceEnd            2 1.0 6.1989e-06 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  0  0  0  0  0   0  0  0  0  0     0
------------------------------------------------------------------------------------------------------------------------

Memory usage is given in bytes:

Object Type          Creations   Destructions     Memory  Descendants' Mem.
Reports information only for process 0.

--- Event Stage 0: Main Stage

              Vector    43             43    333312256     0
      Vector Scatter    10             10         6480     0
           Index Set    12             12         9216     0
              Matrix    19             19    900814592     0
Star Forest Bipartite Graph     2              2         1680     0
              Viewer     1              0            0     0
========================================================================================================================
Average time to get PetscTime(): 1.19209e-07
#PETSc Option Table entries:
-log_summary
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 
sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure options: --with-fc=0 --with-cxx=0 --with-debugging=0 
--download-mpich=1 --download-f2cblaslapack=1
-----------------------------------------
Libraries compiled on Thu Jul 30 15:55:55 2015 on g03 
Machine characteristics: Linux-3.16.7-21-desktop-x86_64-with-SuSE-13.2-x86_64
Using PETSc directory: /ffs/u/u06189/petsc-3.6.1
Using PETSc arch: arch-linux2-c-opt
-----------------------------------------

Using C compiler: /ffs/u/u06189/petsc-3.6.1/arch-linux2-c-opt/bin/mpicc  -fPIC 
-Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -O  
${COPTFLAGS} ${CFLAGS}
-----------------------------------------

Using include paths: -I/ffs/u/u06189/petsc-3.6.1/arch-linux2-c-opt/include 
-I/ffs/u/u06189/petsc-3.6.1/include -I/ffs/u/u06189/petsc-3.6.1/include 
-I/ffs/u/u06189/petsc-3.6.1/arch-linux2-c-opt/include
-----------------------------------------

Using C linker: /ffs/u/u06189/petsc-3.6.1/arch-linux2-c-opt/bin/mpicc
Using libraries: -Wl,-rpath,/ffs/u/u06189/petsc-3.6.1/arch-linux2-c-opt/lib 
-L/ffs/u/u06189/petsc-3.6.1/arch-linux2-c-opt/lib -lpetsc 
-Wl,-rpath,/ffs/u/u06189/petsc-3.6.1/arch-linux2-c-opt/lib 
-L/ffs/u/u06189/petsc-3.6.1/arch-linux2-c-opt/lib -lf2clapack -lf2cblas -lm 
-lX11 -lpthread -lssl -lcrypto -lm -ldl 
-----------------------------------------
---------------------------------------------- PETSc Performance Summary: 
----------------------------------------------

./bin/balance on a arch-linux2-c-opt named g03 with 2 processors, by u06189 Sat 
Aug 22 14:53:02 2015
Using Petsc Release Version 3.6.1, Jul, 22, 2015 

                         Max       Max/Min        Avg      Total 
Time (sec):           1.498e+03      1.00000   1.498e+03
Objects:              8.700e+01      1.03571   8.550e+01
Flops:                9.640e+10      1.36633   8.348e+10  1.670e+11
Flops/sec:            6.434e+07      1.36633   5.572e+07  1.114e+08
MPI Messages:         1.816e+04      1.00000   1.816e+04  3.632e+04
MPI Message Lengths:  3.461e+10      1.00013   1.906e+06  6.921e+10
MPI Reductions:       1.469e+04      1.00000

Flop counting convention: 1 flop = 1 real number operation of type 
(multiply/divide/add/subtract)
                            e.g., VecAXPY() for real vectors of length N --> 2N 
flops
                            and VecAXPY() for complex vectors of length N --> 
8N flops

Summary of Stages:   ----- Time ------  ----- Flops -----  --- Messages ---  -- 
Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total   counts   %Total     
Avg         %Total   counts   %Total 
 0:      Main Stage: 1.4983e+03 100.0%  1.6696e+11 100.0%  3.632e+04 100.0%  
1.906e+06      100.0%  1.468e+04 100.0% 

------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting 
output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flops: Max - maximum over all processors
                   Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   Avg. len: average message length (bytes)
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and 
PetscLogStagePop().
      %T - percent time in this phase         %F - percent flops in this phase
      %M - percent messages in this phase     %L - percent message lengths in 
this phase
      %R - percent reductions in this phase
   Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all 
processors)
------------------------------------------------------------------------------------------------------------------------
Event                Count      Time (sec)     Flops                            
 --- Global ---  --- Stage ---   Total
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct 
 %T %F %M %L %R  %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------

--- Event Stage 0: Main Stage

VecMax             10509 1.0 2.0602e+02 4.2 0.00e+00 0.0 0.0e+00 0.0e+00 
1.1e+04  9  0  0  0 72   9  0  0  0 72     0
VecScale           21256 1.0 3.7198e+01 1.4 1.24e+10 1.0 0.0e+00 0.0e+00 
0.0e+00  2 15  0  0  0   2 15  0  0  0   669
VecSet                12 1.2 2.1827e-02 3.3 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecAYPX             6693 1.0 1.6525e+01 1.0 8.64e+09 1.0 0.0e+00 0.0e+00 
0.0e+00  1 10  0  0  0   1 10  0  0  0  1045
VecSwap             7622 1.0 2.8392e+01 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  2  0  0  0  0   2  0  0  0  0     0
VecAssemblyBegin       9 1.0 2.7049e-01 2.1 0.00e+00 0.0 1.4e+01 4.6e+06 
2.7e+01  0  0  0  0  0   0  0  0  0  0     0
VecAssemblyEnd         9 1.0 8.3750e-01 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecPointwiseMult   10508 1.0 3.7821e+01 1.0 6.77e+09 1.0 0.0e+00 0.0e+00 
0.0e+00  3  8  0  0  0   3  8  0  0  0   358
VecScatterBegin    18132 1.0 5.1113e+01 6.7 0.00e+00 0.0 3.6e+04 1.9e+06 
4.0e+00  2  0100100  0   2  0100100  0     0
VecScatterEnd      18128 1.0 8.9404e+02 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00 53  0  0  0  0  53  0  0  0  0     0
MatMult            10505 1.0 6.5591e+02 1.4 3.16e+10 1.4 2.1e+04 1.2e+06 
0.0e+00 37 33 58 38  0  37 33 58 38  0    83
MatMultAdd          7624 1.0 7.0028e+02 2.3 3.26e+10 2.1 1.5e+04 2.8e+06 
0.0e+00 34 29 42 62  0  34 29 42 62  0    69
MatConvert             2 1.0 8.1860e-02 1.6 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatAssemblyBegin       9 1.1 9.5876e-01 1.1 0.00e+00 0.0 1.8e+01 6.7e+06 
1.6e+01  0  0  0  0  0   0  0  0  0  0     0
MatAssemblyEnd         9 1.1 3.3645e+00 1.0 0.00e+00 0.0 1.6e+01 4.9e+05 
3.2e+01  0  0  0  0  0   0  0  0  0  0     0
MatGetRow        1098940 1.0 2.0392e-01 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatTranspose           2 1.0 1.5261e+00 1.0 0.00e+00 0.0 3.0e+01 1.5e+06 
2.4e+01  0  0  0  0  0   0  0  0  0  0     0
SFSetGraph             2 1.0 4.3559e-0210.9 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  0  0  0  0  0   0  0  0  0  0     0
SFReduceBegin          2 1.0 1.8891e-01 2.0 0.00e+00 0.0 1.0e+01 1.4e+06 
0.0e+00  0  0  0  0  0   0  0  0  0  0     0
SFReduceEnd            2 1.0 1.0176e-01 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  0  0  0  0  0   0  0  0  0  0     0
------------------------------------------------------------------------------------------------------------------------

Memory usage is given in bytes:

Object Type          Creations   Destructions     Memory  Descendants' Mem.
Reports information only for process 0.

--- Event Stage 0: Main Stage

              Vector    43             43    193899440     0
      Vector Scatter    10             10         8976     0
           Index Set    12             12      1020188     0
              Matrix    19             19    456622536     0
Star Forest Bipartite Graph     2              2         1680     0
              Viewer     1              0            0     0
========================================================================================================================
Average time to get PetscTime(): 1.19209e-07
Average time for MPI_Barrier(): 7.267e-05
Average time for zero size MPI_Send(): 5.07832e-05
#PETSc Option Table entries:
-log_summary
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 
sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure options: --with-fc=0 --with-cxx=0 --with-debugging=0 
--download-mpich=1 --download-f2cblaslapack=1
-----------------------------------------
Libraries compiled on Thu Jul 30 15:55:55 2015 on g03 
Machine characteristics: Linux-3.16.7-21-desktop-x86_64-with-SuSE-13.2-x86_64
Using PETSc directory: /ffs/u/u06189/petsc-3.6.1
Using PETSc arch: arch-linux2-c-opt
-----------------------------------------

Using C compiler: /ffs/u/u06189/petsc-3.6.1/arch-linux2-c-opt/bin/mpicc  -fPIC 
-Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -O  
${COPTFLAGS} ${CFLAGS}
-----------------------------------------

Using include paths: -I/ffs/u/u06189/petsc-3.6.1/arch-linux2-c-opt/include 
-I/ffs/u/u06189/petsc-3.6.1/include -I/ffs/u/u06189/petsc-3.6.1/include 
-I/ffs/u/u06189/petsc-3.6.1/arch-linux2-c-opt/include
-----------------------------------------

Using C linker: /ffs/u/u06189/petsc-3.6.1/arch-linux2-c-opt/bin/mpicc
Using libraries: -Wl,-rpath,/ffs/u/u06189/petsc-3.6.1/arch-linux2-c-opt/lib 
-L/ffs/u/u06189/petsc-3.6.1/arch-linux2-c-opt/lib -lpetsc 
-Wl,-rpath,/ffs/u/u06189/petsc-3.6.1/arch-linux2-c-opt/lib 
-L/ffs/u/u06189/petsc-3.6.1/arch-linux2-c-opt/lib -lf2clapack -lf2cblas -lm 
-lX11 -lpthread -lssl -lcrypto -lm -ldl 
-----------------------------------------
---------------------------------------------- PETSc Performance Summary: 
----------------------------------------------

./bin/balance on a arch-linux2-c-opt named g03 with 3 processors, by u06189 Sat 
Aug 22 15:18:45 2015
Using Petsc Release Version 3.6.1, Jul, 22, 2015 

                         Max       Max/Min        Avg      Total 
Time (sec):           1.542e+03      1.00003   1.542e+03
Objects:              8.700e+01      1.03571   8.500e+01
Flops:                7.317e+10      1.55489   5.581e+10  1.674e+11
Flops/sec:            4.743e+07      1.55485   3.618e+07  1.086e+08
MPI Messages:         2.723e+04      1.49871   2.421e+04  7.263e+04
MPI Message Lengths:  4.612e+10      1.98761   1.344e+06  9.759e+10
MPI Reductions:       1.468e+04      1.00000

Flop counting convention: 1 flop = 1 real number operation of type 
(multiply/divide/add/subtract)
                            e.g., VecAXPY() for real vectors of length N --> 2N 
flops
                            and VecAXPY() for complex vectors of length N --> 
8N flops

Summary of Stages:   ----- Time ------  ----- Flops -----  --- Messages ---  -- 
Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total   counts   %Total     
Avg         %Total   counts   %Total 
 0:      Main Stage: 1.5424e+03 100.0%  1.6743e+11 100.0%  7.263e+04 100.0%  
1.344e+06      100.0%  1.468e+04 100.0% 

------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting 
output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flops: Max - maximum over all processors
                   Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   Avg. len: average message length (bytes)
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and 
PetscLogStagePop().
      %T - percent time in this phase         %F - percent flops in this phase
      %M - percent messages in this phase     %L - percent message lengths in 
this phase
      %R - percent reductions in this phase
   Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all 
processors)
------------------------------------------------------------------------------------------------------------------------
Event                Count      Time (sec)     Flops                            
 --- Global ---  --- Stage ---   Total
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct 
 %T %F %M %L %R  %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------

--- Event Stage 0: Main Stage

VecMax             10508 1.0 3.6979e+0212.4 0.00e+00 0.0 0.0e+00 0.0e+00 
1.1e+04 10  0  0  0 72  10  0  0  0 72     0
VecScale           21254 1.0 1.9557e+01 1.1 8.29e+09 1.0 0.0e+00 0.0e+00 
0.0e+00  1 15  0  0  0   1 15  0  0  0  1272
VecSet                12 1.2 2.2308e-0217.0 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecAYPX             6692 1.0 1.0413e+01 1.0 5.76e+09 1.0 0.0e+00 0.0e+00 
0.0e+00  1 10  0  0  0   1 10  0  0  0  1659
VecSwap             7622 1.0 1.8338e+01 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  1  0  0  0  0   1  0  0  0  0     0
VecAssemblyBegin       9 1.0 2.9303e-01 1.3 0.00e+00 0.0 2.8e+01 3.1e+06 
2.7e+01  0  0  0  0  0   0  0  0  0  0     0
VecAssemblyEnd         9 1.0 1.0393e+00 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecPointwiseMult   10507 1.0 2.6305e+01 1.0 4.51e+09 1.0 0.0e+00 0.0e+00 
0.0e+00  2  8  0  0  0   2  8  0  0  0   515
VecScatterBegin    18131 1.0 3.1569e+01 3.5 0.00e+00 0.0 7.3e+04 1.3e+06 
4.0e+00  2  0100100  0   2  0100100  0     0
VecScatterEnd      18127 1.0 1.0855e+03 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00 64  0  0  0  0  64  0  0  0  0     0
MatMult            10504 1.0 8.0156e+02 1.5 2.41e+10 1.6 4.2e+04 8.9e+05 
0.0e+00 41 33 58 38  0  41 33 58 38  0    69
MatMultAdd          7624 1.0 7.9189e+02 5.0 2.76e+10 2.7 3.0e+04 2.0e+06 
0.0e+00 35 29 42 61  0  35 29 42 61  0    61
MatConvert             2 1.0 7.2936e-02 1.9 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatAssemblyBegin       9 1.1 1.7575e+00 1.9 0.00e+00 0.0 3.6e+01 4.1e+06 
1.6e+01  0  0  0  0  0   0  0  0  0  0     0
MatAssemblyEnd         9 1.1 3.9437e+00 1.0 0.00e+00 0.0 3.2e+01 3.5e+05 
3.2e+01  0  0  0  0  0   0  0  0  0  0     0
MatGetRow         732626 1.0 1.4379e-01 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatTranspose           2 1.0 1.6155e+00 1.0 0.00e+00 0.0 6.0e+01 1.1e+06 
2.4e+01  0  0  0  0  0   0  0  0  0  0     0
SFSetGraph             2 1.0 6.3762e-0222.1 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  0  0  0  0  0   0  0  0  0  0     0
SFReduceBegin          2 1.0 2.5095e-01 1.9 0.00e+00 0.0 2.0e+01 1.0e+06 
0.0e+00  0  0  0  0  0   0  0  0  0  0     0
SFReduceEnd            2 1.0 1.3723e-01 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  0  0  0  0  0   0  0  0  0  0     0
------------------------------------------------------------------------------------------------------------------------

Memory usage is given in bytes:

Object Type          Creations   Destructions     Memory  Descendants' Mem.
Reports information only for process 0.

--- Event Stage 0: Main Stage

              Vector    43             43    145379600     0
      Vector Scatter    10             10         8976     0
           Index Set    12             12       713284     0
              Matrix    19             19    331129708     0
Star Forest Bipartite Graph     2              2         1680     0
              Viewer     1              0            0     0
========================================================================================================================
Average time to get PetscTime(): 1.19209e-07
Average time for MPI_Barrier(): 8.69274e-05
Average time for zero size MPI_Send(): 4.64122e-05
#PETSc Option Table entries:
-log_summary
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 
sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure options: --with-fc=0 --with-cxx=0 --with-debugging=0 
--download-mpich=1 --download-f2cblaslapack=1
-----------------------------------------
Libraries compiled on Thu Jul 30 15:55:55 2015 on g03 
Machine characteristics: Linux-3.16.7-21-desktop-x86_64-with-SuSE-13.2-x86_64
Using PETSc directory: /ffs/u/u06189/petsc-3.6.1
Using PETSc arch: arch-linux2-c-opt
-----------------------------------------

Using C compiler: /ffs/u/u06189/petsc-3.6.1/arch-linux2-c-opt/bin/mpicc  -fPIC 
-Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -O  
${COPTFLAGS} ${CFLAGS}
-----------------------------------------

Using include paths: -I/ffs/u/u06189/petsc-3.6.1/arch-linux2-c-opt/include 
-I/ffs/u/u06189/petsc-3.6.1/include -I/ffs/u/u06189/petsc-3.6.1/include 
-I/ffs/u/u06189/petsc-3.6.1/arch-linux2-c-opt/include
-----------------------------------------

Using C linker: /ffs/u/u06189/petsc-3.6.1/arch-linux2-c-opt/bin/mpicc
Using libraries: -Wl,-rpath,/ffs/u/u06189/petsc-3.6.1/arch-linux2-c-opt/lib 
-L/ffs/u/u06189/petsc-3.6.1/arch-linux2-c-opt/lib -lpetsc 
-Wl,-rpath,/ffs/u/u06189/petsc-3.6.1/arch-linux2-c-opt/lib 
-L/ffs/u/u06189/petsc-3.6.1/arch-linux2-c-opt/lib -lf2clapack -lf2cblas -lm 
-lX11 -lpthread -lssl -lcrypto -lm -ldl 
-----------------------------------------
---------------------------------------------- PETSc Performance Summary: 
----------------------------------------------

./bin/balance on a arch-linux2-c-opt named g03 with 4 processors, by u06189 Sat 
Aug 22 15:45:14 2015
Using Petsc Release Version 3.6.1, Jul, 22, 2015 

                         Max       Max/Min        Avg      Total 
Time (sec):           1.587e+03      1.00003   1.587e+03
Objects:              8.700e+01      1.03571   8.475e+01
Flops:                6.145e+10      1.75253   4.198e+10  1.679e+11
Flops/sec:            3.871e+07      1.75248   2.644e+07  1.058e+08
MPI Messages:         3.630e+04      1.99664   2.723e+04  1.089e+05
MPI Message Lengths:  5.202e+10      3.00589   1.060e+06  1.155e+11
MPI Reductions:       1.468e+04      1.00000

Flop counting convention: 1 flop = 1 real number operation of type 
(multiply/divide/add/subtract)
                            e.g., VecAXPY() for real vectors of length N --> 2N 
flops
                            and VecAXPY() for complex vectors of length N --> 
8N flops

Summary of Stages:   ----- Time ------  ----- Flops -----  --- Messages ---  -- 
Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total   counts   %Total     
Avg         %Total   counts   %Total 
 0:      Main Stage: 1.5874e+03 100.0%  1.6790e+11 100.0%  1.089e+05 100.0%  
1.060e+06      100.0%  1.468e+04 100.0% 

------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting 
output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flops: Max - maximum over all processors
                   Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   Avg. len: average message length (bytes)
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and 
PetscLogStagePop().
      %T - percent time in this phase         %F - percent flops in this phase
      %M - percent messages in this phase     %L - percent message lengths in 
this phase
      %R - percent reductions in this phase
   Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all 
processors)
------------------------------------------------------------------------------------------------------------------------
Event                Count      Time (sec)     Flops                            
 --- Global ---  --- Stage ---   Total
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct 
 %T %F %M %L %R  %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------

--- Event Stage 0: Main Stage

VecMax             10506 1.0 4.3514e+0214.7 0.00e+00 0.0 0.0e+00 0.0e+00 
1.1e+04 11  0  0  0 72  11  0  0  0 72     0
VecScale           21250 1.0 1.3338e+01 1.2 6.22e+09 1.0 0.0e+00 0.0e+00 
0.0e+00  1 15  0  0  0   1 15  0  0  0  1864
VecSet                12 1.2 2.2970e-0232.2 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecAYPX             6691 1.0 7.2026e+00 1.0 4.32e+09 1.0 0.0e+00 0.0e+00 
0.0e+00  0 10  0  0  0   0 10  0  0  0  2397
VecSwap             7620 1.0 1.2976e+01 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  1  0  0  0  0   1  0  0  0  0     0
VecAssemblyBegin       9 1.0 3.0734e-01 1.7 0.00e+00 0.0 4.2e+01 2.3e+06 
2.7e+01  0  0  0  0  0   0  0  0  0  0     0
VecAssemblyEnd         9 1.0 1.2512e+00 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecPointwiseMult   10505 1.0 2.0223e+01 1.0 3.38e+09 1.0 0.0e+00 0.0e+00 
0.0e+00  1  8  0  0  0   1  8  0  0  0   669
VecScatterBegin    18127 1.0 2.6524e+01 2.8 0.00e+00 0.0 1.1e+05 1.1e+06 
4.0e+00  1  0100100  0   1  0100100  0     0
VecScatterEnd      18123 1.0 1.2040e+03 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00 69  0  0  0  0  69  0  0  0  0     0
MatMult            10502 1.0 8.6774e+02 1.6 2.03e+10 1.8 6.3e+04 7.2e+05 
0.0e+00 42 33 58 39  0  42 33 58 39  0    64
MatMultAdd          7622 1.0 8.5206e+02 7.1 2.50e+10 3.3 4.6e+04 1.5e+06 
0.0e+00 37 29 42 61  0  37 29 42 61  0    56
MatConvert             2 1.0 6.6689e-02 2.2 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatAssemblyBegin       9 1.1 1.8324e+00 1.9 0.00e+00 0.0 5.4e+01 3.0e+06 
1.6e+01  0  0  0  0  0   0  0  0  0  0     0
MatAssemblyEnd         9 1.1 4.1303e+00 1.0 0.00e+00 0.0 4.8e+01 2.8e+05 
3.2e+01  0  0  0  0  0   0  0  0  0  0     0
MatGetRow         549470 1.0 1.1548e-01 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatTranspose           2 1.0 1.6093e+00 1.0 0.00e+00 0.0 9.0e+01 8.6e+05 
2.4e+01  0  0  0  0  0   0  0  0  0  0     0
SFSetGraph             2 1.0 8.1864e-0234.0 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  0  0  0  0  0   0  0  0  0  0     0
SFReduceBegin          2 1.0 2.9066e-01 2.1 0.00e+00 0.0 3.0e+01 7.8e+05 
0.0e+00  0  0  0  0  0   0  0  0  0  0     0
SFReduceEnd            2 1.0 1.4956e-01 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  0  0  0  0  0   0  0  0  0  0     0
------------------------------------------------------------------------------------------------------------------------

Memory usage is given in bytes:

Object Type          Creations   Destructions     Memory  Descendants' Mem.
Reports information only for process 0.

--- Event Stage 0: Main Stage

              Vector    43             43    121254824     0
      Vector Scatter    10             10         8976     0
           Index Set    12             12       586356     0
              Matrix    19             19    268302788     0
Star Forest Bipartite Graph     2              2         1680     0
              Viewer     1              0            0     0
========================================================================================================================
Average time to get PetscTime(): 1.19209e-07
Average time for MPI_Barrier(): 9.54151e-05
Average time for zero size MPI_Send(): 4.47631e-05
#PETSc Option Table entries:
-log_summary
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 
sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure options: --with-fc=0 --with-cxx=0 --with-debugging=0 
--download-mpich=1 --download-f2cblaslapack=1
-----------------------------------------
Libraries compiled on Thu Jul 30 15:55:55 2015 on g03 
Machine characteristics: Linux-3.16.7-21-desktop-x86_64-with-SuSE-13.2-x86_64
Using PETSc directory: /ffs/u/u06189/petsc-3.6.1
Using PETSc arch: arch-linux2-c-opt
-----------------------------------------

Using C compiler: /ffs/u/u06189/petsc-3.6.1/arch-linux2-c-opt/bin/mpicc  -fPIC 
-Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -O  
${COPTFLAGS} ${CFLAGS}
-----------------------------------------

Using include paths: -I/ffs/u/u06189/petsc-3.6.1/arch-linux2-c-opt/include 
-I/ffs/u/u06189/petsc-3.6.1/include -I/ffs/u/u06189/petsc-3.6.1/include 
-I/ffs/u/u06189/petsc-3.6.1/arch-linux2-c-opt/include
-----------------------------------------

Using C linker: /ffs/u/u06189/petsc-3.6.1/arch-linux2-c-opt/bin/mpicc
Using libraries: -Wl,-rpath,/ffs/u/u06189/petsc-3.6.1/arch-linux2-c-opt/lib 
-L/ffs/u/u06189/petsc-3.6.1/arch-linux2-c-opt/lib -lpetsc 
-Wl,-rpath,/ffs/u/u06189/petsc-3.6.1/arch-linux2-c-opt/lib 
-L/ffs/u/u06189/petsc-3.6.1/arch-linux2-c-opt/lib -lf2clapack -lf2cblas -lm 
-lX11 -lpthread -lssl -lcrypto -lm -ldl 
-----------------------------------------
---------------------------------------------- PETSc Performance Summary: 
----------------------------------------------

./bin/balance on a arch-linux2-c-opt named g03 with 5 processors, by u06189 Sat 
Aug 22 16:24:39 2015
Using Petsc Release Version 3.6.1, Jul, 22, 2015 

                         Max       Max/Min        Avg      Total 
Time (sec):           2.364e+03      1.00003   2.364e+03
Objects:              8.700e+01      1.03571   8.460e+01
Flops:                5.436e+10      1.94379   3.369e+10  1.685e+11
Flops/sec:            2.300e+07      1.94375   1.425e+07  7.127e+07
MPI Messages:         4.538e+04      2.49398   2.905e+04  1.453e+05
MPI Message Lengths:  5.561e+10      4.03642   8.890e+05  1.291e+11
MPI Reductions:       1.469e+04      1.00000

Flop counting convention: 1 flop = 1 real number operation of type 
(multiply/divide/add/subtract)
                            e.g., VecAXPY() for real vectors of length N --> 2N 
flops
                            and VecAXPY() for complex vectors of length N --> 
8N flops

Summary of Stages:   ----- Time ------  ----- Flops -----  --- Messages ---  -- 
Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total   counts   %Total     
Avg         %Total   counts   %Total 
 0:      Main Stage: 2.3637e+03 100.0%  1.6845e+11 100.0%  1.453e+05 100.0%  
8.890e+05      100.0%  1.468e+04 100.0% 

------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting 
output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flops: Max - maximum over all processors
                   Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   Avg. len: average message length (bytes)
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and 
PetscLogStagePop().
      %T - percent time in this phase         %F - percent flops in this phase
      %M - percent messages in this phase     %L - percent message lengths in 
this phase
      %R - percent reductions in this phase
   Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all 
processors)
------------------------------------------------------------------------------------------------------------------------
Event                Count      Time (sec)     Flops                            
 --- Global ---  --- Stage ---   Total
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct 
 %T %F %M %L %R  %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------

--- Event Stage 0: Main Stage

VecMax             10509 1.0 1.0526e+03 2.5 0.00e+00 0.0 0.0e+00 0.0e+00 
1.1e+04 35  0  0  0 72  35  0  0  0 72     0
VecScale           21256 1.0 1.0645e+01 1.1 4.97e+09 1.0 0.0e+00 0.0e+00 
0.0e+00  0 15  0  0  0   0 15  0  0  0  2337
VecSet                12 1.2 2.0848e-0226.8 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecAYPX             6693 1.0 5.6817e+00 1.1 3.45e+09 1.0 0.0e+00 0.0e+00 
0.0e+00  0 10  0  0  0   0 10  0  0  0  3040
VecSwap             7622 1.0 9.7368e+00 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecAssemblyBegin       9 1.0 4.0851e-01 2.0 0.00e+00 0.0 5.6e+01 1.8e+06 
2.7e+01  0  0  0  0  0   0  0  0  0  0     0
VecAssemblyEnd         9 1.0 1.3150e+00 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecPointwiseMult   10508 1.0 1.6680e+01 1.1 2.71e+09 1.0 0.0e+00 0.0e+00 
0.0e+00  1  8  0  0  0   1  8  0  0  0   812
VecScatterBegin    18132 1.0 2.1869e+01 2.3 0.00e+00 0.0 1.5e+05 8.9e+05 
4.0e+00  1  0100100  0   1  0100100  0     0
VecScatterEnd      18128 1.0 1.6711e+03 1.7 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00 53  0  0  0  0  53  0  0  0  0     0
MatMult            10505 1.0 1.2782e+03 2.8 1.80e+10 2.0 8.4e+04 6.1e+05 
0.0e+00 35 33 58 40  0  35 33 58 40  0    44
MatMultAdd          7624 1.0 7.3081e+02 2.5 2.35e+10 3.9 6.1e+04 1.3e+06 
0.0e+00 24 28 42 60  0  24 28 42 60  0    66
MatConvert             2 1.0 6.2242e-02 2.4 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatAssemblyBegin       9 1.1 2.0017e+00 2.0 0.00e+00 0.0 7.2e+01 2.4e+06 
1.6e+01  0  0  0  0  0   0  0  0  0  0     0
MatAssemblyEnd         9 1.1 4.5218e+00 1.0 0.00e+00 0.0 6.4e+01 2.4e+05 
3.2e+01  0  0  0  0  0   0  0  0  0  0     0
MatGetRow         439576 1.0 9.8525e-02 1.3 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatTranspose           2 1.0 1.5088e+00 1.0 0.00e+00 0.0 1.2e+02 7.3e+05 
2.4e+01  0  0  0  0  0   0  0  0  0  0     0
SFSetGraph             2 1.0 9.2637e-0242.2 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  0  0  0  0  0   0  0  0  0  0     0
SFReduceBegin          2 1.0 2.7788e-01 2.6 0.00e+00 0.0 4.0e+01 6.6e+05 
0.0e+00  0  0  0  0  0   0  0  0  0  0     0
SFReduceEnd            2 1.0 1.1188e-01 1.2 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  0  0  0  0  0   0  0  0  0  0     0
------------------------------------------------------------------------------------------------------------------------

Memory usage is given in bytes:

Object Type          Creations   Destructions     Memory  Descendants' Mem.
Reports information only for process 0.

--- Event Stage 0: Main Stage

              Vector    43             43    106685440     0
      Vector Scatter    10             10         8976     0
           Index Set    12             12       495732     0
              Matrix    19             19    230596360     0
Star Forest Bipartite Graph     2              2         1680     0
              Viewer     1              0            0     0
========================================================================================================================
Average time to get PetscTime(): 9.53674e-08
Average time for MPI_Barrier(): 0.000231123
Average time for zero size MPI_Send(): 4.99725e-05
#PETSc Option Table entries:
-log_summary
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 
sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure options: --with-fc=0 --with-cxx=0 --with-debugging=0 
--download-mpich=1 --download-f2cblaslapack=1
-----------------------------------------
Libraries compiled on Thu Jul 30 15:55:55 2015 on g03 
Machine characteristics: Linux-3.16.7-21-desktop-x86_64-with-SuSE-13.2-x86_64
Using PETSc directory: /ffs/u/u06189/petsc-3.6.1
Using PETSc arch: arch-linux2-c-opt
-----------------------------------------

Using C compiler: /ffs/u/u06189/petsc-3.6.1/arch-linux2-c-opt/bin/mpicc  -fPIC 
-Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -O  
${COPTFLAGS} ${CFLAGS}
-----------------------------------------

Using include paths: -I/ffs/u/u06189/petsc-3.6.1/arch-linux2-c-opt/include 
-I/ffs/u/u06189/petsc-3.6.1/include -I/ffs/u/u06189/petsc-3.6.1/include 
-I/ffs/u/u06189/petsc-3.6.1/arch-linux2-c-opt/include
-----------------------------------------

Using C linker: /ffs/u/u06189/petsc-3.6.1/arch-linux2-c-opt/bin/mpicc
Using libraries: -Wl,-rpath,/ffs/u/u06189/petsc-3.6.1/arch-linux2-c-opt/lib 
-L/ffs/u/u06189/petsc-3.6.1/arch-linux2-c-opt/lib -lpetsc 
-Wl,-rpath,/ffs/u/u06189/petsc-3.6.1/arch-linux2-c-opt/lib 
-L/ffs/u/u06189/petsc-3.6.1/arch-linux2-c-opt/lib -lf2clapack -lf2cblas -lm 
-lX11 -lpthread -lssl -lcrypto -lm -ldl 
-----------------------------------------
---------------------------------------------- PETSc Performance Summary: 
----------------------------------------------

./bin/balance on a arch-linux2-c-opt named g03 with 6 processors, by u06189 Sat 
Aug 22 17:19:35 2015
Using Petsc Release Version 3.6.1, Jul, 22, 2015 

                         Max       Max/Min        Avg      Total 
Time (sec):           3.294e+03      1.00003   3.294e+03
Objects:              8.700e+01      1.03571   8.450e+01
Flops:                4.963e+10      2.13897   2.817e+10  1.690e+11
Flops/sec:            1.507e+07      2.13896   8.551e+06  5.130e+07
MPI Messages:         5.445e+04      2.99066   3.027e+04  1.816e+05
MPI Message Lengths:  5.802e+10      5.10574   7.772e+05  1.411e+11
MPI Reductions:       1.469e+04      1.00000

Flop counting convention: 1 flop = 1 real number operation of type 
(multiply/divide/add/subtract)
                            e.g., VecAXPY() for real vectors of length N --> 2N 
flops
                            and VecAXPY() for complex vectors of length N --> 
8N flops

Summary of Stages:   ----- Time ------  ----- Flops -----  --- Messages ---  -- 
Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total   counts   %Total     
Avg         %Total   counts   %Total 
 0:      Main Stage: 3.2942e+03 100.0%  1.6900e+11 100.0%  1.816e+05 100.0%  
7.772e+05      100.0%  1.469e+04 100.0% 

------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting 
output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flops: Max - maximum over all processors
                   Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   Avg. len: average message length (bytes)
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and 
PetscLogStagePop().
      %T - percent time in this phase         %F - percent flops in this phase
      %M - percent messages in this phase     %L - percent message lengths in 
this phase
      %R - percent reductions in this phase
   Total Mflop/s: 10e-6 * (sum of flops over all processors)/(max time over all 
processors)
------------------------------------------------------------------------------------------------------------------------
Event                Count      Time (sec)     Flops                            
 --- Global ---  --- Stage ---   Total
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   Avg len Reduct 
 %T %F %M %L %R  %T %F %M %L %R Mflop/s
------------------------------------------------------------------------------------------------------------------------

--- Event Stage 0: Main Stage

VecMax             10510 1.0 1.5516e+0318.1 0.00e+00 0.0 0.0e+00 0.0e+00 
1.1e+04 32  0  0  0 72  32  0  0  0 72     0
VecScale           21258 1.0 8.6332e+00 1.2 4.15e+09 1.0 0.0e+00 0.0e+00 
0.0e+00  0 15  0  0  0   0 15  0  0  0  2881
VecSet                12 1.2 2.0830e-0237.0 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecAYPX             6694 1.0 4.8097e+00 1.2 2.88e+09 1.0 0.0e+00 0.0e+00 
0.0e+00  0 10  0  0  0   0 10  0  0  0  3592
VecSwap             7622 1.0 7.2560e+00 1.1 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecAssemblyBegin       9 1.0 4.3204e-01 1.9 0.00e+00 0.0 7.0e+01 1.5e+06 
2.7e+01  0  0  0  0  0   0  0  0  0  0     0
VecAssemblyEnd         9 1.0 1.3781e+00 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  0  0  0  0  0   0  0  0  0  0     0
VecPointwiseMult   10509 1.0 1.4113e+01 1.1 2.26e+09 1.0 0.0e+00 0.0e+00 
0.0e+00  0  8  0  0  0   0  8  0  0  0   960
VecScatterBegin    18133 1.0 2.1887e+01 1.7 0.00e+00 0.0 1.8e+05 7.8e+05 
4.0e+00  1  0100100  0   1  0100100  0     0
VecScatterEnd      18129 1.0 2.9113e+03 1.9 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00 61  0  0  0  0  61  0  0  0  0     0
MatMult            10506 1.0 1.6013e+03 1.3 1.65e+10 2.2 1.1e+05 5.4e+05 
0.0e+00 42 34 58 40  0  42 34 58 40  0    35
MatMultAdd          7624 1.0 1.4946e+0330.0 2.24e+10 4.5 7.6e+04 1.1e+06 
0.0e+00 22 28 42 60  0  22 28 42 60  0    32
MatConvert             2 1.0 5.6268e-02 2.5 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatAssemblyBegin       9 1.1 2.0523e+00 2.0 0.00e+00 0.0 9.0e+01 2.0e+06 
1.6e+01  0  0  0  0  0   0  0  0  0  0     0
MatAssemblyEnd         9 1.1 4.9848e+00 1.0 0.00e+00 0.0 8.0e+01 2.1e+05 
3.2e+01  0  0  0  0  0   0  0  0  0  0     0
MatGetRow         366314 1.0 8.8142e-02 1.4 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  0  0  0  0  0   0  0  0  0  0     0
MatTranspose           2 1.0 2.4237e+00 1.0 0.00e+00 0.0 1.5e+02 6.4e+05 
2.4e+01  0  0  0  0  0   0  0  0  0  0     0
SFSetGraph             2 1.0 1.0412e-0148.4 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  0  0  0  0  0   0  0  0  0  0     0
SFReduceBegin          2 1.0 3.3849e-01 2.1 0.00e+00 0.0 5.0e+01 5.7e+05 
0.0e+00  0  0  0  0  0   0  0  0  0  0     0
SFReduceEnd            2 1.0 5.4796e-01 3.4 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  0  0  0  0  0   0  0  0  0  0     0
------------------------------------------------------------------------------------------------------------------------

Memory usage is given in bytes:

Object Type          Creations   Destructions     Memory  Descendants' Mem.
Reports information only for process 0.

--- Event Stage 0: Main Stage

              Vector    43             43     96945944     0
      Vector Scatter    10             10         8976     0
           Index Set    12             12       415656     0
              Matrix    19             19    205360064     0
Star Forest Bipartite Graph     2              2         1680     0
              Viewer     1              0            0     0
========================================================================================================================
Average time to get PetscTime(): 1.19209e-07
Average time for MPI_Barrier(): 0.000215769
Average time for zero size MPI_Send(): 5.94854e-05
#PETSc Option Table entries:
-log_summary
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 
sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure options: --with-fc=0 --with-cxx=0 --with-debugging=0 
--download-mpich=1 --download-f2cblaslapack=1
-----------------------------------------
Libraries compiled on Thu Jul 30 15:55:55 2015 on g03 
Machine characteristics: Linux-3.16.7-21-desktop-x86_64-with-SuSE-13.2-x86_64
Using PETSc directory: /ffs/u/u06189/petsc-3.6.1
Using PETSc arch: arch-linux2-c-opt
-----------------------------------------

Using C compiler: /ffs/u/u06189/petsc-3.6.1/arch-linux2-c-opt/bin/mpicc  -fPIC 
-Wall -Wwrite-strings -Wno-strict-aliasing -Wno-unknown-pragmas -O  
${COPTFLAGS} ${CFLAGS}
-----------------------------------------

Using include paths: -I/ffs/u/u06189/petsc-3.6.1/arch-linux2-c-opt/include 
-I/ffs/u/u06189/petsc-3.6.1/include -I/ffs/u/u06189/petsc-3.6.1/include 
-I/ffs/u/u06189/petsc-3.6.1/arch-linux2-c-opt/include
-----------------------------------------

Using C linker: /ffs/u/u06189/petsc-3.6.1/arch-linux2-c-opt/bin/mpicc
Using libraries: -Wl,-rpath,/ffs/u/u06189/petsc-3.6.1/arch-linux2-c-opt/lib 
-L/ffs/u/u06189/petsc-3.6.1/arch-linux2-c-opt/lib -lpetsc 
-Wl,-rpath,/ffs/u/u06189/petsc-3.6.1/arch-linux2-c-opt/lib 
-L/ffs/u/u06189/petsc-3.6.1/arch-linux2-c-opt/lib -lf2clapack -lf2cblas -lm 
-lX11 -lpthread -lssl -lcrypto -lm -ldl 
-----------------------------------------
