@Matthew Knepley Yes, it works with the main build!

@Mark Adams I have attached the log_view output of ex4 for your reference. However, PCApply and PCSetUp did not record any GPU flops.
If it is not too much trouble, could you send me the log_view output of ksp/ksp/tutorials/ex45 using hypre?
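That is, the log from a run along these lines (the options match my run quoted further down this thread; the grid sizes are just what I used):

./ex45 -da_grid_x 128 -da_grid_y 128 -da_grid_z 128 -dm_mat_type hypre -dm_vec_type cuda -ksp_type cg -pc_type hypre -pc_hypre_type boomeramg -ksp_monitor -log_view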

Many thanks!
Karthik.

From: Matthew Knepley <[email protected]>
Date: Monday, 22 November 2021 at 15:26
To: "Chockalingam, Karthikeyan (STFC,DL,HC)" 
<[email protected]>
Cc: Stefano Zampini <[email protected]>, "[email protected]" 
<[email protected]>
Subject: Re: [petsc-users] hypre on gpus

On Mon, Nov 22, 2021 at 10:20 AM Karthikeyan Chockalingam - STFC UKRI <[email protected]> wrote:
Thanks Stefano,

I just did a clean build with no problems. Can you try a clean build of main?

  Thanks,

     Matt


I have another build without the --download-hypre-commit=origin/hypre_petsc option, but that gives a different error.


[kchockalingam@glados tutorials]$ ./ex4 -ksp_view -ksp_type cg -mat_type hypre 
-pc_type hypre

[0]PETSC ERROR: --------------------- Error Message 
--------------------------------------------------------------

[0]PETSC ERROR: Error in external library

[0]PETSC ERROR: Error in HYPRE_IJMatrixAssemble(): error code 12

[0]PETSC ERROR: See https://www.mcs.anl.gov/petsc/documentation/faq.html for 
trouble shooting.

[0]PETSC ERROR: Petsc Release Version 3.15.3, Aug 06, 2021

[0]PETSC ERROR: ./ex4 on a  named glados.dl.ac.uk by kchockalingam Mon Nov 22 15:07:46 2021

[0]PETSC ERROR: Configure options 
--package-prefix-hash=/home/kchockalingam/petsc-hash-pkgs --with-make-test-np=2 
COPTFLAGS="-g -O3 -fno-omit-frame-pointer" FOPTFLAGS="-g -O3 
-fno-omit-frame-pointer" CXXOPTFLAGS="-g -O3 -fno-omit-frame-pointer" 
--with-cuda=1 --with-cuda-arch=70 --with-blaslapack=1 
--with-cuda-dir=/apps/packages/cuda/10.1/ 
--with-mpi-dir=/apps/packages/gcc/7.3.0/openmpi/3.1.2 --download-hypre=1 
--download-hypre-configure-arguments=HYPRE_CUDA_SM=70 --with-debugging=no 
PETSC_ARCH=arch-ci-linux-cuda11-double

[0]PETSC ERROR: #1 MatAssemblyEnd_HYPRE() at 
/home/kchockalingam/tools/petsc-3.15.3/src/mat/impls/hypre/mhypre.c:1212

[0]PETSC ERROR: #2 MatAssemblyEnd() at 
/home/kchockalingam/tools/petsc-3.15.3/src/mat/interface/matrix.c:5652

[0]PETSC ERROR: #3 main() at ex4.c:84

[0]PETSC ERROR: PETSc Option Table entries:

[0]PETSC ERROR: -ksp_type cg

[0]PETSC ERROR: -ksp_view

[0]PETSC ERROR: -mat_type hypre

[0]PETSC ERROR: -pc_type hypre

[0]PETSC ERROR: ----------------End of Error Message -------send entire error 
message to [email protected]

--------------------------------------------------------------------------

Best,
Karthik.

From: Stefano Zampini <[email protected]>
Date: Monday, 22 November 2021 at 14:46
To: Matthew Knepley <[email protected]>
Cc: "Chockalingam, Karthikeyan (STFC,DL,HC)" <[email protected]>, "[email protected]" <[email protected]>
Subject: Re: [petsc-users] hypre on gpus

You don't need to specify the HYPRE commit. Remove --download-hypre-commit=origin/hypre_petsc from the configuration options.
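That is, something along these lines (the remaining options are taken unchanged from your list; the paths are from your original configure line):

./configure --with-blaslapack-dir=/lustre/scafellpike/local/apps/intel/intel_cs/2018.0.128/mkl \
  --with-cuda=1 --with-cuda-arch=70 \
  --download-hypre=yes --download-hypre-configure-arguments=HYPRE_CUDA_SM=70 \
  --with-shared-libraries=1 --known-mpi-shared-libraries=1 \
  --with-cc=mpicc --with-cxx=mpicxx --with-fc=mpif90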

On Mon, 22 Nov 2021 at 17:29, Matthew Knepley <[email protected]> wrote:
On Mon, Nov 22, 2021 at 8:50 AM Karthikeyan Chockalingam - STFC UKRI <[email protected]> wrote:
Hi Matt,

Below is the entire error message:

I cannot reproduce this:

main $:/PETSc3/petsc/petsc-dev/src/ksp/ksp/tutorials$ ./ex4 -ksp_view -ksp_type 
cg -mat_type hypre -pc_type hypre
KSP Object: 1 MPI processes
  type: cg
  maximum iterations=10000, initial guess is zero
  tolerances:  relative=0.000138889, absolute=1e-50, divergence=10000.
  left preconditioning
  using PRECONDITIONED norm type for convergence test
PC Object: 1 MPI processes
  type: hypre
    HYPRE BoomerAMG preconditioning
      Cycle type V
      Maximum number of levels 25
      Maximum number of iterations PER hypre call 1
      Convergence tolerance PER hypre call 0.
      Threshold for strong coupling 0.25
      Interpolation truncation factor 0.
      Interpolation: max elements per row 0
      Number of levels of aggressive coarsening 0
      Number of paths for aggressive coarsening 1
      Maximum row sums 0.9
      Sweeps down         1
      Sweeps up           1
      Sweeps on coarse    1
      Relax down          symmetric-SOR/Jacobi
      Relax up            symmetric-SOR/Jacobi
      Relax on coarse     Gaussian-elimination
      Relax weight  (all)      1.
      Outer relax weight (all) 1.
      Using CF-relaxation
      Not using more complex smoothers.
      Measure type        local
      Coarsen type        Falgout
      Interpolation type  classical
  linear system matrix = precond matrix:
  Mat Object: 1 MPI processes
    type: hypre
    rows=56, cols=56
Norm of error 8.69801e-05 iterations 2
This is on the 'main' branch. So either there is some bug in the release, or something is strange on your end. Since we run Hypre tests in the CI, I am leaning toward the latter. Can you try the 'main' branch? We will have to use it anyway if we want any fixes.

  Thanks,

    Matt

 [0]PETSC ERROR: --------------------- Error Message 
--------------------------------------------------------------

[0]PETSC ERROR: Object is in wrong state

[0]PETSC ERROR: Must call MatXXXSetPreallocation(), MatSetUp() or the matrix 
has not yet been factored on argument 1 "mat" before MatGetOwnershipRange()

[0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting.

[0]PETSC ERROR: Petsc Release Version 3.16.0, Sep 29, 2021

[0]PETSC ERROR: ./ex4 on a  named sqg2b4.bullx by kxc07-lxm25 Mon Nov 22 
11:33:41 2021

[0]PETSC ERROR: Configure options 
--prefix=/lustre/scafellpike/local/apps/gcc7/petsc/3.16.0-cuda11.2 
--with-debugging=yes 
--with-blaslapack-dir=/lustre/scafellpike/local/apps/intel/intel_cs/2018.0.128/mkl
 --with-cuda=1 --with-cuda-arch=70 --download-hypre=yes 
--download-hypre-configure-arguments=HYPRE_CUDA_SM=70 
--download-hypre-commit=origin/hypre_petsc --with-shared-libraries=1 
--known-mpi-shared-libraries=1 --with-cc=mpicc --with-cxx=mpicxx -with-fc=mpif90

[0]PETSC ERROR: #1 MatGetOwnershipRange() at 
/netfs/smain01/scafellpike/local/package_build/build/rja87-build/petsc-cuda-3.16.0/src/mat/interface/matrix.c:6784

[0]PETSC ERROR: #2 main() at ex4.c:40

[0]PETSC ERROR: PETSc Option Table entries:

[0]PETSC ERROR: -ksp_type cg

[0]PETSC ERROR: -ksp_view

[0]PETSC ERROR: -mat_type hypre

[0]PETSC ERROR: -pc_type hypre

[0]PETSC ERROR: -use_gpu_aware_mpi 0

[0]PETSC ERROR: ----------------End of Error Message -------send entire error 
message to [email protected]

--------------------------------------------------------------------------

I have also attached the make.log. Thank you for having a look.

Best,
Karthik.

From: Matthew Knepley <[email protected]<mailto:[email protected]>>
Date: Monday, 22 November 2021 at 13:41
To: "Chockalingam, Karthikeyan (STFC,DL,HC)" 
<[email protected]<mailto:[email protected]>>
Cc: Mark Adams <[email protected]<mailto:[email protected]>>, 
"[email protected]<mailto:[email protected]>" 
<[email protected]<mailto:[email protected]>>
Subject: Re: [petsc-users] hypre on gpus

On Mon, Nov 22, 2021 at 6:47 AM Karthikeyan Chockalingam - STFC UKRI <[email protected]> wrote:

Thank you for your response. I tried to run the same example



petsc/src/ksp/ksp/tutorials$  ./ex4 -ksp_type cg -mat_type hypre -ksp_view 
-pc_type hypre



but it crashed with the below error


[0]PETSC ERROR: --------------------- Error Message 
--------------------------------------------------------------
[0]PETSC ERROR: Object is in wrong state
[0]PETSC ERROR: Must call MatXXXSetPreallocation(), MatSetUp() or the matrix 
has not yet been factored on argument 1 "mat" before MatGetOwnershipRange()

Hi Karthik,

Please do not clip the error message; we need the entire output. It seems strange that you would get a logic error, since that should be the same on any architecture. So could you also send the make.log?

  Thanks,

    Matt


Below are the options used to configure hypre with cuda support. Do you spot 
any mistakes?



--with-blaslapack-dir=/lustre/scafellpike/local/apps/intel/intel_cs/2018.0.128/mkl

--with-cuda=1

--with-cuda-arch=70

--download-hypre=yes

--download-hypre-configure-arguments=HYPRE_CUDA_SM=70

--download-hypre-commit=origin/hypre_petsc

--with-shared-libraries=1

--known-mpi-shared-libraries=1

--with-cc=mpicc

--with-cxx=mpicxx

-with-fc=mpif90



Best,

Karthik.


From: Mark Adams <[email protected]<mailto:[email protected]>>
Date: Friday, 19 November 2021 at 16:31
To: "Chockalingam, Karthikeyan (STFC,DL,HC)" 
<[email protected]<mailto:[email protected]>>
Cc: "[email protected]<mailto:[email protected]>" 
<[email protected]<mailto:[email protected]>>
Subject: Re: [petsc-users] hypre on gpus

You should run with -options_left to check that your options are being used. It may be that -mat_type hypre is not being picked up.
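For example, something like this (any option PETSc did not use is reported when the program exits):

./ex4 -ksp_type cg -mat_type hypre -ksp_view -pc_type hypre -options_left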

I have tested this:

petsc/src/ksp/ksp/tutorials$ srun -n2 ./ex4 -ksp_type cg -mat_type hypre 
-ksp_view -pc_type hypre

You can add -log_view and that will print performance data for each method like 
KSPSolve.

If PETSc is configured with a GPU, there will be some extra columns that give the percentage of flops run on the GPU.
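For reference, the extra columns appear in the -log_view event-table header like this (copied from a run with GPU support enabled); the final GPU %F column is the percentage of flops executed on the GPU:

Event                Count      Time (sec)     Flop   --- Global ---  --- Stage ----  Total   GPU    - CpuToGpu -   - GpuToCpu - GPU
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   AvgLen  Reduct  %T %F %M %L %R  %T %F %M %L %R Mflop/s Mflop/s Count   Size   Count   Size  %F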

In the past, hypre did not register its flops with us, but I do get flops from hypre now, and -ksp_view showed that it did indeed use hypre. I saw that the flops were 100% on the GPU in hypre.


On Fri, Nov 19, 2021 at 10:47 AM Karthikeyan Chockalingam - STFC UKRI <[email protected]> wrote:
Hello,

I tried to solve a 3D Poisson problem (ksp/ksp/tutorials/ex45) using -pc_type hypre on GPUs:

./ex45 -da_grid_x 128 -da_grid_y 128 -da_grid_z 128 -dm_mat_type hypre 
-dm_vec_type cuda -ksp_type cg -pc_type hypre -pc_hypre_type  boomeramg  
-ksp_monitor -log_view

I profiled the run using NSYS; attached you will find all the relevant files.
Looking at the profile, I doubt that hypre is running on the GPUs: the CUDA kernels are barely active, and I don't see any CUDA kernel relevant to the solve.
Is my assessment correct? How can I verify whether hypre is indeed running on the GPUs?

Best,
Karthik.

From: Mark Adams <[email protected]<mailto:[email protected]>>
Date: Friday, 8 October 2021 at 18:47
To: "Chockalingam, Karthikeyan (STFC,DL,HC)" 
<[email protected]<mailto:[email protected]>>
Cc: "[email protected]<mailto:[email protected]>" 
<[email protected]<mailto:[email protected]>>
Subject: Re: [petsc-users] hypre on gpus

I think you would want to use the 'cuda' vec_type, but I am not sure.
You might ask the Hypre developers how one verifies that the GPU is used.
Mark

On Fri, Oct 8, 2021 at 1:35 PM Karthikeyan Chockalingam - STFC UKRI <[email protected]> wrote:
Yes, I used it for both the CPU and GPU runs. Is that not okay?

For GPU: -dm_mat_type hypre -dm_vec_type mpicuda

For CPU: -dm_mat_type hypre -dm_vec_type mpi

From: Mark Adams <[email protected]<mailto:[email protected]>>
Date: Friday, 8 October 2021 at 18:28
To: "Chockalingam, Karthikeyan (STFC,DL,HC)" 
<[email protected]<mailto:[email protected]>>
Cc: "[email protected]<mailto:[email protected]>" 
<[email protected]<mailto:[email protected]>>
Subject: Re: [petsc-users] hypre on gpus

Did you use -dm_mat_type hypre in the GPU case?

On Fri, Oct 8, 2021 at 12:19 PM Karthikeyan Chockalingam - STFC UKRI <[email protected]> wrote:
I tried a different exercise: I ran the same problem on two CPU cores and on two GPUs:

On gpu

PCApply                6 1.0 6.0335e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 
6.0e+00 15  0  0  0  1  15  0  0  0  1     0       0      0 0.00e+00    5 
9.65e+01  0

and on cpu

PCApply                6 1.0 5.6348e+00 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00 16  0  0  0  0  16  0  0  0  0     0

The timings are again close, but the GPU version performed reductions (6.0e+00) while the CPU version did not (0.0e+00).
I am not sure whether that is any indication that hypre ran on the GPUs.

Thanks,
Karthik.


From: Mark Adams <[email protected]<mailto:[email protected]>>
Date: Friday, 8 October 2021 at 16:36
To: "Chockalingam, Karthikeyan (STFC,DL,HC)" 
<[email protected]<mailto:[email protected]>>
Cc: "[email protected]<mailto:[email protected]>" 
<[email protected]<mailto:[email protected]>>
Subject: Re: [petsc-users] hypre on gpus



On Fri, Oct 8, 2021 at 10:29 AM Karthikeyan Chockalingam - STFC UKRI <[email protected]> wrote:
The PCApply timings on

gpu


PCApply                6 1.0 1.0235e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00 39  0  0  0  0  39  0  0  0  0     0       0      0 0.00e+00    0 
0.00e+00  0



and cpu



PCApply                6 1.0 1.0242e+01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00 41  0  0  0  0  41  0  0  0  0     0


You probably don't have GPUs.
Use -dm_mat_type hypre.

are close. It is hard for me to tell whether hypre is running on the GPU or not.

Best,
Karthik.


From: "Chockalingam, Karthikeyan (STFC,DL,HC)" 
<[email protected]<mailto:[email protected]>>
Date: Friday, 8 October 2021 at 14:55
To: Mark Adams <[email protected]<mailto:[email protected]>>
Cc: "[email protected]<mailto:[email protected]>" 
<[email protected]<mailto:[email protected]>>
Subject: Re: [petsc-users] hypre on gpus

Thanks Mark, I will try your recommendations.
Should I also change -dm_vec_type to hypre? Currently I have it set to mpicuda.

Karthik.


From: Mark Adams <[email protected]<mailto:[email protected]>>
Date: Friday, 8 October 2021 at 14:33
To: "Chockalingam, Karthikeyan (STFC,DL,HC)" 
<[email protected]<mailto:[email protected]>>
Cc: "[email protected]<mailto:[email protected]>" 
<[email protected]<mailto:[email protected]>>
Subject: Re: [petsc-users] hypre on gpus

Hypre does not record its flops with PETSc's timers.
Configure with and without CUDA and see if the timings change in PCApply.
Hypre does not dynamically switch between CUDA and CPU solves at this time, but 
you want to use -dm_mat_type hypre.
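A sketch of that comparison (the PETSC_ARCH names here are only illustrative, and the ex45 options are the ones used elsewhere in this thread):

make PETSC_ARCH=arch-cuda ex45
./ex45 -da_grid_x 128 -da_grid_y 128 -da_grid_z 128 -dm_mat_type hypre -ksp_type cg -pc_type hypre -pc_hypre_type boomeramg -log_view | grep PCApply

make PETSC_ARCH=arch-cpu ex45
./ex45 -da_grid_x 128 -da_grid_y 128 -da_grid_z 128 -dm_mat_type hypre -ksp_type cg -pc_type hypre -pc_hypre_type boomeramg -log_view | grep PCApply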
Mark

On Fri, Oct 8, 2021 at 6:59 AM Karthikeyan Chockalingam - STFC UKRI <[email protected]> wrote:
Hello,

I am trying to run ex45 (in the KSP tutorials) using hypre on GPUs. I have attached the Python configuration file and the -log_view output from running the command below:

mpirun -n 2 ./ex45 -log_view -da_grid_x 169 -da_grid_y 169 -da_grid_z 169  
-dm_mat_type mpiaijcusparse -dm_vec_type mpicuda -ksp_type gmres -pc_type hypre 
-pc_hypre_type  boomeramg -ksp_gmres_restart 31 
-pc_hypre_boomeramg_strong_threshold 0.7  -ksp_monitor

The problem was solved and converged, but from the output file I suspect hypre is not running on the GPUs, as PCApply and DMCreate do not record any GPU Mflop/s. However, some events such as KSPSolve and MatMult are running on the GPUs.

Can you please let me know if I need to add any extra flags to the attached arch-ci-linux-cuda11-double-xx.py script to get hypre working on the GPUs?
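For context, the hypre/CUDA-related entries in that script are along the lines of the sketch below (the option strings are the ones from the configure output shown elsewhere in this thread; the surrounding boilerplate is assumed to follow the usual PETSc config/examples pattern and the script to be run from the PETSc source directory):

#!/usr/bin/env python3
# Sketch of a PETSc configure script enabling CUDA and hypre built with CUDA support.
if __name__ == '__main__':
    import sys, os
    # make PETSc's config package importable when run from PETSC_DIR
    sys.path.insert(0, os.path.abspath('config'))
    import configure
    configure_options = [
        '--with-cuda=1',
        '--with-cuda-arch=70',
        '--download-hypre=1',
        '--download-hypre-configure-arguments=HYPRE_CUDA_SM=70',
        '--with-debugging=no',
    ]
    configure.petsc_configure(configure_options)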

Thanks,
Karthik.





--
What most experimenters take for granted before they begin their experiments is 
infinitely more interesting than any results to which their experiments lead.
-- Norbert Wiener

https://www.cse.buffalo.edu/~knepley/




--
Stefano


[1637602133.843999] [sqg2e4:18844:0]         mxm.c:196  MXM  WARN  The 'ulimit 
-s' on the system is set to 'unlimited'. This may have negative performance 
implications. Please set the stack size to the default value (10240) 
  0 KSP Residual norm 6.736906443113e+00 
  1 KSP Residual norm 3.924488666810e-01 
  2 KSP Residual norm 3.573236154366e-02 
  3 KSP Residual norm 5.628368310285e-03 
  4 KSP Residual norm 9.795224289872e-04 
  5 KSP Residual norm 9.239787115081e-05 
KSP Object: 1 MPI processes
  type: cg
  maximum iterations=10000, initial guess is zero
  tolerances:  relative=0.000138889, absolute=1e-50, divergence=10000.
  left preconditioning
  using PRECONDITIONED norm type for convergence test
PC Object: 1 MPI processes
  type: hypre
    HYPRE BoomerAMG preconditioning
      Cycle type V
      Maximum number of levels 25
      Maximum number of iterations PER hypre call 1
      Convergence tolerance PER hypre call 0.
      Threshold for strong coupling 0.25
      Interpolation truncation factor 0.
      Interpolation: max elements per row 0
      Number of levels of aggressive coarsening 0
      Number of paths for aggressive coarsening 1
      Maximum row sums 0.9
      Sweeps down         1
      Sweeps up           1
      Sweeps on coarse    1
      Relax down          l1scaled-Jacobi
      Relax up            l1scaled-Jacobi
      Relax on coarse     Gaussian-elimination
      Relax weight  (all)      1.
      Outer relax weight (all) 1.
      Not using CF-relaxation
      Not using more complex smoothers.
      Measure type        local
      Coarsen type        PMIS
      Interpolation type  ext+i
  linear system matrix = precond matrix:
  Mat Object: 1 MPI processes
    type: hypre
    rows=56, cols=56
Norm of error 0.000122452 iterations 5
************************************************************************************************************************
***             WIDEN YOUR WINDOW TO 120 CHARACTERS.  Use 'enscript -r 
-fCourier9' to print this document            ***
************************************************************************************************************************

---------------------------------------------- PETSc Performance Summary: 
----------------------------------------------



      ##########################################################
      #                                                        #
      #                       WARNING!!!                       #
      #                                                        #
      #   This code was compiled with a debugging option.      #
      #   To get timing results run ./configure                #
      #   using --with-debugging=no, the performance will      #
      #   be generally two or three times faster.              #
      #                                                        #
      ##########################################################




      ##########################################################
      #                                                        #
      #                       WARNING!!!                       #
      #                                                        #
      # This code was compiled with GPU support and you've     #
      # created PETSc/GPU objects, but you intentionally used  #
      # -use_gpu_aware_mpi 0, such that PETSc had to copy data #
      # from GPU to CPU for communication. To get meaningfull  #
      # timing results, please use GPU-aware MPI instead.      #
      ##########################################################


./ex4 on a arch-linux2-c-debug named sqg2e4.bullx with 1 processor, by 
kxc07-lxm25 Mon Nov 22 17:28:55 2021
Using Petsc Development GIT revision: v3.16.1-353-g887dddf386  GIT Date: 
2021-11-19 20:24:41 +0000

                         Max       Max/Min     Avg       Total
Time (sec):           3.084e-01     1.000   3.084e-01
Objects:              1.100e+01     1.000   1.100e+01
Flop:                 3.567e+03     1.000   3.567e+03  3.567e+03
Flop/sec:             1.157e+04     1.000   1.157e+04  1.157e+04
Memory:               2.483e+05     1.000   2.483e+05  2.483e+05
MPI Messages:         0.000e+00     0.000   0.000e+00  0.000e+00
MPI Message Lengths:  0.000e+00     0.000   0.000e+00  0.000e+00
MPI Reductions:       0.000e+00     0.000

Flop counting convention: 1 flop = 1 real number operation of type 
(multiply/divide/add/subtract)
                            e.g., VecAXPY() for real vectors of length N --> 2N 
flop
                            and VecAXPY() for complex vectors of length N --> 
8N flop

Summary of Stages:   ----- Time ------  ----- Flop ------  --- Messages ---  -- 
Message Lengths --  -- Reductions --
                        Avg     %Total     Avg     %Total    Count   %Total     
Avg         %Total    Count   %Total
 0:      Main Stage: 3.0838e-01 100.0%  3.5670e+03 100.0%  0.000e+00   0.0%  
0.000e+00        0.0%  0.000e+00   0.0%

------------------------------------------------------------------------------------------------------------------------
See the 'Profiling' chapter of the users' manual for details on interpreting 
output.
Phase summary info:
   Count: number of times phase was executed
   Time and Flop: Max - maximum over all processors
                  Ratio - ratio of maximum to minimum over all processors
   Mess: number of messages sent
   AvgLen: average message length (bytes)
   Reduct: number of global reductions
   Global: entire computation
   Stage: stages of a computation. Set stages with PetscLogStagePush() and 
PetscLogStagePop().
      %T - percent time in this phase         %F - percent flop in this phase
      %M - percent messages in this phase     %L - percent message lengths in 
this phase
      %R - percent reductions in this phase
   Total Mflop/s: 10e-6 * (sum of flop over all processors)/(max time over all 
processors)
   GPU Mflop/s: 10e-6 * (sum of flop on GPU over all processors)/(max GPU time 
over all processors)
   CpuToGpu Count: total number of CPU to GPU copies per processor
   CpuToGpu Size (Mbytes): 10e-6 * (total size of CPU to GPU copies per 
processor)
   GpuToCpu Count: total number of GPU to CPU copies per processor
   GpuToCpu Size (Mbytes): 10e-6 * (total size of GPU to CPU copies per 
processor)
   GPU %F: percent flops on GPU in this event
------------------------------------------------------------------------------------------------------------------------


      ##########################################################
      #                                                        #
      #                       WARNING!!!                       #
      #                                                        #
      #   This code was compiled with a debugging option.      #
      #   To get timing results run ./configure                #
      #   using --with-debugging=no, the performance will      #
      #   be generally two or three times faster.              #
      #                                                        #
      ##########################################################


Event                Count      Time (sec)     Flop                             
 --- Global ---  --- Stage ----  Total   GPU    - CpuToGpu -   - GpuToCpu - GPU
                   Max Ratio  Max     Ratio   Max  Ratio  Mess   AvgLen  Reduct 
 %T %F %M %L %R  %T %F %M %L %R Mflop/s Mflop/s Count   Size   Count   Size  %F
---------------------------------------------------------------------------------------------------------------------------------------------------------------

--- Event Stage 0: Main Stage

BuildTwoSided          1 1.0 1.0332e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 
0.00e+00  0
BuildTwoSidedF         1 1.0 2.3869e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 
0.00e+00  0
MatMult                6 1.0 2.5697e-01 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00 83  0  0  0  0  83  0  0  0  0     0       0      0 0.00e+00    0 
0.00e+00  0
MatAssemblyBegin       1 1.0 5.2592e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 
0.00e+00  0
MatAssemblyEnd         1 1.0 6.1558e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 
0.00e+00  0
MatView                1 1.0 4.9629e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 
0.00e+00  0
VecTDot               10 1.0 3.6972e-04 1.0 1.11e+03 1.0 0.0e+00 0.0e+00 
0.0e+00  0 31  0  0  0   0 31  0  0  0     3       5      0 0.00e+00    0 
0.00e+00 100
VecNorm                7 1.0 4.3831e-04 1.0 7.77e+02 1.0 0.0e+00 0.0e+00 
0.0e+00  0 22  0  0  0   0 22  0  0  0     2       2      0 0.00e+00    0 
0.00e+00 100
VecCopy                2 1.0 4.9198e-05 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 
0.00e+00  0
VecSet                 8 1.0 2.1870e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 
0.00e+00  0
VecAXPY               11 1.0 2.0920e-04 1.0 1.23e+03 1.0 0.0e+00 0.0e+00 
0.0e+00  0 35  0  0  0   0 35  0  0  0     6      18      0 0.00e+00    0 
0.00e+00 100
VecAYPX                4 1.0 8.7857e-05 1.0 4.48e+02 1.0 0.0e+00 0.0e+00 
0.0e+00  0 13  0  0  0   0 13  0  0  0     5      11      0 0.00e+00    0 
0.00e+00 100
KSPSetUp               1 1.0 1.8801e-04 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 
0.00e+00  0
KSPSolve               1 1.0 2.6545e-03 1.0 3.34e+03 1.0 0.0e+00 0.0e+00 
0.0e+00  1 94  0  0  0   1 94  0  0  0     1       5      0 0.00e+00    0 
0.00e+00 100
PCSetUp                1 1.0 4.3471e-02 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00 14  0  0  0  0  14  0  0  0  0     0       0      0 0.00e+00    0 
0.00e+00  0
PCApply                6 1.0 1.1224e-03 1.0 0.00e+00 0.0 0.0e+00 0.0e+00 
0.0e+00  0  0  0  0  0   0  0  0  0  0     0       0      0 0.00e+00    0 
0.00e+00  0
---------------------------------------------------------------------------------------------------------------------------------------------------------------

Memory usage is given in bytes:

Object Type          Creations   Destructions     Memory  Descendants' Mem.
Reports information only for process 0.

--- Event Stage 0: Main Stage

              Matrix     1              1         3008     0.
              Vector     6              6         9984     0.
       Krylov Solver     1              1         1672     0.
      Preconditioner     1              1         1512     0.
              Viewer     2              1          848     0.
========================================================================================================================
Average time to get PetscTime(): 2.99e-08
#PETSc Option Table entries:
-ksp_monitor
-ksp_type cg
-ksp_view
-log_view
-mat_type hypre
-pc_type hypre
-use_gpu_aware_mpi 0
#End of PETSc Option Table entries
Compiled without FORTRAN kernels
Compiled with full precision matrices (default)
sizeof(short) 2 sizeof(int) 4 sizeof(long) 8 sizeof(void*) 8 
sizeof(PetscScalar) 8 sizeof(PetscInt) 4
Configure options: 
--with-blaslapack-dir=/lustre/scafellpike/local/apps/intel/intel_cs/2018.0.128/mkl
 --with-cuda=1 --with-cuda-arch=70 --download-hypre=yes 
--download-hypre-configure-arguments=HYPRE_CUDA_SM=70 
--download-hypre-commit=origin/hypre_petsc --with-shared-libraries=1 
--known-mpi-shared-libraries=1 --with-cc=mpicc --with-cxx=mpicxx -with-fc=mpif90
-----------------------------------------
Libraries compiled on 2021-11-22 17:18:17 on hcxlogin2 
Machine characteristics: 
Linux-3.10.0-1127.el7.x86_64-x86_64-with-redhat-7.8-Maipo
Using PETSc directory: 
/lustre/scafellpike/local/HT04048/lxm25/kxc07-lxm25/petsc-main/petsc
Using PETSc arch: arch-linux2-c-debug
-----------------------------------------

Using C compiler: mpicc  -fPIC -Wall -Wwrite-strings -Wno-strict-aliasing 
-Wno-unknown-pragmas -fstack-protector -fvisibility=hidden -g3 -O0   
Using Fortran compiler: mpif90  -fPIC -Wall -ffree-line-length-0 
-Wno-unused-dummy-argument -g -O0     
-----------------------------------------

Using include paths: 
-I/lustre/scafellpike/local/HT04048/lxm25/kxc07-lxm25/petsc-main/petsc/include 
-I/lustre/scafellpike/local/HT04048/lxm25/kxc07-lxm25/petsc-main/petsc/arch-linux2-c-debug/include
 -I/lustre/scafellpike/local/apps/intel/intel_cs/2018.0.128/mkl/include 
-I/lustre/scafellpike/local/apps/cuda/11.2/include
-----------------------------------------

Using C linker: mpicc
Using Fortran linker: mpif90
Using libraries: 
-Wl,-rpath,/lustre/scafellpike/local/HT04048/lxm25/kxc07-lxm25/petsc-main/petsc/arch-linux2-c-debug/lib
 
-L/lustre/scafellpike/local/HT04048/lxm25/kxc07-lxm25/petsc-main/petsc/arch-linux2-c-debug/lib
 -lpetsc 
-Wl,-rpath,/lustre/scafellpike/local/HT04048/lxm25/kxc07-lxm25/petsc-main/petsc/arch-linux2-c-debug/lib
 
-L/lustre/scafellpike/local/HT04048/lxm25/kxc07-lxm25/petsc-main/petsc/arch-linux2-c-debug/lib
 
-Wl,-rpath,/lustre/scafellpike/local/apps/intel/intel_cs/2018.0.128/mkl/lib/intel64
 -L/lustre/scafellpike/local/apps/intel/intel_cs/2018.0.128/mkl/lib/intel64 
-Wl,-rpath,/lustre/scafellpike/local/apps/cuda/11.2/lib64 
-L/lustre/scafellpike/local/apps/cuda/11.2/lib64 
-L/lustre/scafellpike/local/apps/cuda/11.2/lib64/stubs 
-Wl,-rpath,/lustre/scafellpike/local/apps/gcc7/openmpi/4.0.4-cuda11.2/lib 
-L/lustre/scafellpike/local/apps/gcc7/openmpi/4.0.4-cuda11.2/lib 
-Wl,-rpath,/opt/lsf/10.1/linux3.10-glibc2.17-x86_64/lib 
-L/opt/lsf/10.1/linux3.10-glibc2.17-x86_64/lib 
-Wl,-rpath,/lustre/scafellpike/local/apps/gcc7/gcc/7.2.0/lib/gcc/x86_64-pc-linux-gnu/7.2.0
 
-L/lustre/scafellpike/local/apps/gcc7/gcc/7.2.0/lib/gcc/x86_64-pc-linux-gnu/7.2.0
 -Wl,-rpath,/lustre/scafellpike/local/apps/gcc7/gcc/7.2.0/lib/gcc 
-L/lustre/scafellpike/local/apps/gcc7/gcc/7.2.0/lib/gcc 
-Wl,-rpath,/lustre/scafellpike/local/apps/gcc7/gcc/7.2.0/lib64 
-L/lustre/scafellpike/local/apps/gcc7/gcc/7.2.0/lib64 
-Wl,-rpath,/lustre/scafellpike/local/apps/gcc7/gcc/7.2.0/lib 
-L/lustre/scafellpike/local/apps/gcc7/gcc/7.2.0/lib -lHYPRE -lmkl_intel_lp64 
-lmkl_core -lmkl_sequential -lpthread -lm -lcudart -lcufft -lcublas -lcusparse 
-lcusolver -lcurand -lcuda -lX11 -lstdc++ -ldl -lmpi_usempi_ignore_tkr 
-lmpi_mpifh -lmpi -lgfortran -lm -lgfortran -lm -lgcc_s -lquadmath -lstdc++ -ldl
-----------------------------------------



      ##########################################################
      #                                                        #
      #                       WARNING!!!                       #
      #                                                        #
      # This code was compiled with GPU support and you've     #
      # created PETSc/GPU objects, but you intentionally used  #
      # -use_gpu_aware_mpi 0, such that PETSc had to copy data #
      # from GPU to CPU for communication. To get meaningfull  #
      # timing results, please use GPU-aware MPI instead.      #
      ##########################################################




      ##########################################################
      #                                                        #
      #                       WARNING!!!                       #
      #                                                        #
      #   This code was compiled with a debugging option.      #
      #   To get timing results run ./configure                #
      #   using --with-debugging=no, the performance will      #
      #   be generally two or three times faster.              #
      #                                                        #
      ##########################################################

