Re: [petsc-users] Trying to understand -log_view when using HIP kernels (ex34)

2024-01-26 Thread Barry Smith

   When run with -log_view_gpu_time, each event has two times: the time of 
the kernel on the GPU (measured directly on the GPU using GPU timers) and the 
time on the CPU clock. The CPU time for the event always encloses the entire 
kernel (hence it is always at least as large as the kernel time). 
Basically, the CPU time for an event where all the action happens on the GPU 
is the time of the kernel launch plus the time to run the kernel and confirm 
it has finished. So the GPU flop rate is the rate actually achieved on the 
GPU, while the CPU flop rate is the effective flop rate the user is getting 
in the application.
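
A minimal sketch of the arithmetic described above, assuming both rates are the same flop count divided by two different times (GPU kernel time versus the enclosing CPU time). The flop count is the "Flop (Max)" value quoted in the thread; the two times are hypothetical, not taken from the attached log, and `flop_rates` is an illustrative helper, not a PETSc function.

```python
def flop_rates(flops, gpu_kernel_seconds, cpu_event_seconds):
    """Return (gpu_mflops, total_mflops) for one logged event.

    gpu_kernel_seconds: kernel time measured by GPU timers.
    cpu_event_seconds:  CPU time enclosing launch, kernel, and completion check.
    """
    if gpu_kernel_seconds <= 0 or cpu_event_seconds <= 0:
        raise ValueError("times must be positive")
    gpu_mflops = flops / gpu_kernel_seconds / 1e6
    total_mflops = flops / cpu_event_seconds / 1e6
    return gpu_mflops, total_mflops


flops = 1.92e9        # flop count quoted in the thread for "Flop (Max)"
gpu_seconds = 0.010   # hypothetical kernel time on the GPU
cpu_seconds = 0.012   # hypothetical CPU time: launch + kernel + confirmation

gpu_rate, total_rate = flop_rates(flops, gpu_seconds, cpu_seconds)
print(f"GPU Mflop/s:   {gpu_rate:.0f}")
print(f"Total Mflop/s: {total_rate:.0f}")
```

Because the CPU time encloses the kernel time, the first rate can never be lower than the second, even when all flops are performed on the GPU.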

  

> On Jan 26, 2024, at 6:48 AM, Anthony Jourdon  wrote:
> 
> Hello,
> 
> Thank you for your answers.
> I am working with Dave May on this topic.
> 
> Still running src/ksp/ksp/tutorials/ex34 with the same options reported by 
> Dave, I added the option -log_view_gpu_time.
> Now the log provides gpu flop/s instead of nans.
> However, I have trouble understanding the numbers reported in the log (file 
> attached).
> The numbers reported for Total Mflop/s and GPU Mflop/s are different even 
> when 100% of the work is supposed to be done on the GPU.
> The numbers reported for GPU Mflop/s are always higher than the numbers 
> reported for Total Mflop/s.
> As I understand, the Total Mflop/s should be the sum of both GPU and CPU 
> flop/s, but if the gpu does 100% of the work, why are there different numbers 
> reported by the GPU and Total flop/s columns and why the GPU flop/s are 
> always higher than the Total flop/s ?
> Or am I missing something?
> 
> Thank you for your attention.
> Anthony Jourdon
> 
> 
> 
> On Sat, Jan 20, 2024 at 02:25, Barry Smith wrote:
>> 
> >>    NaNs indicate we do not have valid computational times for these 
> >> operations; think of them as Not Available. Providing valid times for the 
> >> "inner" operations listed with NaNs requires inaccurate (higher) times for 
> >> the outer operations, since extra synchronization between the CPU and GPU 
> >> must be done to get valid times for the inner operations. We opted to have 
> >> the best valid times for the outer operations since those times reflect 
> >> the time of the application.
>> 
>> 
>> 
>> 
>> 
> >> > On Jan 19, 2024, at 12:35 PM, Dave May wrote:
>> > 
>> > Hi all,
>> > 
>> > I am trying to understand the logging information associated with the 
>> > %flops-performed-on-the-gpu reported by -log_view when running 
>> >   src/ksp/ksp/tutorials/ex34
>> > with the following options
>> > -da_grid_x 192
>> > -da_grid_y 192
>> > -da_grid_z 192
>> > -dm_mat_type seqaijhipsparse
>> > -dm_vec_type seqhip
>> > -ksp_max_it 10
>> > -ksp_monitor
>> > -ksp_type richardson
>> > -ksp_view
>> > -log_view
>> > -mg_coarse_ksp_max_it 2
>> > -mg_coarse_ksp_type richardson
>> > -mg_coarse_pc_type none
>> > -mg_levels_ksp_type richardson
>> > -mg_levels_pc_type none
>> > -options_left
>> > -pc_mg_levels 3
>> > -pc_mg_log
>> > -pc_type mg
>> > 
>> > This config is not intended to actually solve the problem, rather it is a 
>> > stripped down set of options designed to understand what parts of the 
>> > smoothers are being executed on the GPU.
>> > 
>> > With respect to the log file attached, my first set of questions related 
>> > to the data reported under "Event Stage 2: MG Apply".
>> > 
>> > [1] Why is the log littered with nan's?
>> > * I don't understand how and why "GPU Mflop/s" should be reported as nan 
>> > when a value is given for "GPU %F" (see MatMult for example).
>> > 
>> > * For events executed on the GPU, I assume the column "Time (sec)" relates 
>> > to "CPU execute time", this would explain why we see a nan in "Time (sec)" 
>> > for MatMult.
>> > If my assumption is correct, how should I interpret the column "Flop 
>> > (Max)" which is showing 1.92e+09? 
> >> > I would assume if "Time (sec)" relates to the CPU then "Flop (Max)" should 
> >> > also relate to CPU and GPU flops would be logged in "GPU Mflop/s"
>> > 
>> > [2] More curious is that within "Event Stage 2: MG Apply" KSPSolve, 
>> > MGSmooth Level 0, MGSmooth Level 1, MGSmooth Level 2 all report "GPU %F" 
>> > as 93. I believe this value should be 100 as the smoother (and coarse grid 
>> > solver) are configured as richardson(2)+none and thus should run entirely 
>> > on the GPU. 
>> > Furthermore, when one inspects all events listed under "Event Stage 2: MG 
>> > Apply" those events which do flops correctly report "GPU %F" as 100. 
>> > And the events showing "GPU %F" = 0 such as 
>> >   MatHIPSPARSCopyTo, VecCopy, VecSet, PCApply, DCtxSync
>> > don't do any flops (on the CPU or GPU) - which is also correct (although 
>> > non GPU events should show nan??)
>> > 
>> > Hence I am wondering what is the explanation for the missing 7% from "GPU 
>> > %F" for KSPSolve and MGSmooth {0,1,2}??
>> > 
>> > Does anyone understand this -log_view, or can explain to me how to 
>> > interpret it?
>> > 
>> > It 

Re: [petsc-users] Trying to understand -log_view when using HIP kernels (ex34)

2024-01-26 Thread Anthony Jourdon
Hello,

Thank you for your answers.
I am working with Dave May on this topic.

Still running src/ksp/ksp/tutorials/ex34 with the same options reported by
Dave, I added the option -log_view_gpu_time.
Now the log provides gpu flop/s instead of nans.
However, I have trouble understanding the numbers reported in the log (file
attached).

   1. The numbers reported for Total Mflop/s and GPU Mflop/s are different
   even when 100% of the work is supposed to be done on the GPU.
   2. The numbers reported for GPU Mflop/s are always higher than the
   numbers reported for Total Mflop/s.

As I understand it, the Total Mflop/s should be the sum of both GPU and CPU
flop/s, but if the GPU does 100% of the work, why are different numbers
reported in the GPU and Total flop/s columns, and why are the GPU flop/s
always higher than the Total flop/s?
Or am I missing something?

Thank you for your attention.
Anthony Jourdon



On Sat, Jan 20, 2024 at 02:25, Barry Smith wrote:

>
>    NaNs indicate we do not have valid computational times for these
> operations; think of them as Not Available. Providing valid times for the
> "inner" operations listed with NaNs requires inaccurate (higher) times for
> the outer operations, since extra synchronization between the CPU and GPU
> must be done to get valid times for the inner operations. We opted to have
> the best valid times for the outer operations since those times reflect the
> time of the application.
>
>
>
>
>
> > On Jan 19, 2024, at 12:35 PM, Dave May  wrote:
> >
> > Hi all,
> >
> > I am trying to understand the logging information associated with the
> > %flops-performed-on-the-gpu reported by -log_view when running
> >   src/ksp/ksp/tutorials/ex34
> > with the following options
> > -da_grid_x 192
> > -da_grid_y 192
> > -da_grid_z 192
> > -dm_mat_type seqaijhipsparse
> > -dm_vec_type seqhip
> > -ksp_max_it 10
> > -ksp_monitor
> > -ksp_type richardson
> > -ksp_view
> > -log_view
> > -mg_coarse_ksp_max_it 2
> > -mg_coarse_ksp_type richardson
> > -mg_coarse_pc_type none
> > -mg_levels_ksp_type richardson
> > -mg_levels_pc_type none
> > -options_left
> > -pc_mg_levels 3
> > -pc_mg_log
> > -pc_type mg
> >
> > This config is not intended to actually solve the problem, rather it is
> > a stripped down set of options designed to understand what parts of the
> > smoothers are being executed on the GPU.
> >
> > With respect to the log file attached, my first set of questions related
> > to the data reported under "Event Stage 2: MG Apply".
> >
> > [1] Why is the log littered with nan's?
> > * I don't understand how and why "GPU Mflop/s" should be reported as nan
> > when a value is given for "GPU %F" (see MatMult for example).
> >
> > * For events executed on the GPU, I assume the column "Time (sec)"
> > relates to "CPU execute time", this would explain why we see a nan in "Time
> > (sec)" for MatMult.
> > If my assumption is correct, how should I interpret the column "Flop
> > (Max)" which is showing 1.92e+09?
> > I would assume if "Time (sec)" relates to the CPU then "Flop (Max)"
> > should also relate to CPU and GPU flops would be logged in "GPU Mflop/s"
> >
> > [2] More curious is that within "Event Stage 2: MG Apply" KSPSolve,
> > MGSmooth Level 0, MGSmooth Level 1, MGSmooth Level 2 all report "GPU %F" as
> > 93. I believe this value should be 100 as the smoother (and coarse grid
> > solver) are configured as richardson(2)+none and thus should run entirely
> > on the GPU.
> > Furthermore, when one inspects all events listed under "Event Stage 2:
> > MG Apply" those events which do flops correctly report "GPU %F" as 100.
> > And the events showing "GPU %F" = 0 such as
> >   MatHIPSPARSCopyTo, VecCopy, VecSet, PCApply, DCtxSync
> > don't do any flops (on the CPU or GPU) - which is also correct (although
> > non GPU events should show nan??)
> >
> > Hence I am wondering what is the explanation for the missing 7% from
> > "GPU %F" for KSPSolve and MGSmooth {0,1,2}??
> >
> > Does anyone understand this -log_view, or can explain to me how to
> > interpret it?
> >
> > It could simply be that:
> > a) something is messed up with -pc_mg_log
> > b) something is messed up with the PETSc build
> > c) I am putting too much faith in -log_view and should profile the code
> > differently.
> >
> > Either way I'd really like to understand what is going on.
> >
> >
> > Cheers,
> > Dave
> >
> >
> >
> > 
>
>


ex34_192_mg_seqhip_richardson_pcnone_gpulog.out
Description: Binary data


Re: [petsc-users] Trying to understand -log_view when using HIP kernels (ex34)

2024-01-19 Thread Junchao Zhang
I reproduced this HIPSPARSE_STATUS_INVALID_VALUE error, but have not yet
found obvious input argument errors for this hipsparse call.

On Fri, Jan 19, 2024 at 2:18 PM Barry Smith  wrote:

>
>   Junchao
>
> I run the following on the CI machine, why does this happen? With
> trivial solver options it runs ok.
>
> bsmith@petsc-gpu-02:/scratch/bsmith/petsc/src/ksp/ksp/tutorials$ ./ex34
> -da_grid_x 192 -da_grid_y 192 -da_grid_z 192 -dm_mat_type seqaijhipsparse
> -dm_vec_type seqhip -ksp_max_it 10 -ksp_monitor -ksp_type richardson
> -ksp_view -log_view -mg_coarse_ksp_max_it 2 -mg_coarse_ksp_type richardson
> -mg_coarse_pc_type none -mg_levels_ksp_type richardson -mg_levels_pc_type
> none -options_left -pc_mg_levels 3 -pc_mg_log -pc_type mg
>
> [0]PETSC ERROR: - Error Message
> --
>
> [0]PETSC ERROR: GPU error
>
> [0]PETSC ERROR: hipSPARSE errorcode 3 (HIPSPARSE_STATUS_INVALID_VALUE)
>
> [0]PETSC ERROR: WARNING! There are unused option(s) set! Could be the
> program crashed before usage or a spelling mistake, etc!
>
> [0]PETSC ERROR:   Option left: name:-options_left (no value) source:
> command line
>
> [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting.
>
> [0]PETSC ERROR: Petsc Release Version 3.20.3, unknown
>
> [0]PETSC ERROR: ./ex34 on a  named petsc-gpu-02 by bsmith Fri Jan 19
> 14:15:20 2024
>
> [0]PETSC ERROR: Configure options
> --package-prefix-hash=/home/bsmith/petsc-hash-pkgs --with-make-np=24
> --with-make-test-np=8 --with-hipc=/opt/rocm-5.4.3/bin/hipcc
> --with-hip-dir=/opt/rocm-5.4.3 COPTFLAGS="-g -O" FOPTFLAGS="-g -O"
> CXXOPTFLAGS="-g -O" HIPOPTFLAGS="-g -O" --with-cuda=0 --with-hip=1
> --with-precision=double --with-clanguage=c --download-kokkos
> --download-kokkos-kernels --download-hypre --download-magma
> --with-magma-fortran-bindings=0 --download-mfem --download-metis
> --with-strict-petscerrorcode PETSC_ARCH=arch-ci-linux-hip-double
>
> [0]PETSC ERROR: #1 MatMultAddKernel_SeqAIJHIPSPARSE() at
> /scratch/bsmith/petsc/src/mat/impls/aij/seq/seqhipsparse/aijhipsparse.hip.cpp:3131
>
> [0]PETSC ERROR: #2 MatMultAdd_SeqAIJHIPSPARSE() at
> /scratch/bsmith/petsc/src/mat/impls/aij/seq/seqhipsparse/aijhipsparse.hip.cpp:3004
>
> [0]PETSC ERROR: #3 MatMultAdd() at
> /scratch/bsmith/petsc/src/mat/interface/matrix.c:2770
>
> [0]PETSC ERROR: #4 MatInterpolateAdd() at
> /scratch/bsmith/petsc/src/mat/interface/matrix.c:8603
>
> [0]PETSC ERROR: #5 PCMGMCycle_Private() at
> /scratch/bsmith/petsc/src/ksp/pc/impls/mg/mg.c:87
>
> [0]PETSC ERROR: #6 PCMGMCycle_Private() at
> /scratch/bsmith/petsc/src/ksp/pc/impls/mg/mg.c:83
>
> [0]PETSC ERROR: #7 PCApply_MG_Internal() at
> /scratch/bsmith/petsc/src/ksp/pc/impls/mg/mg.c:611
>
> [0]PETSC ERROR: #8 PCApply_MG() at
> /scratch/bsmith/petsc/src/ksp/pc/impls/mg/mg.c:633
>
> [0]PETSC ERROR: #9 PCApply() at
> /scratch/bsmith/petsc/src/ksp/pc/interface/precon.c:498
>
> [0]PETSC ERROR: #10 KSP_PCApply() at
> /scratch/bsmith/petsc/include/petsc/private/kspimpl.h:383
>
> [0]PETSC ERROR: #11 KSPSolve_Richardson() at
> /scratch/bsmith/petsc/src/ksp/ksp/impls/rich/rich.c:106
>
> [0]PETSC ERROR: #12 KSPSolve_Private() at
> /scratch/bsmith/petsc/src/ksp/ksp/interface/itfunc.c:906
>
> [0]PETSC ERROR: #13 KSPSolve() at
> /scratch/bsmith/petsc/src/ksp/ksp/interface/itfunc.c:1079
>
> [0]PETSC ERROR: #14 main() at ex34.c:52
>
> [0]PETSC ERROR: PETSc Option Table entries:
>
>   Dave,
>
> Trying to debug the 7% now, but having trouble running, as you see
> above.
>
>
>
> On Jan 19, 2024, at 3:02 PM, Dave May  wrote:
>
> Thank you Barry and Junchao for these explanations. I'll turn on
> -log_view_gpu_time.
>
> Do either of you have any thoughts regarding why the percentage of flop's
> being reported on the GPU is not 100% for MGSmooth Level {0,1,2} for this
> solver configuration?
>
> This number should have nothing to do with timings as it reports the ratio
> of operations performed on the GPU and CPU, presumably obtained from
> PetscLogFlops() and PetscLogGpuFlops().
>
> Cheers,
> Dave
>
> On Fri, 19 Jan 2024 at 11:39, Junchao Zhang 
> wrote:
>
>> Try to also add -log_view_gpu_time,
>> https://petsc.org/release/manualpages/Profiling/PetscLogGpuTime/
>>
>> --Junchao Zhang
>>
>>
>> On Fri, Jan 19, 2024 at 11:35 AM Dave May 
>> wrote:
>>
>>> Hi all,
>>>
>>> I am trying to understand the logging information associated with the
>>> %flops-performed-on-the-gpu reported by -log_view when running
>>>   src/ksp/ksp/tutorials/ex34
>>> with the following options
>>> -da_grid_x 192
>>> -da_grid_y 192
>>> -da_grid_z 192
>>> -dm_mat_type seqaijhipsparse
>>> -dm_vec_type seqhip
>>> -ksp_max_it 10
>>> -ksp_monitor
>>> -ksp_type richardson
>>> -ksp_view
>>> -log_view
>>> -mg_coarse_ksp_max_it 2
>>> -mg_coarse_ksp_type richardson
>>> -mg_coarse_pc_type none
>>> -mg_levels_ksp_type richardson
>>> -mg_levels_pc_type none
>>> -options_left
>>> -pc_mg_levels 3
>>> -pc_mg_log
>>> -pc_type mg
>>>

Re: [petsc-users] Trying to understand -log_view when using HIP kernels (ex34)

2024-01-19 Thread Barry Smith

   Junchao,

How come vecseqcupm_impl.hpp has PetscCall(PetscLogFlops(n)); instead of 
logging the flops on the GPU? 

This could be the root of the problem: the VecShift used to remove the null 
space from vectors in the solver is logging incorrectly. (For some reason there 
is no PetscLogEventBegin/End() for VecShift, which is why it doesn't get its 
own line in the -log_view output.)
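
A minimal sketch of how a mis-attributed flop count could pull "GPU %F" below 100, assuming the column is the ratio of GPU-logged flops to all logged flops (as Dave presumes elsewhere in the thread) and that counts logged with PetscLogFlops() are attributed to the CPU. The flop counts are hypothetical and chosen only to mirror the 93% reported in the thread; `gpu_flop_percent` is an illustrative helper, not a PETSc function.

```python
def gpu_flop_percent(cpu_logged_flops, gpu_logged_flops):
    """Percentage of logged flops that were attributed to the GPU."""
    total = cpu_logged_flops + gpu_logged_flops
    if total == 0:
        return 0.0
    return 100.0 * gpu_logged_flops / total


# Hypothetical counts: every operation runs on the GPU, but one of them
# (standing in for VecShift) records its flops on the CPU counter.
other_ops_flops = 93.0e6   # logged as GPU flops
shift_flops = 7.0e6        # executed on the GPU, logged as CPU flops

misattributed = gpu_flop_percent(shift_flops, other_ops_flops)
corrected = gpu_flop_percent(0.0, other_ops_flops + shift_flops)

print(f"GPU %F with mis-logged shift: {misattributed:.0f}")
print(f"GPU %F with shift logged on GPU: {corrected:.0f}")
```

Under these assumptions the percentage depends only on which counter each operation logs to, not on any timing.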

   Barry




> On Jan 19, 2024, at 3:17 PM, Barry Smith  wrote:
> 
> 
>   Junchao
> 
> I run the following on the CI machine, why does this happen? With trivial 
> solver options it runs ok.
> 
> bsmith@petsc-gpu-02:/scratch/bsmith/petsc/src/ksp/ksp/tutorials$ ./ex34 
> -da_grid_x 192 -da_grid_y 192 -da_grid_z 192 -dm_mat_type seqaijhipsparse 
> -dm_vec_type seqhip -ksp_max_it 10 -ksp_monitor -ksp_type richardson 
> -ksp_view -log_view -mg_coarse_ksp_max_it 2 -mg_coarse_ksp_type richardson 
> -mg_coarse_pc_type none -mg_levels_ksp_type richardson -mg_levels_pc_type 
> none -options_left -pc_mg_levels 3 -pc_mg_log -pc_type mg
> [0]PETSC ERROR: - Error Message 
> --
> [0]PETSC ERROR: GPU error
> [0]PETSC ERROR: hipSPARSE errorcode 3 (HIPSPARSE_STATUS_INVALID_VALUE)
> [0]PETSC ERROR: WARNING! There are unused option(s) set! Could be the program 
> crashed before usage or a spelling mistake, etc!
> [0]PETSC ERROR:   Option left: name:-options_left (no value) source: command 
> line
> [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting.
> [0]PETSC ERROR: Petsc Release Version 3.20.3, unknown 
> [0]PETSC ERROR: ./ex34 on a  named petsc-gpu-02 by bsmith Fri Jan 19 14:15:20 
> 2024
> [0]PETSC ERROR: Configure options 
> --package-prefix-hash=/home/bsmith/petsc-hash-pkgs --with-make-np=24 
> --with-make-test-np=8 --with-hipc=/opt/rocm-5.4.3/bin/hipcc 
> --with-hip-dir=/opt/rocm-5.4.3 COPTFLAGS="-g -O" FOPTFLAGS="-g -O" 
> CXXOPTFLAGS="-g -O" HIPOPTFLAGS="-g -O" --with-cuda=0 --with-hip=1 
> --with-precision=double --with-clanguage=c --download-kokkos 
> --download-kokkos-kernels --download-hypre --download-magma 
> --with-magma-fortran-bindings=0 --download-mfem --download-metis 
> --with-strict-petscerrorcode PETSC_ARCH=arch-ci-linux-hip-double
> [0]PETSC ERROR: #1 MatMultAddKernel_SeqAIJHIPSPARSE() at 
> /scratch/bsmith/petsc/src/mat/impls/aij/seq/seqhipsparse/aijhipsparse.hip.cpp:3131
> [0]PETSC ERROR: #2 MatMultAdd_SeqAIJHIPSPARSE() at 
> /scratch/bsmith/petsc/src/mat/impls/aij/seq/seqhipsparse/aijhipsparse.hip.cpp:3004
> [0]PETSC ERROR: #3 MatMultAdd() at 
> /scratch/bsmith/petsc/src/mat/interface/matrix.c:2770
> [0]PETSC ERROR: #4 MatInterpolateAdd() at 
> /scratch/bsmith/petsc/src/mat/interface/matrix.c:8603
> [0]PETSC ERROR: #5 PCMGMCycle_Private() at 
> /scratch/bsmith/petsc/src/ksp/pc/impls/mg/mg.c:87
> [0]PETSC ERROR: #6 PCMGMCycle_Private() at 
> /scratch/bsmith/petsc/src/ksp/pc/impls/mg/mg.c:83
> [0]PETSC ERROR: #7 PCApply_MG_Internal() at 
> /scratch/bsmith/petsc/src/ksp/pc/impls/mg/mg.c:611
> [0]PETSC ERROR: #8 PCApply_MG() at 
> /scratch/bsmith/petsc/src/ksp/pc/impls/mg/mg.c:633
> [0]PETSC ERROR: #9 PCApply() at 
> /scratch/bsmith/petsc/src/ksp/pc/interface/precon.c:498
> [0]PETSC ERROR: #10 KSP_PCApply() at 
> /scratch/bsmith/petsc/include/petsc/private/kspimpl.h:383
> [0]PETSC ERROR: #11 KSPSolve_Richardson() at 
> /scratch/bsmith/petsc/src/ksp/ksp/impls/rich/rich.c:106
> [0]PETSC ERROR: #12 KSPSolve_Private() at 
> /scratch/bsmith/petsc/src/ksp/ksp/interface/itfunc.c:906
> [0]PETSC ERROR: #13 KSPSolve() at 
> /scratch/bsmith/petsc/src/ksp/ksp/interface/itfunc.c:1079
> [0]PETSC ERROR: #14 main() at ex34.c:52
> [0]PETSC ERROR: PETSc Option Table entries:
> 
>   Dave,
> 
> Trying to debug the 7% now, but having trouble running, as you see above.
> 
> 
> 
>> On Jan 19, 2024, at 3:02 PM, Dave May  wrote:
>> 
>> Thank you Barry and Junchao for these explanations. I'll turn on 
>> -log_view_gpu_time.
>> 
>> Do either of you have any thoughts regarding why the percentage of flop's 
>> being reported on the GPU is not 100% for MGSmooth Level {0,1,2} for this 
>> solver configuration?
>> 
>> This number should have nothing to do with timings as it reports the ratio 
>> of operations performed on the GPU and CPU, presumably obtained from 
>> PetscLogFlops() and PetscLogGpuFlops().
>> 
>> Cheers,
>> Dave
>> 
> >> On Fri, 19 Jan 2024 at 11:39, Junchao Zhang wrote:
>>> Try to also add -log_view_gpu_time, 
>>> https://petsc.org/release/manualpages/Profiling/PetscLogGpuTime/
>>> 
>>> --Junchao Zhang
>>> 
>>> 
> >>> On Fri, Jan 19, 2024 at 11:35 AM Dave May wrote:
> >>>> Hi all,
> >>>> 
> >>>> I am trying to understand the logging information associated with the 
> >>>> %flops-performed-on-the-gpu reported by -log_view when running 
> >>>>   src/ksp/ksp/tutorials/ex34
> >>>> with the following options
> >>>> -da_grid_x 192
> >>>> -da_grid_y 192
> >>>> -da_grid_z 192
 

Re: [petsc-users] Trying to understand -log_view when using HIP kernels (ex34)

2024-01-19 Thread Dave May
Thanks Barry!

On Fri, 19 Jan 2024 at 12:18, Barry Smith  wrote:

>
>   Junchao
>
> I run the following on the CI machine, why does this happen? With
> trivial solver options it runs ok.
>
> bsmith@petsc-gpu-02:/scratch/bsmith/petsc/src/ksp/ksp/tutorials$ ./ex34
> -da_grid_x 192 -da_grid_y 192 -da_grid_z 192 -dm_mat_type seqaijhipsparse
> -dm_vec_type seqhip -ksp_max_it 10 -ksp_monitor -ksp_type richardson
> -ksp_view -log_view -mg_coarse_ksp_max_it 2 -mg_coarse_ksp_type richardson
> -mg_coarse_pc_type none -mg_levels_ksp_type richardson -mg_levels_pc_type
> none -options_left -pc_mg_levels 3 -pc_mg_log -pc_type mg
>
> [0]PETSC ERROR: - Error Message
> --
>
> [0]PETSC ERROR: GPU error
>
> [0]PETSC ERROR: hipSPARSE errorcode 3 (HIPSPARSE_STATUS_INVALID_VALUE)
>
> [0]PETSC ERROR: WARNING! There are unused option(s) set! Could be the
> program crashed before usage or a spelling mistake, etc!
>
> [0]PETSC ERROR:   Option left: name:-options_left (no value) source:
> command line
>
> [0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting.
>
> [0]PETSC ERROR: Petsc Release Version 3.20.3, unknown
>
> [0]PETSC ERROR: ./ex34 on a  named petsc-gpu-02 by bsmith Fri Jan 19
> 14:15:20 2024
>
> [0]PETSC ERROR: Configure options
> --package-prefix-hash=/home/bsmith/petsc-hash-pkgs --with-make-np=24
> --with-make-test-np=8 --with-hipc=/opt/rocm-5.4.3/bin/hipcc
> --with-hip-dir=/opt/rocm-5.4.3 COPTFLAGS="-g -O" FOPTFLAGS="-g -O"
> CXXOPTFLAGS="-g -O" HIPOPTFLAGS="-g -O" --with-cuda=0 --with-hip=1
> --with-precision=double --with-clanguage=c --download-kokkos
> --download-kokkos-kernels --download-hypre --download-magma
> --with-magma-fortran-bindings=0 --download-mfem --download-metis
> --with-strict-petscerrorcode PETSC_ARCH=arch-ci-linux-hip-double
>
> [0]PETSC ERROR: #1 MatMultAddKernel_SeqAIJHIPSPARSE() at
> /scratch/bsmith/petsc/src/mat/impls/aij/seq/seqhipsparse/aijhipsparse.hip.cpp:3131
>
> [0]PETSC ERROR: #2 MatMultAdd_SeqAIJHIPSPARSE() at
> /scratch/bsmith/petsc/src/mat/impls/aij/seq/seqhipsparse/aijhipsparse.hip.cpp:3004
>
> [0]PETSC ERROR: #3 MatMultAdd() at
> /scratch/bsmith/petsc/src/mat/interface/matrix.c:2770
>
> [0]PETSC ERROR: #4 MatInterpolateAdd() at
> /scratch/bsmith/petsc/src/mat/interface/matrix.c:8603
>
> [0]PETSC ERROR: #5 PCMGMCycle_Private() at
> /scratch/bsmith/petsc/src/ksp/pc/impls/mg/mg.c:87
>
> [0]PETSC ERROR: #6 PCMGMCycle_Private() at
> /scratch/bsmith/petsc/src/ksp/pc/impls/mg/mg.c:83
>
> [0]PETSC ERROR: #7 PCApply_MG_Internal() at
> /scratch/bsmith/petsc/src/ksp/pc/impls/mg/mg.c:611
>
> [0]PETSC ERROR: #8 PCApply_MG() at
> /scratch/bsmith/petsc/src/ksp/pc/impls/mg/mg.c:633
>
> [0]PETSC ERROR: #9 PCApply() at
> /scratch/bsmith/petsc/src/ksp/pc/interface/precon.c:498
>
> [0]PETSC ERROR: #10 KSP_PCApply() at
> /scratch/bsmith/petsc/include/petsc/private/kspimpl.h:383
>
> [0]PETSC ERROR: #11 KSPSolve_Richardson() at
> /scratch/bsmith/petsc/src/ksp/ksp/impls/rich/rich.c:106
>
> [0]PETSC ERROR: #12 KSPSolve_Private() at
> /scratch/bsmith/petsc/src/ksp/ksp/interface/itfunc.c:906
>
> [0]PETSC ERROR: #13 KSPSolve() at
> /scratch/bsmith/petsc/src/ksp/ksp/interface/itfunc.c:1079
>
> [0]PETSC ERROR: #14 main() at ex34.c:52
>
> [0]PETSC ERROR: PETSc Option Table entries:
>
>   Dave,
>
> Trying to debug the 7% now, but having trouble running, as you see
> above.
>
>
>
> On Jan 19, 2024, at 3:02 PM, Dave May  wrote:
>
> Thank you Barry and Junchao for these explanations. I'll turn on
> -log_view_gpu_time.
>
> Do either of you have any thoughts regarding why the percentage of flop's
> being reported on the GPU is not 100% for MGSmooth Level {0,1,2} for this
> solver configuration?
>
> This number should have nothing to do with timings as it reports the ratio
> of operations performed on the GPU and CPU, presumably obtained from
> PetscLogFlops() and PetscLogGpuFlops().
>
> Cheers,
> Dave
>
> On Fri, 19 Jan 2024 at 11:39, Junchao Zhang 
> wrote:
>
>> Try to also add -log_view_gpu_time,
>> https://petsc.org/release/manualpages/Profiling/PetscLogGpuTime/
>>
>> --Junchao Zhang
>>
>>
>> On Fri, Jan 19, 2024 at 11:35 AM Dave May 
>> wrote:
>>
>>> Hi all,
>>>
>>> I am trying to understand the logging information associated with the
>>> %flops-performed-on-the-gpu reported by -log_view when running
>>>   src/ksp/ksp/tutorials/ex34
>>> with the following options
>>> -da_grid_x 192
>>> -da_grid_y 192
>>> -da_grid_z 192
>>> -dm_mat_type seqaijhipsparse
>>> -dm_vec_type seqhip
>>> -ksp_max_it 10
>>> -ksp_monitor
>>> -ksp_type richardson
>>> -ksp_view
>>> -log_view
>>> -mg_coarse_ksp_max_it 2
>>> -mg_coarse_ksp_type richardson
>>> -mg_coarse_pc_type none
>>> -mg_levels_ksp_type richardson
>>> -mg_levels_pc_type none
>>> -options_left
>>> -pc_mg_levels 3
>>> -pc_mg_log
>>> -pc_type mg
>>>
>>> This config is not intended to actually solve the problem, rather it is
>>> a stripped down set of options designed to 

Re: [petsc-users] Trying to understand -log_view when using HIP kernels (ex34)

2024-01-19 Thread Barry Smith

  Junchao

I ran the following on the CI machine; why does this happen? With trivial 
solver options it runs OK.

bsmith@petsc-gpu-02:/scratch/bsmith/petsc/src/ksp/ksp/tutorials$ ./ex34 
-da_grid_x 192 -da_grid_y 192 -da_grid_z 192 -dm_mat_type seqaijhipsparse 
-dm_vec_type seqhip -ksp_max_it 10 -ksp_monitor -ksp_type richardson -ksp_view 
-log_view -mg_coarse_ksp_max_it 2 -mg_coarse_ksp_type richardson 
-mg_coarse_pc_type none -mg_levels_ksp_type richardson -mg_levels_pc_type none 
-options_left -pc_mg_levels 3 -pc_mg_log -pc_type mg
[0]PETSC ERROR: - Error Message 
--
[0]PETSC ERROR: GPU error
[0]PETSC ERROR: hipSPARSE errorcode 3 (HIPSPARSE_STATUS_INVALID_VALUE)
[0]PETSC ERROR: WARNING! There are unused option(s) set! Could be the program 
crashed before usage or a spelling mistake, etc!
[0]PETSC ERROR:   Option left: name:-options_left (no value) source: command 
line
[0]PETSC ERROR: See https://petsc.org/release/faq/ for trouble shooting.
[0]PETSC ERROR: Petsc Release Version 3.20.3, unknown 
[0]PETSC ERROR: ./ex34 on a  named petsc-gpu-02 by bsmith Fri Jan 19 14:15:20 
2024
[0]PETSC ERROR: Configure options 
--package-prefix-hash=/home/bsmith/petsc-hash-pkgs --with-make-np=24 
--with-make-test-np=8 --with-hipc=/opt/rocm-5.4.3/bin/hipcc 
--with-hip-dir=/opt/rocm-5.4.3 COPTFLAGS="-g -O" FOPTFLAGS="-g -O" 
CXXOPTFLAGS="-g -O" HIPOPTFLAGS="-g -O" --with-cuda=0 --with-hip=1 
--with-precision=double --with-clanguage=c --download-kokkos 
--download-kokkos-kernels --download-hypre --download-magma 
--with-magma-fortran-bindings=0 --download-mfem --download-metis 
--with-strict-petscerrorcode PETSC_ARCH=arch-ci-linux-hip-double
[0]PETSC ERROR: #1 MatMultAddKernel_SeqAIJHIPSPARSE() at 
/scratch/bsmith/petsc/src/mat/impls/aij/seq/seqhipsparse/aijhipsparse.hip.cpp:3131
[0]PETSC ERROR: #2 MatMultAdd_SeqAIJHIPSPARSE() at 
/scratch/bsmith/petsc/src/mat/impls/aij/seq/seqhipsparse/aijhipsparse.hip.cpp:3004
[0]PETSC ERROR: #3 MatMultAdd() at 
/scratch/bsmith/petsc/src/mat/interface/matrix.c:2770
[0]PETSC ERROR: #4 MatInterpolateAdd() at 
/scratch/bsmith/petsc/src/mat/interface/matrix.c:8603
[0]PETSC ERROR: #5 PCMGMCycle_Private() at 
/scratch/bsmith/petsc/src/ksp/pc/impls/mg/mg.c:87
[0]PETSC ERROR: #6 PCMGMCycle_Private() at 
/scratch/bsmith/petsc/src/ksp/pc/impls/mg/mg.c:83
[0]PETSC ERROR: #7 PCApply_MG_Internal() at 
/scratch/bsmith/petsc/src/ksp/pc/impls/mg/mg.c:611
[0]PETSC ERROR: #8 PCApply_MG() at 
/scratch/bsmith/petsc/src/ksp/pc/impls/mg/mg.c:633
[0]PETSC ERROR: #9 PCApply() at 
/scratch/bsmith/petsc/src/ksp/pc/interface/precon.c:498
[0]PETSC ERROR: #10 KSP_PCApply() at 
/scratch/bsmith/petsc/include/petsc/private/kspimpl.h:383
[0]PETSC ERROR: #11 KSPSolve_Richardson() at 
/scratch/bsmith/petsc/src/ksp/ksp/impls/rich/rich.c:106
[0]PETSC ERROR: #12 KSPSolve_Private() at 
/scratch/bsmith/petsc/src/ksp/ksp/interface/itfunc.c:906
[0]PETSC ERROR: #13 KSPSolve() at 
/scratch/bsmith/petsc/src/ksp/ksp/interface/itfunc.c:1079
[0]PETSC ERROR: #14 main() at ex34.c:52
[0]PETSC ERROR: PETSc Option Table entries:

  Dave,

Trying to debug the 7% now, but having trouble running, as you see above.



> On Jan 19, 2024, at 3:02 PM, Dave May  wrote:
> 
> Thank you Barry and Junchao for these explanations. I'll turn on 
> -log_view_gpu_time.
> 
> Do either of you have any thoughts regarding why the percentage of flop's 
> being reported on the GPU is not 100% for MGSmooth Level {0,1,2} for this 
> solver configuration?
> 
> This number should have nothing to do with timings as it reports the ratio of 
> operations performed on the GPU and CPU, presumably obtained from 
> PetscLogFlops() and PetscLogGpuFlops().
> 
> Cheers,
> Dave
> 
> On Fri, 19 Jan 2024 at 11:39, Junchao Zhang wrote:
>> Try to also add -log_view_gpu_time, 
>> https://petsc.org/release/manualpages/Profiling/PetscLogGpuTime/
>> 
>> --Junchao Zhang
>> 
>> 
> >> On Fri, Jan 19, 2024 at 11:35 AM Dave May wrote:
>>> Hi all,
>>> 
>>> I am trying to understand the logging information associated with the 
>>> %flops-performed-on-the-gpu reported by -log_view when running 
>>>   src/ksp/ksp/tutorials/ex34
>>> with the following options
>>> -da_grid_x 192
>>> -da_grid_y 192
>>> -da_grid_z 192
>>> -dm_mat_type seqaijhipsparse
>>> -dm_vec_type seqhip
>>> -ksp_max_it 10
>>> -ksp_monitor
>>> -ksp_type richardson
>>> -ksp_view
>>> -log_view
>>> -mg_coarse_ksp_max_it 2
>>> -mg_coarse_ksp_type richardson
>>> -mg_coarse_pc_type none
>>> -mg_levels_ksp_type richardson
>>> -mg_levels_pc_type none
>>> -options_left
>>> -pc_mg_levels 3
>>> -pc_mg_log
>>> -pc_type mg
>>> 
>>> This config is not intended to actually solve the problem, rather it is a 
>>> stripped down set of options designed to understand what parts of the 
>>> smoothers are being executed on the GPU.
>>> 
>>> With respect to the log file attached, my first set 

Re: [petsc-users] Trying to understand -log_view when using HIP kernels (ex34)

2024-01-19 Thread Dave May
Thank you Barry and Junchao for these explanations. I'll turn on
-log_view_gpu_time.

Do either of you have any thoughts regarding why the percentage of flops
being reported on the GPU is not 100% for MGSmooth Level {0,1,2} for this
solver configuration?

This number should have nothing to do with timings as it reports the ratio
of operations performed on the GPU and CPU, presumably obtained from
PetscLogFlops() and PetscLogGpuFlops().

Cheers,
Dave

On Fri, 19 Jan 2024 at 11:39, Junchao Zhang  wrote:

> Try to also add -log_view_gpu_time,
> https://petsc.org/release/manualpages/Profiling/PetscLogGpuTime/
>
> --Junchao Zhang
>
>
> On Fri, Jan 19, 2024 at 11:35 AM Dave May  wrote:
>
>> Hi all,
>>
>> I am trying to understand the logging information associated with the
>> %flops-performed-on-the-gpu reported by -log_view when running
>>   src/ksp/ksp/tutorials/ex34
>> with the following options
>> -da_grid_x 192
>> -da_grid_y 192
>> -da_grid_z 192
>> -dm_mat_type seqaijhipsparse
>> -dm_vec_type seqhip
>> -ksp_max_it 10
>> -ksp_monitor
>> -ksp_type richardson
>> -ksp_view
>> -log_view
>> -mg_coarse_ksp_max_it 2
>> -mg_coarse_ksp_type richardson
>> -mg_coarse_pc_type none
>> -mg_levels_ksp_type richardson
>> -mg_levels_pc_type none
>> -options_left
>> -pc_mg_levels 3
>> -pc_mg_log
>> -pc_type mg
>>
>> This config is not intended to actually solve the problem, rather it is a
>> stripped down set of options designed to understand what parts of the
>> smoothers are being executed on the GPU.
>>
>> With respect to the log file attached, my first set of questions related
>> to the data reported under "Event Stage 2: MG Apply".
>>
>> [1] Why is the log littered with nan's?
>> * I don't understand how and why "GPU Mflop/s" should be reported as nan
>> when a value is given for "GPU %F" (see MatMult for example).
>>
>> * For events executed on the GPU, I assume the column "Time (sec)"
>> relates to "CPU execute time", this would explain why we see a nan in "Time
>> (sec)" for MatMult.
>> If my assumption is correct, how should I interpret the column "Flop
>> (Max)" which is showing 1.92e+09?
>> I would assume if "Time (sec)" relates to the CPU then "Flop (Max)"
>> should also relate to CPU and GPU flops would be logged in "GPU Mflop/s"
>>
>> [2] More curious is that within "Event Stage 2: MG Apply" KSPSolve,
>> MGSmooth Level 0, MGSmooth Level 1, MGSmooth Level 2 all report "GPU %F" as
>> 93. I believe this value should be 100 as the smoother (and coarse grid
>> solver) are configured as richardson(2)+none and thus should run entirely
>> on the GPU.
>> Furthermore, when one inspects all events listed under "Event Stage 2: MG
>> Apply" those events which do flops correctly report "GPU %F" as 100.
>> And the events showing "GPU %F" = 0 such as
>>   MatHIPSPARSCopyTo, VecCopy, VecSet, PCApply, DCtxSync
>> don't do any flops (on the CPU or GPU) - which is also correct
>> (although non GPU events should show nan??)
>>
>> Hence I am wondering what is the explanation for the missing 7% from "GPU
>> %F" for KSPSolve and MGSmooth {0,1,2}??
>>
>> Does anyone understand this -log_view, or can explain to me how to
>> interpret it?
>>
>> It could simply be that:
>> a) something is messed up with -pc_mg_log
>> b) something is messed up with the PETSc build
>> c) I am putting too much faith in -log_view and should profile the code
>> differently.
>>
>> Either way I'd really like to understand what is going on.
>>
>>
>> Cheers,
>> Dave
>>
>>
>>
>>


Re: [petsc-users] Trying to understand -log_view when using HIP kernels (ex34)

2024-01-19 Thread Junchao Zhang
Try to also add -log_view_gpu_time,
https://petsc.org/release/manualpages/Profiling/PetscLogGpuTime/

--Junchao Zhang


On Fri, Jan 19, 2024 at 11:35 AM Dave May  wrote:

> Hi all,
>
> I am trying to understand the logging information associated with the
> %flops-performed-on-the-gpu reported by -log_view when running
>   src/ksp/ksp/tutorials/ex34
> with the following options
> -da_grid_x 192
> -da_grid_y 192
> -da_grid_z 192
> -dm_mat_type seqaijhipsparse
> -dm_vec_type seqhip
> -ksp_max_it 10
> -ksp_monitor
> -ksp_type richardson
> -ksp_view
> -log_view
> -mg_coarse_ksp_max_it 2
> -mg_coarse_ksp_type richardson
> -mg_coarse_pc_type none
> -mg_levels_ksp_type richardson
> -mg_levels_pc_type none
> -options_left
> -pc_mg_levels 3
> -pc_mg_log
> -pc_type mg
>
> This config is not intended to actually solve the problem, rather it is a
> stripped down set of options designed to understand what parts of the
> smoothers are being executed on the GPU.
>
> With respect to the log file attached, my first set of questions relates
> to the data reported under "Event Stage 2: MG Apply".
>
> [1] Why is the log littered with nan's?
> * I don't understand how and why "GPU Mflop/s" should be reported as nan
> when a value is given for "GPU %F" (see MatMult for example).
>
> * For events executed on the GPU, I assume the column "Time (sec)" relates
> to "CPU execute time", this would explain why we see a nan in "Time (sec)"
> for MatMult.
> If my assumption is correct, how should I interpret the column "Flop
> (Max)" which is showing 1.92e+09?
> I would assume that if "Time (sec)" relates to the CPU then "Flop (Max)" should
> also relate to the CPU, and GPU flops would be logged in "GPU Mflop/s".
>
> [2] More curious is that within "Event Stage 2: MG Apply" KSPSolve,
> MGSmooth Level 0, MGSmooth Level 1, MGSmooth Level 2 all report "GPU %F" as
> 93. I believe this value should be 100 as the smoother (and coarse grid
> solver) are configured as richardson(2)+none and thus should run entirely
> on the GPU.
> Furthermore, when one inspects all events listed under "Event Stage 2: MG
> Apply" those events which do flops correctly report "GPU %F" as 100.
> And the events showing "GPU %F" = 0 such as
>   MatHIPSPARSCopyTo, VecCopy, VecSet, PCApply, DCtxSync
> don't do any flops (on the CPU or GPU) - which is also correct
> (although non GPU events should show nan??)
>
> Hence I am wondering what is the explanation for the missing 7% from "GPU
> %F" for KSPSolve and MGSmooth {0,1,2}??
>
> Does anyone understand this -log_view, or can explain to me how to
> interpret it?
>
> It could simply be that:
> a) something is messed up with -pc_mg_log
> b) something is messed up with the PETSc build
> c) I am putting too much faith in -log_view and should profile the code
> differently.
>
> Either way I'd really like to understand what is going on.
>
>
> Cheers,
> Dave
>
>
>
>


Re: [petsc-users] Trying to understand -log_view when using HIP kernels (ex34)

2024-01-19 Thread Barry Smith


   NaNs indicate we do not have valid computational times for these operations; 
think of them as Not Available. Providing valid times for the "inner" 
operations listed with NaNs would make the times for the outer operations 
inaccurate (higher), since extra synchronization between the CPU and GPU must 
be done to get valid times for the inner operations. We opted to have the best 
valid times for the outer operations since those times reflect the time of the 
application.
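
The trade-off above can be sketched with a toy timing model in Python. This is not how PETSc measures anything; the function simulate, the launch cost, and the kernel run times are made-up assumptions. The model has a CPU that launches kernels asynchronously and a GPU that runs them in order. Timing each inner kernel on the CPU clock requires waiting for it to finish, which prevents the next launch from overlapping with GPU work.

```python
import math


def simulate(kernel_times, launch_cost, sync_each):
    """Return (outer_time, inner_times) for a sequence of GPU kernels.

    kernel_times: run time of each kernel on the GPU (hypothetical values)
    launch_cost:  CPU time needed to launch one kernel
    sync_each:    if True, the CPU waits after every kernel so that each
                  inner operation gets a valid time
    """
    cpu = 0.0        # current CPU clock
    gpu_done = 0.0   # time at which the GPU finishes its queued work
    inner = []
    for run_time in kernel_times:
        start = cpu
        cpu += launch_cost                        # CPU enqueues the kernel
        gpu_done = max(gpu_done, cpu) + run_time  # GPU runs it when it is free
        if sync_each:
            cpu = max(cpu, gpu_done)              # CPU waits for the kernel
            inner.append(cpu - start)             # valid inner time
        else:
            inner.append(math.nan)                # no valid inner time
    cpu = max(cpu, gpu_done)                      # one sync ends the outer event
    return cpu, inner


kernels = [1.0, 1.0, 1.0, 1.0]
outer_async, inner_async = simulate(kernels, launch_cost=0.5, sync_each=False)
outer_sync, inner_sync = simulate(kernels, launch_cost=0.5, sync_each=True)

print("no inner sync:  outer =", outer_async, "inner =", inner_async)
print("sync each call: outer =", outer_sync, "inner =", inner_sync)
```

In this model the run without inner synchronization gives the smaller outer time but only nan for the inner operations, while synchronizing after every kernel gives valid inner times at the price of a larger outer time.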





> On Jan 19, 2024, at 12:35 PM, Dave May  wrote:
> 
> Hi all,
> 
> I am trying to understand the logging information associated with the 
> %flops-performed-on-the-gpu reported by -log_view when running 
>   src/ksp/ksp/tutorials/ex34
> with the following options
> -da_grid_x 192
> -da_grid_y 192
> -da_grid_z 192
> -dm_mat_type seqaijhipsparse
> -dm_vec_type seqhip
> -ksp_max_it 10
> -ksp_monitor
> -ksp_type richardson
> -ksp_view
> -log_view
> -mg_coarse_ksp_max_it 2
> -mg_coarse_ksp_type richardson
> -mg_coarse_pc_type none
> -mg_levels_ksp_type richardson
> -mg_levels_pc_type none
> -options_left
> -pc_mg_levels 3
> -pc_mg_log
> -pc_type mg
> 
> This config is not intended to actually solve the problem, rather it is a 
> stripped down set of options designed to understand what parts of the 
> smoothers are being executed on the GPU.
> 
> With respect to the log file attached, my first set of questions relates to 
> the data reported under "Event Stage 2: MG Apply".
> 
> [1] Why is the log littered with nan's?
> * I don't understand how and why "GPU Mflop/s" should be reported as nan when 
> a value is given for "GPU %F" (see MatMult for example).
> 
> * For events executed on the GPU, I assume the column "Time (sec)" relates to 
> "CPU execute time", this would explain why we see a nan in "Time (sec)" for 
> MatMult.
> If my assumption is correct, how should I interpret the column "Flop (Max)" 
> which is showing 1.92e+09? 
> I would assume that if "Time (sec)" relates to the CPU then "Flop (Max)" should 
> also relate to the CPU, and GPU flops would be logged in "GPU Mflop/s".
> 
> [2] More curious is that within "Event Stage 2: MG Apply" KSPSolve, MGSmooth 
> Level 0, MGSmooth Level 1, MGSmooth Level 2 all report "GPU %F" as 93. I 
> believe this value should be 100 as the smoother (and coarse grid solver) are 
> configured as richardson(2)+none and thus should run entirely on the GPU. 
> Furthermore, when one inspects all events listed under "Event Stage 2: MG 
> Apply" those events which do flops correctly report "GPU %F" as 100. 
> And the events showing "GPU %F" = 0 such as 
>   MatHIPSPARSCopyTo, VecCopy, VecSet, PCApply, DCtxSync
> don't do any flops (on the CPU or GPU) - which is also correct (although non 
> GPU events should show nan??)
> 
> Hence I am wondering what is the explanation for the missing 7% from "GPU %F" 
> for KSPSolve and MGSmooth {0,1,2}??
> 
> Does anyone understand this -log_view, or can explain to me how to interpret 
> it?
> 
> It could simply be that:
> a) something is messed up with -pc_mg_log
> b) something is messed up with the PETSc build
> c) I am putting too much faith in -log_view and should profile the code 
> differently.
> 
> Either way I'd really like to understand what is going on.
> 
> 
> Cheers,
> Dave
> 
> 
> 
>