I am stumped by this GPU bug (or bugs); maybe someone has an idea. I did find a bug in the CUDA transpose mat-vec that cuda-memcheck detected, but I still have differences between the GPU and CPU transpose mat-vec. I've got it down to a very simple test: bicg/none on a tiny mesh with two processors. It works on one processor or with cg/none, so the problem is in the transpose mat-vec.
I see that the result of the off-diagonal contribution (a->lvec) is different *only on proc 1*. I instrumented MatMultTranspose_MPIAIJ[CUSPARSE] with norms of the mat and vec and printed out MATLAB vectors. Below is the CPU output and then the GPU output, with a view of the scatter object, which is identical as you can see. The MATLAB B matrix and xx vector are identical. Maybe the GPU copy is wrong ... The only/first difference between CPU and GPU is a->lvec (the off-diagonal contribution) on processor 1 (you can see the norms are *different*). Here is the diff on the process 1 a->lvec vector (all values are off). Any thoughts would be appreciated,

Mark

15:30 1 /gpfs/alpine/scratch/adams/geo127$ diff lvgpu.m lvcpu.m
2,12c2,12
< % type: seqcuda
< Vec_0x53738630_0 = [
< 9.5702137431412879e+00
< 2.1970298791152253e+01
< 4.5422290209190646e+00
< 2.0185031807270226e+00
< 4.2627312508573375e+01
< 1.0889191983882025e+01
< 1.6038202417695462e+01
< 2.7155672033607665e+01
< 6.2540357853223556e+00
---
> % type: seq
> Vec_0x3a546440_0 = [
> 4.5565851251714653e+00
> 1.0460532998971189e+01
> 2.1626531807270220e+00
> 9.6105288923182408e-01
> 2.0295782656035659e+01
> 5.1845791066529463e+00
> 7.6361340020576058e+00
> 1.2929401011659799e+01
> 2.9776812928669392e+00

15:15 130 /gpfs/alpine/scratch/adams/geo127$ jsrun -n 1 -c 2 -a 2 -g 1 ./ex56 -cells 2,2,1
[0] 27 global equations, 9 vertices
[0] 27 equations in vector, 9 vertices
  0 SNES Function norm 1.223958326481e+02
    0 KSP Residual norm 1.223958326481e+02
[0] |x|= 1.223958326481e+02 |a->lvec|= 1.773965489475e+01 |B|= 1.424708937136e+00
[1] |x|= 1.223958326481e+02 |a->lvec|= *2.844171413778e+01* |B|= 1.424708937136e+00
[1] 1) |yy|= 2.007423334680e+02
[0] 1) |yy|= 2.007423334680e+02
[0] 2) |yy|= 1.957605719265e+02
[1] 2) |yy|= 1.957605719265e+02
[1] Number sends = 1; Number to self = 0
[1]   0 length = 9 to whom 0
Now the indices for all remote sends (in order by process sent to)
[1] 9
[1] 10
[1] 11
[1] 12
[1] 13
[1] 14
[1] 15
[1] 16
[1] 17
[1] Number receives = 1; Number from self = 0
[1]   0 length 9 from whom 0
Now the indices for all remote receives (in order by process received from)
[1] 0
[1] 1
[1] 2
[1] 3
[1] 4
[1] 5
[1] 6
[1] 7
[1] 8
    1 KSP Residual norm 8.199932342150e+01
Linear solve did not converge due to DIVERGED_ITS iterations 1
Nonlinear solve did not converge due to DIVERGED_LINEAR_SOLVE iterations 0

15:19 /gpfs/alpine/scratch/adams/geo127$ jsrun -n 1 -c 2 -a 2 -g 1 ./ex56 -cells 2,2,1 *-ex56_dm_mat_type aijcusparse -ex56_dm_vec_type cuda*
[0] 27 global equations, 9 vertices
[0] 27 equations in vector, 9 vertices
  0 SNES Function norm 1.223958326481e+02
    0 KSP Residual norm 1.223958326481e+02
[0] |x|= 1.223958326481e+02 |a->lvec|= 1.773965489475e+01 |B|= 1.424708937136e+00
[1] |x|= 1.223958326481e+02 |a->lvec|= *5.973624458725e+01* |B|= 1.424708937136e+00
[0] 1) |yy|= 2.007423334680e+02
[1] 1) |yy|= 2.007423334680e+02
[0] 2) |yy|= 1.953571867633e+02
[1] 2) |yy|= 1.953571867633e+02
[1] Number sends = 1; Number to self = 0
[1]   0 length = 9 to whom 0
Now the indices for all remote sends (in order by process sent to)
[1] 9
[1] 10
[1] 11
[1] 12
[1] 13
[1] 14
[1] 15
[1] 16
[1] 17
[1] Number receives = 1; Number from self = 0
[1]   0 length 9 from whom 0
Now the indices for all remote receives (in order by process received from)
[1] 0
[1] 1
[1] 2
[1] 3
[1] 4
[1] 5
[1] 6
[1] 7
[1] 8
    1 KSP Residual norm 8.199932342150e+01
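For anyone following along, here is a minimal NumPy sketch (not PETSc code, values made up for illustration) of the quantity the instrumentation above checks: in the transpose mat-vec each process computes its off-diagonal contribution a->lvec = B^T * x_local from its local off-diagonal block B, then scatter-adds it to the owners. The point is that identical B and x must give an identical lvec norm, so a mismatch like the one above implicates the device kernel or a stale device copy, not the scatter.

```python
import numpy as np

# Stand-ins for the local off-diagonal block B and the local part of
# the input vector x on process 1 (sizes match the 9-entry lvec above;
# the actual values are hypothetical).
rng = np.random.default_rng(0)
B = rng.random((9, 9))
x = rng.random(9)

# The off-diagonal contribution (what PETSc stores in a->lvec) before
# it is scatter-added to the owning process.
lvec = B.T @ x

# If B and x are bitwise identical on CPU and GPU, this norm must agree
# to rounding error between the two runs.
print(np.linalg.norm(lvec))
```

Since |B| and |x| agree between the two runs above while |a->lvec| does not, the B^T * x step itself (or the copy of B/x to the device) is the natural suspect.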