Re: [petsc-dev] error with karlrupp/fix-cuda-streams

Karl Rupp via petsc-dev Fri, 27 Sep 2019 21:56:17 -0700

Hi Mark,

OK, so now the problem has shifted somewhat in that it now manifests itself on small cases. In an earlier investigation I was drawn to MatTranspose but had a hard time pinning it down. The bug seems more stable now, or you probably fixed what looks like all the other bugs.
I added print statements with norms of vectors in mg.c (v-cycle) and found that the diffs between the CPU and GPU runs came in MatRestrict, which calls MatMultTranspose. I added identical print statements in the two versions of MatMultTranspose and see this. (Pinning to the CPU does not seem to make any difference.) Note that the problem comes in the 2nd iteration, where the *output* vector is non-zero coming in (this should not matter).
Karl, I zeroed out the output vector (yy) when I come into this method and it fixed the problem. This is with -n 4, and this always works with -n 3. See the attached process layouts. It looks like this comes when you use the 2nd socket.
So this looks like an Nvidia bug. Let me know what you think and I can pass it on to ORNL.

Hmm, there were some issues with MatMultTranspose_MPIAIJ at some point. I've addressed some of them, but I can't confidently say that all of the issues were fixed. Thus, I don't think it's a problem in NVIDIA's cuSparse, but rather something we need to fix in PETSc. Note that the problem shows up with multiple MPI ranks; if it were a problem in cuSparse, it would show up on a single rank as well.
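
To make the single-rank versus multi-rank argument concrete, here is a minimal pure-Python sketch of how a row-distributed transpose product y = A^T x can be assembled. It is not PETSc's code: the function names, the dense toy storage, and the full-length `ghost` buffers (which stand in for something like the a->lvec buffer in the log below) are assumptions made for illustration only. Each simulated rank multiplies by the transpose of its diagonal block and of its off-diagonal block, and the off-diagonal contributions are then added into the owners' output. With one rank there is no off-diagonal block and no communication, so only the local kernel is exercised.

```python
def matmult_transpose_serial(A, x):
    """Reference y = A^T x for a dense list-of-rows matrix A."""
    y = [0.0] * len(A[0])
    for i, row in enumerate(A):
        for j, a_ij in enumerate(row):
            y[j] += a_ij * x[i]
    return y


def matmult_transpose_distributed(A, x, n_ranks, y_initial=None):
    """Simulate y = A^T x with rows (and columns) split over n_ranks.

    y_initial models stale contents of the output vector on entry;
    a correct implementation must return the same y regardless.
    """
    n = len(A)
    assert all(len(row) == n for row in A), "sketch assumes a square matrix"
    # Contiguous ownership ranges: rank r owns indices bounds[r]..bounds[r+1]-1.
    bounds = [r * n // n_ranks for r in range(n_ranks + 1)]
    y = list(y_initial) if y_initial is not None else [0.0] * n
    # One ghost buffer per rank (hypothetical; full length for simplicity).
    ghost = [[0.0] * n for _ in range(n_ranks)]

    for r in range(n_ranks):
        lo, hi = bounds[r], bounds[r + 1]
        # Step 1: off-diagonal block, transposed, into the ghost buffer.
        for i in range(lo, hi):
            for j in range(n):
                if not lo <= j < hi:
                    ghost[r][j] += A[i][j] * x[i]
        # Step 2: diagonal block, transposed, OVERWRITES the owned part of y.
        for j in range(lo, hi):
            y[j] = 0.0
        for i in range(lo, hi):
            for j in range(lo, hi):
                y[j] += A[i][j] * x[i]

    # Step 3: reverse scatter, adding ghost contributions into the owners' y.
    for r in range(n_ranks):
        for j in range(n):
            y[j] += ghost[r][j]
    return y


if __name__ == "__main__":
    A = [[1.0, 2.0, 0.0, 0.0],
         [0.0, 3.0, 4.0, 0.0],
         [5.0, 0.0, 6.0, 0.0],
         [0.0, 7.0, 0.0, 8.0]]
    x = [1.0, 2.0, 3.0, 4.0]
    print(matmult_transpose_serial(A, x))
    print(matmult_transpose_distributed(A, x, 2, y_initial=[9.0] * 4))
```

In this model the result does not depend on `y_initial` or on the number of ranks. Steps 1 and 3 only do work when `n_ranks > 1`, which is why a failure that appears only with several ranks points at the distributed assembly rather than at the local sparse kernel.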


Best regards,
Karli

06:49 /gpfs/alpine/geo127/scratch/adams$ jsrun -n 4 -a 4 -c 4 -g 1 ./ex56 -cells 8,12,16 -ex56_dm_vec_type cuda -ex56_dm_mat_type aijcusparse
[0] 3465 global equations, 1155 vertices
[0] 3465 equations in vector, 1155 vertices
   0 SNES Function norm 1.725526579328e+01
     0 KSP Residual norm 1.725526579328e+01
         2) call Restrict with |r| = 1.402719214830704e+01
MatMultTranspose_MPIAIJCUSPARSE |x in| =1.40271921483070e+01
MatMultTranspose_MPIAIJ |y in| =0.00000000000000e+00
MatMultTranspose_MPIAIJCUSPARSE |a->lvec| =0.00000000000000e+00
*** MatMultTranspose_MPIAIJCUSPARSE |yy| =3.43436359545813e+00
MatMultTranspose_MPIAIJCUSPARSE final |yy| =1.29055494844681e+01
                 3) |R| = 1.290554948446808e+01
         2) call Restrict with |r| = 4.109771717986951e+00
MatMultTranspose_MPIAIJCUSPARSE |x in| =4.10977171798695e+00
MatMultTranspose_MPIAIJ |y in| =0.00000000000000e+00
MatMultTranspose_MPIAIJCUSPARSE |a->lvec| =0.00000000000000e+00
*** MatMultTranspose_MPIAIJCUSPARSE |yy| =1.79415048609144e-01
MatMultTranspose_MPIAIJCUSPARSE final |yy| =9.01083013948788e-01
                 3) |R| = 9.010830139487883e-01
                 4) |X| = 2.864698671963022e+02
                 5) |x| = 9.763280000911783e+02
                 6) post smooth |x| = 8.940011621494751e+02
                 4) |X| = 8.940011621494751e+02
                 5) |x| = 1.005081556495388e+03
                 6) post smooth |x| = 1.029043994031627e+03
     1 KSP Residual norm 8.102614049404e+00
         2) call Restrict with |r| = 4.402603749876137e+00
MatMultTranspose_MPIAIJCUSPARSE |x in| =4.40260374987614e+00
MatMultTranspose_MPIAIJ |y in| =1.29055494844681e+01
MatMultTranspose_MPIAIJCUSPARSE |a->lvec| =0.00000000000000e+00
*** MatMultTranspose_MPIAIJCUSPARSE |yy| =1.68544559626318e+00
MatMultTranspose_MPIAIJCUSPARSE final |yy| =1.82129824300863e+00
                 3) |R| = 1.821298243008628e+00
         2) call Restrict with |r| = 1.068309793900564e+00
MatMultTranspose_MPIAIJCUSPARSE |x in| =1.06830979390056e+00
MatMultTranspose_MPIAIJ |y in| =9.01083013948788e-01
MatMultTranspose_MPIAIJCUSPARSE |a->lvec| =0.00000000000000e+00
*** MatMultTranspose_MPIAIJCUSPARSE |yy| =1.40519177065298e-01
MatMultTranspose_MPIAIJCUSPARSE final |yy| =1.01853904152812e-01
                 3) |R| = 1.018539041528117e-01
                 4) |X| = 4.949616392884510e+01
                 5) |x| = 9.309440014159884e+01
                 6) post smooth |x| = 5.432486021529479e+01
                 4) |X| = 5.432486021529479e+01
                 5) |x| = 8.246142532204632e+01
                 6) post smooth |x| = 7.605703654091440e+01
   Linear solve did not converge due to DIVERGED_ITS iterations 1
Nonlinear solve did not converge due to DIVERGED_LINEAR_SOLVE iterations 0
06:50 /gpfs/alpine/geo127/scratch/adams$ jsrun -n 4 -a 4 -c 4 -g 1 ./ex56 -cells 8,12,16
[0] 3465 global equations, 1155 vertices
[0] 3465 equations in vector, 1155 vertices
   0 SNES Function norm 1.725526579328e+01
     0 KSP Residual norm 1.725526579328e+01
         2) call Restrict with |r| = 1.402719214830704e+01
MatMultTranspose_MPIAIJ |x in| =1.40271921483070e+01
MatMultTranspose_MPIAIJ |y in| =0.00000000000000e+00
MatMultTranspose_MPIAIJ |a->lvec| =0.00000000000000e+00
*** MatMultTranspose_MPIAIJ |yy| =3.43436359545813e+00
MatMultTranspose_MPIAIJ final |yy| =1.29055494844681e+01
                 3) |R| = 1.290554948446809e+01
         2) call Restrict with |r| = 4.109771717986956e+00
MatMultTranspose_MPIAIJ |x in| =4.10977171798696e+00
MatMultTranspose_MPIAIJ |y in| =0.00000000000000e+00
MatMultTranspose_MPIAIJ |a->lvec| =0.00000000000000e+00
*** MatMultTranspose_MPIAIJ |yy| =1.79415048609143e-01
MatMultTranspose_MPIAIJ final |yy| =9.01083013948789e-01
                 3) |R| = 9.010830139487889e-01
                 4) |X| = 2.864698671963023e+02
                 5) |x| = 9.763280000911785e+02
                 6) post smooth |x| = 8.940011621494754e+02
                 4) |X| = 8.940011621494754e+02
                 5) |x| = 1.005081556495388e+03
                 6) post smooth |x| = 1.029043994031627e+03
     1 KSP Residual norm 8.102614049404e+00
         2) call Restrict with |r| = 4.402603749876139e+00
MatMultTranspose_MPIAIJ |x in| =4.40260374987614e+00
MatMultTranspose_MPIAIJ |y in| =1.29055494844681e+01
MatMultTranspose_MPIAIJ |a->lvec| =0.00000000000000e+00
*** MatMultTranspose_MPIAIJ |yy| =4.43650979822523e-01
MatMultTranspose_MPIAIJ final |yy| =1.18089369006243e+00
                 3) |R| = 1.180893690062426e+00
         2) call Restrict with |r| = 6.868764720156294e-01
MatMultTranspose_MPIAIJ |x in| =6.86876472015629e-01
MatMultTranspose_MPIAIJ |y in| =9.01083013948789e-01
MatMultTranspose_MPIAIJ |a->lvec| =0.00000000000000e+00
*** MatMultTranspose_MPIAIJ |yy| =3.36768099045088e-02
MatMultTranspose_MPIAIJ final |yy| =6.40334376876017e-02
                 3) |R| = 6.403343768760170e-02
                 4) |X| = 2.380471873599142e+01
                 5) |x| = 6.932703848368443e+01
                 6) post smooth |x| = 4.502536862656444e+01
                 4) |X| = 4.502536862656444e+01
                 5) |x| = 7.998534854728734e+01
                 6) post smooth |x| = 7.660075651381680e+01
   Linear solve did not converge due to DIVERGED_ITS iterations 1
Nonlinear solve did not converge due to DIVERGED_LINEAR_SOLVE iterations 0
06:50  /gpfs/alpine/geo127/scratch/adams$
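
For comparing two instrumented runs like the ones above, a small script can report the first printed norm at which they disagree beyond round-off. This is only a sketch: the helper name `first_divergence` and the tolerance are assumptions, and the values are copied by hand from the "call Restrict with |r|" and "|R|" lines of the two runs rather than parsed from the output.

```python
import math


def first_divergence(run_a, run_b, rel_tol=1e-10):
    """Return (index, label, value_a, value_b) for the first entry where the
    two runs differ by more than rel_tol, or None if they agree throughout."""
    for k, ((label_a, val_a), (label_b, val_b)) in enumerate(zip(run_a, run_b)):
        if label_a != label_b or not math.isclose(val_a, val_b, rel_tol=rel_tol):
            return k, label_a, val_a, val_b
    return None


# First six restriction norms of the CPU run (second log above).
cpu_run = [
    ("call Restrict with |r|", 1.402719214830704e+01),
    ("|R|", 1.290554948446809e+01),
    ("call Restrict with |r|", 4.109771717986956e+00),
    ("|R|", 9.010830139487889e-01),
    ("call Restrict with |r|", 4.402603749876139e+00),
    ("|R|", 1.180893690062426e+00),
]

# The same six norms from the GPU run (first log above).
gpu_run = [
    ("call Restrict with |r|", 1.402719214830704e+01),
    ("|R|", 1.290554948446808e+01),
    ("call Restrict with |r|", 4.109771717986951e+00),
    ("|R|", 9.010830139487883e-01),
    ("call Restrict with |r|", 4.402603749876137e+00),
    ("|R|", 1.821298243008628e+00),
]

if __name__ == "__main__":
    print(first_divergence(cpu_run, gpu_run))
```

With these values the first mismatch is the sixth entry, the |R| of the first restriction after "1 KSP Residual norm", which is also the first call where |y in| is non-zero on entry. Everything before it agrees to round-off.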
