Hi, alright, here we go: https://bitbucket.org/petsc/petsc-dev/commits/ccdf0150dce67cfc50e1ec80872f3d5d
Satish, could you please upload txpetscgpu-0.0.9.tar.gz (and eventually update the download URL in the build system)?

Thanks and best regards,
Karli


On 02/02/2013 02:04 PM, Paul Mullowney wrote:
> Hi Karl,
>
> I pulled from petsc-dev this morning and reworked the patch. Everything
> is working as expected. Regarding your comments, the CUSPARRAY*
> variables are initialized correctly in VecCUSPGetArrayRead() and
> VecCUSPGetArrayWrite(). Thus the initializations to PETSC_NULL are not
> required, and the compiler warnings are gone. In this patch, I fixed
> the initialization in VecCUSPGetArrayWrite() (ArrayRead() was already
> working correctly before this patch).
>
> Regarding your second comment, the PETSc KSP algorithms use an identity
> when doing Hermitian solves and multiplies. In particular, the input
> and output vectors are conjugated so that only the transpose multiply
> and solve are needed. For instance, in bicg.c one has
>
>   ierr = VecConjugate(Rl);CHKERRQ(ierr);
>   ierr = KSP_PCApplyTranspose(ksp,Rl,Zl);CHKERRQ(ierr);
>   ierr = VecConjugate(Rl);CHKERRQ(ierr);
>   ierr = VecConjugate(Zl);CHKERRQ(ierr);
>
> Because the input and output vectors are conjugated, one must use the
> transpose solve, not the Hermitian solve. The same holds for the
> multiplies.
>
> Also attached is a new tarball for download once this patch is pushed.
>
> Thanks,
> -Paul
>
>> Hi Paul,
>>
>> just a few questions on your patch:
>>
>> I've spotted a few replacements of the kind:
>> -  CUSPARRAY *xGPU=PETSC_NULL, *bGPU=PETSC_NULL;
>> +  CUSPARRAY *xGPU, *bGPU;
>> Is this intentional? This is likely to lead to warnings. I skipped
>> these changes.
>>
>> Also, there is
>> -#if !defined(PETSC_USE_COMPLEX)
>>    ierr = cusparseMat->mat->multiply(...,TRANSPOSE);...
>> -#else
>> -  ierr = cusparseMat->mat->multiply(...,HERMITIAN);...
>> -#endif
>> Is it safe to throw out the Hermitian transpose here?
>> I've seen that the patch adds a kernel for the Hermitian transpose, but
>> I want to make sure this does not cause any side effects.
>>
>> A patch for the current tip is attached, including the removal of the
>> preprocessor switch for PETSC_USE_COMPLEX. However, I can't test it on
>> my AMD machine right now...
>>
>> Best regards,
>> Karli
>>
>>
>> On 02/01/2013 06:41 PM, Jed Brown wrote:
>>> That's gonna suck. Karl, can you apply his patch to the old code, run
>>> uncrustify on it, then send out the diff (which should apply cleanly to
>>> head)?
>>>
>>> On Feb 1, 2013 6:32 PM, "Karl Rupp" <rupp at mcs.anl.gov> wrote:
>>>
>>>     Hi Paul,
>>>
>>>     I just uncrustified src/mat/impls/aij/* and pushed it to petsc-dev.
>>>     Could you please re-generate your patch based on the latest commit?
>>>
>>>     Thanks and best regards,
>>>     Karli
>>>
>>>
>>>     On 02/01/2013 06:11 PM, Paul Mullowney wrote:
>>>
>>>         Hi,
>>>
>>>         Here's a reworked patch for running BiCG (with ILU(0)
>>>         preconditioners) on GPUs for the aijcusparse.cu class. I
>>>         addressed the comments from the previous emails on this patch.
>>>         In particular, I added
>>>
>>>         (1) A VecConjugate implementation in veccusp.cu with the correct
>>>             method for getting the device pointer
>>>             (VecCUSPGetArrayReadWrite()).
>>>         (2) Various methods in aijcusparse.cu for building the transpose
>>>             matrices for the MatSolveTranspose* methods. The solves
>>>             themselves are implemented under the hood in the txpetscgpu
>>>             library. A guard was added to ensure the matrix generation
>>>             routines are only called once.
>>>         (3) A fix for the uninitialized-variable compiler warning when
>>>             building with double complex. This required a slight fix in
>>>             VecCUSPGetArrayWrite().
>>>         (4) Small style fixes.
>>>
>>>         It wasn't clear to me how to break this patch up into a small
>>>         organizational patch and a large implementation patch.
>>>         If you have suggestions on what corresponds to organization and
>>>         what corresponds to implementation, I can try to do that in
>>>         subsequent patches.
>>>
>>>         Everything builds and runs fine on my end.
>>>
>>>         Thanks,
>>>         -Paul
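The conjugation identity Paul describes for bicg.c can be checked with a small standalone sketch. It assumes a dense, row-major n-by-n matrix and C99 complex arithmetic rather than PETSc Vec/Mat objects or the txpetscgpu kernels, and the function names (mult_transpose, mult_hermitian, conjugate, hermitian_via_transpose) are hypothetical. The idea is that conj(A^T conj(x)) = A^H x, so a kernel that only implements the plain transpose, bracketed by conjugations, yields the Hermitian-transpose product.

```c
#include <assert.h>
#include <complex.h>
#include <stddef.h>

/* Hypothetical helper: y = A^T x for a dense row-major n-by-n matrix,
   i.e. y_j = sum_i A_ij x_i. No conjugation is applied. */
static void mult_transpose(size_t n, const double complex *A,
                           const double complex *x, double complex *y)
{
  for (size_t j = 0; j < n; j++) {
    y[j] = 0.0;
    for (size_t i = 0; i < n; i++) y[j] += A[i * n + j] * x[i];
  }
}

/* Hypothetical reference: y = A^H x computed directly,
   i.e. y_j = sum_i conj(A_ij) x_i. */
static void mult_hermitian(size_t n, const double complex *A,
                           const double complex *x, double complex *y)
{
  for (size_t j = 0; j < n; j++) {
    y[j] = 0.0;
    for (size_t i = 0; i < n; i++) y[j] += conj(A[i * n + j]) * x[i];
  }
}

/* In-place elementwise conjugation (the role VecConjugate() plays). */
static void conjugate(size_t n, double complex *v)
{
  for (size_t i = 0; i < n; i++) v[i] = conj(v[i]);
}

/* A^H x from a transpose-only kernel, following the bicg.c pattern:
   conjugate the input, apply the plain transpose, restore the input,
   then conjugate the output. x and y must not alias. */
static void hermitian_via_transpose(size_t n, const double complex *A,
                                    double complex *x, double complex *y)
{
  assert(n > 0 && x != y);
  conjugate(n, x);            /* x <- conj(x)                    */
  mult_transpose(n, A, x, y); /* y <- A^T conj(x)                */
  conjugate(n, x);            /* restore the caller's x          */
  conjugate(n, y);            /* y <- conj(A^T conj(x)) = A^H x  */
}
```

Calling mult_hermitian() and hermitian_via_transpose() on the same complex A and x should give the same vector, and x is left unchanged. The same argument covers the transposed solve, since conj((A^T)^{-1} conj(b)) = (A^H)^{-1} b, which is why only the transpose multiply and solve need a GPU kernel.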
