uploaded tarball and pushed https://bitbucket.org/petsc/petsc-dev/commits/e95fd54300be1e05489068b844fd57c7
satish

On Sat, 2 Feb 2013, Karl Rupp wrote:

> Hi,
>
> alright, here we go:
> https://bitbucket.org/petsc/petsc-dev/commits/ccdf0150dce67cfc50e1ec80872f3d5d
>
> Satish, could you please upload txpetscgpu-0.0.9.tar.gz (and eventually update
> the download URL in the build system)?
>
> Thanks and best regards,
> Karli
>
>
> On 02/02/2013 02:04 PM, Paul Mullowney wrote:
> > Hi Karl,
> >
> > I pulled from petsc-dev this morning and reworked the patch. Everything
> > is working as expected. Regarding your comments, the initialization of
> > the CUSPARRAY * variables is done correctly in VecCUSPGetArrayRead() and
> > VecCUSPGetArrayWrite(). Thus the initializations to PETSC_NULL are not
> > required and the compiler warnings are removed. In this patch, I fixed
> > the initialization in VecCUSPGetArrayWrite() (ArrayRead() was working
> > correctly previous to this patch).
> >
> > Regarding your second comment, the PETSc KSP algorithms use an identity
> > when doing Hermitian solves and multiplies. In particular, the
> > conjugation of the input and output vectors is done so that one should
> > only do the Transpose multiply and solve. For instance in bicg.c, one has
> >
> >   ierr = VecConjugate(Rl);CHKERRQ(ierr);
> >   ierr = KSP_PCApplyTranspose(ksp,Rl,Zl);CHKERRQ(ierr);
> >   ierr = VecConjugate(Rl);CHKERRQ(ierr);
> >   ierr = VecConjugate(Zl);CHKERRQ(ierr);
> >
> > The conjugation of the input and output vectors forces one to use the
> > Transpose solve and not the Hermitian solve. The same holds for the
> > multiplies.
> >
> > Also attached is a new tarball for download once this patch is pushed.
> > Thanks,
> > -Paul
> >
> > > Hi Paul,
> > >
> > > just a few questions on your patch:
> > >
> > > I've spotted a few replacements of the kind:
> > >  - CUSPARRAY *xGPU=PETSC_NULL, *bGPU=PETSC_NULL;
> > >  + CUSPARRAY *xGPU, *bGPU;
> > > Is this intentional? This is likely to lead to warnings. I skipped
> > > these changes.
> > >
> > > Also, there is
> > > -#if !defined(PETSC_USE_COMPLEX)
> > >   ierr = cusparseMat->mat->multiply(...,TRANSPOSE);...
> > > -#else
> > > - ierr = cusparseMat->mat->multiply(...,HERMITIAN);...
> > > -#endif
> > > Is it safe to throw out the Hermitian transpose here? I've seen that
> > > the patch adds a kernel for the Hermitian transpose, but I want to make
> > > sure this does not cause any side effects.
> > >
> > > A patch for the current tip is attached, including the removal of the
> > > preprocessor switch for PETSC_USE_COMPLEX. However, I can't test it on
> > > my AMD machine right now...
> > >
> > > Best regards,
> > > Karli
> > >
> > >
> > > On 02/01/2013 06:41 PM, Jed Brown wrote:
> > > > That's gonna suck. Karl, can you apply his patch to the old code, run
> > > > uncrustify on it, then send out the diff (which should apply cleanly to
> > > > head)?
> > > >
> > > > On Feb 1, 2013 6:32 PM, "Karl Rupp" <rupp at mcs.anl.gov> wrote:
> > > >
> > > >     Hi Paul,
> > > >
> > > >     I just uncrustified src/mat/impls/aij/* and pushed it to petsc-dev.
> > > >     Could you please re-generate your patch based on the latest commit?
> > > >
> > > >     Thanks and best regards,
> > > >     Karli
> > > >
> > > >
> > > >     On 02/01/2013 06:11 PM, Paul Mullowney wrote:
> > > >
> > > >         Hi,
> > > >
> > > >         Here's a reworked patch for running BiCG (with ILU(0)
> > > >         preconditioners) on GPUs for the aijcusparse.cu class. I
> > > >         addressed the comments from the previous emails on this
> > > >         patch. In particular, I added
> > > >
> > > >         (1) VecConjugate implementation in veccusp.cu with the
> > > >         correct method for getting the device ptr
> > > >         (VecCUSPGetArrayReadWrite()).
> > > >         (2) Various methods in aijcusparse.cu for building the
> > > >         transpose matrices for MatSolveTranspose* methods. The
> > > >         implementation of the solves is done under the hood in the
> > > >         txpetscgpu library. A protection was added to ensure the
> > > >         matrix generation routines are only called once.
> > > >         (3) I fixed the uninitialized-variable compiler warning when
> > > >         building in double complex. This required a slight fix in
> > > >         VecCUSPGetArrayWrite().
> > > >         (4) Small style fixes.
> > > >
> > > >         It wasn't clear to me how to break this patch up into a
> > > >         small organizational patch and then a large implementation
> > > >         patch. If you have suggestions on what corresponds to
> > > >         organization and what corresponds to implementation, I can
> > > >         try to do that in subsequent patches.
> > > >
> > > >         Everything builds and runs fine on my end.
> > > >
> > > >         Thanks,
> > > >         -Paul
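
For readers following the Hermitian-vs-transpose question in the thread: the
identity Paul refers to is A^H x = conj(A^T conj(x)), i.e. conjugate the
input, apply the plain transpose, then conjugate the output. The following is
a minimal standalone sketch of that identity, not PETSc or txpetscgpu code.
It assumes a small dense row-major matrix, and all function names in it
(apply_transpose, apply_hermitian, conjugate, apply_hermitian_via_transpose,
hermitian_identity_error) are hypothetical helpers made up for illustration.

```c
#include <assert.h>
#include <complex.h>
#include <stddef.h>

/* y = A^T x for a dense row-major n-by-n matrix (no conjugation). */
void apply_transpose(size_t n, const double complex *A,
                     const double complex *x, double complex *y)
{
  assert(n > 0);
  for (size_t j = 0; j < n; ++j) {
    y[j] = 0.0;
    for (size_t i = 0; i < n; ++i) y[j] += A[i*n + j] * x[i];
  }
}

/* y = A^H x computed directly (transpose with conjugated entries). */
void apply_hermitian(size_t n, const double complex *A,
                     const double complex *x, double complex *y)
{
  assert(n > 0);
  for (size_t j = 0; j < n; ++j) {
    y[j] = 0.0;
    for (size_t i = 0; i < n; ++i) y[j] += conj(A[i*n + j]) * x[i];
  }
}

/* In-place complex conjugation of a vector. */
void conjugate(size_t n, double complex *v)
{
  for (size_t i = 0; i < n; ++i) v[i] = conj(v[i]);
}

/* y = A^H x using only the transpose apply, in the same order as the
 * bicg.c snippet quoted in the thread: conjugate input, transpose apply,
 * restore input, conjugate output. */
void apply_hermitian_via_transpose(size_t n, const double complex *A,
                                   double complex *x, double complex *y)
{
  conjugate(n, x);
  apply_transpose(n, A, x, y);
  conjugate(n, x);   /* x is back to its original values */
  conjugate(n, y);
}

/* Largest squared magnitude of the difference between the two paths
 * on a fixed 2x2 example (values chosen arbitrarily). */
double hermitian_identity_error(void)
{
  const double complex A[4] = { 1.0 + 2.0*I,  3.0 - 1.0*I,
                               -2.0 + 0.5*I,  4.0*I };
  double complex x[2] = { 1.0 - 1.0*I, 2.0 + 3.0*I };
  double complex y_direct[2], y_via[2];
  double err = 0.0;

  apply_hermitian(2, A, x, y_direct);
  apply_hermitian_via_transpose(2, A, x, y_via);

  for (size_t i = 0; i < 2; ++i) {
    double complex d = y_direct[i] - y_via[i];
    double e = creal(d)*creal(d) + cimag(d)*cimag(d);
    if (e > err) err = e;
  }
  return err;
}
```

Calling hermitian_identity_error() from a test driver should give a value at
rounding level, which is the sense in which the explicit HERMITIAN multiply
is redundant once the caller conjugates the vectors itself.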
