On Nov 17, 2010, at 12:23 PM, SUN Chun wrote:

> Hi Barry,
>
> Thanks for your complete reply. Everything is clear w.r.t. my previous
> questions.
>
> My understanding is the following: as long as I create the matrix with type
> MPIAIJCUDA in the same way as I do with MPIAIJ, it should perform the Mat,
> Vec, and KSP operations on the GPU side, while everything else stays on the
> CPU side? (So my code doesn't change except for MatCreateMPIAIJ ->
> MatCreateMPIAIJCUDA.)

   Yes, but currently only the MatMult() takes place on the GPU; all other
matrix operations happen on the CPU.
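For concreteness, a minimal sketch of what that one change looks like when the
matrix is created through the generic MatCreate()/MatSetType() path. The type
name "mpiaijcuda" is taken from this thread; treat this as an untested sketch
against the petsc-dev of the time, not verbatim PETSc usage:

    Mat            A;
    PetscInt       m = PETSC_DECIDE, n = PETSC_DECIDE; /* local sizes */
    PetscInt       M = 100, N = 100;                   /* global sizes (example values) */
    PetscErrorCode ierr;

    ierr = MatCreate(PETSC_COMM_WORLD,&A);CHKERRQ(ierr);
    ierr = MatSetSizes(A,m,n,M,N);CHKERRQ(ierr);
    ierr = MatSetType(A,"mpiaijcuda");CHKERRQ(ierr);   /* was: "mpiaij" */
    ierr = MatSetFromOptions(A);CHKERRQ(ierr);         /* lets -mat_type override at run time */
    /* assembly is unchanged: MatSetValues(), MatAssemblyBegin/End(), ... */

Everything else in the application stays the same.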
   Barry

> Thanks,
> Chun
>
> -----Original Message-----
> From: petsc-users-bounces at mcs.anl.gov
> [mailto:petsc-users-bounces at mcs.anl.gov] On Behalf Of Barry Smith
> Sent: Wednesday, November 17, 2010 12:41 PM
> To: PETSc users list
> Subject: Re: [petsc-users] GPU questions
>
>
> On Nov 17, 2010, at 9:48 AM, SUN Chun wrote:
>
>> Hi PETSc developers,
>>
>> I have some questions regarding GPGPU support in PETSc. Sorry if these
>> questions are redundant; I didn't browse the dev code too carefully...
>>
>> 1. The only example I can find is tutorial/ex47. Is there a plan to provide
>> more examples involving KSP with, say, a simple Jacobi PC? I thought KSP is
>> supported but PC is not.
>
> Any example that uses DAGetMatrix(), DAGetGlobalVector() or DMMG can be run
> with -da_vec_type cuda -da_mat_type aijcuda to use the GPUs.
>
> The whole idea behind PETSc is to have the SAME code for different solvers
> and different systems, so we will not have a bunch of examples just for
> GPUs.
>
>> 2. Would you please comment on the difficulty of supporting PCs like ILU
>> and SSOR? Would you please also comment on the difficulty of supporting
>> external libraries such as ML?
>
> We don't have code for triangular solves on the GPU; without those, ILU and
> SSOR cannot run on GPUs. Once someone provides triangular solves for GPUs we
> can add their use and put ILU and SSOR onto the GPUs with PETSc. Regarding
> ML, that is entirely up to its developers. Note that NVIDIA has a smoothed
> aggregation algorithm for symmetric problems in CUSP that you can access via
> PETSc as the PCSACUDA PC (not yet stable, so possibly buggy).
>
>> 3. I noticed (I might be wrong) that MatMult in CUDA is implemented in
>> such a way that we copy the lhs and rhs to the GPU each time before we do
>> the MatVec. I understand that you may have to do this to ensure MatMult is
>> robust, but I'm worried about performance. Is it possible, say, like in
>> KSP, to keep the lhs and intermediate results on the GPU side?
>
> Look at the code more closely. VecCUDACopyToGPU() (and likewise
> MatCUDACopyToGPU()) ONLY copies down if the values are NOT already on the
> GPU. This means once the vectors are on the GPU they remain there and are
> NOT copied back and forth for each multiply.
>
>   ierr = MatCUDACopyToGPU(A);CHKERRQ(ierr);
>   ierr = VecCUDACopyToGPU(xx);CHKERRQ(ierr);
>   ierr = VecCUDAAllocateCheck(yy);CHKERRQ(ierr);
>   if (usecprow) { /* use compressed row format */
>     try {
>       cusp::multiply(*cudastruct->mat,*((Vec_CUDA *)xx->spptr)->GPUarray,*cudastruct->tempvec);
>       ierr = VecSet_SeqCUDA(yy,0.0);CHKERRQ(ierr);
>       thrust::copy(cudastruct->tempvec->begin(),cudastruct->tempvec->end(),
>                    thrust::make_permutation_iterator(((Vec_CUDA *)yy->spptr)->GPUarray->begin(),
>                                                      cudastruct->indices->begin()));
>     } catch (char* ex) {
>       SETERRQ1(PETSC_COMM_SELF,PETSC_ERR_LIB,"CUDA error: %s", ex);
>     }
>   } else { /* do not use compressed row format */
>     try {
>       cusp::multiply(*cudastruct->mat,*((Vec_CUDA *)xx->spptr)->GPUarray,*((Vec_CUDA *)yy->spptr)->GPUarray);
>     } catch (char* ex) {
>       SETERRQ1(PETSC_COMM_SELF,PETSC_ERR_LIB,"CUDA error: %s", ex);
>     }
>   }
>   yy->valid_GPU_array = PETSC_CUDA_GPU;
>   ierr = WaitForGPU();CHKERRCUDA(ierr);
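In other words, an application-level sequence like the following pays the
host-to-device copy cost only on the first multiply (a sketch of the
semantics described above, with creation and assembly elided; A, x, y are
assumed to have the CUDA matrix/vector types):

    Mat            A;   /* type aijcuda, assembled on the CPU as usual */
    Vec            x,y; /* type cuda */
    PetscErrorCode ierr;
    /* ... create and assemble A, x, y ... */
    ierr = MatMult(A,x,y);CHKERRQ(ierr); /* first call: A and x are copied to the GPU */
    ierr = MatMult(A,y,x);CHKERRQ(ierr); /* later calls: data already resident, no copies */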
>
>> 4. Any performance data on Tesla/Fermi cards? I saw from the webpage that
>> you only have a Tesla card?
>
> We don't have good hardware for running benchmarks. The performance is
> better than just running on the CPU; that is about all I can say.
>
>> 5. Is there a roadmap, a plan, a timeline... regarding PETSc and NVIDIA's
>> collaboration toward a final, fully GPGPU-compatible PETSc?
>
> What do you mean by a final, fully GPGPU-compatible PETSc? Right now some
> things (vector operations, matrix-vector products, Krylov solvers) are done
> fully on the GPUs; others are automatically done on the CPU. I imagine it
> will always be this way; I doubt that EVERYTHING will ever be done on the
> GPU, but that is ok so long as most things are done on the GPU. If you run
> with -log_summary it will tell you how many copies to and from the GPU are
> done in the run and how much time they take. Obviously one wants those
> numbers as low as possible.
>
>    Barry
>
>> Thanks a lot for your time!
>> Chun
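For reference, the options mentioned in this thread combine into an
invocation along these lines (a sketch only: ex47 is the tutorial named
above, and whether it accepts exactly these options is not verified here):

    # run a DA-based tutorial with CUDA vector/matrix types and
    # report the number and cost of CPU<->GPU copies
    ./ex47 -da_vec_type cuda -da_mat_type aijcuda -log_summary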
