Re: [petsc-dev] [petsc-maint] running CUDA on SUMMIT

2019-07-10 Thread Mark Adams via petsc-dev
> > > 3) Is comparison between pointers appropriate? For example if (dptr != zarray) { is scary; if some arrays are zero length, how do we know what the pointer value will be? > > Yes, you need to consider these cases, which is kind of error prone. Also, I think merging transpose, and not, is a go
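
A minimal illustration (not from the thread) of the zero-length concern: when a vector has no local entries, the raw array pointer carries no information, so an aliasing test on the pointers alone can misfire. The helper name and the fallback to comparing the Vec handles are assumptions made for this sketch, not PETSc API.

  #include <petscvec.h>

  /* Hypothetical helper, sketch only: decide whether two raw array pointers can
     be treated as aliased.  For a zero local length the pointer values are not
     meaningful, so fall back to comparing the Vec handles themselves. */
  static PetscBool ArraysAliased(PetscInt n,const PetscScalar *a,const PetscScalar *b,Vec va,Vec vb)
  {
    if (!n) return (PetscBool)(va == vb); /* no data: trust the objects, not the pointers */
    return (PetscBool)(a == b);
  }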

Re: [petsc-dev] Slowness of PetscSortIntWithArrayPair in MatAssembly

2019-07-10 Thread Zhang, Junchao via petsc-dev
Fande, I ran your code with two processes and found that the poor performance of PetscSortIntWithArrayPair() was due to duplicates. In particular, rank 0 has array length = 0 and rank 1 has array length = 4,180,070. On rank 1, each unique array value has ~95 duplicates; the duplicates are already
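
For context, a standalone sketch (not Fande's code) that reproduces the shape of that input on one process: ~4.18 million keys with roughly 95 duplicates per unique value, timed through PetscSortIntWithArrayPair(). The key-generation formula is an assumption chosen only to match the reported duplicate density.

  #include <petscsys.h>
  #include <petsctime.h>

  int main(int argc,char **argv)
  {
    PetscErrorCode ierr;
    PetscInt       n = 4180070,*key,*v1,*v2,i;
    PetscLogDouble t0,t1;

    ierr = PetscInitialize(&argc,&argv,NULL,NULL);if (ierr) return ierr;
    ierr = PetscMalloc3(n,&key,n,&v1,n,&v2);CHKERRQ(ierr);
    for (i = 0; i < n; i++) {      /* each key value appears ~95 times */
      key[i] = i % (n/95);
      v1[i]  = i;
      v2[i]  = -i;
    }
    ierr = PetscTime(&t0);CHKERRQ(ierr);
    ierr = PetscSortIntWithArrayPair(n,key,v1,v2);CHKERRQ(ierr);
    ierr = PetscTime(&t1);CHKERRQ(ierr);
    ierr = PetscPrintf(PETSC_COMM_SELF,"sort of %D entries took %g s\n",n,(double)(t1-t0));CHKERRQ(ierr);
    ierr = PetscFree3(key,v1,v2);CHKERRQ(ierr);
    ierr = PetscFinalize();
    return ierr;
  }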

Re: [petsc-dev] [petsc-maint] running CUDA on SUMMIT

2019-07-10 Thread Smith, Barry F. via petsc-dev
My concern is: 1) is it actually optimally efficient for all cases? This kind of stuff, IMHO: if (yy) { if (dptr != zarray) { ierr = VecCopy_SeqCUDA(yy,zz);CHKERRQ(ierr); } else if (zz != yy) { ierr = VecAXPY_SeqCUDA(zz,1.0,yy);CHKERRQ(ierr); } } else i
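
For readability, the quoted fragment laid out as a sketch; the wrapper name is made up, the public VecCopy/VecAXPY stand in for the _SeqCUDA variants, and the final else branch is left empty because the quote is cut off there.

  #include <petscvec.h>

  /* Sketch of the quoted branch structure, not the actual CUSPARSE code. */
  static PetscErrorCode FinishMultAdd_Sketch(Vec yy,Vec zz,const PetscScalar *dptr,const PetscScalar *zarray)
  {
    PetscErrorCode ierr;

    PetscFunctionBegin;
    if (yy) {
      if (dptr != zarray) {
        ierr = VecCopy(yy,zz);CHKERRQ(ierr);     /* VecCopy_SeqCUDA in the quote */
      } else if (zz != yy) {
        ierr = VecAXPY(zz,1.0,yy);CHKERRQ(ierr); /* VecAXPY_SeqCUDA in the quote */
      }
    } else {
      /* ... the quoted text is truncated here ... */
    }
    PetscFunctionReturn(0);
  }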

Re: [petsc-dev] [petsc-maint] running CUDA on SUMMIT

2019-07-10 Thread Mark Adams via petsc-dev
Yea, I agree. Once this is working, I'll go back and split MatMultAdd, etc. On Wed, Jul 10, 2019 at 11:16 AM Smith, Barry F. wrote: > > In the long run I would like to see smaller specialized chunks of code (with a bit of duplication between them) instead of highly overloaded routines lik

Re: [petsc-dev] [petsc-maint] running CUDA on SUMMIT

2019-07-10 Thread Smith, Barry F. via petsc-dev
In the long run I would like to see smaller specialized chunks of code (with a bit of duplication between them) instead of highly overloaded routines like MatMultAdd_AIJCUSPARSE. Better three routines: one for multiply alone, one for multiply add alone, and one for multiply add with the sparse format. Trying to
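
A shape-only sketch of that suggestion, with made-up names and empty bodies (the real CUSPARSE calls are not reproduced): three small routines, each owning one case, rather than one routine full of flags and branches.

  #include <petscmat.h>

  static PetscErrorCode MatMult_Sketch(Mat A,Vec xx,Vec yy)
  {
    PetscFunctionBegin;
    /* y = A*x only: no work vector, no add */
    PetscFunctionReturn(0);
  }

  static PetscErrorCode MatMultAdd_Sketch(Mat A,Vec xx,Vec yy,Vec zz)
  {
    PetscFunctionBegin;
    /* z = y + A*x: the only place that needs the yy handling */
    PetscFunctionReturn(0);
  }

  static PetscErrorCode MatMultAddSparse_Sketch(Mat A,Vec xx,Vec yy,Vec zz)
  {
    PetscFunctionBegin;
    /* z = y + A*x for the sparse-format case mentioned in the message */
    PetscFunctionReturn(0);
  }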

Re: [petsc-dev] [petsc-maint] running CUDA on SUMMIT

2019-07-10 Thread Mark Adams via petsc-dev
Thanks, you made several changes here, including switches on the work vector size. I guess I should import this logic into the transpose method(s), except for the yy==NULL branches ... MatMult_ calls MatMultAdd with yy=0, but the transpose versions have their own code. MatMultTranspose_SeqAIJCUSPARS
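
A sketch of the yy==NULL convention being described (made-up names; the real routines are the SeqAIJCUSPARSE ones, and the transpose variants keep their own separate code instead of following this pattern):

  #include <petscmat.h>

  /* Sketch only: the plain multiply reuses the multiply-add path by passing
     yy = NULL, and the add routine branches on that. */
  static PetscErrorCode MyMatMultAdd(Mat A,Vec xx,Vec yy,Vec zz)
  {
    PetscFunctionBegin;
    if (yy) {
      /* z = y + A*x */
    } else {
      /* z = A*x */
    }
    PetscFunctionReturn(0);
  }

  static PetscErrorCode MyMatMult(Mat A,Vec xx,Vec zz)
  {
    PetscErrorCode ierr;

    PetscFunctionBegin;
    ierr = MyMatMultAdd(A,xx,NULL,zz);CHKERRQ(ierr); /* yy = NULL means no add */
    PetscFunctionReturn(0);
  }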

Re: [petsc-dev] [petsc-maint] running CUDA on SUMMIT

2019-07-10 Thread Mark Adams via petsc-dev
On Wed, Jul 10, 2019 at 1:13 AM Smith, Barry F. wrote: > > ierr = VecGetLocalSize(xx,&nt);CHKERRQ(ierr); if (nt != A->rmap->n) SETERRQ2(PETSC_COMM_SELF,PETSC_ERR_ARG_SIZ,"Incompatible partition of A (%D) and xx (%D)",A->rmap->n,nt); ierr = VecScatterInitializeForGPU(a->Mvctx,xx);CHK
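
The quoted guard, rewritten as a self-contained sketch using only public API (MatGetLocalSize() in place of the direct A->rmap->n access); the VecScatterInitializeForGPU() call that follows in the quote is omitted, and the function name is made up.

  #include <petscmat.h>

  /* Sketch of the partition-compatibility check quoted above. */
  static PetscErrorCode CheckRowCompatibility(Mat A,Vec xx)
  {
    PetscErrorCode ierr;
    PetscInt       nt,m;

    PetscFunctionBegin;
    ierr = VecGetLocalSize(xx,&nt);CHKERRQ(ierr);
    ierr = MatGetLocalSize(A,&m,NULL);CHKERRQ(ierr); /* local row count of A */
    if (nt != m) SETERRQ2(PETSC_COMM_SELF,PETSC_ERR_ARG_SIZ,"Incompatible partition of A (%D) and xx (%D)",m,nt);
    PetscFunctionReturn(0);
  }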