ISCUDA isn't even the right name (perhaps ISGENERALCUDA, ISBLOCKCUDA). I agree that this isn't a priority, but I could see it being needed in the next few years to avoid bottlenecks in adaptive mesh refinement or other adaptive algorithms. It's not a small amount of work, but I think all the index coordination can be done efficiently on a GPU.
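
To make the general/block distinction concrete, here is a minimal host-side sketch in plain C. It is not PETSc code, and `expand_block_indices` is a hypothetical helper name. It assumes the usual ISBLOCK convention, in which a block index b with block size bs stands for the entries bs*b through bs*b + bs - 1. Each output entry depends on a single input entry, so this particular step looks like a natural fit for a GPU kernel; the harder parts of the SF setup (regrouping indices, building rank lists) are not shown.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of PETSc: expand nblocks block indices with
   block size bs into nblocks*bs point indices, assuming block index b covers
   entries bs*b .. bs*b + bs - 1. The caller provides points[] with room for
   nblocks*bs entries. */
void expand_block_indices(int bs, size_t nblocks, const int *blocks, int *points)
{
  assert(bs > 0);
  for (size_t i = 0; i < nblocks; i++) {
    for (int k = 0; k < bs; k++) {
      /* Every output entry is computed independently of the others. */
      points[i * (size_t)bs + (size_t)k] = bs * blocks[i] + k;
    }
  }
}
```

For example, with bs = 2 the block indices {3, 0} expand to the point indices {6, 7, 0, 1}.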

Junchao Zhang <[email protected]> writes:

> Even if ISCUDA is simple to add, the PetscSFSetUp algorithm and many of the
> functions involved run on the host (and are not simple to parallelize on a
> GPU). The indices passed to VecScatter are analyzed and re-grouped. Even if
> they are eventually copied to the device, they are likely not in their
> original form. So copying the indices from device to host and building the
> VecScatter there seems the easiest approach.
>
> The Kokkos-related functions are experimental. We need to decide whether
> they are good or not.
>
> --Junchao Zhang
>
>
> On Fri, Feb 19, 2021 at 4:32 AM Patrick Sanan <[email protected]>
> wrote:
>
>> Thanks! That helps a lot.
>>
>> I assume "no," but is ISCUDA simple to add?
>>
>> More on what I'm trying to do, in case I'm missing an obvious approach:
>>
>> I'm working on a demo code that uses an external library, based on Kokkos,
>> as a solver - I create a Vec of type KOKKOS and populate it with the
>> solution data from the library, by getting access to the raw Kokkos view
>> with VecKokkosGetDeviceView() * .
>>
>> I then want to reorder that solution data into PETSc-native ordering (for
>> a velocity-pressure DMStag), so I create a pair of ISs and a VecScatter to
>> do that.
>>
>> The issue is that to create this scatter, I need to use information
>> (essentially, an element-to-index map) from the external library's
>> mesh-management object, which lives on the device. This doesn't work (when
>> host != device), because of course the ISs live on the host and to create
>> them I need to provide host arrays of indices.
>>
>> Am I stuck, for now, with sending the index information from
>> the device to the host, using it to create the IS, and then having
>> essentially the same information go back to the device when I use the
>> scatter?
>>
>> * As an aside, it looks like some of these Kokkos-related functions and
>> types are missing man pages - if you have time to add them, even as stubs,
>> that'd be great (if not let me know and I'll just try to formally do it, so
>> that at least the existence of the functions in the API is reflected on the
>> website).
>>
>> On 18.02.2021 at 23:17, Junchao Zhang <[email protected]> wrote:
>>
>>
>> On Thu, Feb 18, 2021 at 4:04 PM Fande Kong <[email protected]> wrote:
>>
>>>
>>>
>>> On Thu, Feb 18, 2021 at 1:55 PM Junchao Zhang <[email protected]>
>>> wrote:
>>>
>>>> VecScatter (i.e., SF, the two are the same thing) setup (building
>>>> various index lists, rank lists) is done on the CPU. is1, is2 must be host
>>>> data.
>>>>
>>>
>>> Just out of curiosity, can is1 and is2 not be created on a GPU device in
>>> the first place? That is, is it technically impossible, or have we just
>>> not implemented it yet?
>>>
>> Simply because we do not have an ISCUDA class.
>>
>>
>>>
>>> Fande,
>>>
>>>
>>>> When the SF is used to communicate device data, indices are copied to
>>>> the device.
>>>>
>>>> --Junchao Zhang
>>>>
>>>>
>>>> On Thu, Feb 18, 2021 at 11:50 AM Patrick Sanan <[email protected]>
>>>> wrote:
>>>>
>>>>> I'm trying to understand how VecScatters work with GPU-native Kokkos
>>>>> Vecs.
>>>>>
>>>>> Specifically, I'm interested in what will happen in code like in
>>>>> src/vec/vec/tests/ex22.c,
>>>>>
>>>>> ierr = VecScatterCreate(x,is1,y,is2,&ctx);CHKERRQ(ierr);
>>>>>
>>>>> (from
>>>>> https://gitlab.com/petsc/petsc/-/blob/master/src/vec/vec/tests/ex22.c#L44
>>>>> )
>>>>>
>>>>> Here, x and y can be set to type KOKKOS using -vec_type kokkos at the
>>>>> command line. But is1 and is2 are (I think) always
>>>>> CPU/host data. Assuming that the scatter itself can happen on the GPU,
>>>>> the indices must make it to the device somehow - are they copied there
>>>>> when the scatter is created?
>>>>> Is there a way to create the scatter using indices
>>>>> already on the GPU (Maybe using SF more directly)?
>>>>>
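
For reference, the effect of the scatter in the original question can be sketched in plain C, ignoring PETSc, MPI, and the device entirely. This assumes the sequential case with insert semantics, where entry i of the first index list names a source location in x and entry i of the second names the destination in y. `scatter_insert` is a hypothetical name, not a PETSc function. The two index arrays are all the loop needs at apply time, which is why they have to reach the device in some form if the scatter is to run there.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in, not a PETSc routine: apply a sequential scatter with
   insert semantics, y[to[i]] = x[from[i]] for i = 0 .. n-1. Here `from` plays
   the role of is1 and `to` the role of is2. Entries of y that no index in
   `to` refers to are left untouched. */
void scatter_insert(size_t n, const int *from, const int *to,
                    const double *x, double *y)
{
  assert(n == 0 || (from != NULL && to != NULL && x != NULL && y != NULL));
  for (size_t i = 0; i < n; i++) {
    y[to[i]] = x[from[i]];
  }
}
```

With x = {10, 20, 30, 40}, from = {3, 0, 1} and to = {0, 1, 2}, the first three entries of y become 40, 10, 20 and the fourth is unchanged.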
