On Sat, Jan 9, 2021 at 7:37 PM Jacob Faibussowitsch <[email protected]> wrote:
> It is a single object that holds a pointer to every stream implementation
> and toggleable type so it can be universally passed around. Currently has a
> cudaStream and a hipStream but this is easily extendable to any other
> stream implementation.
>

Do you have any thoughts on how this would work with Kokkos?

Would you want to feed Kokkos your Cuda/Hip, etc., stream or add a Kokkos
backend to your object?

Junchao might be the person to ask. I would guess Kokkos View (vector)
objects carry a stream, because "deep_copy", which moves data to/from the
GPU, is blocking.

Thanks,
Mark

> Best regards,
>
> Jacob Faibussowitsch
> (Jacob Fai - booss - oh - vitch)
> Cell: +1 (312) 694-3391
>
> On Jan 9, 2021, at 18:19, Mark Adams <[email protected]> wrote:
>
> Is this stream object going to have Cuda, Kokkos, etc., implementations?
>
> On Sat, Jan 9, 2021 at 4:09 PM Jacob Faibussowitsch <[email protected]>
> wrote:
>
>> I’m currently working on an implementation of a general PetscStream
>> object. Currently it only supports Vector ops and has a proof of concept
>> KSPCG, but should be extensible to other objects when finished. Junchao is
>> also indirectly working on pipeline support in his NVSHMEM MR. Take a look
>> at either MR; it would be very useful to get your input, as tailoring
>> either of these approaches for pipelined algorithms is key.
>>
>> Best regards,
>>
>> Jacob Faibussowitsch
>> (Jacob Fai - booss - oh - vitch)
>> Cell: (312) 694-3391
>>
>> On Jan 9, 2021, at 15:01, Mark Adams <[email protected]> wrote:
>>
>> I would like to put a non-overlapping ASM solve on the GPU. It's not
>> clear that we have a model for this.
>>
>> PCApply_ASM currently pipelines the scatter with the subdomain solves. I
>> think we would want to change this and do a 1) scatter begin loop, 2)
>> scatter end and non-blocking solve loop, 3) solve-wait and scatter
>> begin loop and 4) scatter end loop.
>>
>> I'm not sure how to go about doing this.
>>
>> * Should we make a new PCApply_ASM_PARALLEL or dump this pipelining
>> algorithm and rewrite PCApply_ASM?
>> * Add a solver-wait method to KSP?
>>
>> Thoughts?
>>
>> Mark
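
For reference, the four-loop ordering proposed in the original message can
be sketched in plain C as below. This is only an illustration of the loop
structure, not PETSc code. The names restrict_begin, restrict_end,
solve_begin, solve_wait, prolong_begin and prolong_end are hypothetical
placeholders that just log their calls. In a real implementation they would
presumably map to the scatter begin/end calls and to a non-blocking
subdomain solve plus the "solver-wait" method asked about above, which is
assumed here and does not exist in KSP as far as this thread indicates.

```c
#include <assert.h>
#include <string.h>

#define MAX_EVENTS 64

/* Event log standing in for real scatter/solve calls (illustration only). */
static const char *events[MAX_EVENTS];
static int         blocks[MAX_EVENTS];
static int         nevents = 0;

static void record(const char *what, int block)
{
  assert(nevents < MAX_EVENTS); /* guard the fixed-size log */
  events[nevents] = what;
  blocks[nevents] = block;
  nevents++;
}

/* Hypothetical placeholders, NOT PETSc API. */
static void restrict_begin(int i) { record("restrict_begin", i); } /* global -> subdomain scatter, start  */
static void restrict_end(int i)   { record("restrict_end", i); }   /* global -> subdomain scatter, finish */
static void solve_begin(int i)    { record("solve_begin", i); }    /* launch non-blocking subdomain solve */
static void solve_wait(int i)     { record("solve_wait", i); }     /* assumed "solver-wait" method        */
static void prolong_begin(int i)  { record("prolong_begin", i); }  /* subdomain -> global scatter, start  */
static void prolong_end(int i)    { record("prolong_end", i); }    /* subdomain -> global scatter, finish */

/* Four separate loops instead of one pipelined loop over the blocks. */
static void apply_asm_four_phase(int nblocks)
{
  int i;

  /* 1) scatter begin loop */
  for (i = 0; i < nblocks; i++) restrict_begin(i);

  /* 2) scatter end and non-blocking solve loop */
  for (i = 0; i < nblocks; i++) {
    restrict_end(i);
    solve_begin(i);
  }

  /* 3) solve-wait and scatter begin loop */
  for (i = 0; i < nblocks; i++) {
    solve_wait(i);
    prolong_begin(i);
  }

  /* 4) scatter end loop */
  for (i = 0; i < nblocks; i++) prolong_end(i);
}
```

Calling apply_asm_four_phase(3) logs all three solve_begin events before
the first solve_wait, which is the point of the reordering: every subdomain
solve is launched before anything blocks on a result.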
