Hmm, I suppose this means Kokkos should accept a stream like we expect it to? According to this somewhat recent merged PR: https://github.com/kokkos/kokkos/pull/1919 you can now construct a "Kokkos::Cuda" object around a stream and pass it as the first argument to range policies as an execution space instance. Here's what I found on it (the CUDA-specific one is useless):
https://github.com/kokkos/kokkos/wiki/ExecutionSpaceConcept
https://github.com/kokkos/kokkos/wiki/Kokkos%3A%3AExecutionSpaceConcept
https://github.com/kokkos/kokkos/wiki/Kokkos%3A%3ACuda  <—— CUDA-specific

Best regards,

Jacob Faibussowitsch
(Jacob Fai - booss - oh - vitch)
Cell: (312) 694-3391

> On Jan 11, 2021, at 10:35, Mark Adams <[email protected]> wrote:
>
> Jacob, I'm not sure I understand this response. I could not find you on the
> Kokkos slack channel.
>
> Me: And my colleague in PETSc, Jacob Faibussowitsch, has talked to you about
> Kokkos taking a Cuda, Hip, etc., stream. This is something that would make it
> easier to deal with asynchronous GPU solvers in PETSc. We just wanted to
> check on this.
>
> Trott: Kokkos itself can do it for practically every operation
>
> Maybe you want to talk with him at some point, but we can worry about getting
> Cuda to work for now.
>
> On Sun, Jan 10, 2021 at 2:28 PM Jacob Faibussowitsch <[email protected]> wrote:
> I would like as much as possible to pass the cuda and hip streams to Kokkos,
> since I can directly handle much of the annoyance of wrangling multiple
> streams and stream objects externally. Last I checked on this, Kokkos was
> moving toward allowing association of streams with functions, but admittedly
> this was a while back.
>
> Best regards,
>
> Jacob Faibussowitsch
> (Jacob Fai - booss - oh - vitch)
> Cell: (312) 694-3391
>
>> On Jan 10, 2021, at 13:10, Mark Adams <[email protected]> wrote:
>>
>> On Sat, Jan 9, 2021 at 7:37 PM Jacob Faibussowitsch <[email protected]> wrote:
>> It is a single object that holds a pointer to every stream implementation
>> and a toggleable type, so it can be universally passed around.
>> Currently it has a cudaStream and a hipStream, but this is easily extendable
>> to any other stream implementation.
>>
>> Do you have any thoughts on how this would work with Kokkos?
>>
>> Would you want to feed Kokkos your Cuda/Hip, etc., stream, or add a Kokkos
>> backend to your object?
>>
>> Junchao might be the person to ask. I would guess Kokkos View (vector)
>> objects carry a stream because they block on a "deep_copy", which moves data
>> to/from the GPU, and it is blocking.
>>
>> Thanks,
>> Mark
>>
>> Best regards,
>>
>> Jacob Faibussowitsch
>> (Jacob Fai - booss - oh - vitch)
>> Cell: +1 (312) 694-3391
>>
>>> On Jan 9, 2021, at 18:19, Mark Adams <[email protected]> wrote:
>>>
>>> Is this stream object going to have Cuda, Kokkos, etc., implementations?
>>>
>>> On Sat, Jan 9, 2021 at 4:09 PM Jacob Faibussowitsch <[email protected]> wrote:
>>> I'm currently working on an implementation of a general PetscStream object.
>>> Currently it only supports Vector ops and has a proof-of-concept KSPCG, but
>>> it should be extensible to other objects when finished. Junchao is also
>>> indirectly working on pipeline support in his NVSHMEM MR. Take a look at
>>> either MR; it would be very useful to get your input, as tailoring either
>>> of these approaches for pipelined algorithms is key.
>>>
>>> Best regards,
>>>
>>> Jacob Faibussowitsch
>>> (Jacob Fai - booss - oh - vitch)
>>> Cell: (312) 694-3391
>>>
>>>> On Jan 9, 2021, at 15:01, Mark Adams <[email protected]> wrote:
>>>>
>>>> I would like to put a non-overlapping ASM solve on the GPU. It's not clear
>>>> that we have a model for this.
>>>>
>>>> PCApply_ASM currently pipelines the scatter with the subdomain solves. I
>>>> think we would want to change this and instead do: 1) a scatter-begin loop,
>>>> 2) a scatter-end and non-blocking solve loop, 3) a solve-wait and
>>>> scatter-begin loop, and 4) a scatter-end loop.
>>>>
>>>> I'm not sure how to go about doing this.
>>>> * Should we make a new PCApply_ASM_PARALLEL, or dump this pipelining
>>>>   algorithm and rewrite PCApply_ASM?
>>>> * Add a solve-wait method to KSP?
>>>>
>>>> Thoughts?
>>>>
>>>> Mark
>>>
>

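For reference, the stream-passing pattern from PR 1919 mentioned at the top of the thread would look roughly like this. This is an untested sketch, assuming a Kokkos build with the CUDA backend recent enough to include that PR:

```cpp
#include <Kokkos_Core.hpp>
#include <cuda_runtime.h>

int main(int argc, char* argv[]) {
  Kokkos::initialize(argc, argv);
  {
    // An externally managed stream, e.g. one owned by a PetscStream-like object.
    cudaStream_t stream;
    cudaStreamCreate(&stream);

    // Wrap the stream in an execution space instance.
    Kokkos::Cuda exec(stream);

    const int n = 1 << 20;
    Kokkos::View<double*, Kokkos::CudaSpace> x("x", n);

    // Pass the instance as the first argument to the range policy; the
    // kernel is then enqueued on `stream` rather than the default stream.
    Kokkos::parallel_for(
        Kokkos::RangePolicy<Kokkos::Cuda>(exec, 0, n),
        KOKKOS_LAMBDA(const int i) { x(i) = 1.0; });

    exec.fence();  // or cudaStreamSynchronize(stream) on the caller's side
    cudaStreamDestroy(stream);
  }
  Kokkos::finalize();
  return 0;
}
```

Presumably the same pattern carries over per backend (a HIP execution space instance wrapping a hipStream_t), which is what would let a PetscStream-style object hand its current stream straight to Kokkos.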