Note that operations that don't involve communication (like VecAXPY and
VecPointwiseMult) are already non-blocking on streams. (A recent Thrust update
helped us recover behavior that had silently become blocking in a previous
release.) For multi-rank runs, operations like MatMult require communication,
and MPI offers no way to make that communication non-blocking with respect to
the stream. We've also hit some issues/bugs using NVSHMEM to bypass MPI.
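
As an illustrative sketch (assuming a CUDA build of PETSc and a recent release
with the PetscCall() macro), the local operations below can run on the device
stream without blocking the host, while the norm at the end synchronizes
because its result is needed on the host:

  /* Sketch only: local Vec operations (no MPI communication) are launched on
   * the GPU stream without blocking the host; VecNorm synchronizes because the
   * scalar result is needed on the host.  Assumes a CUDA-enabled PETSc build. */
  #include <petscvec.h>

  int main(int argc, char **argv)
  {
    Vec       x, y, w;
    PetscReal nrm;

    PetscCall(PetscInitialize(&argc, &argv, NULL, NULL));
    PetscCall(VecCreate(PETSC_COMM_SELF, &x));
    PetscCall(VecSetSizes(x, PETSC_DECIDE, 1 << 20));
    PetscCall(VecSetType(x, VECCUDA));        /* device-resident vector */
    PetscCall(VecDuplicate(x, &y));
    PetscCall(VecDuplicate(x, &w));
    PetscCall(VecSet(x, 1.0));
    PetscCall(VecSet(y, 2.0));

    PetscCall(VecAXPY(y, 3.0, x));            /* y <- y + 3 x  (on the stream) */
    PetscCall(VecPointwiseMult(w, x, y));     /* w <- x .* y   (on the stream) */

    PetscCall(VecNorm(w, NORM_2, &nrm));      /* host needs the value: synchronizes */
    PetscCall(PetscPrintf(PETSC_COMM_SELF, "||w|| = %g\n", (double)nrm));

    PetscCall(VecDestroy(&x));
    PetscCall(VecDestroy(&y));
    PetscCall(VecDestroy(&w));
    PetscCall(PetscFinalize());
    return 0;
  }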

MPI implementors have been quite skeptical of placing MPI operations on
streams (the way NCCL/RCCL or NVSHMEM do). Cray's MPI has nothing to do with
streams: device memory is cacheable on the host, and RDMA operations are
initiated from the host without any device logic involved. I suspect it will
take company investment, or a very enterprising systems researcher, to make
the case for getting messaging to play well with streams. Perhaps a better use
of time is to focus on reducing the latency of notifying the host when an RDMA
completes, and on reducing kernel launch time. In short, there are many
unanswered questions about truly asynchronous Krylov solvers, but in the most
obvious places for async, it already works today.
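
For concreteness, here is a rough sketch of the host-driven pattern that
plain MPI forces on us today: the stream must be synchronized before the host
can post the messages, which is exactly where asynchrony is lost. This is
purely illustrative (the halo_exchange name and sizes are made up) and assumes
CUDA plus a GPU-aware MPI:

  /* Sketch of the host-driven pattern: the stream is synchronized before the
   * host posts MPI calls, so communication is not stream-ordered.  Assumes a
   * GPU-aware MPI; names and sizes are illustrative only. */
  #include <mpi.h>
  #include <cuda_runtime.h>

  void halo_exchange(double *d_sendbuf, double *d_recvbuf, int n,
                     int peer, cudaStream_t stream, MPI_Comm comm)
  {
    MPI_Request reqs[2];

    /* 1. kernels filling d_sendbuf were enqueued on `stream` earlier */

    /* 2. the host must wait for the stream before handing the buffer to MPI */
    cudaStreamSynchronize(stream);

    /* 3. communication is initiated by the host; the stream knows nothing of it */
    MPI_Irecv(d_recvbuf, n, MPI_DOUBLE, peer, 0, comm, &reqs[0]);
    MPI_Isend(d_sendbuf, n, MPI_DOUBLE, peer, 0, comm, &reqs[1]);
    MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);

    /* 4. only now can dependent kernels be enqueued on the stream again */
  }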

Jacob Faibussowitsch <jacob....@gmail.com> writes:

> New code can (and absolutely should) use it right away; PetscDeviceContext 
> has been fully functional since it was merged. Remember, though, that it 
> works on a “principled parallelism” model: the caller is responsible for 
> proper serialization.
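
A minimal sketch of that “principled parallelism” model, assuming the
PetscDeviceContext API roughly as it appears in current PETSc (exact enum and
function names may differ between versions); the stream type can also be
selected at run time with -device_context_stream_type:

  /* Sketch: work issued on different device contexts (streams) may overlap,
   * and the *caller* inserts the ordering.  Assumes a recent PETSc; names such
   * as PETSC_STREAM_DEFAULT_BLOCKING may differ across versions. */
  #include <petscdevice.h>

  static PetscErrorCode two_streams_example(void)
  {
    PetscDeviceContext a, b;

    PetscFunctionBeginUser;
    PetscCall(PetscDeviceContextCreate(&a));
    PetscCall(PetscDeviceContextCreate(&b));
    PetscCall(PetscDeviceContextSetStreamType(a, PETSC_STREAM_DEFAULT_BLOCKING));
    PetscCall(PetscDeviceContextSetStreamType(b, PETSC_STREAM_DEFAULT_BLOCKING));
    PetscCall(PetscDeviceContextSetUp(a));
    PetscCall(PetscDeviceContextSetUp(b));

    /* ... enqueue independent work on a and b here; it may overlap ... */

    /* caller-provided serialization: everything queued on b must finish
       before work subsequently queued on a may start */
    PetscCall(PetscDeviceContextWaitForContext(a, b));

    /* ... enqueue work on a that depends on b's results ... */

    PetscCall(PetscDeviceContextSynchronize(a));  /* host waits only when it must */
    PetscCall(PetscDeviceContextDestroy(&a));
    PetscCall(PetscDeviceContextDestroy(&b));
    PetscFunctionReturn(PETSC_SUCCESS);
  }

The point is that nothing here synchronizes implicitly: any ordering between
the two contexts exists only because the caller asked for it.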
>
> Existing code? Not so much. In broad strokes, the following components need 
> support before parallelism can be achieved from user code:
>
> 1. Vec     - WIP (feature complete, now in bug-fixing stage)
> 2. PetscSF - TODO
> 3. Mat     - TODO
> 4. KSP/PC  - TODO
>
> Seeing as each MR for this has so far taken me roughly 3-4 months to merge, 
> and the later pieces require enormous rewrites and API changes, I don’t 
> expect this to be finished for at least 2 years… Once the Vec MR is merged 
> you could theoretically run with -device_context_stream_type 
> default_blocking and achieve “asynchronous” compute, but nothing would work 
> properly, since every other part of PETSc expects operations to be synchronous.
>
> That being said, I would be happy to give a demo at the next developers 
> meeting on how people can integrate PetscDeviceContext into their code. 
> It would go a long way toward cutting down the timeline.
>
>> On Feb 15, 2022, at 02:02, Stefano Zampini <stefano.zamp...@gmail.com> wrote:
>> 
>> Jacob
>> 
>> what is the current status of the async support in PETSc?
>> Can you summarize here? Is there any documentation available?
>> 
>> Thanks
>> -- 
>> Stefano
