Here's a paper from a few years ago that uses NCCL to give a better MPI_Bcast:

https://arxiv.org/pdf/1707.09414.pdf

But what's interesting is that they have this statement:

 "In general, NCCL integration with MPI runtimes might lead to very complicated designs. Thus, the proposed work is a step towards achieving similar or better performance without utilizing NCCL."

Scott

On 6/16/20 9:19 PM, Karl Rupp wrote:
From a practical standpoint it seems to me that NCCL is an offering to a community that isn't used to MPI. It's categorized as 'Deep Learning Software' on the NVIDIA page ;-)

The section 'NCCL and MPI' has some interesting bits:
  https://docs.nvidia.com/deeplearning/nccl/user-guide/docs/mpi.html

At the bottom of the page there is
 "Using NCCL to perform inter-GPU communication concurrently with CUDA-aware MPI may create deadlocks. (...) Using both MPI and NCCL to perform transfers between the same sets of CUDA devices concurrently is therefore not guaranteed to be safe."

While I'm impressed that NVIDIA even 'reinvents' MPI for their GPUs to serve the deep learning community, I don't think NCCL provides enough beyond MPI for PETSc.

Best regards,
Karli

On 6/17/20 4:13 AM, Junchao Zhang wrote:
It should be renamed NCL (NVIDIA Communications Library), since it adds point-to-point communication in addition to collectives. I am not sure whether to implement it in PETSc, as no exascale machine uses NVIDIA GPUs.

--Junchao Zhang


On Tue, Jun 16, 2020 at 6:44 PM Matthew Knepley <knep...@gmail.com> wrote:

    It would seem to make more sense to just reverse-engineer this as
    another MPI impl.

        Matt

    On Tue, Jun 16, 2020 at 6:22 PM Barry Smith <bsm...@petsc.dev> wrote:




    --
    What most experimenters take for granted before they begin their
    experiments is infinitely more interesting than any results to which
    their experiments lead.
    -- Norbert Wiener

    https://www.cse.buffalo.edu/~knepley/


--
Tech-X Corporation               kru...@txcorp.com
5621 Arapahoe Ave, Suite A       Phone: (720) 974-1841
Boulder, CO 80303                Fax:   (303) 448-7756
