Hi Gilles,

Jeff’s PR (https://github.com/open-mpi/ompi/pull/10763) is meant to address that
issue. We are thinking about making the accelerator components silent by
default.

Thanks,
William Zhang

From: devel <devel-boun...@lists.open-mpi.org> on behalf of Gilles Gouaillardet 
via devel <devel@lists.open-mpi.org>
Reply-To: Open MPI Developers <devel@lists.open-mpi.org>
Date: Friday, September 9, 2022 at 9:50 PM
To: Open MPI Developers <devel@lists.open-mpi.org>
Cc: Gilles Gouaillardet <gilles.gouaillar...@gmail.com>
Subject: RE: [EXTERNAL][OMPI devel] Proposed change to Cuda dependencies in 
Open MPI

William,

What is the desired behavior if Open MPI built with CUDA is used on a system 
where CUDA is not available or cannot be used because of ABI compatibility 
issues?
 - issue a warning (could not open the DSO because of unsatisfied dependencies)?
 - silently ignore the CUDA related components?

I guess this should be configurable by yet another MCA parameter, but that
raises the question of what the default value for this parameter should be.
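One way to picture this suggestion is a single parameter whose value selects which frameworks report DSO load failures. The sketch below is hypothetical: the accepted values ("all", "none", or a comma-separated list of framework names) and the function name are assumptions for illustration, not an existing Open MPI interface. Treating an unset value as "all" is an arbitrary default picked only for the example, which is exactly the choice being asked about.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical policy check: should a DSO load failure in `framework` be
   reported, given the value of a hypothetical MCA parameter?
     "all"  -> report every failure
     "none" -> report nothing
     "a,b"  -> report only failures in the listed frameworks          */
bool should_report_load_error(const char *param_value, const char *framework)
{
    if (param_value == NULL || strcmp(param_value, "all") == 0) {
        return true;   /* unset: fall back to reporting (arbitrary default) */
    }
    if (strcmp(param_value, "none") == 0) {
        return false;
    }
    /* Walk the comma-separated list, looking for an exact name match. */
    size_t len = strlen(framework);
    const char *p = param_value;
    while (*p != '\0') {
        const char *comma = strchr(p, ',');
        size_t tok_len = comma != NULL ? (size_t)(comma - p) : strlen(p);
        if (tok_len == len && strncmp(p, framework, len) == 0) {
            return true;
        }
        if (comma == NULL) {
            break;
        }
        p = comma + 1;
    }
    return false;
}
```

With a value such as "btl,accelerator", failures in the accelerator framework would be printed while, for example, pml failures would stay quiet.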


Cheers,

Gilles


On Sat, Sep 10, 2022 at 6:25 AM Zhang, William via devel
<devel@lists.open-mpi.org> wrote:
Hello interested parties,

As part of the accelerator framework work, the nonstandard behavior of the
existing CUDA code in Open MPI is being reworked. One of the proposed changes
affects how CUDA components are compiled and linked.

Currently, CUDA functions are loaded dynamically using dlopen and stored in a
function pointer table, and some code searches the typical install paths to
locate libcuda. This means that we can configure Open MPI with
--with-cuda=/path/to/cuda and the resulting build should work in both CUDA and
non-CUDA environments.
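For readers unfamiliar with the current approach, here is a minimal sketch of dlopen-based loading into a function pointer table. It assumes a simplified two-entry table (the real Open MPI table and its field names differ), treats the driver's CUresult return type as a plain int, and takes the list of candidate library paths from the caller; none of these names come from the Open MPI source.

```c
#include <dlfcn.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical signatures; CUresult is simplified to int here. */
typedef int (*cu_init_fn)(unsigned int flags);
typedef int (*cu_driver_get_version_fn)(int *version);

/* Hypothetical function pointer table filled in at run time. */
struct cuda_fn_table {
    void *handle;
    cu_init_fn cuInit;
    cu_driver_get_version_fn cuDriverGetVersion;
};

/* Try each candidate path in order. Returns 0 once a library providing every
   required symbol is found, or -1 if no usable libcuda exists. */
int load_cuda_table(const char *const *candidates, struct cuda_fn_table *t)
{
    memset(t, 0, sizeof(*t));
    for (size_t i = 0; candidates[i] != NULL; ++i) {
        void *h = dlopen(candidates[i], RTLD_NOW | RTLD_LOCAL);
        if (h == NULL) {
            continue;   /* not at this path; try the next one */
        }
        t->cuInit = (cu_init_fn)dlsym(h, "cuInit");
        t->cuDriverGetVersion =
            (cu_driver_get_version_fn)dlsym(h, "cuDriverGetVersion");
        if (t->cuInit != NULL && t->cuDriverGetVersion != NULL) {
            t->handle = h;
            return 0;
        }
        dlclose(h);     /* library found but a required symbol is missing */
        memset(t, 0, sizeof(*t));
    }
    return -1;          /* caller disables CUDA support */
}
```

A caller would pass an illustrative list such as {"libcuda.so.1", NULL} and switch CUDA support off when the function returns -1, which is how a single build keeps working on hosts without CUDA. On older glibc versions, link with -ldl.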

The change we are making removes the function pointer table and instead gives
the relevant components a direct dependency on libcuda. This is in line with
the rest of Open MPI’s MCA system, where components can be built as DSOs.

The differences are: Open MPI will call libcuda functions directly, and
components that have a CUDA dependency will be built as DSOs (i.e.,
--with-cuda=/path/to/cuda/ --enable-mca-dso=accelerator-cuda). When Open MPI
loads these DSOs at run time, they may fail to load, for example in a non-CUDA
environment, but this won’t prevent Open MPI from functioning. Related work
(https://github.com/open-mpi/ompi/pull/10763) to add an option that silences
the warnings emitted along this expected path is also in progress.
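The proposed behavior can be sketched as a loader that treats a failed dlopen() of a component DSO as "component unavailable" rather than as a fatal error, with a flag controlling whether the failure is printed. The enum, the function names, and the DSO path in the usage note are hypothetical; Open MPI's actual component loader and the option added by the PR above are not reproduced here.

```c
#include <dlfcn.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical outcome codes; Open MPI's real loader uses its own. */
enum component_status { COMPONENT_LOADED, COMPONENT_UNAVAILABLE };

/* Try to open one component DSO. A component linked directly against libcuda
   fails here when libcuda is missing; that is reported as "unavailable",
   never as a fatal error. */
enum component_status open_component(const char *dso_path, bool show_load_errors,
                                     void **handle_out)
{
    *handle_out = dlopen(dso_path, RTLD_NOW | RTLD_LOCAL);
    if (*handle_out != NULL) {
        return COMPONENT_LOADED;
    }
    /* Always fetch the message so a stale error does not linger. */
    const char *err = dlerror();
    if (show_load_errors) {
        fprintf(stderr, "warning: could not open %s: %s\n", dso_path,
                err != NULL ? err : "unknown error");
    }
    return COMPONENT_UNAVAILABLE;
}

/* Open every candidate and keep the ones that loaded; failures are skipped,
   so the remaining components still work. Returns the number loaded. */
size_t open_components(const char *const *paths, size_t n, bool show_load_errors,
                       void **handles)
{
    size_t loaded = 0;
    for (size_t i = 0; i < n; ++i) {
        void *h;
        if (open_component(paths[i], show_load_errors, &h) == COMPONENT_LOADED) {
            handles[loaded++] = h;
        }
    }
    return loaded;
}
```

On a node without libcuda, a call such as open_component("/path/to/mca_accelerator_cuda.so", false, &h) (an illustrative path) would simply return COMPONENT_UNAVAILABLE, and passing show_load_errors = false keeps that expected failure quiet.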

From a user's perspective, nothing changes. For compilation, components that
depend on CUDA will need to be built as DSOs. In the code, we can remove the
dlopen dependency from CUDA builds, bring the CUDA code in line with the rest of
Open MPI, and remove the code that stores function pointers and detects the
location of libcuda.

Please provide feedback if you have any suggestions or object to these
changes.

Thanks,
William Zhang
