Folks,

At the SC'17 Open MPI BoF, we presented slide 74 about cross-version
mpirun interoperability (I attached a screenshot for your
convenience).

The topic is documented on the wiki at
https://github.com/open-mpi/ompi/wiki/Container-Versioning.

If I oversimplify, we have two use cases to consider:
1. "Singularity-like": mpirun and orted have the same version A, and
the MPI app running inside the container uses libmpi (and its
dependencies) version B.
2. "Docker-like": mpirun has version A, and both orted and libmpi (and
their common dependencies) inside the container have version B.

My understanding is we plan to support both use cases from Open MPI v3.

I ran a few naive tests for the "Docker-like" scenario and ran into
several issues.
(FWIW, I do not use Docker, but I "creatively" use orte_launch_agent
and orte_fork_agent in order to mix Open MPI versions):
 - The Open MPI v3 and v2 oob/tcp protocols are different. This is
likely a no-brainer, but we might be fine not supporting a scenario
involving two different major releases.
 - oob/tcp expects all daemons (e.g. mpirun and orted) to have the same
version (major+minor+release+greek), which prevents any kind of
interoperability (this can be easily fixed).
 - mpirun v3.1 passes "--tree-spawn" to orted, and that option is not
supported by v3.0 (mpirun --mca plm_rsh_no_tree_spawn can be used as a
workaround).
 - The plm/{base, rsh} protocol(s) differ between v3.0 and v3.1, so some
extra code has to be written (or everything has to be backported to
the v3.0 branch) if we want interoperability between these two minor
series.
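
To illustrate the oob/tcp version check mentioned above, here is a
minimal sketch in Python (not actual Open MPI code) contrasting an
exact major+minor+release+greek match with a relaxed check that only
compares the major version. The function names and the assumed version
string format ("3.1.0rc2") are hypothetical and chosen for illustration
only.

```python
import re
from typing import NamedTuple


class Version(NamedTuple):
    major: int
    minor: int
    release: int
    greek: str  # e.g. "rc2" or "a1"; empty string for a final release


# Assumed format: "<major>.<minor>.<release><greek>", e.g. "3.1.0rc2".
_VERSION_RE = re.compile(r"^(\d+)\.(\d+)\.(\d+)([A-Za-z]\w*)?$")


def parse_version(text: str) -> Version:
    """Split a version string into its four components."""
    match = _VERSION_RE.match(text.strip())
    if match is None:
        raise ValueError(f"unrecognized version string: {text!r}")
    major, minor, release, greek = match.groups()
    return Version(int(major), int(minor), int(release), greek or "")


def strict_match(a: str, b: str) -> bool:
    """Exact match on all four components (the behavior described above)."""
    return parse_version(a) == parse_version(b)


def same_major(a: str, b: str) -> bool:
    """Relaxed check: accept any peer with the same major version."""
    return parse_version(a).major == parse_version(b).major
```

With these helpers, strict_match("3.1.0", "3.0.0") rejects the pair
while same_major("3.1.0", "3.0.0") accepts it. Whether the major version
is the right granularity is exactly the open question here.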

As you can see, there is nothing yet in Open MPI to support any kind
of interoperability between mpirun and orted, even if they both have
the same major version 3.

Could you please discuss this topic during this week's call?
 - What kind of interoperability (e.g. Singularity and/or Docker) do
we want to support, and when?
 - Is this a blocker for v3.1.0?
 - master has major version 4; should it interoperate with the v3
release branch(es)?
 - What (e.g. a compatibility matrix) should be tested, and how (e.g.
MTT and/or CI)?
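
As a rough idea of what a compatibility matrix could enumerate, here is
a small Python sketch that lists every (mpirun, orted) pairing of
differing release series. The series list is hypothetical ("4.0" stands
in for master) and is not an agreed-upon set.

```python
from itertools import product

# Hypothetical list of release series to cover; not an agreed-upon set.
SERIES = ["3.0", "3.1", "4.0"]


def interop_pairs(series):
    """Return (mpirun, orted) pairs where the two series differ."""
    return [(a, b) for a, b in product(series, repeat=2) if a != b]


if __name__ == "__main__":
    for mpirun_series, orted_series in interop_pairs(SERIES):
        print(f"mpirun {mpirun_series} -> orted {orted_series}")
```

The pairs are ordered because the direction matters: mpirun v3.1
launching orted v3.0 is a different test than the reverse.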


For the record, Ralph and I started discussing this when reviewing
https://github.com/open-mpi/ompi/pull/4748. The exchange was long and
IMHO inconclusive, and we should probably start from scratch here
on the mailing list, or create a new GitHub issue.


Cheers,

Gilles
_______________________________________________
devel mailing list
devel@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/devel
