Folks,

At the SC'17 Open MPI BoF, we presented slide 74 about cross-version mpirun interoperability (I attached a screenshot for your convenience).
The topic is documented on the wiki at https://github.com/open-mpi/ompi/wiki/Container-Versioning.

To oversimplify, there are two use cases to consider:
1. "Singularity-like": mpirun and orted have the same version A, and the MPI app running inside the container uses libmpi (and its dependencies) version B.
2. "Docker-like": mpirun has version A, and both orted and libmpi (and their common dependencies) inside the container have version B.

My understanding is that we plan to support both use cases starting with Open MPI v3.

I ran a few naive tests of the "Docker-like" scenario and ran into several issues (FWIW, I do not use Docker, but I "creatively" use orte_launch_agent and orte_fork_agent to mix Open MPI versions):
- The oob/tcp protocols of Open MPI v3 and v2 differ. This is likely a no-brainer, and we might be fine not supporting scenarios that involve two different major releases.
- oob/tcp expects all daemons (e.g. mpirun and orted) to have exactly the same version (major+minor+release+greek), which prevents any kind of interoperability (this can be easily fixed).
- mpirun v3.1 passes "--tree-spawn" to orted, and that option is not supported by v3.0 (mpirun --mca plm_rsh_no_tree_spawn 1 can be used as a workaround).
- The plm/{base,rsh} protocols differ between v3.0 and v3.1, so some extra code has to be written (or everything has to be backported to the v3.0 branch) if we want interoperability between these two minor series.

As you can see, there is nothing yet in Open MPI to support any kind of interoperability between mpirun and orted, even when both have the same major version 3.

Could you please discuss this topic during this week's call?
- What kind of interoperability (e.g. Singularity and/or Docker) do we want to support, and when?
- Is this a blocker for v3.1.0?
- master has major version 4; should it interoperate with the v3 release branch(es)?
- What should be tested (e.g. a compatibility matrix), and how (e.g. MTT and/or CI)?
For the record, Ralph and I started discussing this while reviewing https://github.com/open-mpi/ompi/pull/4748. The exchange was long and, IMHO, inconclusive, so we should probably start from scratch here on the mailing list, or open a new GitHub issue.

Cheers,

Gilles
_______________________________________________ devel mailing list devel@lists.open-mpi.org https://lists.open-mpi.org/mailman/listinfo/devel