Dear Singularity & OpenMPI teams, Greg and Ralph,

Going back to Ralph Castain's response in this thread:

https://groups.google.com/a/lbl.gov/forum/#!topic/singularity/lQ6sWCWhIWY

To make Singularity images containing OpenMPI distributed applications
portable, he suggested combining several OpenMPI versions with an external
PMIx to check interoperability across versions when using the Singularity
MPI hybrid approach (see his response in the thread).

I ran some experiments and would like to share my results and discuss the
conclusions.

First of all, I'm going to describe the environment (some scripts are
attached).

   - I performed these tests on the CESGA FinisTerrae II cluster
   (https://www.cesga.es/en/infraestructuras/computacion/FinisTerrae2).
   - The compiler used was GCC/6.3.0, and I had to compile some external
   dependencies to be linked from PMIx or OpenMPI:
      - hwloc/1.11.5
      - libevent/2.0.22
   - PMIx versions used in these experiments:
      - 1.2.1
      - 1.2.2
      - 2.0.0
      - I configured PMIx with the following options:
      - ./configure --with-hwloc= --with-munge-libdir=
      --with-platform=optimized --with-libevent=
   - OpenMPI versions used in these experiments:
      - 2.0.X
      - 2.1.1
      - 3.0.0_rcX
      - I configured OpenMPI with the following options:
      - ./configure --with-hwloc= --enable-shared --with-slurm
      --enable-mpi-thread-multiple --with-verbs-libdir=
      --enable-mpirun-prefix-by-default --disable-dlopen --with-pmix=
      --with-libevent= --with-knem
      - Version 2.1.1 was also compiled with the --disable-pmix-dstore flag.
   - I used the well-known "Ring" OpenMPI example application.
   - I used mpirun as the process manager.
Based on Ralph's previous response, I expected full cross-version
compatibility for any OpenMPI >= 2.0.0 linked against PMIx 1.2.X, both
inside the container and on the host.
In general, my results were not as good as expected, but they are promising.

   - The bad news: my results show that OpenMPI 2.X versions need exactly
   the same version of OpenMPI inside and outside the container, although I
   can mix PMIx 1.2.1 and 1.2.2.
   - The good news: if OpenMPI 3.0.0_rc3 is present inside or outside the
   container, it seems to work with any other OpenMPI >= 2.X version on the
   other side, and also when mixing PMIx 1.2.1 and 1.2.2. Some notes* on
   this result:
      - OpenMPI 2.0.0 with PMIx 1.2.2 (inside and outside the container)
      never worked.
      - After getting the expected output from the "Ring" app, I randomly
      get a SEGFAULT if OpenMPI 3.0.0_rcX is involved.
      - As Ralph said, PMIx 1.2.X and 2.0.X are not interoperable.
   - I was not able to compile OpenMPI 2.1.0 with an external PMIx.
I conclude that PMIx 1.2.1 and 1.2.2 are interoperable, but only
OpenMPI 3.0.0_rc3 works*, in general, with other versions of OpenMPI
(>= 2.X).

Going back again to Ralph Castain's mail in this thread, I would expect
full interoperability across PMIx versions (>= 1.2) once PMIx 2.1 (not yet
released) is available.

Some questions about these experiments and conclusions:

   - What do you think about these results? Do you have any suggestions? Am
   I missing something?
   - Are these results aligned with your expectations?
   - I know that PMIx 2.1 is being developed, but is any version already
   available for testing? How can I get it?
   - Is the SEGFAULT I get with OpenMPI 3.0.0_rcX already being tracked?

I hope this is helpful!

BR,

Víctor



2017-07-14 1:34 GMT+02:00 Gregory M. Kurtzer <gmkurt...@gmail.com>:

> Hi Victor,
>
> The area of ABI compatibility I am referring to is with the container's
> underlying library stack. Meaning that if you link in the libraries
> compiled on the host, and the container you want to run is newer than what
> is installed on the host (or potentially vice versa), you may end up with
> a conflict between the binary and library.
>
> This is what Nvidia has mitigated by building their library on a very
> recent toolchain, thus the libraries are backwards compatible with older
> binaries.
>
> Does that make sense?
>
> Greg
>
>
>
> On Wed, Jul 12, 2017 at 4:50 AM, victor sv <victo...@gmail.com> wrote:
>
>> Hi Greg and Ralph,
>>
>> Yes Greg, I agree with you that the mentioned strategy could be dangerous
>> and goes against the principles of containment.
>>
>> Sorry for the basic question, but what do you mean by ABI-compatible
>> containers? Which components of the container environment are involved in
>> this ABI compatibility?
>>
>> If we talk about libc or the kernel itself, as you say in your web page,
>> "If you require kernel dependent features, a container platform is probably
>> not the right solution for you."
>>
>> If we focus on OpenMPI ABI compatibility, I gather that the variables
>> involved in this compatibility could be (1) the compiler (vendor) and (2)
>> the OpenMPI library itself.
>>
>> Am I right, or am I missing any other variables?
>>
>> An interesting project called ABI-tracker has performed an OpenMPI ABI
>> compatibility study that you can see at the following link:
>>
>> https://abi-laboratory.pro/tracker/timeline/openmpi/
>>
>> I think that, at least for OpenMPI 2.X, although it is a dangerous
>> approach, the ABI compatibility seems reasonable.
>>
>> What do you think?
>>
>> BR,
>> Víctor.
>>
>> 2017-07-11 16:55 GMT+02:00 Gregory M. Kurtzer <gmkurt...@gmail.com>:
>>
>>> Hi Victor,
>>>
>>> I will let Ralph comment on the OMPI versions and compatibilities, but
>>> using the MPI host libraries within a container is dangerous for
>>> the reason that you are mentioning. If you are running ABI compatible
>>> containers with the host, then things *might* work as expected. But this
>>> breaks container portability, and goes against the principles of
>>> containment.
>>>
>>> We do however do exactly this for the Nvidia driver libraries, but...
>>> Nvidia builds these libraries with careful attention on ABI compatibility
>>> such that these binary libraries are indeed reasonably portable across
>>> containers.
>>>
>>> The only way to do this portably is with using a launcher on the host,
>>> outside the container, to spin up the container and launch the MPI within.
>>> PMIx is a fantastic approach to solving this.
>>>
>>> Hope that helps!
>>>
>>> Greg
>>>
>>>
>>>
>>> On Tue, Jul 11, 2017 at 6:03 AM, victor sv <victo...@gmail.com> wrote:
>>>
>>>> Hi Greg and Ralph,
>>>>
>>>> Thank you for your precise and detailed answers.
>>>>
>>>> Just to confirm and to sum up some conclusions (if I understood
>>>> correctly):
>>>>
>>>>  - OpenMPI process management compatibility depends on PMIx.
>>>>  - OpenMPI (and also Slurm) complete backward/forward compatibility
>>>> will come (hopefully) in the future by means of PMIx 2.1.
>>>>  - Currently, there is compatibility across OpenMPI 2.X if we compile
>>>> it with the default PMIx (1.X) support.
>>>>  - OpenMPI 2.1 must be compiled with --disable-pmix-dstore due to a
>>>> compatibility break.
>>>>  - OpenMPI 1.X does not support PMIx and we can ignore it in this
>>>> thread.
>>>>
>>>> Am I right?
>>>>
>>>> I'm interested in performing the tests you propose. I will try to build
>>>> all three OMPI versions (2.0, 2.1 and 3.0) against the same external PMIx
>>>> library to check the compatibility. Which PMIx version (1.2.0, 1.2.1 or
>>>> 1.2.2) do you recommend as a starting point?
>>>>
>>>> I will report these results ASAP to this thread.
>>>>
>>>> On the other hand, although we are planning to add support for PMIx,
>>>> unfortunately, our Slurm version (14.11.10-Bull.1.0) does not support it
>>>> yet.
>>>>
>>>> The second strategy we are testing to get compatibility between OpenMPI
>>>> inside and outside a Singularity container relies on replacing the OpenMPI
>>>> libraries inside the container with the host library hierarchy.
>>>>
>>>> This approach rests on the assumption that OpenMPI symbols and data
>>>> structures are compatible across several versions of OpenMPI, at least
>>>> when combining releases that share the same major version.
>>>>
>>>> Although the empirical tests of this approach seem to work properly
>>>> with some tests, benchmarks and real apps, I'm afraid of getting
>>>> unexpected errors/warnings (segfaults, data errors, etc.) in the future.
>>>>
>>>> What do you think about this approach?
>>>>
>>>> Can you confirm that OpenMPI is compatible in this way?
>>>>
>>>> Finally, I think this thread could be very interesting for other users
>>>> too and I would like to keep it alive with your help.
>>>>
>>>> Thank you again for your support!
>>>>
>>>> BR,
>>>> Víctor
>>>>
>>>> 2017-07-09 23:45 GMT+02:00 Gregory M. Kurtzer <gmkurt...@gmail.com>:
>>>>
>>>>> Hiya Victor, et al.,
>>>>>
>>>>> I didn't realize this but Ralph had to drop off of the Singularity
>>>>> list. Hopefully we will get him back again, as he is a fantastic resource
>>>>> for all of the OMPI questions and always a great source of information and
>>>>> ideas (poke, poke Ralph!). Ralph did send me this in response to the
>>>>> previous email hoping it helps to explain things:
>>>>>
>>>>>
>>>>> On Sun, Jul 9, 2017 at 2:22 PM, r...@open-mpi.org <r...@open-mpi.org>
>>>>>  wrote:
>>>>>
>>>>>> ...
>>>>> You are welcome to forward the following to the list:
>>>>>
>>>>> As Greg said, we have been concerned about this since we started
>>>>> looking at Singularity support. Just for clarity, the version of PMI OMPI
>>>>> uses is PMIx (https://pmix.github.io/pmix/). While our plan from the
>>>>> beginning was to support cross-versions specifically to address this
>>>>> problem, we fell behind on its implementation due to priorities. We just
>>>>> committed the code to the PMIx repo in the last week, and it won’t be
>>>>> released into production for a few months while we shake it down.
>>>>>
>>>>> I fear it will be impossible to get the OMPI 1.10 series to work with
>>>>> anything other than itself as it pre-dates PMIx.
>>>>>
>>>>> The OMPI 2.0 and 2.1 series should work across each other as they both
>>>>> include PMIx 1.x. However, you probably will need to configure the 2.1
>>>>> series with --disable-pmix-dstore as there was an unintended compatibility
>>>>> break there (the shared memory store was added during the PMIx 1.x series
>>>>> and we didn’t catch the compatibility break it introduced).
>>>>>
>>>>> Looking into the future, OMPI 3.0 is about to be released. It includes
>>>>> PMIx 2.0, which isn’t backwards compatible at this time, and so it won’t
>>>>> cross-version with OMPI 2.x “out-of-the-box”. We haven’t tested this, but
>>>>> one thing you could try is to build all three OMPI versions against the
>>>>> same PMIx external library (you would probably have to experiment a bit
>>>>> with PMIx versions to see which works across the different OMPI versions
>>>>> as the glue between the two also changed a bit). This will ensure that the
>>>>> shared memory store in PMIx is compatible across the versions, and things
>>>>> should work since OMPI doesn’t care how the data is moved across the
>>>>> host-container boundary.
>>>>>
>>>>> As I said, we will be adding cross-version support to the PMIx release
>>>>> series soon, without changing the API, that will ensure support across all
>>>>> PMIx versions starting with v1.2. Thus, you could (once that happens)
>>>>> build OMPI 2.0, 2.1, and 3.0 against the new PMIx release (probably PMIx
>>>>> v2.1.0) and the resulting containers would be future-proof as OMPI moves
>>>>> ahead. The RMs plan to follow that path as well, so you should be in good
>>>>> shape once this is done if you prefer to “direct launch” your containers
>>>>> (e.g., “srun ./mycontainer” under SLURM).
>>>>>
>>>>> Sorry if that is all confusing - we sometimes get lost in the
>>>>> numbering schemes between OMPI and PMIx ourselves. Feel free to contact me
>>>>> directly, or on the OMPI or PMIx mailing lists, if you have more questions
>>>>> or encounter problems. We definitely want to make this work.
>>>>>
>>>>> Ralph
>>>>>
>>>>> On Sun, Jul 9, 2017 at 12:19 PM, Gregory M. Kurtzer <
>>>>> gmkurt...@gmail.com> wrote:
>>>>>
>>>>>> Hi Victor,
>>>>>>
>>>>>> Sorry for the latency, I'm on email overload.
>>>>>>
>>>>>> Open MPI uses PMI to communicate both inside and outside of the
>>>>>> container. Ralph Castain (on this list, but possibly not monitoring
>>>>>> actively) is leading the PMI effort and he is an active Open MPI
>>>>>> developer. We have had several talks about how to achieve
>>>>>> "hetero-versionistic" compatibility through the PMI handshake. I was
>>>>>> under the impression that PMI now supports that, as long as you are
>>>>>> running equal or newer version on the host (outside the container).
>>>>>> Also, I don't know what version of PMI this feature was introduced in,
>>>>>> nor do I know what version of Open MPI includes that compatibility.
>>>>>>
>>>>>> I have CC'ed Ralph, and hopefully he will be able to offer some
>>>>>> suggestions.
>>>>>>
>>>>>> Regarding your question about supporting the MPI libraries in the
>>>>>> same manner that we are doing the Nvidia libraries, that would be hard.
>>>>>> Nvidia specifically builds their libraries to be as generally compatible
>>>>>> as possible (e.g. the same libraries/binaries work on a large array of
>>>>>> Linux distributions). Most people do not build host libraries in a manner
>>>>>> that would be generally compatible as Nvidia does.
>>>>>>
>>>>>> Hope that helps!
>>>>>>
>>>>>> Greg
>>>>>>
>>>>>>
>>>>>>
>>>>>> On Mon, Jul 3, 2017 at 2:07 AM, victor sv <victo...@gmail.com> wrote:
>>>>>>
>>>>>>> Dear Singularity team,
>>>>>>>
>>>>>>> first of all, thanks for the great work with Singularity. It looks
>>>>>>> amazing!
>>>>>>>
>>>>>>> Sorry if this topic is duplicated and for the length of the email,
>>>>>>> but I want to share my experience about Singularity and OpenMPI
>>>>>>> compatibility, and also ask some questions.
>>>>>>>
>>>>>>> I've been reading a lot about OpenMPI and Singularity compatibility
>>>>>>> because we are trying to find a generic way to run OpenMPI
>>>>>>> applications within Singularity containers. It was not so clear (to me)
>>>>>>> in the documentation, forums and mailing lists, which is why we've
>>>>>>> performed an empirical OpenMPI compatibility study.
>>>>>>>
>>>>>>> We ran these comparisons in CESGA FinisTerrae II cluster (
>>>>>>> https://www.cesga.es/en/infraestructuras/computacion/FinisTerrae2).
>>>>>>>
>>>>>>> We used several versions of OpenMPI. The chosen versions of OpenMPI
>>>>>>> were the versions already installed in the cluster:
>>>>>>>
>>>>>>> - openmpi/1.10.2
>>>>>>> - openmpi/2.0.0
>>>>>>> - openmpi/2.0.1
>>>>>>> - openmpi/2.0.2
>>>>>>> - openmpi/2.1.1
>>>>>>>
>>>>>>> We have created Singularity images containing the same versions of
>>>>>>> OpenMPI and with the basic OpenMPI ring example. I share the bootstrap
>>>>>>> definition file template used below:
>>>>>>>
>>>>>>> ```
>>>>>>> BootStrap: docker
>>>>>>> From: ubuntu:16.04
>>>>>>> IncludeCmd: yes
>>>>>>>
>>>>>>> %post
>>>>>>>         sed -i 's/main/main restricted universe/g' /etc/apt/sources.list
>>>>>>>         apt-get update
>>>>>>>         apt-get install -y bash git wget build-essential gcc time \
>>>>>>>             libc6-dev libgcc-5-dev
>>>>>>>         apt-get install -y dapl2-utils libdapl-dev libdapl2 \
>>>>>>>             libibverbs1 librdmacm1 libcxgb3-1 libipathverbs1 libmlx4-1 \
>>>>>>>             libmlx5-1 libmthca1 libnes1 libpmi0 libpmi0-dev libslurm29 \
>>>>>>>             libslurm-dev
>>>>>>>
>>>>>>>         ##Install OpenMPI
>>>>>>>         cd /tmp
>>>>>>>         wget 'https://www.open-mpi.org/software/ompi/vX.X/downloads/openmpi-X.X.X.tar.gz' \
>>>>>>>             -O openmpi-X.X.X.tar.gz
>>>>>>>         # The tarball already unpacks into openmpi-X.X.X/
>>>>>>>         tar -xzf openmpi-X.X.X.tar.gz
>>>>>>>         mkdir -p /tmp/openmpi-X.X.X/build
>>>>>>>         cd /tmp/openmpi-X.X.X/build
>>>>>>>         ../configure --enable-shared --enable-mpi-thread-multiple \
>>>>>>>             --with-verbs --enable-mpirun-prefix-by-default --with-hwloc \
>>>>>>>             --disable-dlopen --with-pmi --prefix=/usr
>>>>>>>         make all install
>>>>>>>
>>>>>>>         # Install ring
>>>>>>>         cd /tmp
>>>>>>>         wget https://raw.githubusercontent.com/open-mpi/ompi/master/examples/ring_c.c
>>>>>>>         mpicc ring_c.c -o /usr/bin/ring
>>>>>>> ```
>>>>>>>
>>>>>>> Once the containers were created, we ran the ring app with mpirun
>>>>>>> using 2 cores of 2 different nodes mixing all possible combinations of
>>>>>>> those OpenMPI versions inside and outside the container.
>>>>>>>
>>>>>>> The results obtained show that we need the same version of OpenMPI
>>>>>>> inside and outside the container to successfully run the contained
>>>>>>> application in parallel with mpirun.
>>>>>>>
>>>>>>> Is this the expected behaviour or am I missing something?
>>>>>>>
>>>>>>> Will this be the expected behaviour in the future (with future
>>>>>>> versions of OpenMPI)?
>>>>>>>
>>>>>>> Currently, we have Slurm 14.11.10-Bull.1.0 installed as the job
>>>>>>> scheduler at FinisTerrae II. We found the following tip/trick to use
>>>>>>> srun as the process manager:
>>>>>>>
>>>>>>> http://singularity.lbl.gov/tutorial-gpu-drivers-open-mpi-mtls
>>>>>>>
>>>>>>> In order to run any Singularity image containing OpenMPI
>>>>>>> applications using Slurm, we've adapted it to our infrastructure and
>>>>>>> ran the same test cases with srun. It seems to be working properly
>>>>>>> (no real-world applications have been tested yet).
>>>>>>>
>>>>>>> What do you think about this strategy?
>>>>>>> Can you confirm that it provides portability of Singularity images
>>>>>>> containing OpenMPI applications?
>>>>>>>
>>>>>>> I think this strategy is similar to the one you are following with
>>>>>>> the "--nv" option for NVidia drivers.
>>>>>>>
>>>>>>> Why not follow the same strategy with MPI, PMI, libibverbs, etc.?
>>>>>>>
>>>>>>> Thanks in advance and congrats again for your great work!
>>>>>>>
>>>>>>> Víctor.
>>>>>>>
>>>>>>> --
>>>>>>> You received this message because you are subscribed to the Google
>>>>>>> Groups "singularity" group.
>>>>>>> To unsubscribe from this group and stop receiving emails from it,
>>>>>>> send an email to singularity+unsubscr...@lbl.gov.
>>>>>>>
>>>>>>
>>>>>>
>>>>>>
>>>>>> --
>>>>>> Gregory M. Kurtzer
>>>>>> CEO, SingularityWare, LLC.
>>>>>> Senior Architect, RStor
>>>>>> Computational Science Advisor, Lawrence Berkeley National Laboratory
>>>>>>

Attachment: container_bootstrap.def
Description: Binary data

Attachment: host_install.sh
Description: Bourne shell script

_______________________________________________
users mailing list
users@lists.open-mpi.org
https://lists.open-mpi.org/mailman/listinfo/users
