On 10/30/2014 07:32 PM, Ralph Castain wrote:
Just for FYI: I believe Nathan misspoke.
The new capability is in 1.8.4, which I hope
to release next Friday (Nov 7th)

Hi Ralph

That is even better!
Looking forward to OMPI 1.8.4.

I still would love to hear from Nathan / OMPI team
about my remaining questions below
(especially the 12 vader parameters).

Many thanks,
Gus Correa

On Oct 30, 2014, at 4:24 PM, Gus Correa <g...@ldeo.columbia.edu> wrote:

Hi Nathan

Thank you very much for addressing this problem.

I read your notes on Jeff's blog about vader,
and that clarified many things that were obscure to me
when I first started this thread
whining that knem was not working in OMPI 1.8.3.
Thank you also for writing that blog post,
and for sending the link to it.
That was very helpful indeed.

As your closing comments on the blog post point out,
and your IMB benchmark graphs of pingpong/latency &
sendrecv/bandwidth show,
vader+xpmem outperforms the other combinations
of btl+memory_copy_mechanism for intra-node communication.

For the benefit of pedestrian OpenMPI users like me:

1) What is the status of xpmem in the Linux world at this point?
[Proprietary (SGI?) / open source, part of the Linux kernel (which),
part of standard distributions (which) ?]

2) Any recommendation for the values of the
various vader btl parameters?
[There are 12 of them in OMPI 1.8.3!
That is a real challenge to get right.]

Which values did you use in your benchmarks?

In particular, is there an optimal value for the eager/rendezvous threshold?
(btl_vader_eager_limit, default=4kB)
[The INRIA web site suggests 32kB for the sm+knem counterpart 
(btl_sm_eager_limit, default=4kB).]

3) Did I understand it right that the upcoming OpenMPI 1.8.5
can be configured with more than one memory copy mechanism at once
(e.g. --with-knem and --with-cma and --with-xpmem),
and then one of them selected at runtime with the btl_vader_single_copy_mechanism
parameter? Or must OMPI be configured with only one memory copy mechanism?
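For readers following question 2: the eager limit is the message size up to which the sender pushes data straight into shared memory without waiting for the receiver; above it a rendezvous handshake happens first, and that larger-message path is where a single-copy mechanism (knem, cma, xpmem) comes into play. The toy model below is only a sketch of that decision, assuming copy-in/copy-out costs two copies and a single-copy transfer costs one. `plan_send` and `SendPlan` are hypothetical names, not Open MPI code, and treating the boundary as inclusive is also an assumption.

```python
from dataclasses import dataclass

# Default taken from the thread (btl_vader_eager_limit, default=4kB).
DEFAULT_EAGER_LIMIT = 4 * 1024

@dataclass(frozen=True)
class SendPlan:
    protocol: str  # "eager" or "rendezvous"
    copies: int    # memory copies the payload goes through in this toy model

def plan_send(message_bytes: int,
              eager_limit: int = DEFAULT_EAGER_LIMIT,
              has_single_copy: bool = True) -> SendPlan:
    """Toy eager/rendezvous decision for an intra-node send (not Open MPI code)."""
    if message_bytes < 0:
        raise ValueError("message size must be non-negative")
    if message_bytes <= eager_limit:
        # Eager: sender copies into a shared buffer, receiver copies out.
        return SendPlan("eager", 2)
    # Rendezvous: handshake first; a single-copy mechanism lets the
    # receiver copy straight from the sender's buffer.
    return SendPlan("rendezvous", 1 if has_single_copy else 2)
```

Raising the limit (e.g. to the 32kB suggested for sm+knem) just moves more message sizes onto the two-copy eager path; whether that pays off depends on the machine, which is what the IMB runs would show.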

Many thanks,
Gus Correa

On 10/30/2014 05:44 PM, Nathan Hjelm wrote:
I want to close the loop on this issue. 1.8.5 will address it in several ways:

  - knem support in btl/sm has been fixed. A sanity check was disabling
    knem during component registration. I wrote the sanity check before
    the 1.7 release and didn't intend this side-effect.

  - vader now supports xpmem, cma, and knem. The best available
    single-copy mechanism will be used. If multiple single-copy
    mechanisms are available, you can select which one you want to use ...
More about the vader btl can be found here:

-Nathan Hjelm

On Fri, Oct 17, 2014 at 01:02:23PM -0700, Ralph Castain wrote:
      On Oct 17, 2014, at 12:06 PM, Gus Correa <g...@ldeo.columbia.edu> wrote:
      Hi Jeff

      Many thanks for looking into this and filing a bug report at 11:16PM!

      Thanks to Aurelien, Ralph and Nathan for their help and clarifications.


      Related suggestion:

      Add a note to the FAQ explaining that in OMPI 1.8
      the new (default) btl is vader (and what it is).

      It was a real surprise to me.
      If Aurelien Bouteiller hadn't told me about vader,
      I might never have realized it even existed.

      That could be part of one of the existing FAQs
      explaining how to select the btl.


      Doubts (btl in OMPI 1.8):

      I still don't understand clearly the meaning and scope of vader
      being a "default btl".

    We mean that it has a higher priority than the other shared memory
    implementation, and so it will be used for intra-node messaging by
    default.
      What is the scope of this default: intra-node btl only, perhaps?

    Yes - strictly intra-node

      Was there a default btl before vader, and which?

    The "sm" btl was the default shared memory transport before vader

      Is vader the intra-node default only (i.e. it replaces sm by default),
      or does it somehow extend beyond node boundaries and replace (or
      bring in) network btls (openib, tcp, etc.)?

    Nope - just intra-node
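Ralph's answers boil down to a priority rule among the shared-memory components: both sm and vader are candidates for on-node peers, and the higher-priority one wins. The sketch below illustrates only that rule; the `Btl` class, `pick_shared_memory`, and all priority numbers are made up for illustration and are not Open MPI's actual values.

```python
from dataclasses import dataclass
from typing import Iterable, Optional, Set

@dataclass(frozen=True)
class Btl:
    name: str
    priority: int        # hypothetical value, for illustration only
    shared_memory: bool  # True for transports that only reach same-node peers

def pick_shared_memory(btls: Iterable[Btl],
                       allowed: Optional[Set[str]] = None) -> Optional[str]:
    """Return the highest-priority shared-memory BTL, honoring an optional include list."""
    candidates = [
        b for b in btls
        if b.shared_memory and (allowed is None or b.name in allowed)
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda b: b.priority).name

# Hypothetical component table: vader outranks sm; openib is a network BTL.
COMPONENTS = [Btl("sm", 10, True), Btl("vader", 20, True), Btl("openib", 5, False)]
```

With no restriction the model picks vader; if an include list names sm but not vader, it falls back to sm, matching "sm was the default before vader".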

      If I am running on several nodes, and want to use openib, not tcp,
      and, say, use vader, what is the right syntax?

      * nothing (OMPI will figure it out ... but what if you have
      IB,Ethernet,Myrinet,OpenGM, altogether?)

    If you have higher-speed connections, we will pick the fastest for
    inter-node messaging as the "default" since we expect you would want the
    fastest possible transport.

      * -mca btl openib (and vader will come along automatically)

    Among the ones you show, these would indeed be the likely choices (openib
    and vader).

      * -mca btl openib,self (and vader will come along automatically)

    The "self" btl is *always* active as the loopback transport

      * -mca btl openib,self,vader (because vader is default only for 1-node runs)
      * something else (or several alternatives)
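For the syntax question, a small helper that spells out the command lines may help. It is a sketch only: `mpirun_cmd` and `./a.out` are hypothetical, while `mpirun -np N --mca btl <list>` and a leading `^` meaning "every BTL except these" are standard Open MPI usage. Listing openib, self and vader explicitly avoids relying on any component being brought in implicitly.

```python
import shlex
from typing import List, Optional, Sequence

def mpirun_cmd(nprocs: int, program: str,
               btls: Optional[Sequence[str]] = None,
               exclude: Optional[Sequence[str]] = None) -> List[str]:
    """Build an mpirun argument list that restricts which BTLs may be used."""
    if btls and exclude:
        # Open MPI does not allow mixing include and exclude lists.
        raise ValueError("use either an include list or an exclude list, not both")
    cmd = ["mpirun", "-np", str(nprocs)]
    if btls:
        cmd += ["--mca", "btl", ",".join(btls)]
    elif exclude:
        # A leading '^' means: use every BTL except the ones listed.
        cmd += ["--mca", "btl", "^" + ",".join(exclude)]
    cmd.append(program)
    return cmd

if __name__ == "__main__":
    print(shlex.join(mpirun_cmd(4, "./a.out", btls=["openib", "self", "vader"])))
```

The exclude form (e.g. `^tcp`) is often the less fragile choice on a mixed-fabric cluster, since it leaves the priority-based selection of everything else intact.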

      Whatever happened to the "self" btl in this new context?
      Gone? Still there?

      Many thanks,
      Gus Correa

      On 10/16/2014 11:16 PM, Jeff Squyres (jsquyres) wrote:

        On Oct 16, 2014, at 1:35 PM, Gus Correa <g...@ldeo.columbia.edu> wrote:

          and in the MCA parameter file:

          btl_sm_use_knem = 1

        I think the logic enforcing this MCA param got broken when we revamped
        the MCA param system.  :-(

          I am scratching my head to understand why a parameter with such a
          suggestive name ("btl_sm_have_knem_support"),
          so similar to the OMPI_BTL_SM_HAVE_KNEM cpp macro,
          somehow vanished from ompi_info in OMPI 1.8.3.

        It looks like this MCA param was also dropped when we revamped the MCA
        system.  Doh!  :-(
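When a parameter such as btl_sm_use_knem seems to be ignored, one quick check is to compare what the parameter file sets against the names ompi_info actually reports. The sketch below assumes the simple `name = value` format with `#` comment lines shown above; `parse_mca_params` and `unknown_params` are hypothetical helpers, not part of Open MPI, and the set of known names would have to be pasted in from ompi_info output by hand.

```python
from typing import Dict, Iterable, List

def parse_mca_params(text: str) -> Dict[str, str]:
    """Parse 'name = value' lines; blank lines and '#' comment lines are skipped."""
    params: Dict[str, str] = {}
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        name, sep, value = line.partition("=")
        if not sep or not name.strip():
            raise ValueError(f"line {lineno}: expected 'name = value', got {raw!r}")
        # In this sketch a later setting of the same name overrides an earlier one.
        params[name.strip()] = value.strip()
    return params

def unknown_params(params: Dict[str, str], known: Iterable[str]) -> List[str]:
    """Names set in the file that are missing from the reference list (e.g. from ompi_info)."""
    known_set = set(known)
    return sorted(name for name in params if name not in known_set)
```

A name that parses fine but shows up in `unknown_params` is a hint that the setting is not reaching any component.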

        There's some deep mojo going on that is somehow causing knem to not be
        used; I'm too tired to understand the logic right now.  I just opened
        https://github.com/open-mpi/ompi/issues/239 to track this issue --
        feel free to subscribe to the issue to get updates.
