Hi Jeff

Many thanks for looking into this and filing a bug report at 11:16PM!

Thanks also to Aurelien, Ralph and Nathan for their help and clarifications.


Related suggestion:

Add a note to the FAQ explaining that in OMPI 1.8
the new (default) btl is vader (and what it is).

It was a real surprise to me.
If Aurelien Bouteiller hadn't told me about vader,
I might never have realized it even existed.

That could be part of one of the existing FAQ entries
explaining how to select the btl.


Doubts (btl in OMPI 1.8):

I still don't clearly understand the meaning and scope of vader
being a "default btl".
What is the scope of this default: intra-node only, perhaps?
Was there a default btl before vader, and if so, which one?
Is vader the intra-node default only (i.e. it replaces sm by default),
or does it somehow extend beyond node boundaries and replace (or bring in)
the network btls (openib, tcp, etc.)?

If I am running on several nodes and want to use openib rather than tcp,
and also, say, vader, what is the right syntax?

* nothing (OMPI will figure it out ... but what if you have IB, Ethernet, Myrinet, OpenGM, all together?)
* -mca btl openib (and vader will come along automatically)
* -mca btl openib,self (and vader will come along automatically)
* -mca btl openib,self,vader (because vader is default only for 1-node jobs)
* something else (or several alternatives)
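
To make the alternatives above concrete, here is a small sketch that only
prints the corresponding command lines (it does not run them). The process
count and the program name ./my_app are placeholders I made up, and I am not
claiming that any of these is the right answer:

```shell
#!/bin/sh
# Print, without running, the mpirun command line for a candidate btl list.
# NP and APP are placeholders, not values taken from this thread.
NP=4
APP=./my_app

btl_cmd() {
    if [ -z "$1" ]; then
        # No btl list given: let Open MPI pick the btls on its own.
        echo "mpirun -np $NP $APP"
    else
        # Explicit btl list passed through the "btl" MCA parameter.
        echo "mpirun -np $NP -mca btl $1 $APP"
    fi
}

# One line per alternative listed above, in the same order.
for btl in "" openib openib,self openib,self,vader; do
    btl_cmd "$btl"
done
```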

Whatever happened to the "self" btl in this new context?
Gone? Still there?

Many thanks,
Gus Correa

On 10/16/2014 11:16 PM, Jeff Squyres (jsquyres) wrote:
> On Oct 16, 2014, at 1:35 PM, Gus Correa <g...@ldeo.columbia.edu> wrote:
>
>> and on the MCA parameter file:
>>
>> btl_sm_use_knem = 1
>
> I think the logic enforcing this MCA param got broken when we revamped the
> MCA param system.  :-(
>
>> I am scratching my head to understand why a parameter with such a
>> suggestive name ("btl_sm_have_knem_support"),
>> so similar to the OMPI_BTL_SM_HAVE_KNEM cpp macro,
>> somehow vanished from ompi_info in OMPI 1.8.3.
>
> It looks like this MCA param was also dropped when we revamped the MCA
> system.  Doh!  :-(
>
> There's some deep mojo going on that is somehow causing knem to not be used;
> I'm too tired to understand the logic right now.  I just opened
> https://github.com/open-mpi/ompi/issues/239 to track this issue -- feel free
> to subscribe to the issue to get updates.
