Hey Andrew, I have one for you...

I get the following error message on a node that does not have any IB cards
--------------------------------------------------------------------------
[0,1,0]: uDAPL on host burl-ct-v40z-0 was unable to find any NICs.
Another transport will be used instead, although this may result in
lower performance.
--------------------------------------------------------------------------

But I don't see this for the openib BTL. Why uDAPL and not openib? Am I missing something?

-DON

Andrew Friedley wrote on 08/10/06 17:06:

Hopefully some of the other developers will correct me if I am wrong...

Brock Palen wrote:
I had a user ask this; it's not a very practical question, but I am curious.

This is good information for the archives :)

OMPI uses a 'fast' network if it's available (IB, GM, etc.). I also infer that for processes on the same SMP node the sm (shared memory) BTL is used, even if the job spans more than one node? The real question is what happens if a job is given three nodes, two of which have IB adapters, while all three have Ethernet. Will the entire job use TCP for processes on different nodes and shared memory intra-node? Or will the two nodes that have IB use IB to talk to each other and fall back to TCP only when talking to the third host that does not have IB?

You infer correctly - sm is just considered to be another network we support.

The two nodes with IB will use IB to communicate with each other, and Ethernet (TCP) to communicate with the third node that lacks IB. The same holds for shared memory: MPI processes on the same node will use sm to communicate with each other, and use, say, IB or TCP to communicate off-node.
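
As a quick sanity check on a mixed setup like that, you can ask each node which BTL components its Open MPI build contains. The output below is only a rough illustration - it varies with version and build options:

$ ompi_info | grep "MCA btl"
 MCA btl: self (MCA v1.0, API v1.0, Component v1.1)
 MCA btl: sm (MCA v1.0, API v1.0, Component v1.1)
 MCA btl: openib (MCA v1.0, API v1.0, Component v1.1)
 MCA btl: tcp (MCA v1.0, API v1.0, Component v1.1)

Keep in mind that ompi_info reports what is compiled in, not what is usable on a given node - a built openib component still disqualifies itself at run time on a node with no IB hardware, leaving the remaining BTLs (typically sm and tcp) to carry the traffic.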

Second, would it be safe to say OMPI searches the BTLs in the following order when trying to reach a process?
Self
SM
IB, GM, MX, MVAPI
TCP

Actually, each BTL has an exclusivity value that we use to choose which BTL is given preference when we have several BTLs available for communication. A quick grep shows you're pretty much right on:

$ ompi_info --all|grep exclusivity
 MCA btl: parameter "btl_openib_exclusivity" (current value: "1024")
 MCA btl: parameter "btl_self_exclusivity" (current value: "65536")
 MCA btl: parameter "btl_sm_exclusivity" (current value: "65535")
 MCA btl: parameter "btl_tcp_exclusivity" (current value: "0")

These can of course be tuned, though expect trouble if you give something a higher exclusivity than self. The numbers have no real meaning other than their relation to one another. For example, changing openib's exclusivity to 65000 won't change when/how it is used among the BTLs I have above, though it could change things relative to GM/MX/MVAPI if they're present.
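
If you want to experiment with that (a sketch, not a recommendation - the parameter names are the ones from the ompi_info output above, and ./my_app is just a placeholder), MCA parameters can be overridden on the mpirun command line:

$ ompi_info --param btl openib | grep exclusivity
$ mpirun --mca btl_openib_exclusivity 65000 -np 4 ./my_app

The same parameters can also be set in $HOME/.openmpi/mca-params.conf or through environment variables of the form OMPI_MCA_btl_openib_exclusivity, whichever fits your workflow.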

Third, what about a hypothetical case when a node has both GM and IB on it? (evaluation machines)

(This is where I might be wrong.) The network with the highest exclusivity is used for sending eager messages and the initial part of large messages using the rendezvous protocol. Beyond that, large-message data is striped across all available BTLs for more bandwidth.

You probably know already that the 'btl' MCA parameter can be used to select a set of BTLs at runtime, e.g. to use just IB (and self).
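
For example (again just a sketch - the component names depend on what your build contains, and ./my_app is a placeholder), both the explicit-list form and the ^-exclusion form work on the mpirun command line:

$ mpirun --mca btl self,sm,openib -np 8 ./my_app   # use only these BTLs
$ mpirun --mca btl ^tcp -np 8 ./my_app             # use everything except tcp

If you give an explicit list, keep self in it - that BTL handles the sends a process makes to its own rank.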

Last, does OMPI build something like a route list when MPI_Init() is called, so that it knows how to get to the other members of the job? Or is this done the first time a process needs to talk to another process, so that no unneeded route information is stored?

Yes - at initialization time (and when processes are dynamically added), each BTL is responsible for determining which other processes it can communicate with. This information is pushed back up to the higher levels (BML/PML) for use in scheduling decisions.

However, those BTLs that communicate over point-to-point connection pairs do not establish connections until data needs to be sent (lazy connection establishment). This way we do not immediately set up N^2 connections, but instead set them up only as each pairwise communication path is used.

The route information consumes relatively few resources compared to all the buffers and handles that must be allocated for connections in most of the BTLs.
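
If you ever want the opposite behavior - paying the connection setup cost up front at init time rather than on first use - some Open MPI versions expose an MCA switch for that. Whether your version has it, and under what exact name, is something to check with ompi_info first; this is only a sketch:

$ ompi_info --all | grep preconnect
$ mpirun --mca mpi_preconnect_all 1 -np 8 ./my_app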

P.S. Not having to recompile code for different networks has made evaluating networks so much more enjoyable. Thank you for all the work on making the selection of networks 'just work'.

That was our goal: stuff should just work. Glad you appreciate it as much as we do.

Andrew
