There is an extensive revision of the dladm support for IPoIB coming, in which Brussels support and a change in the administrative model will be dealt with. But that may be a ways off (est. 2010.Q2?), and in the meantime people are screaming for the performance that Connected Mode gives, so we don't want to wait for that.
-ted

Garrett D'Amore wrote:
> I feel very strongly that I'd prefer to avoid the use of a driver.conf
> for this, and instead handle it as a Brussels property, at least on
> Solaris Nevada. (This will support administration via dladm, and
> ultimately also ndd, though we don't like to say that. ;-)
>
> If you need to use a driver.conf for Solaris 10, that's OK I suppose
> (although an ndd tunable would be better there too, since it doesn't
> require the driver to be unloaded and reloaded to change the setting,
> which can be very challenging for administrators to figure out.)
>
> I feel TCR-strong on this: if it were a full case I'd insist that this
> be part of the spec before I'd vote to approve.
>
> Is the project team amenable to making this change, or do they have some
> other reason why driver.conf values need to be used instead?
>
> Also, I'd like the MTU to be set via Brussels as well, if it isn't
> already handled that way.
>
> - Garrett
>
> Ted Kim wrote:
>> Template Version: @(#)sac_nextcase 1.68 02/23/09 SMI
>> This information is Copyright 2009 Sun Microsystems
>> 1. Introduction
>>     1.1. Project/Component Working Name:
>>          IPoIB Connected Mode
>>     1.2. Name of Document Author/Supplier:
>>          Author: Kevin Ge
>>     1.3. Date of This Document:
>>          30 October, 2009
>> 4. Technical Description
>>
>> A. Overview
>> -----------
>>
>> This case proposes changes to the Solaris kernel to provide support
>> for "Connected Mode" in the IPoIB driver ibd(7D) (described in [1]
>> and [2]).
>>
>> The InfiniBand Architecture [3] defines multiple "transport service
>> types", including Unreliable Datagram (UD), Reliable Connected (RC)
>> and Unreliable Connected (UC). The current ibd (based on [4]) runs in
>> "Datagram Mode" over the UD transport service type. Connected Mode
>> (described in [5]) can use either UC or RC.
>>
>> This IPoIB-CM project uses RC, because of the desire to
>> inter-operate with Linux, which also uses RC.
>> The main advantage of Connected Mode is better performance (higher
>> throughput and lower CPU utilization) based on using very large MTUs
>> (see below for more discussion). Connected Mode, though, can have the
>> disadvantage of consuming more resources, especially when scaling up
>> to a large cluster (due to using an InfiniBand connection to each
>> destination).
>>
>> Note that this case covers only the changes necessary to support the
>> IPoIB driver running in Connected Mode over RC. Other enhancements
>> are outside the scope of this case.
>>
>> A micro/patch binding is asserted for this proposal.
>>
>> B. Connected Mode IPoIB driver
>> ------------------------------
>>
>> The revised ibd(7D) driver will support both Connected and Datagram
>> mode. The features of the current Datagram mode ibd driver will
>> be inherited. The remainder of this section discusses interface
>> additions for the Connected mode capable driver.
>>
>>
>> B.1 Switching between datagram and connected mode
>>
>> The existing ibd driver in OpenSolaris and Solaris 10 does not
>> ship with a driver .conf file. However, the Connected Mode support
>> described in this case introduces a new parameter 'enable_rc' that
>> may be set via the ibd driver .conf file.
>>
>> This parameter specifies whether each ibd instance defaults to
>> using Connected Mode over RC or not.
>>
>>     # 1: unicast packets will be sent over Reliable Connected Mode
>>     # 0: unicast packets will be sent over Unreliable Datagram Mode
>>     #
>>     # Each element in the list below maps to the corresponding ibd
>>     # instance; the first element is for ibd instance 0, the second
>>     # element is for instance 1 and so on.
>>     #
>>     enable_rc=1,1,0,0;
>>
>> Please note that Connected Mode support in IPoIB is optional as per
>> [5]. Therefore, if Connected Mode is not available for a remote
>> node, Datagram Mode will automatically be used for that
>> destination by the ibd driver.
>> As a result, 'enable_rc' only decides whether to try Connected Mode
>> first, and whether this instance advertises Connected Mode as a
>> supported capability.
>>
>> The default value of 'enable_rc' for each instance is 0. Hence,
>> without an ibd.conf file, Datagram mode will be used. We intend to
>> ship a driver .conf file for ibd in ONNV (and hence OpenSolaris)
>> with enable_rc set to all ones (enabling Connected Mode by
>> default on all instances) for the best performance.
>>
>> However, for Solaris 10, we have received business guidance to take
>> an "opt-in" approach due to a desire for greater stability in
>> established enterprise environments. We will do this by not
>> shipping the .conf file. Therefore, Solaris 10 will use Datagram
>> mode by default. It will take an explicit administrator action
>> (setting enable_rc) to cause Solaris 10 to use Connected Mode.
>> OFED (Linux IB) originally made Connected Mode opt-in too; however,
>> later OFED releases made it the default. We don't intend to make it
>> the default in Solaris 10 later. However, Solaris Next, being
>> descended from ONNV, will have it as the default.
>>
>> An edited ibd(7D) manpage documenting this change is in the
>> materials directory.
>>
>> B.2 Change of default MTU size
>>
>> Connected Mode, by virtue of using the RC transport service type,
>> offers link MTUs of up to 2^31-4 octets in length. Thus, the use of
>> Connected Mode can offer benefits by supporting very large MTUs.
>> Datagram Mode using UD is limited to 4092 (4K-4) octets, though
>> commonly only 2044 (2K-4) is offered.
>>
>> Due to the limits of the TCP/IP protocol, it makes sense to offer
>> only up to 65535 (64K-1) bytes. OFED (i.e. Linux IB) uses a 65520
>> (64K-16) byte MTU for alignment reasons. To inter-operate with
>> OFED at the best performance, we also adopt 65520 as the default
>> MTU of Connected Mode.
>>
>>
>> C. Interfaces
>> -------------
>>  +-------------------------------------------------------------------+
>>  |                        Interfaces Exported                        |
>>  +---------------------------+------------------+--------------------+
>>  | Interface Name            | Classification   | Comment            |
>>  +---------------------------+------------------+--------------------+
>>  | /kernel/drv/ibd.conf*     | Uncommitted      | Configuration file |
>>  +---------------------------+------------------+--------------------+
>>  * = only for OpenSolaris
>>
>>
>> D. References
>> -------------
>>
>> [1] IP over InfiniBand, PSARC/2001/289
>>
>> [2] IPoIB Conversion to GLDv3, PSARC/2007/636
>>
>> [3] InfiniBand Architecture Specification Volume 1, Release 1.2.1,
>>     InfiniBand Trade Association, 2007.
>>     http://www.infinibandta.org/content/pages.php?pg=technology_download
>>
>> [4] Transmission of IP over InfiniBand (IPoIB), RFC 4391, IETF.
>>     http://www.ietf.org/rfc/rfc4391.txt
>>
>> [5] IP over InfiniBand: Connected Mode, RFC 4755, IETF.
>>     http://www.ietf.org/rfc/rfc4755.txt
>>
>> 6. Resources and Schedule
>>     6.4. Steering Committee requested information
>>          6.4.1. Consolidation C-team Name:
>>                 ON
>>     6.5. ARC review type: FastTrack
>>     6.6. ARC Exposure: open

-- 
Ted H. Kim
Sun Microsystems, Inc.                   ted.kim at sun.com
222 North Sepulveda Blvd., 10th Floor    (310) 341-1116
El Segundo, CA 90245                     (310) 341-1120 FAX