An extensive revision of the dladm
support for IPoIB is coming, which will
address Brussels support and a change in
the administrative model. But that may be
a ways off (est. 2010.Q2?), and in the
meantime people are screaming for
the performance that Connected Mode gives,
so we don't want to wait for it.

-ted

Garrett D'Amore wrote:
> I feel very strongly that I'd prefer to avoid the use of a driver.conf 
> for this, and instead handle it as a Brussels property, at least on 
> Solaris Nevada.  (This will support administration via dladm, and 
> ultimately also ndd, though we don't like to say that. ;-)
> 
> If you need to use a driver.conf for Solaris 10, that's OK I suppose 
> (although an ndd tunable would be better there too, since it doesn't 
> require the driver to be unloaded and reloaded to change the setting -- 
> which can be very challenging for administrators to figure out.)
> 
> I feel TCR-strong on this -- if it were a full case I'd insist that this 
> be part of the spec before I'd vote to approve.
> 
> Is the project team amenable to making this change, or do they have some 
> other reason why driver.conf values need to be used instead?
> 
> Also, I'd like the mtu to be set via Brussels as well, if it isn't 
> already handled that way.
> 
>    - Garrett
> 
> Ted Kim wrote:
>> Template Version: @(#)sac_nextcase 1.68 02/23/09 SMI
>> This information is Copyright 2009 Sun Microsystems
>> 1. Introduction
>>     1.1. Project/Component Working Name:
>>      IPoIB Connected Mode
>>     1.2. Name of Document Author/Supplier:
>>      Author:  Kevin Ge
>>     1.3. Date of This Document:
>>      30 October, 2009
>> 4. Technical Description
>>
>> A. Overview
>> -----------
>>
>>    This case proposes changes to the Solaris kernel to provide support
>>    for "Connected Mode" in the IPoIB driver ibd(7D) (described in [1]
>>    and [2]).
>>
>>    The InfiniBand Architecture [3] defines multiple "transport service
>>    types", including Unreliable Datagram (UD), Reliable Connected (RC)
>>    and Unreliable Connected (UC). The current ibd driver (based on [4])
>>    runs in "Datagram Mode" over the UD transport service type.
>>    Connected Mode (described in [5]) can use UC, RC, or both.
>>
>>    This IPoIB-CM project uses RC, because of the desire to
>>    inter-operate with Linux, which also uses RC. The main advantage of
>>    Connected Mode is better performance (higher throughput and lower
>>    CPU utilization) based on using very large MTUs (see below for more
>>    discussion). Connected Mode, though, can have the disadvantage of
>>    consuming more resources, especially when scaling up to a large
>>    cluster (due to using an InfiniBand connection to each destination).
>>
>>    Note that this case covers only the changes necessary to support
>>    the IPoIB driver running in Connected Mode over RC. Other
>>    enhancements are outside the scope of this case.
>>
>>    A micro/patch binding is asserted for this proposal.
>>
>> B. Connected Mode IPoIB driver
>> ------------------------------
>>
>>    The revised ibd(7D) driver will support both Connected and Datagram
>>    mode. The features from the current Datagram mode ibd driver will
>>    be inherited. The remainder of this section discusses interface
>>    additions for the Connected mode capable driver.
>>
>>
>> B.1 Switching between datagram and connected mode
>>
>>    The existing ibd driver in OpenSolaris and Solaris 10 does not
>>    ship with a driver .conf file. However, the Connected Mode support
>>    described in this case introduces a new parameter 'enable_rc' that
>>    may be set via the ibd driver .conf file.
>>
>>    This parameter specifies whether each ibd instance defaults to
>>    using Connected Mode over RC or not.
>>
>>        # 1: unicast packets will be sent over Reliable Connected Mode
>>        # 0: unicast packets will be sent over Unreliable Datagram Mode
>>        #
>>        # Each element in the list below maps to the corresponding ibd
>>        # instance; the first element is for ibd instance 0, the second
>>        # element is for instance 1 and so on.
>>        #
>>        enable_rc=1,1,0,0;
>>
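To make the per-instance mapping in the ibd.conf fragment above concrete, here is a minimal Python sketch. It is not driver code: the helper names parse_enable_rc and default_mode are hypothetical, and treating instances beyond the end of the list as 0 is an assumption based on the per-instance default of 0 given in this section.

```python
def parse_enable_rc(value):
    """Parse the right-hand side of an enable_rc line, e.g. "1,1,0,0;"."""
    flags = []
    for item in value.strip().rstrip(";").split(","):
        item = item.strip()
        if item not in ("0", "1"):
            raise ValueError(f"enable_rc elements must be 0 or 1, got {item!r}")
        flags.append(item == "1")
    return flags


def default_mode(flags, instance):
    """Return the default unicast mode for one ibd instance number."""
    # Assumption: an instance with no list element uses the default value 0.
    if instance < len(flags) and flags[instance]:
        return "connected"
    return "datagram"


flags = parse_enable_rc("1,1,0,0;")
modes = [default_mode(flags, i) for i in range(5)]
print(modes)
```

With the example value, instances 0 and 1 default to Connected Mode, and instances 2 and 3 to Datagram mode; instance 4 is off the end of the list and is shown only to exercise the assumed default.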
>>    Please note that Connected Mode support in IPoIB is optional as per
>>    [5]. Therefore, if Connected Mode is not available for a remote
>>    node, the ibd driver will automatically use Datagram mode for that
>>    destination. As a result, 'enable_rc' only decides whether to try
>>    Connected Mode first, and whether to advertise it as a capability
>>    supported by this instance.
>>
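The fallback rule in the paragraph above can be sketched roughly as follows. This is only an illustration of the decision as described in the case, not the driver's logic; choose_unicast_mode, advertises_connected_mode and the peer_supports_cm flag are hypothetical names.

```python
def choose_unicast_mode(enable_rc, peer_supports_cm):
    """Pick the transport for unicast traffic to one destination."""
    # Connected Mode is tried only when this instance has enable_rc set
    # and the remote node offers it; otherwise Datagram mode is used.
    if enable_rc and peer_supports_cm:
        return "connected"
    return "datagram"


def advertises_connected_mode(enable_rc):
    """An instance advertises the capability only when enable_rc is set."""
    return bool(enable_rc)


for enable_rc in (0, 1):
    for peer_supports_cm in (False, True):
        mode = choose_unicast_mode(enable_rc, peer_supports_cm)
        print(enable_rc, peer_supports_cm, mode)
```

Only the combination enable_rc=1 with a Connected Mode capable peer selects "connected"; the other three combinations stay on Datagram mode.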
>>    The default value for 'enable_rc' for each instance is 0. Hence,
>>    without an ibd.conf file, Datagram mode will be used. We intend to
>>    ship a driver .conf file for ibd in ONNV (and hence OpenSolaris)
>>    with enable_rc set to all ones (enabling Connected Mode by
>>    default on all instances) for the best performance.
>>
>>    However, for Solaris 10, we have received business guidance to have
>>    an "opt-in" approach due to a desire for greater stability in
>>    established enterprise environments. We will do this by not
>>    shipping the .conf file. Therefore, by default Solaris 10 will use
>>    Datagram mode. It will take an explicit administrator action
>>    (setting enable_rc) to cause Solaris 10 to use Connected Mode.
>>
>>    OFED (Linux IB) originally made Connected Mode opt-in too, but
>>    later made it the default. We don't intend to change it later
>>    to be the default in Solaris 10. However, Solaris Next, being
>>    descended from ONNV, will have it as default.
>>
>>    An edited ibd(7D) manpage documenting this change is in the
>>    materials directory.
>>
>> B.2 Change of default MTU size
>>
>>    Connected Mode, by virtue of using the RC transport service type,
>>    offers link MTUs of up to 2^31-4 octets in length. Thus, the use of
>>    Connected Mode can offer benefits by supporting very large MTUs.
>>    Datagram Mode using UD is limited to 4092 (4K-4) octets, though
>>    commonly only 2044 (2K-4) is offered.
>>
>>    Due to the limits of the TCP/IP protocol, it makes sense to only
>>    offer up to 65535 (64K-1) bytes. OFED (i.e. Linux IB) uses 65520
>>    (64K-16) byte MTU for alignment reasons. To inter-operate with
>>    OFED at the best performance, we also adopt 65520 as the default
>>    MTU for Connected Mode.
>>
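For reference, the MTU figures quoted in B.2 work out as below. This is just the arithmetic from the text in Python; the constant names and the packets_needed helper are hypothetical, and the packet counts ignore IP/TCP header overhead, so they are a rough comparison only.

```python
RC_MAX_MTU = 2**31 - 4          # upper bound offered by RC
UD_MAX_MTU = 4 * 1024 - 4       # 4K-4, Datagram Mode limit
UD_COMMON_MTU = 2 * 1024 - 4    # 2K-4, commonly offered
IP_MAX_SIZE = 64 * 1024 - 1     # 64K-1, limit cited for TCP/IP
CM_DEFAULT_MTU = 64 * 1024 - 16 # 64K-16, value used by OFED


def packets_needed(payload_bytes, mtu):
    """Ceiling division: packets to carry a payload, headers ignored."""
    return -(-payload_bytes // mtu)


one_mib = 1024 * 1024
print(UD_MAX_MTU, UD_COMMON_MTU, IP_MAX_SIZE, CM_DEFAULT_MTU)
print(packets_needed(one_mib, UD_COMMON_MTU))
print(packets_needed(one_mib, CM_DEFAULT_MTU))
```

Under these simplifications, moving 1 MiB takes 514 packets at a 2044-byte MTU and 17 at 65520 bytes, which is the kind of reduction the larger MTU is meant to buy. 65520 is also a multiple of 16, consistent with the alignment reason given for the OFED value.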
>>
>> C. Interfaces
>> -------------
>> +-------------------------------------------------------------------+
>> |                     Interfaces Exported                           |
>> +---------------------------+------------------+--------------------+
>> |    Interface Name         |  Classification  |      Comment       |
>> +---------------------------+------------------+--------------------+
>> |/kernel/drv/ibd.conf*      |   Uncommitted    | Configuration file |
>> +---------------------------+------------------+--------------------+
>>  * = only for OpenSolaris
>>
>>
>> D. References
>> -------------
>>
>>    [1] IP over InfiniBand, PSARC/2001/289
>>
>>    [2] IPoIB Conversion to GLDv3, PSARC/2007/636
>>
>>    [3] InfiniBand Architecture Specification Volume 1, Release 1.2.1,
>>      InfiniBand Trade Association, 2007.
>>      http://www.infinibandta.org/content/pages.php?pg=technology_download
>>
>>    [4] Transmission of IP over InfiniBand (IPoIB), RFC 4391, IETF,
>>        http://www.ietf.org/rfc/rfc4391.txt
>>
>>    [5] IP over InfiniBand: Connected Mode, RFC 4755, IETF,
>>        http://www.ietf.org/rfc/rfc4755.txt
>>
>> 6. Resources and Schedule
>>     6.4. Steering Committee requested information
>>        6.4.1. Consolidation C-team Name:
>>         ON
>>     6.5. ARC review type: FastTrack
>>     6.6. ARC Exposure: open
>>
>>   
> 

-- 
Ted H. Kim
Sun Microsystems, Inc.                  ted.kim at sun.com
222 North Sepulveda Blvd., 10th Floor   (310) 341-1116
El Segundo, CA  90245                   (310) 341-1120 FAX
