I'm sponsoring this fasttrack for Lizhi Hou.  The timer is set for
11/10/2007.  The release binding is Patch.

1 Overview
----------
   This case proposes changes to the Solaris kernel to provide support
   for GLDv3-based IPoIB driver ibd(7D). It introduces the two primary
   components of this solution: mac_ib plugin, and GLDv3 IPoIB driver.

   Note that this case covers only the changes necessary to port the
   IPoIB driver to the GLDv3 framework. Additional enhancements to the
   IPoIB driver are outside the scope of this case. The overall IPoIB
   architecture is defined by PSARC/2001/289.

2 The mac_ib plugin
-------------------
   The mac_ib plugin is written to the Nemo MAC-Type Plugin architecture
   defined by PSARC 2006/248 (and updated by PSARC 2006/406 and
   2007/298).  The plugin fills in all mtr_ops callbacks with functions
   appropriate for IB, as shown below.

     static mactype_ops_t mac_ib_type_ops = {
           MTOPS_HEADER_COOK | MTOPS_HEADER_UNCOOK | MTOPS_LINK_DETAILS,
           mac_ib_unicst_verify,
           mac_ib_multicst_verify,
           mac_ib_sap_verify,
           mac_ib_header,
           mac_ib_header_info,
           NULL,                     /* pdata verify */
           mac_ib_header_cook,
           mac_ib_header_uncook,
           mac_ib_link_details
     };

   A <sys/mac_ib.h> header file will contain the necessary information
   for drivers to use the plugin, namely a MAC_PLUGIN_IDENT_IB macro used
   to identify the plugin during mac_register().

2.1 Multicast/Broadcast address
-------------------------------
   The current MAC plugin design assumes that there is a single
   broadcast address defined for the interconnect (as on Ethernet).
   However, IPoIB defines a broadcast address per IPoIB link (see
   RFC 4391).

   The IPoIB Multicast/Broadcast address is depicted in Figure 1 (see
   the definition in RFC 4391, section 4):

    |  8 |24 bits| 8  | 4 |  4  | 16 bits  | 16 bits |      80 bits      |
    +----+-------+----+---+-----+----------+---------+-------------------+
    |Resv|  QPN  |0xFF|0x1|scope|IPoIB sign|  P_Key  |      group ID     |
    +----+-------+----+---+-----+----------+---------+-------------------+

                        Figure 1

   Since <scope> and <P_Key> can differ between driver instances, the
   mac_ib plugin sets them to zero. All other fields are filled with
   their exact values in the mac_ib plugin. When the mac_ib plugin
   module loads, this broadcast address is registered with the GLDv3
   framework by calling mactype_register(). In the IPoIB driver's
   mc_tx() and mc_multicst() callback functions, <scope> and <P_Key>
   are filled in with the correct values if the QPN of the destination
   address is the Multicast/Broadcast QPN (0xFFFFFF).

   Since mc_multicst() fills in <scope> and <P_Key>, no changes are
   necessary for the multicast-related IB code in ip_if.c.
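
   As an illustration only, the fill-in step described above might look
   like the sketch below. It is not the ibd(7D) code: the function name,
   the constants, and the use of a flat 20-byte array are assumptions
   made for the example, and the byte offsets are derived from the field
   widths in Figure 1.

```c
#include <stdint.h>

#define SKETCH_IPOIB_ADDRL  20         /* 160-bit address, per Figure 1 */
#define SKETCH_MCAST_QPN    0xFFFFFFu  /* Multicast/Broadcast QPN */

/*
 * Hypothetical helper: fill <scope> and <P_Key> into an address laid
 * out as in Figure 1, but only when the QPN marks it as a
 * multicast/broadcast address.
 * Returns 1 if the address was updated, 0 if it was left untouched.
 */
static int
sketch_fill_scope_pkey(uint8_t addr[SKETCH_IPOIB_ADDRL], uint8_t scope,
    uint16_t pkey)
{
	/* Bytes 1-3 hold the 24-bit QPN (byte 0 is reserved). */
	uint32_t qpn = ((uint32_t)addr[1] << 16) |
	    ((uint32_t)addr[2] << 8) | addr[3];

	if (qpn != SKETCH_MCAST_QPN)
		return (0);

	/* Byte 5: high nibble is the fixed 0x1, low nibble is <scope>. */
	addr[5] = (uint8_t)((addr[5] & 0xF0) | (scope & 0x0F));

	/* Bytes 8-9: <P_Key>, most significant byte first. */
	addr[8] = (uint8_t)(pkey >> 8);
	addr[9] = (uint8_t)(pkey & 0xFF);
	return (1);
}
```

   A unicast destination (any QPN other than 0xFFFFFF) passes through
   unchanged, which matches the condition stated above.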

2.2 mac_ib_header
-----------------
   All IP and ARP datagrams transported over InfiniBand are prefixed by
   a 4-octet encapsulation header, as illustrated below (see RFC 4391).

    | 16 bits  | 16 bits |
    +----------+---------+
    |  type    |  Resv   |
    +----------+---------+

   However, in order to transmit the datagram to the correct
   destination, an extra header that includes the destination address is
   required. IB does not provide an interface for sending a link layer
   header directly to the IB link, and the link layer header received
   from the IB link is missing information that GLDv3 requires. The
   mac_ib plugin will therefore specify a "soft" header in
   <sys/mac_ib.h>, as illustrated below.

       typedef struct ib_addrs {
           ipoib_mac_t  ipib_src;
           ipoib_mac_t  ipib_dst;
       } ib_addrs_t;

       typedef struct ib_header_info {
           union {
                ipoib_pgrh_t    ipib_grh;
                ib_addrs_t      ipib_addrs;
           } ipib_prefix;
           ipoib_hdr_t  ipib_rhdr;
       } ib_header_info_t;

   The extra header has the following format:

    | 20 bytes | 20 bytes|
    +----------+---------+
    | ipib_src | ipib_dst|
    +----------+---------+

    Header_info structure

   For outbound datagrams, mac_ib_header() will create the Header_info
   structure and fill in the destination address.

   For inbound datagrams, the IB link will deliver one of the IB link
   layer headers, called the Global Routing Header (GRH). Information
   from the GRH is used by the IPoIB driver to build the Header_info
   structure, which it passes up to GLDv3 along with the datagram.
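
   The outbound case can be sketched as follows. This is a minimal
   illustration, not the plugin source: the sk_* types are stand-ins for
   ipoib_mac_t, ipoib_pgrh_t and ipoib_hdr_t with sizes taken from the
   figures in this document (20, 40 and 4 bytes), and storing the SAP in
   the type field is an assumption of the example.

```c
#include <stdint.h>
#include <string.h>

/* Stand-in types (assumptions); only their sizes follow the text. */
typedef struct { uint8_t bytes[20]; } sk_ipoib_mac_t;   /* address */
typedef struct { uint8_t bytes[40]; } sk_ipoib_pgrh_t;  /* GRH */
typedef struct {
	uint8_t type[2];                                 /* 16-bit type */
	uint8_t resv[2];                                 /* 16-bit reserved */
} sk_ipoib_hdr_t;

typedef struct {
	sk_ipoib_mac_t src;
	sk_ipoib_mac_t dst;
} sk_ib_addrs_t;

/* Same shape as ib_header_info_t: a 40-byte prefix plus the 4-octet header. */
typedef struct {
	union {
		sk_ipoib_pgrh_t grh;
		sk_ib_addrs_t   addrs;
	} prefix;
	sk_ipoib_hdr_t rhdr;
} sk_ib_header_info_t;

/* Hypothetical outbound builder: zero the header, then set dst and type. */
static void
sk_build_outbound(sk_ib_header_info_t *hi, const sk_ipoib_mac_t *dst,
    uint16_t sap)
{
	memset(hi, 0, sizeof (*hi));
	hi->prefix.addrs.dst = *dst;
	hi->rhdr.type[0] = (uint8_t)(sap >> 8);    /* network byte order */
	hi->rhdr.type[1] = (uint8_t)(sap & 0xFF);
}
```

   The union reflects the point made above: the 40 bytes in front of the
   encapsulation header hold either a GRH or the source/destination
   address pair.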


2.3 mac_ib_header_cook/mac_ib_header_uncook
-------------------------------------------
   In the IPoIB design (PSARC/2001/289), GLDv2 is expected to send the
   following down to the driver:

        |  20 bytes      |4 bytes|               |
        +----------------+-------+---------------+
        | destination    | Type  |  IP/ARP data  |
        +----------------+-------+---------------+

                   Format A

   The driver, in turn, is expected to hand the following up to GLDv2:

        |  40 bytes   |4 bytes|               |
        +-------------+-------+---------------+
        |     GRH     | Type  |  IP/ARP data  |
        +-------------+-------+---------------+

                   Format B

   After porting to GLDv3, the driver has to remain compatible with raw
   DLPI clients.  mac_ib_header_cook() will strip off the 20-byte
   destination address and create a new Header_info structure (see 2.2).
   mac_ib_header_uncook() will strip off the extra header.
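
   The two transformations can be sketched on flat byte buffers as
   shown below. This is an illustration under stated assumptions, not
   the plugin code: the real callbacks operate on STREAMS messages, the
   sk_* names are hypothetical, and leaving the source address zeroed in
   the cooked header is a choice made for the example.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SK_ADDRL      20                /* IPoIB address length */
#define SK_HDRL       4                 /* type + reserved */
#define SK_EXTRA_LEN  (2 * SK_ADDRL)    /* ipib_src + ipib_dst */

/*
 * "Cook": take Format A (dst | type | data), strip the 20-byte
 * destination, and emit (src | dst | type | data).
 * Returns the output length, or 0 on a short input or small buffer.
 */
static size_t
sk_header_cook(const uint8_t *in, size_t in_len, uint8_t *out,
    size_t out_cap)
{
	size_t out_len = in_len + SK_ADDRL;

	if (in_len < SK_ADDRL + SK_HDRL || out_cap < out_len)
		return (0);

	memset(out, 0, SK_ADDRL);                    /* ipib_src */
	memcpy(out + SK_ADDRL, in, SK_ADDRL);        /* ipib_dst */
	memcpy(out + SK_EXTRA_LEN, in + SK_ADDRL,    /* type and data */
	    in_len - SK_ADDRL);
	return (out_len);
}

/*
 * "Uncook": skip the 40-byte extra header.
 * Returns a pointer to the type field, or NULL on a short packet.
 */
static const uint8_t *
sk_header_uncook(const uint8_t *pkt, size_t len, size_t *out_len)
{
	if (len < SK_EXTRA_LEN + SK_HDRL)
		return (NULL);
	*out_len = len - SK_EXTRA_LEN;
	return (pkt + SK_EXTRA_LEN);
}
```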

2.4 mac_ib_link_details
-----------------------
   When the link is active, mac_ib_link_details() will be called to
   provide details on link speed.

2.5 mac_ib_sap_verify
---------------------
   mac_ib_sap_verify() checks the legality of a SAP value. Based on
   PSARC/2003/150, the SAP range 0-255 selects IEEE 802 semantics, so
   mac_ib_sap_verify() returns B_TRUE and sets bind_sap (if non-NULL) to
   the LLC SAP to which GLDv3 should bind DLPI consumers. The SAP range
   256-65535 selects EtherType semantics, so mac_ib_sap_verify() returns
   B_TRUE and sets bind_sap to the SAP value itself. For all other SAP
   values, mac_ib_sap_verify() returns B_FALSE.
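
   The three cases above can be sketched as follows. The function name
   and the SK_SAP_LLC constant are placeholders invented for this
   example; in particular, the numeric value used for the LLC bind SAP
   is an assumption, not taken from the GLDv3 headers.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholder for the LLC bind SAP; the value is an assumption. */
#define SK_SAP_LLC  0u

/* Hypothetical stand-in for mac_ib_sap_verify(), following the text. */
static bool
sk_sap_verify(uint32_t sap, uint32_t *bind_sap)
{
	if (sap <= 255) {
		/* IEEE 802 semantics: bind consumers to the LLC SAP. */
		if (bind_sap != NULL)
			*bind_sap = SK_SAP_LLC;
		return (true);
	}
	if (sap <= 65535) {
		/* EtherType semantics: bind to the SAP itself. */
		if (bind_sap != NULL)
			*bind_sap = sap;
		return (true);
	}
	return (false);    /* outside both ranges */
}
```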

3 GLDv3 IPoIB driver
--------------------
   The ibd(7D) driver is converted from GLDv2 to GLDv3 (Nemo).  Most of
   the features of the GLDv2 driver will be inherited.  The remainder of
   this section discusses the important changes in the GLDv3 IPoIB
   driver.  The GLDv3 driver interfaces are defined in PSARC 2004/471,
   2005/365, 2006/248, and 2006/249.

3.1 Add/Remove multicast address
--------------------------------
   The GLDv3 architecture assumes that adding/removing multicast
   addresses and setting/unsetting promiscuous mode are done by
   manipulating data structures managed by the NIC interface.  This is
   also true for IPoIB; however, it is also necessary to communicate
   with an IB fabric entity called the SA to make corresponding changes
   in the IB switches in the IB fabric. The communication with the SA is
   handled asynchronously by the IPoIB driver. In this scenario, this
   case proposes that the IPoIB driver return zero (success) immediately
   if the request can be scheduled to be sent, and wait for the reply in
   an async thread. If the SA
       (1) fails to respond or
       (2) can't satisfy the request,
   then an error is logged.

   This is reasonable because this sort of failure indicates a fabric
   problem and needs to be reported to the fabric administrator, not the
   host applications. There are no recovery operations that can be done
   by making changes to Solaris.
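
   The control flow proposed above can be sketched as follows. This is a
   single-threaded simulation for illustration only: the queue, the
   sk_* names, the ENOMEM return for a full queue, and the fake SA that
   refuses odd group numbers are all assumptions of the example, and the
   real driver performs the second step in a separate async thread.

```c
#include <errno.h>
#include <stdio.h>

#define SK_QLEN 8

typedef enum { SK_SA_OK, SK_SA_NO_REPLY, SK_SA_REFUSED } sk_sa_result_t;
typedef struct { int join; unsigned group; } sk_req_t;

static sk_req_t sk_queue[SK_QLEN];
static int sk_count;
static int sk_logged_errors;

/* Entry point: succeeds as soon as the request has been queued. */
static int
sk_multicst(int join, unsigned group)
{
	if (sk_count == SK_QLEN)
		return (ENOMEM);    /* request could not be scheduled */
	sk_queue[sk_count].join = join;
	sk_queue[sk_count].group = group;
	sk_count++;
	return (0);
}

/* Stand-in for the SA exchange: refuses odd group numbers. */
static sk_sa_result_t
sk_fake_sa(const sk_req_t *req)
{
	return ((req->group & 1) ? SK_SA_REFUSED : SK_SA_OK);
}

/* Async worker body: failures are logged, never returned to the caller. */
static void
sk_async_drain(sk_sa_result_t (*ask_sa)(const sk_req_t *))
{
	for (int i = 0; i < sk_count; i++) {
		if (ask_sa(&sk_queue[i]) != SK_SA_OK) {
			sk_logged_errors++;
			fprintf(stderr, "SA request for group %u failed\n",
			    sk_queue[i].group);
		}
	}
	sk_count = 0;
}
```

   Note that the caller of sk_multicst() sees an error only when the
   request cannot be queued at all; an SA failure shows up only in the
   log.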

3.2 Service FIFO mechanism
--------------------------
   The GLDv2 version of the IPoIB driver introduced a service FIFO
   mechanism. On machines with multiple CPUs, the interrupt handler does
   not call gld_recv() directly. Instead, it sends the received packet
   to a service FIFO, and a worker thread later retrieves the packet and
   calls gld_recv().  This mechanism will be disabled by default in the
   GLDv3 driver, since GLDv3 is expected to do something similar via
   soft rings (PSARC/2005/654).

4 Interfaces
------------

________________________________________________________________________
|                             Interfaces Added                         |
|_________________________|_______________________|____________________|
|      mac_ib.h           | Consolidation Private | <sys/mac_ib.h>     |
|  MAC_PLUGIN_IDENT_IB    | Consolidation Private | <sys/mac_ib.h>     |
|_________________________|_______________________|____________________|

5 References
------------
   http://opensolaris.org/os/community/networking/nemo-design.pdf
   ftp://ftp.rfc-editor.org/in-notes/rfc4391.txt
