Note to PSARC admin folks: I may need manual intervention to get this into the agenda. (As far as I can tell, the tools don't support a fasttrack using an existing case with one-pager already in place.)
Thanks,
-ted

++++

This information is Copyright 2009 Sun Microsystems

1. Introduction
    1.1. Project/Component Working Name:
         Open Fabrics User Verbs (OFUV) primary kernel components
    1.2. Name of Document Author/Supplier:
         Author:  Brendan Doyle
    1.3  Date of This Document:
         6 November, 2009

4. Technical Description

Open Fabrics User Verbs (OFUV) primary kernel components
========================================================

Table of Contents
-----------------

I.   Introduction

II.  Summary of Interfaces
     A. Exported OFED RDMA CM APIs
     B. Imported Contracted Project Private IBTF APIs
     C. New Imported IBTF API

III. Description of Interfaces
     A. OFED RDMA CM APIs
     B. Contract for private IBTF APIs
     C. Updated IBTF API

IV.  Summary of changes by man page


I. Introduction
---------------

In Linux, the most popular InfiniBand (IB) OS-bypass framework is the
Open Fabrics User Verbs (OFUV) framework from the Open Fabrics
Enterprise Distribution (OFED). The OFUV API itself is modeled to a
large degree on the kernel programming interface (KPI) of OFED, which
is used by OFED Linux kernel modules. Many of the calls originating in
userland eventually join kernel calls in common code further down the
stack in the OFED framework.

In Solaris, OFUV is being ported over in two parts: kernel and
userland. The kernel part is more fully described in the one-pager of
this case. The userland portion is a companion project which delivers
the open source libraries.

Because of the similarity of the OFUV API to the OFED KPI and their
largely common back-end, there is an opportunity to port selected
portions of the OFED KPI along the way to accelerate development of
certain key applications. In particular, our business objective is to
port what is necessary to satisfy some of the requirements of Oracle's
Reliable Datagram Sockets v3 (RDSv3, used in Exadata 2) and the Lustre
Solaris port. Both applications were originally written for Linux OFED
and benefit considerably from porting a facility known as the
"RDMA-CM", which provides a KPI to manage IB connections. This KPI
would be a compatibility alternative to the one we already have in
IBTF (the Solaris IB framework).

The kernel OFUV project is being delivered in phases. This phase
provides the RDMA-CM KPI for the kernel applications mentioned above
and lays the foundation for later phases. The actual OS-bypass
functionality is not enabled in this phase, but the code is architected
with this goal in mind. As a result, it is distributed across a number
of driver modules that match the architecture of the final phase of
the project. A later fasttrack will describe the interfaces enabled
and used to support the OS-bypass functionality.

In summary, this fasttrack describes the RDMA-CM interfaces delivered
in this first phase of the OFUV kernel project.

References:

  o Open Fabrics User Verbs (OFUV) - primary kernel components
    PSARC/2009/421 one-pager
    http://sac.sfbay/PSARC/2009/421/20090731_brendan.doyle

  o IBTF: InfiniBand Transport Framework
    PSARC/2002/132

  o RDS - Reliable Datagram Service
    PSARC/2006/356

  o Kernel RDMA CM API Architecture and use:
    materials directory: ofuv_rdma_arch.txt

  o Solaris Open Fabrics User Verbs Architecture Document:
    materials directory: solaris_ofuv_arch.pdf

  o OFUV Implementation Details:
    materials directory: OFUVImplementationDetails.pdf


II. Summary of Interfaces
-------------------------

This case asserts a micro/patch binding.

A. Exported OFED RDMA CM APIs - ON Consolidation Private

     rdma_accept()               - OFED Defined
     rdma_bind_addr()            - OFED Defined
     rdma_cm_event_handler()     - OFED Defined
     rdma_connect()              - OFED Defined
     rdma_create_id()            - OFED Defined
     rdma_create_qp()            - OFED Defined
     rdma_destroy_id()           - OFED Defined
     rdma_destroy_qp()           - OFED Defined
     rdma_disconnect()           - OFED Defined
     rdma_init_qp_attr()         - OFED Defined
     rdma_join_multicast()       - OFED Defined
     rdma_leave_multicast()      - OFED Defined
     rdma_listen()               - OFED Defined
     rdma_reject()               - OFED Defined
     rdma_resolve_addr()         - OFED Defined
     rdma_resolve_route()        - OFED Defined
     ib_get_ibt_channel_hdl()    - Solaris Extension
     ib_get_ibt_hca_hdl()        - Solaris Extension

B. Imported Contracted Project Private IBTF APIs

   These two calls are private IBTF interfaces used in this project
   by contract:

     ibt_ofuvcm_get_req_data()
     ibt_ofuvcm_proceed()

   Additionally, a Contracted Project Private IBTF interface flag is
   added to the ibt_open_rc_channel(9F) function as follows:

     IBT_OCHAN_OFUV    Indicates this channel is for an Open Fabrics
                       User Verbs (OFUV) consumer. IBTF does not flush
                       the QP associated with the channel when a DREQ
                       is received for OFUV channels.

C. New Imported IBTF API

     IBT_GENERIC_MISC  - ON Consolidation Private
                         (IBTF Transport Interface)
                         Adds a new value to the ibt_clnt_class_t
                         argument of ibt_attach(9F).


III. Description of Interfaces
------------------------------

A. OFED RDMA CM APIs

   This project provides the OFED kernel RDMA CM interfaces defined in
   the rdma_cm.h OFED header, with a number of Solaris-specific
   extensions required to interface with IBTF (these map from the
   "CM ID" concept to the related IBTF handles).

   The 'sol_ofs' kernel module exports the OFED RDMA CM interfaces to
   kernel consumers, and translates the OFED APIs into the equivalent
   Solaris IBTF APIs.

   See the provided man pages (in the materials/man_pages directory)
   for details on each API.

B. Contract for private IBTF APIs

   See the case directory for the contract (contract-01.txt) to use
   the project private IBTF APIs.

C. Updated IBTF API

   To support this framework, a new client class (IBT_GENERIC_MISC) is
   added to the list of supported IBTF client classes
   (ibt_clnt_class_t). This change is documented in the revised man
   pages for ibt_attach.9f and ibt_clnt_modinfo_t.9s in the
   materials/man_pages directory.


IV. Summary of changes by man page
----------------------------------

The manual pages for the RDMA CM kernel APIs are taken from OFED and
converted to Solaris conventions. A few new man pages for the
Solaris-specific extensions are also provided. Modified versions of
existing man pages have change bars. All new or changed man pages can
be found in the case materials/man_pages directory.

Man page                        Disposition     Reasons for change
(sorted by section and name)                    (subsection of III)
------------------------------------------------------------------
sol_ofs(7D)                     new             A
sol_uverbs(7D)                  new             A
sol_ucma(7D)                    new             A
rdmacm(9)                       new             A
rdma_cm_event_handler(9E)       new             A
ib_get_ibt_channel_hdl(9F)      new             A
ib_get_ibt_hca_hdl(9F)          new             A
ibt_attach(9F)                  changed         C
rdma_accept(9F)                 new             A
rdma_bind_addr(9F)              new             A
rdma_connect(9F)                new             A
rdma_create_id(9F)              new             A
rdma_create_qp(9F)              new             A
rdma_destroy_id(9F)             new             A
rdma_destroy_qp(9F)             new             A
rdma_disconnect(9F)             new             A
rdma_init_qp_attr(9F)           new             A
rdma_join_multicast(9F)         new             A
rdma_leave_multicast(9F)        new             A
rdma_listen(9F)                 new             A
rdma_reject(9F)                 new             A
rdma_resolve_addr(9F)           new             A
rdma_resolve_route(9F)          new             A
ibt_clnt_modinfo_t(9S)          changed         C

6. Resources and Schedule
    6.4. Steering Committee requested information
        6.4.1. Consolidation C-team Name:
               ON
    6.5. ARC review type: FastTrack
    6.6. ARC Exposure: open

--
Ted H. Kim                      Sun Microsystems, Inc.
ted.kim at sun.com              222 North Sepulveda Blvd., 10th Floor
(310) 341-1116                  El Segundo, CA 90245
(310) 341-1120 FAX