This project has the required support from the Networking and Security communities.
Please set up a new project space for this with myself as the initial contact.

Thanks,
Erik

-- OPENSOLARIS PROJECT PROPOSAL --

Project Name: IP Datapath Refactoring

Project Synopsis: Simplify the IP datapaths to make them more understandable and evolvable

Project Purpose:

The IP datapaths are extremely hard to follow both at the micro level (ip_output_options, ip_wput_ire, and ip_input) and at the macro level (an outbound packet needing IPsec and ARP resolution goes through an odd number of steps). That makes it hard to even fix bugs in that code, let alone get it to perform well. As a result, performance has been improved by creating numerous fast paths, which are subsets of the full datapaths. This makes maintenance of the code an even more hazardous activity.

The root cause of the complexity is that ip_newroute introduces asynchrony in the wrong part of the code. Traditionally ARP is done at the very bottom of the IP output side, but to avoid a separate ARP table lookup Solaris has an IRE_CACHE entry which is created to include the ARP information. This is done early in ip_output because the IRE_CACHE is also used to pick an IP source address in some cases (unconnected UDP and RAWIP sockets) and we need the IP source address early (before doing IPsec etc.).

We need to move the ARP-related asynchrony to the bottom of IP output to make the output datapaths more sane, and it also makes sense to disassociate source address selection from routing/IRE lookup. (In 1991 the source address selection was simpler than today, so the association made some sense. But with IPMP, IPv6, shared-IP zones etc. the source address selection can't simply be associated with the route.)

A side effect of ip_newroute is that we need to carry various information from the transport protocols to the point after ip_newroute is done.
We've created various ways to put this information in the messages so that they can be queued with the packets waiting for ARP resolution; the ip6i_t is there for this purpose, as is the ipsec_{in,out}_t, which is currently used for more than just IPsec. There are also ad-hoc places where we scribble information (b_prev, etc.).

Note that the ip6i_t and M_CTL are also used to carry information between the transport protocols and IP (for both the input and output paths). But after FireEngine in S10 introduced direct function calls between the transports and IP, we are no longer limited to passing a message using putnext. Hence we can relatively easily add function call arguments up and down between the transports and IP and have those function call arguments carry the meta-data associated with the packet. (An example of meta-data is that on the receive side the transports need the incoming interface, the ill_t, to handle IP_RECVPKTINFO and IPv6 link-local addresses correctly.)

Having looked at the dependencies that unravel when ip_newroute is removed, it turns out that the whole concept of IRE_CACHE isn't needed any more. We can do more efficient caching (and S10 already does for TCP) by caching the IRE and NCE (neighbor cache entry containing ARP information) in the conn_t.

This results in the removal of
 - ip_newroute*
 - IRE_CACHE
 - ip6i_t
 - M_CTL usage, including ipsec_out_t and ipsec_in_t
 - Various b_prev usage in the ip_input side
and the addition of
 - ip_xmit_attr_t - the transmit attributes passed to ip_output
 - ip_recv_attr_t - receive attributes passed up to the ULP (and used internally in IP)
 - A new way to track dependencies when IREs are added and removed
 - Using nce_t for ARP information (we do this partially today; mostly for the IPv4 forwarding paths)

Current prototyping indicates that about 30,000 lines of code can be removed as a result of these changes (combined with the ARP/IP merge pieces).
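To illustrate the idea of carrying per-packet meta-data as a function call argument instead of in a message prepended to the packet, here is a minimal sketch in C. It is not the prototype code: every name in it (sketch_xmit_attr_t, sketch_pkt_t, sketch_ip_output, the sxa_* fields and the SXA_FLAG_IPSEC flag) is hypothetical and chosen only for illustration, and the real ip_xmit_attr_t may look quite different.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for per-packet transmit attributes.
 * This is NOT the real ip_xmit_attr_t; the fields are assumptions.
 */
typedef struct sketch_xmit_attr {
	uint32_t sxa_flags;	/* what processing the packet needs */
	uint32_t sxa_src_addr;	/* source address, selected up front */
	uint32_t sxa_ifindex;	/* outgoing interface, 0 if unspecified */
} sketch_xmit_attr_t;

#define	SXA_FLAG_IPSEC	0x1u	/* packet needs IPsec processing */

/* Hypothetical packet: only the payload, with no control data in front. */
typedef struct sketch_pkt {
	const uint8_t *sp_data;
	size_t sp_len;
} sketch_pkt_t;

typedef enum {
	SKETCH_SENT,		/* handed straight to the transmit step */
	SKETCH_NEEDS_IPSEC	/* must go through IPsec first */
} sketch_result_t;

/*
 * The transport calls this directly. The attributes travel as an
 * argument, so nothing has to be encoded into (and later stripped
 * from) the message itself.
 */
sketch_result_t
sketch_ip_output(const sketch_pkt_t *pkt, const sketch_xmit_attr_t *attrs)
{
	assert(pkt != NULL && attrs != NULL);

	if (attrs->sxa_flags & SXA_FLAG_IPSEC)
		return (SKETCH_NEEDS_IPSEC);
	return (SKETCH_SENT);
}
```

In this sketch the caller fills in the attributes once and passes them down with the packet; a corresponding receive-side structure could carry things like the incoming interface up to the transport in the same way.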

The discussion will take place on the existing networking-discuss at opensolaris.org list.

Proposed Community Sponsors: Networking and Security

Participants:
Project lead: Erik Nordmark
Other Participants:
  Sowmini Varadhan
  Yunsong (Roamer) Lu
  Nitin Hande

Other interested participants: please speak up. We have some prototype code, and contributions of review time, bug fixes, or testing are very welcome; there are a lot of code changes here.