Dan McDonald wrote:
> Pardon the top-post, but I think the Security community will be interested
> in this project too because complexity is the enemy of security, and this
> project reduces complexity.
Sorry for not thinking about that up front.

> And as a core contributor in Security, I ACK/+1 this project for endorsing
> this project. (The project team can deem this endorsement inappropriate if
> they wish.)

I don't have an issue with endorsements from the security community.

    Erik

> Dan
>
> On Wed, Jan 28, 2009 at 03:07:57PM -0800, Erik Nordmark wrote:
>> We would like to start the IP datapath refactoring project.
>> We are requesting endorsement from the networking community.
>>
>> Thanks,
>> Erik
>>
>>
>> -- OPENSOLARIS PROJECT PROPOSAL --
>>
>>
>> Project Name:
>> IP Datapath Refactoring
>>
>> Project Synopsis:
>> Simplify the IP Datapaths to make them more understandable and evolvable
>>
>>
>> Project Purpose:
>>
>> The IP datapaths are extremely hard to follow both at the micro level
>> (ip_output_options, ip_wput_ire, and ip_input) and at the macro level
>> (an outbound packet needing IPsec and ARP resolution goes through an odd
>> number of steps).
>>
>> That makes it hard to even fix bugs in that code, let alone get it
>> to perform. This has resulted in improving performance by creating
>> numerous fast paths, which are subsets of the full datapaths. This
>> further makes maintenance of the code a hazardous activity.
>> The root cause of the complexity is that ip_newroute introduces
>> asynchrony in the wrong part of the code. Traditionally ARP is done at
>> the very bottom of the IP output side, but to avoid a separate ARP table
>> lookup Solaris has an IRE_CACHE entry which is created to include the
>> ARP information. This is done early in ip_output because the IRE_CACHE
>> is also used to pick an IP source address in some cases (unconnected UDP
>> and RAWIP sockets) and we need the IP source address early (before doing
>> IPsec etc).
>>
>> We need to move the ARP-related asynchrony to the bottom of IP output to
>> get the output datapaths to be more sane, and it also makes sense to
>> disassociate source address selection from routing/IRE lookup.
>> (In 1991 the source address selection was simpler than today, so the
>> association made some sense. But with IPMP, IPv6, shared-IP zones etc. the
>> source address selection can't simply be associated with the route.)
>>
>> A side effect of ip_newroute is that we need to carry various
>> information from the transport protocols to the point after ip_newroute
>> is done. We've created various ways to put this information in the
>> messages so that they can be queued with the packets waiting for ARP
>> resolution; the ip6i_t is there for this purpose as well as the
>> ipsec_{in,out}_t which is currently used for more than just IPsec. There
>> are also ad-hoc places we scribble information (b_prev, etc).
>>
>> Note that the ip6i_t and M_CTL are also used to carry information
>> between the transport protocols (for both the input and output path).
>> But after Fireengine in S10 introduced direct function calls between the
>> transports and IP we are no longer limited to passing a message using
>> putnext. Hence we can relatively easily add function call arguments up
>> and down between the transports and IP and have those function call
>> arguments carry the meta-data associated with the packet (an example of
>> meta-data is that on the receive side the transports need the incoming
>> interface - the ill_t - to handle IP_RECVPKTINFO and IPv6 link-local
>> addresses correctly.)
>>
>> Having looked at the dependencies that unravel when ip_newroute is
>> removed it turns out that the whole concept of IRE_CACHE isn't needed
>> any more. We can do more efficient caching (and S10 already does for
>> TCP) by caching the IRE and NCE (neighbor cache entry containing ARP
>> information) in the conn_t.
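To make the "function call arguments carry the meta-data" point concrete,
here is a minimal user-space sketch in C. It is only an illustration under
stated assumptions: every sketch_* name and every field below is hypothetical
and is not the real ip_recv_attr_t, ill_t, or conn_t layout. It shows the IP
input side handing the incoming interface to the transport as an argument,
so the transport has no control message to parse or strip.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the incoming interface (the proposal's ill_t). */
typedef struct sketch_ill {
	uint32_t ifindex;
} sketch_ill_t;

/*
 * Hypothetical receive attributes. The single field is illustrative only;
 * it is not taken from the real ip_recv_attr_t.
 */
typedef struct sketch_recv_attr {
	const sketch_ill_t *ira_ill;	/* interface the packet arrived on */
} sketch_recv_attr_t;

/* Hypothetical per-connection state kept by the transport. */
typedef struct sketch_conn {
	bool want_pktinfo;	/* application asked for IP_RECVPKTINFO */
	uint32_t last_ifindex;	/* ifindex reported for the last packet */
	size_t bytes_received;
} sketch_conn_t;

/*
 * Transport input. The metadata arrives as an ordinary argument instead of
 * being prepended to the packet as a separate control message.
 */
static void
sketch_udp_input(sketch_conn_t *conn, const uint8_t *payload, size_t len,
    const sketch_recv_attr_t *ira)
{
	assert(conn != NULL && ira != NULL && ira->ira_ill != NULL);
	(void) payload;		/* payload delivery is out of scope here */
	conn->bytes_received += len;
	if (conn->want_pktinfo)
		conn->last_ifindex = ira->ira_ill->ifindex;
}

/* IP input. Fills in the attributes on the stack and calls up directly. */
static void
sketch_ip_input(sketch_conn_t *conn, const sketch_ill_t *ill,
    const uint8_t *payload, size_t len)
{
	sketch_recv_attr_t ira = { .ira_ill = ill };

	sketch_udp_input(conn, payload, len, &ira);
}
```

A caller would build a sketch_ill_t and a sketch_conn_t, call
sketch_ip_input() once per packet, and then read conn.last_ifindex. Since the
attributes live on the caller's stack, nothing has to be allocated or freed
per packet in this sketch.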
>>
>> This results in the removal of
>>     ip_newroute*
>>     IRE_CACHE
>>     ip6i_t
>>     M_CTL usage, including ipsec_out_t and ipsec_in_t
>>     Various b_prev usage in the ip_input side
>> and the addition of
>>     ip_xmit_attr_t - the transmit attributes passed to ip_output
>>     ip_recv_attr_t - receive attributes passed up to the ULP (and used
>>         internally in IP)
>>     A new way to track dependencies when IREs are added and removed
>>     Using nce_t for ARP information (we do this partially today; mostly
>>         for the IPv4 forwarding paths)
>>
>> Current prototyping indicates that about 30,000 lines of code can be
>> removed as a result of these changes (combined with the ARP/IP merge
>> pieces).
>>
>> The discussion will take place on the existing
>> networking-discuss at opensolaris.org list.
>>
>>
>> Proposed Community Sponsors:
>> Networking
>>
>>
>> Participants:
>> Project lead:
>>     Erik Nordmark
>>
>> Other Participants:
>>     Sowmini Varadhan
>>     Yunsong (Roamer) Lu
>>     Nitin Hande
>>
>> Other interested participants: please speak up. We have some
>> prototype code, and contributions of review time, bug fixes, or
>> testing are very welcome; there are a lot of code changes here.
>>
>> ------
>> _______________________________________________
>> networking-discuss mailing list
>> networking-discuss at opensolaris.org