Pardon the top-post, but I think the Security community will be interested
in this project too because complexity is the enemy of security, and this
project reduces complexity.

And as a core contributor in Security, I ACK/+1 endorsing this project.
(The project team can deem this endorsement inappropriate if they wish.)

Dan

On Wed, Jan 28, 2009 at 03:07:57PM -0800, Erik Nordmark wrote:
> We would like to start the IP datapath refactoring project.
> We are requesting endorsement from the networking community.
> 
> Thanks,
>     Erik
> 
> 
> -- OPENSOLARIS PROJECT PROPOSAL --
> 
> 
> Project Name:
>       IP Datapath Refactoring
> 
> Project Synopsis:
>       Simplify the IP Datapaths to make them more understandable and evolvable
> 
> 
> Project Purpose:
> 
> The IP datapaths are extremely hard to follow both at the micro level 
> (ip_output_options and ip_wput_ire, and ip_input) and at the macro level 
> (an outbound packet needing IPsec and ARP resolution goes through an odd 
> number of steps).
> 
> That makes it hard to even fix bugs in that code, let alone get it 
> to perform. As a result, performance has been improved by creating 
> numerous fast paths, which are subsets of the full datapaths. This 
> further makes maintenance of the code a hazardous activity.
> 
> The root cause of the complexity is that ip_newroute introduces 
> asynchrony in the wrong part of the code. Traditionally ARP is done at the 
> very bottom of the IP output side, but to avoid a separate ARP table 
> lookup Solaris has an IRE_CACHE entry which is created to include the 
> ARP information. This is done early in ip_output because the IRE_CACHE 
> is also used to pick an IP source address in some cases (unconnected UDP 
> and RAWIP sockets) and we need the IP source address early (before doing 
> IPsec etc).
> 
> We need to move the ARP-related asynchrony to the bottom of IP output to 
> make the output datapaths more sane, and it also makes sense to 
> disassociate source address selection from routing/IRE lookup. (In 1991 
> the source address selection was simpler than today, so the association 
> made some sense. But with IPMP, IPv6, shared-IP zones etc. the source 
> address selection can't simply be associated with the route.)
> 
> A side effect of ip_newroute is that we need to carry various 
> information from the transport protocols to the point after ip_newroute 
> is done. We've created various ways to put this information in the 
> messages so that they can be queued with the packets waiting for ARP 
> resolution; the ip6i_t is there for this purpose as well as the 
> ipsec_{in,out}_t which is currently used for more than just IPsec. There 
> are also ad-hoc places we scribble information (b_prev, etc).
> 
> Note that the ip6i_t and M_CTL are also used to carry information 
> between the transport protocols and IP (for both the input and output 
> path). 
> But after Fireengine in S10 introduced direct function calls between the 
> transports and IP we are no longer limited to passing a message using 
> putnext. Hence we can relatively easily add function call arguments up 
> and down between the transports and IP and have those function call 
> arguments carry the meta-data associated with the packet (an example of 
> meta-data is that on the receive side the transports need the incoming 
> interface - the ill_t - to handle IP_RECVPKTINFO and IPv6 link-local 
> addresses correctly.)
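
To make the paragraph above concrete, here is a minimal sketch of passing
receive-side metadata as a function-call argument instead of prepending a
control message to the packet. Everything with a sketch_ prefix is
hypothetical: the types only stand in for ill_t and ip_recv_attr_t, and the
field names are assumptions for illustration, not the project's actual
definitions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for ill_t; the real structure is much larger. */
typedef struct sketch_ill {
	uint32_t ill_ifindex;		/* index of the receiving interface */
} sketch_ill_t;

/* Hypothetical receive attributes; field names are illustrative only. */
typedef struct sketch_recv_attr {
	const sketch_ill_t *ira_ill;	/* interface the packet arrived on */
	size_t ira_pktlen;		/* length of the packet in bytes */
} sketch_recv_attr_t;

/* What a socket that asked for IP_RECVPKTINFO would be told. */
typedef struct sketch_pktinfo {
	uint32_t ipi_ifindex;
} sketch_pktinfo_t;

/*
 * Transport input: the metadata arrives as a function argument rather
 * than as a control message prepended to the packet.
 */
static int
sketch_transport_input(const uint8_t *pkt, const sketch_recv_attr_t *ira,
    int want_pktinfo, sketch_pktinfo_t *info)
{
	assert(ira != NULL && ira->ira_ill != NULL);

	if (pkt == NULL || ira->ira_pktlen == 0)
		return (-1);		/* nothing to deliver */
	if (want_pktinfo && info != NULL)
		info->ipi_ifindex = ira->ira_ill->ill_ifindex;
	return (0);
}

/*
 * IP input: build the attributes on the stack and call the transport
 * directly, so nothing has to be encoded into the message itself.
 */
static int
sketch_ip_input(const uint8_t *pkt, size_t len, const sketch_ill_t *ill,
    int want_pktinfo, sketch_pktinfo_t *info)
{
	sketch_recv_attr_t ira;

	ira.ira_ill = ill;
	ira.ira_pktlen = len;
	return (sketch_transport_input(pkt, &ira, want_pktinfo, info));
}
```

A caller would hand sketch_ip_input() the packet and the interface it
arrived on. The transport then sees the interface through the attribute
argument without parsing anything out of the message.
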
> 
> Having looked at the dependencies that unravel when ip_newroute is 
> removed it turns out that the whole concept of IRE_CACHE isn't needed 
> any more. We can do more efficient caching (and S10 already does for 
> TCP) by caching the IRE and NCE (neighbor cache entry containing ARP 
> information) in the conn_t.
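
As a rough illustration of per-connection caching, the sketch below keeps
a route entry and its neighbor entry in a connection structure and falls
back to a full lookup only when the cached route looks stale. All
sketch_ names are hypothetical. The generation counter used for
invalidation is an assumption made for this example (the proposal only
says there will be a new way to track dependencies), and the exact-match
table scan stands in for a real route lookup.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical neighbor cache entry holding the resolved link address. */
typedef struct sketch_nce {
	uint8_t nce_lladdr[6];
	int nce_resolved;
} sketch_nce_t;

/* Hypothetical route entry; the generation counter is an assumption. */
typedef struct sketch_ire {
	uint32_t ire_dst;
	uint32_t ire_generation;	/* bumped whenever the route changes */
	sketch_nce_t *ire_nce;
} sketch_ire_t;

/* Hypothetical per-connection state with the cached IRE and NCE. */
typedef struct sketch_conn {
	uint32_t conn_faddr;		/* destination address */
	sketch_ire_t *conn_ire;		/* cached route, or NULL */
	uint32_t conn_ire_gen;		/* generation seen when cached */
	sketch_nce_t *conn_nce;		/* cached neighbor entry, or NULL */
} sketch_conn_t;

/* Counts full lookups so the effect of the cache can be observed. */
static unsigned int sketch_full_lookups;

/* Simplified lookup: exact match only, not longest-prefix match. */
static sketch_ire_t *
sketch_route_lookup(sketch_ire_t *table, size_t n, uint32_t dst)
{
	size_t i;

	sketch_full_lookups++;
	for (i = 0; i < n; i++) {
		if (table[i].ire_dst == dst)
			return (&table[i]);
	}
	return (NULL);
}

/* Return the neighbor entry, using the cache while it is still valid. */
static sketch_nce_t *
sketch_conn_get_nce(sketch_conn_t *connp, sketch_ire_t *table, size_t n)
{
	sketch_ire_t *ire;

	assert(connp != NULL);
	if (connp->conn_ire != NULL &&
	    connp->conn_ire_gen == connp->conn_ire->ire_generation)
		return (connp->conn_nce);	/* fast path: cache hit */

	/* Slow path: cache empty or stale, so look up and re-cache. */
	ire = sketch_route_lookup(table, n, connp->conn_faddr);
	connp->conn_ire = ire;
	connp->conn_ire_gen = (ire != NULL) ? ire->ire_generation : 0;
	connp->conn_nce = (ire != NULL) ? ire->ire_nce : NULL;
	return (connp->conn_nce);
}
```

With this shape, repeated sends on the same connection skip the table
lookup entirely until the route's generation changes.
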
> 
> This results in the removal of
>       ip_newroute*
>       IRE_CACHE
>       ip6i_t
>       M_CTL usage, including ipsec_out_t and ipsec_in_t
>       Various b_prev usage in the ip_input side
> and the addition of
>       ip_xmit_attr_t - the transmit attributes passed to ip_output
>       ip_recv_attr_t - receive attributes passed up to the ULP (and used
>             internally in IP)
>       A new way to track dependencies when IREs are added and removed
>       Using nce_t for ARP information (we do this partially today; mostly
>             for the IPv4 forwarding paths)
> 
> Current prototyping indicates that about 30,000 lines of code can be 
> removed as a result of these changes (combined with the ARP/IP merge 
> pieces).
> 
> The discussion will take place on the existing 
> networking-discuss at opensolaris.org list.
> 
> 
> Proposed Community Sponsors:
>      Networking
> 
> 
> Participants:
>      Project lead:
>          Erik Nordmark
> 
>      Other Participants:
>          Sowmini Varadhan
>          Yunsong (Roamer) Lu
>          Nitin Hande
> 
>    Other interested participants: please speak up.  We have some
>    prototype code, and contributions of review time, bug fixes, or
>    testing are very welcome; there are a lot of code changes here.
> 
> ------
> _______________________________________________
> networking-discuss mailing list
> networking-discuss at opensolaris.org
