This project has the required support from the Networking and
Security communities.

Please set up a new project space for this with me as the initial
contact.

Thanks,

    Erik


-- OPENSOLARIS PROJECT PROPOSAL --


Project Name:
        IP Datapath Refactoring

Project Synopsis:
        Simplify the IP Datapaths to make them more understandable and evolvable


Project Purpose:

The IP datapaths are extremely hard to follow both at the micro level
(ip_output_options, ip_wput_ire, and ip_input) and at the macro level
(an outbound packet needing IPsec and ARP resolution goes through an odd
number of steps).

That makes it hard to even fix bugs in that code, let alone get it to
perform well. As a result, performance has been improved by creating
numerous fast paths, which are subsets of the full datapaths. This makes
maintenance of the code even more hazardous.

The root cause of the complexity is that ip_newroute introduces
asynchrony in the wrong part of the code. Traditionally ARP is done at the
very bottom of the IP output side, but to avoid a separate ARP table
lookup Solaris has an IRE_CACHE entry which is created to include the
ARP information. This is done early in ip_output because the IRE_CACHE
is also used to pick an IP source address in some cases (unconnected UDP
and RAWIP sockets) and we need the IP source address early (before doing
IPsec etc).

We need to move the ARP-related asynchrony to the bottom of IP output to
make the output datapaths saner, and it also makes sense to
disassociate source address selection from routing/IRE lookup. (In 1991
source address selection was simpler than it is today, so the association
made some sense. But with IPMP, IPv6, shared-IP zones, etc., source
address selection can't simply be associated with the route.)
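
As a rough illustration of the ordering described above (this is not
the prototype code), the output side can be pictured as three separate
steps: a route lookup, an independent source address selection, and a
neighbor (ARP) check at the very bottom that is the only place a packet
may be deferred. Every name below (toy_route_lookup, toy_select_source,
toy_xmit, toy_output, and the fallback address 10.0.0.1) is hypothetical
and chosen only for this sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* All names below are hypothetical; none are the Solaris definitions. */

typedef struct toy_route {
	uint32_t nexthop;	/* next-hop IPv4 address */
	int	 ifindex;	/* outgoing interface */
} toy_route_t;

typedef struct toy_neighbor {
	uint32_t addr;
	bool	 resolved;	/* is the link-layer address known? */
	uint8_t	 lladdr[6];
} toy_neighbor_t;

typedef enum { TOY_SENT, TOY_QUEUED_FOR_ARP } toy_result_t;

/* Step 1: routing lookup only; no source address, no ARP state. */
static toy_route_t
toy_route_lookup(uint32_t dst)
{
	toy_route_t r = { .nexthop = dst, .ifindex = 1 };
	return (r);
}

/* Step 2: source address selection as its own policy decision. */
static uint32_t
toy_select_source(const toy_route_t *rt, uint32_t preferred)
{
	(void) rt;
	/* Fall back to an assumed default of 10.0.0.1. */
	return (preferred != 0 ? preferred : 0x0a000001);
}

/* Step 3: the only place that may defer the packet. */
static toy_result_t
toy_xmit(const toy_neighbor_t *nce)
{
	return (nce->resolved ? TOY_SENT : TOY_QUEUED_FOR_ARP);
}

toy_result_t
toy_output(uint32_t dst, uint32_t preferred_src, const toy_neighbor_t *nce,
    uint32_t *src_used)
{
	toy_route_t rt = toy_route_lookup(dst);

	*src_used = toy_select_source(&rt, preferred_src);
	/* IPsec and other per-packet work would run here, synchronously. */
	return (toy_xmit(nce));
}
```

In this arrangement the source address is known before any IPsec
processing, yet nothing above toy_xmit ever has to wait for ARP.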

A side effect of ip_newroute is that we need to carry various
information from the transport protocols to the point after ip_newroute
is done. We've created various ways to put this information in the
messages so that they can be queued with the packets waiting for ARP
resolution; the ip6i_t is there for this purpose, as are the
ipsec_{in,out}_t, which are currently used for more than just IPsec.
There are also ad-hoc places where we scribble information (b_prev, etc).

Note that the ip6i_t and M_CTL are also used to carry information
between the transport protocols (for both the input and output path).
But after Fireengine in S10 introduced direct function calls between the
transports and IP we are no longer limited to passing a message using
putnext. Hence we can relatively easily add function call arguments up
and down between the transports and IP and have those function call
arguments carry the meta-data associated with the packet. (An example of
meta-data is that on the receive side the transports need the incoming
interface - the ill_t - to handle IP_RECVPKTINFO and IPv6 link-local
addresses correctly.)
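
A minimal sketch of what passing meta-data as a function argument could
look like on the receive side follows. The structures and the function
toy_udp_input are hypothetical stand-ins invented for this example; they
are not the real ill_t, conn_t, or the proposed attribute structures.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins; the real ill_t and attribute structures differ. */
typedef struct toy_ill {
	uint32_t ifindex;
} toy_ill_t;

typedef struct toy_recv_attr {
	const toy_ill_t *ill;	/* incoming interface */
	uint32_t	 flags;
} toy_recv_attr_t;

#define	TOY_RECVPKTINFO	0x1	/* socket asked for IP_RECVPKTINFO-style data */

typedef struct toy_udp_conn {
	uint32_t options;
	uint32_t last_ifindex;	/* what would be reported to the application */
} toy_udp_conn_t;

/*
 * Transport input routine. The meta-data arrives as an argument rather
 * than as a control message prepended to the packet.
 */
void
toy_udp_input(toy_udp_conn_t *connp, const uint8_t *pkt, size_t len,
    const toy_recv_attr_t *ira)
{
	(void) pkt;
	(void) len;
	if (connp->options & TOY_RECVPKTINFO)
		connp->last_ifindex = ira->ill->ifindex;
}
```

The point of the sketch is only that the transport reads the incoming
interface directly from its argument; nothing has to be parsed out of
the message chain.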

Having looked at the dependencies that unravel when ip_newroute is
removed, it turns out that the whole concept of IRE_CACHE isn't needed
any more. We can do more efficient caching (and S10 already does for
TCP) by caching the IRE and NCE (neighbor cache entry containing ARP
information) in the conn_t.
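
One way to picture per-connection caching is sketched below. The
proposal does not say how cached entries are invalidated; the global
generation counter used here is purely an assumption made for the
example, and all type and function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical miniatures of a route entry and a neighbor entry. */
typedef struct toy_ire { uint32_t nexthop; } toy_ire_t;
typedef struct toy_nce { uint8_t lladdr[6]; } toy_nce_t;

/* Assumed mechanism: bumped whenever a route is added or removed. */
static uint32_t toy_route_generation = 1;

typedef struct toy_cache_conn {
	toy_ire_t *cached_ire;
	toy_nce_t *cached_nce;
	uint32_t   cached_gen;	/* generation at the time of caching */
} toy_cache_conn_t;

static toy_ire_t toy_the_ire = { 0x0a0000fe };
static toy_nce_t toy_the_nce = { { 0, 1, 2, 3, 4, 5 } };
static unsigned toy_slow_lookups;	/* counts full lookups, for the demo */

/* Full lookup; stands in for the routing and neighbor table searches. */
static void
toy_slow_lookup(toy_cache_conn_t *connp)
{
	toy_slow_lookups++;
	connp->cached_ire = &toy_the_ire;
	connp->cached_nce = &toy_the_nce;
	connp->cached_gen = toy_route_generation;
}

/* Fast path: reuse the cached entries unless the routing state changed. */
toy_nce_t *
toy_conn_get_nce(toy_cache_conn_t *connp)
{
	if (connp->cached_ire == NULL ||
	    connp->cached_gen != toy_route_generation)
		toy_slow_lookup(connp);
	return (connp->cached_nce);
}

/* Called when a route is added or removed; invalidates every cache. */
void
toy_route_changed(void)
{
	toy_route_generation++;
}
```

With this shape, repeated sends on the same connection skip the table
lookups entirely, and a routing change costs each connection one slow
lookup on its next send.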

This results in the removal of
        ip_newroute*
        IRE_CACHE
        ip6i_t
        M_CTL usage, including ipsec_out_t and ipsec_in_t
        Various b_prev usage in the ip_input side
and the addition of
        ip_xmit_attr_t - the transmit attributes passed to ip_output
        ip_recv_attr_t - receive attributes passed up to the ULP (and used
        internally in IP)
        A new way to track dependencies when IREs are added and removed
        Using nce_t for ARP information (we do this partially today; mostly
        for the IPv4 forwarding paths)

Current prototyping indicates that about 30,000 lines of code can be
removed as a result of these changes (combined with the ARP/IP merge
pieces).

The discussion will take place on the existing
networking-discuss at opensolaris.org list.


Proposed Community Sponsors:
      Networking and Security


Participants:
      Project lead:
          Erik Nordmark

      Other Participants:
          Sowmini Varadhan
          Yunsong (Roamer) Lu
          Nitin Hande

    Other interested participants: please speak up.  We have some
    prototype code, and contributions of review time, bug fixes, or
    testing are very welcome; there are a lot of code changes here.

------