[rrg] ETR Address Forwarding (EAF) for IPv4 - Bill's critique

Robin Whittle Sun, 21 Dec 2008 17:53:06 -0800

Hi Bill,

Thanks for your 24 November critique of my proposal for forwarding
IPv4 packets based on 30 bits in the existing header:


  Summary of architectural solution space - Ivip still isn't properly
  covered

  http://www.irtf.org/pipermail/rrg/2008-November/000261.html


>> ETR Address Forwarding (EAF) - for IPv4
>> http://tools.ietf.org/html/draft-whittle-ivip4-etr-addr-forw
>
> That's a bit broken. You can't discard the fragment offset and MF
> bit.

Since we are redesigning the Internet for the long-term, I think we
can do whatever we like.

> I'm pretty sure hosts are allowed to pre-fragment the packets
> and then set the DF bit on each fragment.

Can anyone discuss how common this is?

So rather than have the application send smaller packets, the sending
host's stack (or perhaps the application?!) burdens the receiving
host with the task of reassembly, makes the communication session
more vulnerable to packet loss, but does not expect routers to
further fragment the packets.

Is this allowed?  RFC 791 states (p7):

   The originating protocol module of a complete datagram sets the
   more-fragments flag to zero and the fragment offset to zero.

This may have been superseded, but I think it means the sending host
is supposed to send a single packet, not to send fragments.
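
For reference, whether a packet is a fragment can be read directly from
the 16-bit flags / fragment offset word of the IPv4 header defined in
RFC 791.  The following is only a minimal sketch: the function names are
illustrative, not taken from any draft, and it assumes it is given at
least the first 20 bytes of the header.

```python
import struct

def frag_fields(header: bytes):
    """Return (df, mf, offset_bytes) from the start of an IPv4 header."""
    if len(header) < 20:
        raise ValueError("an IPv4 header is at least 20 bytes long")
    # Bytes 6-7: 1 reserved bit, DF, MF, then a 13-bit fragment offset.
    (word,) = struct.unpack("!H", header[6:8])
    df = bool(word & 0x4000)
    mf = bool(word & 0x2000)
    offset_bytes = (word & 0x1FFF) * 8  # offset is carried in 8-byte units
    return df, mf, offset_bytes

def is_fragment(header: bytes) -> bool:
    """A packet is a fragment if MF is set or the offset is non-zero."""
    _, mf, offset_bytes = frag_fields(header)
    return mf or offset_bytes != 0
```

A pre-fragmented packet sent with DF set, as Bill describes, would show
up here with both df and mf true on every fragment but the last.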


The same situation could occur at the ITR if the sending host sent a
long, single, DF=0 packet, and a router between it and the ITR
fragmented it, setting DF=1.

Can anyone suggest how common this is?

The text following that quoted above from RFC 791 doesn't say
anything about a router setting fragments to DF=1.


> Hosts routinely fragment UDP and ICMP packets that are too large
> for the wire, even if they don't set the DF bit. These packets need
> to successfully cross the core.

Can anyone discuss how common this is?  What protocols and
applications do this?

I think the concept of fragmented packets, either fragmented in the
sending host, or by the network, is wrong.  It places too much
storage and computational burden on the receiving host and makes the
whole system unreasonably sensitive to the loss of a single packet.

In particular, I think it is completely unreasonable for a host to
send a packet which may be too long for the PMTU, expecting the
network to fragment it and the receiving host to reassemble it, while
refusing to accept any message from the network that the packet was
too big.

The IPv6 designers evidently held the same views.

RFC 1191 was developed in 1990 to provide a much better alternative
to hosts expecting routers to fragment their too-long packets.  By
the time we implement a scalable routing solution, it will be over
two decades after RFC 1191, which works fine (except when networks
unreasonably filter ICMP Packet Too Big messages).

IPv6 (1996) doesn't support fragmented packets from hosts, or hosts
sending packets to the network expecting the network to fragment them.

EAF has major advantages over encapsulation, and I can't see a way of
implementing EAF while accepting fragmented packets, or DF=0 packets
above some agreed length, such as 1470 bytes.  The "1470" constant
would be chosen so that all ITRs and ETRs could send packets of that
length without any PMTU problems.  Google servers regularly send DF=0
packets of up to 1470 bytes today:

  http://www.firstpr.com.au/ip/ivip/ipv4-bits/actual-packets.html


Nor can I see a way of supporting DF=0 packets longer than about 1450
or 1470 bytes with encapsulation, without a lot of extra trouble in
my IPTM approach to handling the PMTUD problems of map-encap:

  http://www.firstpr.com.au/ip/ivip/pmtud-frag/

The ITR could use synthetic probe packets to determine the PMTU to
the ETR, and then fragment the too-long DF=0 packet itself, before
encapsulating the fragments and tunneling them to the ETR.  The ETR
would decapsulate the fragments and the receiving host would need to
reassemble them.  However, this is costly and unreliable.
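
To give a feel for the bookkeeping involved, here is a rough sketch of
how an ITR might plan the fragments once it knows the PMTU to the ETR.
It is not part of any Ivip draft.  The function name and the
encap_overhead parameter are hypothetical, and it assumes a 20-byte
inner header without options.

```python
def plan_fragments(total_len, pmtu, encap_overhead, ihl=20):
    """Return a list of (offset_bytes, payload_len, more_fragments).

    total_len      length of the original DF=0 packet, header included
    pmtu           PMTU from ITR to ETR, e.g. found by probing
    encap_overhead bytes added by the tunnel header (an assumption)
    """
    payload = total_len - ihl
    room = pmtu - encap_overhead - ihl  # payload bytes per fragment
    if payload < 0 or room <= 0:
        raise ValueError("lengths are inconsistent")
    if payload <= room:
        return [(0, payload, False)]    # fits without fragmentation
    # Every fragment except the last must carry a multiple of 8 bytes.
    per_frag = (room // 8) * 8
    if per_frag == 0:
        raise ValueError("PMTU too small to fragment")
    plan, offset = [], 0
    while offset < payload:
        size = min(per_frag, payload - offset)
        plan.append((offset, size, offset + size < payload))
        offset += size
    return plan
```

For a 3000 byte packet, a 1500 byte PMTU and 20 bytes of encapsulation
overhead this gives three fragments, and the loss of any one of them
loses the whole original packet.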

So I can't see how an IPv4 core-edge separation solution - either
using encapsulation or EAF - could support long fragmentable packets
as the Net has to date.

These are the restrictions on Ivip for IPv4:

  Encapsulation, with IPTM:

      DF=1  Efficiently handles any size packet, with any PMTU from
            ~1500 to ~9000 and beyond.  No data loss: either the
            packet is delivered to the ETR, and the ITR raises the
            lower boundary of its zone of uncertainty about the
            PMTU, or the packet hits a PMTU limit and the ITR gets
            the PTB message.  In the latter case the ITR learns
            something about the PMTU, lowers the upper boundary of
            its zone of uncertainty, and generates a PTB to the
            sending host.

      DF=0  Ideally, for simplicity, the ITR should drop packets
            longer than some constant, such as ~1200 bytes.
            (Maybe more like ~1470 bytes?)

            Longer DF=0 packets could be fragmented by the ITR, or
            encapsulated and sent to the ETR (by using synthetic
            probe packets to determine the PMTU beforehand) but this
            is costly, less reliable and allows delinquent
            applications to continue their late-80s style
            me-generation antisocial behavior.

     Fragmented packets . . .

            I don't think I have fully considered the ITR receiving
            these - but at present think it would be undesirable to
            add any complexity to IPTM to handle such packets when
            they are long enough to potentially exceed PMTU limits,
            once encapsulated.  I think this sort of host behavior
            should not be supported.
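
The "zone of uncertainty" in the DF=1 case above can be pictured as a
pair of bounds that the ITR keeps per ETR.  This is only a sketch of
that idea, not the IPTM specification: the class name, method names and
the starting bounds of 1200 and 9000 bytes are assumptions made for
illustration.

```python
from dataclasses import dataclass

@dataclass
class PmtuZone:
    """ITR's estimate of the PMTU to one ETR (illustrative only)."""
    lower: int = 1200  # largest length believed to fit (assumed start)
    upper: int = 9000  # largest length that might still fit (assumed)

    def on_delivered(self, length: int) -> None:
        # The ETR received a packet of this length: raise the lower bound.
        self.lower = max(self.lower, length)

    def on_ptb(self, next_hop_mtu: int) -> None:
        # A router reported Packet Too Big: lower the upper bound.
        self.upper = min(self.upper, next_hop_mtu)
        # If the path changed, the old lower bound may now be too high.
        self.lower = min(self.lower, self.upper)

    def resolved(self) -> bool:
        return self.lower == self.upper
```

Each delivered packet or PTB message narrows the zone, so the ITR
converges on the PMTU using ordinary traffic.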


   EAF - ETR Address Forwarding:

      DF=1  Should work fine for any packet length, since RFC 1191
            PMTUD should work fine with the routers en-route to
            the ETR, and with the standard sending host RFC 1191
            implementation.

      DF=0  ITR does not attempt to send DF=0 packets longer than
            some constant, such as 1470 bytes.  Such packets are
            dropped.  Shorter packets are sent, and the ETR
            reconstructs them as DF=0 packets, so if there is
            a PMTU limit of less than 1470 bytes in a router
            between the ETR and the destination host, then the
            packet will be fragmented there.

     Fragments:

           ITRs will drop them.  Sending hosts should not send
           fragments, or DF=0 packets which are long enough to be
           fragmented en-route to the ITR.
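
The EAF rules in the list above amount to a small decision at the ITR.
The sketch below is only a restatement of those rules in code.  The
function name and return strings are made up for illustration, and
1470 is the example constant from the text, not an agreed value.

```python
MAX_DF0_LEN = 1470  # example constant; the real value is still open

def eaf_itr_action(total_len: int, df: bool, mf: bool,
                   frag_offset: int) -> str:
    """Return "forward" or "drop" for a packet arriving at an EAF ITR."""
    if mf or frag_offset != 0:
        return "drop"      # fragments are never forwarded
    if df:
        return "forward"   # DF=1: any length, RFC 1191 PMTUD applies
    # DF=0: only packets up to the agreed constant are forwarded.
    return "forward" if total_len <= MAX_DF0_LEN else "drop"
```

For example, eaf_itr_action(1400, False, False, 0) forwards the packet,
while the same call with a length of 1500 drops it.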


These restrictions are less onerous than the only alternatives I can see:

  1 - Change host stack, apps and Internet service from IPv4 to IPv6.
      (IPv6 doesn't support fragments or fragmentable packets
      either.)

  2 - Greatly complexify IPTM or EAF to handle the few apps which
      send fragments or too-long fragmentable packets - which would
      involve undesirable complexity and which could not deliver
      the packets with acceptable reliability and costs.

> There's also no point carrying the DF bit if you're not going to
> carry the fragment offset. By definition those packets are always
> DF=1.

In EAF, the DF flag of the original packet is carried so that DF=0
packets shorter than some constant, such as 1470 bytes, can be
reconstructed by the ETR with DF=0.  This enables them to be
fragmented, if necessary, between the ETR and the destination host.


> You could rely on the L2 checksum to maintain packet integrity.
> Many if not most core links are a form of ethernet today anyway,
> and the rest could theoretically be upgraded. But you're going to
> need to find some more bits somewhere; 16 isn't enough. I suppose
> you could steal 4 bits from the header length since there's not a
> lot of point in passing anything in the core with IP options
> attached anyway. Just send an "administratively prohibited"
> message if someone tries to send a packet with options. But that
> still only buys you 20 bits.

The IPv6 header doesn't have a checksum - so EAF should be just as
robust as IPv6.

> ICMP errors are gonna be a bit hairy.

The upgraded routers in the DFZ also need upgraded firmware to
reconstruct the original packet format, just as an ETR does, when the
packet hits a PMTU limit.  This is necessary to generate the correct
PTB to the sending host.  That function is surely defined in
firmware, not fixed in hardware, and since it doesn't happen too
often, the extra steps in reconstructing the packet should not be too
much of a burden on the router.

As far as I know, there shouldn't be any ICMP problems.  If you can
point out potential problems in greater detail, that would be great.

This PMTUD stuff is a real headache.  I am keen to hear of any
critiques, suggestions etc.

  Regards

   - Robin
