Hi Bill,

Thanks for your 24 November critique of my proposal for forwarding IPv4 packets based on 30 bits in the existing header:

  Summary of architectural solution space - Ivip still isn't properly covered
  http://www.irtf.org/pipermail/rrg/2008-November/000261.html

>> ETR Address Forwarding (EAF) - for IPv4
>> http://tools.ietf.org/html/draft-whittle-ivip4-etr-addr-forw
>
> That's a bit broken. You can't discard the fragment offset and MF
> bit.

Since we are redesigning the Internet for the long-term, I think we can do whatever we like.

> I'm pretty sure hosts are allowed to pre-fragment the packets
> and then set the DF bit on each fragment.

Can anyone discuss how common this is?

So rather than have the application send smaller packets, the sending host's stack (or perhaps the application?!) burdens the receiving host with the task of reassembly, makes the communication session more vulnerable to packet loss, but does not expect routers to further fragment the packets.

Is this allowed? RFC 791 states (p7):

   The originating protocol module of a complete datagram sets the
   more-fragments flag to zero and the fragment offset to zero.

This may have been superseded, but I think it means the sending host is supposed to send a single packet, not to send fragments.

The same situation could occur at the ITR if the sending host sent a long, single, DF=0 packet, and a router between it and the ITR fragmented it, setting DF=1. Can anyone suggest how common this is? The text following that quoted above from RFC 791 doesn't say anything about a router setting fragments to DF=1.

> Hosts routinely fragment UDP and ICMP packets that are too large
> for the wire, even if they don't set the DF bit. These packets need
> to successfully cross the core.

Can anyone discuss how common this is? What protocols and applications do this?

I think the concept of fragmented packets, either fragmented in the sending host, or by the network, is wrong. It places too much storage and computational burden on the receiving host and makes the whole system unreasonably sensitive to the loss of a single packet.
In particular, I think it is completely unreasonable for a host to send a packet which may be too long for the PMTU, expecting the network to fragment it and the receiving host to reassemble it, while refusing to accept any message from the network that the packet was too big. The IPv6 designers evidently held the same views.

RFC 1191 was developed in 1990 to provide a much better alternative to hosts expecting routers to fragment their too-long packets. By the time we implement a scalable routing solution, it will be over two decades after RFC 1191, which works fine (except when networks unreasonably filter ICMP Packet Too Big messages). IPv6 (1996) doesn't support routers fragmenting packets: a host cannot send a packet to the network expecting the network to fragment it, and any fragmentation must be done by the sending host.

EAF has major advantages over encapsulation, and I can't see a way of implementing EAF while accepting fragmented packets, or DF=0 packets above some agreed length, such as 1470 bytes. The "1470" constant would be chosen so that all ITRs and ETRs could send packets of that length without any PMTU problems. Google servers regularly send DF=0 packets of up to 1470 bytes today:

  http://www.firstpr.com.au/ip/ivip/ipv4-bits/actual-packets.html

Nor can I see a way of supporting DF=0 packets longer than about 1450 or 1470 bytes with encapsulation, without a lot of extra trouble in my IPTM approach to handling the PMTUD problems of map-encap:

  http://www.firstpr.com.au/ip/ivip/pmtud-frag/

The ITR could use synthetic probe packets to determine the PMTU to the ETR, and then fragment the too-long DF=0 packet itself, before encapsulating the fragments and tunneling them to the ETR. The ETR would decapsulate the fragments and the receiving host would need to reassemble them. However, this is costly and unreliable.

So I can't see how an IPv4 core-edge separation solution - either using encapsulation or EAF - could support long fragmentable packets as the Net has to date.

These are the restrictions on Ivip for IPv4:

Encapsulation, with IPTM:

   DF=1   Efficiently handles any size packet, with any PMTU from ~1500 to ~9000 and beyond. No data loss: either the packet is delivered to the ETR, and the ITR raises the lower boundary of its zone of uncertainty about the PMTU, or the packet hits a PMTU limit and the ITR gets the PTB message. In the second case the ITR learns something about the PMTU, lowers the upper boundary of its zone of uncertainty, and generates a PTB to the sending host.

   DF=0   Ideally, for simplicity, the ITR should drop packets longer than some constant, such as ~1200 bytes. (Maybe more like ~1470 bytes?) Longer DF=0 packets could be fragmented by the ITR, or encapsulated and sent to the ETR (by using synthetic probe packets to determine the PMTU beforehand) but this is costly, less reliable and allows delinquent applications to continue their late-80s style me-generation antisocial behavior.

   Fragmented packets . . . I don't think I have fully considered the ITR receiving these - but at present I think it would be undesirable to add any complexity to IPTM to handle such packets when they are long enough to potentially exceed PMTU limits, once encapsulated. I think this sort of host behavior should not be supported.

EAF - ETR Address Forwarding:

   DF=1   Should work fine for any packet length, since RFC 1191 PMTUD should work fine with the routers en-route to the ETR, and with the standard sending host RFC 1191 implementation.

   DF=0   ITR does not attempt to send DF=0 packets longer than some constant, such as 1470 bytes. Such packets are dropped. Shorter packets are sent, and the ETR reconstructs them as DF=0 packets, so if there is a PMTU limit of less than 1470 bytes in a router between the ETR and the destination host, then the packet will be fragmented there.

   Fragments: ITRs will drop them. Sending hosts should not send fragments, or DF=0 packets which are long enough to be fragmented en-route to the ITR.

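The EAF rules above can be sketched in a few lines of Python. This is only an illustration, not part of the draft: the names `Ipv4Info`, `parse_ipv4_info`, `eaf_itr_decision` and `MAX_DF0_LENGTH` are hypothetical, and 1470 is simply the example constant mentioned above, not an agreed value. The flag and offset positions follow the RFC 791 header layout.

```python
import struct
from dataclasses import dataclass

# Assumed value: the text only says "some constant, such as 1470 bytes".
MAX_DF0_LENGTH = 1470


@dataclass(frozen=True)
class Ipv4Info:
    total_length: int      # IPv4 Total Length field, in bytes
    df: bool               # Don't Fragment flag
    mf: bool               # More Fragments flag
    fragment_offset: int   # Fragment Offset field, in 8-byte units


def parse_ipv4_info(header: bytes) -> Ipv4Info:
    """Extract the fields the ITR policy needs from a raw IPv4 header."""
    if len(header) < 20:
        raise ValueError("IPv4 header must be at least 20 bytes")
    (total_length,) = struct.unpack_from("!H", header, 2)
    (flags_frag,) = struct.unpack_from("!H", header, 6)
    return Ipv4Info(
        total_length=total_length,
        df=bool(flags_frag & 0x4000),
        mf=bool(flags_frag & 0x2000),
        fragment_offset=flags_frag & 0x1FFF,
    )


def eaf_itr_decision(pkt: Ipv4Info, max_df0_length: int = MAX_DF0_LENGTH) -> str:
    """Return "forward" or "drop" for a packet arriving at an EAF ITR."""
    # Fragments are dropped, whatever their DF setting.
    if pkt.mf or pkt.fragment_offset != 0:
        return "drop"
    # DF=1: any length is forwarded; RFC 1191 PMTUD handles PMTU limits.
    if pkt.df:
        return "forward"
    # DF=0: forwarded only up to the agreed length limit.
    if pkt.total_length > max_df0_length:
        return "drop"
    return "forward"
```

For example, a complete 1500 byte DF=1 packet is forwarded, a complete 1500 byte DF=0 packet is dropped, and any packet with MF=1 or a non-zero fragment offset is dropped regardless of length.
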
These restrictions are less onerous than the only alternatives I can see:

1 - Change host stack, apps and Internet service from IPv4 to IPv6. (IPv6 routers don't fragment packets either: only the sending host may fragment.)

2 - Greatly complexify IPTM or EAF to handle the few apps which send fragments or too-long fragmentable packets - which would involve undesirable complexity and which could not deliver the packets with acceptable reliability and costs.

> There's also no point carrying the DF bit if you're not going to
> carry the fragment offset. By definition those packets are always
> DF=1.

In EAF, the DF flag of the original packet is carried so that DF=0 packets shorter than some constant, such as 1470 bytes, can be reconstructed by the ETR with DF=0. This enables them to be fragmented, if necessary, between the ETR and the destination host.

> You could rely on the L2 checksum to maintain packet integrity.
> Many if not most core links are a form of ethernet today anyway,
> and the rest could theoretically be upgraded. But you're going to
> need to find some more bits somewhere; 16 isn't enough. I suppose
> you could steal 4 bits from the header length since there's not a
> lot of point in passing anything in the core with IP options
> attached anyway. Just send an "administratively prohibited"
> message if someone tries to send a packet with options. But that
> still only buys you 20 bits.

The IPv6 header doesn't have a checksum - so EAF should be just as robust as IPv6.

> ICMP errors are gonna be a bit hairy.

The upgraded routers in the DFZ also need upgraded firmware to reconstruct the original packet format, just as an ETR does, when the packet hits a PMTU limit. This is necessary to generate the correct PTB to the sending host. That function is surely defined in firmware, not fixed in hardware, and since it doesn't happen too often, the extra steps in reconstructing the packet should not be too much of a burden on the router.

As far as I know, there shouldn't be any ICMP problems. If you can point out potential problems in greater detail, that would be great.

This PMTUD stuff is a real headache. I am keen to hear of any critiques, suggestions etc.

Regards

  - Robin
