Hi Christian, Thanks for your message of 12 July:

   Six/One Router Design Clarifications
   http://psg.com/lists/rrg/2008/msg01801.html

and the paper with the revised Six/One Router design:

   http://users.piuha.net/chvogt/pub/2008/vogt-2008-six-one-router-design.pdf

Below are some notes and questions on the paper. I will write more about this new design and about your replies to my 6 questions once I have read your reply to my notes and questions.

  - Robin


Short version:

I have several queries or perceived problems with the new version of Six/One Router, including:

   I have concerns regarding scaling with the number of Mapping Preference Messages to be sent when a multihoming failure occurs in a busy network.

   When a multihoming failure occurs, I am unsure how the Six/One routers know which remote networks to send Mapping Preference Messages to. Since these routers are stateless, how do they know which networks have recently been sending packets?

   How does the Six/One router know which remote Six/One router in another network to send the Mapping Preference Message to? How does one Six/One router find the address of another?

   I think the DNS AAAA requirements are unworkable in the case of a hosting company with thousands of customers: those customers would have to update the AAAA records in their DNS for the web server FQDNs of servers at the hosting company every time the hosting company gets a new transit prefix (every time it gets a new provider).

   Does the bit 71 to 64 adjustment technique really satisfy all the crypto protocols which look at the header? Do they all use a simple 16 bit checksum?

   This technique seems to be the only way of making Six/One Router work with crypto protocols, but it seems to be incompatible with prefixes longer than /48. Yet you mention the ability to use prefixes as long as /128 in Mapping Preference Messages.


Page 1
------

Note 1:

The overhead of map-encap in IPv6 can be pretty frightening when considering 50 VoIP packets a second, each carrying 20 bytes.
This is one of the big motivating factors for trying to find a translation scheme such as Six/One Router for IPv6 rather than map-encap.

A standard IPv6 VoIP packet, not counting Ethernet headers (18 bytes), is:

   40  IPv6
    8  UDP
   12  RTP
   20  VoIP data

   60/20 = 3:1 header to data ratio

   Data rate      50 x  80 x 8 = 32,000 bps
   Ethernet rate  50 x  98 x 8 = 39,200 bps

With Ivip: IP-in-IP encapsulation:

   40  IPv6 outer
   40  IPv6
    8  UDP
   12  RTP
   20  VoIP data

   100/20 = 5:1 header to data ratio

   Data rate      50 x 120 x 8 = 48,000 bps
   Ethernet rate  50 x 138 x 8 = 55,200 bps

With LISP: IPv6, then UDP, then LISP headers:

   40  IPv6 outer
    8  UDP
    8  LISP
   40  IPv6
    8  UDP
   12  RTP
   20  VoIP data

   116/20 = 5.8:1 header to data ratio

   Data rate      50 x 136 x 8 = 54,400 bps
   Ethernet rate  50 x 154 x 8 = 61,600 bps

Your note about overhead being 400% is not unreasonable, but it might be good to give a specific example, such as the LISP one. This expands an originally 8:1 compressed datastream almost back to the original 64,000 bps rate.


Page 2
------

Para 1, line 5:

   As another disadvantage, proxies may prolong the path of packets because they are usually off the shortest path.

"Proxies" in this context means LISP Proxy Tunnel Routers or Ivip OITRDs (Open ITRs in the DFZ). I wouldn't say they are "usually" off the shortest path, since this is not necessarily the case. If they are at major Internet exchange points, they will often be on the shortest path.

   And finally, yet importantly, the proxy concept lacks convincing deployment incentives since the cost for deploying and operating the new infrastructure must be borne by providers that obtain little benefit from it.

This may be the case for the LISP vision, but not for Ivip, where the OITRDs are to be deployed by the organisations who rent Ivip mapped address space to end-user networks.
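
Returning to the Note 1 figures for a moment: the three overhead tables can be cross-checked with a few lines of Python. This is only a sketch of the arithmetic; the header sizes are the ones listed in Note 1, and the function and variable names are my own, not anything from the paper.

```python
# Header sizes in bytes, as listed in Note 1.
IPV6, UDP, RTP, LISP, ETHERNET = 40, 8, 12, 8, 18
PAYLOAD = 20             # VoIP data bytes per packet
PACKETS_PER_SECOND = 50


def overhead(headers):
    """Return (header:data ratio, IP-level bit/s, Ethernet-level bit/s)."""
    header_bytes = sum(headers)
    packet_bytes = header_bytes + PAYLOAD
    ip_rate = PACKETS_PER_SECOND * packet_bytes * 8
    eth_rate = PACKETS_PER_SECOND * (packet_bytes + ETHERNET) * 8
    return header_bytes / PAYLOAD, ip_rate, eth_rate


native = overhead([IPV6, UDP, RTP])                   # plain IPv6 VoIP
ivip = overhead([IPV6, IPV6, UDP, RTP])               # IP-in-IP encapsulation
lisp = overhead([IPV6, UDP, LISP, IPV6, UDP, RTP])    # IPv6 + UDP + LISP outer headers

for name, result in (("native", native), ("Ivip", ivip), ("LISP", lisp)):
    print(name, result)
```

The rates come out in bits per second, matching the figures in the tables.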
A summary of the LISP PTR debate and a pointer to my OITRD business case message are in a recent message:

   Business incentives for LISP PTRs and Ivip OITRDs
   http://psg.com/lists/rrg/2008/msg02021.html

Col 2 para 5

   Practically, however, Six/One Router will likely be used only with transit addresses from IP version 6: The one-to-one mapping between edge addresses and the transit addresses from a given provider consumes a high number of transit addresses, which will prospectively be unavailable in IP version 4.

Thanks for clarifying this: Six/One Router is not a contender for the IPv4 scalable routing solution, unless using IPv6 as its transit network.


Page 3
------

2.2 Address Rewriting: mapping record

I think it would be helpful if you provided an example mapping record, showing how the edge prefix is specified, and how the one or more transit prefixes are specified, with any TE information and anything concerning the Six/One routers deciding where to send packets in multihoming failure events. Maybe Six/One Router doesn't have such things, which are used in LISP, APT and TRRP. If so, it might be good to state this explicitly.

My best guess is that a mapping record looks like this:

   128 bits   Edge prefix address.
     7 bits   Edge prefix length.
     8 bits   Number of transit prefixes.
   128 bits   Transit prefix address 0.
   128 bits   Transit prefix address 1.
   etc.       etc.

All the mapping systems you list are "slow" - in that they do not attempt to give the end-user real-time control of the behaviour of the ITRs, or in this case Six/One routers. This means your system has to figure out for itself how to cope with multihoming failures.

(Ivip is different. The fast push mapping system enables end-user networks to control the mapping in real-time, so they do their own failure detection and make their own decisions, completely removing these things from the map-encap system.)
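
To make my guessed mapping record layout above concrete, here is a rough Python sketch of how such a record might be packed and unpacked. Everything in it is an assumption on my part rather than something from the paper: the class and field names, the use of a whole byte for the prefix length (my guess above says 7 bits), the assumption that the transit prefixes share the edge prefix's length because of the one-to-one mapping, and the 3000::/48 edge prefix, which is purely illustrative.

```python
import ipaddress
import struct
from dataclasses import dataclass
from typing import List


@dataclass
class MappingRecord:
    """Hypothetical mapping record: one edge prefix, N transit prefixes."""
    edge_prefix: ipaddress.IPv6Network
    transit_prefixes: List[ipaddress.IPv6Network]

    def pack(self) -> bytes:
        # 128-bit edge prefix address, then prefix length and transit count
        # (one byte each in this sketch), then 128 bits per transit prefix.
        out = self.edge_prefix.network_address.packed
        out += struct.pack("!BB", self.edge_prefix.prefixlen,
                           len(self.transit_prefixes))
        for transit in self.transit_prefixes:
            out += transit.network_address.packed
        return out

    @classmethod
    def unpack(cls, data: bytes) -> "MappingRecord":
        length, count = struct.unpack("!BB", data[16:18])

        def network(raw: bytes) -> ipaddress.IPv6Network:
            # Assumes every prefix in the record has the same length.
            return ipaddress.IPv6Network((ipaddress.IPv6Address(raw), length))

        edge = network(data[:16])
        transits = [network(data[18 + 16 * i:34 + 16 * i])
                    for i in range(count)]
        return cls(edge, transits)


record = MappingRecord(
    ipaddress.IPv6Network("3000::/48"),            # illustrative edge prefix
    [ipaddress.IPv6Network("1000::/48"),           # transit prefixes as in Figure 3
     ipaddress.IPv6Network("2000::/48")],
)
wire = record.pack()
```

Note that a record like this carries no TE or failover information at all, which is the point of my question.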
Page 4 ------ Figure 3 The first question which came to my mind with this is whether Six/One Router could be implemented with a single router, with two links - one to each provider. Later in the paper it becomes clear that you rely on this two router arrangement, together with the internal routing system, to determine which link packets go out on - and therefore which transit address they are sent from. If you do rely on two routers, how is this to work if they are both in the same room, and effectively in the same part of the local network? How would you respond to end-users who didn't want to buy a router and locate it somewhere different in the network for every upstream link they used for multihoming? 2.3.2 Traffic Redirection This section concerns Traffic Engineering (TE) and Multihoming Service Restoration (MSR). Outgoing TE for load balancing, and outgoing MSR (choice of outgoing link due to failure of another link) is achieved by the internal routing system somehow adapting to the TE needs and the link failure conditions to direct packets to one or the other of the Six/One routers. Incoming TE and MSR requires affecting the behaviour of Six/One routers in however many provider networks as are currently sending packets to this edge network. I found this sentence (middle of left column) impossible to fully parse: A Mapping Preferences message can be returned to any source transit address from a packet received from a remote edge network. The "froms" were my sticking points. Maybe: A Mapping Preferences message can be returned to any source transit address in response to a special packet received from a remote edge network. I am being pernickety, not least because I think that your paper is generally written with extreme clarity and with excellent expression sensibilities. You later explain the three packets of the Mapping Preference Message exchange. I think this initial explanation would be better if it was expanded a little. 

   Mapping Preferences messages list combinations of address mappings and preference values. Similar to mapping records, the address mappings in Mapping Preferences messages are pairs of edge and transit address prefixes. But unlike mapping records, address mappings can be specified with variable granularity by scaling the length of their prefixes: Since edge addresses map one-to-one onto the transit addresses from any particular provider, the edge and transit address prefixes in an address mapping have the same length. Scaling this length facilitates preference feedback at granularities ranging from an edge network’s entire edge address space – in which case the edge address prefix is a complete routing prefix – to a single edge address. Address mappings are allowed to have overlapping edge address prefixes. To exclude ambiguities, those with longer edge address prefixes take precedence over those with shorter edge address prefixes.

Some examples would be really helpful.

Do you really want to have Six/One routers fussing over individual destination edge IPv6 addresses when they decide which transit address to send the packet to? 128 bits is a lot of bits to chew through with some CPU- and DRAM-intensive algorithm on a packet-by-packet basis.

My plan with IPv6 Ivip is to limit the granularity of the mapping system and the ITR functionality to /64. That is bad enough, but your description above indicates that you want all Six/One routers to be engineered to potentially match all 128 bits of a destination address of some packet arriving from its local network, to the longest matching prefix in a potentially lengthy Mapping Preference Message.

Presumably a Mapping Preference Message takes precedence over whatever mapping information a Six/One Router may have received. I think you need some kind of time-out on these Mapping Preference Messages. Otherwise, the Six/One router would be required to honour it forever, no matter what mapping information arrived.
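
Here is a rough Python sketch of what I understand a Six/One router would have to do for each outgoing packet: find the longest matching edge prefix among the address mappings it has been sent, ignore any that have timed out, and rewrite the destination one-to-one into the matching transit prefix. The time-out is my suggestion, not part of the paper, and the class name, the "higher preference wins" tie-break and the 3000::/48 edge prefix are all assumptions of mine for illustration. A real router would use a trie or TCAM rather than a linear scan.

```python
import ipaddress
import time


class PreferenceTable:
    """Hypothetical store of address mappings from Mapping Preference Messages."""

    def __init__(self):
        self._entries = []   # (edge network, transit network, preference, expiry)

    def add(self, edge, transit, preference, lifetime, now=None):
        edge = ipaddress.IPv6Network(edge)
        transit = ipaddress.IPv6Network(transit)
        if edge.prefixlen != transit.prefixlen:
            raise ValueError("edge and transit prefixes must have equal length")
        now = time.monotonic() if now is None else now
        self._entries.append((edge, transit, preference, now + lifetime))

    def lookup(self, destination, now=None):
        """Return the transit address for an edge destination, or None."""
        destination = ipaddress.IPv6Address(destination)
        now = time.monotonic() if now is None else now
        live = [e for e in self._entries
                if e[3] > now and destination in e[0]]
        if not live:
            return None
        # Longest edge prefix wins; preference value breaks ties (assumed).
        edge, transit, _, _ = max(live, key=lambda e: (e[0].prefixlen, e[2]))
        host_bits = int(destination) & int(edge.hostmask)
        return ipaddress.IPv6Address(int(transit.network_address) | host_bits)


table = PreferenceTable()
table.add("3000::/48", "1000::/48", preference=1, lifetime=60, now=0)
table.add("3000:0:0:1::/64", "2000:0:0:1::/64", preference=1, lifetime=10, now=0)
```

With these two entries, a packet to 3000:0:0:1::5 follows the /64 mapping while it is live and falls back to the /48 mapping once the /64 entry has expired.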

Say some network sent out a spurious Mapping Preference Message. How could the recipient Six/One router later know this was spurious, or that some corrective Mapping Preference Message was not received? That Mapping Preference Message could cause the remote Six/One router to send packets to some other network, so the Six/One router which sent the now unwanted Mapping Preference Message wouldn't know there was a problem.

I think this section needs a clear explanation of multihoming failure detection, decision-making and of how all the sending networks are told, presumably via Mapping Preference Messages, to change which transit address they use.

Let's say in Figure 3, the mapping and any currently active Mapping Preference Messages have the effect of causing all correspondent networks (all upgraded networks, since non-upgraded networks do not participate in multihoming) to send all packets to transit address 2000::/48 - via Provider 2.

Here are some fault conditions:

   A - The link to Provider 2 fails.

   B - The router with the link to Provider 2 fails.

   C - Provider 2 itself is cut off from the Net, or has severe congestion.

Where are these conditions detected? Where is a decision made about which of potentially multiple other links and transit addresses should be used? How is that decision turned into the only possible response: sending a Mapping Preference Message to the Six/One routers in every upgraded network which is currently sending packets to this network?

In all cases, the messages need to be sent out of a link other than the one which has failed.

In case A, there needs to be a way the two or more Six/One routers in the local network can communicate, make a decision and take the chosen action.

In case B, the surviving one or more Six/One routers need to recognise the one linking to Provider 2 is dead, and likewise make a decision and take action.

How would the Six/One routers decide that condition C had occurred?

Assuming a decision was made, how would the routers know which upgraded networks to send Mapping Preference Messages to? They can't reasonably be expected to keep a record of recent traffic. If they were expected to, then how long would they need to keep such records? What about really busy sites which are receiving packets from tens of thousands of upgraded networks? That would require sending Mapping Preference Messages to every such network.

When you send a Mapping Preference Message from Provider 1, using 1000::/48, you don't address it to a particular upgraded network, but to the Six/One router which was sending the packets to this edge network in recent times.

How does the Six/One router know which remote Six/One router to send the message to? How does one Six/One router find the address of another?

In the following diagram, there are two upgraded sites, both with 2-way multihoming. They use four separate providers.

X and Y are /48 edge prefixes - allocated permanently to the edge network, and not in the BGP global routing table. So X and Y are prefixes in the new, scalable form of space which is Provider Independent, but perhaps best not referred to as "PI" space, since this already has a specific meaning.

A, B, C and D are transit prefixes, all /48, which are PA (part of some shorter prefix allocated PI to each provider) and which therefore appear in the BGP global routing table.

There are four Six/One Routers: SOR1, SOR2, SOR3 and SOR4

   Edge-1  | Provider 1  DFZ  Provider 3 |  Edge-2
           |                             |
      SOR1----------A[---------]C-----------SOR3
           |            \   /            |
  X[       |              /              |      ]Y
           |            /   \            |
      SOR2----------B[---------]D-----------SOR4
           |                             |
           | Provider 2       Provider 4 |

Edge-1 is an upgraded network which in this example is the only one sending packets to Edge-2. There could be tens or hundreds of thousands of such networks sending packets to one or more hosts in Edge-2 when Edge-2 has a failure.

In this example, Edge-1 is sending all its packets to Edge-2 from its A transit address, to the C transit address.

Let's say SOR3 fails, or its link to Provider 3 fails, or Provider 3 fails or becomes very congested. Somehow, SOR4 has to send a Mapping Preference Message to SOR1. But how does it know that one or more hosts in Edge-1 have been sending packets to one or more hosts in Edge-2? The packets didn't go through SOR4. Even if SOR3 kept records of traffic - which I think is unworkable - let's say SOR3 is dead.

I don't see how you can base Multihoming Service Recovery on any active decisions and messages emanating from Edge-2. It might be OK for some failure modes, but not for all.

Ivip involves the end-user (whoever operates Edge-2 in this example) setting up their own multihoming monitoring and decision making system. They could do this themselves, and they could base it inside or outside their network, but the most likely arrangement is for them to hire the services of some company which does this sort of thing. That company has a 100% robust distributed global network of servers, and it constantly monitors connectivity to whatever ETRs Edge-2 relies on, and through them connectivity to whatever internal routers etc. need to be monitored. This way, the company can detect any failure, entirely from outside Edge-2's network, and then change the mapping system for Edge-2's micronet(s) accordingly.

In the Ivip system, the monitoring company doesn't need to know what networks have been sending packets to Edge-2. The mapping change alters the behaviour of all the world's ITRs which are handling these packets, in a few seconds.

Since multihoming monitoring and decision making is completely outside the Ivip system, there can be innovation, any number of techniques used, all sorts of customised arrangements involving probe packets, secure arrangements etc.
to any depth in the networks being monitored, from multiple vantage points in the outside world. With the other map-encap systems, the ITRs are expected to do the failure detection and decision making, based on previously supplied options in the mapping data. They need to do this individually. It all needs to be specified as part of the map-encap system, and so can't be upgraded easily, or customised at all. Six/One Router is similar to these non-Ivip map-encap systems: you monolithically build in multihoming failure detection, decision-making and the actions required to change the path of packets. The only way any network can change the path of packets is by sending Mapping Preference Messages to all the particular Six/One routers which are currently sending packets which need to be redirected. It is no good sending the Mapping Preference Message to SOR2, since it is not sending those packets, and since you have no way of SOR2 communicating the contents of such messages to SOR1 or however many other Six/One routers there are in Edge-1. Of course, if Edge-2 was getting packets sent to SOR3 from both SOR1 and SOR2, then SOR4 would need to send a separate Mapping Preference Message to both SOR1 and SOR2. But again, how can SOR4 know where the packets have been coming from? It is not good enough to rely on SOR1 getting destination unreachable messages when the failure occurs. Maybe SOR3 simply dies in a way that the link stays open, but it can't communicate with the rest of Edge-2. The non-Ivip map-encap systems tend to rely on destination unreachable messages for their ITRs to figure out something is wrong. Ivip doesn't rely on such things. A properly engineered multihoming monitoring system will securely probe and get explicit responses from routers, internal nodes or whatever is desired to show that the links, routers etc. are working as expected. 
When these positive acknowledgements fail to arrive for more than a specified time, the multihoming monitoring system decides there has been a failure. Also, the multihoming monitoring system is in a much better position than the ITRs in the other map-encap systems to keep probing the network to detect when the failure has been resolved. Then it can change the mapping back to what it was. Arbitrarily complex detection and decision techniques can be used with Ivip, since it is nothing to do with Ivip itself. The other map-encap systems, and your own Six/One Router, are monolithic and have to specify every technique for multihoming monitoring, decision making, recovering to normal operation. Also, all such functionality needs to be built into all ITRs and ETRs, or into all Six/One routers and the local routing systems which largely control their operation. Ivip doesn't rely on anything being done by the network in question. A properly engineered multihoming monitoring system will be entirely independent of that network, and will securely change that network's micronet's mapping in a way the operators desire. I have further questions below about the Mapping Preference Message system. Page 4 continued ---------------- 2.4 Backwards Compatibility You make the hosts in the edge network reachable from non-upgraded networks (AKA "legacy" networks - but I dislike this term) via their one or more transit addresses. However, all such traffic is not subject to any multihoming service restoration system. That only works for packets from upgraded networks. A primary purpose of adopting a map-encap system, or Six/One router - whatever it is, with its new type of address space which will solve the routing scaling problem - is to have multihomable, portable, space, ideally space which can be used for incoming TE as well. Yet with Six/One Router the multihoming and TE functions only work for packets coming from upgraded networks. 
This means there is very limited motivation for anyone to adopt Six/One Router space initially - a situation which is likely to persist indefinitely unless there are other motivating factors sufficient to cause widespread adoption. In contrast, LISP with PTRs and Ivip with its OITRDs provides full multihoming and incoming TE for traffic from non-upgraded networks - so the impetus to adopt these is high, right from the start, even before a single other network has adopted it. Your backwards compatibility system has to cope with various scenarios. In one scenario - the correspondent host in the non-upgraded network initiating the communication, including sending a single packet - your system relies entirely on the correspondent host getting the one or more transit addresses for the upgraded network from the DNS AAAA record, and then finding and using one of these, after potentially trying one (or more?) edge network addresses which are not routable in the global BGP system. Each host in the upgraded edge network has no idea of what its address would be in the one or more transit prefixes that network is accessible by. (To do so would involve an impractical involvement of hosts with the local routing system, how the network organises its links to providers, which of those links is currently active and preferred etc.) So a host in Edge-2 above only knows its address in prefix Y. In the scenario in which the communication is initiated by the host in Edge-2, that host can send packets to some host in Net-13, which hasn't been upgraded to Six/One Router yet. It can tell the host in Net-13 its address in Y, but that will not enable the host in Net-13 to send packets to it. In this scenario, the only way a host in Net-13 could send packets to the host in Edge-2 is by using the source address in the packet it received from the host in Edge-2. This address would be a transit address, in C or D. 

There are other scenarios:

There is no obvious way some other system (such as a P2P management system) could tell the correspondent host in Net-13 an address on which it could send a packet to the host in Edge-2. Edge-2 could tell that management system its Y address, and perhaps the management system could be specially crafted to observe the source address from which packets from the Edge-2 host arrived. But this is a flaky and irregular way for an external system to figure out which IP address to tell the correspondent host to use when sending packets to the Edge-2 host.

Hosts don't know - and shouldn't have to know - whether or not they are in a Six/One Router edge network. They shouldn't have to know whether their own addresses, or the addresses of other hosts, are "edge" addresses or "transit" addresses. Nor is it reasonable to expect any separate management system to make these distinctions.

Ivip and LISP with PTRs let the hosts carry on as usual. All hosts can send packets to each other on their own addresses, no matter whether one or both hosts are on Ivip/LISP-managed addresses.


Page 5
------

2.4.1 Destination Address Selection

I found dot point 1 hard to understand at first. Perhaps it would be better to write:

   To organise the addressing system so that all edge addresses are from a clearly identified prefix, such as 1::/1, 11::/2 or 111::/3.

I found the rest of the left column pretty hard to understand.

This was very confusing at first, but it made more sense with later explanation:

   In case of a tie, a candidate destination address is chosen that has the longest prefix in common with the source address.

I couldn't imagine what the purpose of this was.

The following sentence could be rewritten to change "choose" into something more informative. I thought it referred to some algorithm in a host choosing something.
In fact it refers to the designers of the entire system choosing to make a whole section of the IPv6 address space exclusively for edge networks - and not to have edge network addresses outside this section. A means to take maximum advantage of the longest-prefix match for destination address selection in Six/One Router would be to choose the highest-order bit in addresses so that it distinguishes edge from transit addresses. The rest of this column could probably be rewritten to be less confusing. I won't try to detail my difficulties here, but can talk about them by phone or write more offlist. The top (continuing) paragraph on the right column is pretty confusing to me. You propose some kind of overall prefix to contain all edge address space, but admit it is not a reliable approach. I don't fully understand the previous explanation of how it would work with your proposed address selection mechanism. To what extent are you proposing a change to all hosts in the way they select an address from multiple addresses in an AAAA DNS record? I foresee major problems with this reliance on DNS to enable hosts in non-upgraded networks to send packets to hosts in upgraded networks. Firstly, there are plenty of situations where a host needs to be told an IP address to use, by some system other than a FQDN and a DNS lookup. Even ignoring those instances, lets consider this example which uses DNS. Edge-2 is a hosting company. It has a customer xyz.com, who run their own nameservers. The web server www.xyz.com is on a host in the Edge-2 network. xyz.com needs to put an AAAA record in their DNS so hosts all over the world can send packets to their web server. This is no problem with Ivip, LISP etc. However I see serious and probably insurmountable problems with Six/One Router. 

You need xyz.com to have not just the Y prefix edge address of the server for www.xyz.com in their AAAA record, but every address on which that host would appear on each of Edge-2's transit prefixes: C and D.

Let's say Edge-2 has 10,000 such customers. I know it is traditional for many hosting companies to do the DNS for their customers' domains, but this doesn't work for all customers, so I will assume all 10,000 customers run their own DNS, or have someone else run it.

Whenever Edge-2 changes one or more of its providers, it gains or loses a transit prefix such as C or D. Each time Edge-2 does this, it needs to get all its 10,000 customers to change the AAAA record for their web server in their DNSes!

This is unworkable, and looks to me like a showstopper for Six/One Router.

I am unlikely to accept arguments about why this is not a realistic example. For instance, some folks may argue that such a hosting company is not the sort of edge network which would want, or should have, Six/One Router managed address space.

We need the new scalable type of address space to be ubiquitously adopted. It is not good enough to get 50% of end-user networks using it, with the other 50% (however defined, such as by the total number of addresses used, the number of prefixes they use etc.) not using it, since at most that will only cut the scaling problem by a factor of 2.

In order to provide the millions (some insist billions) of end-user networks with portable, multihomable address space, we need the new type of address space to be highly attractive to *all* end-user networks. This means all networks, except those of providers - all networks of any organisation except of those organisations who sell connectivity.

So it's not good enough to say hosting companies won't be using the new kind of address space. Nor is it good enough to say that the very large hosting companies, in which the above problem is most acute, wouldn't need to use the new kind of space.
(Arguably, there would be few enough of these that we could cope with them using conventionally BGP managed space in perpetuity.) We need all hosting companies, large and small, to want to use the new kind of space. If there is a perception that the new kind of space is not suitable for the largest hosting companies, then every start-up company will insist on using conventional space, because they are sure they are going to grow into a large hosting company real soon. It is bad enough Edge-2 having to change all its DNS records every time it gets a new transit prefix, but I think it is unworkable for Six/One Router to require such DNS changes in organisations which are separate from Edge-2, but in some way involved with the hosts in Edge-2's network. At the end of 2.4.1, I have a rough idea of what you are suggesting about using existing NAT traversal approaches to cope with the problems inherent in the current design for Six/One Router. I think a fuller explanation, with examples, would be helpful. Pages 5 and 6 ------------- 2.4.2 Source Address Consistency I found this sentence confusing: The mode in which a packet is exchanged can consequently be determined based on the type of remote address as long as the packet is within the boundary of an edge network: The packet is exchanged in Bilateral mode if the remote address is an edge address. "within the boundary of an edge network" sounded like the physical location of a packet at some point in its travels. In fact, I think it means whether its source address can be determined as being within an edge network. I had to re-read this whole section carefully. I don't think I fully understand it, but I surmise: 1 - This section only concerns communications between hosts which are both in upgraded networks. So the sentence: With this invariant, every packet exchange between two hosts must be in either of two modes: * Bilateral mode — both hosts in the packet exchange use edge addresses to reach each other. 
* Unilateral mode — both hosts in the packet exchange use a transit address to reach each other. needs to be understood as not referring to the Unilateral arrangement used for communicating with a host in an non-upgraded network (Figure 4), but only to the situation of both hosts being in upgraded networks: Figure 2 (Bilateral) and Figure 5 (Unilateral). BTW, I think these terms are confusing, since Figure 3 is perfectly symmetrical in terms of both sides doing the same thing. The Uni/Bi-lateral concept refers to whether incoming packets from the provider are rewritten or not. Outgoing packets are always rewritten. 2 - Each Six/One router can't figure whether or not the source address of a packet arriving on the provider link is an edge address or not. (But on the page before, was a plan to arrange the addressing system so edge addresses always came from a defined short prefix, and so could be identified as such by some simple algorithm. I don't understand how these two concepts relate to each other.) 3 - In order to solve this, there are two approaches. a - The preferred approach is to use a bit in the IPv6 header to indicate this. A set state for this "Bilateral/Unilateral" bit means that this is a "Bilateral" exchange, so the source address of the incoming packet should be rewritten (always to an edge address?) before the packet is sent to the local destination host. b - (Note 5) An alternative to the new header bit is to use an option header, but this raises all sorts of problems with DFZ routers not handling such packets efficiently, and with the packet becoming longer, and different. This would be inefficient, raise PMTUD problems, and generally nullify Six/One Router's greatest attraction over map-encap schemes: that the packets do not get any longer. I think it would be good to have more discussion of this new bit being used in the IPv6 header. 
Without it, Six/One Router is not going to work at all - since the option header alternative means it probably can't work with current DFZ routers, and would lose most of its attractiveness over map-encap.

The most likely place for such a bit is the Flow Label, which according to Wikipedia and:

   http://www.tcpipguide.com/free/t_IPv6DatagramMainHeaderFormat.htm

is not used at present.


Pages 6 and 7
-------------

2.4.3 Multi-homing Support

I had to re-read this section too. The Unilateral mode referred to here is the Unilateral mode between hosts in two upgraded networks: Figure 5 - NOT the Unilateral mode between a host in an upgraded network and a host in a non-upgraded network (Fig 4).

I got completely lost here:

   Six/One Router achieves this by making the providers of a multi-homed edge network responsible for connectivity to disjoint and complementary subsets of the transit address space, while having all of them provide connectivity to the complete remote edge address space. Providers back up each other’s routes to remote transit addresses.

and the paragraphs which follow.

Even without understanding the foregoing, I have some understanding of:

   Finally, to enable fast re-establishment of packet exchanges in Unilateral mode after a provider failure, Six/One Router must meet the following two requirements:

   * Backup routes for the defunct ones must be available quickly in the edge-network-internal routing system.

   * Backup Six/One routers must be aware of the fail-over so that they can start accepting incoming packets that used to be bound to the failed provider.

   To achieve the first, providers offer backup service for the routes to remote transit addresses that other providers are responsible for.

While I don't understand this, does it mean something to do with provider 1 advertising in the DFZ a prefix which was until recently advertised by provider 2?
That doesn't sound desirable or practical, but it is the most sense I could make out of the section to this point.

Or does it just mean that if the provider 2 link (Figure 6) fails, that provider 1 will accept (and forward to the DFZ) packets from the Six/One router on the left, which have a source address in the 2000::/48 prefix?

The second-last paragraph of this section does explain something about multihoming monitoring decisions and how those decisions lead to the desired flow of packets from another Six/One router, link and transit address. However, this is just for the scenario in which the link to provider 2 is dead, and the Six/One router on the right recognises it. What if that router is dead? Or what if provider 2 is disconnected from the Net, or is so congested as to make this link incapable of handling the traffic?


Page 7
------

2.4.4 Avoiding Adverse Effects of Unilateral Mode on Transport Protocols and Applications

Again, I think this concerns Unilateral mode for hosts both in upgraded networks, not Unilateral mode with one host in a non-upgraded network.

    Applications are affected if they reference addresses in packet payloads because unilateral address rewriting in the IP header of a packet then leads to address inconsistencies between the IP header and the packet payload. Six/One Router relies on application functionality for network address translator traversal [Ro2003, Ro2007] to avoid such address inconsistencies.

These references are to:

    STUN  http://tools.ietf.org/html/rfc3489

    ICE   http://tools.ietf.org/html/draft-ietf-mmusic-ice-19
          http://www.rfc-editor.org/queue.html#draft-ietf-mmusic-ice
          (Last updated 2007-10-29)

It would be good to discuss how STUN and ICE, which are meant to work with conventional NAT, would work with Six/One Router.
The following text seems to indicate a showstopper:

    Applications that reference addresses in packet payloads depend on this functionality already today, due to the existing deployment of network address translators. It is hence safe to assume that those applications, which use addresses in packet payloads, also support network address translator traversal.

This seems to assume the hosts in the upgraded networks are clients. However, the new address space for the scalable routing solution absolutely needs to support servers. I don't see how servers can work if there is any requirement for hosts in the new space to use STUN or ICE in any way.


Page 7
------

Para 3 and beyond in right column: Header checksums

The system absolutely has to work with existing cryptographic techniques.

    An alternative to re-computing Internet checksums in Six/One Router is to choose the mapping between edge and transit addresses such that the checksum does not change during address rewriting. This technique is applicable to all packets, even if their payloads are integrity-protected or encrypted. It is also efficient because the checksum does not have to be localized within a packet.

    Mapping edge and transit addresses such that the checksum does not change during address rewriting is practical where the routing prefixes of IPv6 edge and transit addresses are at most 48 bits long: The remaining 16 or more bits in the standard 64-bit subnet prefix of an IPv6 address can then be used to compensate for the checksum difference that rewriting of the routing prefixes alone would create.

There is a potentially serious problem here: Since you must support crypto on all packets, and since this technique (if it works) is the only one available, it restricts the granularity of the use of Six/One Router to prefixes of /48 or shorter.
That doesn't seem to be a big problem to me, but in other parts of the paper you discuss granularity down to single IP addresses (/128) and using finer granularity prefixes (presumably longer than /48) in Mapping Preference Messages, I think to spread load over multiple incoming links.

Do all these crypto arrangements involving headers simply treat the source and destination addresses as checksums modulo 16 bits? I haven't checked this, but it doesn't sound very secure. It would be good for you to list the crypto arrangements you have investigated, and point out why you are sure they would be unaffected by the technique you propose:

    More specifically, the difference (delta) between the checksum of an edge address routing prefix and the checksum of a corresponding transit address routing prefix is the value by which the lower 16 bits in the subnet prefix must be adjusted during address rewriting to avoid changing the checksum of the packet. 16 bits are sufficient for this because the checksum, too, is 16 bits long. And since the routing prefixes are static, so is (delta).

So does this mean something like the following?

Host A has the edge address 4000::1. It is in a network using a transit prefix 6000::/48.

So when a packet sent from this host has its source address rewritten, it would (without the above arrangements to keep the crypto protocols happy) be rewritten to 6000::1. Since this bumps up the (assumed) 16 bit checksum by 2000 Hex, the above workaround actually rewrites the address with a new value in the bits 65 to 71 positions, to subtract 2000 from the checksum. (I am using ordinary binary order here.) So the addresses are:

    (bit positions: 127 ... 72, 71 ... 65, 64 ... 0)

    Edge address                   4000 0000 0000 0000 0000 0000 0000 0001

    Ordinarily rewritten address   6000 0000 0000 0000 0000 0000 0000 0001

    Rewrite with crypto-friendly
    workaround                     6000 0000 0000 E000 0000 0000 0000 0001

This clearly needs to be implemented for all Six/One Router rewrites.
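
To make the arithmetic concrete, here is a minimal Python sketch of a checksum-neutral rewrite for the 4000::1 to 6000::/48 example. It assumes the checksum in question is the standard Internet checksum (a one's complement sum of 16-bit words with end-around carry, as used by TCP and UDP), and it treats both prefixes as /48. The function names are mine, not from the paper. Under that assumption the compensating word comes out as DFFF, whereas E000 is what ordinary modulo-65536 subtraction gives.

```python
import ipaddress


def ones_complement_sum(words):
    """Add 16-bit words with end-around carry (Internet checksum arithmetic)."""
    total = 0
    for word in words:
        total += word
        total = (total & 0xFFFF) + (total >> 16)  # fold the carry back in
    return total


def address_words(addr):
    """Split an IPv6 address into its eight 16-bit words, most significant first."""
    packed = ipaddress.IPv6Address(addr).packed
    return [int.from_bytes(packed[i:i + 2], "big") for i in range(0, 16, 2)]


def checksum_neutral_rewrite(edge_addr, transit_prefix):
    """Swap the top 48 bits for the transit prefix and adjust the fourth
    16-bit word so the one's complement sum of the address is unchanged.
    Assumption: edge and transit prefixes are both handled as /48."""
    net = ipaddress.IPv6Network(transit_prefix)
    if net.prefixlen > 48:
        raise ValueError("no spare 16-bit word left above the /64 boundary")
    edge = address_words(edge_addr)
    transit = address_words(net.network_address)[:3]
    # delta = sum(edge prefix) - sum(transit prefix); subtracting in one's
    # complement means adding the bitwise complement.
    delta = ones_complement_sum(
        [ones_complement_sum(edge[:3]), ones_complement_sum(transit) ^ 0xFFFF]
    )
    words = transit + [ones_complement_sum([edge[3], delta])] + edge[4:]
    return ipaddress.IPv6Address(b"".join(w.to_bytes(2, "big") for w in words))


if __name__ == "__main__":
    print(checksum_neutral_rewrite("4000::1", "6000::/48"))
```

The guard on the prefix length reflects the concern raised here: once the routing prefix is longer than /48 there is no longer a full 16-bit word above the /64 boundary to absorb the difference.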
I am not sure how it could work with edge and therefore transit prefixes longer than /48. It messes up the conceptually clean idea of simply translating a linear range of addresses into some other linear range.

This business of always rewriting bits 71 to 65 of the destination and source addresses in order to adjust the header checksum to keep crypto protocols happy . . . this algorithm needs to be applied to all the transit addresses provided in AAAA records.

Unless you can show that all relevant crypto protocols would be happy with this workaround, I think the header checksum problem is a showstopper.


Page 8
------

Delays inherent in relying on mapping information

In the first para in 2.5.1:

    Six/One Router relies on the trustworthiness of the mapping system to ensure that remote edge and transit addresses are rewritten correctly. Six/One routers can rewrite the destination edge address of a packet that leaves their edge network only after retrieving the corresponding mapping record from the mapping system. And they can rewrite the source transit address in a packet that enters their edge network only after retrieving the corresponding mapping record from the mapping system.

Assuming the two hosts are in two upgraded networks, the Six/One routers are using Bilateral rewriting, and these routers have no cached mapping information for the relevant prefixes, then this means that before a packet sent by host A will reach host B, the following has to occur in sequence:

1 - A's Six/One router needs to request the mapping information for the edge prefix which contains the destination address (B's edge address). (Request type F below.)

2 - That request needs to be forwarded to a query server of some kind which can respond authoritatively and in a manner which the router can authenticate as being secure.

3 - The response needs to be forwarded back to A's Six/One router.

4 - That router rewrites the address and forwards the packet towards the DFZ.
5 - When it arrives at the border of B's network, B's Six/One router somehow determines how to request the mapping it needs to rewrite this packet's destination address to be the desired outcome: B's edge address - and to rewrite the source address to the desired outcome, A's edge address.

    The first part is easy. B's Six/One router knows the transit prefix the packet was received within and the edge prefix, so it can reverse the rewrite done by A's Six/One router, including reversing the alteration of bits 71 to 65 (crypto header checksum workaround).

    However, to rewrite the source address correctly, it needs mapping information. How does B's Six/One router figure out the edge prefix of A's network? All it has is some source address, which was rewritten to be in one of the potentially numerous transit prefixes used by A's network.

    So I think the mapping query server has to handle two types of query:

    F - Given an edge address, return the length and base address of the edge prefix of that network, as well as the start addresses of the one or more transit prefixes used by that network.

    R - Given a transit address, return the starting address of that transit prefix, and the edge prefix (starting address and length) for the network which is using this transit prefix. (Maybe also return this network's other transit prefixes?)

    For the moment, let's not even think about how the system handles multihoming outages, Mapping Preference Messages etc.

    Is this correct? I don't recall two types of mapping request being mentioned in your paper.

    So B's Six/One router generates a type R mapping request.

6 - The mapping request reaches the appropriate query server, and it sends the response.

7 - The response arrives at B's Six/One router. This enables it to know the edge prefix and transit prefix used by A's Six/One router. Therefore it can calculate what was added to A's edge source address to create the transit source address in the just-arrived packet.
    This enables it to reverse that rewrite. The rewritten packet is now forwarded to host B.

This is two sets of query and response:

    * Query
    * Response
    Send translated packet
    Translated packet arrives
    * Query
    * Response
    Translated packet delivered

If you are using these mapping systems (top right of page 3):

    CONS
    ALT
    DNS Map (I think)

then these all involve a global query server system. This raises problems with long delays and unreliability in getting a mapping response. With Six/One Router, the situation is worse than for the map-encap systems, since there are two cycles of query and response.

I think these mapping systems would mean that address space relying on Six/One Router would involve unacceptable delays in delivering initial packets as discussed here:

    http://www.firstpr.com.au/ip/ivip/lisp-links/#long_paths

Host B now wants to send a packet back to host A, and this packet happens to go through the same Six/One router just mentioned. That Six/One router retains no state based on the previous packet rewriting, but it does have cached mapping information of A's edge prefix.

How does this Six/One router in B's network know which of potentially multiple transit prefixes used by A's network should be used for the rewriting of the destination address of the current outgoing packet?

Assuming the type R response included all of A's transit prefixes, then this gives the Six/One router a list of such prefixes to choose between. Maybe, like with the multiple transit addresses in the AAAA record, the Six/One router simply chooses one. Or does the mapping information include weights for each transit prefix to implement incoming load balancing for the remote network? If so, then this doesn't apply to packets coming in response to finding a transit address in an AAAA record.

Once the Six/One router has chosen a destination transit prefix to translate the packet's destination address to, it can do the rewrites and send the packet on its way.
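
The two lookup types and the per-packet choice can be sketched in Python as below. This is only my guess at how such a cache might behave, not something taken from the paper: the record layout, the second transit prefix 7000::/48 and the weights are all hypothetical, and the weighted random pick is just one possible reading of "simply chooses one".

```python
import ipaddress
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class MappingRecord:
    """Hypothetical cached mapping for one remote edge network."""
    edge_prefix: ipaddress.IPv6Network
    transit_prefixes: tuple  # IPv6Network objects
    weights: tuple           # hypothetical load-balancing weights


# Assumed cache content for A's network; 7000::/48 and the weights are made up.
RECORD_A = MappingRecord(
    edge_prefix=ipaddress.IPv6Network("4000::/48"),
    transit_prefixes=(
        ipaddress.IPv6Network("6000::/48"),
        ipaddress.IPv6Network("7000::/48"),
    ),
    weights=(3, 1),
)
CACHE = [RECORD_A]


def lookup_forward(edge_addr):
    """Type F: edge address -> record holding its edge and transit prefixes."""
    addr = ipaddress.IPv6Address(edge_addr)
    for record in CACHE:
        if addr in record.edge_prefix:
            return record
    return None


def lookup_reverse(transit_addr):
    """Type R: transit address -> record of the network using that prefix."""
    addr = ipaddress.IPv6Address(transit_addr)
    for record in CACHE:
        if any(addr in prefix for prefix in record.transit_prefixes):
            return record
    return None


def choose_transit_prefix(record, rng):
    """Stateless weighted pick; nothing is remembered between packets."""
    return rng.choices(record.transit_prefixes, weights=record.weights, k=1)[0]


if __name__ == "__main__":
    rng = random.Random()
    record = lookup_forward("4000::1")
    for _ in range(5):
        print(choose_transit_prefix(record, rng))
```

Because the router keeps no per-flow state in this sketch, consecutive packets to the same edge prefix can be sent towards different transit prefixes.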
What of the next packet destined for the same edge prefix which arrives at this Six/One router? Does it go through the same procedure, potentially choosing another destination prefix in the remote network? There is no state in the router concerning packets handled, so I guess there is no continuity in Six/One router behaviour from one packet to the next.

I haven't yet looked at how Six/One Router handles PMTUD and packet too big messages in the translated portion of the path to the destination host.
