After a number of exchanges with Robin and Noel, here is a revision of the LISP critique (sorry Noel, I'm one day late; it took much longer than I thought to put things together, and I may have missed some worthwhile points due to the word limit).

This is largely a revision of Noel's critique, combined with comments from Robin and myself. The focus is on LISP as a routing scalability solution; I did not include any comments on the mobility support part.

Comments welcome!


LISP Critique

LISP continues to change based on lessons learned in actual deployment. This critique is based on our understanding of the design as described at the time of this writing. It covers fundamental architectural limitations, potential problems that require co-ordinated change, and new issues that arise as a result of the design; it also clarifies some basic definitions.

First, two basic terms in LISP need clarification. As stated in the LISP drafts:
   "LISP specifies an architecture and mechanism for replacing the
   addresses currently used by IP with two separate name spaces:
   Endpoint IDs (EIDs), used within sites, and Routing Locators (RLOCs),
   used on the transit networks that make up the Internet"
Thus an "EID" in LISP is not a host identifier, but an IP address used within a site for packet delivery. Furthermore, an RLOC is not simply any IP address reachable in the global default-free zone; it is a special address that is bound to an attachment point of a site.

Regarding the architecture, LISP's most serious challenges are due to the fact that it effectively divides today's routed IP address space into two, edges and the core, which comes with all the challenges that such a grand division brings; the list below attempts to capture the major ones that have been identified (not in priority order).

The first question is whether, or how, one can draw a clear boundary to sort existing networks into the core and the edges. For example, where should one put transit networks that do not provide global connectivity (e.g. Internet2)? Do such networks belong to the core or to the edges? Does the core represent a connected cloud (bar transient failures)?

The second class of challenges arises from the fact that reachability to each edge destination is now the combined result of three major components: the mapping database that captures the connectivity between an edge site and its TRs, the real-time status of the links between an edge site and its TRs, and the connectivity between the ITR and the ETR over which encapsulated packets travel. In designing these components, three goals are often in conflict: minimizing overhead, minimizing complexity, and maximizing performance.

Because the mapping database will be very large, LISP lets ITRs query mappings on demand, which raises the question of how to handle packets while the ITR is waiting for the mapping information. The current decision (dropping packets) favors simplicity at the cost of data performance. Would it be feasible to buffer the packets? How deep could such a buffer be? Such questions need future research.
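To make the drop-versus-buffer trade-off concrete, here is a minimal toy sketch in Python. It is not LISP wire behaviour: the class ToyITR, its fields, and the buffer_limit parameter are all hypothetical names made up for illustration. A limit of 0 models the current drop-on-miss decision; a positive limit models a bounded per-destination buffer.

```python
from collections import deque


class ToyITR:
    """Toy model of how an ITR might treat packets on a map-cache miss."""

    def __init__(self, buffer_limit=0):
        self.map_cache = {}               # EID (plain string here) -> RLOC
        self.pending = {}                 # EID -> packets waiting for a mapping
        self.buffer_limit = buffer_limit  # 0 reproduces drop-on-miss
        self.sent = []                    # (rloc, packet) pairs "encapsulated"
        self.dropped = 0

    def forward(self, eid, packet):
        rloc = self.map_cache.get(eid)
        if rloc is not None:
            self.sent.append((rloc, packet))
            return
        # Cache miss: a real ITR would issue a mapping request at this point.
        queue = self.pending.setdefault(eid, deque())
        if len(queue) < self.buffer_limit:
            queue.append(packet)          # hold the packet until the reply
        else:
            self.dropped += 1             # no room (or no buffer): drop

    def mapping_reply(self, eid, rloc):
        # Install the mapping and flush anything that was buffered for it.
        self.map_cache[eid] = rloc
        for packet in self.pending.pop(eid, deque()):
            self.sent.append((rloc, packet))
```

With buffer_limit=0 every packet sent before the reply is lost; with a small positive limit the first few packets survive at the cost of per-destination state in the ITR, which is exactly the complexity-versus-performance tension described above.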

Another issue arises from caching the mapping information: caching improves the performance, but introduces the problem of detecting, and replacing, outdated mappings. This is a very lengthy topic with many subtly different failure modes, which cannot be covered here in any detail.
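One of the simplest failure modes can be sketched as follows. This is only an illustration under stated assumptions: the MapCache class, the TTL-based expiry, and the time values are hypothetical and are not taken from the LISP drafts. The point is that between a mapping change and the cache entry expiring, the ITR keeps handing out the old RLOC.

```python
class MapCache:
    """Toy EID-to-RLOC cache with time-based expiry (illustrative only)."""

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self.entries = {}  # EID -> (RLOC, time the entry was learned)

    def learn(self, eid, rloc, now):
        self.entries[eid] = (rloc, now)

    def lookup(self, eid, now):
        entry = self.entries.get(eid)
        if entry is None:
            return None
        rloc, learned_at = entry
        if now - learned_at >= self.ttl:
            del self.entries[eid]  # expired: force a fresh mapping query
            return None
        return rloc


cache = MapCache(ttl_seconds=60)
cache.learn("eid-a", "etr-old", now=0)
# Suppose the site's mapping changes to "etr-new" at t=10.
stale_answer = cache.lookup("eid-a", now=30)  # still the old ETR
after_expiry = cache.lookup("eid-a", now=60)  # entry gone, must re-query
```

A longer TTL reduces query load but widens the window in which stale_answer is wrong; a shorter one does the reverse.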

Because of this caching effect, and the fact that the ETR for a multihomed destination site is chosen at the ITR, the LISP design also faces challenges in responding to component failures. LISP cannot easily test the reachability of ultimate destinations (e.g. those behind an ETR).

Regarding the mapping system, the ALT design has potential performance and scaling issues (e.g. concentration of request load at the top-level nodes); an interface has been built to allow replacement of the mapping system. Another issue is potential provider lock-in for the identifier (EID) namespace, unless some mechanism can be worked out to allow multiple competing providers to resolve EIDs to ETRs (perhaps as part of a new mapping system). This last point touches on an even more important issue: an ISP's performance critically depends on the performance of the mapping system, so a single mapping system serving everyone seems problematic.

Another class of issues relates to network management and operations.
- Although LISP does provide significant tools for multi-homing,
  load-sharing, optimal-entry-selection, etc., these currently depend on
  correct configuration; response to component failures is also limited.
- LISP currently works through NAT boxes, but only in limited
  configurations. In particular, due to the use of fixed UDP ports, it is
  not currently possible to support more than one ETR behind a NAT box.
- Encapsulation of data packets increases the packet size and may lead to
  PMTU problems.
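
The size increase in the last item can be worked out with a little arithmetic. The sketch below assumes the encapsulation consists of an outer IP header without options or extension headers (20 bytes for IPv4, 40 for IPv6), an 8-byte UDP header, and an 8-byte LISP header; the function name and the 1500-byte path MTU are illustrative choices.

```python
OUTER_IP_HEADER = {"ipv4": 20, "ipv6": 40}  # no options / extension headers
UDP_HEADER = 8
LISP_HEADER = 8  # assumed size of the LISP shim header


def max_inner_packet(path_mtu, outer_family):
    """Largest original packet that fits in one encapsulated packet."""
    overhead = OUTER_IP_HEADER[outer_family] + UDP_HEADER + LISP_HEADER
    return path_mtu - overhead


for family in ("ipv4", "ipv6"):
    print(family, max_inner_packet(1500, family))
```

Under these assumptions a 1500-byte core path leaves room for 1464-byte inner packets with an IPv4 outer header and 1444-byte inner packets with an IPv6 outer header, so full-sized 1500-byte packets from an edge site no longer fit without fragmentation or a reduced edge MTU.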

Last but not least: one would not see a reduction in global routing table size unless and until LISP has been adopted by a significant number of networks. On the other hand, LISP is potentially a useful tool in data centers, where its one level of indirection may significantly simplify support for virtual servers.

rrg mailing list
