Hi Folks, Just to remind you of these comments. PTAL when you have time so we can look towards progressing this doc.
Thanks, r. > On Aug 11, 2016, at 18:39, Rob Shakir <[email protected]> wrote: > > Ahmed, Clarence, Pradosh, > > I did a review of draft-ietf-rtgwg-bgp-pic-02 as part of the process of doing > a shepherd write-up for this document. I have a number of comments which I > want to try and address first. > > General: > > 1) This document feels really difficult to get the concepts from to me. The > key intent (in my view) is to describe the way that one can structure a FIB > to be able to reduce the number of updates that are required to reconverge. > Unfortunately, the extensive use of examples (to me) does not make the core > concepts that are being described super-clear. To this end, what I would > propose is re-working the overview section. In this section the two “pillars” > that are described can be outlined in more detail. For example: > > BGP PIC is based on two FIB design concepts: > > 1. Rather than implementing a forwarding construct each BGP NLRI has > a corresponding next-hop property, the FIB is implemented as a series > of linked > entries, whereby many BGP NLRIs can link to a single next-hop entry > (or set of > next-hop entries). > > 2. Forwarding entries are represented as a chain - which may have > multiple > levels of hierarchy representing the forwarding dependencies for > packets > matching that criteria, for example, allowing a BGP NLRI to map to a > BGP > next-hop which itself maps to an IGP next-hop, which maps to an > interface. > > Designing the FIB in this manner allows commonalities between different > prefixes > to be exploited to improve convergence time. Particularly, where a > particular > element is updated (e.g., IGP path to a BGP next-hop), then these changes > are > inherited by the other entries which are linked to the updated element > (i.e., > all BGP NLRIs which share the BGP next-hop above). > > This high-level overview then allows a reader to be able parse *what* is > really implemented prior to getting stuck into the detailed examples and > walk-throughs. Defining this high-level approach would also help with the > definitions section, such that there is some context before trying to > understand the definitions. > > 2) The “Properties” section feels misplaced to me — it seems to be a back > reference to the rest of the document, whereas it would feel more natural if > it were a description of the benefits of implementing the concepts that are > described in the document, and a link to the sections where these are > described in more detail. > > Section 4 has a similar issue, insofar that it now informs us of how to > implement some specifics of the actual forwarding behaviour following a > number of examples. I’d propose that the layout of the document should be > something more like: > > Intro — why do we need PIC? > Overview — what is PIC? > a) FIB layout > b) FIB operation with PIC > c) Dealing with limited hierarchy FIBs > Characteristics of PIC - what does PIC require, what does it not require? > Examples - demonstrating how PIC deals with various scenarios. > a) Basic example (as per Figure 2) > - Operation with iPE link failure > - Operation with remote link failure in the core > - Operation with ePE failure > - Operation with ePE-CE link failure > b) Optionally have the more complex example here. > > 3) Use of the phrase “Prefix Independent Convergence”. I understand that this > is an established concept, which is widely implemented, however, if we are > trying to be more technically robust then I think that we need to work on how > this document describes the advantages of the methods that it proposes. In a > network that has 1:1 mapping of prefix to next-hop then the mechanisms that > are provided do not really mean that convergence is “prefix independent”. A > more robust way to describe this (IMHO) is that convergence time is now > related to the number of next-hop updates that need to be made, rather than > the number of NLRI. So - in the case that we have our 1:1 mapping of prefix > to next-hop, then we describe the number of updates we need to make is still > # of prefixes == # of next-hops. However, since we expect prefix to next-hop > to be N:1 in most cases, then we decouple of total number of *prefix* > entries, but we’re still coupled to the number of next-hops. I’d encourage > that if we’re going to have this document as output of the RTG WG then we are > a little more robust in describing the characteristics of the solution - even > if it does not sound quite so marketing-friendly as “prefix independent”. > > 4) Examples — in the doc, there is quite a lot of text which describes - in > words - what the routing table looks like for certain scenarios. It was my > feeling reviewing the document that actually adding this wording makes the > examples difficult to understand. Could one rather describe the RIB by > writing out what the entries would look like, or putting it in the context of > each device. It might be worth trying to use some actual documentation > prefixes in the examples, rather than the current approach, which creates > some confusion as to what kind of information is being represented by the > notation used. > > For instance, Figure 2 feels more readable to me if written as follows: > > > +--------------------------------+ > | | > | ePE2 (loopback 192.0.2.1) > | | \ > | | \ > | | \ > iPE | CE.......VRF "Blue" > | | / (VPN-IP1 - > 65000:1:2001:DB8:1::/64) > | | / (VPN-IP2 - > 65000:1:2001:DB8:2::/64) > | LDP/Segment-Routing Core | / > | ePE1 (loopback 192.0.2.2) > | | > +--------------------------------+ > > > iPE’s routing table: > > 65000:1:2001:DB8:1::/64 > via ePE1 (192.0.2.1), VPN Label: VPN-L11 > via ePE2 (192.0.2.2), VPN Label:VPN-L21 > > 65000:1:2001:DB8:2::/64 > via ePE1 (192.0.2.1), VPN Label: VPN-L12 > via ePE2 (192.0.2.2), VPN Label: VPN-L22 > > 192.0.2.1/32 > via Core, Label: IGP-L11 > via Core, Label: IGP-L12 > > 192.0.2.2/32 > via Core, Label: IGP-L21 > via Core, Label: IGP-L22 > > > Tabulating the routing information then makes it much easier to refer to in > the subsequent diagram of the FIB layout. > > 5) In Section 3.2 - it seems like we go into a lot of detail of quite a > complex forwarding topology to explain the idea of how we can flatten entries > together in FIB hierarchy. I would propose that some text is added (ideally > in the same section as the proposal in 1) that describes the fundamentals of > this section. It also does not seem clear from the text (or the diagrams) > what the “0” and “1” in the pathlist and then subsequent flattened pathlist > actually represent? Are these next-hop IP addresses, or are they labels, or > something different? (I presume that they are next-hop addresses, but this > doesn’t seem super-evident). > > 6) There are some assertions in this document which I think are quite > implementation specific: > - 50msec failure detection of links — one could indicate that this might > be a typical time, but nothing in the mechanism that is described in this > document guarantees this. > - “Perspective” - 6.2.1 — I think quite a few things here depend on how > the implementation works. I would say that this section would be best moved > to an appendix as something that demonstrates the potential advantages based > on some example implementation and network topology. > > At the moment, this feels like marketing numbers rather than anything that > I’d really expect to be in a technical publication from this WG. > > Specifics: > > 1) OutLabel-List: There is a definition here that says that the array can be > of “one or more outgoing labels and/or label actions“, there is no provision > within this definition that the label can be unlabelled (or a no-op) - but > this is clearly implemented and used in a later example. I presume it should > be included in this definition. > > 2) The definition of dependency seems to say that an object X is dependent on > object Y if Y can’t be deleted without X not being a dependency. It seems to > me like we might want to make this definition less self-referential in the > text. Perhaps something akin to: > > One object X is determined to be a dependent of another object Y if a link > within > a specified forwarding chain exists between the two objects. > > 3) Definition of primary path — the definition here states that the path “can > be used all the time”. However, we have cases where the primary BGP path > cannot be used because the next-hop is unreachable in the IGP (for example). > I think the definition stating that path can be used all the time is not > correct, and rather one wants to indicate that the primary path is the first > usable path within a path list or similar. > > 4) Definition of path — the definition here states that the path is the “next > hop in a sequence of unique connected nodes”. Is this really what we mean - > that a path needs to represent A-B-C-D. This doesn’t seem to tally with the > definition (and use) of PathList which is simply the next-hop. Is a more > rigorous definition that the a Path is an entry that corresponds to the first > hop (i.e., the local device’s next-hop) of a series of nodes that end at the > destination prefix? > > 5) Figure 3: It’s generally unclear to me what this figure is trying to show > with some of the arrows. We have VPN-L11 which is (I assume) an ingress LFIB > entry, and then VPN-IP1 which is an ingress IP leaf (within a VRF, I > presume). What do the arrows from the top to the bottom indicate? > > 6) “BGP is inherently slow” — this seems to require some subjective view of > what ‘slow’ is. Would it not be better to state that the speed of convergence > of BGP is limited by the time taken to serially propagate reachability > information from the point of failure to the device that must reconverge. > > 7) “Another more common and widely deployed scenario is L3VPN” — do you want > to state that this is another *common* deployment scenario, otherwise, what > is this L3VPN deployment more common than? > > 8) There are some statements that this is “enabled with zero operator > intervention” — it is probably enabled by some software upgrade, so I’m not > sure that this is quite zero operator intervention - however, given that this > is mentioned, do you also need to discuss why this is something that can be > done on a per-node basis and what the potential negative impacts could be (if > any) of this happening? In general, having new things turn on by default and > not being able to test them, or know that they’ve changed is not a good way > to ensure that you can resolve faults in a network swiftly and effectively. > > 9) (nit) Through the “terminology” section - the definitions probably don’t > need to start with “it is” - one can simply say “BGP prefix: A prefix p/m…”. > > 10) (nit) Forwarding Chain is inconsistently capitalised. Where it is > capitalised, I expected to be able to find the definition in terminology - > I’d suggest making this consistent. > > 11) In Section 5. there are a number of references to software components of > the device that is then referenced throughout this section. It would be > useful to define what those software components are - i.e., we have a BGP > process which does X, an IGP process which does Y, and then a FIB manager > process that does Z. > > I appreciate that there are quite a lot of comments here, so happy to discuss > this. I’d appreciate seeing some of them addressed before this draft > progresses to IESG though. > > Thanks! > r. > _______________________________________________ > rtgwg mailing list > [email protected] > https://www.ietf.org/mailman/listinfo/rtgwg _______________________________________________ rtgwg mailing list [email protected] https://www.ietf.org/mailman/listinfo/rtgwg
