Re: Document Shepherd Review of draft-ietf-rtgwg-bgp-pic-02

Rob Shakir Thu, 08 Sep 2016 13:09:06 -0700

Hi Folks,

Just to remind you of these comments. PTAL when you have time so we can look 
towards progressing this doc.


Thanks,
r. 

> On Aug 11, 2016, at 18:39, Rob Shakir <[email protected]> wrote:
> 
> Ahmed, Clarence, Pradosh,
> 
> I did a review of draft-ietf-rtgwg-bgp-pic-02 as part of the process of doing 
> a shepherd write-up for this document. I have a number of comments which I 
> want to try and address first.
> 
> General:
> 
> 1) This document feels really difficult to get the concepts from to me. The 
> key intent (in my view) is to describe the way that one can structure a FIB 
> to be able to reduce the number of updates that are required to reconverge. 
> Unfortunately, the extensive use of examples (to me) does not make the core 
> concepts that are being described super-clear. To this end, what I would 
> propose is re-working the overview section. In this section the two “pillars” 
> that are described can be outlined in more detail. For example:
> 
>    BGP PIC is based on two FIB design concepts:
>    
>      1. Rather than implementing a forwarding construct each BGP NLRI has
>         a corresponding next-hop property, the FIB is implemented as a series 
> of linked
>         entries, whereby many BGP NLRIs can link to a single next-hop entry 
> (or set of
>             next-hop entries).
> 
>      2. Forwarding entries are represented as a chain - which may have 
> multiple
>         levels of hierarchy representing the forwarding dependencies for 
> packets
>         matching that criteria, for example, allowing a BGP NLRI to map to a 
> BGP
>         next-hop which itself maps to an IGP next-hop, which maps to an 
> interface.
> 
>    Designing the FIB in this manner allows commonalities between different 
> prefixes
>    to be exploited to improve convergence time. Particularly, where a 
> particular
>    element is updated (e.g., IGP path to a BGP next-hop), then these changes 
> are
>    inherited by the other entries which are linked to the updated element 
> (i.e.,
>    all BGP NLRIs which share the BGP next-hop above).
> 
> This high-level overview then allows a reader to be able parse *what* is 
> really implemented prior to getting stuck into the detailed examples and 
> walk-throughs. Defining this high-level approach would also help with the 
> definitions section, such that there is some context before trying to 
> understand the definitions.
> 
> 2) The “Properties” section feels misplaced to me — it seems to be a back 
> reference to the rest of the document, whereas it would feel more natural if 
> it were a description of the benefits of implementing the concepts that are 
> described in the document, and a link to the sections where these are 
> described in more detail.
> 
> Section 4 has a similar issue, insofar that it now informs us of how to 
> implement some specifics of the actual forwarding behaviour following a 
> number of examples. I’d propose that the layout of the document should be 
> something more like:
> 
>    Intro — why do we need PIC?
>    Overview — what is PIC?
>        a) FIB layout
>        b) FIB operation with PIC
>        c) Dealing with limited hierarchy FIBs
>    Characteristics of PIC - what does PIC require, what does it not require?
>    Examples - demonstrating how PIC deals with various scenarios.
>        a) Basic example (as per Figure 2)
>            - Operation with iPE link failure
>            - Operation with remote link failure in the core
>            - Operation with ePE failure
>            - Operation with ePE-CE link failure
>        b) Optionally have the more complex example here.
> 
> 3) Use of the phrase “Prefix Independent Convergence”. I understand that this 
> is an established concept, which is widely implemented, however, if we are 
> trying to be more technically robust then I think that we need to work on how 
> this document describes the advantages of the methods that it proposes. In a 
> network that has 1:1 mapping of prefix to next-hop then the mechanisms that 
> are provided do not really mean that convergence is “prefix independent”. A 
> more robust way to describe this (IMHO) is that convergence time is now 
> related to the number of next-hop updates that need to be made, rather than 
> the number of NLRI. So - in the case that we have our 1:1 mapping of prefix 
> to next-hop, then we describe the number of updates we need to make is still 
> # of prefixes == # of next-hops. However, since we expect prefix to next-hop 
> to be N:1 in most cases, then we decouple of total number of *prefix* 
> entries, but we’re still coupled to the number of next-hops. I’d encourage 
> that if we’re going to have this document as output of the RTG WG then we are 
> a little more robust in describing the characteristics of the solution - even 
> if it does not sound quite so marketing-friendly as “prefix independent”.
> 
> 4) Examples — in the doc, there is quite a lot of text which describes - in 
> words - what the routing table looks like for certain scenarios. It was my 
> feeling reviewing the document that actually adding this wording makes the 
> examples difficult to understand. Could one rather describe the RIB by 
> writing out what the entries would look like, or putting it in the context of 
> each device. It might be worth trying to use some actual documentation 
> prefixes in the examples, rather than the current approach, which creates 
> some confusion as to what kind of information is being represented by the 
> notation used.
> 
> For instance, Figure 2 feels more readable to me if written as follows:
> 
> 
>           +--------------------------------+
>           |                                |
>           |                               ePE2 (loopback 192.0.2.1)
>           |                                |  \
>           |                                |   \
>           |                                |    \
>          iPE                               |    CE.......VRF "Blue"
>           |                                |    /       (VPN-IP1 - 
> 65000:1:2001:DB8:1::/64)
>           |                                |   /        (VPN-IP2 - 
> 65000:1:2001:DB8:2::/64)
>           |   LDP/Segment-Routing Core     |  /
>           |                               ePE1 (loopback 192.0.2.2)
>           |                                |
>           +--------------------------------+
> 
> 
>    iPE’s routing table:
>    
>        65000:1:2001:DB8:1::/64
>            via ePE1 (192.0.2.1), VPN Label: VPN-L11
>            via ePE2 (192.0.2.2), VPN Label:VPN-L21
> 
>        65000:1:2001:DB8:2::/64
>            via ePE1 (192.0.2.1), VPN Label: VPN-L12
>            via ePE2 (192.0.2.2), VPN Label: VPN-L22
> 
>        192.0.2.1/32
>            via Core, Label:        IGP-L11
>            via Core, Label:        IGP-L12
>    
>        192.0.2.2/32
>            via Core, Label:        IGP-L21
>            via Core, Label:        IGP-L22
> 
> 
> Tabulating the routing information then makes it much easier to refer to in 
> the subsequent diagram of the FIB layout.
> 
> 5) In Section 3.2 - it seems like we go into a lot of detail of quite a 
> complex forwarding topology to explain the idea of how we can flatten entries 
> together in FIB hierarchy. I would propose that some text is added (ideally 
> in the same section as the proposal in 1)  that describes the fundamentals of 
> this section. It also does not seem clear from the text (or the diagrams) 
> what the “0” and “1” in the pathlist and then subsequent flattened pathlist 
> actually represent? Are these next-hop IP addresses, or are they labels, or 
> something different? (I presume that they are next-hop addresses, but this 
> doesn’t seem super-evident).
> 
> 6) There are some assertions in this document which I think are quite 
> implementation specific:
>    - 50msec failure detection of links — one could indicate that this might 
> be a typical time, but nothing in the mechanism that is described in this 
> document guarantees this.
>    - “Perspective” - 6.2.1 — I think quite a few things here depend on how 
> the implementation works. I would say that this section would be best moved 
> to an appendix as something that demonstrates the potential advantages based 
> on some example implementation and network topology.
> 
> At the moment, this feels like marketing numbers rather than anything that 
> I’d really expect to be in a technical publication from this WG.
> 
> Specifics:
> 
> 1) OutLabel-List: There is a definition here that says that the array can be 
> of “one or more outgoing labels and/or label actions“, there is no provision 
> within this definition that the label can be unlabelled (or a no-op) - but 
> this is clearly implemented and used in a later example. I presume it should 
> be included in this definition.
> 
> 2) The definition of dependency seems to say that an object X is dependent on 
> object Y if Y can’t be deleted without X not being a dependency. It seems to 
> me like we might want to make this definition less self-referential in the 
> text. Perhaps something akin to:
> 
>    One object X is determined to be a dependent of another object Y if a link 
> within
>    a specified forwarding chain exists between the two objects.
> 
> 3) Definition of primary path — the definition here states that the path “can 
> be used all the time”. However, we have cases where the primary BGP path 
> cannot be used because the next-hop is unreachable in the IGP (for example). 
> I think the definition stating that path can be used all the time is not 
> correct, and rather one wants to indicate that the primary path is the first 
> usable path within a path list or similar.
> 
> 4) Definition of path — the definition here states that the path is the “next 
> hop in a sequence of unique connected nodes”. Is this really what we mean - 
> that a path needs to represent A-B-C-D. This doesn’t seem to tally with the 
> definition (and use) of PathList which is simply the next-hop. Is a more 
> rigorous definition that the a Path is an entry that corresponds to the first 
> hop (i.e., the local device’s next-hop) of a series of nodes that end at the 
> destination prefix?
> 
> 5) Figure 3: It’s generally unclear to me what this figure is trying to show 
> with some of the arrows. We have VPN-L11 which is (I assume) an ingress LFIB 
> entry, and then VPN-IP1 which is an ingress IP leaf (within a VRF, I 
> presume). What do the arrows from the top to the bottom indicate?
> 
> 6) “BGP is inherently slow” — this seems to require some subjective view of 
> what ‘slow’ is. Would it not be better to state that the speed of convergence 
> of BGP is limited by the time taken to serially propagate reachability 
> information from the point of failure to the device that must reconverge. 
> 
> 7) “Another more common and widely deployed scenario is L3VPN” — do you want 
> to state that this is another *common* deployment scenario, otherwise, what 
> is this L3VPN deployment more common than?
> 
> 8) There are some statements that this is “enabled with zero operator 
> intervention” — it is probably enabled by some software upgrade, so I’m not 
> sure that this is quite zero operator intervention - however, given that this 
> is mentioned, do you also need to discuss why this is something that can be 
> done on a per-node basis and what the potential negative impacts could be (if 
> any) of this happening? In general, having new things turn on by default and 
> not being able to test them, or know that they’ve changed is not a good way 
> to ensure that you can resolve faults in a network swiftly and effectively.
> 
> 9) (nit) Through the “terminology” section - the definitions probably don’t 
> need to start with “it is” - one can simply say “BGP prefix: A prefix p/m…”.
> 
> 10) (nit) Forwarding Chain is inconsistently capitalised. Where it is 
> capitalised, I expected to be able to find the definition in terminology - 
> I’d suggest making this consistent.
> 
> 11) In Section 5. there are a number of references to software components of 
> the device that is then referenced throughout this section. It would be 
> useful to define what those software components are - i.e., we have a BGP 
> process which does X, an IGP process which does Y, and then a FIB manager 
> process that does Z.
> 
> I appreciate that there are quite a lot of comments here, so happy to discuss 
> this. I’d appreciate seeing some of them addressed before this draft 
> progresses to IESG though.
> 
> Thanks!
> r.
> _______________________________________________
> rtgwg mailing list
> [email protected]
> https://www.ietf.org/mailman/listinfo/rtgwg

_______________________________________________
rtgwg mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/rtgwg

Re: Document Shepherd Review of draft-ietf-rtgwg-bgp-pic-02

Reply via email to