Document Shepherd Review of draft-ietf-rtgwg-bgp-pic-02

Rob Shakir Thu, 11 Aug 2016 18:40:31 -0700

Ahmed, Clarence, Pradosh,

I did a review of draft-ietf-rtgwg-bgp-pic-02 as part of the process of doing a 
shepherd write-up for this document. I have a number of comments which I want 
to try and address first.


General:

1) This document feels really difficult to get the concepts from to me. The key 
intent (in my view) is to describe the way that one can structure a FIB to be 
able to reduce the number of updates that are required to reconverge. 
Unfortunately, the extensive use of examples (to me) does not make the core 
concepts that are being described super-clear. To this end, what I would 
propose is re-working the overview section. In this section the two “pillars” 
that are described can be outlined in more detail. For example:

        BGP PIC is based on two FIB design concepts:
        
          1. Rather than implementing a forwarding construct each BGP NLRI has
             a corresponding next-hop property, the FIB is implemented as a 
series of linked
             entries, whereby many BGP NLRIs can link to a single next-hop 
entry (or set of
             next-hop entries).

          2. Forwarding entries are represented as a chain - which may have 
multiple
             levels of hierarchy representing the forwarding dependencies for 
packets
             matching that criteria, for example, allowing a BGP NLRI to map to 
a BGP
             next-hop which itself maps to an IGP next-hop, which maps to an 
interface.

        Designing the FIB in this manner allows commonalities between different 
prefixes
        to be exploited to improve convergence time. Particularly, where a 
particular
        element is updated (e.g., IGP path to a BGP next-hop), then these 
changes are
        inherited by the other entries which are linked to the updated element 
(i.e.,
        all BGP NLRIs which share the BGP next-hop above).

This high-level overview then allows a reader to be able parse *what* is really 
implemented prior to getting stuck into the detailed examples and 
walk-throughs. Defining this high-level approach would also help with the 
definitions section, such that there is some context before trying to 
understand the definitions.

2) The “Properties” section feels misplaced to me — it seems to be a back 
reference to the rest of the document, whereas it would feel more natural if it 
were a description of the benefits of implementing the concepts that are 
described in the document, and a link to the sections where these are described 
in more detail.

Section 4 has a similar issue, insofar that it now informs us of how to 
implement some specifics of the actual forwarding behaviour following a number 
of examples. I’d propose that the layout of the document should be something 
more like:

        Intro — why do we need PIC?
        Overview — what is PIC?
                a) FIB layout
                b) FIB operation with PIC
                c) Dealing with limited hierarchy FIBs
        Characteristics of PIC - what does PIC require, what does it not 
require?
        Examples - demonstrating how PIC deals with various scenarios.
                a) Basic example (as per Figure 2)
                        - Operation with iPE link failure
                        - Operation with remote link failure in the core
                        - Operation with ePE failure
                        - Operation with ePE-CE link failure
                b) Optionally have the more complex example here.

3) Use of the phrase “Prefix Independent Convergence”. I understand that this 
is an established concept, which is widely implemented, however, if we are 
trying to be more technically robust then I think that we need to work on how 
this document describes the advantages of the methods that it proposes. In a 
network that has 1:1 mapping of prefix to next-hop then the mechanisms that are 
provided do not really mean that convergence is “prefix independent”. A more 
robust way to describe this (IMHO) is that convergence time is now related to 
the number of next-hop updates that need to be made, rather than the number of 
NLRI. So - in the case that we have our 1:1 mapping of prefix to next-hop, then 
we describe the number of updates we need to make is still # of prefixes == # 
of next-hops. However, since we expect prefix to next-hop to be N:1 in most 
cases, then we decouple of total number of *prefix* entries, but we’re still 
coupled to the number of next-hops. I’d encourage that if we’re going to have 
this document as output of the RTG WG then we are a little more robust in 
describing the characteristics of the solution - even if it does not sound 
quite so marketing-friendly as “prefix independent”.

4) Examples — in the doc, there is quite a lot of text which describes - in 
words - what the routing table looks like for certain scenarios. It was my 
feeling reviewing the document that actually adding this wording makes the 
examples difficult to understand. Could one rather describe the RIB by writing 
out what the entries would look like, or putting it in the context of each 
device. It might be worth trying to use some actual documentation prefixes in 
the examples, rather than the current approach, which creates some confusion as 
to what kind of information is being represented by the notation used.

For instance, Figure 2 feels more readable to me if written as follows:


           +--------------------------------+
           |                                |
           |                               ePE2 (loopback 192.0.2.1)
           |                                |  \
           |                                |   \
           |                                |    \
          iPE                               |    CE.......VRF "Blue"
           |                                |    /       (VPN-IP1 - 
65000:1:2001:DB8:1::/64)
           |                                |   /        (VPN-IP2 - 
65000:1:2001:DB8:2::/64)
           |   LDP/Segment-Routing Core     |  /
           |                               ePE1 (loopback 192.0.2.2)
           |                                |
           +--------------------------------+


        iPE’s routing table:
        
                65000:1:2001:DB8:1::/64
                        via ePE1 (192.0.2.1), VPN Label: VPN-L11
                        via ePE2 (192.0.2.2), VPN Label:VPN-L21

                65000:1:2001:DB8:2::/64
                        via ePE1 (192.0.2.1), VPN Label: VPN-L12
                        via ePE2 (192.0.2.2), VPN Label: VPN-L22

                192.0.2.1/32
                        via Core, Label:                IGP-L11
                        via Core, Label:                IGP-L12
        
                192.0.2.2/32
                        via Core, Label:                IGP-L21
                        via Core, Label:                IGP-L22


Tabulating the routing information then makes it much easier to refer to in the 
subsequent diagram of the FIB layout.

5) In Section 3.2 - it seems like we go into a lot of detail of quite a complex 
forwarding topology to explain the idea of how we can flatten entries together 
in FIB hierarchy. I would propose that some text is added (ideally in the same 
section as the proposal in 1)  that describes the fundamentals of this section. 
It also does not seem clear from the text (or the diagrams) what the “0” and 
“1” in the pathlist and then subsequent flattened pathlist actually represent? 
Are these next-hop IP addresses, or are they labels, or something different? (I 
presume that they are next-hop addresses, but this doesn’t seem super-evident).

6) There are some assertions in this document which I think are quite 
implementation specific:
        - 50msec failure detection of links — one could indicate that this 
might be a typical time, but nothing in the mechanism that is described in this 
document guarantees this.
        - “Perspective” - 6.2.1 — I think quite a few things here depend on how 
the implementation works. I would say that this section would be best moved to 
an appendix as something that demonstrates the potential advantages based on 
some example implementation and network topology.

At the moment, this feels like marketing numbers rather than anything that I’d 
really expect to be in a technical publication from this WG.

Specifics:

1) OutLabel-List: There is a definition here that says that the array can be of 
“one or more outgoing labels and/or label actions“, there is no provision 
within this definition that the label can be unlabelled (or a no-op) - but this 
is clearly implemented and used in a later example. I presume it should be 
included in this definition.

2) The definition of dependency seems to say that an object X is dependent on 
object Y if Y can’t be deleted without X not being a dependency. It seems to me 
like we might want to make this definition less self-referential in the text. 
Perhaps something akin to:

        One object X is determined to be a dependent of another object Y if a 
link within
        a specified forwarding chain exists between the two objects.

3) Definition of primary path — the definition here states that the path “can 
be used all the time”. However, we have cases where the primary BGP path cannot 
be used because the next-hop is unreachable in the IGP (for example). I think 
the definition stating that path can be used all the time is not correct, and 
rather one wants to indicate that the primary path is the first usable path 
within a path list or similar.

4) Definition of path — the definition here states that the path is the “next 
hop in a sequence of unique connected nodes”. Is this really what we mean - 
that a path needs to represent A-B-C-D. This doesn’t seem to tally with the 
definition (and use) of PathList which is simply the next-hop. Is a more 
rigorous definition that the a Path is an entry that corresponds to the first 
hop (i.e., the local device’s next-hop) of a series of nodes that end at the 
destination prefix?

5) Figure 3: It’s generally unclear to me what this figure is trying to show 
with some of the arrows. We have VPN-L11 which is (I assume) an ingress LFIB 
entry, and then VPN-IP1 which is an ingress IP leaf (within a VRF, I presume). 
What do the arrows from the top to the bottom indicate?

6) “BGP is inherently slow” — this seems to require some subjective view of 
what ‘slow’ is. Would it not be better to state that the speed of convergence 
of BGP is limited by the time taken to serially propagate reachability 
information from the point of failure to the device that must reconverge. 

7) “Another more common and widely deployed scenario is L3VPN” — do you want to 
state that this is another *common* deployment scenario, otherwise, what is 
this L3VPN deployment more common than?

8) There are some statements that this is “enabled with zero operator 
intervention” — it is probably enabled by some software upgrade, so I’m not 
sure that this is quite zero operator intervention - however, given that this 
is mentioned, do you also need to discuss why this is something that can be 
done on a per-node basis and what the potential negative impacts could be (if 
any) of this happening? In general, having new things turn on by default and 
not being able to test them, or know that they’ve changed is not a good way to 
ensure that you can resolve faults in a network swiftly and effectively.

9) (nit) Through the “terminology” section - the definitions probably don’t 
need to start with “it is” - one can simply say “BGP prefix: A prefix p/m…”.

10) (nit) Forwarding Chain is inconsistently capitalised. Where it is 
capitalised, I expected to be able to find the definition in terminology - I’d 
suggest making this consistent.

11) In Section 5. there are a number of references to software components of 
the device that is then referenced throughout this section. It would be useful 
to define what those software components are - i.e., we have a BGP process 
which does X, an IGP process which does Y, and then a FIB manager process that 
does Z.

I appreciate that there are quite a lot of comments here, so happy to discuss 
this. I’d appreciate seeing some of them addressed before this draft progresses 
to IESG though.

Thanks!
r.
_______________________________________________
rtgwg mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/rtgwg

Document Shepherd Review of draft-ietf-rtgwg-bgp-pic-02

Reply via email to