Re: Adrian Farrel's Discuss on draft-ietf-rtgwg-remote-lfa-09: (with DISCUSS and COMMENT)

Alia Atlas Mon, 05 Jan 2015 10:11:19 -0800

[+rtgwg mailing list]

Adrian,


Thanks very much for working through the example.  It was very interesting
to see how someone who isn't as close to the problem-space reads the text,
and it helped pick up on imprecisions and lack of clarity in the definitions.

Not to speak for Stewart - who I'm sure will be responding quite soon,
but...

On Mon, Dec 29, 2014 at 11:09 AM, Adrian Farrel <[email protected]> wrote:

> Adrian Farrel has entered the following ballot position for
> draft-ietf-rtgwg-remote-lfa-09: Discuss
>
> When responding, please keep the subject line intact and reply to all
> email addresses included in the To and CC lines. (Feel free to cut this
> introductory paragraph, however.)
>
>
> Please refer to http://www.ietf.org/iesg/statement/discuss-criteria.html
> for more information about IESG DISCUSS and COMMENT positions.
>
>
> The document, along with other ballot positions, can be found here:
> http://datatracker.ietf.org/doc/draft-ietf-rtgwg-remote-lfa/
>
>
>
> ----------------------------------------------------------------------
> DISCUSS:
> ----------------------------------------------------------------------
>
> I'm placing this Discuss because I found the description of the
> algorithm in 4.2.1 and the worked example in Section 2 to be at odds
> with the definitions of P-space, extended P-space, and Q-space.
>
> I have been able to make things work by messing with the algorithm and
> keeping the current definitions. You could probably do it by keeping
> the algorithm and messing with the definitions.
>

Yes - we want to keep the algorithm.  In particular, the pseudo-code gets it
right and the definitions are a little sloppy.  Stewart clarified a bit in
the text, before I put it to the IESG, that the extended P-space does not
include paths through the failed link.



> My workings were as follows, based on the example in Section 2:
>
> >            S---E
> >           /     \
> >          A       D
> >           \     /
> >            B---C
> >
> >  In Figure 1 S can reach A, B, and C without going via E;
> >  these form S's extended P-space.
>
> First, this should say "via S-E" and "extended P-space with respect to
> S-E".
>
> But...
> >  Extended P-space
> >                 The union of the P-space of the neighbours of a
> >                 specific router with respect to the protected link
>
> (Noting that 4.2.1.2 changes this definition *significantly* by saying
> that the neighbour at the far end of the failing link - i.e., E in this
> case - must be excised from the list of neighbours whose P-spaces are
> combined).
>

To be fair, the definition of P-space (below) includes that the paths
can't transit the protected link (S-E in the example).

I think that the definition needs to be updated to be "the neighbors of a
specific router that are reachable without going via the protected link".

When there's only a single link S-E, S has no direct way of forcing traffic
to E so E's P-space can't be included.
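
As a rough illustration of that point (not taken from the draft), here is a
minimal Python sketch for the Figure 1 ring.  It assumes every link costs 1,
treats the protected link as undirected, and uses helper names (LINKS,
distances, p_space, extended_p_space) that are made up for this example.  It
also assumes a neighbour counts as part of its own contribution, since S can
hand traffic to it directly, and that S itself is discarded from the result.

```python
from collections import deque

# Figure 1 ring from the draft; every link is assumed to cost 1.
LINKS = [("S", "E"), ("E", "D"), ("D", "C"), ("C", "B"), ("B", "A"), ("A", "S")]

def neighbours(node):
    """Routers directly attached to node."""
    return {b if a == node else a for a, b in LINKS if node in (a, b)}

def distances(root):
    """Hop counts from root (BFS is enough because all costs are equal)."""
    dist = {root: 0}
    queue = deque([root])
    while queue:
        here = queue.popleft()
        for nxt in neighbours(here):
            if nxt not in dist:
                dist[nxt] = dist[here] + 1
                queue.append(nxt)
    return dist

def p_space(root, link):
    """Routers root reaches with no shortest path (ECMP included) on link."""
    u, v = link
    d_root, d_u, d_v = distances(root), distances(u), distances(v)
    result = set()
    for y, d in d_root.items():
        if y == root:
            continue
        via_uv = d_root[u] + 1 + d_v[y]  # root -> u, cross u-v, v -> y
        via_vu = d_root[v] + 1 + d_u[y]  # root -> v, cross v-u, u -> y
        if d < via_uv and d < via_vu:
            result.add(y)
    return result

def extended_p_space(plr, link):
    """Union over plr's neighbours, leaving out the far end of the link."""
    far_end = link[1] if link[0] == plr else link[0]
    result = set()
    for n in neighbours(plr) - {far_end}:
        result |= {n} | p_space(n, link)
    return result - {plr}

print(sorted(p_space("S", ("S", "E"))))
print(sorted(extended_p_space("S", ("S", "E"))))
```

Under those assumptions the sketch gives {A, B} as S's P-space and {A, B, C}
as S's extended P-space with respect to S-E, which is what the example in
Section 2 intends.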

...and...
> >  P-space        P-space is the set of routers reachable from a
> >                 specific router using the normal FIB, without any path
> >                 (including equal cost path splits) transiting the
> >                 protected link.
>
> Now, S's neighbours are A and E.
> The P-space of A with respect to S-E is {B, C}
> And the P-space of E with respect to S-E is {C, D}
> So the extended P-Space of S with respect to S-E is {B, C, D}
>
> Something is broken!
>

Yes, can't include the P-space of E when the failed link is S-E and there's
no way to reach E directly (one hop) from S.


> {A, B, C} is not even the (not extended) P-space of S with respect to
> S-E which is {A, B} since C is not in that set because of SEDC.
> On the other hand {A, B, C} *is* the extended P-space of E wrt S-E.
>
> Although, I would observe that the pseudocode in 4.2.1 does derive
> A, B, C as the extended P-space of S wrt S-E, but I think that is
> because it has an entirely different definition of an extended P-space.
>

?? Because it omits E?  Do you see anything else different that needs
to be better clarified?


> Now...
> >  Q-space        Q-space is the set of routers from which a specific
> >                 router can be reached without any path (including
> >                 equal cost path splits) transiting the protected link.
> ...so the Q-space of S wrt S-E is {A, B} since CDES.
> And, for the record, the Q-space of E wrt S-E is {C, D}
>
> Now, to compound the confusion, the example determines the PQ nodes for
> S wrt S-E by taking the intersection of the extended P-space for S wrt
> S-E  and the Q-space of E wrt S-E. This is done notwithstanding the
> definition...
>
> >  PQ node        A node which is a member of both the P-space and the
> >                 Q-space.  Where extended P-space is in use it is a
> >                 node which is a member of both the extended P-space
> >                 and the Q-space.  In remote LFA this is used as the
> >                 repair tunnel endpoint.
>

Yup - so clearly it means "of both the extended P-space of the PLR (S) and
the Q-space of the far-end of the failed link (E)".  Some improved
definitions are definitely needed.


> This definition gives the PQ nodes of S wrt SE as either
> - the intersection of {A, B} and {A, B} if P-space is being discussed
> or
> - the intersection of {B, C, D} and {A, B} if extended P-space is being
>   used.
>
> So the correct tunnel end point for your example is B.
> But it clearly doesn't work since traffic to E that is tunneled to B may
> still be ECMP routed back along BAS.
>
> So I think in this whole example, you sit at S and you say "I want to
> protect traffic to E". Then you work out the extended P-space of *E* wrt
> S-E (which is {A, B, C}) and the Q-space of *E* wrt S-E (which is
> {C, D}) giving you the correct PQ node for S to use to protect traffic
> to E in the event of a failure of S-E as C.
>

Extended P-space is the space that the PLR S can send traffic to - what
nodes it can reach without using the failed link.

Q-space is what nodes can reach the far-end E of the failed link S-E
without using the failed link.

Obviously the definitions need improvement.
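
To make the intended reading concrete, the following is a small Python
sketch, not the draft's pseudocode.  It assumes unit link costs on the
Figure 1 ring and an undirected protected link; the names avoids_link,
q_space, extended_p_space and pq_nodes are invented for illustration.

```python
from itertools import product

NODES = ["S", "E", "D", "C", "B", "A"]
LINKS = [("S", "E"), ("E", "D"), ("D", "C"), ("C", "B"), ("B", "A"), ("A", "S")]
INF = float("inf")

# All-pairs shortest distances (Floyd-Warshall), assuming cost 1 per link.
dist = {(a, b): 0 if a == b else INF for a, b in product(NODES, NODES)}
for a, b in LINKS:
    dist[a, b] = dist[b, a] = 1
for k, i, j in product(NODES, NODES, NODES):  # k varies slowest
    dist[i, j] = min(dist[i, j], dist[i, k] + dist[k, j])

def avoids_link(src, dst, link):
    """True if no shortest src->dst path (ECMP included) crosses link."""
    u, v = link
    crossing = min(dist[src, u] + 1 + dist[v, dst],
                   dist[src, v] + 1 + dist[u, dst])
    return dist[src, dst] < crossing

def q_space(target, link):
    """Routers that reach target without any shortest path on link."""
    return {n for n in NODES if n != target and avoids_link(n, target, link)}

def extended_p_space(plr, link):
    """What plr can reach via its neighbours, far end of link excluded."""
    far_end = link[1] if link[0] == plr else link[0]
    nbrs = {n for n in NODES if dist[plr, n] == 1} - {far_end}
    # A neighbour qualifies for itself at distance 0.
    reach = {y for n in nbrs for y in NODES if avoids_link(n, y, link)}
    return reach - {plr}

def pq_nodes(plr, link):
    """Extended P-space of the PLR intersected with Q-space of the far end."""
    far_end = link[1] if link[0] == plr else link[0]
    return extended_p_space(plr, link) & q_space(far_end, link)

print(pq_nodes("S", ("S", "E")))
```

With those assumptions, the extended P-space of S is {A, B, C}, the Q-space
of E is {C, D}, and the only PQ node is C.  Intersecting with the Q-space of
S instead (the literal reading of the current PQ definition) yields {A, B},
which is the mismatch Adrian ran into.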


> It is simple! All you have to do is update the text to describe the
> actual process and not the wrong one. Then the right result will pop
> out :-)
>
> The replacement is
> OLD
>    In Figure 1 S can reach A, B, and C without going via E;
>    these form S's extended P-space.
> NEW
>    In Figure 1 S can reach A and B without going via S-E, and
>    D can reach B and C without going via S-E. So E's extended P-space
>    with respect to S-E is the nodes A, B, and C.
> END
>
>
> BUT, given all of this, are you sure that Section 4.2.1 is right? I'm
> not.
>

Yes, I'm not concerned about that.  You are just highlighting some
imprecisions and lack of clarity in the definitions.


> ------
>
> Shouldn't the pseudocode in 4.3 be enclosed in code component macros to
> match with the copyright TLP etc.?


> ----------------------------------------------------------------------
> COMMENT:
> ----------------------------------------------------------------------
>
> This is not my area of expertise so please excuse the brevity of the
> rest of this review.
>
> ---
>
> Please s/draft/document/ throughout (except the boilerplate and
> filename) so that it can be published as an RFC (which is not a draft).
>
> ---
>
> Although it causes some pain with abbreviations and a little more care
> in explanation, you need to put the Introduction as the first section in
> the document.
>
> ---
>
> You are using RFC 5714 as a Normative Reference by making me go there
> for the definition of terms. Please move it to the correct section.
>
> ---
>
> IMHO your definition of FIB is rather loose.  Fortunately (?) "FIB" is
> barely used in this document, so it might not be important, but if you
> wanted to fix it:
> - you are talking about IP packets in this document
>

Mostly MPLS actually...


> - the actions are, I think, limited to forwarding actions
>
> ---
>
> This comment applies iff the resolution of the Discuss is not a complete
> change to the terminology!
>
> I think definitions need to be tighter or omitted from this part of the
> document. The definitions in 4.2.1 are more verbose and probably for
> good reason. If you feel you need to retain these definitions early in
> the document and can't lift the text from 4.2.1 then you need to address
> the concerns below.
>
>    P-space        P-space is the set of routers reachable from a
>                   specific router using the normal FIB, without any path
>                   (including equal cost path splits) transiting the
>                   protected link.
>
> "the protected link"? There is only one protected link?
>
> Since the example is worded as...
>
>                   For example, the P-space of S with respect to link
>                   S-E, is the set of routers that S can reach without
>                   using the protected link S-E.
>
> ...I think you need...
>
>    P-space        The P-space of a router with respect to a specific
>                   protected link is the set of routers reachable from
>                   the specific router using the normal FIB, without any
>                   path (including equal cost path splits) transiting the
>                   protected link.
>
> Similarly, you need...
>
>    Extended P-space
>                   The union of the P-spaces of all of the neighbours of
>                   a specific router with respect to a single specific
>                   protected link (see Section 4.2.1.2).
>
> But note that 4.2.1.2 makes a significant change to this definition.
>
>    Q-space        The Q-space for a specific router with respect to a
>                   specific protected link is the set of routers from
>                   which the specific router can be reached without any
>                   path (including equal cost path splits) transiting the
>                   protected link.
>
>    PQ node        A PQ node is a node which is a member of both the
>                   P-space and the Q-space for the same router and with
>                   respect to the same protected link.
>
>                   Where extended P-space is being discussed, a PQ node
>                   is a node which is a member of both the extended
>                   P-space and the Q-space for the same router and with
>                   respect to the same protected link.
>
>                   In remote LFA the repair tunnel endpoint is a PQ node.
>
> Throughout the text, however, the terms are used rather loosely. For
> example, when discussing Figure 1 you say "S's extended P-space", but
> this is really "S's extended P-space with respect to S-E". Someone
> familiar with the work might say that it is obvious from the context
> that we are discussing the link S-E, and it is, but the terminology
> needs to be tight.
>
> ---
>
> There is some difficult terminology in Section 2
>
>    If all link costs are equal, the link S-E cannot be fully protected
>    by LFAs.  The destination C is an ECMP from S, and so can be
>    protected when S-E fails, but D and E are not protectable using LFAs.
>
> Is it the link or the node that is protected (or the traffic)? Perhaps
> this could be rewritten to be less ambiguous.
>
> ---
>
> Section 2
>
>    B
>    has equal-cost paths via B-A-S-E and B-C-D-E and so may go through
>    S-E.
>
> I don't think B is going anywhere. Maybe...
>
>    B
>    has equal-cost paths to E via B-A-S-E and B-C-D-E and so may reach E
>    through S-E.
>
> ---
>
> Section 2
>
>    In MPLS networks the targeted LDP
>    protocol needed to learn the label binding at the repair tunnel
>    endpoint is a well understood and widely deployed technology.
>
> But it would still benefit from a citation or a forward reference to
> section 7.
>
> ---
>
> I enjoyed 3.2
>
>    relatively rare as is the incidence of failure in a well managed
>    network.
>
> So, managing my network well is protection against back-hoes. Nice.
>

LOL - the argument is about the set-up time to be protected again and what
is the interval between failures.  The editors and WG have decided that this
trade-off is acceptable - but I'd also prefer to see it more clearly
articulated.


> ---
>
> In 3.2
>
>    Multiple
>    repairs MAY share a tunnel end point.
>
> 1. s/repairs/repair tunnels/
> 2. s/MAY/may/ since this is not an implementation or operational choice,
>    but a fact of life.
>
> ---
>
> In 4.2 you have truncated...
>
>    The repair tunnel endpoint needs to be a node in the network
>    reachable from S without traversing S-E.
>
> ...and...
>
>    o  The repair tunneled point MUST be reachable from the tunnel source
>       without traversing the failed link; and
>
> You mean "reachable using the normal FIB" I think.
>

Not quite, because if the repair tunnel endpoint is in the extended P-space
and not the P-space, then S has to force the first hop rather than send it
via the normal FIB.
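
A tiny Python sketch of that distinction, using the Figure 1 sets with
respect to S-E as worked out earlier in this thread.  The constants and the
repair_first_hop helper are hypothetical and only meant to show the
decision, not how an implementation programs its FIB.

```python
# Sets for Figure 1 with respect to S-E, taken from the thread above.
P_SPACE_OF_S = {"A", "B"}
# Per-neighbour reach, excluding the far end E of the protected link.
# A itself is listed because S can hand traffic to it directly.
NEIGHBOUR_REACH = {"A": {"A", "B", "C"}}

def repair_first_hop(endpoint):
    """Return None for normal FIB forwarding, else the neighbour to force."""
    if endpoint in P_SPACE_OF_S:
        return None
    for neighbour, reach in sorted(NEIGHBOUR_REACH.items()):
        if endpoint in reach:
            return neighbour
    raise ValueError(f"{endpoint} is not in the extended P-space")

for endpoint in ("B", "C"):
    print(endpoint, repair_first_hop(endpoint))
```

Here B is in S's P-space, so the normal FIB suffices, while C is only in the
extended P-space, so S has to direct the tunnelled packet to A as the first
hop.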

> ---
>
> Section 4.3
>
>    The preceding text has mostly described the computation of the remote
>    LFA repair target (PQ) in terms of the intersection of two
>    reachability graphs computed using SPFs.
>
> "mostly"?
>
> "reachability graphs"? Were they? Or were they reachability sets?
>
> ---
>
> Your pseudocode in 4.3 invokes an unresolved (and undescribed) function
> Compute_Forward_SPF().
>
> Actually, I think this is a bogus line that can be deleted.
>
> ---
>
> 4.3 has
>
>                           if ( D_opt(n, y) <
>                                   D_opt(n,self) + D_opt)(self, y)
>
> Surely this is
>
>                           if ( D_opt(n, y) <
>                                   D_opt(n,self) + D_opt(self, y) )
>
> ---
>
> I think the introduction of "pseudonode" in 4.3 may be a little without
> context.
>
> ---
>
> Section 7
>    If for any reason the TLDP session cannot
>    not be established
>
> s/cannot not/cannot/
>
> ---
>
> I think [RFC5424] and [RFC3411] are pretty poor references to give in
> section 7. You appear to be saying that an implementation that cannot
> establish a TLDP session should write a MIB module, standardise it, and
> then report an error.
>
> Can't you find an existing LDP MIB module that reports Session-up
> failures?
>
> Or maybe just delete "using any well known mechanism such as Syslog
> [RFC5424] or SNMP [RFC3411]."
>
> ---
>
> Why is the discussion of microloops on network re-converges considered
> to be a management consideration (by inclusion in Section 9). Surely it
> is a deployment or operational consideration.
>

I wanted that text somewhere in the doc.  Adding an operational
considerations section to put it in would be fine.



> ---
>
> I think you can strengthen the security considerations. You have:
>
>    To prevent their use as an attack vector IP repair tunnel endpoints
>    (where used) SHOULD be assigned from a set of addresses that are not
>    reachable from outside the routing domain.
>
> 1. "To prevent their use" is surely consistent with a "MUST".
>    The fact that you want to say "SHOULD" means that you need to turn
>    the text around...
>
>    IP repair tunnel endpoints (where used) SHOULD be assigned from a set
>    of addresses that are not reachable from outside the routing domain.
>    This would prevent their use as an attack vector.
>
> 2. You can add a note about what traffic can be placed into a repair
>    tunnel. You already have this earlier in the document, and it is
>    worth restating.
>
> 3. I think you should also make note of whether the repair tunnel is
>    advertised by the routing protocol as an available link.
>

I agree on the comments otherwise.

Regards,
Alia

_______________________________________________
rtgwg mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/rtgwg
