On 12/12/2014 00:00, Alia Atlas wrote:
Alia, thank you for your review.
Here are my responses and the changes made in -09.
Minor Comments:
1) In Sec 2, 3rd paragraph, in the sentence:
"The single node in both S's P-space and E's Q-space is C; thus node C
is selected as the repair tunnel's end-point."
it should be "S's extended P-space"
Correct - changed
2) In Sec 2, it says: "The non-failure traffic distribution is not
disrupted by the provision of such a tunnel since it is only used for
repair traffic and MUST NOT be used for normal traffic."
This is obviously correct and good - but I think it would be very
useful to clarify that OAM traffic to test the rLFA may transit the
tunnel at any time. Otherwise, the MUST NOT could cause some
confusion - depending on how one thinks about "normal traffic".
This now says:
The non-failure traffic distribution is not disrupted by the provision
of such a tunnel since it is only used for repair traffic and MUST NOT
be used for normal traffic. Note that OAM traffic specifically to verify
the viability of the repair MAY traverse the tunnel prior to a failure.
I used "viability" rather than, for example, "availability" to cover any
form of OAM test (CC, CV, delay, jitter, etc.).
I toyed with saying "normal data traffic" and not adding the OAM
sentence, but that would have allowed routing and network management
traffic (other than OAM) which we also need to exclude.
3) In Sec 3: I can't parse "Examples of worse failures are node
failures (see Section 6 ), and through the failure of a shared risk
link group (SRLG), the through the independent concurrent failure of
multiple links, and these are out of scope for this specification."
I think you mean "Examples of worse failures are node failures (see
Section 6), the failure of a shared risk link group (SRLG), the
independent concurrent failures of multiple links; protecting against
such worse failures is out of scope for this specification." I would
add in the failure of broadcast interfaces and NBMA interfaces for
completeness, even though that was mentioned in Sec 2.
This now says:
Examples of worse failures are node failures (see Section 6), the
failure of a shared risk link group (SRLG), the independent concurrent
failures of multiple links, broadcast or non-broadcast multi-access
(NBMA) links [Section 2]; protecting against such worse failures is out
of scope for this specification.
4) In Sec 4.2: "Provided both these requirements are met, packets
forwarded over the repair tunnel will reach their destination and will
not loop." Please change to:
"will not loop after the single link failure". Of course, looping can
happen if a worse failure than protected against occurs - as with
LFA. This could also be mitigated by requiring that the PQ node is
downstream of the PLR, as is mentioned in Sec 4.2.2.
Correct
This now says:
Provided both these requirements are met, packets forwarded over the
repair tunnel will reach their destination, and will not loop after a
single link failure.
5) In Sec 4.2.1.2: "This may be calculated by
computing an SPT at each of S's neighbors (excluding E) and excising
the subtree reached via the path N->S->E."
As described here, a node Y that is reached via N->S->A would be
considered to be in S's extended P-space. I realize that one would
assume that Y would be in S's P-space anyway and thus it is safe to
not care about this edge case. However, the ECMP considerations make
it more complex so please at a minimum add in the same caveat as in
Sec 4.2.1.2 "(including those routers which are members of an ECMP
that includes link S-E)" suitably modified. In the cost-based version
in Compute_Extended_P_Space, this is handled by ignoring any potential
node from N whose shortest path goes back through S. It'd be nice if
the two methods were consistent.
I have changed the text to:
This may be calculated by computing an SPT at each of S's neighbors
(excluding E) and excising the subtree reached via the path N->S->E.
Note this will excise those routers which are reachable through all
ECMPs that includes link S-E.
I am not sure that this clarification is strictly needed since "removal
of the subtree reached via the path N->S->E" would include "those
routers which are members of any ECMP that includes link S-E".
Would it be less confusing if we changed "excising the subtree reached"
to "excising the routers reached"?
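For illustration only (this is not text from the draft), the cost-based
variant mentioned in the comment above can be sketched in Python. The
sketch assumes symmetric link costs and a hypothetical six-node,
unit-cost ring S-A-B-C-D-E-S; the function names and the topology are
assumptions made for the example. A router y is kept when
D_opt(N,y) < D_opt(N,S) + D_opt(S,y), i.e. when no shortest path from
neighbor N to y goes back through S. That is stricter than excising only
the subtree reached via N->S->E.

```python
import heapq


def shortest_costs(graph, source):
    """Dijkstra: cost of the shortest path from source to each reachable node."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def extended_p_space(graph, s, e):
    """Routers reachable from some neighbor N of S (excluding E) on shortest
    paths that never go back through S. The strict inequality also rejects
    routers that are reachable via S only as one member of an ECMP set."""
    d_s = shortest_costs(graph, s)
    result = set()
    for n in graph[s]:
        if n == e:
            continue
        d_n = shortest_costs(graph, n)
        for y, cost in d_n.items():
            if cost < d_n[s] + d_s[y]:
                result.add(y)
    return result


# Hypothetical unit-cost ring, used only for this example.
RING = {
    "S": {"E": 1, "A": 1},
    "A": {"S": 1, "B": 1},
    "B": {"A": 1, "C": 1},
    "C": {"B": 1, "D": 1},
    "D": {"C": 1, "E": 1},
    "E": {"D": 1, "S": 1},
}

print(sorted(extended_p_space(RING, "S", "E")))
```

In this ring, C is at equal cost from S in both directions, so it is
not in S's P-space, but it is in the extended P-space computed from
neighbor A.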
6) In Sec 4.2.2: "As described in [RFC5286], always selecting a PQ
node that is downstream with respect to the repairing node, prevents
the formation of loops when the failure is worse than expected."
Could you clarify that the PQ node is downstream with respect to the
repairing node and the destination - rather than the proxy destination
E? I'm fairly certain that the latter wouldn't work (but don't have
an example topology created). If you disagree, let me know and I'll
work on creating one. This is the constraint that is expressed in
Apply_Downstream_Constraint().
I don't think there is a problem in practice, since if PQ needed to be
downstream to E WRT S, D_opt(PQ,E) < D_opt(S,E) would apply, and in a
unit-cost network there would be no PQ nodes, since we would need
D_opt(PQ,E) < 1, i.e. a link metric from PQ to E of less than one. PQ
nodes would be so rare that this would not be a practical solution.
I have changed the text to:
As described in [RFC5286], always selecting a PQ node that is downstream
to the destination with respect to the repairing node, prevents the
formation of loops when the failure is worse than expected. The use of
downstream nodes reduces the repair coverage, and operators are advised
to determine whether adequate coverage is achieved before enabling this
selection feature.
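As a rough sketch of the downstream check discussed above (an
illustration under stated assumptions, not the draft's
Apply_Downstream_Constraint() text), the filter below keeps a candidate
PQ node only when D_opt(PQ,dest) < D_opt(S,dest), using pre-failure
costs. The function names and the unit-cost ring S-A-B-C-D-E-S are
hypothetical and chosen only for the example.

```python
import heapq


def shortest_costs(graph, source):
    """Dijkstra: cost of the shortest path from source to each reachable node."""
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def downstream_candidates(graph, s, dest, candidates):
    """Keep the candidates that are strictly closer to dest than S is."""
    d_s = shortest_costs(graph, s)
    return {pq for pq in candidates
            if shortest_costs(graph, pq)[dest] < d_s[dest]}


# Hypothetical unit-cost ring, used only for this example.
RING = {
    "S": {"E": 1, "A": 1},
    "A": {"S": 1, "B": 1},
    "B": {"A": 1, "C": 1},
    "C": {"B": 1, "D": 1},
    "D": {"C": 1, "E": 1},
    "E": {"D": 1, "S": 1},
}

# Measured against the real destination D, candidate C qualifies.
print(downstream_candidates(RING, "S", "D", {"C"}))
# Measured against the proxy destination E, nothing qualifies.
print(downstream_candidates(RING, "S", "E", {"C"}))
```

With unit costs D_opt(S,E) is 1, so no candidate can be strictly closer
to E than S is, which matches the argument above.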
7) In Sec 4.3: "The reader is referred to
[I-D.psarkar-rtgwg-rlfa-node-protection] for further information
on the use of RLFA for node repairs." Can you add "and broadcast or
NBMA link repairs"? Do you feel that is accurate?
I cannot see any text on broadcast or NBMA in that draft, which is now
draft-ietf-rtgwg-rlfa-node-protection (reference updated in the text).
I have made no text change on the substantive point.
8) In Sec 6: s/"When the failure is a node failure rather than a link
failure"/"When the failure is a node failure rather than a
point-to-point link failure"
Done
9) In Sec 6: "Alternatively one might choose to assume that the
probability of a node failure and microloops forming is sufficiently
rare that the case can be ignored." Can you please clarify from
microloops to "microloops forming due to use of alternates"? We know
that in cases where a rLFA is necessary, that neighbor isn't loop-free
and so regular microloops due to reconvergence will form.
It took a while to understand the comment but I think I know what you mean.
I have changed the text to:
Alternatively one might choose to assume that the probability of a node
failure is sufficiently rare that the issue of looping RLFA repairs can
be ignored.
10) In Sec 7: "In the absence of a protocol to learn the preferred IP
address for targeted LDP, an LSR should attempt a targeted LDP session
with the Router ID [RFC2328] [RFC5305] [RFC5340], unless it is
configured otherwise." Can you please add in some text for how this
would work for IPv6? I believe that there are current drafts
discussing carrying Routable IP addresses (e.g.
http://datatracker.ietf.org/doc/draft-ietf-ospf-routable-ip-address/
). We know that there is interest in having IPv6 only networks with
MPLS - so it'd be good not to create new gaps.
It now says:
In the absence of a protocol to learn the preferred IP address for
targeted LDP, an LSR should attempt a targeted LDP session with the
Router ID [RFC2328] [RFC5305] [RFC5340] [RFC6119]
[I-D.ietf-ospf-routable-ip-address], unless it is configured otherwise.
11) In Sec 8.4: "In an MPLS network, this is achieved without any
scaleability impact, as the tunnels to the PQ nodes are always present
as a property of an LDP-based deployment." The targeted LDP sessions
don't have a scaleability impact? That the repair tunnels don't need
to be specifically created as new tunnels, I agree with - but this
statement is overselling. Please make the technical point more clearly.
I have cut this back to:
As shown in the table, remote LFA provides close to 100% prefix
protection against link failure in 11 of the 14 topologies studied, and
provides a significant improvement in two of the remaining three cases.
Note that in an MPLS network the tunnels to the PQ nodes are always
present as a property of an LDP-based deployment.
12) In Sec 9: I feel like here is a good place at least mention the
issues with microloops from reconvergence. Since reconvergence after
rLFA is going to result in a local microloop (depending on timing), at
least a reference to
https://tools.ietf.org/html/draft-litkowski-rtgwg-uloop-delay-03 with
a recommendation to consider it is important. Otherwise, the rLFA
repair happens and then traffic microloops and is lost. The fact that
these local microloops occur with real impact much more with rLFA (or
any advanced FRR technique) is an important management consideration.
I have added the following new para:
When the network re-converges, microloops [RFC5715] may form due to
transient inconsistencies in the router FIBs. If it is determined that
microloops are a significant issue in the deployment, then a suitable
loop-free convergence method, such as one of those described in
[RFC5715], [RFC6976] or [I-D.litkowski-rtgwg-uloop-delay] should be
implemented.
13) Sec 12: Saying "To prevent their use as an attack vector the
repair tunnel endpoints SHOULD be assigned from a set of addresses
that are not reachable from outside the routing domain." is basically
empty words without more behind Sec 7 default of using Router IDs.
Can you find a reference that talks about a BCP for Router IDs not
being reachable addresses outside the routing domain? Can you describe
how to use the IGP extensions?
Router IDs are used for T-LDP and normal MPLS security applies.
Again with MPLS repair tunnels normal MPLS security applies.
The Section 12 reference was to IP tunnels in an IP rather than MPLS
network. I have changed the text to:
The security considerations of [RFC5286] also apply.
Targeted LDP sessions and MPLS tunnels are normal features of an MPLS
network and their use in this application raises no additional security
concerns.
To prevent their use as an attack vector IP repair tunnel endpoints
(where used) SHOULD be assigned from a set of addresses that are not
reachable from outside the routing domain.
Nits:
a) In Sec 4.2.1.1: "The exclusion of routers
reachable via an ECMP that includes S-E prevents the forwarding
subsystem attempting to a repair endpoint via the failed link S-E."
s/attempting to a repair/from attempting to use a repair
Done
b) In Sec 10: "We propose "Remote LFA" as a natural second step."
This is going to be an RFC - so rather than propose, try specify.
I have changed this to:
The purpose of LFA FRR technology is to provide for a simple FRR
solution when such a solution is possible. The first step along this
simplicity approach was "local" LFA [RFC5286]. This specification of
"Remote LFA" is a natural second step.
Hopefully these resolutions are acceptable to all. If not please let me
know.
New version at http://datatracker.ietf.org/doc/draft-ietf-rtgwg-remote-lfa/
Diffs at
http://www.ietf.org/rfcdiff?url1=draft-ietf-rtgwg-remote-lfa-08&difftype=--html&submit=Go!&url2=draft-ietf-rtgwg-remote-lfa-09
- Stewart