Hi Greg,

Thanks for your comments. My answers/explanations are inline below.

Best Regards,
Huaimo

From: mpls [mailto:[email protected]] On Behalf Of Gregory Mirsky
Sent: Sunday, April 12, 2015 2:04 AM
To: [email protected]; [email protected]; [email protected]
Cc: [email protected]; [email protected]
Subject: [mpls] Comments on draft-ietf-teas-rsvp-ingress-protection

Dear Editors, chairs, WG community,
please find my comments to the current version of your work below:

* Introduction

o The first paragraph may leave an impression that local protection of transit LSRs is not already addressed, neither by RFC 4090 nor by RFC 4875;

[Huaimo] Will revise it accordingly.

o I think that "global protection" is not a commonly used term; "end-to-end protection" seems to be commonly used instead.

[Huaimo] It seems that "global protection" is better here since we mention "local protection" here. It seems that "global protection" is used often.

* Section 3.1

o The third paragraph contains the following requirement: "For a P2P LSP, after the primary ingress fails, the backup ingress must use a method to reliably detect the failure of the primary ingress before the PATH message for the LSP expires at the next hop of the primary ingress." But it is not obvious that such a requirement is really needed. Since this is an RSVP-TE LSP, why not use an MP2MP construct and let the Source node control switchover? Especially since, as noted in the last paragraph of Section 2.1, the primary and backup ingress nodes must be connected by a logical link, which in the general case will be a tunnel. Thus this solution puts a requirement, implicitly though, to instantiate a tunnel per protection group, a tunnel that would not be used to carry traffic.

[Huaimo] The requirement above seems necessary. If the backup ingress does not detect the failure of the primary ingress before the timer for the PATH message for the LSP at the next hop of the primary ingress expires, the LSP will be down after the primary ingress fails.
If the backup ingress detects the failure and sends/refreshes the PATH message to the next hop before the timer expires after the primary ingress fails, the LSP will stay up and carry the traffic from the backup ingress via the backup LSP. For a P2P LSP, it seems that an MP2MP construct is not used in RFC 4090 to protect a transit node of a P2P LSP. The logical link between the primary ingress and the backup ingress can be a direct link or a tunnel. It seems that a direct link is common.

o In addition, what is the importance of the requirement quoted above: "... before the PATH message for the LSP expires at the next hop of the primary ingress"

[Huaimo] This seems very important. If the timer for the PATH message for the LSP at the next hop of the primary ingress expires, then the LSP will be down. So the PATH message must be refreshed before the timer for the PATH message for the LSP expires at the next hop of the primary ingress.

o The fourth paragraph makes a very questionable assumption in: "After the primary ingress fails, it will not be reachable after routing convergence." I believe that if an OAM session is between two nodes, there is no reliable way to differentiate between node and link failure. Thus, to declare a node unreachable there must be N tunnels for N OAM sessions that monitor all possible paths between two nodes. (Note that if there was no requirement to use a tunnel between primary and backup ingress, multi-hop BFD could be used, though its detection time would be limited by IGP convergence, which may be too slow compared with your requirement of tens of milliseconds.)

[Huaimo] It is true that "After the primary ingress fails, it will not be reachable after routing convergence." From routing's point of view, there is no need for us to have any OAM session between two nodes. The timer for a PATH message seems to be in the tens of seconds. Routing convergence is not limited to tens of milliseconds.
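
As a rough illustration of the timing argument in the Section 3.1 discussion above, the sketch below computes how long PATH state survives at the next hop without a refresh, using the soft-state lifetime rule from RFC 2205 (L >= (K + 0.5) * 1.5 * R, with suggested K = 3 and default R = 30 seconds). The helper `lsp_survives` and its parameters are hypothetical names introduced only for this example, and whether a given deployment uses the RFC 2205 defaults is an assumption.

```python
# Sketch of RSVP soft-state timing, based on the lifetime rule in RFC 2205.
# Values below are the RFC 2205 suggested/default values, not values taken
# from the draft under discussion.
K = 3             # number of consecutive refreshes that may be lost
DEFAULT_R = 30.0  # refresh period R in seconds


def path_state_lifetime(refresh_period=DEFAULT_R, k=K):
    """Minimum lifetime of PATH state: L >= (K + 0.5) * 1.5 * R."""
    return (k + 0.5) * 1.5 * refresh_period


def lsp_survives(since_last_refresh_s, detect_s, send_refresh_s,
                 refresh_period=DEFAULT_R):
    """Hypothetical helper: True if the backup ingress's PATH refresh
    reaches the next hop before the PATH state there times out.

    since_last_refresh_s: age of the PATH state when the primary ingress fails
    detect_s:             time for the failure to be detected
    send_refresh_s:       time to build and deliver the PATH refresh
    """
    elapsed = since_last_refresh_s + detect_s + send_refresh_s
    return elapsed < path_state_lifetime(refresh_period)


if __name__ == "__main__":
    print(path_state_lifetime())           # lifetime with default values
    print(lsp_survives(30.0, 0.05, 0.01))  # fast detection
    print(lsp_survives(0.0, 200.0, 0.01))  # very slow detection
```

With these assumed defaults the state lifetime is well above the tens of milliseconds discussed for failure detection, which is consistent with the point that the PATH timer is on the order of tens of seconds or more.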

* Section 5.1

o Regarding the "Ingress local protection in use" flag

As demonstrated earlier, the backup ingress node has no reliable way to detect that the primary ingress node is not reachable to the Source and thus that protection must be activated.

[Huaimo] It seems that there is no need for the backup ingress to detect whether the primary ingress is reachable to the Source; the focus is on the failure of the primary ingress.

Considering that the backup ingress may initiate the actions described in the document at times other than when the primary ingress became unavailable to the Source, I believe that cases that may produce false positives must be removed along with the extensions intended to support these cases. In my opinion, the only viable case of ingress protection is Source-centric, where the Source monitors availability of both primary and backup ingress nodes and controls traffic switchover.

I'd ask the WG to discuss these comments and, if agreed, ask the Editors to make appropriate changes to the document.

[Huaimo] It seems that the current version already indicates that source-detect (i.e., the Source detects the failure of the primary ingress and switches traffic to the backup ingress when the primary ingress fails) is used. A few modes for detecting the failure of the primary ingress were proposed in the previous versions of the document. A different mode may have different control over the traffic switchover and/or forwarding. After discussions, the current version selects source-detect. Can you give more details about the cases in which false positives may be produced?

Regards,
Greg
