Re: [GROW] Comment on draft-iops-grow-bgp-session-culling

2017-03-14 Thread Tore Anderson
* Job Snijders 

> TEXT:
> In network topologies where BGP speaking routers are directly
> attached to each other, or use fault detection mechanisms such as
> BFD, detecting and acting upon a link
> down event (for example when someone yanks the physical connector)
> in a timely fashion is straightforward.
> 
> So we should add something that even though detection is
> straightforward, and initiating action as a result of this event can be
> done timely, we cannot be sure of timely termination of whatever actions
> are taken because of the event, and therefore the recommendation is to
> shut down sessions before doing maintenance, even though networks are
> directly connected to each other.
> 
> The above matches my operational experience and aligns with how we
> perform router maintenance.
> 
> There are a number of considerations:
> 
> - an operator may not know whether they are directly connected
> - even if directly connected, the remote side might not be able to
>   converge in a timely fashion
> 
> Perhaps the paragraph should just be removed?

Yes.

Here's a quick suggestion or starting point for discussion (intended to
replace section 1 in its entirety):

   BGP Session Culling is the practice of ensuring BGP sessions are
   forcefully torn down before maintenance activities on a lower layer
   network commence, which otherwise would affect the flow of data
   between the BGP speakers.

   BGP Session Culling ensures that network maintenance activities
   cause the minimum possible amount of disruption, by giving BGP
   speakers advance notice of an impending outage, so they may
   preemptively react to it by gracefully converging onto alternate
   paths while the forwarding plane is still fully operational.

   The grace period required for a successful implementation of BGP
   Session Culling is the sum of the time needed to detect the BGP
   session loss plus the time required for the BGP speaker to converge
   on alternate paths. The first value is in the worst case governed by
   the BGP Hold Timer (section 6.5 of [RFC4271]). The second value is
   implementation-specific, but could be as much as 15 minutes or more
   in the case of sessions where a router with a slow control plane is
   receiving a full set of Internet routes.

   Operators implementing BGP Session Culling are in any case
   encouraged to avoid using a fixed grace period, but instead monitor
   forwarding plane activity while the culling is taking place and
   consider it complete once traffic levels have dropped to a minimum
   [Section 2.3].
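
As a rough illustration of the arithmetic in the proposed text, the
worst-case grace period can be sketched as the Hold Timer plus an assumed
convergence time. The class and function names below are hypothetical. The
180-second Hold Time is an assumed example value (the "3m bgp dead time"
mentioned elsewhere in this thread), and the 15 minutes is the slow
control plane figure from the text above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CullingEstimate:
    """Worst-case inputs for a fixed grace period (illustrative only)."""
    hold_time_s: int    # negotiated BGP Hold Time: worst-case detection delay
    convergence_s: int  # assumed time for the peer to converge on other paths

    @property
    def grace_period_s(self) -> int:
        # Grace period = detection time + convergence time.
        return self.hold_time_s + self.convergence_s


def format_minutes(seconds: int) -> str:
    """Render a number of seconds as e.g. '18m00s'."""
    minutes, rest = divmod(seconds, 60)
    return f"{minutes}m{rest:02d}s"


# Assumed example: 180 s Hold Time, 15 minutes of convergence.
estimate = CullingEstimate(hold_time_s=180, convergence_s=15 * 60)
print(format_minutes(estimate.grace_period_s))
```

This is only an upper-bound estimate. As the last paragraph of the proposed
text says, watching forwarding plane activity is preferable to relying on a
fixed value like this one.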

Tore

___
GROW mailing list
GROW@ietf.org
https://www.ietf.org/mailman/listinfo/grow


Re: [GROW] Comment on draft-iops-grow-bgp-session-culling

2017-03-14 Thread Job Snijders
On Tue, Mar 14, 2017 at 01:05:08PM +0100, Tore Anderson wrote:
> * Nick Hilliard 
> > Tore Anderson wrote:
> > > In other words: in my opinion, BGP session culling should be
> > > considered a BCP even in situations where link state signaling
> > > and/or BFD is used. IP-transit providers should perform culling
> > > towards their customers ahead of maintenance works. Direct peers,
> > > likewise.  
> > 
> > probably not much need if bfd is used because that would operate
> > router-to-router.
> 
> Quite the contrary, there is very much a need in this case too. If there
> are many active routes that will become invalid, converging on
> alternate paths (reprogramming the FIB) can take significantly longer
> than actually detecting the outage (even if it's detected only using
> BGP timers).
> 
> > > IXPs aren't at all special regarding the fundamental need for session
> > > culling, only in the method by which it is accomplished (i.e., using
> > > layer-2 ACLs).  
> > 
> > Correct, but for direct peers over PNIs, etc, the operator will usually
> > have control over the bgp session.  What we're talking about here is a
> > situation where there is an intermediate operator which has no direct
> > admin control over bgp sessions.
> 
> The draft is most definitely also talking about the situations where
> the operator does have admin control over the BGP session (section 2.1).

TEXT:
In network topologies where BGP speaking routers are directly
attached to each other, or use fault detection mechanisms such as
BFD, detecting and acting upon a link
down event (for example when someone yanks the physical connector)
in a timely fashion is straightforward.

So we should add something that even though detection is
straightforward, and initiating action as a result of this event can be
done timely, we cannot be sure of timely termination of whatever actions
are taken because of the event, and therefore the recommendation is to
shut down sessions before doing maintenance, even though networks are
directly connected to each other.

The above matches my operational experience and aligns with how we
perform router maintenance.

There are a number of considerations:

- an operator may not know whether they are directly connected
- even if directly connected, the remote side might not be able to
  converge in a timely fashion

Perhaps the paragraph should just be removed?

Kind regards,

Job



Re: [GROW] Comment on draft-iops-grow-bgp-session-culling

2017-03-14 Thread Tore Anderson
* Nick Hilliard 

> Tore Anderson wrote:
> > In other words: in my opinion, BGP session culling should be
> > considered a BCP even in situations where link state signaling
> > and/or BFD is used. IP-transit providers should perform culling
> > towards their customers ahead of maintenance works. Direct peers,
> > likewise.  
> 
> probably not much need if bfd is used because that would operate
> router-to-router.

Quite the contrary, there is very much a need in this case too. If there
are many active routes that will become invalid, converging on
alternate paths (reprogramming the FIB) can take significantly longer
than actually detecting the outage (even if it's detected only using
BGP timers).

> > IXPs aren't at all special regarding the fundamental need for session
> > culling, only in the method by which it is accomplished (i.e., using
> > layer-2 ACLs).  
> 
> Correct, but for direct peers over PNIs, etc, the operator will usually
> have control over the bgp session.  What we're talking about here is a
> situation where there is an intermediate operator which has no direct
> admin control over bgp sessions.

The draft is most definitely also talking about the situations where
the operator does have admin control over the BGP session (section 2.1).

Tore



Re: [GROW] Comment on draft-iops-grow-bgp-session-culling

2017-03-14 Thread Nick Hilliard
Tore Anderson wrote:
> In other words: in my opinion, BGP session culling should be considered
> a BCP even in situations where link state signaling and/or BFD is used.
> IP-transit providers should perform culling towards their customers
> ahead of maintenance works. Direct peers, likewise.

probably not much need if bfd is used because that would operate
router-to-router.  Link state signaling is problematic because it's not
necessarily transferred to all the devices that need to see the link
state changes.

> IXPs aren't at all special regarding the fundamental need for session
> culling, only in the method by which it is accomplished (i.e., using
> layer-2 ACLs).

Correct, but for direct peers over PNIs, etc, the operator will usually
have control over the bgp session.  What we're talking about here is a
situation where there is an intermediate operator which has no direct
admin control over bgp sessions.

Nick



Re: [GROW] Comment on draft-iops-grow-bgp-session-culling

2017-03-14 Thread Tore Anderson
* Nick Hilliard 

> Tore Anderson wrote:
> > My point here was that if the IXP is doing maintenance, it could shut
> > all ports to all members simultaneously, and thus get the exact same
> > effect as the «when someone yanks the physical connector» scenario
> > described in the draft.  
> 
> this doesn't work because 1. some ixp participants connect their
> routers via intermediate switches and if all ports are yanked
> simultaneously, they will blackhole traffic on their side and 2. any
> ixp with more than one switch in their peering fabric needs to be
> able to perform maintenance on part of their ixp without
> affecting the rest.

Maybe we're talking past each other.

I fully agree with you that this does not work sufficiently well, which
by extension means that the draft is wrong in suggesting that BGP
session culling is only needed «in topologies where upper layer fast
fault detection mechanisms are unavailable and the lower layer topology
is hidden».

In other words: in my opinion, BGP session culling should be considered
a BCP even in situations where link state signaling and/or BFD is used.
IP-transit providers should perform culling towards their customers
ahead of maintenance works. Direct peers, likewise.

IXPs aren't at all special regarding the fundamental need for session
culling, only in the method by which it is accomplished (i.e., using
layer-2 ACLs).

Tore



Re: [GROW] Comment on draft-iops-grow-bgp-session-culling

2017-03-14 Thread Nick Hilliard
Tore Anderson wrote:
> By the way, as an IXP operator, you also have the possibility to simply
> shut down your members' interfaces prior to performing maintenance,
> instead of doing culling. Doing so would be completely analogous to the
> directly connected BGP speakers scenario discussed in section 1 where the
> draft says «detecting and acting upon a link down event (for example
> when someone yanks the physical connector) in a timely fashion is
> straightforward».

- if the ixp port is connected to an intermediate switch on the member
side, the ixp member router won't see the carrier state transition and
will blackhole traffic on the member side

- all other ixp members who peer with that router also won't see the
carrier state and will leave bgp up, causing traffic to be blackholed on
the remote side.

Nick



Re: [GROW] Comment on draft-iops-grow-bgp-session-culling

2017-03-13 Thread Nick Hilliard
Will Hargrave wrote:
> With a 30 minute cull window, there is substantial concern that an
> operator will begin to debug the ‘problem’, discover ICMP PING works but
> TCP/179 doesn’t work, and get very annoyed at this strange behaviour. I
> think we should operate on a principle of least surprise here.

We have found that re-mailing technical contacts with a "Maintenance is
about to start in 1 hour" reminder helps quite a lot in this sort of
situation.  Low tech is sometimes good.

Otherwise, 30 minutes for traffic drain is unnecessary - we keep our
interface counters on 30 second load average so can see when the
majority of traffic is gone; this always happens within a couple of
minutes of the 3m bgp dead time drop.
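
The counter-watching approach described above could be sketched roughly as
follows. This is a minimal illustration, not any operator's actual tooling:
the function name, the 5% threshold, and the requirement of two consecutive
low samples are all assumptions, and the input is a plain list of rate
samples (one per 30-second interval) rather than a real counter feed.

```python
from typing import Optional, Sequence


def drained_at(samples_bps: Sequence[float],
               baseline_bps: float,
               interval_s: int = 30,
               threshold: float = 0.05,
               consecutive: int = 2) -> Optional[int]:
    """Return the number of seconds after culling started at which traffic
    counts as drained, or None if it never does within the samples.

    Traffic is considered drained once `consecutive` samples in a row are
    at or below `threshold` times the pre-culling baseline (assumed rule).
    """
    limit = baseline_bps * threshold
    run = 0
    for i, rate in enumerate(samples_bps):
        # Count consecutive low samples; any high sample resets the run.
        run = run + 1 if rate <= limit else 0
        if run >= consecutive:
            # Sample i is taken at the end of interval i.
            return (i + 1) * interval_s
    return None


# Hypothetical samples: traffic falls away after the sessions are culled.
samples = [9.8e8, 9.5e8, 9.0e8, 9.0e8, 8.5e8, 8.0e8, 4e8, 1e8, 2e7, 1e7]
print(drained_at(samples, baseline_bps=1e9))
```

Requiring more than one low sample is a guard against a single quiet
interval being mistaken for a completed drain; the right values would
depend on the platform and traffic profile.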

Usually this sort of thing happens at omg o'clock in the morning, and
we've found that reducing the time load helps quite a bit with both
accuracy and staff morale, as all our staff usually work regular office
hours.  Again, this sort of rationale would be outside the scope of IETF
BCPs, but they are real enough concerns that I'd personally be
uncomfortable with recommending any delays.

Nick



Re: [GROW] Comment on draft-iops-grow-bgp-session-culling

2017-03-13 Thread Will Hargrave

Hello, Tore and GROW, I am new here.

On 13 Mar 2017, at 11:11, Tore Anderson wrote:

> Another logical consequence of this is that the rather imprecise «a few
> minutes» should ideally be expanded on, taking slow routers such as the
> MX80 into account. While five minutes of culling would be helpful, it
> would not be enough to avoid all disruption.

One point of the technique is that the ‘lower layer caretaker’ looks
at their interface traffic counters to ensure traffic has dropped to
near-zero before commencing the ‘destructive’ part of the
maintenance. As a result no traffic is affected.

If a tree falls in the forest and there is no-one for it to land on,
does it matter? :-)

Our operational experience at LONAP shows this does usually happen
within 5 minutes.

> If the operator/peer would decide to cull the session, say, 30 minutes
> ahead of the maintenance, that would be great for me and others in my
> situation.

I suspect this may be unnecessary, but do not have extensive data to
back this up.

With a 30 minute cull window, there is substantial concern that an
operator will begin to debug the ‘problem’, discover ICMP PING works
but TCP/179 doesn’t work, and get very annoyed at this strange
behaviour. I think we should operate on a principle of least surprise
here.




Re: [GROW] Comment on draft-iops-grow-bgp-session-culling

2017-03-13 Thread Job Snijders
Hi Tore,

On Mon, Mar 13, 2017 at 12:11:34PM +0100, Tore Anderson wrote:
> While I wholeheartedly agree with the recommendations in the draft and
> really wish my providers and peers will all incorporate them into their
> maintenance routines, I have one comment on the following text:
> 
>    In network topologies where BGP speaking routers are directly
>    attached to each other, or use fault detection mechanisms such as BFD
>    [RFC5880], detecting and acting upon a link down event (for example
>    when someone yanks the physical connector) in a timely fashion is
>    straightforward.
> 
> Not necessarily. Speaking from direct [painful] experiences, a router
> with a slow control plane (such as the Juniper MX80) can easily need 15
> minutes or so to fully converge after link down has been detected on an
> interface to which it had a large number of active routes. The prime
> example would be a connection to an IP-transit provider advertising
> full DFZ tables.
> 
> So in my opinion, BGP Session Culling really ought to be the BCP in all
> topologies, not just the ones «where upper layer fast fault detection
> mechanisms are unavailable and the lower layer topology is hidden from
> the BGP speakers».
> 
> Another logical consequence of this is that the rather imprecise «a few
> minutes» should ideally be expanded on, taking slow routers such as the
> MX80 into account. While five minutes of culling would be helpful, it
> would not be enough to avoid all disruption.

I think you make a valid point, would you be willing to prepare a change
proposal to address this? The document's authoritative source is on
github: https://github.com/bgpculling/draft-bgp-session-culling 

> If the operator/peer would decide to cull the session, say, 30 minutes
> ahead of the maintenance, that would be great for me and others in my
> situation. For single-homed customers of an IP-transit provider, on the
> other hand, any amount of culling only contributes to making the outage
> longer than necessary. So maybe it should be a suggestion to make it
> possible to opt out of the culling behaviour.

The myriad of issues resulting from a network single-homing on a single
connection behind a single router might be out of scope for this
document.

Kind regards,

Job
