Hi,
I hope you all are doing well!
This is Shahrooz, a fourth-year Ph.D. student in the Department of
Computer Science at the University of Massachusetts Amherst, working
under the supervision of Prof. Arun Venkataramani.
We often read that the Internet (i.e., BGP) has a long convergence
delay. But why is it so slow? Moreover, can we (researchers) do anything
about it?
Please help us find out by answering our short, anonymous survey
(less than 10 minutes).
Background:
Measurements on the real Internet show that BGP, the Internet routing
protocol, converges slowly after link or node outages. The convergence
delay of BGP can be defined (informally) as the time from a root-cause
event, such as a link or node failure, until all routers on the Internet
affected by that event have updated their best route to a new stable
one. According to operators who monitor convergence time, BGP takes
more than 30 seconds on average to converge after remote outages. This
long convergence delay can cause long data-plane downtime for many
destinations, during which packets towards those destinations are lost.
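To make the informal definition above concrete, here is a minimal
sketch in Python. The function name, the router names, and the
timestamps are made up for illustration; they are not measurements.

```python
def convergence_delay(event_time, final_update_times):
    """Seconds from the root-cause event until the last affected router
    installs its new stable best route.

    event_time: time of the failure, in seconds (hypothetical input).
    final_update_times: maps each affected router to the time of its
    final best-route change for the destination (hypothetical input).
    """
    if not final_update_times:
        return 0.0  # no router was affected, so nothing had to converge
    return max(final_update_times.values()) - event_time


# Illustrative values only: a failure at t=100 s and three affected routers.
updates = {"r1": 104.0, "r2": 131.5, "r3": 118.0}
print(convergence_delay(100.0, updates))
```

The slowest router determines the result, which is why a single delayed
announcement can stretch the convergence delay for everyone downstream.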
During convergence, each router processes newly received routes from
its neighbors and announces its new best route to the affected
destination to them. To reduce the number of times that a router
announces a new route to its neighbors, the BGP specification (RFC 4271)
suggests that routers use a timer called the MRAI (Minimum Route
Advertisement Interval).
After a router has sent an advertisement to a neighbor, it has to wait
for at least the MRAI before sending a new route advertisement for the
same destination to the same neighbor. The straightforward way to
implement the MRAI would be on a per-destination basis, i.e., maintain a
separate timer for each destination and each neighbor. In practice,
however, it is usually implemented per peer. The default MRAI value
suggested by the RFC (RFC 4271) is 30 seconds for eBGP sessions. Juniper
routers do not use an MRAI timer; instead, they use an out-delay timer
that specifies how long a route must be present in the Junos OS routing
table before it is exported to BGP.
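The difference between a per-destination and a per-peer MRAI can be
sketched as follows. This is a simplified toy model, not how any vendor
implements it: the class name, method name, and peer labels are
hypothetical, time is passed in explicitly, and a suppressed update is
simply reported as "must wait" rather than queued and batched.

```python
from dataclasses import dataclass, field


@dataclass
class MraiLimiter:
    """Toy MRAI gate (hypothetical helper, times in seconds)."""
    mrai: float = 30.0          # default suggested for eBGP
    per_peer: bool = True       # False means one timer per (peer, prefix)
    _last_sent: dict = field(default_factory=dict)

    def _key(self, peer, prefix):
        # Per-peer mode shares one timer across all prefixes of a peer.
        return peer if self.per_peer else (peer, prefix)

    def try_advertise(self, peer, prefix, now):
        """Return True if an update may be sent now, False if it must wait."""
        key = self._key(peer, prefix)
        last = self._last_sent.get(key)
        if last is not None and now - last < self.mrai:
            return False  # still inside the MRAI window
        self._last_sent[key] = now
        return True


per_dest = MraiLimiter(per_peer=False)
per_peer = MraiLimiter(per_peer=True)
for limiter in (per_dest, per_peer):
    limiter.try_advertise("peerA", "10.0.0.0/8", now=0.0)
    # A different prefix to the same peer, 5 seconds later:
    print(limiter.per_peer, limiter.try_advertise("peerA", "192.0.2.0/24", now=5.0))
```

In this sketch the per-destination limiter lets the second prefix
through immediately, while the per-peer limiter holds it back because
the peer's single timer is still running.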
This survey:
This survey aims to identify current best practices for MRAI/out-delay
timer values on the Internet. In addition, we expect the findings to
improve the understanding of perceived BGP convergence on the
Internet, which could help researchers design better solutions to BGP's
long convergence delay.
Survey URL: https://forms.gle/VNRpU2MzRU8DX1o57
We expect the questionnaire to be filled out by network operators whose
job relates to BGP operations. It has a total of 6 questions and should
take less than 10 minutes to answer.
A summary of the aggregate results will be published as part of a
scientific article, hopefully later this year :)
Thank you so much in advance, and we look forward to reading your
responses! We would also be extremely grateful if you could forward this
email to any operator you might know who may not read RIPE.
Best,
Shahrooz