Hi Paul,

> Hehe, My implicit assumption is that there's no significant change in
> allocation densities. :) Stuff like that generally seems to be an
> administrative issue, so it's maybe unlikely it can be affected through
> routing changes.

It's not clear that that's a valid assumption, especially in the face of v4
runout.  Will people just become more efficient?  What happens if carriers
do deploy CGNAT?

>> This in turn implies that BGP will take longer to converge at a
>> given node when that node has to process the full table.
> That's the thing though, the most recent data doesn't seem to show
> any evidence of things like that. If per-node convergence was taking
> significantly longer (in the "scaling badly relative to prefix
> growth" sense), then Sigma(per-prefix per-node convergence) should be
> similarly increasing by at least the same amount and BGP observers
> ought to be able to see that, and so the data should show it.

Disagree.  If I understand Geoff's data correctly (and please correct me if
I don't) he's showing that the overall stability of the network hasn't
changed.  The number of updates and the number of withdrawals are scaling
nicely.  He also shows that the incremental convergence of a prefix is
roughly constant.

This in no way contradicts the increasing time that it will take a single
router to converge on the full table.  The two measures are wholly
orthogonal.

Again, consider the situation of a router rebooting somewhere in the middle
of the network.  When the router crashes, adjacent routers will select
alternate paths.  This convergence time is not going to be visible, as Geoff
is measuring the time that it takes from the first change to convergence for
that prefix.  Since the convergence time is largely dominated by MRAI
effects today, it will be difficult to perceive any increase in overall
convergence times due to scale.

Similarly, when a router comes up, it will begin learning and advertising
prefixes.  This will trigger another convergence effect, but the delay in
starting the event will not be shown in Geoff's data.  Subsequent delay in
processing by upstream routers will get lost in MRAI as before.  The router
convergence time that I'm concerned about is the time that it takes this new
router to learn and advertise the full table of routes.  Pretty clearly,
this time is linear in the number of prefixes.  Thanks to the existence of
multiple paths, this delay is not going to be seen by end nodes.

To see this effect more clearly, consider a thought experiment where it
takes a router an hour (or a day, a week or a month) to boot and process all
prefixes.  What is the effect on the network?  Would Geoff's numbers change?
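
As a rough sketch of the distinction, here is a toy model in Python.  The
MRAI value, the per-prefix processing cost, the hop count and both function
names are assumptions picked for illustration, not measurements from
anyone's data.  It only shows why a per-prefix convergence measurement can
stay flat while the time to load the full table grows linearly.

```python
# Toy model only: every constant below is an illustrative assumption.
MRAI_SECONDS = 30.0        # assumed MRAI timer per hop
PER_PREFIX_COST = 0.0001   # assumed seconds of processing per prefix


def per_prefix_convergence(hops, table_size):
    """Time from first to last update for ONE prefix across `hops` routers.

    table_size is accepted but deliberately unused: an incremental update
    for one prefix does not touch the rest of the table, so this measure
    is dominated by MRAI and does not grow with the table.
    """
    return hops * (MRAI_SECONDS + PER_PREFIX_COST)


def boot_convergence(table_size):
    """Time for a freshly booted router to process the full table.

    Modelled as linear in the number of prefixes.
    """
    return table_size * PER_PREFIX_COST


if __name__ == "__main__":
    for size in (100_000, 400_000, 1_600_000):
        print(size,
              round(per_prefix_convergence(4, size), 2),
              round(boot_convergence(size), 1))
```

In this model the per-prefix figure is identical for every table size, while
the boot figure grows in step with the table.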

> I'm sorry for being such an arse with my scepticism, and I'll
> understand if people reply to me as if I'm half-wit, but if scaling
> is a problem surely it should be apparent in some data somewhere over
> the last decade+ that people have been worrying about it? Where's the
> smoking gun graph, based on real data, that shows the scaling
> problem? I'm somewhat willing to take your word as authoritative, but
> ideally we'd have graphs :).

<employer hat off>
Any operator who would like to stand up and embarrass their favorite router
vendor by showing a graph of router boot convergence times is welcome to do
so.  ;-)
</employer hat off>

> I stress again that, despite taking this contrarian view of the
> scaling problem, I still think the work here is very important!

I'll just point out the last slide of Geoff's talk:

"Will BGP Continue to Scale?

Only if: the address system continues to maintain strong alignment with
network topology & provider based addressing policies assist in maintaining
a viable global routing infrastructure."

The whole point of the work here is to decouple addressing from identity so
that it can be more easily aligned with topology.


rrg mailing list
