Hi,
Some more comments on the numbers and formula..
On Nov 15, 2012, at 5:36 AM, Arturo Servin <[email protected]> wrote:
> Erick
>
> Very interesting research. But I am finding it difficult to understand how
> you got 1.4 M objects.
>
> Let me try to explain what I have seen in the young deployment of RPKI.
> For simplicity let's use the "hosted" model of RIRs.
>
> Let's suppose each RIR issues on average 1 certificate per member
> containing all the resources (v4, v6, ASNs). I would say that there are
> 40,000 entities holding IP (v4 and v6) prefixes and/or ASNs in the world
> (same as the number of ASs). What I have seen is that most prefix holders
> issue one ROA with all the resources, but let's use 5 ROAs per
> organization as the average. Then we have:
>
> 40,000 certificates
> 200,000 ROAs
> 80,000 CRLs + manifests
> 40,000 ghostbusters (not very deployed, but let's count it)
>
> Am I missing something besides the Router EE? Or is the Router EE that
> makes the difference?
>
> It seems that we agree that Ototal is the same equation, but the values
> for Cas, Eas, etc. are different.
I don't agree with the formula although my O total is closer..
The formula in the document confuses SIAs with the number of repository (rsync)
servers.
Apart from any signed objects it may publish, every CA typically has:
= 1 certificate published by its parent
= 1 manifest
= 1 CRL
= 1 GB record (as Arturo said, not widely deployed, but let's throw it in for
full deployment)
So that's 4 objects.
During a key roll the CA will have the following additional objects:
= 1 cert published by the parent
= 1 manifest
= 1 CRL
Making 7 objects. But typically not all CAs roll at the same time.
The number of signed ROAs and Router certificates does not, in my opinion,
depend on the number of CAs, but on:
= ROA -> The number of announcements seen in BGP * some aggregation factor (1 /
the average # of prefixes on one ROA)
= Router certs -> The number of ASNs * the number of keys for each ASN
So I think a better model would be to say:
number          object
#CA             CA cert
#CA             MFT
#CA             CRL
#CA             GB
#prefixes * X   ROAs
#ASNs * Y       Router certs
Ototal = 4 * #CA + #prefixes * X + #ASNs * Y
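As a sketch, the model can be written down as a tiny function (the inputs are
all placeholders to be guessed at, as discussed below):

```python
def o_total(num_cas, num_prefixes, x, num_asns, y):
    """Total repository objects under the model above.

    Every CA publishes 4 objects (CA cert, MFT, CRL, GB record);
    ROAs scale with announced prefixes, router certs with ASNs.
    """
    per_ca = 4  # CA cert + MFT + CRL + GB
    return per_ca * num_cas + num_prefixes * x + num_asns * y
```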
As for the numbers.. this is a bit of a guessing game.. we just really don't
know at this time. We can take our best guess, but should keep in mind that our
best guess is probably off, and needs re-evaluation in years to come.
#CAs
If this were the total number of current members for all RIRs this number would
be around 40k. However, there are also PI users that are not direct members of
the RIRs, and some members will delegate some of their resources further. For
reference I believe that in the RIPE region we have around 25k PI prefixes. I
expect that a lot of the organisations that hold these resources will be happy
to let a sponsoring RIR member (LIR in our region) manage their ROAs. But not
all.. So I think that in a full deployment world this number may be
significantly bigger. If anyone has any ideas on this, please chime in… Going
on nothing more than gut feeling I would say the total could be in the order
of:
= 40k RIR members plus 40k self-managing PI holders / children of members?
80k.
#prefixes and 'X'
The number of announced prefixes is still rising. Currently we are nearing 500k.
Worst case X is 1, meaning every ASN - prefix combination has its own ROA.
In reality this number will be lower because we can and do aggregate. But not
all implementers will do this. There is something to be said for *not*
fate-sharing ROAs for different prefixes from the same ASN. Also, most of our
members are fairly small, and they do not make huge numbers of announcements
individually. Our current aggregation rate -- and we really do try to
aggregate -- is:
792 ROAs / (2197 IPv4 + 468 IPv6 prefixes) = 0.3 ROA/prefix
(see: http://certification-stats.ripe.net/)
For scalability assessment I am not sure though that a factor of (1/0.3=) 3
between this level of aggregation, which seems best case, and no aggregation,
worst case, is that significant in the big picture. I will use the mean of
these two numbers below..
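For reference, the arithmetic behind that mean (numbers taken from the
certification-stats page quoted above):

```python
# RIPE NCC hosted-system numbers (certification-stats.ripe.net)
roas = 792
prefixes = 2197 + 468          # IPv4 + IPv6 prefixes covered by ROAs

best_case_x = roas / prefixes  # ~0.3 ROA per prefix, heavy aggregation
worst_case_x = 1.0             # no aggregation: one ROA per ASN-prefix pair
mean_x = (best_case_x + worst_case_x) / 2  # ~0.65, used for the estimate
```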
#ASNs * Y
The number of current ASNs is approx. 40k. I have no good idea about how many
keys are needed at a single ASN. I know there was discussion about a single key
per router, and re-using the same key for different routers. The suggested
number 10 seems as good as any guess I could make. If anyone has more clue
here, again, chime in please..
> But, anyways. It's approx. 350K objects.
Combining the above guesses I arrive at:
O total = 4 * 80k + 500k * 0.65 + 40k * 10 ==> ~ 1M
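Spelled out, with every input a guess as discussed above:

```python
cas = 80_000        # 40k RIR members + ~40k self-managing PI holders (guess)
prefixes = 500_000  # announced prefixes, still rising
x = 0.65            # mean of best-case (~0.3) and worst-case (1) aggregation
asns = 40_000
keys_per_asn = 10   # router keys per ASN (guess, as good as any)

o_total = 4 * cas + prefixes * x + asns * keys_per_asn
print(round(o_total))  # 1045000, i.e. ~1M objects
```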
> If we want to load all in 5
> minutes (it would be good to define what is acceptable) we need to
> deliver objects in less than 0.00085 secs.
In principle I like the approach of turning this around and defining what an
acceptable average delivery rate would be given the total number of objects and
the maximum sync time. But on the other hand this can lead to rejecting any
infrastructure we could come up with.. just set the goals high enough and
nothing will be enough. So I think we should be cautious here. If there are
absolute objective minimal requirements it would be good to know them. But
other than that it may be best to be pragmatic about it and turn this back..
try to think of other ways and see if they actually perform significantly
better..
Staying with the document: if we take the current rsync repositories as a
starting point to see where we are heading without changes:
= It should be noted that fetch times depend on layout; a hierarchical layout
that allows recursive fetches saves a lot of overhead (and latency)
= We *do* recursive fetches on all current RIR repositories (yes, we hacked in
which base directory to use)
= Testing on my laptop I typically see fetch times of around 20ms per object,
not 628ms
A full fetch based on today's numbers would then take 1M * 20 ms = 20k seconds
≈ 333 minutes ≈ 5.5 hours ≈ 0.23 days.
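That estimate is nothing more than the object count times my laptop's observed
per-object fetch time:

```python
objects = 1_000_000       # estimated full-deployment repository size
per_object_s = 0.020      # ~20 ms per object, seen on my laptop (not 628 ms)
total_s = objects * per_object_s
hours = total_s / 3600    # ~5.6 hours for a full cold fetch
```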
So this is quite a bit faster. Eric, Danny how did you get to the numbers in
table 3? Like I said: I just got some times from validation runs done on my
laptop. We do collect data from our validators 'in the wild' though..
unfortunately the format of this data (we store a *lot*) is such that it will
take some time for me to dig out more representative numbers. More time than I
have now, but I will try….
Another important factor is the number of RPs that we can expect.. I know that
Rob and Randy et al are looking into ways to let RPs share data and be less
reliant on central repository servers. On the other hand if all ASNs run at
least 1 rpki validation cache that talks to the repositories directly then
we're looking at 40k clients. If they want updates, say just 3 times per day,
that's 120k requests per day, so something like 1 - 1.5 per second.
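The request-rate arithmetic, with the ASN count as the (big) assumption:

```python
clients = 40_000          # one validating cache per ASN (assumption)
updates_per_day = 3       # modest polling frequency
requests_per_day = clients * updates_per_day
requests_per_s = requests_per_day / 86_400   # ~1.4 requests/second on average
```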
Our server scalability experiments that I talked about at the Vancouver IETF
suggest that we would have a hard time scaling rsyncd to these numbers:
http://www.ietf.org/proceedings/interim/2012/07/27/sidr/slides/slides-interim-2012-sidr-5-0.pdf
= Slide #24.
NOTE: This slide does not show the time it takes for a relying party to
download the repository.
This slide shows the happy case where all RPs are up to date, and they just
check with the server to see if there are any updates. So importantly this does
not include long-running data transfers.. This is not server-grade hardware
(just a Mac mini) but it's useful as an order-of-magnitude indication imo. We
see that the total number of RPs just checking for updates that the server can
handle / second depends linearly on the repository size (it needs to do an O(n)
scan). The total number of concurrent RPs also depends on the repository size.
It appears that some list / index is kept in memory for every connection. Long
story short: we can only process small numbers of RPs per second, and for huge
repositories it's quite trivial to end up with so many concurrent RPs that the
server is pushed over the memory cliff.
It's because of this that I keep going on about:
= We should have a separate delta protocol and notification mechanism and not
rely on rsync for this
= For scalability:
= the hard work (CPU) should be done by the clients, not the server
= it should be possible to offload connections & memory away from the
server (proxies)
= It makes sense to look at http and scalability of existing CDNs for delivery
My gut feeling (yes, it has been involved a lot today) tells me that this
SHOULD scale a lot better.. For example, serving a small update notification
file over http using a CDN: tens of thousands of requests / second, easy..
Data transfer to RPs.. probably
not a whole lot better actually if you need it all -- though using existing
CDNs with a global infrastructure closer to RPs may help here. But if we have a
notification file that points to small deltas and fetching these small files is
cheap this may actually be a big improvement.. So although it may take a while
to do the first sync, *staying* in sync may actually perform a lot better.
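To make the notification-plus-deltas idea a bit more concrete, here is a
hypothetical sketch (the notification layout, field names, and URIs are all my
invention, not a proposed format): the RP fetches a tiny notification file,
compares serials, and fetches only the deltas it is missing, falling back to a
full snapshot when the server no longer has them.

```python
def uris_to_fetch(notification, local_serial):
    """Decide what an RP needs to fetch after reading a notification file.

    Returns [] when already in sync, the missing delta URIs oldest-first
    when the server still has them, or the snapshot URI when there is a gap.
    """
    if local_serial >= notification["serial"]:
        return []  # up to date: the notification was the only fetch needed
    available = {d["serial"]: d["uri"] for d in notification["deltas"]}
    needed = range(local_serial + 1, notification["serial"] + 1)
    if all(s in available for s in needed):
        return [available[s] for s in needed]
    return [notification["snapshot"]]  # gap: do a full (re)sync
```

The point being that the common per-poll cost is one tiny file plus, usually,
nothing -- which is exactly the kind of load a CDN handles easily.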
Well.. all this is thought experiment at this stage. Without a pilot and actual
measurements it's hard to be sure.
Regards
Tim
_______________________________________________
sidr mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/sidr