Hey Tim,

My turn to apologize for the delay. I have taken your very excellent feedback and tried to address it in a new revision of the tech note. That should be coming out soon. Please do let me know if it seems off again!
Eric

On Nov 23, 2012, at 11:26 AM, Tim Bruijnzeels wrote:

> Hi Eric,
>
> Sorry for the late response..
>
> With regards to some of your comments about requirements and ideas for alternative deltas, some responses inline below; other than that, see the thread where both Bryan Weber and I talk about our ideas. This is somewhat separate from the questions about "how a fully deployed RPKI tree would look", "what churn we might see", and "how many connecting RPs, and how frequently". All that can feed into requirements, but it is in essence another discussion, so I suggest we keep them separate.
>
> On Nov 16, 2012, at 10:44 PM, Eric Osterweil <[email protected]> wrote:
>
>> On Nov 16, 2012, at 10:45 AM, Tim Bruijnzeels wrote:
>>
>>> Hi,
>>>
>>> Some more comments on the numbers and formula..
>>>
>>> On Nov 15, 2012, at 5:36 AM, Arturo Servin <[email protected]> wrote:
>>
>> <snip>
>>
>>> Apart from any signed objects it may publish, every CA typically has:
>>> = 1 certificate published by its parent
>>
>> But, as I asked Arturo, would we expect to have a CA from each parent (i.e. each RIR that an org may have allocations from)? While this may often still just be 1, it seems important to note, no? That was one reason we had for breaking it apart. I'm more than willing to believe that I'm wrong here, but I'd like to understand how.
>>
>>> = 1 manifest
>>> = 1 CRL
>>> = 1 GB record (as Arturo said, not widely deployed, but let's throw it in for full deployment)
>>
>> How does the RPKI certify ASN allocations? This is needed to certify router EE certs (they are tied to an ASN, no?). Such an AS cert would add another 1 to the above and make it 5, right? Otherwise, how does a forward signing peer associate itself with the ASN, instead of some prefix? The ASN isn't necessarily allocated by the same RIR as any of the allocations, so IP allocation and ASN allocation are orthogonal, no?
>
> Okay, let me rephrase..
> It gets confusing when talking about organisations as CAs versus CA certificates, i.e. certificates that have the CA bit set and can sign and publish other certificates, signed objects, etc.
>
> We're talking 4 objects for each CA certificate:
> = 1 CA certificate with IPv4, IPv6 and/or ASN resources
> = 1 mft
> = 1 CRL
> = 1 GB
>
> An organisation may have more than one CA certificate:
> = if it has different parents and gets certificates from more than one
> = if it has 'minority space'. For example, if one of our members has resources for which another RIR is the majority holder, then this is not on our normal certificate. We get a certificate with all such resources signed by the other RIR, and then we use this to sign an additional certificate for this member. Certificates can have only one parent, so we cannot merge this.
>
> Regarding the second point: I haven't done detailed analysis on our data yet. This does affect at least a portion of our members, so the average number of CA certs per member organisation is more than 1. I expect that it will not be a *lot* higher, but like I said I haven't analysed the data yet.
>
> Having said that, it seems that the relative contribution of these, in a sense, boilerplate objects to the total count is of a different order than the number of expected ROAs and router certificates.
>
>>> So that's 4 objects.
>>>
>>> During a key roll the CA will have the following additional objects:
>>> = 1 cert published by the parent
>>> = 1 manifest
>>> = 1 CRL
>>>
>>> Making 7 objects. But typically not all CAs roll at the same time.
>>
>> Unless it is an algorithm rollover, and that is expected to last for years (iirc). Then this set would be doubled (plus double the numbers below), right?
>
> I did not consider algorithm rollover, but I think you're right.
>
>>> The number of signed ROAs and Router certificates does,
>>
>> And EE certs.
>> While 1:1 with ROAs, they require additional (very different) processing, especially if you start down the road of HSMs. So, we claimed this additional operational requirement means that even if you double up on the downloads, those are still two separate objects. You have to manage EE rollover, keeping crypto material the same or changing it, depending on details of the ROA, etc. That won't come for free, and (again) needing HSMs makes this a big deal. So, we really felt it was important to call this complexity out by counting each.
>
> I don't understand the use of EE certs and HSMs here. And I don't see how this can significantly raise the object count.
>
> In the RPKI, EE certs can only be used to sign RPKI signed objects, and they are embedded in those objects, not published separately.
>
> All keys in our online system are protected by HSMs. The keys in EE certificates even more so: they are generated in an HSM, used only once, and then forgotten. In the online system the HSM protects against key theft if an attacker somehow gets access to the database/file system. They will not be able to export the keys without access to a quorum of key cards that protect the internal key of the HSM.
>
> If you want to use an *offline* key protected by an HSM, then you would have an additional CA cert (and mft, CRL and GB). This is what we do with our TA at least:
>
> TA (offline) -> signs CA cert for online use -> signs member CAs
>
> The idea being that if the worst happens to our online system, we can rebuild and re-issue using the more secure offline key. The hosted members don't have their own offline key; they are protected by the same one.. Non-hosted CAs may want to use an intermediate offline certificate. This adds some overhead: 4 objects per offline key (CA cert, mft, CRL, GB).
>>> in my opinion, not depend on the number of CAs, but on:
>>> = ROA -> the number of announcements seen in BGP * some aggregation factor (1 / average # of prefixes on one ROA)
>>
>> I pretty much agree with this (as I think the tech-note said). I do, however, have to note that with MOAS, you need multiple ROAs. Small point, but worth stating. :)
>
> Yes, a ROA can have one ASN only, but multiple prefixes.
>
> We aggregate prefixes for the same ASN as much as we can on a single ROA. So far we are managing a factor of around 3 prefixes per ROA. Lacnic is similar. The other RIRs seem to aggregate a bit less at this time.
>
> See here:
> http://certification-stats.ripe.net/
>
>>> = Router certs -> the number of ASNs * the number of keys for each ASN
>>
>> \times the number of eBGP speakers, you mean, right?
>
> Yes. My model assumes that the number of speakers can be related to the number of ASNs.
>
> Randy suggested a number of physical bgpsec-speaking routers, and that they may have two keys. That may well be a better model.
>
>>> So I think a better model would be to say:
>>>
>>> number    object
>>> #CA       CA cert
>>> #CA       MFT
>>> #CA       CRL
>>> #CA       GB
>>
>> Ack, we just estimated this as the # of SIAs, and then varied it from 5 to 42,000.
>
> 5 here would mean the current RIR TAs without any other signed content.
>
> The total object count for this depends on the number of CA certs. See my previous best estimate below.
>
>>> #prefixes * X    ROAs
>>
>> Yeah, but we didn't guesstimate the $X$ value. It sounds like we should, but is there any data we can use to do so?
>>> #ASNs * Y    Router certs
>>>
>>> Ototal = 4 * #CA + #prefixes * X + #ASNs * Y
>>
>> Re: the above, I think this would be
>>
>> O^Total = 4 * #CA + #ASNs + #prefixes + X * #prefixes + Y * #ASNs
>>
>> We had called out the need for an AS EE cert (which was not in the equation you outlined), and we felt it was important not to omit EE certs (if for no other reason than the operational complexity they bring).
>>
>>> As for the numbers.. this is a bit of a guessing game.. we just really don't know at this time. We can take our best guess, but should keep in mind that our best guess is probably off and needs re-evaluation in years to come.
>>
>> 100% agree. That is why we called this a back-of-the-envelope calculation; we are totally seeking feedback, and are absolutely interested in pushing revisions as things evolve.
>>
>>> #CAs
>>>
>>> If this were the total number of current members for all RIRs, this number would be around 40k. However, there are also PI users that are not direct members of the RIRs, and some members will delegate some of their resources further. For reference, I believe that in the RIPE region we have around 25k PI prefixes. I expect that a lot of the organisations that hold these resources will be happy to let a sponsoring RIR member (LIR in our region) manage their ROAs. But not all.. So I think that in a full-deployment world this number may be significantly bigger. If anyone has any ideas on this, please chime in… Going on nothing more than gut feeling, I would say the total could be in the order of:
>>>
>>> = 40k RIR members plus 40k self-managing PI holders / children of members?
>>>
>>> 80k.
>>
>> Really? I had been thinking that this number was tied to the origins, but I can see your logic. It would be great to try and find a way to estimate this, so I'd like to echo your request for anyone with info to chime in.
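[Editor's note: to make the two competing formulas concrete, here is a small Python sketch. The function name, the `count_ee_certs` switch, and the sample inputs are mine; the inputs are the rough guesses from this thread (~80k CA certs, ~500k announced prefixes, ~40k ASNs, X between ~0.3 best-case aggregation and 1 no aggregation), not measured values.]

```python
def total_objects(n_ca, n_prefixes, n_asns, x, y, count_ee_certs=False):
    """Back-of-the-envelope count of published RPKI objects.

    Tim's model:  Ototal  = 4 * #CA + #prefixes * X + #ASNs * Y
    Eric's model (count_ee_certs=True) additionally counts
    #prefixes + #ASNs EE certs, as written in the thread:
                  O^Total = 4 * #CA + #ASNs + #prefixes
                            + X * #prefixes + Y * #ASNs
    """
    total = 4 * n_ca + n_prefixes * x + n_asns * y
    if count_ee_certs:
        total += n_prefixes + n_asns
    return total

# X = mean of 0.3 (RIPE NCC's current aggregation) and 1 (none); Y = 1.
tim_estimate = total_objects(80_000, 500_000, 40_000, x=0.65, y=1)
eric_estimate = total_objects(80_000, 500_000, 40_000, 0.65, 1,
                              count_ee_certs=True)
```

With these inputs the two models give roughly 685k versus 1.2M objects, so the EE-cert accounting question changes the estimate by almost a factor of two.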
>>> #prefixes and 'X'
>>>
>>> The number of announced prefixes is still rising. Currently we are nearing 500k.
>>> Worst case, X is 1, meaning every ASN-prefix combination has its own ROA.
>>>
>>> In reality this number will be lower because we can and do aggregate. But not all implementers will do this. There is something to be said for *not* fate-sharing ROAs for different prefixes from the same ASN. Also, most of our members are fairly small, and they do not do huge numbers of announcements individually. Our current aggregation rate -- and we really try -- is:
>>> 792 ROAs / (2197 IPv4 + 468 IPv6 prefixes) = 0.3 ROA/prefix
>>>
>>> (see: http://certification-stats.ripe.net/)
>>>
>>> For scalability assessment, I am not sure though that a factor of (1/0.3 =) 3 between this level of aggregation, which seems best case, and no aggregation, worst case, is that significant in the big picture. I will use the mean of these two numbers below..
>>
>> Fair enough. I can certainly see the logic here, but if we wound up with a good way to do the estimation, that would be even better, no? :)
>>
>> <snip>
>>
>>> In principle I like the approach of turning this around and defining what an acceptable average delivery rate would be, given the total number of objects and the maximum sync time. But on the other hand this can lead to rejecting any infrastructure we could come up with.. just set the goals high enough and nothing will be enough. So I think we should be cautious here. If there are absolute, objective minimal requirements it would be good to know them. But other than that, it may be best to be pragmatic about it and turn this back.. try to think of other ways and see if they actually perform significantly better..
>>
>> I can respect this concern, but we really do have to deal with any systemic/complexity/operational/etc. facets of the system that we have designed.
>> We need to know how this design is going to behave if we are going to enshrine it. For example, the above calculations, and ours, and any derivative formulation would make revoking a key within an hour seem to be impossible. Is it a day, a week, a month, etc.? That may still be unclear (depending on how we model this), but how can we go any further forward without taking a careful look at this design? We must know if it meets our requirements, and I think measurements like these help tell us how feasible this will all be.
>>
>>> Staying with the document, if we take the current rsync repositories as a starting point to see where we are heading without changes:
>>> = It should be noted that fetch times depend on layout; a hierarchical layout, allowing recursive fetches, saves a lot of overhead (and latency)
>>> = We *do* recursive fetches on all current RIR repositories (yes, we hacked in which base directory to use)
>>> = Testing on my laptop I typically see fetch times of around 20ms per object, not 628ms
>>
>> To be perfectly fair, I just used the numbers I found from the BGPSEC design team's measurements. I was very hesitant to use any particular set of numbers here. As I'm sure you know, there are opinions about rsync: what it looks like under load, with asynchrony, repos' operational uptimes, restarting because of changes in the repository (I think that was something from your preso), etc. I think you likely know (much better than me) that if a repo is under heavy load, churn, or just having an outage, that can cause caches' sync times to suffer. Hence, I really liked getting real operational data from non-lab measurements. I actually really feel that data is quite useful. Consider this: you all (who are running these things) are the experts. If we wind up with ~42,000 repos, they will _not_ all be run as well as you run them.
> I understand your concern about many repos, some of them small and possibly not well managed.
>
> But the number of repos is not 1:1 with the number of organisations acting as CAs.
>
> We currently see 5 different repositories for the hosted solutions provided by the RIRs. Non-hosted is being worked on, but so far not done in the production environment. When this arrives, some non-hosted people will want to do their own publishing; others will use a bigger repository, for example with their RIR, or it may be that 3rd parties will start providing this service. In any case 42k repositories seems a bit much, though more than 5 is very likely.
>
>>> A full fetch based on today's numbers would then take 1M * 20 ms = 20k seconds = 330 minutes = 5.5 hours = 0.25 days.
>>
>> Sorry, but I really think this has some problems. First, the numbers I see in the cited preso are way larger than this, and just for the object sets we see today. So, I have to say that this calculation doesn't seem to jibe with Randy's numbers.
>
> As you can read in my other emails, I agree in general. I don't think rsync can scale to the levels we need.
>
> But the numbers on today's *small* repositories can be improved a lot by making them hierarchical, or if your validator happens to know that it can use a higher directory. We do that last thing; rcynic does not. We made our repository hierarchical though.
>
> I think Randy's numbers may be outdated and represent the totals for our old *flat* rsync repo. This adds a huge amount of latency and setup overhead.
>
> The numbers I got were from just running our validator* and watching the log for lines like:
> 16:51:12,883 INFO Prefetching 'rsync://rpki.ripe.net/repository/'
> 16:52:06,980 INFO Done prefetching for 'rsync://rpki.ripe.net/repository/'
> 16:52:26,189 INFO Finished validating RIPE NCC RPKI Root, fetched 4447 valid objects
>
> So the crypto took 20 seconds.
> The fetching of 4447 objects took 54 seconds => 12 ms/object. For Lacnic: 280 objects in 5 seconds => 18 ms/object, from my laptop on wireless in Amsterdam to South America.
>
> *:
> http://www.ripe.net/lir-services/resource-management/certification/tools-and-resources
>
>>> So this is quite a bit faster. Eric, Danny, how did you get to the numbers in table 3? Like I said: I just got some times from validation runs done on my laptop. We do collect data from our validators 'in the wild' though.. unfortunately the format of this data (we store a *lot*) is such that it will take some time for me to dig out more representative numbers. More time than I have now, but I will try….
>>
>> The citation at the end of the document comes from Randy. It shows (MRT?) graphs with these numbers on them. This doesn't include issues like repos under load from 42,000 caches, DNS, outages, etc. I think the numbers taken from his preso are very charitable, and they are actual measurements. Moreover, his own experiments showed that replicating this all inside a "fairly large scale" experiment took 660 minutes by itself... and that was with just 14,000 objects. This actually totally contradicts the numbers you calculated above. Sorry, I think it just doesn't add up.
>>
>>> Another important factor is the number of RPs that we can expect.. I know that Rob and Randy et al. are looking into ways to let RPs share data and be less reliant on central repository servers. On the other hand, if all ASNs run at least 1 RPKI validation cache that talks to the repositories directly, then we're looking at 40k clients. If they want updates, say just 3 times per day, that's 120k requests per day, so something like 1 - 1.5 per second.
>>
>> Again, I used Randy's numbers, and even his experiments on 14k objects on a small topology show it takes twice as long as your global estimates.
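[Editor's note: the gap between the two estimates can be made explicit with a quick sketch. The helper name is mine; the latencies and object counts are the ones quoted in this thread (20 ms/object from Tim's hierarchical-repo runs; 660 minutes for 14k objects from Randy's experiment).]

```python
def full_sync_hours(n_objects, ms_per_object):
    """Full-sync time = total objects * per-object fetch latency."""
    return n_objects * ms_per_object / 1000 / 3600

# Tim's hierarchical-repo measurement (~20 ms/object):
hours_hierarchical = full_sync_hours(1_000_000, 20)    # ~5.6 hours for 1M objects

# Randy's experiment implies a much higher effective per-object cost:
ms_randy = 660 * 60 * 1000 / 14_000                    # ~2,829 ms/object
hours_randy = full_sync_hours(1_000_000, ms_randy)     # ~786 hours at that rate
```

The two per-object costs differ by more than two orders of magnitude, which is why the thread cannot agree on whether a full sync takes hours or weeks; the flat-versus-hierarchical repo layout is the leading explanation offered above.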
> We are talking about different things here:
> = You're looking at how long it takes for 1 RP to get in sync.
> = I was referring to the number of RPs that a repository can expect to connect per second.
>
>>> This slide shows the happy case where all RPs are up to date, and they just check with the server to see if there are any updates. So, importantly, this does not include long-running data transfers.. This is not server-grade hardware (just a Mac mini), but it's useful as an order-of-magnitude indication imo. We see that the total number of RPs just checking for updates that the server can handle per second depends linearly on the repository size (it needs to do an O(n) scan). The total number of concurrent RPs also depends on the repository size. It appears that some list / index is kept in memory for every connection. Long story short, we can only process small numbers of RPs per second, and it's quite trivial to end up with too many concurrent RPs, pushing the servers to the memory-limited cliff for huge repositories.
>>
>> Yeah, I was wondering about that. It felt like it was beyond my perspective to estimate, so I tried to focus this sizing analysis on a more general systemic view. I totally appreciate the above comment, but maybe we could try to model that in another tech-note? I'm happy to try and help, if you would like.
>
> This is one of the main reasons why I think that rsync won't scale to the needs we can expect in full deployment, even if all RIR repositories are hierarchical and we don't see a lot of non-hosted CAs publishing elsewhere. We cannot expect to keep getting numbers like 20 ms/object.. not without very *huge* investments in setting up some home-grown rsync CDN, spreading instances like very busy root servers over the world. Not something that the non-hosted folk will likely want to do either, btw..
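[Editor's note: for reference, the polling-rate arithmetic mentioned earlier (40k caches checking 3 times per day) works out as below. The helper name is mine; the inputs are the thread's assumptions.]

```python
def requests_per_second(n_clients, polls_per_day):
    """Average repository request rate from periodically polling RPs."""
    return n_clients * polls_per_day / 86_400  # 86,400 seconds per day

rate = requests_per_second(40_000, 3)  # ~1.4 requests/second on average
```

Note this is only an average: since each connection costs the server an O(n) repository scan (per the slide discussion above), synchronized polling bursts would be far more punishing than the mean suggests.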
> Then, regarding non-hosted CAs not publishing in their RIR repository.. I hadn't really thought of this before, but the advantage of the recursive rsync is lost here. Say 5 organisations do non-hosted and they all publish with a 3rd party providing a repository service.. they are siblings. This repo is flat.
>
> So, apart from the other things I list in the document I sent yesterday, I think we have a requirement that the delta protocol is not dependent on the PKI hierarchy.
>
>>> It's because of this that I keep going on about:
>>> = We should have a separate delta protocol and notification mechanism, and not rely on rsync for this
>>> = For scalability:
>>>   = the hard work (CPU) should be done by the clients, not the server
>>>   = it should be possible to offload connections & memory away from the server (proxies)
>>> = It makes sense to look at HTTP and the scalability of existing CDNs for delivery
>>
>> ibid.
>>
>>> My gut feeling (yes, it's been involved a lot today) tells me that this SHOULD scale a lot better.. For example, serving a small update notification file over HTTP using a CDN: 10ks of requests / second, easy.. Data transfer to RPs.. probably not a whole lot better actually if you need it all -- though using existing CDNs with a global infrastructure closer to RPs may help here. But if we have a notification file that points to small deltas, and fetching these small files is cheap, this may actually be a big improvement.. So although it may take a while to do the first sync, *staying* in sync may actually perform a lot better. Well.. all this is a thought experiment at this stage. Without a pilot and actual measurements it's hard to be sure.
>>
>> I really think these are the types of conversations that we should be having. Thank you very much for putting your thoughts here!
>>
>> Eric

_______________________________________________
sidr mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/sidr
