Sorry for the delay... $dayjob...
On Nov 27, 2012, at 6:17 PM, Sriram, Kotikalapudi wrote:

> This email is somewhat long.
> So I also have a tech-note (pdf) version of this (including the table,
> figure, etc.).
> If you prefer to read it in the tech-note format, you may skip the rest of
> this email and click:
> http://www.nist.gov/itl/antd/upload/rpki-rsync-delay-technote.pdf

My comments below are taken from your document (as your comment above advised):

General comment: I think a lot of the supporting information in this document seems to be derived from qualitative estimation. I worry about this kind of ``guesstimation'' from an engineering perspective, and in deference to the people who might have to operate this kind of infrastructure based on guesswork.

Specific comments:

- In Section A you discuss the merit of different sets of performance numbers: It is true that after Randy spent months presenting and supporting those performance numbers at multiple venues, he has now come back on the list with anecdotal numbers that are strangely very different. However, I have to point out that his anecdotal snapshot numbers don't seem to be quite as credible as his _own_ longitudinal study's numbers. If, for example, he were to do a similar longitudinal study and issue new numbers over the same protracted timescale, it would make a lot of sense to use those too. However (from the standpoint of both an engineer and a scientist), snapshot measurements do not constitute the same amount of information as long-term studies. Tim's numbers are also point numbers; I'd like to find a way to factor them in as additional data points, but it is, again, hard to compare apples to oranges (one measurement vs. aggregated performance over a longitudinal study). Remember, Randy's original numbers are the results of measuring over a long period; they include operational eventualities (like outages and performance hiccups, etc.), instead of a self-proclaimed confirmation bias.
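To make the snapshot-vs-longitudinal point concrete, here is a toy sketch (the numbers are invented, purely for illustration): a long-running series that includes a few outage days yields a very different average fetch time than one lucky point measurement.

```python
# Toy illustration (invented numbers): why one snapshot can disagree
# with a longitudinal average once outages and hiccups are included.

# Hypothetical per-day rsync fetch times (seconds) over 30 "days":
# mostly fast days, plus a few outage/hiccup days with long delays.
fetch_times = [6.0] * 26 + [300.0, 450.0, 120.0, 900.0]

snapshot = fetch_times[0]                           # one lucky point measurement
longitudinal = sum(fetch_times) / len(fetch_times)  # mean over the whole period

print(f"snapshot:     {snapshot:.1f} s")      # 6.0 s
print(f"longitudinal: {longitudinal:.1f} s")  # 64.2 s
# The single lucky snapshot says nothing about the average that the
# outage days drag in; the two answers differ by an order of magnitude.
```

The point is not the particular numbers, but that the snapshot carries no information about the tail events a longitudinal study captures.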
- In Section A you present ``best case'' numbers. First, recall that the numbers you've based your best-case performance on are snapshots (at best), and appear to be somewhat anecdotal. Our tech note illustrated that Randy's previous numbers are averages across 10 repositories, AND their average values were actually all reasonably consistent with each other in performance (within the same order of magnitude, modulo 2 smaller repos whose numbers were basically averaged out). Also, I have never seen ``best case'' numbers used as the justification of performance or operational soundness. For example, in complexity theory, one often speaks about worst-case performance (big-O notation). In general, one wants to know how bad things can get. Thus, you might be able to say something like, ``the best this thing could ever possibly run is x, y, z...'' But we have a long way to go before we can say we've quantified the optimal theoretical performance. Again, from complexity theory, if you want to (for example) describe average-case complexity, that is a lot harder to do than a worst-case big-O bound, but it would be very useful, and I would applaud the initiative. Is that what your document is trying to do? I guess, at best, it seems like we might be pursuing opposite ends of the spectrum? I was trying to tell people that we have to be ready for operational issues, to provide a worst-case estimate. It seems you were trying to illustrate that the system's peak performance is still quite slow. Though, I admit that I am having trouble following your methodology (perhaps your goal is loftier than mine).

- Later in Section A, at the top of page 2: I (obviously) see the arithmetic effect of lowering the multiple, but we ought to ask _why_ the other numbers are so high. What do _you_ see as the difference (since both sets were taken by the same person)? What I see is that the original numbers include real operations, real latencies, etc., and they are taken over a long period.
We (as engineers) have to respect that. By the way, these are _not_ my numbers. You likely ought to cite the work you are referencing properly, since calling these ``Eric's numbers'' isn't quite right. A citation lets readers find the origin of cited information properly. ;) This is, I think, especially true in this case, where the actual presenter of the original numbers is the _same_person_ that is now trying to back away from them. This is, scientifically speaking, an important point.

- Second para on page 2: ``normal (steady-state) operation'' citation needed. ``rather rarely'' citation needed. ``would be more like seconds or minutes'' citation needed. These speculations sound like biased views, but my concerns might be totally debunked with proper citations. As someone on the list pointed out, this is engineering. We need evidence and measurements, and _not_ intuition, guesswork, and experimental bias.

- Section B, third para on page 2: ``84% of all ASes are stub ... 16% are non-stub'' citation needed. What is this statement based on? ``The stub ASes are likely to only run simplex bgpsec and they would likely contract out publication'' citation needed. Also, each hosted instance that is _not_ flat still needs infrastructure objects (GB, Mft, CA, etc.) plus a DNS domain name. Another noteworthy point about your hosted model ( ;) ) is that hosted models remove resource holders' ability to manage their own information directly. That is, if I have you host my info, and you have my private key, you likely cannot give that key to me for me to make my own operational changes (if you have it in an HSM, that private key cannot be shared). So, I (as a resource holder) cannot author changes to my (say) ROA, or create new router EE certs, unless _YOU_ do it for me.
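To make the HSM constraint concrete, here is a minimal sketch (all names are hypothetical; this is not any real HSM API): the hosting party can produce signatures on the holder's behalf, but a non-exportable private key can never be handed back to the holder.

```python
# Minimal sketch (hypothetical names, NOT a real HSM API) of the
# hosted-model constraint: the host can sign with a key it holds,
# but a non-exportable private key never leaves the device.
import hashlib

class ToyHSM:
    def __init__(self):
        self._keys = {}  # key_id -> secret material (never returned)

    def generate_key(self, key_id):
        # Key material is created inside the "HSM"; only a handle escapes.
        self._keys[key_id] = hashlib.sha256(key_id.encode()).digest()
        return key_id

    def sign(self, key_id, message):
        # The host can produce signatures (e.g. over a ROA) for the holder...
        return hashlib.sha256(self._keys[key_id] + message).hexdigest()

    def export_private_key(self, key_id):
        # ...but cannot hand the key material back to the resource holder.
        raise PermissionError("private key is non-exportable")

hsm = ToyHSM()  # operated by the hosting party, not the resource holder
handle = hsm.generate_key("holder-router-ee")

# The resource holder must ask the host for every operation:
sig = hsm.sign(handle, b"new ROA for a new feasible origin")

# And cannot take the key away to make changes independently:
try:
    hsm.export_private_key(handle)
except PermissionError:
    print("holder cannot obtain the private key; the host must act for them")
```

The signature scheme here is obviously fake (a hash, not real crypto); the only point being illustrated is the control-flow consequence: every change routes through whoever holds the key.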
So, right now, that would require conversations like, ``hey ARIN, can you generate a new ROA for me since I have a new feasible origin?'' (no big deal), but in the future that would be, ``hey ARIN, can you create a new EE cert for my router, and send me both the pub/priv key pair so I can configure a new router (or $n$ new routers)?'' I'd hazard that _this_ is a much bigger deal, especially with the need for a private key exchange. One more complexity to note. ;)

- Section C: You comment that there could be a single router key for an entire AS; that is simply _not_ operationally feasible. Any operator can feel free to open this discussion again, but I think we've covered that ground here quite a few times. In fact, I think Randy was the most recent person to hit that nail on the head (re: his 2 million router keys comment).

- Section C: Your estimates on routers per AS... Wow... So, I see some other people have taken you to task on those estimates. I'll simply add that I think your numbers are wrong, and even Randy pointed to 1 million routers as a good starting point.

- Section D: You mention that you (``We?'') had trouble following the notion that updates will only be propagated to the whole system at about 2*T (where T is the update period). I am struggling to find a citation now, but will keep looking. In the meantime, here's a thought exercise that illustrates the point: Suppose it takes you $n$ seconds to crawl the rpki. If you start polling at $t$, but at $t + \epsilon$ the first cache updates its entries, then you will have to wait until $t + n$ before you get to _start_ crawling the rpki again. However, once you have picked up the new entry, you should expect that others' caching software will have to finish its own crawl before your freshly updated data has reached them. Why wait? Well, if you have a hierarchical cache system (like the one Randy describes in his preso, and like any large multinational AS would likely need), then the data authorities (i.e., not the ones caching) cannot be sure _everyone_ in the Internet has the fresh values until caches like this have propagated out. Thus, $t + n + n$: this makes your response time $2 \times n$. I will find a citation in the literature and send it out.

In the meantime, we have been updating our tech note with a lot of great feedback and will be releasing a new revision soon.

Eric

_______________________________________________
sidr mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/sidr
