At Fri, 21 Mar 2014 17:23:05 +0100, Tim Bruijnzeels wrote:
>
> Sorry for the late reply.

Likewise :)

> > 2) We added a few timing parameters to the End Of Data PDU. These,
> > like the Serial Number mechanism, are lifted almost verbatim from
> > the DNS zone transfer protocol. We left them out of RFC 6810, but
> > subsequent exploration of some of the corner cases of the RPKI
> > Router protocol convinced us that leaving these timing parameters
> > out of the protocol had been a mistake.
>
> Can you elaborate a bit on why this had been a mistake?

Back around Halloween, Randy and I received an off-list message asking
about the end of RFC 6810 section 8. Our correspondent is welcome to
jump in here and speak for himself, but he's not one of the usual IETF
suspects and may not be on list, so I'll attempt to channel him for
the moment.

He pointed out that RFC 6810 is inconsistent in its advice about what
the router should do if it loses connectivity to the cache for more
than two hours: in one place we say that the router SHOULD delete the
data it got from the cache, in another place we say that the router
should retain the data. Furthermore, RFC 6810 is unclear on how long
the router should retain the data (if it does), and, if so, what the
expiration period should be.

The DNS zone transfer protocol, on which the rpki-rtr protocol is
based, conveys explicit answers to all of these questions to the
router, in the form of the timing parameters included in the SOA
record. Upon reading this question, I concluded that leaving the rest
of these out of the rpki-rtr protocol had been a mistake.

Granted, this just pushes some of the decisions back from the router
to the cache, but the cache is (probably) in a better position to know
the right answers, or, at least, plausible harmless answers.

That said: the proposed defaults in the current draft, while not quite
picked out of thin air, were not the result of years of detailed
analysis, so I'm open to suggestions for better values.

> With regards to the minimum allowed value for the refresh.
> Why is it 120 seconds? That seems rather long to me. I understand that
> currently updates do not happen that often and propagate that
> quickly, but that may change. And if it does, and the RP tool sees
> more regular changes and it is confident that it can deal with load,
> then I think it would be better if it could use a lower number
> without having to revisit this text.

Part of the point of the REFRESH and RETRY parameters is to allow the
cache to specify how much beating it is willing to accept from the
router(s). In that context, telling the router to check for updates
every minute seems a bit excessive.

Keep in mind that the current recommendation is to poll hourly, and
that the real mechanism for pushing updates out quickly is the Notify
PDU (which we also lifted almost verbatim from DNS, no surprise).

There may be a better way to phrase the text in the draft, but the
REFRESH parameter is intended to control how often the router polls in
the absence of a Notify PDU. Assuming that the cache generates Notify
PDUs, this kind of polling is almost unnecessary; we included it
because Notify PDUs are just a hint to the router that now would be a
good time to poll, which the router is free to ignore. The REFRESH
value tells the router that it really should consider polling at that
interval if it hasn't been responding to Notify PDUs.

As a demented counterexample, I guess one could build a router which
refuses to process Notify PDUs at all but which wants to be kept up to
date so badly that it wants to poll every five seconds. I will
confess to not having a lot of sympathy with such a design.

All that said, if you have a proposal for a better minimum, holler.

> With regards to the retry and expire intervals: I am not sure what I
> (as cache) should tell the router, so I would probably just go for
> the defaults.

That's fine, if you have no better idea, although you might consider
making them configurable by the cache operator.
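
For concreteness, here is a rough sketch of the router-side behaviour
that REFRESH, RETRY, and EXPIRE are meant to drive. It is only an
illustration, not text from the draft: the names (Timers, should_poll,
data_expired) are made up for this example, and the choice to poll
immediately on a Notify PDU is an assumption, since the router is free
to ignore the hint.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Timers:
    """Timing parameters as advertised by one cache (all in seconds)."""
    refresh: int  # how often to poll when no Notify PDU has arrived
    retry: int    # how long to wait after a failed poll before retrying
    expire: int   # how long data may be kept without a successful refresh


def next_poll_time(last_success: int, last_failure: Optional[int],
                   timers: Timers) -> int:
    """Earliest time at which the router should poll this cache again.

    last_failure is None when the most recent attempt succeeded.
    """
    if last_failure is not None and last_failure > last_success:
        # Most recent attempt failed: back off by RETRY, not REFRESH.
        return last_failure + timers.retry
    return last_success + timers.refresh


def should_poll(now: int, last_success: int, last_failure: Optional[int],
                timers: Timers, notified: bool = False) -> bool:
    """Decide whether to poll now.

    Assumption of this sketch: a Notify PDU triggers an immediate poll.
    """
    if notified:
        return True
    return now >= next_poll_time(last_success, last_failure, timers)


def data_expired(now: int, last_success: int, timers: Timers) -> bool:
    """True once the cache's advertised EXPIRE period has elapsed."""
    return now - last_success >= timers.expire


# Illustrative values only: hourly refresh as in the current
# recommendation; retry and expire are placeholders, not the
# draft's defaults.
example = Timers(refresh=3600, retry=600, expire=7200)
```

With these example values, a router that last synced at t=0 and failed
a poll at t=3600 would try again at t=4200, and would consider the
data stale at t=7200 unless a refresh had succeeded in the meantime.
Since the timers come from the cache, a router talking to several
caches would keep one Timers value per cache.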

> Reading them they look to me more like responsibilities of the
> client. But please let me know if I am missing something below..

The thing you may be missing is that these are likely to be tied to
things like loading on your cache (e.g., how many routers is this poor
little cache serving, and are you running the cache on a TRS-80), how
often your cache fetches data from the RPKI, and what kinds of
certificate expiration times people are using to populate the RPKI.

Granted, the cache may not be much smarter about some of this than the
router is, but, pretty much by definition, the cache can't possibly be
dumber about these issues than the router is.

> Did you intend for the expire interval to be something that an
> operator can configure in the cache, and have it automatically
> propagated to the client routers? I think I would prefer to
> configure it on the routers.

Says the author of a cache implementation :)

What I got from the original query that started me down this path is
that the router guys don't really know either. Nothing you say can
require them to hold data longer than they want to do so, but you, as
the cache, ought to be able to tell them that it's a really bad idea
to hold data longer than X.

You, as the cache, have some idea of what your own update frequency is
and can, if you choose, measure what kind of churn rate you're seeing,
from which you can determine (at least as well as the router can) how
long it is before N percent of the data have changed.

Granted, you might not bother to do that; you might instead choose to
hardwire an advertised expiration time measured in years. But since
the cache is stripping off all the certificate expiration timestamps,
the cache by definition knows more about this than the router does, so
it seems advisable to have some way for the cache to tell the router
when old data should go away.

> With regards to the retry interval, I am not sure if there should be
> any limit.
> Why can't the router decide, and what it is the synergy
> with having multiple cache configured in a router described in
> section 10?

RETRY is one cache telling the routers that use it how often it's
willing to be beaten up when there's a problem refreshing data.
Different cache, (potentially) different RETRY.

> I think I may prefer to not limit the router's retries at all, and
> recommend operators to monitor their cache. Do we need a standard
> query type for the latter? At least to get the cache's own opinion
> on its health that could be integrated with monitoring?

I don't understand how this would work well enough to have an informed
opinion, but it sounds more complicated than DNS-like RETRY. Where's
the gain?

Also, "recommend operators to monitor their cache" sounds like you're
talking about human beings. Certainly there are some in the NOC, but
do we really need to drag them in whenever a router fails to sync with
its cache(s)?

Keep in mind that, for all of these parameters, the cache's
responsibility is probably just to pass along settings that were
configured by the human who installed the cache software (which may in
turn just be passing along the default values from the spec). The
point of the exercise here is to make these things explicit, visible,
and configurable by the operator, rather than leaving them in the
realm of things which the router implementor may or may not have made
configurable in some implementation-specific way.

_______________________________________________
sidr mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/sidr
