On 12/6/12 10:03 PM, "Eric Osterweil" <[email protected]> wrote:
> On Dec 6, 2012, at 7:00 PM, Montgomery, Douglas wrote:
>
>> On 12/6/12 5:57 PM, "Eric Osterweil" <[email protected]> wrote:
>>
>>> Uh... big difference. DNSSEC doesn't require you to care about anything
>>> before you need it (on demand). RPKI is prefetching... I can't really
>>> outline the architectural difference better than that.
>>
>> So this seems to be about sub-system behavior in various transient states.
>> Cold start of a relying party, when there are no other "hot" instances in
>> contact. While some will argue that one can engineer redundant systems,
>> and smart RPs (e.g., that initialize with last running check-pointed state
>> at boot), still I will admit that one could envision firing up a new RP,
>> just out of the box, for the first time and it taking time to load its
>> initial state.
>
> I was hoping we could all see from my quoted text above that this latest
> discussion is about the _architectural_ difference between the on-demand
> soft-state DNS system, and the prefetching replicated state machine of
> RPKI. These two are fundamentally very different architectural models.
> Your comments about boot states are interesting, but somewhat off topic
> to this post, imho.

IMHO they do address the architectural issue, by pointing out that while the
two are architecturally different in the abstract, for the purpose for which
they are being designed/discussed, this architectural difference makes very
little difference in performance.

>> Won't a demand-driven system experience something very similar when it is
>> first fired up and a full table dump comes across the wire in BGP?
>
> Ask any of the operators on this list how they feel about full table
> dumps and routers refreshing table state, configs, etc when coming
> online. If you're saying that RPKI is just like that, I think you just
> sank your position.
No, you completely missed my point. What I am saying is that when a
demand-based system (say DNS) first comes online with no cached state,
serving a router with a full BGP table, it will need to do O(500K) queries
immediately. Let's say DNS is the model for a query-based system: O(10s to
100s of msec) per query? Let's look at a range for a reverse DNS query (with
DNSSEC?) of from 10 msec (very optimistic) to 500 msec. That means a
cold-starting DNS-based system would take between 1.4 hours and 69 hours to
gather enough origin mappings to support a running DFZ router.

Note that it is dangerous not to have all the state necessary to verify a
full RIB because of "prefer valid"-like policies, where the relative validity
state of multiple entries in the RIB matters. I.e., you had better stall the
BGP decision process until you have the validation state for all entries.

So if the query model takes 69 hours to gain enough state to be useful, why
is it architecturally interesting that it did that through 500K individual
queries instead of batch downloads?

>> If a demand-driven system wasn't smart enough to use a hot standby with
>> significant cached state, it too will suffer the latency of pulling a
>> significant portion of the data it needs instantly.
>
> Again, we were discussing the architectural difference, not polishing the
> chrome on the titanic. :)

Not sure the pithy digs with smilies help the clarity of these discussions
... But OK, I will give it a shot... Just because the Spruce Goose's
architecture did not sink, it is not clear to me it was any better suited
for the job it was designed for than the Titanic. :^) .... Nah, I still
don't think these help.

>> While I see the architectural differences in those two, it is not clear
>> to me that the end result to a running BGP system that uses them is all
>> that different.
>
> Hmm.. Well, re: my comment above, we have very different opinions.
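The cold-start arithmetic above can be sketched in a few lines. The 500K
table size and the 10-500 msec per-query latencies are the assumptions from
this thread (serial queries, no parallelism), not measurements:

```python
# Back-of-envelope model of the cold-start argument: a relying party with no
# cached state must resolve validation state for one prefix/origin pair per
# RIB entry before "prefer valid" policies can safely run.

FULL_TABLE = 500_000  # O(500K) prefix/origin pairs in a full DFZ RIB (thread's figure)

def cold_start_hours(per_query_sec: float, table_size: int = FULL_TABLE) -> float:
    """Hours to serially resolve validation state for a full table."""
    return table_size * per_query_sec / 3600.0

optimistic = cold_start_hours(0.010)   # 10 msec per query (very optimistic)
pessimistic = cold_start_hours(0.500)  # 500 msec per query

print(f"optimistic:  {optimistic:.1f} h")   # ~1.4 h
print(f"pessimistic: {pessimistic:.1f} h")  # ~69.4 h
```

Either way the system is unusable for hours after a cold boot, which is the
point: the batch-vs-query distinction does not change the bound.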
> I (literally) defer to operators to answer my above question, and if I'm
> rebuffed on that, then so be it.

See my comment above. I suspect their behavior relative to the architectural
differences you point out ... will be of no operational significance in the
application they are designed to support.

>> Once either approach has achieved steady state, assuming there was some
>> caching done in the demand-based system, if the cache holding time of
>> demand queries was the same as the polling interval of the RP, do you
>> think the responsiveness to a change of authoritative info is all that
>> different?
>
> I honestly have to say that I don't know how I could tell at this point.
> I think it would be good for someone to put forward a model, do some
> simulation/measurements and present some analysis. I don't know when BGP
> is in ``steady state'' or what you might mean by that, but it seems like
> a good time for us to be more quantitative and less qualitative.

I was talking about when the systems designed to support the distribution of
authorization information (e.g., RP/RPKI or some DNS-based system?) were in
steady state ... i.e., they have booted up and done their initial data loads.

>> Or do you not assume there will be any caching in a demand-based system?
>> And if so, would you be concerned about the peers * full_table number of
>> queries that would result from a router reboot?
>
> I honestly don't understand this last part, but I'm hoping my comments
> and questions (above) address it?

If you don't locally cache the results of querying some system for each
prefix/origin pair from a neighbor ... you will have to multiply the time to
do 500K such lookups by each peering session that gives you a full table. If
you do cache them for some amount of time, set the polling interval of a
batch pull-based system to that same time span. Now, how fast does each
react to changes in the authoritative data?
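The peers * full_table point, and the TTL-vs-polling-interval symmetry, can
be made concrete with a small sketch. The peer count and refresh interval
are illustrative assumptions, not figures from the thread:

```python
# Sketch of the caching argument: without a local cache, a rebooting router
# issues one lookup per (peer, prefix/origin) pair; with a cache whose TTL
# equals a pull-based RP's polling interval, both designs see a change to
# authoritative data with the same worst-case delay.

FULL_TABLE = 500_000  # thread's O(500K) figure for a full DFZ table

def uncached_lookups(peers: int, table_size: int = FULL_TABLE) -> int:
    """Lookups on reboot when nothing is cached: one per peer per entry."""
    return peers * table_size

def worst_case_staleness(refresh_interval_sec: int) -> int:
    """Max age of validation state; refresh_interval_sec plays the role of
    either a demand-cache TTL or a batch poller's polling interval, so the
    bound is identical for both models."""
    return refresh_interval_sec

print(uncached_lookups(peers=4))    # 4 peers * 500K entries = 2 million lookups
print(worst_case_staleness(3600))   # one-hour TTL or one-hour poll: same bound
```

With the two refresh knobs set equal, neither architecture reacts faster to
a change in the authoritative data, which is the question posed above.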
> Eric

Dougm

_______________________________________________
sidr mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/sidr
