Rob,

Thanks for a clearer answer than Randy's. 

On Wed, Aug 04, 2010 at 01:08:08PM -0400, Rob Austein wrote:
> If it really cares whether the caches are exactly in sync, it could
> compare the results, but I doubt that doing so would be particularly
> useful.  This protocol is presenting a live view of a loosely
> consistent distributed database, there is no way to force two caches
> into lock step that does not involve fate sharing, which would defeat
> the point of using redundant caches.

The issue is less about keeping a pair of caches in lock-step.  My concern
is what happens when a cache falls out of step with the policy server used
to populate it.  In that case, the protocol will move along happily and
you'll see updates from one cache server, but not the other.

What I had in mind was a simple message from the cache server reporting the
database version number it last received from the policy server, along with
the time of that last update.  This would let a router notice when a cache
has become radically desynchronized from its redundantly configured peer.
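A minimal Python sketch of the router-side check such a message could enable.  It assumes a hypothetical CacheStatus report (no such PDU exists in the protocol) carrying the policy-server version and the time of the last update.  Only the timestamps are compared, since version numbers from different policy servers may not be comparable; max_skew is an illustrative, operator-chosen threshold.

```python
from dataclasses import dataclass


@dataclass
class CacheStatus:
    """Hypothetical status report a cache server might send to a router.

    Not part of the protocol; it models the proposal of reporting the
    policy-server version and the time of the last update.
    """
    name: str
    policy_version: int  # carried for logging only; not compared below
    last_update: float   # seconds since the epoch of the last successful update


def lagging_caches(reports, max_skew):
    """Return the names of caches whose last update trails the freshest
    cache by more than max_skew seconds."""
    if not reports:
        return []
    freshest = max(r.last_update for r in reports)
    return sorted(r.name for r in reports if freshest - r.last_update > max_skew)


reports = [
    CacheStatus("A", policy_version=41, last_update=1_000.0),
    CacheStatus("B", policy_version=57, last_update=9_000.0),
]
print(lagging_caches(reports, max_skew=3_600))  # ['A']
```

A router could log or alarm on a non-empty result rather than acting on it automatically, which keeps the "just trust the cache" model intact.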

> There is no concept equivalent to a DNS zone in the RPKI system.
> 
> Given that the state of each RPKI cache is a composite of the states
> of each of the however many objects it might be tracking (best guess
> for fully deployed system is tens or hundreds of thousands of objects,
> collected from an arbitrary number of publication points worldwide),
> the answer to your question is probably "mu", even before factoring in
> fat fingers problems such as clock or trust anchor skew.

I think I see my confusion.  I wouldn't have expected a provider to operate
multiple pieces of infrastructure to gather RPKI objects and generate policy
from them.  Since consistency of policy is important, this scenario didn't
occur to me.

Given the assumption above of more than one RPKI object collector and policy
generator, I can see how a domain-wide policy serial number would be
problematic.  Thanks for the clarification.

> To a first approximation, a router using this protocol just has to
> trust the cache.  There's nothing (other than CPU and storage hits) to
> prevent a router that doesn't trust caches from maintaining its own,
> but the point of this protocol is to get something we can deploy
> cheaply on current generation hardware.

I agree that the protocol accomplishes this nicely.

> Tracking multiple caches in
> case one of them dies is probably wise; attempting to get clever about
> hopping from one to another in an attempt to be smarter than any of
> your caches is probably beyond the intended scope of this protocol.

I'm still concerned about redundant cache integrity, and I think the WG
should spend a few cycles considering reasonable solutions that are
inexpensive to implement.  As I pointed out to Randy, a single point of
failure for such a stateful mechanism is problematic.

Random example since I'm here:

We learn 10/8 le 24 from cache servers A and B.  The refcount is thus 2.
A becomes desynchronized for some long period of time.
B withdraws 10/8 le 24.

The remaining refcount stays at 1 for as long as A stays desynchronized.
Thus a previously authorized prefix that should now be unauthorized remains
in the system.

Bounding the time that such a thing stays in the system would be my goal.
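A minimal Python sketch of one way to bound that time, assuming a hypothetical per-cache staleness limit (max_age) on the router.  RefcountedTable and its methods are illustrative names, not anything from the protocol.  The prefix 10/8 le 24 is modeled as ("10.0.0.0/8", 24).  An entry counts only caches that announced it and have been heard from within max_age seconds.

```python
class RefcountedTable:
    """Hypothetical router-side table of (prefix, max_len) entries learned
    from several caches, with a staleness bound on each cache."""

    def __init__(self, max_age):
        self.max_age = max_age
        self.announcers = {}   # (prefix, max_len) -> set of cache names
        self.last_update = {}  # cache name -> time we last heard from it

    def heard_from(self, cache, now):
        # Any update (or keepalive) from a cache refreshes its timestamp.
        self.last_update[cache] = now

    def announce(self, cache, prefix, max_len, now):
        self.announcers.setdefault((prefix, max_len), set()).add(cache)
        self.heard_from(cache, now)

    def withdraw(self, cache, prefix, max_len, now):
        self.announcers.get((prefix, max_len), set()).discard(cache)
        self.heard_from(cache, now)

    def refcount(self, prefix, max_len, now):
        """Count only announcing caches that are not stale."""
        caches = self.announcers.get((prefix, max_len), set())
        return sum(1 for c in caches
                   if now - self.last_update.get(c, float("-inf")) <= self.max_age)

    def is_authorized(self, prefix, max_len, now):
        return self.refcount(prefix, max_len, now) > 0


# Replay of the example above, with a hypothetical one-hour bound.
t = RefcountedTable(max_age=3_600)
t.announce("A", "10.0.0.0/8", 24, now=0)
t.announce("B", "10.0.0.0/8", 24, now=0)
print(t.refcount("10.0.0.0/8", 24, now=0))      # 2
t.withdraw("B", "10.0.0.0/8", 24, now=100)      # A has gone silent
print(t.refcount("10.0.0.0/8", 24, now=100))    # 1
print(t.refcount("10.0.0.0/8", 24, now=4_000))  # 0, A is now stale
```

The trade-off is that a cache which is merely quiet, rather than desynchronized, also ages out, so max_age would need to exceed the expected update or keepalive interval.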

-- Jeff
_______________________________________________
sidr mailing list
[email protected]
https://www.ietf.org/mailman/listinfo/sidr
