Re: [sidr] WGLC for draft-ietf-sidr-keyroll-04

Geoff Huston Thu, 18 Nov 2010 21:21:53 -0800

Thank you for these review comments.

On 19/11/2010, at 12:33 PM, Osterweil, Eric wrote:


> Hello everyone,
> 
> I have reviewed this draft and I have several questions and comments.  As a
> relative newcomer to this list, I have not at all ruled out the possibility
> that some comments and questions below are the result of my own naivety.
> However, I have done the obvious background reading to try and minimize the
> incidence of this.  :-P
> 
> As a reader, Section 2 para 4 kind of confuses me.  There is a vague
> discussion of "a period of time" and "'staging period,'" but neither is
> explained until much later in the draft.  My suggestion is to be more
> explicit about _why_ the draft has the need for this waiting period.  From
> reading the entire draft, it later seems that this is to ensure that the new
> CA is propagated widely enough before going live?  Could the draft just say
> something along those lines?


The draft adopts what I consider to be a conventional approach of describing
the algorithm in terms of an overall summary, then describing the algorithm in
detail.

Given that it's a technical document describing a complex process, I am not sure
that the document could necessarily be understood in a single reading.
Doubtless an implementor would read through it multiple times in order to ensure
that they fully understand the implications of this process for relying parties
as well as for CAs.

In terms of explaining motivation for the procedure and the overall interaction 
of the parties here, this task has been undertaken by the sidr architecture 
document, and this document defers to the architecture for the detail on _why_ 
specific actions are required, as key roll itself is not a process that exists
in isolation from the other components of the RPKI.

> 
> Later in the Section there is a 6-step rollover process:
> - In step 2 I am a little worried about the notion that no rollover is
> possible until the issuer has issued a request and received a responded with
> new certificate.  I can clearly see how clean this model is, but practically
> speaking, doesn't that mean that an authority cannot take action in response
> to an event (such as a key compromise) until someone else's operational
> cycles are met?


Perhaps a thorough read of the architecture document and the CP document would
answer this. Given that the subject is requesting an issuer to mint a new
certificate, the conventional process in PKIs is for the subject to
request a certificate and await the outcome of the issuer's operational cycles
that result in the certificate being issued. Indeed it's somewhat difficult to
understand what an alternate process may look like.

On the other hand it is also noted that the staging period is a minimum of 24 
hours, but no maximum is stated. If a CA wishes to use a procedure of preparing 
a new CA certificate well in advance and holding it ready in preparation for a 
rollover, then the CA entity can perform an emergency key rollover that relates 
to the issuance and publication of subordinate products with the NEW CA key 
within timing constraints that are completely under the CA entity's control.

However, such operational practices are up to the CA concerned. This document
describes in normative language that part of the algorithm that is the
essential component required for safe interoperation in the context of
the RPKI.



> - Related to this, I am also a little confused about the "CPS."  I'm
> wondering how realistic it is for someone to predict their turnaround time
> on something like this.  What if the prediction is very off (sometimes,
> occasionally, etc)?  To be clear, my trepidation on this doesn't come from
> ambiguity of the concept (it does seem quite clean), but from some
> observations of complications in the DNSSEC rollout.  When implementing some
> simple ideas, things have sometimes become very complicated. :)


I am unsure what text, if any, you are proposing here. The document is clear
that the CA MUST perform the steps in the order given, and, unless
specified otherwise, each step SHOULD be performed without any intervening
delay, and that the process MUST be run through to completion.

The staging period is specified as: This duration of the Staging Period is
determined by the CA, but it MUST be no less than 24 hours.  The Staging Period
is intended to afford an opportunity for all RPs to download the NEW CA
certificate, prior to publication of certificates.

I'm unsure where the process complication arises in this description of time 
elements.
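
As a rough illustration only, the timing rule quoted above could be checked
along the following lines. This is a sketch, not text from the draft: the names
MIN_STAGING_PERIOD and may_publish_products are hypothetical, and it assumes
the Staging Period is measured from the moment the NEW CA certificate is
published.

```python
from datetime import datetime, timedelta

# Minimum from the quoted text: the Staging Period MUST be no less than 24 hours.
MIN_STAGING_PERIOD = timedelta(hours=24)


def may_publish_products(new_ca_cert_published_at, now,
                         staging_period=MIN_STAGING_PERIOD):
    """Return True once the CA-chosen Staging Period has elapsed.

    Hypothetical helper: assumes the period starts when the NEW CA
    certificate was published. The CA may choose a longer period, but
    never one shorter than 24 hours.
    """
    if staging_period < MIN_STAGING_PERIOD:
        raise ValueError("Staging Period must be at least 24 hours")
    return now - new_ca_cert_published_at >= staging_period
```

The only time element the CA has to choose is the length of the period; no
maximum is imposed, so a longer value can be passed as staging_period.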

> - Also in step 2 I don't understand why the issues must select a distinct
> subject name?  Also, is there an expectation that this is globally distinct?

There was an extended discussion on the uniqueness of subject names on this 
mailing list. I refer you to the list's archives, and will not attempt to 
reproduce this conversation here.

This discussion also included consideration of the scope of uniqueness of 
names, and again I refer you to the mailing list archive. The topic of subject 
name uniqueness is also considered in the res-cert draft, both in the 
specification section and in the security considerations section.



> - In step 3, the second bullet includes the statement "As quickly as
> possible..."  I think it might be more instructive to explain some tradeoffs
> of various timing decisions?  As a reader, I don't clearly understand why
> this should be done quickly, or what the draft considers "quickly."  Would
> it be possible to explain why delaying causes trouble, and at what point
> delaying causes that trouble, etc?


Again there is a certain amount of interlocking of documents here, and the 
motivation for this is contained in the manifest draft. This document is part 
of a set of documents that the WG chose to split into distinct parts. If you 
are saying that choice was inappropriate then you should review the WG archives 
and make your case, but given that we have a number of documents that each 
describe a particular part of the overall RPKI function, then each document 
does not attempt to reproduce the rationale and material contained in the 
others. The document is saying that there are a number of distinct actions that 
should occur "together". Neither the world, nor the RPKI, will break if this 
does not happen, but RPs will encounter inconsistencies in the distributed 
repository state. Such inconsistencies should be avoided, and the optimal way 
that they can be avoided is by synchronising these distinct actions as much as 
possible. But if they do appear, even in a transient manner, then
nothing fatal ensues; it's just more work on the part of the RP to resolve
them.


> - Also in step 3, I believe the third bullet explains that the new CA needs
> to create a temporary EE cert solely to seed the new CA's manifest, and then
> below the private portion of that EE's key [must|should|may] be destroyed.
> I don't think I understand why so many moving parts exist here.  Is this
> because the manifest cannot begin by being empty?

It is not empty: there is a CRL.

>  If so, why not, and why
> not define something that can securely illustrate an empty manifest?

But there is a CRL. It is not empty.

>  I can
> clearly see how this is intended to work, but not why it needs so many
> moving parts.

For motivation and rationale for this procedure you are referred to the arch and
cp drafts. This document does not replicate the material contained there.


>  In its current form, it seems like there is a fair amount of
> room for operational mistakes to wreak some havoc?

Perhaps you should send proposed procedure and text that addresses your 
concerns.  In and of itself this is not a comment that has a tangible response 
in terms of revising the document to address specific concerns.


>  For example, what if the
> private portion of the EE key is compromised?  Is this a problem?

Key compromise of an active key is always a problem. I'm not sure that there is
any other response, but at the same time I'm not sure how, in a public/private
keyed system, the compromise of private keys could be anything but a
problem.


> - In step 4, the text says "all RPs."  Does this mean all RPs in the
> Internet, or some other subset, and if the former what order of magnitude
> are people envisioning for the number of global RPs?


Please see the architecture draft.


> - Also in step 4, I was wondering what happens if the load of keeping the
> current CA and the new CA up to date causes them to fall behind?  I think
> that (assuming the current and new CAs are on the same directory repo) the
> text essentially requires the directory repo to do double duty (i.e. double
> its load) during a rollover.  What if it cannot keep up, it will be in a
> rollover state forever, right?

The draft states: "during the Staging Period, the NEW CA SHOULD re-issue, **but
not publish**, all of the products that were issued under the CURRENT CA".
There is no "double duty" in the repository publication function at this point.


> - A somewhat related concern in step 4, I think revocation gets kind of
> hairy too: if a cert is issued during a rollover, and then revoked before
> the rollover finishes, I don't think the text explains how this should be
> represented in the new CA.  The text says new certs should go in the new CA,
> but it also says no revocations should go in the new CA's CRL.  The staging
> period exists so that RPs can be learning of the new CA, right? So they
> would see the new cert, but no revocation?

"during the Staging Period, the NEW CA SHOULD re-issue, **but not publish**, 
all of the products that were issued under the CURRENT CA"

Given that no subordinate products of the NEW CA are being published at this
point, the intent of the staging period is to allow RPs to pick up the NEW CA
cert itself, and the associated manifest and CRL. Nothing more.
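
To make the distinction concrete, here is a minimal sketch of what RPs can and
cannot see during the Staging Period. It is an illustration under stated
assumptions, not a procedure taken from the draft: the NewCaState class, the
two functions and the object names such as "new-ca.cer" are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class NewCaState:
    # Objects visible to RPs (hypothetical names).
    published: set = field(default_factory=set)
    # Products re-issued under the NEW CA but not yet published.
    held: set = field(default_factory=set)


def start_staging(current_ca_products):
    """Expose only the NEW CA cert, its manifest and its CRL to RPs.

    Everything issued under the CURRENT CA is re-issued but held back.
    """
    state = NewCaState(published={"new-ca.cer", "new-ca.mft", "new-ca.crl"})
    state.held = set(current_ca_products)
    return state


def end_staging(state):
    """Publish the held products once the Staging Period has elapsed."""
    state.published |= state.held
    state.held = set()
    return state
```

In this model an RP fetching during the Staging Period finds three objects under
the NEW CA and none of its subordinate products.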


> 
> Finally, the end of Section 5 really got my attention with the text, " When
> a CA rekeys, it changes many signed objects, thus impacting all RPs."  This
> statement makes me wonder how large an effect any given CA can have on
> global stability.  Could this be a [D]DoS vector?  I think some
> clarification on how to determine the impact a single party can have on the
> rest of the RPs would be very useful here?
> 

This is a fine research question, and I for one would encourage you to perform
that research, but in terms of the draft describing the key rollover procedure,
if you have alternative suggestions here please propose them. As such, the
comment above is not a comment that leads to any particular proposed revision
of the text.

At this point I do not believe that you have proposed any revised text for the 
document. Unless you are in a position to be more specific about what textual 
changes you are proposing here, there is not much more I can do here as the 
document's editor.

  Geoff

