Re: Updated Revocation Best Practices
Ryan -

Again, thank you for the feedback, and please forgive me for the delayed response. I've attempted to address your concerns on the wiki page (since this isn't official policy, I'm editing the live document): https://wiki.mozilla.org/index.php?title=CA%2FResponding_To_An_Incident=revision=1210671=1207675

- Wayne

On Mon, Mar 18, 2019 at 12:00 PM Ryan Sleevi wrote:

> On Sat, Mar 16, 2019 at 12:49 PM Wayne Thayer wrote:
>
>> Ryan - Thank you for the feedback.
>>
>> On Fri, Mar 15, 2019 at 6:14 PM Ryan Sleevi wrote:
>>
>>> While I realize the thinking is with regards to the recent serial number issue, a few questions emerge:
>>>
>>> 1) Based on the software vendor reporting, they don’t view this as a software defect, but a CA misconfiguration. Do you believe the current policy, as worded, addresses that ambiguity?
>>
>> As the language is an example, I don't believe it needs to address this distinction. I intended "defect" to mean a defect in the certificate, so perhaps it would help to specify that - i.e. "certificate defect"?
>
> I guess the challenge is it introduces the ontology that some folks have advocated, but no one actually knows where the lines should be drawn, as every example has had flaws. That is, a "certificate defect" could be everything from granting basicConstraints:CA=true (e.g. as we saw with Turktrust [1]) due to a misconfigured certificate profile (which, like this, was an "off by one" error) to something like misencoded sequences [2].
>
> My biggest worry with the proposal is that it seems to actually favor not revoking/responding to systemic issues (those which can affect a significant portion of the CA's issued certificates), whereas I think the intent is that non-revocation should be exceptional and that the CA should be moving to systemically address things.
>
> I think the end-goal, for both cases, remains the same: that the CA take holistic steps to make revocation easier and painless, whether they're dealing with systemic issues (such as serial numbers or validation methods) or exceptional situations (such as a rogue RA or validation agent). Looking at Heartbleed as the example, we know that a massive number of Subscribers and certificates were affected - it seems like this example would have encouraged non-revocation, by choosing the size of impact as the illustrative example.
>
> [1] https://bugzilla.mozilla.org/show_bug.cgi?id=825022
> [2] https://wiki.mozilla.org/SecurityEngineering/mozpkix-testing#Things_for_CAs_to_Fix
>
>>> 4) This new policy seems to explicitly allow a CA never revoking a non-compliant Certificate. Is that your intent? If so, is there any concern that this introduces the risk of CAs presenting revocation as being “required by Mozilla” as opposed to the factually correct and accurate “required by the Baseline Requirements” if Mozilla or this community should disagree with such a decision?
>>
>> Is there any difference between delaying revocation until a certificate expires and not revoking at all? Is there any difference between CAs misrepresenting revocation as "being required by Mozilla to happen by X date" and "being required by Mozilla"?
>
> Fair points. I think the previous policy encouraged a more concrete plan of action ("when"), and did not leave the CA decision-making capability ("if") which could create a conflict between the CA's decisions and the community expectations. That said, you make a good point - if their "when" is "when the certificate expires", then it's implicitly an "if" as well, and that remains unless/until "when" is more prescriptive.
>
>>> 5) If multiple CAs are affected by a common incident, this seems to encourage delaying revocation as long as possible. It’s unclear whether a CA that can and does revoke their certificates will be more or less favorably considered, both by the ecosystem and by this community. Given the economic incentives, it seems to strongly discourage revocation, as a way of competitive differentiation.
>>
>> I'm not following how these changes have the effect of encouraging multiple CAs to delay revocation as long as possible, but I do think it would be useful to state that CAs who violate the BRs will always be looked upon less favorably than those who do not.
>
> If a given CA is faced with a systemic issue - such as serial numbers - then they have a decision whether to replace a majority of certificates or not. Independent of any analysis, there will naturally be a preference to not revoke "if we don't have to". Because of the encouragement to post on the Forum, and because these discussions show that people's opinions about the seriousness/reasonableness of the matter are, in some way, impacted by how many other CAs are impacted, there's a natural incentive to delay revocation as much as possible (and to draw out discussions as much
Re: Updated Revocation Best Practices
On Sat, Mar 16, 2019 at 12:49 PM Wayne Thayer wrote:

> Ryan - Thank you for the feedback.
>
> On Fri, Mar 15, 2019 at 6:14 PM Ryan Sleevi wrote:
>
>> While I realize the thinking is with regards to the recent serial number issue, a few questions emerge:
>>
>> 1) Based on the software vendor reporting, they don’t view this as a software defect, but a CA misconfiguration. Do you believe the current policy, as worded, addresses that ambiguity?
>
> As the language is an example, I don't believe it needs to address this distinction. I intended "defect" to mean a defect in the certificate, so perhaps it would help to specify that - i.e. "certificate defect"?

I guess the challenge is it introduces the ontology that some folks have advocated, but no one actually knows where the lines should be drawn, as every example has had flaws. That is, a "certificate defect" could be everything from granting basicConstraints:CA=true (e.g. as we saw with Turktrust [1]) due to a misconfigured certificate profile (which, like this, was an "off by one" error) to something like misencoded sequences [2].

My biggest worry with the proposal is that it seems to actually favor not revoking/responding to systemic issues (those which can affect a significant portion of the CA's issued certificates), whereas I think the intent is that non-revocation should be exceptional and that the CA should be moving to systemically address things.

I think the end-goal, for both cases, remains the same: that the CA take holistic steps to make revocation easier and painless, whether they're dealing with systemic issues (such as serial numbers or validation methods) or exceptional situations (such as a rogue RA or validation agent). Looking at Heartbleed as the example, we know that a massive number of Subscribers and certificates were affected - it seems like this example would have encouraged non-revocation, by choosing the size of impact as the illustrative example.

[1] https://bugzilla.mozilla.org/show_bug.cgi?id=825022
[2] https://wiki.mozilla.org/SecurityEngineering/mozpkix-testing#Things_for_CAs_to_Fix

>> 4) This new policy seems to explicitly allow a CA never revoking a non-compliant Certificate. Is that your intent? If so, is there any concern that this introduces the risk of CAs presenting revocation as being “required by Mozilla” as opposed to the factually correct and accurate “required by the Baseline Requirements” if Mozilla or this community should disagree with such a decision?
>
> Is there any difference between delaying revocation until a certificate expires and not revoking at all? Is there any difference between CAs misrepresenting revocation as "being required by Mozilla to happen by X date" and "being required by Mozilla"?

Fair points. I think the previous policy encouraged a more concrete plan of action ("when"), and did not leave the CA decision-making capability ("if") which could create a conflict between the CA's decisions and the community expectations. That said, you make a good point - if their "when" is "when the certificate expires", then it's implicitly an "if" as well, and that remains unless/until "when" is more prescriptive.

>> 5) If multiple CAs are affected by a common incident, this seems to encourage delaying revocation as long as possible. It’s unclear whether a CA that can and does revoke their certificates will be more or less favorably considered, both by the ecosystem and by this community. Given the economic incentives, it seems to strongly discourage revocation, as a way of competitive differentiation.
>
> I'm not following how these changes have the effect of encouraging multiple CAs to delay revocation as long as possible, but I do think it would be useful to state that CAs who violate the BRs will always be looked upon less favorably than those who do not.

If a given CA is faced with a systemic issue - such as serial numbers - then they have a decision whether to replace a majority of certificates or not. Independent of any analysis, there will naturally be a preference to not revoke "if we don't have to". Because of the encouragement to post on the Forum, and because these discussions show that people's opinions about the seriousness/reasonableness of the matter are, in some way, impacted by how many other CAs are impacted, there's a natural incentive to delay revocation as much as possible (and to draw out discussions as much as possible), in the hopes that a decision to not revoke will end up being more favorable. If the determination is that revocation is not necessary, the CAs that reported and revoked effectively went through more "pain" than was needed.

I think this ties back up to the first remarks, about understanding what CAs are systemically doing to prevent further issues. I would think that the end goal is that, regardless of severity, CAs should be moving to systems where it's easier to mass-revoke. If large
Re: Updated Revocation Best Practices
Ryan - Thank you for the feedback.

On Fri, Mar 15, 2019 at 6:14 PM Ryan Sleevi wrote:

> While I realize the thinking is with regards to the recent serial number issue, a few questions emerge:
>
> 1) Based on the software vendor reporting, they don’t view this as a software defect, but a CA misconfiguration. Do you believe the current policy, as worded, addresses that ambiguity?

As the language is an example, I don't believe it needs to address this distinction. I intended "defect" to mean a defect in the certificate, so perhaps it would help to specify that - i.e. "certificate defect"?

> 2) We’ve seen CAs fail to do things like validate the well-formedness of domain names or ensure consistent validation of their certificates. Given the current (new) policy allows a CA to make a determination as to whether a “massive” number of certificates / Subscribers are affected by a given defect, and given that many CAs have historically viewed material and substantial, dangerous non-compliance as “minor defects,” are you concerned that this may place Mozilla directly in a position of requiring revocation when CAs otherwise decline to?

Are you asking if I'm concerned that CAs will abuse this guidance to avoid revocation of misissued certificates? If so, the answer is yes, both with the current/proposed and former wording. I don't feel that this additional example changes the situation.

> 3) With the rephrasing about acceptability to be “general” regarding the severity of the issue, is there any concern that this may introduce liability to Mozilla in assessing whether or not a given issue is a security risk? It would seem that the previous intent is for the CA to demonstrate their careful and thoughtful analysis as to the severity of things, while this new policy would seem to permit CAs to make blanket statements, without any expectations of them showing their analysis. While it includes discussion on this forum, it’s unclear what acceptable expectations there are.

I can see how the term "generally" could be abused to mean "except in whatever current mess we find ourselves in", and on that basis I would support taking it back out.

> 4) This new policy seems to explicitly allow a CA never revoking a non-compliant Certificate. Is that your intent? If so, is there any concern that this introduces the risk of CAs presenting revocation as being “required by Mozilla” as opposed to the factually correct and accurate “required by the Baseline Requirements” if Mozilla or this community should disagree with such a decision?

Is there any difference between delaying revocation until a certificate expires and not revoking at all? Is there any difference between CAs misrepresenting revocation as "being required by Mozilla to happen by X date" and "being required by Mozilla"?

> 5) If multiple CAs are affected by a common incident, this seems to encourage delaying revocation as long as possible. It’s unclear whether a CA that can and does revoke their certificates will be more or less favorably considered, both by the ecosystem and by this community. Given the economic incentives, it seems to strongly discourage revocation, as a way of competitive differentiation.

I'm not following how these changes have the effect of encouraging multiple CAs to delay revocation as long as possible, but I do think it would be useful to state that CAs who violate the BRs will always be looked upon less favorably than those who do not.

> In general, this seems to significantly weaken the assurances that Relying Parties have as to whether or not CAs will follow the BRs, and to place Mozilla specifically, and this Forum generally, into a role of determining whether or not revocation is required and whether the timelines are reasonable. Given that the vast majority (all?) of the non-compliance incidents we’ve seen have been argued as defects (in ACLs, in policy, in procedures), I do worry that this encourages CAs to not revoke, whether it’s a major matter - such as malformed DNS - or a “minor” matter (if such a thing exists).

I agree with this. The additional language stating that this forum must discuss any decision not to revoke based on lack of risk is intended to strengthen the requirement by forbidding CAs from unilaterally declaring that a particular issue is not a security risk, but the actual effect could be that it encourages CAs to try to punt every revocation decision to this forum. This language will need to change or be removed.

> This seems to create some of the wrong incentives, although I do understand and appreciate the point from which it is coming, in that it seems to actively discourage revocation, unless and until Mozilla explicitly requires it. This is certainly a position Mozilla could take, but it does seem to be significantly different than the past conversations.
>
> I’m not yet sure how to best suggest
Re: Updated Revocation Best Practices
While I realize the thinking is with regards to the recent serial number issue, a few questions emerge:

1) Based on the software vendor reporting, they don’t view this as a software defect, but a CA misconfiguration. Do you believe the current policy, as worded, addresses that ambiguity?

2) We’ve seen CAs fail to do things like validate the well-formedness of domain names or ensure consistent validation of their certificates. Given the current (new) policy allows a CA to make a determination as to whether a “massive” number of certificates / Subscribers are affected by a given defect, and given that many CAs have historically viewed material and substantial, dangerous non-compliance as “minor defects,” are you concerned that this may place Mozilla directly in a position of requiring revocation when CAs otherwise decline to?

3) With the rephrasing about acceptability to be “general” regarding the severity of the issue, is there any concern that this may introduce liability to Mozilla in assessing whether or not a given issue is a security risk? It would seem that the previous intent is for the CA to demonstrate their careful and thoughtful analysis as to the severity of things, while this new policy would seem to permit CAs to make blanket statements, without any expectations of them showing their analysis. While it includes discussion on this forum, it’s unclear what acceptable expectations there are.

4) This new policy seems to explicitly allow a CA never revoking a non-compliant Certificate. Is that your intent? If so, is there any concern that this introduces the risk of CAs presenting revocation as being “required by Mozilla” as opposed to the factually correct and accurate “required by the Baseline Requirements” if Mozilla or this community should disagree with such a decision?

5) If multiple CAs are affected by a common incident, this seems to encourage delaying revocation as long as possible. It’s unclear whether a CA that can and does revoke their certificates will be more or less favorably considered, both by the ecosystem and by this community. Given the economic incentives, it seems to strongly discourage revocation, as a way of competitive differentiation.

In general, this seems to significantly weaken the assurances that Relying Parties have as to whether or not CAs will follow the BRs, and to place Mozilla specifically, and this Forum generally, into a role of determining whether or not revocation is required and whether the timelines are reasonable. Given that the vast majority (all?) of the non-compliance incidents we’ve seen have been argued as defects (in ACLs, in policy, in procedures), I do worry that this encourages CAs to not revoke, whether it’s a major matter - such as malformed DNS - or a “minor” matter (if such a thing exists).

This seems to create some of the wrong incentives, although I do understand and appreciate the point from which it is coming from, in that it seems to actively discourage revocation, unless and until Mozilla explicitly requires it. This is certainly a position Mozilla could take, but it does seem to be significantly different than the past conversations.

I’m not yet sure how to best suggest clarifications, but I did want to highlight how the relatively small changes seem to do more to significantly alter policy, rather than to clarify existing policy.

___
dev-security-policy mailing list
dev-security-policy@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-security-policy
Re: Updated Revocation Best Practices
As I mentioned last week [1], the "serial number entropy" issue has identified some improvements that could be made to Mozilla's guidance for CAs on revocation when responding to an incident. These are relatively minor clarifications and in no way do they represent a fundamental change in our guidance.

I have updated a portion of the Revocation section on the wiki page [2] as follows:

> Mozilla recognizes that in some exceptional circumstances, revoking misissued certificates within the prescribed deadline may cause significant harm, such as when the certificate is used in critical infrastructure and cannot be safely replaced prior to the revocation deadline, or when a defect affects a massive number of Subscribers and certificates. However, Mozilla does not grant exceptions to the BR revocation requirements. It is our position that your CA is ultimately responsible for deciding if the harm caused by following the requirements of BR section 4.9.1 outweighs the risks that are passed on to individuals who rely on the web PKI by choosing not to meet this requirement.
>
> If your CA will not be revoking the certificates within the time period required by the BRs, our expectations are that:
>
> - The decision and rationale for delaying revocation will be disclosed to Mozilla in the form of a preliminary incident report immediately; preferably before the BR mandated revocation deadline. The rationale must include an explanation for why the situation is exceptional. Responses similar to “we deem this misissuance not to be a security risk” are generally not acceptable, and must be discussed on the mozilla.dev.security.policy list. When revocation is delayed at the request of specific Subscribers, the rationale should be provided on a per-Subscriber basis.
> - Any decision to not comply with the timeline specified in the Baseline Requirements must also be accompanied by a clear timeline describing if and when the problematic certificates will be revoked and supported by the rationale to delay revocation.
> - The issue will need to be listed as a finding in your CA’s next BR audit statement.
> - Your CA will work with your auditor (and supervisory body, as appropriate) and the Root Store(s) that your CA participates in to ensure your analysis of the risk and plan of remediation is acceptable.
> - That you will perform an analysis to determine the factors that prevented timely revocation of the certificates, and include a set of remediation actions in the final incident report that aim to prevent future revocation delays.
>
> If your CA will not be revoking the problematic certificates as required by the BRs, then we recommend that you also contact the other root programs that your CA participates in to acknowledge this non-compliance and discuss what expectations their Root Programs have with respect to these certificates.

I will once again appreciate everyone's constructive feedback on these changes.

- Wayne

[1] https://groups.google.com/d/msg/mozilla.dev.security.policy/S2KNbJSJ-hs/HNDX5LaZCAAJ
[2] https://wiki.mozilla.org/CA/Responding_To_An_Incident#Revocation

On Tue, Feb 12, 2019 at 4:42 PM Wayne Thayer wrote:

> Mozilla's guidance for incident response lives at https://wiki.mozilla.org/CA/Responding_To_An_Incident
>
> I just made some significant changes to the Revocation section that reflect the approach we took with the recent underscore sunset.
>
> Most notably, the following paragraph:
>
>> However, it is not our intent to introduce additional problems by forcing the immediate revocation of certificates that are not BR-compliant when they do not pose an urgent security concern. Therefore, we request that your CA perform careful analysis of the situation. If there is justification to not revoke the problematic certificates, then your report will need to explain those reasons and provide a timeline for when the bulk of the certificates will expire or be revoked/replaced.
>
> Has been replaced with:
>
>> Mozilla recognizes that in some exceptional circumstances, revoking misissued certificates within the prescribed deadline may cause significant harm, such as when the certificate is used in critical infrastructure and cannot be safely replaced prior to the revocation deadline. However, Mozilla does not grant exceptions to the BR revocation requirements. It is our position that your CA is ultimately responsible for deciding if the harm caused by following the requirements of BR section 4.9.1.1 outweighs the risks created by choosing not to meet this requirement.
>
> Additions have also been made to our expectations when a CA doesn't revoke on time, along with a number of minor updates.
>
> You can view a comparison of all the changes at
Updated Revocation Best Practices
Mozilla's guidance for incident response lives at https://wiki.mozilla.org/CA/Responding_To_An_Incident

I just made some significant changes to the Revocation section that reflect the approach we took with the recent underscore sunset.

Most notably, the following paragraph:

> However, it is not our intent to introduce additional problems by forcing the immediate revocation of certificates that are not BR-compliant when they do not pose an urgent security concern. Therefore, we request that your CA perform careful analysis of the situation. If there is justification to not revoke the problematic certificates, then your report will need to explain those reasons and provide a timeline for when the bulk of the certificates will expire or be revoked/replaced.

Has been replaced with:

> Mozilla recognizes that in some exceptional circumstances, revoking misissued certificates within the prescribed deadline may cause significant harm, such as when the certificate is used in critical infrastructure and cannot be safely replaced prior to the revocation deadline. However, Mozilla does not grant exceptions to the BR revocation requirements. It is our position that your CA is ultimately responsible for deciding if the harm caused by following the requirements of BR section 4.9.1.1 outweighs the risks created by choosing not to meet this requirement.

Additions have also been made to our expectations when a CA doesn't revoke on time, along with a number of minor updates.

You can view a comparison of all the changes at https://wiki.mozilla.org/index.php?title=CA%2FResponding_To_An_Incident=revision=1207675=1185707

I will greatly appreciate everyone's feedback on these changes.

- Wayne