Re: Incident report D-TRUST: syntax error in one tls certificate
On Mon, Nov 26, 2018 at 12:12 PM Jakob Bohm via dev-security-policy <dev-security-policy@lists.mozilla.org> wrote:

> 1. Having a spare certificate ready (if done with proper security, e.g.
> a separate key) from a different CA may unfortunately conflict with
> badly thought out parts of various certificate "pinning" standards.

You blame the standards, but that seems an operational risk that the site (knowingly) took. That doesn't make a compelling argument.

> 2. Being critical from a society perspective (e.g. being the contact
> point for a service to help protect the planet), doesn't mean that the
> people running such a service can be expected to be IT superstars
> capable of dealing with complex IT issues such as unscheduled
> certificate replacement due to no fault of their own.

That sounds like an operational risk the site (knowingly) took. Solutions for automation exist, as do concepts such as "hiring multiple people" (having a NOC/SOC). I see nothing to argue that a single person is somehow the risk here.

> 3. Not every site can be expected to have the 24/7 staff on hand to do
> "top security credentials required" changes, for example a high-security
> end site may have a rule that two senior officials need to
> sign off on any change in cryptographic keys and certificates, while a
> limited-staff end-site may have to schedule a visit from their outside
> security consultant to perform the certificate replacement.

This is exactly describing a known risk that the site took, accepting the tradeoffs. I fail to see a compelling argument that there should be no tradeoffs - given the harm presented to the ecosystem - and if sites want to make such policies, rather than promoting automation and CI/CD, then it seems that's a risk they should bear and make an informed choice.

> Thus I would be all for an official BR ballot to clarify/introduce
> that 24 hour revocation for non-compliance doesn't apply to
> non-dangerous technical violations.

As discussed elsewhere, there is no such thing as "non-dangerous technical violations". It is a construct, much like "clean coal", that has an appealing turn of phrase, but without the evidence to support it.

> Another category that would justify a longer CA response time would be a
> situation where a large batch of certificates need to be revalidated due
> to a weakness in validation procedures (such as finding out that a
> validation method had a vulnerability, but not knowing which if any of
> the validated identities were actually fake). For example to recheck a
> typical domain-control method, a CA would have to ask each certificate
> holder to respond to a fresh challenge (lots of manual work by end
> sites), then do the actual check (automated).

Like the other examples, this is not at all compelling. Solutions exist to mitigate this risk entirely. CAs and their Subscribers that choose not to avail themselves of these methods - for whatever the reason - are making an informed market choice about these. If they're not informed, that's on the CAs. If they are making the choice, that's on the Subscribers. There's zero reason to change, especially when such revalidation can be, and is, being done automatically.

___
dev-security-policy mailing list
dev-security-policy@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-security-policy
Re: Incident report D-TRUST: syntax error in one tls certificate
On Mon, Nov 26, 2018 at 10:31 AM Nick Lamb via dev-security-policy <dev-security-policy@lists.mozilla.org> wrote:

> CA/B is the right place for CAs to make the case for a general rule about
> giving themselves more time to handle technical non-compliances whose
> correct resolution will annoy customers but impose little or no risk to
> relying parties,

CAs have made the case - it was not accepted.

On a more fundamental and philosophical level, I think this is well-intentioned but misguided. Let's consider that the issue is one that the CA had the full power and ability to prevent - namely, they violated the requirements and misissued. A CA is only in this situation if they are a bad CA - a good CA will never run the risk of "annoying" the customer.

This also presumes that "annoyance" of the subscriber is a bad thing - but this is also wrong. If we accept that CAs are differentiated based on security, then a CA that regularly misissues and annoys its customers is a CA that will lose customers. This is, arguably, better than the alternative, which is to remove trust in a CA entirely, which will annoy all of its customers.

This presumes that the customer cannot take steps to avoid this. However, as suggested by others, the customer could have minimized or eliminated annoyance, such as by ensuring they have a robust system to automate the issuance/replacement of certificates. That they didn't is an operational failure on their part.

This presumes that there is "little or no risk to relying parties." Unfortunately, relying parties are by design not stakeholders in those conversations - the stakeholders are the CA and the Subscriber, both of which are incentivized to do nothing (it avoids annoying the customer for the CA, it avoids having to change for the customer). This creates the tragedy of the commons that we absolutely saw result from browsers not regularly enforcing compliance on CAs - areas of technical non-compliance that prevented developing interoperable solutions from the spec, which required all sorts of hacks, which then subsequently introduced security issues. This is not a 'broken windows' argument so much as a statement of the demonstrable reality we lived in prior to Amazon's development and publication of linting tools that simplified compliance and enforcement, and the subsequent improvements by ZLint.

Conceptually, this is similar to an ISP that regularly cuts its own backbone cables or publishes bad routes. By ensuring that the system consistently functions as designed - and that the CA follows their own stated practices and procedures and revokes everything that doesn't - the disruption is entirely self-inflicted and avoidable, and the market can be left to correct for that.

> I personally at least would much rather see CAs actually formally agree
> they should all have say 28 days in such cases - even though that's surely
> far longer than it should be - than a series of increasingly implausible
> "important" but ultimately purely self-serving undocumented exceptions that
> make the rules on paper worthless.

I disagree that encouraging regulatory capture (and the CA/Browser Forum doesn't work by formal agreement of CAs, nor does it alter root program expectations) is the solution here. I agree that the increasingly implausible "important" revocations are entirely worthless. I think a real and meaningful solution is what is being more consistently pursued, and that's to distrust CAs that are not adhering to the set of expectations. There's no reason to believe the "impact" argument, particularly when it's one that both the Subscriber and the CA can and should have avoided, and CAs that continue to make that argument are increasingly showing that they're not working in the best interests of Relying Parties (see above) or Subscribers (by "annoying" them or lying to them), and that's worthy of distrust.
Late Certinomis Audit (Was: Audit Reminder Email Summary)
Update: I heard back from Certinomis quickly. They provided the following attestation statement from LSTI dated 23-November on the same day. The audit was conducted back in July, so we still need an explanation from Certinomis of why it took LSTI so long to provide the report.

https://bugzilla.mozilla.org/attachment.cgi?id=9027230

Unfortunately, the audit period listed in the report begins a week after the prior audit period ended. Certinomis says that this is a reporting mistake, so I have asked them to provide an updated attestation statement from LSTI.

- Wayne

On Tue, Nov 20, 2018 at 5:00 PM Wayne Thayer wrote:

> Thanks for pointing this out Kurt. The Certinomis / Docapost audit report
> is now almost one month late. Also, last week the Certinomis representative
> informed root programs that he was leaving his post and two others would be
> taking his place. I have just emailed the two new representatives and asked
> them to explain when we will see the audit report. I'm also concerned about
> their numerous compliance bugs.
>
> - Wayne
>
> On Tue, Nov 20, 2018 at 3:15 PM Kurt Roeckx via dev-security-policy <
> dev-security-policy@lists.mozilla.org> wrote:
>
>> On Tue, Oct 23, 2018 at 02:35:37PM -0700, Kathleen Wilson via
>> dev-security-policy wrote:
>> > > > Mozilla: Audit Reminder
>> > > > Root Certificates:
>> > > > Certinomis - Root CA
>> > > > Standard Audit:
>> > > > https://bug937589.bmoattachments.org/attachment.cgi?id=8898169
>> > > > Audit Statement Date: 2017-07-24
>> > > > BR Audit:
>> > > > https://bug937589.bmoattachments.org/attachment.cgi?id=8898169
>> > > > BR Audit Statement Date: 2017-07-24
>> > > > CA Comments: null
>> > >
>> > > This seems to be in French, and does not seem to even indicate
>> > > when the audit was done, just that the report itself is valid for
>> > > 2 years.
>> >
>> > Our official requirement for the audit statements to be in English is new in
>> > version 2.6 of our policy (effective date July 1, 2018). Also, last July we
>> > were still having difficulty getting the ETSI auditors on board with
>> > specifying audit periods in their audit statements.
>>
>> So it seems nothing changed related to this in the last month,
>> they are clearly late in providing a new audit statement.
>>
>>
>> Kurt
Re: Incident report D-TRUST: syntax error in one tls certificate
On 23/11/2018 16:24, Enrico Entschew wrote:
> This post links to https://bugzilla.mozilla.org/show_bug.cgi?id=1509512
>
> syntax error in one tls certificate
>
> 1. How your CA first became aware of the problem (e.g. via a problem report
> submitted to your Problem Reporting Mechanism, a discussion in
> mozilla.dev.security.policy, a Bugzilla bug, or internal self-audit), and the
> time and date.
>
> We became aware of the issue via https://crt.sh/ on 2018-11-12, 09:01 UTC.
>
> 2. A timeline of the actions your CA took in response. A timeline is a
> date-and-time-stamped sequence of all relevant events. This may include
> events before the incident was reported, such as when a particular
> requirement became applicable, or a document changed, or a bug was
> introduced, or an audit was done.
>
> Timeline:
> 2018-11-12, 09:01 UTC CA became aware via https://crt.sh/ of a syntax error
> in one tls certificate issued on 2018-06-02. The PrintableString of OBJECT
> IDENTIFIER serialNumber (2 5 4 5) contains an invalid character. For more
> details see https://crt.sh/?id=514472818
> 2018-11-12, 09:30 UTC CA Security Issues task force analyzed the error and
> recommended further procedure.
> 2018-11-12, 10:30 UTC Customer was contacted the first time. Customer runs an
> international critical trade platform for emissions. Immediate revocation of
> the certificate would cause irreparable harm to the public.
> 2018-11-12, 13:00 UTC We performed a dedicated additionally coaching on
> this specific syntax topic within the validation team to avoid this kind of
> error in the future.
> 2018-11-16, 08:40 UTC Customer responded first time and asked for more time
> to evaluate the certificate replacement process.
> 2018-11-19, 12:30 UTC CA informed the auditor TÜV-IT about the issue.
> 2018-11-20, 15:19 UTC Customer declared to replace the certificate on
> 2018-11-22 latest.
> 2018-11-22, 15:52 UTC New certificate has been applied for and has been
> issued.
> 2018-11-22, 16:08 UTC The certificate with the serial number 3c 7c fb bf ea
> 35 a8 96 c6 79 c6 5c 82 ec 40 13 was revoked by customer.
>
> 3. Whether your CA has stopped, or has not yet stopped, issuing certificates
> with the problem. A statement that you have will be considered a pledge to
> the community; a statement that you have not requires an explanation.
>
> The CA has not stopped issuing EV-certificates. We applied dedicated coaching
> on this specific syntax topic within the validation team to avoid this kind
> of error until software adjustments to both effected systems have been
> completed.
>
> 4. A summary of the problematic certificates. For each problem: number of
> certs, and the date the first and last certs with that problem were issued.
>
> 1 Certificate
> SHA-256 41F3AD0CBDA392F078D776FD1CDC0E35F7AF61030C56C7B26B95936F41A83B32
> Issued on 2018-06-01
>
> 5. The complete certificate data for the problematic certificates. The
> recommended way to provide this is to ensure each certificate is logged to CT
> and then list the fingerprints or crt.sh IDs, either in the report or as an
> attached spreadsheet, with one list per distinct problem.
>
> For more details see https://crt.sh/?id=514472818
>
> 6. Explanation about how and why the mistakes were made or bugs introduced,
> and how they avoided detection until now.
>
> This problem was caused within the frontend system to the customer and the
> lint system. Both systems did not check the entry in the field of
> serialNumber (2 5 4 5) correctly. It was possible to enter characters other
> than defined in PrintableString definition.
>
> 7. List of steps your CA is taking to resolve the situation and ensure such
> issuance will not be repeated in the future, accompanied with a timeline of
> when your CA expects to accomplish these things.
>
> The CA Security Issues task force together with the software development
> analyzed the error. We applied dedicated coaching on this specific syntax
> topic within the validation team to avoid this kind of error until software
> adjustments to both effected systems have been completed. The changes in the
> systems are expected to go live in early January 2019.
>

In addition to this, would you add the following:

- Daily checks of crt.sh (or some other existing tool) to see whether additional such certificates are erroneously issued before the automated countermeasures are in place?

- Procedurally (and eventually technically) restrict the serial number element to actual validated identification numbers from a fixed set of databases for each jurisdiction. For example for a Bundesamt, this should be a special prefix followed by some kind of official identifying number of entities within the Bundesverwaltung. Similarly, of course, for Landesamts, companies etc.

Also, it is unclear why a Bundesamt belongs to an identification jurisdiction lower than the entire BDR. For comparison, Danish Company entities ar
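
On the PrintableString point in the report quoted above: a minimal sketch of the kind of input check that the report says the frontend and lint systems were missing. The character set is the one ASN.1 defines for PrintableString; the function names and the sample values are hypothetical and are not taken from D-TRUST's systems or from the affected certificate.

```python
import string

# Characters permitted in an ASN.1 PrintableString:
# A-Z, a-z, 0-9, space, and the punctuation ' ( ) + , - . / : = ?
PRINTABLE_STRING_CHARS = frozenset(
    string.ascii_letters + string.digits + " '()+,-./:=?"
)


def invalid_printable_chars(value: str) -> list[tuple[int, str]]:
    """Return (index, character) pairs that are not allowed in a PrintableString."""
    return [
        (index, char)
        for index, char in enumerate(value)
        if char not in PRINTABLE_STRING_CHARS
    ]


def is_printable_string(value: str) -> bool:
    """True if every character of `value` is allowed in a PrintableString."""
    return not invalid_printable_chars(value)


if __name__ == "__main__":
    # Hypothetical sample inputs, for illustration only.
    for sample in ["HRB 12345", "ABC_123", "ID: 42/7"]:
        print(repr(sample), is_printable_string(sample), invalid_printable_chars(sample))
```

A check like this only covers the encoding rule. It says nothing about whether the value is a validated identification number, which is the separate procedural point raised in the reply.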
Re: Incident report D-TRUST: syntax error in one tls certificate
On 26/11/2018 16:31, Nick Lamb wrote:
> In common with others who've responded to this report I am very skeptical about the contrast between the supposed importance of this customer's systems versus their, frankly, lackadaisical technical response.
>
> This might all seem harmless but it ends up as "the boy who cried wolf". If you relay laughable claims from customers several times, when it comes to an incident where maybe some extraordinary delay was justifiable any good will is already used up by the prior claims.
>
> CA/B is the right place for CAs to make the case for a general rule about giving themselves more time to handle technical non-compliances whose correct resolution will annoy customers but impose little or no risk to relying parties, I personally at least would much rather see CAs actually formally agree they should all have say 28 days in such cases - even though that's surely far longer than it should be - than a series of increasingly implausible "important" but ultimately purely self-serving undocumented exceptions that make the rules on paper worthless.

It should be noted that the counter-measures that some posts have expected of the end-site in question may not always be realistic (speaking generally, as I have no data on the specifics of this end-site):

1. Having a spare certificate ready (if done with proper security, e.g. a separate key) from a different CA may unfortunately conflict with badly thought out parts of various certificate "pinning" standards.

2. Being critical from a society perspective (e.g. being the contact point for a service to help protect the planet), doesn't mean that the people running such a service can be expected to be IT superstars capable of dealing with complex IT issues such as unscheduled certificate replacement due to no fault of their own.

3. Not every site can be expected to have the 24/7 staff on hand to do "top security credentials required" changes, for example a high-security end site may have a rule that two senior officials need to sign off on any change in cryptographic keys and certificates, while a limited-staff end-site may have to schedule a visit from their outside security consultant to perform the certificate replacement.

Thus I would be all for an official BR ballot to clarify/introduce that 24 hour revocation for non-compliance doesn't apply to non-dangerous technical violations.

Another category that would justify a longer CA response time would be a situation where a large batch of certificates need to be revalidated due to a weakness in validation procedures (such as finding out that a validation method had a vulnerability, but not knowing which if any of the validated identities were actually fake). For example to recheck a typical domain-control method, a CA would have to ask each certificate holder to respond to a fresh challenge (lots of manual work by end sites), then do the actual check (automated).

Enjoy

Jakob
--
Jakob Bohm, CIO, Partner, WiseMo A/S. https://www.wisemo.com
Transformervej 29, 2860 Søborg, Denmark. Direct +45 31 13 16 10
This public discussion message is non-binding and may contain errors.
WiseMo - Remote Service Management for PCs, Phones and Embedded
Re: Incident report D-TRUST: syntax error in one tls certificate
In common with others who've responded to this report I am very skeptical about the contrast between the supposed importance of this customer's systems versus their, frankly, lackadaisical technical response.

This might all seem harmless but it ends up as "the boy who cried wolf". If you relay laughable claims from customers several times, when it comes to an incident where maybe some extraordinary delay was justifiable any good will is already used up by the prior claims.

CA/B is the right place for CAs to make the case for a general rule about giving themselves more time to handle technical non-compliances whose correct resolution will annoy customers but impose little or no risk to relying parties, I personally at least would much rather see CAs actually formally agree they should all have say 28 days in such cases - even though that's surely far longer than it should be - than a series of increasingly implausible "important" but ultimately purely self-serving undocumented exceptions that make the rules on paper worthless.
Re: Incident report D-TRUST: syntax error in one tls certificate
(for the avoidance of doubt: posting in a personal capacity)

On 23/11/2018 15:24, Enrico Entschew wrote:
> Timeline:
> 2018-11-12, 10:30 UTC Customer was contacted the first time. Customer runs an international critical trade platform for emissions. Immediate revocation of the certificate would cause irreparable harm to the public.
> 2018-11-22, 16:08 UTC The certificate with the serial number 3c 7c fb bf ea 35 a8 96 c6 79 c6 5c 82 ec 40 13 was revoked by customer.

Some questions I have:

1) Don't the BR specify CAs MUST revoke within 24 hours (for some issues) or 5 days (for others)? This looks like just over 10 days, and was customer-prompted as opposed to set by the CA, it seems. Am I just missing the part of the BRs that says ignoring the 5 days is OK if it's "just" a syntax error?

2) what procedure does D-TRUST follow to ensure adequate revocation times, and in particular, under what circumstances does it decide that not revoking until the customer gives an OK is necessary (e.g. how does it decide what constitutes an "international[ly] critical" site)? Is this documented, e.g. in CPS or similar? Have auditors signed off on that?

3) can you elaborate on the system being down causing "irreparable harm"? What would have happened if the cert had just been revoked after 24/120 hours? In this case, the website in question ( www.dehst.de ) has been broken in Firefox for the past 64 or so hours (ie since about 6pm UK time on Friday, when I first read your message) because the server doesn't actually send the full chain of certs for its new certificate. Given that the server (AFAICT) doesn't staple OCSP responses, I don't imagine that practical breakage in a web browser would have been worse if the original cert had been revoked immediately, given the CRL revocation done last week hasn't appeared in CRLSet/OneCRL either.

~ Gijs
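
The "just over 10 days" figure in question 1 can be checked against the timestamps in the incident report's timeline. The sketch below is illustrative only. It assumes the clock starts when the CA became aware of the problem (2018-11-12, 09:01 UTC, per the report), and it takes the 24-hour and 5-day windows from the question as asked, not from the text of the BRs. The names `WINDOWS` and `overrun` are hypothetical.

```python
from datetime import datetime, timedelta, timezone

# Timestamps from the incident report's timeline (both UTC).
BECAME_AWARE = datetime(2018, 11, 12, 9, 1, tzinfo=timezone.utc)
REVOKED = datetime(2018, 11, 22, 16, 8, tzinfo=timezone.utc)

# Windows as named in question 1; an assumption, not a restatement of the BRs.
WINDOWS = {
    "24 hours": timedelta(hours=24),
    "5 days": timedelta(days=5),
}


def overrun(start: datetime, end: datetime, window: timedelta) -> timedelta:
    """Return how far `end` falls past `start + window` (negative if within it)."""
    return (end - start) - window


elapsed = REVOKED - BECAME_AWARE

if __name__ == "__main__":
    print("elapsed between awareness and revocation:", elapsed)
    for name, window in WINDOWS.items():
        print(f"{name} window exceeded by", overrun(BECAME_AWARE, REVOKED, window))
```

Under those assumptions the elapsed time is 10 days, 7 hours and 7 minutes, which is past either window.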