Hi Aaron,

You raise some excellent points. Thanks for your feedback. It seems like (1) we generally agree on a goal of providing the most complete set of data possible, and (2) there’s an opportunity to balance our desires for the completeness, usefulness, and practicality of the certificate data in question.

With this community’s help (especially yours), the CCADB Steering Committee launched an updated Incident Report Template <https://www.ccadb.org/cas/incident-report> in October 2023. Though this template has only been in use for a short while, we believe there are opportunities to further promote consistency and transparency in Incident Reporting.

One enhancement idea was to include a bulleted list of specific questions that would better guide responses for all given sections, rather than the current free-form response approach based on some of the section descriptions on CCADB.org <http://ccadb.org> (i.e., the Impact section). The Impact and Appendix sections are similar in that they intend to describe the size and nature of the incident, and it's possible an enhancement to one can benefit the other.

For example, the “Impact” section could be transformed…

From (current):

“The Impact section should contain a short description of the size and nature of the incident. For example: how many certificates, OCSP responses, or CRLs were affected; whether the affected objects share features (such as issuance time, signature algorithm, or validation type); and whether the CA Owner had to cease issuance during the incident.”

To (illustrative): something like the following (intended to describe fields and expected responses)…

- Total number of pre-certificates: [if applicable, the total count of pre-certificates affected by the issue(s) described in this incident report, including expired and revoked pre-certificates]

- Total number of certificates: [if applicable, the total count of "final" certificates affected by the issue(s) described in this incident report, including expired and revoked certificates]

- Total number of "remaining valid" certificates: [if applicable, the total count of "final" certificates affected by the issue(s) described in this incident report, minus expired and revoked certificates. Minimally, this set of certificates MUST be disclosed in the Appendix section of this report.]

- Incident heuristic: [if applicable, EITHER:
  (a) describe a heuristic that would allow a third party to assemble the full corpus of affected certificates, if not provided in the Appendix (e.g., "Any certificate containing policy OID 1.2.3.4.5.6 and issued between 11/13/2024 and 4/11/2024 is affected by this incident. Certificates that have been revoked or are expired are omitted from the certificate list disclosed to the Appendix."),
  (b) clearly explain why this isn't possible (e.g., "This incident affected every certificate issued between 5/25/2023 and 6/15/2024 that relied upon BR Validation Method 3.2.2.4.19. Because the relied upon validation method is not described in a certificate, this heuristic cannot be used by a third party to assemble the full corpus of affected certificates. Certificates that have been revoked or expired have been omitted from the certificate list disclosed to the Appendix."), or
  (c) state that the full corpus of affected certificates is disclosed in the Appendix.]

- Was issuance stopped in response to this incident, and why or why not?: [yes/no - explanation (e.g., "Yes. As described in the incident timeline, we stopped issuance after learning of this issue to correct the corresponding certificate profile.")]

This is just an example, and we might need to more thoughtfully consider incidents that don’t involve certificates before it can be considered for adoption. What’s helpful to us, though, is that this proposed approach can more consistently describe the impact of an incident - while also possibly offering a balance between our desire for completeness (i.e., satisfied by counts and a clear description of an incident heuristic) and practicality (only requiring disclosure of the “remaining valid” certificates in the Appendix).

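To make the heuristic idea a little more concrete, here is a minimal Python sketch of how a third party might apply a heuristic like the illustrative policy-OID one above to a set of certificate records and derive the proposed counts. Everything in it is an assumption for illustration only: the CertRecord fields, the sample records, the date window, and the use of notBefore as a stand-in for issuance time are hypothetical, and none of it reflects the CCADB template or any real crt.sh schema.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class CertRecord:
    """Hypothetical, simplified certificate record (not a real schema)."""
    serial: str
    policy_oids: frozenset
    not_before: date          # assumed here to approximate issuance time
    not_after: date
    revoked: bool = False
    is_precertificate: bool = False


def matches_heuristic(cert, policy_oid, window_start, window_end):
    """True if the record carries the OID and falls in the window (inclusive)."""
    return (policy_oid in cert.policy_oids
            and window_start <= cert.not_before <= window_end)


def summarize_impact(certs, policy_oid, window_start, window_end, as_of):
    """Compute the illustrative Impact counts for records matching the heuristic."""
    affected = [c for c in certs
                if matches_heuristic(c, policy_oid, window_start, window_end)]
    precerts = [c for c in affected if c.is_precertificate]
    finals = [c for c in affected if not c.is_precertificate]
    # "Remaining valid": final certificates neither revoked nor expired as of the report date.
    remaining = [c for c in finals if not c.revoked and c.not_after >= as_of]
    return {
        "precertificates": len(precerts),
        "certificates": len(finals),
        "remaining_valid": len(remaining),
        "appendix_serials": sorted(c.serial for c in remaining),
    }


# Hypothetical sample data; OIDs, serials, and dates are made up.
OID = "1.2.3.4.5.6"
SAMPLE = [
    CertRecord("0A", frozenset({OID}), date(2024, 2, 1), date(2024, 5, 1)),
    CertRecord("0B", frozenset({OID}), date(2024, 1, 10), date(2024, 4, 9)),
    CertRecord("0C", frozenset({OID}), date(2024, 2, 15), date(2024, 5, 15), revoked=True),
    CertRecord("0D", frozenset({"2.23.140.1.2.1"}), date(2024, 2, 1), date(2024, 5, 1)),
    CertRecord("0E", frozenset({OID}), date(2024, 2, 1), date(2024, 5, 1),
               is_precertificate=True),
    CertRecord("0F", frozenset({OID}), date(2023, 12, 1), date(2024, 2, 29)),
]

summary = summarize_impact(
    SAMPLE, OID,
    window_start=date(2024, 1, 1), window_end=date(2024, 3, 31),
    as_of=date(2024, 4, 11),
)
print(summary)
```

A heuristic of type (b), such as one keyed on the validation method, could not be written this way, because the deciding attribute is not present in the certificate at all; that is the distinction the proposed field is meant to surface.
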
There might be unexpected benefits from this approach; for example, the heuristic may make it easier for other CA Owners to evaluate whether they share the same issue being reported.

We’re interested in your feedback, and that of other community members, on how this might help better define community expectations - and further improve the incident reporting process.

Thanks,
Ryan

On Thu, Apr 11, 2024 at 1:33 PM 'Aaron Gable' via CCADB Public <[email protected]> wrote:

> In general, I agree that producing the most complete set of data possible
> is the most desirable course of action. However, I wonder how this desire
> interacts with the full scale of the WebPKI.
>
> Two years ago, Let's Encrypt had an incident
> <https://bugzilla.mozilla.org/show_bug.cgi?id=1751984> which affected
> 100% of our validations conducted via the TLS-ALPN-01 method. The end
> result was 10 zstd-compressed files, 10 megabytes each (due to Bugzilla's
> attachment size limits), together containing 2.7 million crt.sh URIs. Those
> files represented only one version of each certificate: usually the final
> certificate, but sometimes the precertificate for those issuances where
> production of the final certificate failed. The files also represented only
> the certificates which were unexpired at the time that the incident was
> discovered, only 18% of the total incident period.
>
> If the list of affected certificates had included both the pre- and final
> certificates, and had covered the full incident period, it would have been
> a full gigabyte of compressed URLs. (10 MB per file x 10 files x 2 for both
> kinds of certs x 5 to go from 18% to 100% of incident period.) And this
> incident affected only a small fraction of Let's Encrypt's total issuance
> volume -- if the issue had been with our HTTP-01 method over the same
> incident period, the resulting list of URLs would have been nearly 20
> gigabytes. Would this larger set of certificate data actually have been
> useful to the community, given that they were already untrusted?
>
> Acquiring this fuller list would have significantly increased the time
> taken to conduct the investigation. Let's Encrypt prunes data about
> already-expired certificates from our easily-queriable database to prevent
> it from growing without bound, so the investigation would have had to start
> pulling in log data, which is a much slower process for both writing and
> executing the relevant queries. Would this additional investigation time,
> and correspondingly slower incident response and remediation, have been
> worthwhile?
>
> It is possible that the incident period could have exceeded the audit log
> retention period (currently 2 years) required by the BRs. In that case,
> producing a full list of certificates would have been impossible. The
> certificates themselves contain no indication of what validation method was
> used for each identifier, so reconstruction from CT doesn't work. What
> would be the appropriate action if producing the full list of
> historically-affected certificates is not possible?
>
> Thanks,
> Aaron
>
> On Thu, Apr 11, 2024 at 10:03 AM 'Chris Clements' via CCADB Public <
> [email protected]> wrote:
>
>> Hi Rob,
>>
>> Thank you for the comprehensive survey and for clearly communicating your
>> findings.
>>
>> In response to your questions, and from the perspective of the Chrome
>> Root Program:
>>
>>> 1. Is a CA's incident report expected to disclose the affected
>>> certificates that have already expired prior to the CA's response to the
>>> incident?
>>
>> We see disclosing the full set of affected certificates, regardless of
>> whether they have expired or have been revoked, as presenting the community
>> with the most complete perspective of an incident’s impact. This is our
>> preferred approach.
>>
>>> 2. Is a CA's incident report expected to disclose the affected
>>> certificates that have already been revoked prior to the CA's response to
>>> the incident?
>>
>> Yes, similar to the previous question, our preference is to collect the
>> most complete perspective possible.
>>
>>> 3. Is a CA's incident report expected to disclose both an affected
>>> precertificate and its corresponding certificate? Or just one of the pair?
>>
>> You raise an opportunity for improvement. Historically, a list of
>> precertificates was considered acceptable. However, having both
>> precertificates and final certificates provides a more comprehensive
>> perspective, which we consider favorable.
>>
>> We appreciate other thoughts and perspectives.
>>
>> Additionally, we’ll plan to sync on these opinions with the other members
>> of the CCADB Steering Committee, which could ultimately lead to an update
>> of https://www.ccadb.org/cas/incident-report.
>>
>> Thanks again!
>>
>> -Chris
>>
>>
>> On Mon, Apr 8, 2024 at 12:45 PM 'Rob Stradling' via CCADB Public <
>> [email protected]> wrote:
>>
>>> In recent weeks, a number of CAs have filed incident reports relating to
>>> mistakes made when setting critical flags in Subscriber certificate
>>> extensions since the TLSBRv2 profiles came into force. We thought it would
>>> be worth performing a comprehensive survey ourselves in order to discover
>>> if any similar incidents at other CAs had not yet been detected.
>>>
>>> I've run [1] against the primary crt.sh DB, which caused it to trawl
>>> through the crt.sh ID space starting around the time TLSBRv2 went into
>>> force to identify any Subscriber certificate containing any common
>>> extension with its critical flag set incorrectly per §7.1.2.7.6. I've
>>> posted a report of the results at [2], which was generated using [3].
>>>
>>> Seven further incidents were identified. I sent Certificate Problem
>>> Reports to the two CAs whose affected PKI hierarchies are trusted by root
>>> programs whose representatives are active in monitoring Bugzilla. Both of
>>> those CAs responded promptly and filed incident reports: [4] and [5].
>>>
>>> Having gathered this data, today I've used it to cross-check the lists
>>> of affected certificates that CAs have provided with their incident
>>> reports. I was surprised to find two bugs ([6] and [7]) without any
>>> attached list of affected certificates. I also observed some patterns of
>>> "omissions" in the disclosed lists of affected certificates, for which I
>>> would like to call upon the root program owners to clarify their
>>> expectations; noting that the CCADB incident reporting requirements [8] say
>>> that each incident report's *"Appendix must include a listing of the
>>> complete certificate details of all affected certificates"*:
>>>
>>> 1. Is a CA's incident report expected to disclose the affected
>>>    certificates that have already expired prior to the CA's response to the
>>>    incident?
>>> 2. Is a CA's incident report expected to disclose the affected
>>>    certificates that have already been revoked prior to the CA's response to
>>>    the incident?
>>> 3. Is a CA's incident report expected to disclose both an affected
>>>    precertificate and its corresponding certificate? Or just one of the
>>>    pair?
>>>
>>> [1] https://gist.github.com/robstradling/6a5ecca872cf28232d90638fc2c44ed5#file-check_extension_criticality-go
>>> [2] https://gist.github.com/robstradling/6a5ecca872cf28232d90638fc2c44ed5#file-report-csv
>>> [3] https://gist.github.com/robstradling/6a5ecca872cf28232d90638fc2c44ed5#file-generate_report-sh
>>> [4] https://bugzilla.mozilla.org/show_bug.cgi?id=1888060
>>> [5] https://bugzilla.mozilla.org/show_bug.cgi?id=1888104
>>> [6] https://bugzilla.mozilla.org/show_bug.cgi?id=1887096
>>> [7] https://bugzilla.mozilla.org/show_bug.cgi?id=1883416
>>> [8] https://www.ccadb.org/cas/incident-report
>>>
>>> --
>>> Rob Stradling
>>> Senior Research & Development Scientist
>>> Sectigo Limited
>>>
>>> --
>>> You received this message because you are subscribed to the Google
>>> Groups "CCADB Public" group.
>>> To unsubscribe from this group and stop receiving emails from it, send
>>> an email to [email protected].
>>> To view this discussion on the web visit
>>> https://groups.google.com/a/ccadb.org/d/msgid/public/MW4PR17MB47290848C0FE089BD12FA77AAA002%40MW4PR17MB4729.namprd17.prod.outlook.com
>>> .

--
You received this message because you are subscribed to the Google Groups "CCADB Public" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
To view this discussion on the web visit https://groups.google.com/a/ccadb.org/d/msgid/public/CADEW5O9C5YQURi8HA7g3NzOwh0m1zx-tW05TDR21s0jE5P1DcA%40mail.gmail.com.
