Hi Dustin and Hanno,

I started a bug for this already, here: https://bugzilla.mozilla.org/show_bug.cgi?id=2013294, but certainly we can add to it as necessary.

Ben

On Thu, Jan 29, 2026 at 11:24 AM 'Dustin Hollenback' via CCADB Public <[email protected]> wrote:

> Hello Hanno,
>
> Thanks so much for this detailed analysis and for the feature suggestion!
> We really appreciate you taking the time to dig into the data quality
> issues.
>
> The specific inconsistencies you've identified (the various empty field
> representations, missing protocols, special characters) are definitely
> problems we need to address.
>
> Regarding your suggestion to add automated sanity checks to prevent these
> issues, I agree that this is needed! To help us evaluate how to implement
> this, would you be able to share some additional thoughts on:
> - Which validations you see as highest priority
> - Whether these should be enforced at data entry (blocking submission) or
>   flagged for review
> - Any edge cases or exceptions we should consider
>
> To ensure we track both the current data issues and the feature request
> long-term, could you file this in Bugzilla under the Common CA Database
> component? Here's the link:
>
> https://bugzilla.mozilla.org/buglist.cgi?product=CA%20Program&component=Common%20CA%20Database&resolution=---&list_id=17830792
>
> Once filed with those details, the CCADB Steering Committee will review
> it, add it to our backlog, and prioritize accordingly.
>
> Thanks again for thinking about ways to make CCADB better!
>
> Best regards,
>
> Dustin
> On behalf of CCADB
>
> On Jan 29, 2026, at 6:00 AM, Hanno Böck <[email protected]> wrote:
> >
> > Hi,
> >
> > I recently did some checks with the CRL data from CCADB contained in
> > the AllCertificateRecordsCSVFormatv4 file and noted some
> > inconsistencies.
> >
> > Those can either be a single URL value (column "Full CRL Issued By This
> > CA") or a JSON list ("JSON Array of Partitioned CRLs").
> >
> > * In the JSON list, it appears multiple different values are used to
> >   indicate that the field is empty. It is a mix of empty strings (""),
> >   JSON lists with an empty string ('[""]'), or JSON lists with a
> >   double-double-quoted empty string ('[""""]'). In one particularly
> >   peculiar case (DigiCert/Microsoft TLS G1 ECC CA 01), it is a list
> >   containing a double-double-quoted non-breaking space
> >   ('[""\\u200b""]').
> >
> > * In the single URL column, there are two cases that are missing the
> >   protocol, i.e., no http:// or https://:
> >   www.acabogacia.org/crl/aca_arl.crl and ssl.gpki.go.kr/certs/ssl-ca.cer
> >
> > I would suggest to add some basic sanity checks to the data. I don't
> > care which symbol is used to indicate an empty field for the JSON
> > column, but I think it should be consistent. Furthermore, I'd suggest
> > checking that URLs are URLs, and possibly also reject
> > unicode/non-ascii characters.
> >
> > Note that there's a somewhat related issue that many of these CRLs are
> > not reliably accessible due to dubious blocking based on user-agents,
> > and that they are often served with incorrect MIME types. That's
> > recently been discussed on mdsp:
> > https://groups.google.com/a/mozilla.org/g/dev-security-policy/c/PZTEB49qsHY/m/8vm3-C3oFgAJ
> >
> > --
> > Hanno Böck - Independent security researcher
> > https://itsec.hboeck.de/
> > https://badkeys.info/
> >
> > --
> > You received this message because you are subscribed to the Google
> > Groups "CCADB Public" group.
> > To unsubscribe from this group and stop receiving emails from it, send
> > an email to [email protected].
> > To view this discussion visit
> > https://groups.google.com/a/ccadb.org/d/msgid/public/20260129150032.3a23c25b%40hboeck.de.
