Hello Hanno,

Thanks so much for this detailed analysis and for the feature suggestion! We really appreciate you taking the time to dig into the data quality issues.

The specific inconsistencies you've identified (the various empty field representations, missing protocols, special characters) are definitely problems we need to address.

Regarding your suggestion to add automated sanity checks to prevent these issues, I agree that this is needed! To help us evaluate how to implement this, would you be able to share some additional thoughts on:

- Which validations you see as highest priority
- Whether these should be enforced at data entry (blocking submission) or flagged for review
- Any edge cases or exceptions we should consider

To ensure we track both the current data issues and the feature request long-term, could you file this in Bugzilla under the Common CA Database component? Here's the link:
https://bugzilla.mozilla.org/buglist.cgi?product=CA%20Program&component=Common%20CA%20Database&resolution=---&list_id=17830792

Once filed with those details, the CCADB Steering Committee will review it, add it to our backlog, and prioritize accordingly.

Thanks again for thinking about ways to make CCADB better!

Best regards,
Dustin
On behalf of CCADB

> On Jan 29, 2026, at 6:00 AM, Hanno Böck <[email protected]> wrote:
>
> Hi,
>
> I recently did some checks with the CRL data from CCADB contained in
> the AllCertificateRecordsCSVFormatv4 file and noted some
> inconsistencies.
>
> Those can either be a single URL value (column "Full CRL Issued By This
> CA") or a JSON list ("JSON Array of Partitioned CRLs").
>
> * In the JSON list, it appears multiple different values are used to
> indicate that the field is empty. It is a mix of empty strings (""),
> JSON lists with an empty string ('[""]'), or JSON lists with a
> double-double-quoted empty string ('[""""]'). In one particularly
> peculiar case (DigiCert/Microsoft TLS G1 ECC CA 01), it is a list
> containing a double-double-quoted non-breaking space
> ('[""\\u200b""]').
>
> * In the single URL column, there are two cases that are missing the
> protocol, i.e., no http:// or https://:
> www.acabogacia.org/crl/aca_arl.crl and ssl.gpki.go.kr/certs/ssl-ca.cer
>
> I would suggest to add some basic sanity checks to the data. I don't
> care which symbol is used to indicate an empty field for the JSON
> column, but I think it should be consistent. Furthermore, I'd suggest
> checking that URLs are URLs, and possibly also reject
> unicode/non-ascii characters.
>
>
> Note that there's a somewhat related issue that many of these CRLs are
> not reliably accessible due to dubious blocking based on user-agents,
> and that they are often served with incorrect MIME types. That's
> recently been discussed on mdsp:
> https://groups.google.com/a/mozilla.org/g/dev-security-policy/c/PZTEB49qsHY/m/8vm3-C3oFgAJ
>
> --
> Hanno Böck - Independent security researcher
> https://itsec.hboeck.de/
> https://badkeys.info/
>
> --
> You received this message because you are subscribed to the Google Groups
> "CCADB Public" group.
> To unsubscribe from this group and stop receiving emails from it, send an
> email to [email protected].
> To view this discussion visit
> https://groups.google.com/a/ccadb.org/d/msgid/public/20260129150032.3a23c25b%40hboeck.de.

--
You received this message because you are subscribed to the Google Groups "CCADB Public" group.
To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
To view this discussion visit https://groups.google.com/a/ccadb.org/d/msgid/public/5FFAE205-B7F6-4225-B88D-EE66AE8282EE%40apple.com.
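
For illustration only, the sanity checks suggested in the quoted message could look roughly like the Python sketch below. It is not CCADB's implementation. The function names `normalize_partitioned_crls` and `url_problems` are hypothetical, as are the choice of which invisible characters to treat as empty (the reported '[""\\u200b""]' value contains U+200B, the zero-width space) and the assumption that only http and https are acceptable schemes. The sketch collapses every reported "empty" variant to an empty list and flags URLs that lack a scheme or contain non-ASCII characters.

```python
import json
from urllib.parse import urlsplit

# Characters ignored when deciding whether an entry is empty (assumption:
# ASCII whitespace plus no-break space, zero-width space, and BOM).
_IGNORABLE = " \t\r\n\u00a0\u200b\ufeff"


def normalize_partitioned_crls(cell: str) -> list[str]:
    """Return the non-empty URLs from a 'JSON Array of Partitioned CRLs' cell."""
    text = cell.strip(_IGNORABLE)
    if not text:
        return []
    try:
        entries = json.loads(text)
    except json.JSONDecodeError:
        # Fallback for cells whose quotes are still doubled, e.g. [""x""].
        # A second failure propagates so the record can be flagged for review.
        entries = json.loads(text.replace('""', '"'))
    if not isinstance(entries, list):
        raise ValueError("expected a JSON list")
    urls = []
    for entry in entries:
        if not isinstance(entry, str):
            raise ValueError("expected a list of strings")
        entry = entry.strip(_IGNORABLE)
        if entry:  # drop entries that only contained ignorable characters
            urls.append(entry)
    return urls


def url_problems(url: str) -> list[str]:
    """Return a list of human-readable problems found in a CRL URL."""
    problems = []
    if not url.isascii():
        problems.append("non-ASCII characters")
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https"):
        problems.append("missing or unsupported scheme")
    elif not parts.netloc:
        problems.append("missing host")
    return problems
```

Returning a list of problems rather than raising leaves open whether a check blocks submission or only flags the record for review.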
