On 11/09/17 15:30, Rob Stradling via dev-security-policy wrote:
Hi Hanno.  Thanks for reporting this to us.  We acknowledge the problem, and as I mentioned at [1], we took steps to address it this morning.

We will follow up with an incident report ASAP.

INCIDENT REPORT

We received two Problem Reports - from Hanno Böck on 9th September at 20:10 UTC, and from Jonathan Rudenberg on 10th September at 00:08 UTC - each of which reported that we had misissued a certificate contrary to a published CAA RRset. Jonathan reported this problem at https://bugzilla.mozilla.org/show_bug.cgi?id=1398545, and in https://bugzilla.mozilla.org/show_bug.cgi?id=1398545#c2 Quirin Scheitle provided a further misissuance report.

TRIAGING
Some Comodo staff saw these reports late on Saturday 9th and began to discuss them over the weekend, but they were unable to confirm their accuracy. Indeed, the reports appeared to them to be erroneous, because the logs at their disposal showed that the relevant CAA checks had been performed but the RRsets were empty. Therefore, the only action taken at that point was to escalate the reports to the original developer of our CAA checking code to look at first thing Monday morning.

BACKGROUND
As you'd expect from the authors of RFC6844, we were an early adopter, deploying our initial CAA checking implementation 2.5 years ago. It executes `dig CAA +dnssec +sigchase +trusted-key=dnssec_trusted.keys` to perform the DNS queries. We chose this approach after concluding that, at that time, it was the least worst option available to us for checking DNSSEC signatures. We deployed a specific version of BIND (9.10.1-P2) because testing had shown that `dig` in the next release of BIND would crash when trying to do DNSSEC validation.

WHAT WENT WRONG
Our ops team upgraded the servers that our CAA checking code was running on. This included a very-long-awaited transition from a 32-bit to a 64-bit OS. Rather than recompile 9.10.1-P2 for 64-bit, our ops engineers upgraded BIND to 9.10.5-P1. Yesterday morning (Monday 11th), when investigating the Problem Reports, the original developer discovered that, as a result of that BIND upgrade, all of our calls to `dig` were returning the following response:

`Invalid option: +sigchase
Usage:  dig [@global-server] [domain] [q-type] [q-class] {q-opt}
            {global-d-opt} host [@local-server] {local-d-opt}
            [ host [@local-server] {local-d-opt} [...]]

Use "dig -h" (or "dig -h | more") for complete list of options`

Unfortunately, this `dig` response was being interpreted by our CAA checking code as a CAA response that contained: no "issue" property, no "issuewild" property, no unrecognized critical properties, etc.

This problem had gone undetected due to a combination of reasons: the developer did not ask for BIND to be upgraded and so did not expect any behaviour to have changed; the ops engineers did not realize that upgrading BIND might cause a problem; there wasn't an automated test that would've detected this problem and raised an alarm; CAA RRsets are still fairly uncommon, so nobody noticed that we'd dropped from finding hardly any RRsets to finding zero RRsets; our validation staff only see the results of our CAA processing rather than the complete output from `dig`.

ACTION TAKEN TO ADDRESS THE PROBLEM
Upon discovery of the failing `dig` calls, we immediately downgraded to BIND 9.10.1-P2 and verified that our CAA checks were then working correctly. We also purged our local cache (of recent `dig` responses) to ensure that the misissuance vector was completely closed.

PROBLEM CERTIFICATES
The following certificates have all been revoked:
Reported by Hanno:
https://crt.sh/?id=207082245
Reported by Jonathan:
https://crt.sh/?id=207224651
Reported by Quirin:
https://crt.sh/?id=208456003
https://crt.sh/?id=208486480
https://crt.sh/?id=208486489
https://crt.sh/?id=208486485
https://crt.sh/?id=208486495

NEW CAA CHECKING IMPLEMENTATION
Our initial CAA checking implementation, while functional, was not designed with our current certificate issuance volumes in mind. Consequently, we had been working on a new, much more scalable CAA checking implementation, written in Go. We had expected to deploy this new implementation during Q2 2017, but work on this project was paused due to the uncertainties of CNAME processing that have now been resolved at IETF (see https://www.rfc-editor.org/errata/eid5065) and that will hopefully soon also be resolved at CABForum (see https://cabforum.org/pipermail/public/2017-August/011972.html).

DEPLOYING OUR NEW CAA CHECKING IMPLEMENTATION
Having fixed our `dig` calls, we found that our system was struggling to process the queue of CAA checks quickly enough, and so we accelerated our plans to deploy our new CAA checking implementation. This morning (Tuesday 12th) we verified that our new implementation does a reasonable job when faced with the test cases at https://caatestsuite.com/, and we deployed it.

VERIFYING OUR NEW CAA CHECKING IMPLEMENTATION
We are taking immediate steps to engage the services of an external security consultant to independently assess our new CAA checking implementation and to work with us to ensure that it behaves correctly.

ACKNOWLEDGMENTS
We would like to express our thanks to Hanno, Jonathan and Quirin for reporting the problem to us, and to Andrew Ayer for providing https://caatestsuite.com/.

--
Rob Stradling
Senior Research & Development Scientist
COMODO - Creating Trust Online

_______________________________________________
dev-security-policy mailing list
dev-security-policy@lists.mozilla.org
https://lists.mozilla.org/listinfo/dev-security-policy
