Hey all,

This might not be quite the right list for this, but since I'm not aware of a "ct-implementors" list, I thought I'd go ahead and post it.

tl;dr: Logging with live updates to the Merkle tree can be done at rates of at least 10-150qps.

It had bothered me for a while that there was this 24-hour delay at the heart of CT, and I was never satisfied with the hand-wavy answers about distributed consensus. At the CT policy days last week, Nick, Zakir, and I decided to see if we could build a log with two properties:

1. It updates in real time, i.e., it ensures that the Merkle tree is updated before returning 200/OK to an add-chain query.
2. It updates fast enough that it could plausibly keep up with real log loads.

Property (1) means that if your signer was fast enough to keep up, you could return an STH instead of an SCT in response to an add-chain query. This should be appealing to log operators, since if you log this way, you never have to worry about missing your MMD -- your MMD is zero.

The only real innovation needed here is to focus on the bare minimum that needs to be done to update a Merkle tree. In order to add a certificate to the tree, you don't need to track the whole tree. You just need to keep track of a logarithmic-size "frontier" of the heads of complete subtrees; these also happen to be the elements of an inclusion proof from the new cert to the tree head immediately after its addition. Then all you need to do to add a cert to the tree is ensure that the following are done atomically:

- Fetch the latest frontier
- Update it with the new certificate
- Store the certificate and the new frontier

Then you can update your log as fast as your database can do those transactions.

To estimate how fast this can happen in practice, we tested out a few storage configurations behind a simple Go front-end that implements the add-chain endpoint from RFC 6962 [1]:

A. Google Cloud Spanner [2]
B. Postgres running on a dedicated SSD, on the same server as the front end
C. Postgres running remote, on a shared hard disk

(All of these can trivially support multiple front-ends, due to transaction wrapping; it turns out not to help, because you're limited by the transaction processing rate.)

Spanner turned out to be the slowest by quite a bit [3], operating at around 10qps. Postgres on an SSD was fastest, at 150-200qps. Postgres in the more realistic configuration (remote, on a shared hard disk) was in the middle, at around 60qps. Obviously, there's some trade-off with reliability here, but since Google is using Spanner for their logs, it seems like the Spanner number should represent a pretty acceptable reliability level.

The upshot is that this style of logging seems eminently feasible. Almost all CT logs are growing at well under 10qps [4], and even Let's Encrypt on its worst days only generates about 115qps. And I have pretty high confidence that if you actually put some thought into designing the storage scheme, you could get some additional margin here.

=====

What does this mean for CT? I'm not totally sure.

On the one hand, it seems like it adds some credibility to the idea of relying more on inclusion proofs, since in principle, a log can issue an inclusion proof to a tree head immediately on cert submission. It certainly makes me wonder if there's an argument in here for getting rid of SCTs entirely.

On the other hand, it may not be dispositive. Even if you can create an inclusion proof to *a* tree head, it's not going to be that useful unless a client/auditor can connect that tree head up to some notion of the global state -- and it will not be feasible for that global state to be the entire set of tree heads, one per certificate! Assuming then that logs will have to issue global state only periodically, there will still be delay in issuance, just driven by client/auditor caching capabilities instead of log ingress capabilities.

On the third hand, it may be possible to hack around even this.
Even if logs are checkpointed, so that only some tree heads are "official" and used by clients/auditors, the log could produce a consistency proof to the last official head at ingress time. That would give the client/auditor some assurance that the cert fits into the global history, which could be verified when the next official tree head comes out.

In any case, we should stop thinking that log delay and the MMD are essential properties of this system.

--Richard

[1] https://github.com/bifurcation/loggerhead
[2] https://github.com/bifurcation/loggerhead/tree/spanner
[3] It was also really expensive! Cost almost $100 in free trial credit to run this experiment :)
[4] https://sslmate.com/labs/ct_growth/
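
P.S. For anyone who wants to see the frontier idea concretely, here is a minimal sketch in Go. It is not the loggerhead code: the names Frontier, Log, and AddChain are made up for illustration, and a mutex over an in-memory slice stands in for the database transaction (an assumption, just to keep it self-contained). The hashing uses the RFC 6962 leaf and node prefixes, and naiveRoot recomputes the tree head from scratch purely as a cross-check.

```go
package main

import (
	"crypto/sha256"
	"fmt"
	"sync"
)

// Hash is a SHA-256 digest, the node type of an RFC 6962 Merkle tree.
type Hash [sha256.Size]byte

// leafHash and nodeHash use the RFC 6962 domain-separation prefixes.
func leafHash(entry []byte) Hash {
	return sha256.Sum256(append([]byte{0x00}, entry...))
}

func nodeHash(left, right Hash) Hash {
	buf := append([]byte{0x01}, left[:]...)
	return sha256.Sum256(append(buf, right[:]...))
}

// subtree is the head of one complete subtree covering size leaves.
type subtree struct {
	size uint64
	head Hash
}

// Frontier (hypothetical name) holds the complete-subtree heads, largest
// first. It never has more than about log2(n)+1 elements for n leaves.
type Frontier struct{ subtrees []subtree }

// Add appends one leaf and merges equal-sized neighbours, like a binary carry.
func (f *Frontier) Add(entry []byte) {
	f.subtrees = append(f.subtrees, subtree{1, leafHash(entry)})
	for {
		n := len(f.subtrees)
		if n < 2 || f.subtrees[n-2].size != f.subtrees[n-1].size {
			return
		}
		merged := subtree{
			size: 2 * f.subtrees[n-1].size,
			head: nodeHash(f.subtrees[n-2].head, f.subtrees[n-1].head),
		}
		f.subtrees = append(f.subtrees[:n-2], merged)
	}
}

// Root folds the subtree heads from right to left into the tree head.
func (f *Frontier) Root() Hash {
	n := len(f.subtrees)
	if n == 0 {
		return sha256.Sum256(nil) // RFC 6962 hash of the empty tree
	}
	root := f.subtrees[n-1].head
	for i := n - 2; i >= 0; i-- {
		root = nodeHash(f.subtrees[i].head, root)
	}
	return root
}

// Log is an in-memory stand-in for the storage layer; the mutex plays the
// role of the database transaction (an assumption of this sketch).
type Log struct {
	mu       sync.Mutex
	entries  [][]byte
	frontier Frontier
}

// AddChain fetches the frontier, updates it, and stores the certificate and
// new frontier as one atomic step, returning the new tree size and tree head.
func (l *Log) AddChain(cert []byte) (uint64, Hash) {
	l.mu.Lock()
	defer l.mu.Unlock()
	l.entries = append(l.entries, cert)
	l.frontier.Add(cert)
	return uint64(len(l.entries)), l.frontier.Root()
}

// naiveRoot recomputes the RFC 6962 tree head over all entries (cross-check).
func naiveRoot(entries [][]byte) Hash {
	n := len(entries)
	if n == 0 {
		return sha256.Sum256(nil)
	}
	if n == 1 {
		return leafHash(entries[0])
	}
	k := 1 // largest power of two strictly less than n
	for k*2 < n {
		k *= 2
	}
	return nodeHash(naiveRoot(entries[:k]), naiveRoot(entries[k:]))
}

func main() {
	lg := &Log{}
	for i := 0; i < 7; i++ {
		size, root := lg.AddChain([]byte(fmt.Sprintf("cert-%d", i)))
		fmt.Printf("size=%d head=%x match=%v\n", size, root[:4], root == naiveRoot(lg.entries))
	}
}
```

The point of the sketch is that AddChain only ever touches the frontier, never the rest of the tree, so the per-certificate work is one small read-modify-write. In a real log the lock would be replaced by whatever transaction your database gives you, and the returned head would still need to be signed to become an STH.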
_______________________________________________ Trans mailing list [email protected] https://www.ietf.org/mailman/listinfo/trans
