Hey all,

This might not be quite the right list for this, but since I'm not aware of
a "ct-implementors" list, I thought I'd go ahead and post it.

tl;dr: Logging with live updates to the Merkle tree can be done at rates of
roughly 10-150qps or more, depending on the storage backend.

It had bothered me for a while that there was this 24-hour delay at the
heart of CT, and I was never satisfied with the hand-wavy answers about
distributed consensus.  At the CT policy days last week, Nick, Zakir, and I
decided to see if we could build a log with two properties:

1. It updates in real time, i.e., it ensures that the Merkle tree is
updated before returning 200/OK to an add-chain query.
2. It updates fast enough that it could plausibly keep up with real log
loads.

Property (1) means that if your signer was fast enough to keep up, you
could return an STH instead of an SCT in response to an add-chain query.
This should be appealing to log operators, since if you log this way, you
never have to worry about missing your MMD -- your MMD is zero.

The only real innovation needed here is to focus on the bare minimum that
needs to be done to update a Merkle tree.  In order to add a certificate to
the tree, you don't need to track the whole tree, just a
logarithmic-size "frontier" of the heads of complete subtrees; these
also happen to be the elements of an inclusion proof from the new cert to
the tree head immediately after its addition.  Then all you need to do to
add a cert to the tree is ensure that the following are done atomically:

- Fetch the latest frontier
- Update it with the new certificate
- Store the certificate and the new frontier

Then you can update your log as fast as your database can do those
transactions.  To estimate how fast this can happen in practice, we tested
out a few storage configurations behind a simple Go front-end that
implements the add-chain endpoint from RFC 6962 [1]:

A. Google Cloud Spanner [2]
B. Postgres running on a dedicated SSD, on the same server as the front end
C. Postgres running remote, on a shared hard disk

(All of these can trivially support multiple front-ends, thanks to
transaction wrapping, but adding front-ends turns out not to help, because
you're limited by the
transaction processing rate.)

Spanner turned out to be the slowest by quite a bit [3], operating at
around 10qps.  Postgres on an SSD was fastest, at 150-200qps.  Postgres in
a more realistic configuration was in the middle, at around 60qps.
Obviously, there's some trade-off with reliability here, but since Google
is using Spanner for their logs, it seems like the Spanner number should
represent a pretty acceptable reliability level.

The upshot is that this style of logging seems eminently feasible.  Almost
all CT logs are growing at well under 10qps [4], and even Let's Encrypt
on its worst days only generates about 115qps.  And I have pretty high
confidence that if you actually put some thought into designing the storage
scheme, you could get some additional margin here.
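Back of the envelope: if every add-chain serializes on the frontier, throughput (lambda) and per-transaction time are reciprocals. Converting the measured rates and the Let's Encrypt peak under that assumption, ignoring any batching:

```latex
t_{\text{tx}} \approx \frac{1}{\lambda}, \qquad
\begin{aligned}
\text{Spanner: } & 1/(10\ \text{qps}) = 100\ \text{ms} \\
\text{Postgres, remote shared disk: } & 1/(60\ \text{qps}) \approx 16.7\ \text{ms} \\
\text{Postgres, local SSD: } & 1/(200\ \text{qps}) \text{ to } 1/(150\ \text{qps}) = 5 \text{ to } 6.7\ \text{ms} \\
\text{Let's Encrypt worst day: } & 1/(115\ \text{qps}) \approx 8.7\ \text{ms}
\end{aligned}
```

As tested, only the SSD configuration fits inside the roughly 8.7 ms budget that 115qps implies; the other two would need a faster storage scheme to cover that peak.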

=====

What does this mean for CT?  I'm not totally sure.

On the one hand, it seems like it adds some credibility to the idea of
relying more on inclusion proofs, since in principle, a log can issue an
inclusion proof to a tree head immediately on cert submission.  It
certainly makes me wonder if there's an argument in here for getting rid of
SCTs entirely.

On the other hand, it may not be dispositive.  Even if you can create an
inclusion proof to *a* tree head, it's not going to be that useful unless
a client/auditor can connect that tree head to some notion of the
global state -- and it will not be feasible for that global state to be the
entire set of tree heads, one per certificate!  Assuming, then, that logs
will only be able to publish global state periodically, there will still be
a delay in issuance, just driven by client/auditor caching capabilities
instead of log ingress capabilities.

On the third hand, it may be possible to hack around even this.  Even if
logs are checkpointed, so that only some tree heads are "official" and used
by clients/auditors, at ingress time the log could produce a consistency
proof between the last official head and the new one.  That would give the
client/auditor some assurance that the cert fits into the global history, an
assurance that could be
verified when the next official tree head comes out.

In any case, we should stop thinking that log delay and the MMD are
essential properties of this system.

--Richard


[1] https://github.com/bifurcation/loggerhead
[2] https://github.com/bifurcation/loggerhead/tree/spanner
[3] It was also really expensive!  Cost almost $100 in free trial credit to
run this experiment :)
[4] https://sslmate.com/labs/ct_growth/
_______________________________________________
Trans mailing list
trans@ietf.org
https://www.ietf.org/mailman/listinfo/trans
