On Thu, 1 Sep 2016, Kent Overstreet wrote:
> Encryption in bcachefs is done and working and I just finished documenting the
> design - so now, it needs more eyeballs and vetting before letting users play
> with it.
>
> ### Algorithms
>
> By virtue of working within a copy on write filesystem with provisions for ZFS
> style checksums (that is, checksums with the pointers, not the data), we’re
> able to use a modern AEAD style construction. We use ChaCha20 and Poly1305. We
> use the cyphers directly instead of using the kernel AEAD library (and thus
> means there's a bit more in the design that needs auditing).

A few thoughts:

Great work implementing your own monotonically increasing nonce in
bcachefs. You have implemented your own crypto stack, and I'm sure lots of
time went into that. Stream ciphers are great, but not always seekable
(e.g. output-feedback mode, OFB). Of course, if you plan to validate the
MAC for every read, then seekability might not matter, but with a
non-seekable cipher you would still need to generate keystream up to the
offset you want even after MAC validation, which adds overhead.

Since you have written a solid nonce generator, supporting the existing
kernel library for block ciphers in counter mode (CTR) is probably easy
and would aid the validation of your protocol. It would also spare
bcachefs from maintaining new ciphers itself if the user can specify any
block or stream cipher from the kernel crypto library.

For counter mode, shift the nonce left by log2 of the number of cipher
blocks per extent (extent size divided by cipher block size). For 64k
extents and AES-128 (128-bit, i.e. 16-byte, blocks), you would shift the
nonce like so:

    nonce <<= ilog2(65536 / (128 / 8))    /* ilog2(4096) == 12 */

This example uses the low 12 bits of the nonce for the block counter,
leaving 2^84 extent generations before wrapping, assuming a 96-bit nonce.
(Of course you'll need to store the nonce in your extent metadata.)
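
A rough sketch of that arithmetic, in Python for brevity rather than kernel
C. The 96-bit nonce width is my assumption (it is what makes the 2^84
figure work out), and extent_base_nonce() is a hypothetical helper, not
anything in bcachefs:

```python
EXTENT_BYTES = 65536      # 64k extent, as in the example above
BLOCK_BYTES = 128 // 8    # AES block size: 128 bits = 16 bytes
NONCE_BITS = 96           # assumption: width of the stored nonce field


def ilog2(n: int) -> int:
    """Floor of log2(n) for n >= 1, like the kernel's ilog2()."""
    return n.bit_length() - 1


blocks_per_extent = EXTENT_BYTES // BLOCK_BYTES   # 4096 cipher blocks
counter_bits = ilog2(blocks_per_extent)           # 12 bits of block counter
generation_bits = NONCE_BITS - counter_bits       # 84 bits of generation


def extent_base_nonce(generation: int) -> int:
    """Place the extent generation above the per-block counter bits."""
    if generation >= 1 << generation_bits:
        raise OverflowError("extent generation would wrap the nonce")
    return generation << counter_bits
```

The counter values of one extent (base through base + 4095) never overlap
those of the next generation, which is the whole point of the shift.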

You then have a counter for each 16-byte AES block in the bottom bits of
the extent nonce. Counter mode is trivial to implement:

    ciphertext[   0] = E_k(nonce +    0) XOR plaintext[   0]
    [...]
    ciphertext[4095] = E_k(nonce + 4095) XOR plaintext[4095]

This gives you cipher-block-independent parallelism, seekability, and
flexibility to use existing and future block ciphers. Counter mode is well
studied; I don't think anyone will argue against it if your nonces are
sound.
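
To make the seekability point concrete, here is a minimal counter-mode
sketch in Python. Note the loud assumption: E_k() below is a stand-in
keyed PRF built from HMAC-SHA256, NOT AES and not the kernel crypto API;
counter mode only ever calls the cipher in the forward direction, so a
PRF is enough to show the structure. All function names are hypothetical.

```python
import hashlib
import hmac

BLOCK_BYTES = 16  # AES-sized blocks


def E_k(key: bytes, counter: int) -> bytes:
    """Stand-in for one block cipher call (NOT AES): a keyed PRF."""
    msg = counter.to_bytes(16, "big")
    return hmac.new(key, msg, hashlib.sha256).digest()[:BLOCK_BYTES]


def ctr_xor_block(key: bytes, base_nonce: int, index: int, block: bytes) -> bytes:
    """Encrypt or decrypt one block; both directions are the same XOR."""
    keystream = E_k(key, base_nonce + index)
    return bytes(a ^ b for a, b in zip(block, keystream))


def ctr_xor_extent(key: bytes, base_nonce: int, data: bytes) -> bytes:
    """Process a whole extent; every block depends only on its own index."""
    out = bytearray()
    for offset in range(0, len(data), BLOCK_BYTES):
        index = offset // BLOCK_BYTES
        out += ctr_xor_block(key, base_nonce, index,
                             data[offset:offset + BLOCK_BYTES])
    return bytes(out)
```

Reading block 4095 of an extent needs exactly one call to
ctr_xor_block(), with no work spent on blocks 0 through 4094, and the
loop in ctr_xor_extent() could be split across CPUs in any order.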

[... more below ]

> The current design uses the same key for both ChaCha20 and Poly1305, but my
> recent rereading of the Poly1305-AES paper seems to imply that the Poly1305 key
> shouldn't be used for anything else. Guidance from actual cryptographers would
> be appreciated here; the ChaCha20/Poly1305 AEAD RFC appears to be silent on the
> matter.
>
> Note that ChaCha20 is a stream cypher. This means that it’s critical that we use
> a cryptographic MAC (which would be highly desirable anyways), and also avoiding
> nonce reuse is critical. Getting nonces right is where most of the trickiness is
> involved in bcachefs’s encryption.
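
For anyone following along who wants to see why nonce reuse is fatal with
a stream cipher, here is a small Python illustration. The keystream()
function is a stand-in built from SHAKE-256, NOT ChaCha20, and the
plaintexts are made up; the cancellation shown holds for any cipher that
XORs a keystream into the data.

```python
import hashlib


def keystream(key: bytes, nonce: int, length: int) -> bytes:
    """Stand-in stream cipher (NOT ChaCha20): keystream from SHAKE-256."""
    return hashlib.shake_256(key + nonce.to_bytes(12, "big")).digest(length)


def xor(a: bytes, b: bytes) -> bytes:
    """Bytewise XOR of two equal-length byte strings."""
    return bytes(x ^ y for x, y in zip(a, b))


key = b"\x01" * 32
p1 = b"old contents of the extent......"
p2 = b"new contents of the extent!!!!!!"

# Both writes reuse nonce 5, so the keystream cancels out of c1 XOR c2.
c1 = xor(p1, keystream(key, 5, len(p1)))
c2 = xor(p2, keystream(key, 5, len(p2)))
leaked = xor(c1, c2)  # equals p1 XOR p2, and no key was needed

# A fresh nonce for the second write avoids the cancellation.
c3 = xor(p2, keystream(key, 6, len(p2)))
```

An attacker who sees both versions of an overwritten extent gets the XOR
of the two plaintexts for free, which is why a copy-on-write design with
a new nonce per write matters so much here.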
>
> The current algorithm choices are not hard coded. Bcachefs already has
> selectable checksum types, and every individual data and metadata write has a
> field that describes the checksum algorithm that was used. On disk, encrypted
> data is represented as a new checksum type - so we now have [none, crc32c,
> crc64, chacha20/poly1305] as possible methods for data to be
> checksummed/encrypted. If in the future we add new encryption algorithms, users
> will be able to switch to the new algorithm on existing encrypted filesystems;
> new data will be written with the new algorithm and old data will be read with
> the old algorithm until it is rewritten.
>
> ### Key derivation, master key
>
> Userspace tooling takes the user's passphrase and derives an encryption key with
> scrypt. This key is made available to the kernel (via the Linux kernel's keyring
> service) prior to mounting the filesystem.
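
As a point of reference, the derivation step might look roughly like the
following in userspace. This is only a sketch using Python's
hashlib.scrypt (which needs an OpenSSL-backed build); the cost
parameters, the 32-byte key length, and where the salt comes from are all
my assumptions, not what the bcachefs tooling actually does, and the
keyring handoff is not shown.

```python
import hashlib


def derive_user_key(passphrase: str, salt: bytes) -> bytes:
    """Derive a 256-bit key from a passphrase with scrypt."""
    return hashlib.scrypt(
        passphrase.encode("utf-8"),
        salt=salt,
        n=2**14,   # hypothetical CPU/memory cost (about 16 MiB with r=8)
        r=8,       # hypothetical block size parameter
        p=1,       # hypothetical parallelism parameter
        dklen=32,  # assumption: 256-bit key
    )
```

The same passphrase and salt always yield the same key, so the salt and
cost parameters have to be stored somewhere readable before mount.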
>
> On filesystem mount, the userspace provided key is used to decrypt the master
> key, which is stored in the superblock - also with ChaCha20. The master key is
> encrypted with an 8 byte header, so that we can tell if the correct key was
> supplied.
>
> ### Metadata
>
> Except for the superblock, no metadata in bcache/bcachefs is updated in place -
> everything is more or less log structured. Only the superblock is stored
> unencrypted; other metadata is stored with an unencrypted header and encrypted
> contents.
>
> The superblock contains:
> * Label and UUIDs identifying the filesystem
> * A list of component devices (for multi-device filesystems), and information
>   on their size, geometry, status (active/failed), last used timestamp
> * Filesystem options
> * The location of the journal
>
> For the rest of the metadata, the unencrypted portion contains:
>
> * 128 bit checksum/MAC field
> * Magic number - identifies a given structure as btree/journal/allocation
>   information, for that filesystem
> * Version number (of on disk format), flags (including