All,

One feature we built on main (aka 4.x) was native support for encrypted 
databases. As the FoundationDB/4.x work is largely halted now, I thought I'd 
scratch a personal itch and try to bring this feature to 3.x.

I've posted a draft pull request at https://github.com/apache/couchdb/pull/4019 
which works.

Obviously the encryption and decryption primitives are the least interesting 
part (though I'm pleased with the implementation); the important part is key 
management, which I hope will be the primary focus of thread responses.

The draft PR has numerous commits that can be understood independently, not 
all of which necessarily make sense in any final version. I'll describe them 
briefly at the end of this post.

In brief, encryption is done with AES in counter mode. This provides two 
properties that make integration with the way we write to couch_files much 
easier than with other modes. Firstly, we can calculate the cipher text for any 
plain text without needing to read any other data (i.e., we don't need to read 
an AES block's worth of data, 16 bytes, to then modify it). Secondly, we can 
write, or read, any subsection of an AES block. Taken together, this means we 
can preserve our append-only scheme and do not have to buffer writes or pad 
them out to AES block sizes. There is a penalty in performance, of course, as 
we must encrypt or decrypt a full AES block at minimum, even if we then discard 
sections of the result. Another consequence is that native encryption only 
provides confidentiality, not authentication (as GCM mode would, for example).
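
To illustrate the random-access property, here is a minimal Python sketch (the 
PR itself is Erlang; this is not its code). The names keystream_block and 
ctr_xor are hypothetical, and keystream_block uses SHA-256 purely as a stand-in 
for the AES block cipher so that the example runs on the standard library 
alone. Only the offset arithmetic is the point.

```python
import hashlib

BLOCK = 16  # AES block size in bytes


def keystream_block(key: bytes, counter: int) -> bytes:
    """Stand-in for AES-encrypting one counter block (NOT real AES)."""
    return hashlib.sha256(key + counter.to_bytes(16, "big")).digest()[:BLOCK]


def ctr_xor(key: bytes, offset: int, data: bytes) -> bytes:
    """Encrypt or decrypt `data` as if it sat at byte `offset` of the file."""
    # Which block the data starts in, and how far into that block.
    first_block, skip = divmod(offset, BLOCK)
    # Whole blocks of keystream are generated even if only part is used.
    n_blocks = -(-(skip + len(data)) // BLOCK)  # ceiling division
    stream = b"".join(
        keystream_block(key, first_block + i) for i in range(n_blocks)
    )
    # Discard the unused leading bytes; zip() drops the unused tail.
    return bytes(a ^ b for a, b in zip(data, stream[skip:]))


key = b"k" * 32
whole = ctr_xor(key, 0, b"a" * 40)
# Five bytes written at offset 21 match the same span of a full-file pass.
assert ctr_xor(key, 21, b"a" * 5) == whole[21:26]
```

No neighbouring bytes are read and nothing is padded, which is what lets an 
append-only writer keep working unchanged. The cost is visible in n_blocks: a 
five-byte write still computes a full 16-byte keystream block.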

The basics of key management are present in the PR. Each couch_file is 
encrypted by a unique key, generated with crypto:strong_rand_bytes when the 
file is created. This value is wrapped using a secure key wrapping algorithm and stored 
very deliberately at the beginning of the file. Unlike db headers, we do not 
write a further copy later. It lives in the first X bytes. One important 
benefit of this is that the file can be crypto-shredded by overwriting this 
area. The key is unique to the file and does not propagate through compaction. 
Compacting a file encrypts it with a new key. There was no reason to preserve 
the key, so I didn't.
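
Crypto-shredding as described above can be sketched in a few lines of Python. 
This is an illustration under assumptions, not the PR's code: crypto_shred is a 
hypothetical helper, and header_len stands in for the unspecified "first X 
bytes" that hold the wrapped key.

```python
import os


def crypto_shred(path: str, header_len: int) -> None:
    """Overwrite the first `header_len` bytes of `path` with random bytes.

    `header_len` is a hypothetical parameter: the size of the region at the
    start of the file that holds the wrapped per-file key.
    """
    with open(path, "r+b") as f:  # open in place, without truncating
        f.seek(0)
        f.write(os.urandom(header_len))
        f.flush()
        os.fsync(f.fileno())  # ask the OS to push the overwrite to disk
```

Once the wrapped key is gone, the rest of the file is unreadable without any 
further rewriting. Note that on storage which does not overwrite in place (for 
example copy-on-write filesystems or SSD wear levelling) an old copy of the 
header may survive, so this sketch should not be read as a guarantee.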

You will see there are commits that hardcode the wrapping key (or "key 
encrypting key" if you prefer); later in the commit set I switch to storing 
these in the config files. I don't think that is a suitable option for any 
version of this, but it is useful to demonstrate the separation of concerns. 
The keys can come from anywhere.

To make things more manageable, later commits introduce the notion of a "key 
id" that is essentially a label for an otherwise random 32-byte value. The key 
id allows us to perform rekeying. That is, we can change the key that wraps 
the per-couch_file key by changing the wrapping_key_id in config and 
compacting all existing files. We can also consider a far faster approach where 
we simply overwrite the encryption header at the start of each file. More care 
is needed there, of course.

To the commits themselves:

* demonstrate native encryption

This introduces only the essential changes for native encryption. A single 
"master" key is used. We use the NIST AES Key Wrap algorithm (copied from the 
aegis application on couchdb main branch) to wrap the per-couch_file keys.
 
* encrypt the headers too

I went back and forth on whether to encrypt the headers (really, footers) or 
not. In this commit I begin encrypting them. At this point the entire file is 
encrypted (and `ent` confirms the entire file is statistically random). 

* support unencrypted files

To support migration, this commit allows couch_file to read unencrypted files.
 
* canary value to detect encryption

The previous commit made the assumption that any failure to unwrap a key means 
the file is not encrypted. This isn't necessarily true, so this commit allows 
us to distinguish between an unencrypted file and one where unwrap fails for 
some other reason (tampering, corruption, truncation, cosmic rays). In the 
latter case we return an error. This prevents us from resetting a file 
erroneously.
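
The three outcomes described here (unencrypted file, successful unwrap, error) 
can be sketched as follows. Everything in this Python example is hypothetical: 
the CANARY value, the header layout and the unwrap callback are invented for 
illustration and are not taken from the PR.

```python
from typing import Callable, Optional

CANARY = b"couch-encrypted:"  # hypothetical marker, not the PR's actual value


def open_file_key(
    header: bytes, unwrap: Callable[[bytes], Optional[bytes]]
) -> Optional[bytes]:
    """Return the file key, or None for an unencrypted file.

    Raises ValueError when the file claims to be encrypted but the key
    cannot be unwrapped (tampering, corruption, truncation, wrong key).
    """
    if not header.startswith(CANARY):
        return None  # no canary: treat as a legacy, unencrypted file
    key = unwrap(header[len(CANARY):])
    if key is None:
        # The file is encrypted but unreadable; never reset it.
        raise ValueError("encrypted file, but the key could not be unwrapped")
    return key
```

The point of the explicit error branch is that a damaged encrypted file is 
never mistaken for an unencrypted one.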
 
* import https://github.com/whitelynx/erlang-pbkdf2/blob/master/src/pbk…

At this point I wanted more than a hardcoded key. In the absence of our 
community's view on key management I elected to use the config file as a 
stepping stone. In advance of doing that I imported a better PBKDF2 
implementation than the one I originally wrote for CouchDB approximately a 
century ago (rounding up). I also expunged our implementation and delegated to 
the imported version.
 
* encryption password from config

This commit introduces wrapping keys that are derived from user supplied 
values, using PBKDF2 with SHA-256 as the PRF.
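
For readers unfamiliar with the derivation step, here is a minimal Python 
sketch using the standard library's PBKDF2. The function name, the salt 
handling and the iteration count are assumptions for illustration; the post 
does not state the parameters the PR uses. The 32-byte output length matches 
the 32-byte key size mentioned earlier.

```python
import hashlib


def derive_wrapping_key(
    password: bytes, salt: bytes, iterations: int = 100_000
) -> bytes:
    """Derive a 32-byte wrapping key with PBKDF2, using HMAC-SHA-256 as PRF.

    `salt` and `iterations` are illustrative choices, not the PR's values.
    """
    return hashlib.pbkdf2_hmac("sha256", password, salt, iterations, dklen=32)


wrapping_key = derive_wrapping_key(b"user supplied value", b"example-salt")
```

The same password and salt always yield the same key, so the key itself never 
has to be stored, only the inputs needed to re-derive it.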
 
* performance boost and also hides the key from inspection

This one is very cool (which is entirely down to the Erlang/OTP team). We 
switch to the new dyn_iv functions, which bring both a performance boost 
(approximately 30% on my laptop, at least) and a security benefit: the key 
bytes themselves are no longer passed around in variables.
 
* use AES_SIV (RFC 5297) instead of AES Key Wrap

AES Key Wrap was used in CouchDB main with a view to staying within the FIPS 
140-2 family (for IBM Cloudant). I had written an implementation of AES SIV 
prior to that which never left the workbench. I introduce it here because a) 
it's cool, b) waste is bad, and c) it has a better theoretical foundation than 
AES Key Wrap. In reality we might need a toggle.

 
* Add key rotation facility

Here I introduce the notion of multiple keys, where each has an id. The ids are 
written into the couch_files in the same encryption header as the wrapped key, 
and the id is also included in the AAD of the key wrapping algorithm (that is, 
we will detect if the wrong key id is supplied). New files are encrypted using 
the key named in [encryption] wrapping_key_id, but as long as you don't delete 
the other keys from [encryption_keys], all existing files can still be 
decrypted. On compaction those files will have their keys wrapped with the 
key named by wrapping_key_id.
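
The key selection logic can be sketched as below. The section and option names 
[encryption] wrapping_key_id and [encryption_keys] come from the description 
above; the key ids, the format of the values and the helper names are 
assumptions made for this Python illustration.

```python
import configparser

# Hypothetical config: the ids and values are invented for illustration.
EXAMPLE = """
[encryption]
wrapping_key_id = key2

[encryption_keys]
key1 = first-passphrase
key2 = second-passphrase
"""


def load_keys(text: str):
    """Return (current wrapping key id, all known keys by id)."""
    cfg = configparser.ConfigParser()
    cfg.read_string(text)
    keys = dict(cfg["encryption_keys"])
    current = cfg["encryption"]["wrapping_key_id"]
    if current not in keys:
        raise KeyError(f"wrapping_key_id {current!r} has no entry")
    return current, keys


def needs_rewrap(header_key_id: str, current: str) -> bool:
    """True if a file's key was wrapped with an older wrapping key."""
    return header_key_id != current


current, keys = load_keys(EXAMPLE)
old_file_key_source = keys["key1"]   # still available for existing files
assert needs_rewrap("key1", current)  # compaction would rewrap with key2
```

A file whose header names key1 stays readable for as long as key1 remains in 
the config, and moves to key2 the next time it is compacted.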

I'm keen to see this, or something much like it, in CouchDB and invite your 
thoughts.

Regards,
B.





