Re: Transparent column encryption

Peter Eisentraut Mon, 18 Jul 2022 03:53:47 -0700

On 15.07.22 19:47, Jacob Champion wrote:

     The CEK key
     material is in turn encrypted by an asymmetric key called the column
     master key (CMK).


I'm not yet understanding why the CMK is asymmetric.

I'm not totally sure either. I started to build it that way because other systems were doing it that way, too. But I have been thinking about adding a symmetric alternative for the CMKs as well (probably AESKW).

I think there are a couple of reasons why asymmetric keys are possibly useful for CMKs:

Some other products make use of secure enclaves to do computations on (otherwise) encrypted values on the server. I don't fully know how that works, but I suspect that asymmetric keys can play a role in that. (I don't have any immediate plans for that in my patch. It seems to be a dying technology at the moment.)

Asymmetric keys give you some more options for how you set up the keys at the beginning. For example, you create the asymmetric key pair on the host where your client program that wants access to the encrypted data will run. You put the private key in an appropriate location for run time. You send the public key to another host. On that other host, you create the CEK, encrypt it with the CMK, and then upload it into the server (CREATE COLUMN ENCRYPTION KEY). Then you can wipe that second host. That way, you can be even more sure that the unencrypted CEK isn't left anywhere. I'm not sure whether this method is very useful in practice, but it's interesting.

In any case, as I mentioned above, this particular aspect is up for discussion.

Also note that if you use a KMS (cmklookup "run" method), the actual algorithm doesn't even matter (depending on details of the KMS setup), since you just tell the KMS "decrypt this", and the KMS knows by itself what algorithm to use. Maybe there should be a way to specify "unknown" in the ckdcmkalg field.

+#define PG_CEK_AEAD_AES_128_CBC_HMAC_SHA_256   130
+#define PG_CEK_AEAD_AES_192_CBC_HMAC_SHA_384   131
+#define PG_CEK_AEAD_AES_256_CBC_HMAC_SHA_384   132
+#define PG_CEK_AEAD_AES_256_CBC_HMAC_SHA_512   133


It looks like these ciphersuites were abandoned by the IETF. Are there
existing implementations of them that have been audited/analyzed? Are
they safe (and do we know that the claims made in the draft are
correct)? How do they compare to other constructions like AES-GCM-SIV
and XChaCha20-Poly1305?

The short answer is, these same algorithms are used in equivalent products (see MS SQL Server, MongoDB). They even reference the same exact draft document.

Besides that, here is my analysis for why these are good choices: You can't use any of the counter modes, because the encryption happens on the client, so there is no way to coordinate to avoid nonce reuse. So among mainstream modes, you are basically left with AES-CBC with a random IV. In that case, even if you happen to reuse an IV, the possible damage is very contained.
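
To illustrate the nonce-reuse concern, here is a small sketch. It does not use AES: it uses a toy keystream built from SHA-256 as a stand-in for a counter mode, and all names and values in it are made up for illustration. The point is only that when two uncoordinated clients encrypt under the same key and nonce, XORing the two ciphertexts cancels the keystream and yields the XOR of the plaintexts.

```python
import hashlib


def toy_keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """Toy counter-mode keystream (SHA-256 based stand-in, NOT real AES-CTR)."""
    out = b""
    counter = 0
    while len(out) < length:
        block_input = key + nonce + counter.to_bytes(8, "big")
        out += hashlib.sha256(block_input).digest()
        counter += 1
    return out[:length]


def xor_bytes(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def toy_ctr_encrypt(key: bytes, nonce: bytes, plaintext: bytes) -> bytes:
    # Counter modes XOR the plaintext with a keystream derived from (key, nonce).
    return xor_bytes(plaintext, toy_keystream(key, nonce, len(plaintext)))


# Hypothetical values: two clients that happen to pick the same nonce.
key = b"k" * 16
nonce = b"n" * 12
p1 = b"salary=100000"
p2 = b"salary=250000"

c1 = toy_ctr_encrypt(key, nonce, p1)
c2 = toy_ctr_encrypt(key, nonce, p2)

# The keystream cancels out: the observer learns p1 XOR p2 without the key.
leak = xor_bytes(c1, c2)
# Anyone who knows (or guesses) p1 can now recover p2.
recovered_p2 = xor_bytes(leak, p1)
```

With CBC and a random IV, an accidental IV collision reveals at most whether two plaintexts share a common prefix of blocks, which is the "very contained" damage referred to above.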

And then, if you want to use AEAD, you combine that with some MAC, and HMAC is just as good as any for that.

The referenced draft document doesn't really contain any additional cryptographic insights; it's just a guide on a particular way to put these two together.
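
As a rough sketch of the "put these two together" step, the following shows only the MAC half of an encrypt-then-MAC construction, for the SHA-256 variant. It assumes the layout as I recall it from the AES-CBC-HMAC-SHA2 draft (HMAC over associated data, IV, ciphertext, and the 64-bit big-endian bit length of the associated data, truncated to 16 bytes); the patch may differ in details. The ciphertext is an opaque placeholder rather than real AES-CBC output, and the function names and key values are hypothetical.

```python
import hashlib
import hmac

TAG_LEN = 16  # assumed truncated tag length for the SHA-256 variant


def compute_tag(mac_key: bytes, associated_data: bytes,
                iv: bytes, ciphertext: bytes) -> bytes:
    """HMAC over AD || IV || ciphertext || bitlen(AD), truncated to TAG_LEN."""
    ad_bit_length = (len(associated_data) * 8).to_bytes(8, "big")
    mac_input = associated_data + iv + ciphertext + ad_bit_length
    return hmac.new(mac_key, mac_input, hashlib.sha256).digest()[:TAG_LEN]


def verify_tag(mac_key: bytes, associated_data: bytes,
               iv: bytes, ciphertext: bytes, tag: bytes) -> bool:
    """Check the tag in constant time before attempting any decryption."""
    expected = compute_tag(mac_key, associated_data, iv, ciphertext)
    return hmac.compare_digest(expected, tag)


# Hypothetical inputs; the ciphertext stands in for AES-CBC output.
mac_key = bytes(range(16))
associated_data = b"PG\x00\x82"
iv = bytes(16)
ciphertext = b"\xaa" * 32

tag = compute_tag(mac_key, associated_data, iv, ciphertext)
ok = verify_tag(mac_key, associated_data, iv, ciphertext, tag)
tampered_ok = verify_tag(mac_key, associated_data, iv, b"\xab" + ciphertext[1:], tag)
```

A receiver would call verify_tag first and refuse to run CBC decryption at all if it returns False.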


So altogether I think this is a pretty solid choice.

+-- \gencr
+-- (This just tests the parameter passing; there is no encryption here.)
+CREATE TABLE test_gencr (a int, b text);
+INSERT INTO test_gencr VALUES (1, 'one') \gencr
+SELECT * FROM test_gencr WHERE a = 1 \gencr
+ a |  b
+---+-----
+ 1 | one
+(1 row)
+
+INSERT INTO test_gencr VALUES ($1, $2) \gencr 2 'two'
+SELECT * FROM test_gencr WHERE a IN ($1, $2) \gencr 2 3
+ a |  b
+---+-----
+ 2 | two
+(1 row)

I'd expect \gencr to error out without sending plaintext. I know that
under the hood this is just setting up a prepared statement, but if I'm
using \gencr, presumably I really do want to be encrypting my data.
Would it be a problem to always set force-column-encryption for the
parameters we're given here? Any unencrypted columns could be provided
directly.

Yeah, this needs a bit of refinement. You don't want something named "encr" that only encrypts some of the time. We could possibly do what you suggest and make it set the force-encryption flag, or maybe rename it or add another command that just uses prepared statements and doesn't promise anything about encryption from its name.

This also ties in with how pg_dump will eventually work. I think by default pg_dump will just dump things encrypted and set it up so that COPY writes it back encrypted. But there should probably be a mode that dumps out plaintext and then uses one of these commands to load the plaintext back in. What these psql commands need to do also depends on what pg_dump needs them to do.

+  <para>
+   Null values are not encrypted by transparent column encryption; null values
+   sent by the client are visible as null values in the database.  If the fact
+   that a value is null needs to be hidden from the server, this information
+   needs to be encoded into a nonnull value in the client somehow.
+  </para>


This is a major gap, IMO. Especially with the switch to authenticated
ciphers, because it means you can't sign your NULL values. And having
each client or user that's out there solve this with a magic in-band
value seems like a recipe for pain.

Since we're requiring "canonical" use of text format, and the docs say
there are no embedded or trailing nulls allowed in text values, could we
steal the use of a single zero byte to mean NULL? One additional
complication would be that the client would have to double-check that
we're not writing a NULL into a NOT NULL column, and complain if it
reads one during decryption. Another complication would be that the
client would need to complain if it got a plaintext NULL.

You're already alluding to some of the complications. Also consider that null values could arise from, say, outer joins. So you could be in a situation where encrypted and unencrypted null values coexist. And of course the server doesn't know about the encrypted null values. So how do you maintain semantics, like for aggregate functions, primary keys, anything that treats null values specially? How do clients deal with a mix of encrypted and unencrypted null values, and how do they know which one is real? What if the client needs to send a null value back as a parameter? All of this would create enormous complications, if they can be solved at all.

I think a way to look at this is that this column encryption feature isn't suitable for disguising the existence or absence of data; it can only disguise the particular data that you know exists.

+   <para>
+    The <quote>associated data</quote> in these algorithms consists of 4
+    bytes: The ASCII letters <literal>P</literal> and <literal>G</literal>
+    (byte values 80 and 71), followed by the algorithm ID as a 16-bit unsigned
+    integer in network byte order.
+   </para>


Is this AD intended as a placeholder for the future, or does it serve a
particular purpose?

It has been recommended that you include the identity of the encryption algorithm in the AD. This protects the client from having to decrypt stuff that wasn't meant to be decrypted (in that way).
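
For concreteness, the 4-byte associated data described in the quoted documentation could be built like this. This is only a sketch: the function name is made up, and the constant's value is copied from the #define quoted earlier in this message.

```python
import struct

# Value taken from the quoted patch hunk.
PG_CEK_AEAD_AES_128_CBC_HMAC_SHA_256 = 130


def build_associated_data(algorithm_id: int) -> bytes:
    """ASCII 'P', 'G', then the algorithm ID as a 16-bit unsigned
    integer in network (big-endian) byte order."""
    return struct.pack("!2sH", b"PG", algorithm_id)


ad = build_associated_data(PG_CEK_AEAD_AES_128_CBC_HMAC_SHA_256)
# ad is b"PG\x00\x82", i.e. byte values 80, 71, 0, 130
```

Because the AD is covered by the MAC, a ciphertext produced under one algorithm ID fails authentication if a client tries to process it under a different one.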
