Re: Transparent column encryption

Jacob Champion Thu, 19 Jan 2023 12:48:17 -0800

On 12/31/22 06:17, Peter Eisentraut wrote:
> On 21.12.22 06:46, Peter Eisentraut wrote:
>> And another update.  The main changes are that I added an 'unspecified' 
>> CMK algorithm, which indicates that the external KMS knows what it is 
>> but the database system doesn't.  This was discussed a while ago.  I 
>> also changed some details about how the "cmklookup" works in libpq. Also 
>> added more code comments and documentation and rearranged some code.


Trying to delay a review until I had "completed it" has only led to me
not reviewing, so here's a partial one. Let me know what pieces of the
implementation and/or architecture you're hoping to get more feedback on.

I like the existing "caveats" documentation, and I've attached a sample
patch with some more caveats documented, based on some of the upthread
conversation:

- text format makes fixed-length columns leak length information too
- you only get partial protection against the Evil DBA
- RSA-OAEP public key safety

(Feel free to use/remix/discard as desired.)

When writing the paragraph on RSA-OAEP I was reminded that we didn't
really dig into the asymmetric/symmetric discussion. Assuming that most
first-time users will pick the builtin CMK encryption method, do we
still want to have an asymmetric scheme implemented first instead of a
symmetric keywrap? I'm still concerned about that public key, since it
can't really be made public. (And now that "unspecified" is available, I
think an asymmetric CMK could be easily created by users that have a
niche use case, and then we wouldn't have to commit to supporting it
forever.)

For the padding caveat:

> +      There is no concern if all values are of the same length (e.g., credit
> +      card numbers).

I nodded along to this statement last year, and then this year I learned
that CCNs aren't fixed-length (American Express numbers are 15 digits,
while most other cards use 16). So with a 16-byte block, you're probably
going to be able to figure out who has an American Express card.
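
To make the length leak concrete, here is a rough sketch of the arithmetic.
It assumes PKCS#7-style padding to a 16-byte block (which always adds at
least one byte) and ignores any IV or authentication tag, since those add a
constant amount. The helper name padded_len is made up for illustration and
is not from the patch.

```python
BLOCK = 16  # AES block size in bytes


def padded_len(plaintext_len: int, block: int = BLOCK) -> int:
    """Length after PKCS#7-style padding, which adds 1..block bytes.

    Assumption: the column cipher pads this way; IV and tag are ignored
    because they add the same number of bytes to every value.
    """
    return (plaintext_len // block + 1) * block


# Text-format lengths of some plaintexts of interest.
samples = {
    "15-digit card number": 15,
    "16-digit card number": 16,
    "bigint 1": len(str(1)),
    "bigint minimum": len(str(-2**63)),  # "-9223372036854775808", 20 chars
}

for label, n in samples.items():
    print(f"{label}: {n} bytes of text -> {padded_len(n)} bytes padded")
```

Under those assumptions a 15-digit number fits in one block while a
16-digit number spills into a second, so the two are distinguishable from
the ciphertext length alone. The same goes for small versus large bigint
values in text format.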

The column encryption algorithm is set per-column -- but isn't it
tightly coupled to the CEK, since the key length has to match? From a
layperson's perspective, using the same key to encrypt the same plaintext
under two different algorithms (if they happen to have the same key
length) seems like it might be cryptographically risky. Is there a
reason I should be encouraged to do that?

With the loss of \gencr it looks like we also lost a potential way to
force encryption from within psql. Any plans to add that for v1?

While testing, I forgot how the new option worked and connected with
`column_encryption=on` -- and then I accidentally sent unencrypted data
to the server, since `on` means "not enabled". :( The server errors out
after the damage is done, of course, but would it be okay to strictly
validate that option's values?
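
As a sketch of what I mean by strict validation: reject anything outside the
known set up front, before any data can be sent. The accepted values below
("0" and "1") and the function name are assumptions for illustration only.
The real set is whatever the patch defines, and the real check would live in
libpq's connection option handling rather than in Python.

```python
# Assumed value set, for illustration only; not taken from the patch.
ALLOWED_VALUES = {"0": False, "1": True}


def parse_column_encryption(value: str) -> bool:
    """Return the parsed setting, or raise if the value is not recognized.

    Hypothetical helper: an unknown value such as "on" is an error instead
    of silently meaning "not enabled".
    """
    try:
        return ALLOWED_VALUES[value]
    except KeyError:
        raise ValueError(
            f'invalid column_encryption value: "{value}"'
        ) from None


if __name__ == "__main__":
    for candidate in ("1", "0", "on"):
        try:
            print(candidate, "->", parse_column_encryption(candidate))
        except ValueError as err:
            print(candidate, "-> rejected:", err)
```

With a check like this, the connection attempt fails at option parsing, so
the mistake is caught before any plaintext goes over the wire.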

Are there plans to document client-side implementation requirements, to
ensure cross-client compatibility? Things like the "PG\x00\x01"
associated data are buried at the moment (or else I've missed them in
the docs). If you're holding off until the feature is more finalized,
that's fine too.

Speaking of cross-client compatibility, I'm still disconcerted by the
ability to write the value "hello world" into an encrypted integer
column. Should clients be required to validate the text format, using
the attrealtypid?

It occurred to me when looking at the "unspecified" CMK scheme that the
CEK doesn't really have to be an encryption key at all. In that case it
can function more like a (possibly signed?) cookie for lookup, or even
be ignored altogether if you don't want to use a wrapping scheme
(similar to JWE's "direct" mode, maybe?). So now you have three ways to
look up or determine a column encryption key (CMK realm, CMK name, CEK
cookie)... is that a concept worth exploring in v1 and/or the documentation?

Thanks,
--Jacob

diff --git a/doc/src/sgml/ddl.sgml b/doc/src/sgml/ddl.sgml
index 55f33a2f5f..06e1c077d5 100644
--- a/doc/src/sgml/ddl.sgml
+++ b/doc/src/sgml/ddl.sgml
@@ -1588,7 +1588,33 @@ export PGCMKLOOKUP
     card numbers).  But if there are signficant length differences between
     valid values and that length information is security-sensitive, then
     application-specific workarounds such as padding would need to be applied.
-    How to do that securely is beyond the scope of this manual.
+    How to do that securely is beyond the scope of this manual.  Note that
+    column encryption is applied to the text representation of the stored value,
+    so length differences can be leaked even for fixed-length column types (e.g.
+    <literal>bigint</literal>, whose largest decimal representation is longer
+    than 16 bytes).
+   </para>
+
+   <para>
+    Column encryption provides only partial protection against a malicious
+    user with write access to the table.  Once encrypted, any modifications to a
+    stored value on the server side will cause a decryption failure on the
+    client.  However, a user with write access may still freely swap encrypted
+    values between rows or columns (or even separate database clusters) as long
+    as they were encrypted with the same key.  Attackers may also remove values
+    by replacing them with nulls, and users with ownership over the table schema
+    may replace encryption keys or strip encryption from the columns entirely.
+    All of this is to say: proper access control is still of vital importance
+    when using this feature.
+   </para>
+
+   <para>
+    When using the RSA-OAEP CEK encryption methods, the "public" half of the CMK
+    may be used to replace existing column encryption keys with keys of an
+    attacker's choosing, compromising confidentiality and authenticity for
+    values encrypted under that CMK.  For this reason, it's important to keep
+    both the private <emphasis>and</emphasis> public halves of the CMK keypair
+    confidential.
    </para>
 
    <note>
