Re: Transparent column encryption
On Thu, Apr 18, 2024 at 1:49 PM Jelte Fennema-Nio wrote: > I think this is an interesting idea. I can indeed see use cases for > e.g. inserting a new row based on another row (where the secret is the > same). > > IMHO that means that we should also bump the protocol version for this > change, because it's changing the wire protocol by adding a new > parameter format code. And it does so in a way that does not depend on > the new protocol extension. I think we're more or less covering the same ground we did on the other thread here -- in theory I don't love the fact that we never bump the protocol version when we change stuff, but in practice if we start bumping it every time we do anything I think it's going to just break a bunch of stuff without any real benefit. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Transparent column encryption
On Thu, 18 Apr 2024 at 18:46, Robert Haas wrote: > With regard to the Bind message, I suggest that we regard the protocol > change as reserving a currently-unused bit in the message to indicate > whether the value is pre-encrypted, without reference to the protocol > extension. It could be legal for a client that can't understand > encryption message from the server to supply an encrypted value to be > inserted into a column. And I don't think we would ever want the bit > that's being reserved here to be used by some other extension for some > other purpose, even when this extension isn't used. So I don't see a > need for this to be tied into the protocol extension. I think this is an interesting idea. I can indeed see use cases for e.g. inserting a new row based on another row (where the secret is the same). IMHO that means that we should also bump the protocol version for this change, because it's changing the wire protocol by adding a new parameter format code. And it does so in a way that does not depend on the new protocol extension.
Re: Transparent column encryption
On Wed, Apr 10, 2024 at 6:13 AM Peter Eisentraut wrote: > Obviously, it's early days, so there will be plenty of time to have > discussions on various other aspects of this patch. I'm keeping a keen > eye on the discussion of protocol extensions, for example. I think the way that you handled that is clever, and along the lines of what I had in mind when I invented the _pq_ stuff. More specifically, the way that the ColumnEncryptionKey and ColumnMasterKey messages are handled is exactly the way that I was imagining things would work. The client uses _pq_.column_encryption to signal that it can understand those messages, and the server responds by including them. I assume that if the client doesn't signal understanding, then the server simply omits sending those messages. (I have not checked the code.) I'm less certain about the changes to the ParameterDescription and RowDescription messages. I see a couple of potential problems. One is that, if you say you can understand column encryption messages, the extra fields are included even for unencrypted columns. The client must choose at connection startup whether it ever wishes to read any encrypted data; if so, it pays a portion of that overhead all the time. Another potential problem is with the scalability of this design. Suppose that we could not only encrypt columns, but also compress, fold, mutilate, and spindle them. Then there might end up being a dizzying array of variation in the format of what is supposed to be the same message. Perhaps it's not so bad: as long as the documentation is clear about in which order the additional fields will appear in the relevant messages when more than one relevant feature is used, it's probably not too difficult for clients to cope. And it is probably also true that the precise size of, say, a RowDescription message will rarely be performance-critical. But another thought is that we might try to redesign this so that we simply add more message types rather than mutating message types i.e. after sending the RowDescription message, if any columns are encrypted, we additionally send a RowEncryptionDescription message. Then this treatment becomes symmetric with the handling of ColumnEncryptionKey and ColumnMasterKey messages, and there's no overhead when the feature is unused. With regard to the Bind message, I suggest that we regard the protocol change as reserving a currently-unused bit in the message to indicate whether the value is pre-encrypted, without reference to the protocol extension. It could be legal for a client that can't understand encryption message from the server to supply an encrypted value to be inserted into a column. And I don't think we would ever want the bit that's being reserved here to be used by some other extension for some other purpose, even when this extension isn't used. So I don't see a need for this to be tied into the protocol extension. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Transparent column encryption
On Thu, 18 Apr 2024 at 13:25, Peter Eisentraut wrote: > Hopefully, the reason for key rotation is mainly that policies require > key rotation, not that keys get compromised all the time. These key rotation policies are generally in place to reduce the impact of a key compromise by limiting the time a compromised key is valid. > This > seems pretty standard to me. For example, I can change the password on > my laptop's file system encryption, which somehow wraps a lower-level > key, but I can't reencrypt the actual file system in place. I think the threat model for this proposal and a laptop's file system encryption are different enough that the same choices/tradeoffs don't automatically translate. Specifically in this proposal the unencrypted CEK is present on all servers that need to read/write those encrypted values. And a successful attacker would then be able to read the encrypted values forever with this key, because it effectively cannot be rotated. That is a much bigger attack surface and risk than a laptop's disk encryption. So, I feel quite strongly that shipping the proposed feature without being able to re-encrypt columns in an online fashion would be a mistake. > That's the > reason for having this two-tier key system in the first place. If we allow for online-rotation of the actual encryption key, then maybe we don't even need this two-tier system ;) Not having this two tier system would have a few benefits in my opinion: 1. We wouldn't need to be sending encrypted key material from the server to every client. Which seems nice from a security, bandwidth and client implementation perspective. 2. Asymmetric encryption of columns is suddenly an option. Allowing certain clients to enter encrypted data into the database but not read it. > Two problems here: One, for deterministic encryption, everyone needs to > agree on the representation, otherwise equality comparisons won't work. > Two, if you give clients the option of storing text or binary, then > clients also get back a mix of text or binary, and it will be a mess. > Just giving the option of storing the payload in binary wouldn't be that > hard, but it's not clear what you can sensibly do with that in the end. How about defining at column creation time if the underlying value should be binary or not? Something like: CREATE TABLE t( mytime timestamp ENCRYPTED WITH (column_encryption_key = cek1, binary=true) ); > Even if the identifiers > somehow were global (but OIDs can also change and are not guaranteed > unique forever), OIDs of existing rows can't just change while a connection is active, right? (all I know is upgrades can change them but that seems fine) Also they are unique within a catalog table, right? > the state of which keys have already been sent is > session state. I agree that this is the case. But it's state that can be tracked fairly easily by a transaction pooler. Similar to how prepared statements can be tracked. And this is easier to do when at the IDs of the same keys are the same across each session to the server, because if they differ then you need to do re-mapping of IDs. > This is kind of like SASL or TLS can add new methods dynamically without > requiring a new version. I mean, as we are learning, making new > protocol versions is kind of hard, so the point was to avoid it. Fair enough > I guess you could do that, but wouldn't that making the decoding of > these messages much more complicated? You would first have to read the > "short" variant, decode the format, and then decide to read the rest. > Seems weird. I see your point. But with the current approach even for queries that don't return any encrypted columns, these useless fields would be part of the RowDescryption. It seems quite annoying to add extra network and parsing overhead all of your queries even if only a small percentage use the encryption feature. Maybe we should add a new message type instead like EncryptedRowDescription, or add some flag field at the start of RowDescription that can be used to indicate that there is encryption info for some of the columns. > Yes, that's what would happen, and that's the intention, so that for > example you can use pg_dump to back up encrypted columns without having > to decrypt them. Okay, makes sense. But I think it would be good to document that. > > A related question to this is that currently libpq throws an error if > > e.g. a master key realm is not defined but another one is. Is that > > really what we want? Is not having one of the realms really that > > different from not providing any realms at all? > > Can you provide a more concrete example of what scenario you have a > concern about? A server has table A and B. A is encrypted with a master key realm X and B is encrypted with master key realm Y. If libpq is only given a key for realm X, and it then tries to read table B, an error is thrown. While if you don't provide any realm at all, you can read from table B just fine,
Re: Transparent column encryption
On 10.04.24 16:14, Jelte Fennema-Nio wrote: (The CEK can't be rotated easily, since that would require reading out all the data from a table/column and reencrypting it. We could/should add some custom tooling for that, but it wouldn't be a routine operation.) This seems like something that requires some more thought because CEK rotation seems just as important as CMK rotation (often both would be compromised at the same time). Hopefully, the reason for key rotation is mainly that policies require key rotation, not that keys get compromised all the time. That's the reason for having this two-tier key system in the first place. This seems pretty standard to me. For example, I can change the password on my laptop's file system encryption, which somehow wraps a lower-level key, but I can't reencrypt the actual file system in place. +The plaintext inside +the ciphertext is always in text format, but this is invisible to the +protocol. It seems like it would be useful to have a way of storing the plaintext in binary form too. I'm not saying this should be part of the initial version, but it would be good to keep that in mind with the design. Two problems here: One, for deterministic encryption, everyone needs to agree on the representation, otherwise equality comparisons won't work. Two, if you give clients the option of storing text or binary, then clients also get back a mix of text or binary, and it will be a mess. Just giving the option of storing the payload in binary wouldn't be that hard, but it's not clear what you can sensibly do with that in the end. + The session-specific identifier of the key. Is it necessary for this identifier to be session-specific? Why not use a global identifier like an oid? Anything session specific makes the job of transaction poolers quite a bit harder. If this identifier would be global, then the message can be forwarded as is to the client instead of re-mapping identifiers between clients and servers (like is needed for prepared statements). The point was just to avoid saying specifically that the OID will be sent, because then that would tie the catalog representation to the protocol, which seems unnecessary. Maybe we can reword that somehow. In terms of connection pooling, this feature as it is conceived right now would only work in session pooling anyway. Even if the identifiers somehow were global (but OIDs can also change and are not guaranteed unique forever), the state of which keys have already been sent is session state. + Additional algorithms may be added to this protocol specification without a + change in the protocol version number. What's the reason for not requiring a version bump for this? This is kind of like SASL or TLS can add new methods dynamically without requiring a new version. I mean, as we are learning, making new protocol versions is kind of hard, so the point was to avoid it. + If the protocol extension _pq_.column_encryption is + enabled (see ), then + there is also the following for each parameter: It seems a little bit wasteful to include these for all columns, even the ones that don't require encryption. How about only adding these fields when format code is 0x11 I guess you could do that, but wouldn't that making the decoding of these messages much more complicated? You would first have to read the "short" variant, decode the format, and then decide to read the rest. Seems weird. Finally, I'm trying to figure out if this really needs to be a protocol extension or if a protocol version bump would work as well without introducing a lot of work for clients/poolers that don't care about it (possibly with some modifications to the proposed protocol changes). That's not something this patch cares about, but the philosophical discussions in the other thread on protocol versioning etc. appear to lean toward protocol extension. What makes this a bit difficult for me is that there's not much written in the documentation on what is supposed to happen for encrypted columns when the protocol extension is not enabled. Is the content just returned/written like it would be with a bytea? Yes, that's what would happen, and that's the intention, so that for example you can use pg_dump to back up encrypted columns without having to decrypt them. A related question to this is that currently libpq throws an error if e.g. a master key realm is not defined but another one is. Is that really what we want? Is not having one of the realms really that different from not providing any realms at all? Can you provide a more concrete example of what scenario you have a concern about?
Re: Transparent column encryption
On Wed, 10 Apr 2024 at 12:13, Peter Eisentraut wrote: > > To kick some things off for PG18, here is an updated version of the > patch for automatic client-side column-level encryption. I only read the docs and none of the code, but here is my feedback on the current design: > (The CEK can't be rotated easily, since > that would require reading out all the data from a table/column and > reencrypting it. We could/should add some custom tooling for that, > but it wouldn't be a routine operation.) This seems like something that requires some more thought because CEK rotation seems just as important as CMK rotation (often both would be compromised at the same time). As far as I can tell the only way to rotate a CEK is by re-encrypting the column for all rows in a single go at the client side, thus taking a write-lock on all rows of the table. That seems quite problematic, because that makes key rotation an operation that requires application downtime. Allowing online rotation is important, otherwise almost no-one will do it preventative at a regular interval. One way to allow online CEK rotation is by allowing a column to be encrypted by one of several keys and/or allow a key to have multiple versions. And then for each row we would store which key/version it was encrypted with. That way for new insertions/updates clients would use the newest version. But clients would still be able to decrypt both old rows with the old key and new rows encrypted with the new key, because the server would give them both keys and tell which row was encrypted with which. Then the old rows can be rewritten by a client in small batches, so that writes to the table can keep working while this operation takes place. This could even be used to allow encrypting previously unencrypted columns using something like "ALTER COLUMN mycol ENCRYPTION KEY cek1". Then unencrypted rows could be indicated by e.g. returning something like NULL for the CEK. +The plaintext inside +the ciphertext is always in text format, but this is invisible to the +protocol. It seems like it would be useful to have a way of storing the plaintext in binary form too. I'm not saying this should be part of the initial version, but it would be good to keep that in mind with the design. + The session-specific identifier of the key. Is it necessary for this identifier to be session-specific? Why not use a global identifier like an oid? Anything session specific makes the job of transaction poolers quite a bit harder. If this identifier would be global, then the message can be forwarded as is to the client instead of re-mapping identifiers between clients and servers (like is needed for prepared statements). + Additional algorithms may be added to this protocol specification without a + change in the protocol version number. What's the reason for not requiring a version bump for this? + If the protocol extension _pq_.column_encryption is + enabled (see ), then + there is also the following for each parameter: It seems a little bit wasteful to include these for all columns, even the ones that don't require encryption. How about only adding these fields when format code is 0x11 Finally, I'm trying to figure out if this really needs to be a protocol extension or if a protocol version bump would work as well without introducing a lot of work for clients/poolers that don't care about it (possibly with some modifications to the proposed protocol changes). What makes this a bit difficult for me is that there's not much written in the documentation on what is supposed to happen for encrypted columns when the protocol extension is not enabled. Is the content just returned/written like it would be with a bytea? Or is writing disallowed because the format code would never be set to 0x11. A related question to this is that currently libpq throws an error if e.g. a master key realm is not defined but another one is. Is that really what we want? Is not having one of the realms really that different from not providing any realms at all? But no-matter these behavioural details, I think it would be fairly easy to add minimal "non-support" for this feature while supporting the new protocol messages. All they would need to do is understand what the new protocol messages/fields mean and either ignore them or throw a clear error. For poolers it's a different story however. For transaction pooling there's quite a bit of work to be done. I already mentioned the session-specific ID being a problem, but even assuming we change that to a global ID there's still difficulties. Key information is only sent by the server if it wasn't sent before in the session[1], so a pooler would need to keep it's own cache and send keys to clients that haven't received them yet. So yeah, I think it would make sense to put this behind a protocol extension feature flag, since it's fairly niche and would require significant work at the pooler side to support.
Re: Transparent column encryption
On 30.03.23 20:35, Stephen Frost wrote: I do feel that column encryption is a useful capability and there's large parts of this approach that I agree with, but I dislike the idea of having our clients be able to depend on what gets returned for non-encrypted columns while not being able to trust what encrypted column results are and then trying to say it's 'transparent'. To that end, it seems like just saying they get back a bytea and making it clear that they have to provide the validation would be clear, while keeping much of the rest. [Note that the word "transparent" has been removed from the feature name. I just didn't change the email thread name.] These thoughts are reasonable, but I think there is a tradeoff to be made between having featureful data validation and enhanced security. If you want your database system to validate your data, you have to send it in plain text. If you want to have your database system not see the plain text, then it cannot validate it. But there is still utility in it. You can't really depend on what gets returned even in the non-encrypted case, unless you cryptographically sign the schema against modification or something like that. So realistically, a client that cares strongly about the data it receives has to do some kind of client-side validation anyway. Note also that higher-level client libraries like JDBC effectively do client-side data validation, for example when you call ResultSet.getInt() etc. This is also one of the reasons why the user facing type declaration exists. You could just make all encrypted columns of type "opaque" or something and not make any promises about what's inside. But client APIs sort or rely on the application being able to ask the result set for what's inside a column value. If we just say, we don't know, then applications (or driver APIs) will have to be changed to accommodate that, but the intention was to not require that. So instead we say, it's supposed to be int, and then if it's sometimes actually not int, then your application throws an exception you can deal with. This is arguably a better developer experience, even if it concerns the data type purist. But do you have a different idea about how it should work? Expanding out from that I'd imagine, pie-in-the-sky and in some far off land, having our data type in/out validation functions moved to the common library and then adding client-side validation of the data going in/out of the encrypted columns would allow application developers to be able to trust what we're returning (as long as they're using libpq- and we'd have to document that independent implementations of the protocol have to provide this or just continue to return bytea's). As mentioned, some client libraries effectively already do that. But even if we could make this much more comprehensive, I don't see how this can ever actually satisfy your point. It would require that all participating clients apply validation all the time, and all other clients can rely on that happening, which is impossible.
Re: Transparent column encryption
On 30.03.23 17:55, Andres Freund wrote: I find it very hard to belief that details of the catalog representation like this will matter to users. How would would it conceivably affect users that we store (key, encryption method) in pg_attribute vs storing an oid that's effectively a foreign key reference to (key, encryption method)? The change you are alluding to would also affect how the DDL commands work and interoperate, so it affects the user. But also, let's not drive this design decision bottom up. Let's go from how we want the data model and the DDL to work and then figure out suitable ways to record that. I don't really know if you are just worried about the catalog size, or you find an actual fault with the data model, or you just find it subjectively odd. The feature is arguably useful without typmod support, e.g., for text. We could ship it like that, then do some work to reorganize pg_attribute and tuple descriptors to relieve some pressure on each byte, and then add the typmod support back in in a future release. I think that is a workable compromise. I doubt that shipping a version of column encryption that breaks our type system is a good idea. I don't follow how you get from that to claiming that it breaks the type system. Please provide more details.
Re: Transparent column encryption
Greetings, * Andres Freund (and...@anarazel.de) wrote: > On 2023-03-30 16:01:46 +0200, Peter Eisentraut wrote: > > On 30.03.23 03:29, Andres Freund wrote: > > > > One might think that, but the precedent in other equivalent systems is > > > > that > > > > you reference the key and the algorithm separately. There is some > > > > (admittedly not very conclusive) discussion about this near [0]. > > > > > > > > [0]: > > > > https://www.postgresql.org/message-id/flat/00b0c4f3-0d9f-dcfd-2ba0-eee5109b4963%40enterprisedb.com#147ad6faafe8cdd2c0d2fd56ec972a40 > > > > > > I'm very much not convinced by that. Either way, there at least there > > > should > > > be a comment mentioning that we intentionally try to allow that. > > > > > > Even if this feature is something we want (why?), ISTM that this should > > > not be > > > implemented by having multiple fields in pg_attribute, but instead by a > > > table > > > referenced by by pg_attribute.attcek. > > > > I don't know if it is clear to everyone here, but the key data model and the > > surrounding DDL are exact copies of the equivalent MS SQL Server feature. > > When I was first studying it, I had the exact same doubts about this. But > > as I was learning more about it, it does make sense, because this matches a > > common pattern in key management systems, which is relevant because these > > keys ultimately map into KMS-managed keys in a deployment. Moreover, 1) it > > is plausible that those people knew what they were doing, and 2) it would be > > preferable to maintain alignment and not create something that looks the > > same but is different in some small but important details. I was wondering about this- is it really exactly the same, down to the point that there's zero checking of what the data returned actually is after it's decrypted and given to the application, and if it actually matches the claimed data type? > I find it very hard to belief that details of the catalog representation like > this will matter to users. How would would it conceivably affect users that we > store (key, encryption method) in pg_attribute vs storing an oid that's > effectively a foreign key reference to (key, encryption method)? I do agree with this. > > > > With the proposed removal of usertypmod, it's only two fields: the link > > > > to > > > > the key and the user-facing type. > > > > > > That feels far less clean. I think loosing the ability to set the > > > precision of > > > a numeric, or the SRID for postgis datums won't be received very well? > > > > In my mind, and I probably wasn't explicit about this, I'm thinking about > > what can be done now versus later. > > > > The feature is arguably useful without typmod support, e.g., for text. We > > could ship it like that, then do some work to reorganize pg_attribute and > > tuple descriptors to relieve some pressure on each byte, and then add the > > typmod support back in in a future release. I think that is a workable > > compromise. > > I doubt that shipping a version of column encryption that breaks our type > system is a good idea. And this. I do feel that column encryption is a useful capability and there's large parts of this approach that I agree with, but I dislike the idea of having our clients be able to depend on what gets returned for non-encrypted columns while not being able to trust what encrypted column results are and then trying to say it's 'transparent'. To that end, it seems like just saying they get back a bytea and making it clear that they have to provide the validation would be clear, while keeping much of the rest. Expanding out from that I'd imagine, pie-in-the-sky and in some far off land, having our data type in/out validation functions moved to the common library and then adding client-side validation of the data going in/out of the encrypted columns would allow application developers to be able to trust what we're returning (as long as they're using libpq- and we'd have to document that independent implementations of the protocol have to provide this or just continue to return bytea's). Not sure how we'd manage to provide support for extensions though. Thanks, Stephen signature.asc Description: PGP signature
Re: Transparent column encryption
Hi, On 2023-03-30 16:01:46 +0200, Peter Eisentraut wrote: > On 30.03.23 03:29, Andres Freund wrote: > > > One might think that, but the precedent in other equivalent systems is > > > that > > > you reference the key and the algorithm separately. There is some > > > (admittedly not very conclusive) discussion about this near [0]. > > > > > > [0]: > > > https://www.postgresql.org/message-id/flat/00b0c4f3-0d9f-dcfd-2ba0-eee5109b4963%40enterprisedb.com#147ad6faafe8cdd2c0d2fd56ec972a40 > > > > I'm very much not convinced by that. Either way, there at least there should > > be a comment mentioning that we intentionally try to allow that. > > > > Even if this feature is something we want (why?), ISTM that this should not > > be > > implemented by having multiple fields in pg_attribute, but instead by a > > table > > referenced by by pg_attribute.attcek. > > I don't know if it is clear to everyone here, but the key data model and the > surrounding DDL are exact copies of the equivalent MS SQL Server feature. > When I was first studying it, I had the exact same doubts about this. But > as I was learning more about it, it does make sense, because this matches a > common pattern in key management systems, which is relevant because these > keys ultimately map into KMS-managed keys in a deployment. Moreover, 1) it > is plausible that those people knew what they were doing, and 2) it would be > preferable to maintain alignment and not create something that looks the > same but is different in some small but important details. I find it very hard to belief that details of the catalog representation like this will matter to users. How would would it conceivably affect users that we store (key, encryption method) in pg_attribute vs storing an oid that's effectively a foreign key reference to (key, encryption method)? > > > With the proposed removal of usertypmod, it's only two fields: the link to > > > the key and the user-facing type. > > > > That feels far less clean. I think loosing the ability to set the precision > > of > > a numeric, or the SRID for postgis datums won't be received very well? > > In my mind, and I probably wasn't explicit about this, I'm thinking about > what can be done now versus later. > > The feature is arguably useful without typmod support, e.g., for text. We > could ship it like that, then do some work to reorganize pg_attribute and > tuple descriptors to relieve some pressure on each byte, and then add the > typmod support back in in a future release. I think that is a workable > compromise. I doubt that shipping a version of column encryption that breaks our type system is a good idea. Greetings, Andres Freund
Re: Transparent column encryption
On 30.03.23 03:29, Andres Freund wrote: One might think that, but the precedent in other equivalent systems is that you reference the key and the algorithm separately. There is some (admittedly not very conclusive) discussion about this near [0]. [0]: https://www.postgresql.org/message-id/flat/00b0c4f3-0d9f-dcfd-2ba0-eee5109b4963%40enterprisedb.com#147ad6faafe8cdd2c0d2fd56ec972a40 I'm very much not convinced by that. Either way, there at least there should be a comment mentioning that we intentionally try to allow that. Even if this feature is something we want (why?), ISTM that this should not be implemented by having multiple fields in pg_attribute, but instead by a table referenced by by pg_attribute.attcek. I don't know if it is clear to everyone here, but the key data model and the surrounding DDL are exact copies of the equivalent MS SQL Server feature. When I was first studying it, I had the exact same doubts about this. But as I was learning more about it, it does make sense, because this matches a common pattern in key management systems, which is relevant because these keys ultimately map into KMS-managed keys in a deployment. Moreover, 1) it is plausible that those people knew what they were doing, and 2) it would be preferable to maintain alignment and not create something that looks the same but is different in some small but important details. With the proposed removal of usertypmod, it's only two fields: the link to the key and the user-facing type. That feels far less clean. I think loosing the ability to set the precision of a numeric, or the SRID for postgis datums won't be received very well? In my mind, and I probably wasn't explicit about this, I'm thinking about what can be done now versus later. The feature is arguably useful without typmod support, e.g., for text. We could ship it like that, then do some work to reorganize pg_attribute and tuple descriptors to relieve some pressure on each byte, and then add the typmod support back in in a future release. I think that is a workable compromise.
Re: Transparent column encryption
Hi, On 2023-03-29 19:08:25 +0200, Peter Eisentraut wrote: > On 29.03.23 18:24, Andres Freund wrote: > > On 2023-03-29 18:08:29 +0200, Peter Eisentraut wrote: > > > On 24.03.23 19:12, Andres Freund wrote: > > > > > I thought about this some more. I think we could get rid of > > > > > attusertypmod > > > > > and just hardcode it as -1. The idea would be that if you ask for an > > > > > encrypted column of type, say, varchar(500), the server isn't able to > > > > > enforce that anyway, so we could just prohibit specifying a nondefault > > > > > typmod for encrypted columns. > > > > > > > > Why not just use typmod for the underlying typmod? It doesn't seem like > > > > encrypted datums will need that? Or are you using it for something > > > > important there? > > > > > > Yes, the typmod of encrypted types stores the encryption algorithm. > > > > Why isn't that an attribute of pg_colenckey, given that attcek has been > > added > > to pg_attribute? > > One might think that, but the precedent in other equivalent systems is that > you reference the key and the algorithm separately. There is some > (admittedly not very conclusive) discussion about this near [0]. > > [0]: > https://www.postgresql.org/message-id/flat/00b0c4f3-0d9f-dcfd-2ba0-eee5109b4963%40enterprisedb.com#147ad6faafe8cdd2c0d2fd56ec972a40 I'm very much not convinced by that. Either way, there at least there should be a comment mentioning that we intentionally try to allow that. Even if this feature is something we want (why?), ISTM that this should not be implemented by having multiple fields in pg_attribute, but instead by a table referenced by by pg_attribute.attcek. > > > (Also, mixing a type with the typmod of another type is weird in a variety > > > of ways, so this is a quite clean solution.) > > > > It's not an unrelated type though. It's the actual typmod of the column > > we're > > talking about. > > What I mean is that various parts of the system think that typid+typmod make > sense together. If the typmod actually refers to usertypid, well, the code > doesn't know that, so who knows what happens. You control what the typmod for encrypted columns does. So I don't see what problems that could be. I seems quite likely that having a separate typmod for the encrypted type will cause problems down the line, because you'll end up having to copy around typid+typmod for the encrypted datum and then also separately for the unencrypted one. > Also, with the current proposal, the system is internally consistent: > pg_encrypted_* can actually look at their own typmod and verify their own > input values that way, which is what a typmod is for. This just works out > of the box. > > > I find it a lot less clean to make all non-CEK uses of > > pg_attribute pay the price of storing three new fields. > > With the proposed removal of usertypmod, it's only two fields: the link to > the key and the user-facing type. That feels far less clean. I think loosing the ability to set the precision of a numeric, or the SRID for postgis datums won't be received very well? Greetings, Andres Freund
Re: Transparent column encryption
On 29.03.23 18:24, Andres Freund wrote: Hi, On 2023-03-29 18:08:29 +0200, Peter Eisentraut wrote: On 24.03.23 19:12, Andres Freund wrote: I thought about this some more. I think we could get rid of attusertypmod and just hardcode it as -1. The idea would be that if you ask for an encrypted column of type, say, varchar(500), the server isn't able to enforce that anyway, so we could just prohibit specifying a nondefault typmod for encrypted columns. Why not just use typmod for the underlying typmod? It doesn't seem like encrypted datums will need that? Or are you using it for something important there? Yes, the typmod of encrypted types stores the encryption algorithm. Why isn't that an attribute of pg_colenckey, given that attcek has been added to pg_attribute? One might think that, but the precedent in other equivalent systems is that you reference the key and the algorithm separately. There is some (admittedly not very conclusive) discussion about this near [0]. [0]: https://www.postgresql.org/message-id/flat/00b0c4f3-0d9f-dcfd-2ba0-eee5109b4963%40enterprisedb.com#147ad6faafe8cdd2c0d2fd56ec972a40 (Also, mixing a type with the typmod of another type is weird in a variety of ways, so this is a quite clean solution.) It's not an unrelated type though. It's the actual typmod of the column we're talking about. What I mean is that various parts of the system think that typid+typmod make sense together. If the typmod actually refers to usertypid, well, the code doesn't know that, so who knows what happens. Also, with the current proposal, the system is internally consistent: pg_encrypted_* can actually look at their own typmod and verify their own input values that way, which is what a typmod is for. This just works out of the box. I find it a lot less clean to make all non-CEK uses of pg_attribute pay the price of storing three new fields. With the proposed removal of usertypmod, it's only two fields: the link to the key and the user-facing type.
Re: Transparent column encryption
Hi, On 2023-03-29 18:08:29 +0200, Peter Eisentraut wrote: > On 24.03.23 19:12, Andres Freund wrote: > > > I thought about this some more. I think we could get rid of attusertypmod > > > and just hardcode it as -1. The idea would be that if you ask for an > > > encrypted column of type, say, varchar(500), the server isn't able to > > > enforce that anyway, so we could just prohibit specifying a nondefault > > > typmod for encrypted columns. > > > > Why not just use typmod for the underlying typmod? It doesn't seem like > > encrypted datums will need that? Or are you using it for something > > important there? > > Yes, the typmod of encrypted types stores the encryption algorithm. Why isn't that an attribute of pg_colenckey, given that attcek has been added to pg_attribute? > (Also, mixing a type with the typmod of another type is weird in a variety > of ways, so this is a quite clean solution.) It's not an unrelated type though. It's the actual typmod of the column we're talking about. I find it a lot less clean to make all non-CEK uses of pg_attribute pay the price of storing three new fields. Greetings, Andres Freund
Re: Transparent column encryption
On 24.03.23 19:12, Andres Freund wrote: I thought about this some more. I think we could get rid of attusertypmod and just hardcode it as -1. The idea would be that if you ask for an encrypted column of type, say, varchar(500), the server isn't able to enforce that anyway, so we could just prohibit specifying a nondefault typmod for encrypted columns. Why not just use typmod for the underlying typmod? It doesn't seem like encrypted datums will need that? Or are you using it for something important there? Yes, the typmod of encrypted types stores the encryption algorithm. (Also, mixing a type with the typmod of another type is weird in a variety of ways, so this is a quite clean solution.)
Re: Transparent column encryption
Hi, On 2023-03-23 14:54:48 +0100, Peter Eisentraut wrote: > On 22.03.23 10:00, Peter Eisentraut wrote: > > > I get that for the type, but why do we need the typmod duplicated as > > > well? > > > > Earlier patch versions didn't do that, but that got really confusing > > about which type the typmod really belonged to, since code currently > > assumes that typid+typmod makes sense. Earlier patch versions had three > > fields (usertypid, keyid, encalg), and then I changed it to (usertypid, > > usertypmod, keyid) and instead placed the encalg into the real typmod, > > which made everything much cleaner. > > I thought about this some more. I think we could get rid of attusertypmod > and just hardcode it as -1. The idea would be that if you ask for an > encrypted column of type, say, varchar(500), the server isn't able to > enforce that anyway, so we could just prohibit specifying a nondefault > typmod for encrypted columns. Why not just use typmod for the underlying typmod? It doesn't seem like encrypted datums will need that? Or are you using it for something important there? Greetings, Andres Freund
Re: Transparent column encryption
On 23.03.23 16:55, Robert Haas wrote: On Thu, Mar 23, 2023 at 9:55 AM Peter Eisentraut wrote: I thought about this some more. I think we could get rid of attusertypmod and just hardcode it as -1. The idea would be that if you ask for an encrypted column of type, say, varchar(500), the server isn't able to enforce that anyway, so we could just prohibit specifying a nondefault typmod for encrypted columns. I'm not sure if there are weird types that use typmods in some way where this wouldn't work. But so far I could not think of anything. I'll look into this some more. I thought we often treated atttypid, atttypmod, and attcollation as a trio, these days. It seems a bit surprising that you'd end up adding columns for two out of the three. Internally, we use all three. But for reporting to the client (RowDescription message), we only have slots for type and typmod. We could in theory extend the protocol to report the collation as well, but it's probably not too interesting.
Re: Transparent column encryption
On Thu, Mar 23, 2023 at 9:55 AM Peter Eisentraut wrote: > I thought about this some more. I think we could get rid of > attusertypmod and just hardcode it as -1. The idea would be that if you > ask for an encrypted column of type, say, varchar(500), the server isn't > able to enforce that anyway, so we could just prohibit specifying a > nondefault typmod for encrypted columns. > > I'm not sure if there are weird types that use typmods in some way where > this wouldn't work. But so far I could not think of anything. > > I'll look into this some more. I thought we often treated atttypid, atttypmod, and attcollation as a trio, these days. It seems a bit surprising that you'd end up adding columns for two out of the three. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Transparent column encryption
On 22.03.23 10:00, Peter Eisentraut wrote: I get that for the type, but why do we need the typmod duplicated as well? Earlier patch versions didn't do that, but that got really confusing about which type the typmod really belonged to, since code currently assumes that typid+typmod makes sense. Earlier patch versions had three fields (usertypid, keyid, encalg), and then I changed it to (usertypid, usertypmod, keyid) and instead placed the encalg into the real typmod, which made everything much cleaner. I thought about this some more. I think we could get rid of attusertypmod and just hardcode it as -1. The idea would be that if you ask for an encrypted column of type, say, varchar(500), the server isn't able to enforce that anyway, so we could just prohibit specifying a nondefault typmod for encrypted columns. I'm not sure if there are weird types that use typmods in some way where this wouldn't work. But so far I could not think of anything. I'll look into this some more.
Re: Transparent column encryption
On 21.03.23 18:47, Andres Freund wrote: On 2023-03-21 18:05:15 +0100, Peter Eisentraut wrote: On 16.03.23 17:36, Andres Freund wrote: Maybe a daft question, but why do we need a separate type and typmod for encrypted columns? Why isn't the fact that the column is encrypted exactly one new field, and we use the existing type/typmod fields? The way this is implemented is that for an encrypted column, the real atttypid and atttypmod are one of the encrypted special types (pg_encrypted_*). That way, most of the system doesn't need to care about the details of encryption or whatnot, it just unpacks tuples etc. by looking at atttypid, atttyplen, etc., and queries on encrypted data behave normally by just looking at what operators etc. those types have. This approach heavily contains the number of places that need to know about this feature at all. I get that for the type, but why do we need the typmod duplicated as well? Earlier patch versions didn't do that, but that got really confusing about which type the typmod really belonged to, since code currently assumes that typid+typmod makes sense. Earlier patch versions had three fields (usertypid, keyid, encalg), and then I changed it to (usertypid, usertypmod, keyid) and instead placed the encalg into the real typmod, which made everything much cleaner.
Re: Transparent column encryption
Hi, On 2023-03-21 18:05:15 +0100, Peter Eisentraut wrote: > On 16.03.23 17:36, Andres Freund wrote: > > Maybe a daft question, but why do we need a separate type and typmod for > > encrypted columns? Why isn't the fact that the column is encrypted exactly > > one > > new field, and we use the existing type/typmod fields? > > The way this is implemented is that for an encrypted column, the real > atttypid and atttypmod are one of the encrypted special types > (pg_encrypted_*). That way, most of the system doesn't need to care about > the details of encryption or whatnot, it just unpacks tuples etc. by looking > at atttypid, atttyplen, etc., and queries on encrypted data behave normally > by just looking at what operators etc. those types have. This approach > heavily contains the number of places that need to know about this feature > at all. I get that for the type, but why do we need the typmod duplicated as well? Greetings, Andres Freund
Re: Transparent column encryption
On 16.03.23 17:36, Andres Freund wrote: Maybe a daft question, but why do we need a separate type and typmod for encrypted columns? Why isn't the fact that the column is encrypted exactly one new field, and we use the existing type/typmod fields? The way this is implemented is that for an encrypted column, the real atttypid and atttypmod are one of the encrypted special types (pg_encrypted_*). That way, most of the system doesn't need to care about the details of encryption or whatnot, it just unpacks tuples etc. by looking at atttypid, atttyplen, etc., and queries on encrypted data behave normally by just looking at what operators etc. those types have. This approach heavily contains the number of places that need to know about this feature at all. Do we need to decouple tuple descriptors from pg_attribute altogether? Yes. Very clearly. The amount of memory and runtime we spent on tupledescs is disproportionate. A second angle is that we build tupledescs way way too frequently. The executor infers them everywhere, so not even prepared statements protect against that. How do we decide what goes into the tuple descriptor and what does not? I'm interested in addressing this, because obviously we do want the ability to add more features in the future, but I don't know what the direction should be. We've had some prior discussion around this, see e.g. https://postgr.es/m/20210819114435.6r532qbadcsyfscp%40alap3.anarazel.de This sounds like a good plan.
Re: Transparent column encryption
Hi, On 2023-03-16 11:26:46 +0100, Peter Eisentraut wrote: > On 13.03.23 22:11, Andres Freund wrote: > > > It adds branches, and it makes tupledescs wider. In tight spots, such as > > > printtup, that can hurt, even if the branches aren't ever entered. > > In fact, I do see a noticable, but not huge, regression: > > I tried to reproduce your measurements, but I don't have the CPU affinity > tools like that on macOS, so the differences were lost in the noise. (The > absolute numbers looked very similar to yours.) > > I can reproduce a regression if I keep adding more columns to pg_attribute, > like 8 OID columns does start to show. > > [...] > How do we proceed here? Maybe a daft question, but why do we need a separate type and typmod for encrypted columns? Why isn't the fact that the column is encrypted exactly one new field, and we use the existing type/typmod fields? > A lot of user-facing table management stuff like compression, statistics, > collation, dropped columns, and probably future features like column > reordering or periods, have to be represented in pg_attribute somehow, at > least in the current architecture. Do we hope that hardware keeps up with > the pace at which we add new features? Yea, it's not great as is. I think it's also OK to decide that the slowdown is worth it in some cases - it just has to be a conscious decision, in the open. > Do we need to decouple tuple descriptors from pg_attribute altogether? Yes. Very clearly. The amount of memory and runtime we spent on tupledescs is disproportionate. A second angle is that we build tupledescs way way too frequently. The executor infers them everywhere, so not even prepared statements protect against that. > How do we decide what goes into the tuple descriptor and what does not? I'm > interested in addressing this, because obviously we do want the ability to > add more features in the future, but I don't know what the direction should > be. We've had some prior discussion around this, see e.g. https://postgr.es/m/20210819114435.6r532qbadcsyfscp%40alap3.anarazel.de Greetings, Andres Freund
Re: Transparent column encryption
On 13.03.23 22:11, Andres Freund wrote: It adds branches, and it makes tupledescs wider. In tight spots, such as printtup, that can hurt, even if the branches aren't ever entered. In fact, I do see a noticable, but not huge, regression: I tried to reproduce your measurements, but I don't have the CPU affinity tools like that on macOS, so the differences were lost in the noise. (The absolute numbers looked very similar to yours.) I can reproduce a regression if I keep adding more columns to pg_attribute, like 8 OID columns does start to show. I investigated whether I could move some columns to the "variable-length" part outside the tuple descriptor, but that would require major surgery in heap.c and tablecmds.c (MergeAttributes() ... shudder), and also elsewhere, where you currently only have a tuple descriptor for looking up stuff. How do we proceed here? A lot of user-facing table management stuff like compression, statistics, collation, dropped columns, and probably future features like column reordering or periods, have to be represented in pg_attribute somehow, at least in the current architecture. Do we hope that hardware keeps up with the pace at which we add new features? Do we need to decouple tuple descriptors from pg_attribute altogether? How do we decide what goes into the tuple descriptor and what does not? I'm interested in addressing this, because obviously we do want the ability to add more features in the future, but I don't know what the direction should be.
Re: Transparent column encryption
Hi, On 2023-03-13 13:41:19 -0700, Andres Freund wrote: > On 2023-03-13 21:22:29 +0100, Peter Eisentraut wrote: > > On 12.03.23 01:11, Andres Freund wrote: > > > Have you done benchmarks of some simple workloads to verify this doesn't > > > cause > > > slowdowns (when not using encryption, obviously)? printtup.c is a > > > performance > > > sensitive portion for simple queries, particularly when they return > > > multiple > > > columns. > > > > The additional code isn't used when column encryption is off, so there > > shouldn't be any impact. > > It adds branches, and it makes tupledescs wider. In tight spots, such as > printtup, that can hurt, even if the branches aren't ever entered. In fact, I do see a noticable, but not huge, regression: $ cat /tmp/test.sql SELECT * FROM pg_class WHERE oid = 1247; c=1;taskset -c 10 pgbench -n -M prepared -c$c -j$c -f /tmp/test.sql -P1 -T10 with the server also pinned to core 1, and turbo boost disabled. Nothing else is allowed to run on the core, or its hyperthread sibling. This is my setup for comparing performance with the least noise in general, not related to this patch. head: 28495.858509 28823.055643 28731.074311 patch: 28298.498851 28285.426532 28489.359569 A ~1.1% loss. pipelined 50 statements (pgbench pinned to a different otherwise unused core) head: 1147.404506 1147.587475 1151.976547 patch: 1126.525708 1122.375337 1119.088734 A ~2.2% loss. That might not be prohibitive, but it does seem worth analyzing. Greetings, Andres Freund
Re: Transparent column encryption
Hi, On 2023-03-13 21:22:29 +0100, Peter Eisentraut wrote: > On 12.03.23 01:11, Andres Freund wrote: > > Have you done benchmarks of some simple workloads to verify this doesn't > > cause > > slowdowns (when not using encryption, obviously)? printtup.c is a > > performance > > sensitive portion for simple queries, particularly when they return multiple > > columns. > > The additional code isn't used when column encryption is off, so there > shouldn't be any impact. It adds branches, and it makes tupledescs wider. In tight spots, such as printtup, that can hurt, even if the branches aren't ever entered. Greetings, Andres Freund
Re: Transparent column encryption
On 12.03.23 01:11, Andres Freund wrote: Have you done benchmarks of some simple workloads to verify this doesn't cause slowdowns (when not using encryption, obviously)? printtup.c is a performance sensitive portion for simple queries, particularly when they return multiple columns. The additional code isn't used when column encryption is off, so there shouldn't be any impact.
Re: Transparent column encryption
Hi, On 2023-03-10 08:18:34 +0100, Peter Eisentraut wrote: > Here is an updated patch. I have done some cosmetic polishing and fixed a > minor Windows-related bug. > > In my mind, the patch is complete. > > If someone wants to do some in-depth code review, I suggest focusing on the > following files: > > * src/backend/access/common/printtup.c Have you done benchmarks of some simple workloads to verify this doesn't cause slowdowns (when not using encryption, obviously)? printtup.c is a performance sensitive portion for simple queries, particularly when they return multiple columns. And making tupledescs even wider is likely to have some price, both due to the increase in memory usage, and due to the lower cache density - and that's code where we're already hurting noticeably. Greetings, Andres Freund
Re: Transparent column encryption
> On Mar 9, 2023, at 11:18 PM, Peter Eisentraut > wrote: > > Here is an updated patch. Thanks, Peter. The patch appears to be in pretty good shape, but I have a few comments and concerns. CEKIsVisible() and CMKIsVisible() are obviously copied from TSParserIsVisible(), and the code comments weren't fully updated. Specifically, the phrase "hidden by another parser of the same name" should be updated to not mention "parser". Why does get_cmkalg_name() return the string "unspecified" for PG_CMK_UNSPECIFIED, but the next function get_cmkalg_jwa_name() returns NULL for PG_CMK_UNSPECIFIED? It seems they would both return NULL, or both return "unspecified". If there's a reason for the divergence, could you add a code comment to clarify? BTW, get_cmkalg_jwa_name() has no test coverage. Looking further at code coverage, the new conditional in printsimple_startup() is never tested with (MyProcPort->column_encryption_enabled), so the block is never entered. This would seem to be a consequence of backends like walsender not using column encryption, which is not terribly surprising, but it got me wondering if you had a particular client use case in mind when you added this block? The new function pg_encrypted_in() appears totally untested, but I have to wonder if that's because we're not round-trip testing pg_dump with column encryption...? The code coverage in pg_dump looks fairly decent, but some column encryption code is not covered. — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Transparent column encryption
On 12.02.23 15:48, Mark Dilger wrote: I should mention, src/sgml/html/libpq-exec.html needs clarification: paramFormats[]Specifies whether parameters are text (put a zero in the array entry for the corresponding parameter) or binary (put a one in the array entry for the corresponding parameter). If the array pointer is null then all parameters are presumed to be text strings. Perhaps you should edit this last sentence to say that all parameters are presumed to be test strings without forced encryption. This is actually already mentioned later in that section. When column encryption is enabled, the second-least-significant byte of this parameter specifies whether encryption should be forced for a parameter. The value 0x10 has a one as its second-least-significant *nibble*, but that's a really strange way to describe the high-order nibble, and perhaps not what you mean. Could you clarify? Set this byte to one to force encryption. I think setting the byte to one (0x01) means "binary unencrypted", not "force encryption". Don't you mean to set this byte to 16? For example, use the C code literal 0x10 to specify text format with forced encryption. If the array pointer is null then encryption is not forced for any parameter. If encryption is forced for a parameter but the parameter does not correspond to an encrypted column on the server, then the call will fail and the parameter will not be sent. This can be used for additional security against a compromised server. (The drawback is that application code then needs to be kept up to date with knowledge about which columns are encrypted rather than letting the server specify this.) This was me being confused. I adjusted the text to use "half-byte". I think you should say something about how specifying 0x11 will behave -- in other words, asking for encrypted binary data. I believe that this is will draw a "format must be text for encrypted parameter" error, and that the docs should clearly say so. done
Re: Transparent column encryption
On 11.02.23 22:54, Mark Dilger wrote: Thanks Peter. Here are some observations about the documentation in patch version 15. In acronyms.sgml, the CEK and CMK entries should link to documentation, perhaps linkend="glossary-column-encryption-key" and linkend="glossary-column-master-key". These glossary entries should in turn link to linkend="ddl-column-encryption". In ddl.sgml, the sentence "forcing encryption of certain parameters in the client library (see its documentation)" should link to linkend="libpq-connect-column-encryption". Did you intend the use of "transparent data encryption" (rather than "transparent column encryption") in datatype.sgml? If so, what's the difference? There are all addressed in the v16 patch I just posted. Is this feature intended to be available from ecpg? If so, can we maybe include an example in 36.3.4. Prepared Statements showing how to pass the encrypted values securely. If not, can we include verbiage about that limitation, so folks don't waste time trying to figure out how to do it? It should "just work". I will give this a try sometime, but I don't see why it wouldn't work. The documentation for pg_dump (and pg_dumpall) now includes a --decrypt-encrypted-columns option, which I suppose requires cmklookup to first be configured, and for PGCMKLOOKUP to be exported. There isn't anything in the pg_dump docs about this, though, so maybe a link to section 5.5.3 with a warning about not running pg_dump this way on the database server itself? Also addressed in v16. How does a psql user mark a parameter as having forced encryption? A libpq user can specify this in the paramFormats array, but I don't see any syntax for doing this from psql. None of the perl tap tests you've included appear to do this (except indirectly when calling test_client); grep'ing for the libpq error message "parameter with forced encryption is not to be encrypted" in the tests has no matches. Is it just not possible? I thought you'd mentioned some syntax for this when we spoke in person, but I don't see it now. This has been asked about before. We just need to come up with a syntax for it. The issue is contained inside psql.
Re: Transparent column encryption
> On Feb 11, 2023, at 1:54 PM, Mark Dilger wrote: > > Here are some observations I should mention, src/sgml/html/libpq-exec.html needs clarification: > paramFormats[]Specifies whether parameters are text (put a zero in the array > entry for the corresponding parameter) or binary (put a one in the array > entry for the corresponding parameter). If the array pointer is null then all > parameters are presumed to be text strings. Perhaps you should edit this last sentence to say that all parameters are presumed to be test strings without forced encryption. > Values passed in binary format require knowledge of the internal > representation expected by the backend. For example, integers must be passed > in network byte order. Passing numeric values requires knowledge of the > server storage format, as implemented in > src/backend/utils/adt/numeric.c::numeric_send() and > src/backend/utils/adt/numeric.c::numeric_recv(). > When column encryption is enabled, the second-least-significant byte of this > parameter specifies whether encryption should be forced for a parameter. The value 0x10 has a one as its second-least-significant *nibble*, but that's a really strange way to describe the high-order nibble, and perhaps not what you mean. Could you clarify? > Set this byte to one to force encryption. I think setting the byte to one (0x01) means "binary unencrypted", not "force encryption". Don't you mean to set this byte to 16? > For example, use the C code literal 0x10 to specify text format with forced > encryption. If the array pointer is null then encryption is not forced for > any parameter. > If encryption is forced for a parameter but the parameter does not correspond > to an encrypted column on the server, then the call will fail and the > parameter will not be sent. This can be used for additional security against > a compromised server. (The drawback is that application code then needs to be > kept up to date with knowledge about which columns are encrypted rather than > letting the server specify this.) I think you should say something about how specifying 0x11 will behave -- in other words, asking for encrypted binary data. I believe that this is will draw a "format must be text for encrypted parameter" error, and that the docs should clearly say so. — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Transparent column encryption
> On Jan 25, 2023, at 10:44 AM, Peter Eisentraut > wrote: > > Here is a new patch. Changes since v14: > > - Fixed some typos (review by Justin Pryzby) > - Fixed backward compat. psql and pg_dump (review by Justin Pryzby) > - Doc additions (review by Jacob Champion) > - Validate column_encryption option in libpq (review by Jacob Champion) > - Handle column encryption in inheritance > - Change CEKs and CMKs to live inside > schemas Thanks Peter. Here are some observations about the documentation in patch version 15. In acronyms.sgml, the CEK and CMK entries should link to documentation, perhaps linkend="glossary-column-encryption-key" and linkend="glossary-column-master-key". These glossary entries should in turn link to linkend="ddl-column-encryption". In ddl.sgml, the sentence "forcing encryption of certain parameters in the client library (see its documentation)" should link to linkend="libpq-connect-column-encryption". Did you intend the use of "transparent data encryption" (rather than "transparent column encryption") in datatype.sgml? If so, what's the difference? Is this feature intended to be available from ecpg? If so, can we maybe include an example in 36.3.4. Prepared Statements showing how to pass the encrypted values securely. If not, can we include verbiage about that limitation, so folks don't waste time trying to figure out how to do it? The documentation for pg_dump (and pg_dumpall) now includes a --decrypt-encrypted-columns option, which I suppose requires cmklookup to first be configured, and for PGCMKLOOKUP to be exported. There isn't anything in the pg_dump docs about this, though, so maybe a link to section 5.5.3 with a warning about not running pg_dump this way on the database server itself? How does a psql user mark a parameter as having forced encryption? A libpq user can specify this in the paramFormats array, but I don't see any syntax for doing this from psql. None of the perl tap tests you've included appear to do this (except indirectly when calling test_client); grep'ing for the libpq error message "parameter with forced encryption is not to be encrypted" in the tests has no matches. Is it just not possible? I thought you'd mentioned some syntax for this when we spoke in person, but I don't see it now. — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Transparent column encryption
On Tue, Jan 31, 2023 at 5:26 AM Peter Eisentraut wrote: > See here for example: > https://learn.microsoft.com/en-us/sql/relational-databases/security/encryption/always-encrypted-database-engine?view=sql-server-ver15 Hm. I notice they haven't implemented more than one algorithm, so I wonder if they're going to be happy with their decision to mix-and-match when number two arrives. > For pg_dump, I'd like a mode that makes all values parameters of an > INSERT statement. But obviously not all of those will be encrypted. So > I think we'd want a per-parameter syntax. Makes sense. Maybe something that defaults to encrypted with opt-out per parameter? UPDATE t SET name = $1 WHERE id = $2 \encbind "Jacob" plaintext(24) > > More concretely: should psql allow you to push arbitrary text into an > > encrypted \bind parameter, like it does now? > > We don't have any data type awareness like that now in psql or libpq. > It would be quite a change to start now. I agree. It just feels weird that a misbehaving client can "attack" the other client implementations using it, and we don't make any attempt to mitigate it. --Jacob
Re: Transparent column encryption
On 30.01.23 23:30, Jacob Champion wrote: The column encryption algorithm is set per-column -- but isn't it tightly coupled to the CEK, since the key length has to match? From a layperson perspective, using the same key to encrypt the same plaintext under two different algorithms (if they happen to have the same key length) seems like it might be cryptographically risky. Is there a reason I should be encouraged to do that? Not really. I was also initially confused by this setup, but that's how other similar systems are set up, so I thought it would be confusing to do it differently. Which systems let you mix and match keys and algorithms this way? I'd like to take a look at them. See here for example: https://learn.microsoft.com/en-us/sql/relational-databases/security/encryption/always-encrypted-database-engine?view=sql-server-ver15 With the loss of \gencr it looks like we also lost a potential way to force encryption from within psql. Any plans to add that for v1? \gencr didn't do that either. We could do it. The libpq API supports it. We just need to come up with some syntax for psql. Do you think people would rather set encryption for all parameters at once -- something like \encbind -- or have the ability to mix encrypted and unencrypted parameters? For pg_dump, I'd like a mode that makes all values parameters of an INSERT statement. But obviously not all of those will be encrypted. So I think we'd want a per-parameter syntax. More concretely: should psql allow you to push arbitrary text into an encrypted \bind parameter, like it does now? We don't have any data type awareness like that now in psql or libpq. It would be quite a change to start now. How would that deal with data type extensibility, is an obvious question to start with. Don't know.
Re: Transparent column encryption
On Wed, Jan 25, 2023 at 11:00 AM Peter Eisentraut wrote: > > When writing the paragraph on RSA-OAEP I was reminded that we didn't > > really dig into the asymmetric/symmetric discussion. Assuming that most > > first-time users will pick the builtin CMK encryption method, do we > > still want to have an asymmetric scheme implemented first instead of a > > symmetric keywrap? I'm still concerned about that public key, since it > > can't really be made public. > > I had started coding that, but one problem was that the openssl CLI > doesn't really provide any means to work with those kinds of keys. The > "openssl enc" command always wants to mix in a password. Without that, > there is no way to write a test case, and more crucially no way for > users to set up these kinds of keys. Unless we write our own tooling > for this, which, you know, the patch just passed 400k in size. Arrgh: https://github.com/openssl/openssl/issues/10605 > > The column encryption algorithm is set per-column -- but isn't it > > tightly coupled to the CEK, since the key length has to match? From a > > layperson perspective, using the same key to encrypt the same plaintext > > under two different algorithms (if they happen to have the same key > > length) seems like it might be cryptographically risky. Is there a > > reason I should be encouraged to do that? > > Not really. I was also initially confused by this setup, but that's how > other similar systems are set up, so I thought it would be confusing to > do it differently. Which systems let you mix and match keys and algorithms this way? I'd like to take a look at them. > > With the loss of \gencr it looks like we also lost a potential way to > > force encryption from within psql. Any plans to add that for v1? > > \gencr didn't do that either. We could do it. The libpq API supports > it. We just need to come up with some syntax for psql. Do you think people would rather set encryption for all parameters at once -- something like \encbind -- or have the ability to mix encrypted and unencrypted parameters? > > Are there plans to document client-side implementation requirements, to > > ensure cross-client compatibility? Things like the "PG\x00\x01" > > associated data are buried at the moment (or else I've missed them in > > the docs). If you're holding off until the feature is more finalized, > > that's fine too. > > This is documented in the protocol chapter, which I thought was the > right place. Did you want more documentation, or in a different place? I just missed it; sorry. > > Speaking of cross-client compatibility, I'm still disconcerted by the > > ability to write the value "hello world" into an encrypted integer > > column. Should clients be required to validate the text format, using > > the attrealtypid? > > Well, we can ask them to, but we can't really require them, in a > cryptographic sense. I'm not sure what more we can do. Right -- I just mean that clients need to pay more attention to it now, whereas before they may have delegated correctness to the server. The problem is documented in the context of deterministic encryption, but I think it applies to randomized as well. More concretely: should psql allow you to push arbitrary text into an encrypted \bind parameter, like it does now? > > It occurred to me when looking at the "unspecified" CMK scheme that the > > CEK doesn't really have to be an encryption key at all. In that case it > > can function more like a (possibly signed?) cookie for lookup, or even > > be ignored altogether if you don't want to use a wrapping scheme > > (similar to JWE's "direct" mode, maybe?). So now you have three ways to > > look up or determine a column encryption key (CMK realm, CMK name, CEK > > cookie)... is that a concept worth exploring in v1 and/or the documentation? > > I don't completely follow this. Yeah, I'm not expressing it very well. My feeling is that the organization system here -- a realm "contains" multiple CMKs, a CMK encrypts multiple CEKs -- is so general and flexible that it may need some suggested guardrails for people to use it sanely. I just don't know what those guardrails should be. I was motivated by the realization that CEKs don't even need to be keys. Thanks, --Jacob
Re: Transparent column encryption
On 19.01.23 21:48, Jacob Champion wrote: I like the existing "caveats" documentation, and I've attached a sample patch with some more caveats documented, based on some of the upthread conversation: - text format makes fixed-length columns leak length information too - you only get partial protection against the Evil DBA - RSA-OAEP public key safety (Feel free to use/remix/discard as desired.) I have added those in the v15 patch I just posted. When writing the paragraph on RSA-OAEP I was reminded that we didn't really dig into the asymmetric/symmetric discussion. Assuming that most first-time users will pick the builtin CMK encryption method, do we still want to have an asymmetric scheme implemented first instead of a symmetric keywrap? I'm still concerned about that public key, since it can't really be made public. I had started coding that, but one problem was that the openssl CLI doesn't really provide any means to work with those kinds of keys. The "openssl enc" command always wants to mix in a password. Without that, there is no way to write a test case, and more crucially no way for users to set up these kinds of keys. Unless we write our own tooling for this, which, you know, the patch just passed 400k in size. For the padding caveat: + There is no concern if all values are of the same length (e.g., credit + card numbers). I nodded along to this statement last year, and then this year I learned that CCNs aren't fixed-length. So with a 16-byte block, you're probably going to be able to figure out who has an American Express card. Heh. I have removed that parenthetical remark. The column encryption algorithm is set per-column -- but isn't it tightly coupled to the CEK, since the key length has to match? From a layperson perspective, using the same key to encrypt the same plaintext under two different algorithms (if they happen to have the same key length) seems like it might be cryptographically risky. Is there a reason I should be encouraged to do that? Not really. I was also initially confused by this setup, but that's how other similar systems are set up, so I thought it would be confusing to do it differently. With the loss of \gencr it looks like we also lost a potential way to force encryption from within psql. Any plans to add that for v1? \gencr didn't do that either. We could do it. The libpq API supports it. We just need to come up with some syntax for psql. While testing, I forgot how the new option worked and connected with `column_encryption=on` -- and then I accidentally sent unencrypted data to the server, since `on` means "not enabled". :( The server errors out after the damage is done, of course, but would it be okay to strictly validate that option's values? fixed in v15 Are there plans to document client-side implementation requirements, to ensure cross-client compatibility? Things like the "PG\x00\x01" associated data are buried at the moment (or else I've missed them in the docs). If you're holding off until the feature is more finalized, that's fine too. This is documented in the protocol chapter, which I thought was the right place. Did you want more documentation, or in a different place? Speaking of cross-client compatibility, I'm still disconcerted by the ability to write the value "hello world" into an encrypted integer column. Should clients be required to validate the text format, using the attrealtypid? Well, we can ask them to, but we can't really require them, in a cryptographic sense. I'm not sure what more we can do. It occurred to me when looking at the "unspecified" CMK scheme that the CEK doesn't really have to be an encryption key at all. In that case it can function more like a (possibly signed?) cookie for lookup, or even be ignored altogether if you don't want to use a wrapping scheme (similar to JWE's "direct" mode, maybe?). So now you have three ways to look up or determine a column encryption key (CMK realm, CMK name, CEK cookie)... is that a concept worth exploring in v1 and/or the documentation? I don't completely follow this.
Re: Transparent column encryption
On 12.01.23 17:32, Peter Eisentraut wrote: Can we do anything about the attack vector wherein a malicious DBA simply copies the encrypted datum from one row to another? We discussed this earlier [0]. This patch is not that feature. We could get there eventually, but it would appear to be an immense amount of additional work. We have to start somewhere. I've been thinking, this could be done as a "version 2" of the currently proposed feature, within the same framework. We'd extend the RowDescription and ParameterDescription messages to provide primary key information, some flags, then the client would have enough to know what to do. As you wrote in your follow-up message, a challenge would be to handle statements that do not touch all the columns. We'd need to work through this and consider all the details.
Re: Transparent column encryption
On 07.01.23 01:34, Justin Pryzby wrote: "ON (CASE WHEN a.attrealtypid <> 0 THEN a.attrealtypid ELSE a.atttypid END = t.oid)\n" This breaks interoperability with older servers: ERROR: column a.attrealtypid does not exist Same in describe.c Find attached some typos and bad indentation. I'm sending this off now as I've already sat on it for 2 weeks since starting to look at the patch. Thanks, I have integrated all that into the v15 patch I just posted.
Re: Transparent column encryption
On 12/31/22 06:17, Peter Eisentraut wrote: > On 21.12.22 06:46, Peter Eisentraut wrote: >> And another update. The main changes are that I added an 'unspecified' >> CMK algorithm, which indicates that the external KMS knows what it is >> but the database system doesn't. This was discussed a while ago. I >> also changed some details about how the "cmklookup" works in libpq. Also >> added more code comments and documentation and rearranged some code. Trying to delay a review until I had "completed it" has only led to me not reviewing, so here's a partial one. Let me know what pieces of the implementation and/or architecture you're hoping to get more feedback on. I like the existing "caveats" documentation, and I've attached a sample patch with some more caveats documented, based on some of the upthread conversation: - text format makes fixed-length columns leak length information too - you only get partial protection against the Evil DBA - RSA-OAEP public key safety (Feel free to use/remix/discard as desired.) When writing the paragraph on RSA-OAEP I was reminded that we didn't really dig into the asymmetric/symmetric discussion. Assuming that most first-time users will pick the builtin CMK encryption method, do we still want to have an asymmetric scheme implemented first instead of a symmetric keywrap? I'm still concerned about that public key, since it can't really be made public. (And now that "unspecified" is available, I think an asymmetric CMK could be easily created by users that have a niche use case, and then we wouldn't have to commit to supporting it forever.) For the padding caveat: > + There is no concern if all values are of the same length (e.g., credit > + card numbers). I nodded along to this statement last year, and then this year I learned that CCNs aren't fixed-length. So with a 16-byte block, you're probably going to be able to figure out who has an American Express card. The column encryption algorithm is set per-column -- but isn't it tightly coupled to the CEK, since the key length has to match? From a layperson perspective, using the same key to encrypt the same plaintext under two different algorithms (if they happen to have the same key length) seems like it might be cryptographically risky. Is there a reason I should be encouraged to do that? With the loss of \gencr it looks like we also lost a potential way to force encryption from within psql. Any plans to add that for v1? While testing, I forgot how the new option worked and connected with `column_encryption=on` -- and then I accidentally sent unencrypted data to the server, since `on` means "not enabled". :( The server errors out after the damage is done, of course, but would it be okay to strictly validate that option's values? Are there plans to document client-side implementation requirements, to ensure cross-client compatibility? Things like the "PG\x00\x01" associated data are buried at the moment (or else I've missed them in the docs). If you're holding off until the feature is more finalized, that's fine too. Speaking of cross-client compatibility, I'm still disconcerted by the ability to write the value "hello world" into an encrypted integer column. Should clients be required to validate the text format, using the attrealtypid? It occurred to me when looking at the "unspecified" CMK scheme that the CEK doesn't really have to be an encryption key at all. In that case it can function more like a (possibly signed?) cookie for lookup, or even be ignored altogether if you don't want to use a wrapping scheme (similar to JWE's "direct" mode, maybe?). So now you have three ways to look up or determine a column encryption key (CMK realm, CMK name, CEK cookie)... is that a concept worth exploring in v1 and/or the documentation? Thanks, --Jacobdiff --git a/doc/src/sgml/ddl.sgml b/doc/src/sgml/ddl.sgml index 55f33a2f5f..06e1c077d5 100644 --- a/doc/src/sgml/ddl.sgml +++ b/doc/src/sgml/ddl.sgml @@ -1588,7 +1588,33 @@ export PGCMKLOOKUP card numbers). But if there are signficant length differences between valid values and that length information is security-sensitive, then application-specific workarounds such as padding would need to be applied. -How to do that securely is beyond the scope of this manual. +How to do that securely is beyond the scope of this manual. Note that +column encryption is applied to the text representation of the stored value, +so length differences can be leaked even for fixed-length column types (e.g. +bigint, whose largest decimal representation is longer +than 16 bytes). + + + +Column encryption provides only partial protection against a malicious +user with write access to the table. Once encrypted, any modifications to a +stored value on the server side will cause a decryption failure on the +client. However, a user with write access may still freely swap encrypted +values between rows or columns (o
Re: Transparent column encryption
On 10.01.23 18:26, Mark Dilger wrote: I wonder if logical replication could be made to work more easily with this feature. Specifically, subscribers of encrypted columns will need the encrypted column encryption key (CEK) and the name of the column master key (CMD) as exists on the publisher, but getting access to that is not automated as far as I can see. It doesn't come through automatically as part of a subscription, and publisher's can't publish the pg_catalog tables where the keys are kept (because publishing system tables is not supported.) Is it reasonable to make available the CEK and CMK to subscribers in an automated fashion, to facilitate setting up logical replication with less manual distribution of key information? Is this already done, and I'm just not recognizing that you've done it? This would be done as part of DDL replication. Can we do anything about the attack vector wherein a malicious DBA simply copies the encrypted datum from one row to another? We discussed this earlier [0]. This patch is not that feature. We could get there eventually, but it would appear to be an immense amount of additional work. We have to start somewhere. [0]: https://www.postgresql.org/message-id/4fbcf5540633699fc3d81ffb59cb0ac884673a7c.ca...@vmware.com
Re: Transparent column encryption
On Sat, 31 Dec 2022 at 19:47, Peter Eisentraut wrote: > > On 21.12.22 06:46, Peter Eisentraut wrote: > > And another update. The main changes are that I added an 'unspecified' > > CMK algorithm, which indicates that the external KMS knows what it is > > but the database system doesn't. This was discussed a while ago. I > > also changed some details about how the "cmklookup" works in libpq. Also > > added more code comments and documentation and rearranged some code. > > > > According to my local todo list, this patch is now complete. > > Another update, with some merge conflicts resolved. I also fixed up the > remaining TODO markers in the code, which had something to do with Perl > and Windows. I did some more work on schema handling, e.g., CREATE > TABLE / LIKE, views, partitioning etc. on top of encrypted columns, > mostly tedious and repetitive, nothing interesting. I also rewrote the > code that extracts the underlying tables and columns corresponding to > query parameters. It's now much simpler and better encapsulated. The patch does not apply on top of HEAD as in [1], please post a rebased patch: === Applying patches on top of PostgreSQL commit ID 5f6401f81cb24bd3930e0dc589fc4aa8b5424cdc === === applying patch ./v14-0001-Transparent-column-encryption.patch Hunk #1 FAILED at 1109. 1 out of 5 hunks FAILED -- saving rejects to file doc/src/sgml/protocol.sgml.rej patching file doc/src/sgml/ref/create_table.sgml Hunk #3 FAILED at 351. Hunk #4 FAILED at 704. 2 out of 4 hunks FAILED -- saving rejects to file doc/src/sgml/ref/create_table.sgml.rej Hunk #1 FAILED at 1420. Hunk #2 FAILED at 4022. 2 out of 2 hunks FAILED -- saving rejects to file doc/src/sgml/ref/psql-ref.sgml.rej [1] - http://cfbot.cputube.org/patch_41_3718.log Regards, Vignesh
Re: Transparent column encryption
> On Jan 10, 2023, at 9:26 AM, Mark Dilger wrote: > >-- Cryptographically connected to the encrypted record >patient_id BIGINT NOT NULL, >patient_ssn CHAR(11), > >-- The encrypted record >patient_record TEXT ENCRYPTED WITH (column_encryption_key = cek1, >column_encryption_salt = (patient_id, > patient_ssn)), As you mention upthread, tying columns together creates problems for statements that only operate on a subset of columns. Allowing schema designers a choice about tying the encrypted column to zero or more other columns allows them to choose which works best for their security needs. The example above would make a statement like "UPDATE patient_record SET patient_record = $1 \bind '{some json whatever}'" raise an exception at the libpq client level, but maybe that's what schema designers wants it to do. If not, they should omit the column_encryption_salt option in the create table statement; but if so, they should expect to have to specify the other columns as part of the update statement, possibly as part of the where clause, like UPDATE patient_record SET patient_record = $1 WHERE patient_id = 12345 AND patient_ssn = '111-11-' \bind '{some json record}' and have the libpq get the salt column values from the where clause (which may be tricky to implement), or perhaps use some new bind syntax like UPDATE patient_record SET patient_record = ($1:$2,$3) -- new, wonky syntax WHERE patient_id = $2 AND patient_ssn = $3 \bind '{some json record}' 12345 '111-11-' which would be error prone, since the sql statement could specify the ($1:$2,$3) inconsistently with the where clause, or perhaps specify the "new" salt columns even when not changed, like UPDATE patient_record SET patient_record = $1, patient_id = 12345, patient_ssn = "111-11-" WHERE patient_id = 12345 AND patient_ssn = "111-11-" \bind '{some json record}' which looks kind of nuts at first glance, but is grammatically consistent with cases where one or both of the patient_id or patient_ssn are also being changed, like UPDATE patient_record SET patient_record = $1, patient_id = 98765, patient_ssn = "999-99-" WHERE patient_id = 12345 AND patient_ssn = "111-11-" \bind '{some json record}' Or, of course, you can ignore these suggestions or punt them to some future patch that extends the current work, rather than trying to get it all done in the first column encryption commit. But it seems useful to think about what future directions would be, to avoid coding ourselves into a corner, making such future work harder. — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Transparent column encryption
> On Dec 31, 2022, at 6:17 AM, Peter Eisentraut > wrote: > > Another update, with some merge conflicts resolved. Hi Peter, thanks for the patch! I wonder if logical replication could be made to work more easily with this feature. Specifically, subscribers of encrypted columns will need the encrypted column encryption key (CEK) and the name of the column master key (CMD) as exists on the publisher, but getting access to that is not automated as far as I can see. It doesn't come through automatically as part of a subscription, and publisher's can't publish the pg_catalog tables where the keys are kept (because publishing system tables is not supported.) Is it reasonable to make available the CEK and CMK to subscribers in an automated fashion, to facilitate setting up logical replication with less manual distribution of key information? Is this already done, and I'm just not recognizing that you've done it? Can we do anything about the attack vector wherein a malicious DBA simply copies the encrypted datum from one row to another? Imagine the DBA Alice wants to murder a hospital patient Bob by removing the fact that Bob is deathly allergic to latex. She cannot modify the Bob's encrypted and authenticated record, but she can easily update Bob's record with the encrypted record of a different patient Charlie. Likewise, if she want's Bob to pay Charlie's bill, she can replace Charlie's encrypted credit card number with Bob's, and once Bob is dead, he won't dispute the charges. An encrypted-and-authenticated column value should be connected with its row in some way that Alice cannot circumvent. In the patch as you have it written, the client application can include row information in the patient record (specifically, the patient's name, ssn, etc) and verify when any patient record is retrieved that this information matches. But that's hardly "transparent" to the client. It's something all clients will have to do, and easy to forget to do in some code path. Also, for encrypted fixed-width columns, it is not an option. So it seems the client needs to "salt" (maybe not the right term for what I have in mind) the encryption with some relevant other columns, and that's something the libpq client would need to understand, and something the patch's syntax needs to support. Something like: CREATE TABLE patient_records ( -- Cryptographically connected to the encrypted record patient_id BIGINT NOT NULL, patient_ssn CHAR(11), -- The encrypted record patient_record TEXT ENCRYPTED WITH (column_encryption_key = cek1, column_encryption_salt = (patient_id, patient_ssn)), -- Extra stuff, not cryptographically connected to anything next_of_kin TEXT, phone_number BIGINT, ... ); I have not selected any algorithms that include such "salt"ing (again, maybe the wrong word) because I'm just trying to discuss the general feature, not get into the weeds about which cryptographic algorithm to select. Thoughts? — Mark Dilger EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Transparent column encryption
"ON (CASE WHEN a.attrealtypid <> 0 THEN a.attrealtypid ELSE a.atttypid END = t.oid)\n" This breaks interoperability with older servers: ERROR: column a.attrealtypid does not exist Same in describe.c Find attached some typos and bad indentation. I'm sending this off now as I've already sat on it for 2 weeks since starting to look at the patch. -- Justin >From d9dcf23d25ba4452fb12c4b065ab5215e2228882 Mon Sep 17 00:00:00 2001 From: Justin Pryzby Date: Fri, 6 Jan 2023 18:28:59 -0600 Subject: [PATCH] f! --- doc/src/sgml/datatype.sgml | 4 ++-- doc/src/sgml/ddl.sgml | 4 ++-- doc/src/sgml/libpq.sgml| 2 +- doc/src/sgml/protocol.sgml | 2 +- src/bin/pg_dump/pg_dump.c | 4 ++-- src/interfaces/libpq/fe-exec.c | 12 ++-- 6 files changed, 14 insertions(+), 14 deletions(-) diff --git a/doc/src/sgml/datatype.sgml b/doc/src/sgml/datatype.sgml index 56b2e1d0d1e..243e6861506 100644 --- a/doc/src/sgml/datatype.sgml +++ b/doc/src/sgml/datatype.sgml @@ -5369,7 +5369,7 @@ WHERE ... pg_encrypted_rnd (for randomized encryption) or pg_encrypted_det (for deterministic encryption); see . Most of the database system treats -this as normal types. For example, the type pg_encrypted_det has +these as normal types. For example, the type pg_encrypted_det has an equals operator that allows lookup of encrypted values. It is, however, not allowed to create a table using one of these types directly as a column type. @@ -5383,7 +5383,7 @@ WHERE ... Clients that don't support transparent column encryption or have disabled it will see the encrypted values in this format. Clients that support transparent data encryption will not see these types in result sets, as -the protocol layer will translate them back to declared underlying type in +the protocol layer will translate them back to the declared underlying type in the table definition. diff --git a/doc/src/sgml/ddl.sgml b/doc/src/sgml/ddl.sgml index b8364a91f9a..a7624d6a60c 100644 --- a/doc/src/sgml/ddl.sgml +++ b/doc/src/sgml/ddl.sgml @@ -1263,7 +1263,7 @@ CREATE TABLE customers ( randomized encryption, which is the default. Randomized encryption uses a random initialization vector for each encryption, so that even if the plaintext of two rows is equal, the encrypted values will be different. - This prevents someone with direct access to the database server to make + This prevents someone with direct access to the database server from making computations such as distinct counts on the encrypted values. Deterministic encryption uses a fixed initialization vector. This reduces security, but it allows equality searches on encrypted values. The @@ -1540,7 +1540,7 @@ export PGCMKLOOKUP In general, column encryption is never a replacement for additional -security and encryption techniques such as transmission encryption +security and encryption techniques such as transport encryption (SSL/TLS), storage encryption, strong access control, and password security. Column encryption only targets specific use cases and should be used in conjunction with additional security measures. diff --git a/doc/src/sgml/libpq.sgml b/doc/src/sgml/libpq.sgml index a2d413bafd5..9b7f76db7a9 100644 --- a/doc/src/sgml/libpq.sgml +++ b/doc/src/sgml/libpq.sgml @@ -3012,7 +3012,7 @@ PGresult *PQexecParams(PGconn *conn, If encryption is forced for a parameter but the parameter does not correspond to an encrypted column on the server, then the call will fail and the parameter will not be sent. This can be used -for additional security against a comprimised server. (The +for additional security against a compromised server. (The drawback is that application code then needs to be kept up to date with knowledge about which columns are encrypted rather than letting the server specify this.) diff --git a/doc/src/sgml/protocol.sgml b/doc/src/sgml/protocol.sgml index 1a9b8abd7f2..caa6e3174ee 100644 --- a/doc/src/sgml/protocol.sgml +++ b/doc/src/sgml/protocol.sgml @@ -5807,7 +5807,7 @@ psql "dbname=postgres replication=database" -c "IDENTIFY_SYSTEM;" If the field is encrypted, this specifies the identifier of the - encrypt algorithm, else zero. + encryption algorithm, else zero. diff --git a/src/bin/pg_dump/pg_dump.c b/src/bin/pg_dump/pg_dump.c index f82c2496fd5..99f2583c34a 100644 --- a/src/bin/pg_dump/pg_dump.c +++ b/src/bin/pg_dump/pg_dump.c @@ -690,7 +690,7 @@ main(int argc, char **argv) * --rows-per-insert were specified. */ if (dopt.cparams.column_encryption && dopt.dump_inserts == 0) - pg_fatal("option --decrypt_encrypted_columns requires option --inserts, --rows-per-insert, or --column-inserts"); + pg_fatal("option --decrypt-encrypted-columns requires
Re: Transparent column encryption
On Wed, 23 Nov 2022 19:45:06 +0100 Peter Eisentraut wrote: > On 28.10.22 12:16, Jehan-Guillaume de Rorthais wrote: [...] > >* I wonder if encryption related fields in ParameterDescription and > > RowDescription could be optional somehow? The former might be quite > > large when using a lot of parameters (like, imagine a large and ugly > > "IN($1...$N)"). On the other hand, these messages are not sent in high > > frequency anyway... > > They are only used if you turn on the column_encryption protocol option. > Or did you mean make them optional even then? I meant even when column_encryption is turned on. Regards,
Re: Transparent column encryption
On 28.10.22 12:16, Jehan-Guillaume de Rorthais wrote: I did a review of the documentation and usability. I have incorporated some of your feedback into the v11 patch I just posted. # Applying patch The patch applied on top of f13b2088fa2 without trouble. Notice a small warning during compilation: colenccmds.c:134:27: warning: ‘encval’ may be used uninitialized A simple fix could be: +++ b/src/backend/commands/colenccmds.c @@ -119,2 +119,3 encval = defGetString(encvalEl); + *encval_p = encval; } @@ -132,4 +133,2 *alg_p = alg; - if (encval_p) - *encval_p = encval; } fixed # Documentation * In page "create_column_encryption_key.sgml", both encryption algorithms for a CMK are declared as the default one: + + The encryption algorithm that was used to encrypt the key material of + this column encryption key. Supported algorithms are: + + +RSAES_OAEP_SHA_1 (default) + + +RSAES_OAEP_SHA_256 (default) + + + As far as I understand the code, I suppose RSAES_OAEP_SHA_1 should be the default. fixed I believe two information should be clearly shown to user somewhere in chapter 5.5 instead of being buried deep in documentation: * «COPY does not support column decryption», currently buried in pg_dump page * «When transparent column encryption is enabled, the client encoding must match the server encoding», currently buried in the protocol description page. * In the libpq doc of PQexecPrepared2, "paramForceColumnEncryptions" might deserve a little more detail about the array format, like «0 means "don't enforce" and anything else enforce the encryption is enabled on this column». By the way, maybe this array could be an array of boolean? * In chapter 55.2.5 (protocol-flow) is stated: «when column encryption is used, the plaintext is always in text format (not binary format).». Does it means parameter "resultFormat" in "PQexecPrepared2" should always be 0? If yes, is it worth keeping this argument? Moreover, this format constraint should probably be explained in the libpq page as well. I will keep these suggestions around. Some of these things will probably change again, so I'll make sure to update the documentation when I touch it again. # Protocol * In the ColumnEncryptionKey message, it seems the field holding the length key material is redundant with the message length itself, as all other fields have a known size. The key material length is the message length - (4+4+4+2). For comparison, the AuthenticationSASLContinue message has a variable data size but rely only on the message length without additional field. I find that weird, though. An explicit length seems better. Things like AuthenticationSASLContinue only work if they have exactly one variable-length data item. * I wonder if encryption related fields in ParameterDescription and RowDescription could be optional somehow? The former might be quite large when using a lot of parameters (like, imagine a large and ugly "IN($1...$N)"). On the other hand, these messages are not sent in high frequency anyway... They are only used if you turn on the column_encryption protocol option. Or did you mean make them optional even then? # libpq Would it be possible to have an encryption-ready PQexecParams() equivalent of what PQprepare/describe/exec do? I plan to do that. # psql * You already mark \d in the TODO. But having some psql command to list the known CEK/CMK might be useful as well. right * about write queries using psql, would it be possible to use psql parameters? Eg.: => \set c1 myval => INSERT INTO mytable VALUES (:'c1') \gencr No, because those are resolved by psql before libpq sees them. # Manual tests * The lookup error message is shown twice for some reason: => select * from test_tce; no CMK lookup found for realm "" no CMK lookup found for realm "" It might worth adding the column name and the CMK/CEK names related to each error message? Last, notice the useless empty line between messages. I'll look into that. * When "DROP IF EXISTS" a missing CEK or CMK, the command raise an "unrecognized object type": => DROP COLUMN MASTER KEY IF EXISTS noexists; ERROR: unrecognized object type: 10 => DROP COLUMN ENCRYPTION KEY IF EXISTS noexists; ERROR: unrecognized object type: 8 fixed * I was wondering what "pg_typeof" should return. It currently returns "pg_encrypted_*". If this is supposed to be transparent from the client perspective, shouldn't it re
Re: Transparent column encryption
Hi, Here are a few more things I noticed : If a CEK is encrypted with cmk1 and cmk2, but cmk1 isn't found on the client,the following error is printed twice for the very first SELECT statement: could not open file "/path/to/cmk1.pem": No such file or directory ...and nothing is returned. The next queries in the same session would work correctly (cmk2 is used for the decryption of the CEK). An INSERT statement si handled properly, though : one (and only one) error message, and line actually inserted in all cases). For example : postgres=# SELECT * FROM customers ; could not open file "/path/to/cmk1.pem": No such file or directory could not open file "/path/to/cmk1.pem": No such file or directory postgres=# SELECT * FROM customers ; id | name | creditcard_num +---+- 1 | toto | 546843351354245 2 | babar | 546843351354245 postgres=# INSERT INTO customers (id, name, creditcard_num) VALUES ($1, $2, $3) \gencr '3' 'toto' '546888351354245'; could not open file "/path/to/cmk1.pem": No such file or directory INSERT 0 1 postgres=# SELECT * FROM customers ; id | name | creditcard_num +---+- 1 | toto | 546843351354245 2 | babar | 546843351354245 3 | toto | 546888351354245 From the documentation of CREATE COLUMN MASTER KEY, it looks like the REALM is optional, but both CREATE COLUMN MASTER KEY cmk1; and CREATE COLUMN MASTER KEY cmk1 WITH (); returns a syntax error. About AEAD, the documentation says : > The “associated data” in these algorithms consists of 4 bytes: The ASCII letters P and G (byte values 80 and 71), followed by the algorithm ID as a 16-bit unsigned integer in network byte order. My guess is that it serves no real purpose, did I misunderstand ?
Re: Transparent column encryption
Hi, I did a review of the documentation and usability. # Applying patch The patch applied on top of f13b2088fa2 without trouble. Notice a small warning during compilation: colenccmds.c:134:27: warning: ‘encval’ may be used uninitialized A simple fix could be: +++ b/src/backend/commands/colenccmds.c @@ -119,2 +119,3 encval = defGetString(encvalEl); + *encval_p = encval; } @@ -132,4 +133,2 *alg_p = alg; - if (encval_p) - *encval_p = encval; } # Documentation * In page "create_column_encryption_key.sgml", both encryption algorithms for a CMK are declared as the default one: + + The encryption algorithm that was used to encrypt the key material of + this column encryption key. Supported algorithms are: + + +RSAES_OAEP_SHA_1 (default) + + +RSAES_OAEP_SHA_256 (default) + + + As far as I understand the code, I suppose RSAES_OAEP_SHA_1 should be the default. I believe two information should be clearly shown to user somewhere in chapter 5.5 instead of being buried deep in documentation: * «COPY does not support column decryption», currently buried in pg_dump page * «When transparent column encryption is enabled, the client encoding must match the server encoding», currently buried in the protocol description page. * In the libpq doc of PQexecPrepared2, "paramForceColumnEncryptions" might deserve a little more detail about the array format, like «0 means "don't enforce" and anything else enforce the encryption is enabled on this column». By the way, maybe this array could be an array of boolean? * In chapter 55.2.5 (protocol-flow) is stated: «when column encryption is used, the plaintext is always in text format (not binary format).». Does it means parameter "resultFormat" in "PQexecPrepared2" should always be 0? If yes, is it worth keeping this argument? Moreover, this format constraint should probably be explained in the libpq page as well. # Protocol * In the ColumnEncryptionKey message, it seems the field holding the length key material is redundant with the message length itself, as all other fields have a known size. The key material length is the message length - (4+4+4+2). For comparison, the AuthenticationSASLContinue message has a variable data size but rely only on the message length without additional field. * I wonder if encryption related fields in ParameterDescription and RowDescription could be optional somehow? The former might be quite large when using a lot of parameters (like, imagine a large and ugly "IN($1...$N)"). On the other hand, these messages are not sent in high frequency anyway... # libpq Would it be possible to have an encryption-ready PQexecParams() equivalent of what PQprepare/describe/exec do? # psql * You already mark \d in the TODO. But having some psql command to list the known CEK/CMK might be useful as well. * about write queries using psql, would it be possible to use psql parameters? Eg.: => \set c1 myval => INSERT INTO mytable VALUES (:'c1') \gencr # Manual tests * The lookup error message is shown twice for some reason: => select * from test_tce; no CMK lookup found for realm "" no CMK lookup found for realm "" It might worth adding the column name and the CMK/CEK names related to each error message? Last, notice the useless empty line between messages. * When "DROP IF EXISTS" a missing CEK or CMK, the command raise an "unrecognized object type": => DROP COLUMN MASTER KEY IF EXISTS noexists; ERROR: unrecognized object type: 10 => DROP COLUMN ENCRYPTION KEY IF EXISTS noexists; ERROR: unrecognized object type: 8 * I was wondering what "pg_typeof" should return. It currently returns "pg_encrypted_*". If this is supposed to be transparent from the client perspective, shouldn't it return "attrealtypid" when the field is encrypted? * any reason to not support altering the CMK realm? This patch is really interesting and would be a nice addition to the core. Thanks! Regards,
Re: Transparent column encryption
If memory serves me correctly, if you statically link openssl this will work. If you are using ssl in a DLL, I believe that the DLL has its own "c-library" and its own heap. On Thu, Oct 13, 2022 at 9:43 AM Peter Eisentraut < peter.eisentr...@enterprisedb.com> wrote: > On 06.10.22 17:19, Andres Freund wrote: > >>> psql error: stderr: 'OPENSSL_Uplink(7FFC165CBD50,08): no > OPENSSL_Applink' > >> What in the world is that about? What scant information I could find > >> suggests that it has something to do with building a "release" build > against > >> an "debug" build of the openssl library, or vice versa. But this patch > >> doesn't introduce any use of openssl that we haven't seen before. > > It looks to me that one needs to compile, in some form, > openssl/applink.c and > > link it to the application. No idea why that'd be required now and not > > earlier. > > I have figured this out. The problem is that on Windows you can't > reliably pass stdio FILE * handles between the application and OpenSSL. > To give the helpful places I found some Google juice, I'll mention them > here: > > - https://github.com/edenhill/librdkafka/pull/3602 > - https://github.com/edenhill/librdkafka/issues/3554 > - https://www.mail-archive.com/openssl-users@openssl.org/msg91029.html > > > >
Re: Transparent column encryption
On 06.10.22 17:19, Andres Freund wrote: psql error: stderr: 'OPENSSL_Uplink(7FFC165CBD50,08): no OPENSSL_Applink' What in the world is that about? What scant information I could find suggests that it has something to do with building a "release" build against an "debug" build of the openssl library, or vice versa. But this patch doesn't introduce any use of openssl that we haven't seen before. It looks to me that one needs to compile, in some form, openssl/applink.c and link it to the application. No idea why that'd be required now and not earlier. I have figured this out. The problem is that on Windows you can't reliably pass stdio FILE * handles between the application and OpenSSL. To give the helpful places I found some Google juice, I'll mention them here: - https://github.com/edenhill/librdkafka/pull/3602 - https://github.com/edenhill/librdkafka/issues/3554 - https://www.mail-archive.com/openssl-users@openssl.org/msg91029.html
Re: Transparent column encryption
Hi, On 2022-10-06 16:25:51 +0200, Peter Eisentraut wrote: > On 02.10.22 09:16, Andres Freund wrote: > > On 2022-09-27 15:51:25 +0200, Peter Eisentraut wrote: > > > Updated version with meson build system support added (for added files and > > > new tests). > > > > This fails on windows: https://cirrus-ci.com/task/6151847080624128 > > > > https://api.cirrus-ci.com/v1/artifact/task/6151847080624128/testrun/build/testrun/column_encryption/001_column_encryption/log/regress_log_001_column_encryption > > > > psql error: stderr: 'OPENSSL_Uplink(7FFC165CBD50,08): no > > OPENSSL_Applink' > > What in the world is that about? What scant information I could find > suggests that it has something to do with building a "release" build against > an "debug" build of the openssl library, or vice versa. But this patch > doesn't introduce any use of openssl that we haven't seen before. It looks to me that one needs to compile, in some form, openssl/applink.c and link it to the application. No idea why that'd be required now and not earlier. Greetings, Andres Freund
Re: Transparent column encryption
On 02.10.22 09:16, Andres Freund wrote: On 2022-09-27 15:51:25 +0200, Peter Eisentraut wrote: Updated version with meson build system support added (for added files and new tests). This fails on windows: https://cirrus-ci.com/task/6151847080624128 https://api.cirrus-ci.com/v1/artifact/task/6151847080624128/testrun/build/testrun/column_encryption/001_column_encryption/log/regress_log_001_column_encryption psql error: stderr: 'OPENSSL_Uplink(7FFC165CBD50,08): no OPENSSL_Applink' What in the world is that about? What scant information I could find suggests that it has something to do with building a "release" build against an "debug" build of the openssl library, or vice versa. But this patch doesn't introduce any use of openssl that we haven't seen before.
Re: Transparent column encryption
Hi, On 2022-09-27 15:51:25 +0200, Peter Eisentraut wrote: > Updated version with meson build system support added (for added files and > new tests). This fails on windows: https://cirrus-ci.com/task/6151847080624128 https://api.cirrus-ci.com/v1/artifact/task/6151847080624128/testrun/build/testrun/column_encryption/001_column_encryption/log/regress_log_001_column_encryption psql error: stderr: 'OPENSSL_Uplink(7FFC165CBD50,08): no OPENSSL_Applink' Greetings, Andres Freund
Re: Transparent column encryption
On Tue, Aug 30, 2022 at 4:53 AM Peter Eisentraut wrote: > I would be interested in learning more about such padding systems. I > have done a lot of reading for this development project, and I have > never come across a cryptographic approach to hide length differences by > padding. Of course, padding to the block cipher's block size is already > part of the process, but that is done out of necessity, not because you > want to disguise the length. Are there any other methods? I'm > interested to learn more. TLS 1.3 has one example. Here is a description from GnuTLS: https://gnutls.org/manual/html_node/On-Record-Padding.html (Note the option to turn on constant-time padding; that may not be a good tradeoff for us if we're focusing on offline attacks.) Here's a recent paper that claims to formally characterize length hiding, but it's behind a wall and I haven't read it: https://dl.acm.org/doi/abs/10.1145/3460120.3484590 I'll try to find more when I get the chance. --Jacob
Re: Transparent column encryption
On 27.07.22 01:19, Jacob Champion wrote: Now, if we don't have a padding system built into the feature, then that does put even more on the user; it's hard to argue with that. Right. If they can even fix it at all. Having a well-documented padding feature would not only help mitigate that, it would conveniently hang a big sign on the caveats that exist. I would be interested in learning more about such padding systems. I have done a lot of reading for this development project, and I have never come across a cryptographic approach to hide length differences by padding. Of course, padding to the block cipher's block size is already part of the process, but that is done out of necessity, not because you want to disguise the length. Are there any other methods? I'm interested to learn more.
Re: Transparent column encryption
On 20.07.22 08:12, Masahiko Sawada wrote: --- Regarding the documentation, I'd like to have a page that describes the generic information of the transparent column encryption for users such as what this feature actually does, what can be achieved by this feature, CMK rotation, and its known limitations. The patch has "Transparent Column Encryption" section in protocol.sgml but it seems to be more internal information. I have added more documentation in the v6 patch. --- In datatype.sgml, it says "Thus, clients that don't support transparent column encryption or have disabled it will see the encrypted values as byte arrays." but I got an error rather than encrypted values when I tried to connect to the server using by clients that don't support the encryption: postgres(1:6040)=# select * from tbl; no CMK lookup found for realm "" This has now been improved in v6. The protocol changes need to be activated explicitly at connection time, so if you use a client that doesn't support it or activates it, you get the described behavior. --- In single-user mode, the user cannot decrypt the encrypted value but probably it's fine in practice. Yes, there is nothing really to do about that. --- Regarding the column master key rotation, would it be useful if we provide a tool for that? For example, it takes old and new CMK as input, re-encrypt all CEKs realted to the CMK, and registers them to the server. I imagine users using a variety of key management systems, so I don't see how a single tool would work. But it's something we can think about in the future. --- Is there any convenient way to load a large amount of test data to the encrypted columns? I tried to use generate_series() but it seems not to work as it generates the data on the server side: No, that doesn't work, by design. You'd have to write a client program to generate the data. I've also tried to load the data from a file on the client by using \copy command, but it seems not to work: postgres(1:80556)=# copy (select generate_series(1, 1000)::text) to '/tmp/tmp.dat'; COPY 1000 postgres(1:80556)=# \copy a from '/tmp/tmp.dat' COPY 1000 postgres(1:80556)=# select * from a; out out memory This was a bug that I have fixed. --- I got SEGV in the following two situations: I have fixed these.
Re: Transparent column encryption
On 7/26/22 13:25, Robert Haas wrote: > I certainly have no objection to being clear about such details in the > documentation. Cool. > I fear the phenomenon where > doing anything about a problem makes you responsible for the whole > problem. If we disclaim the ability to hide the length of values, > that's clear enough. I don't think disclaiming responsibility absolves you of it here, in part because choices are being made (text format) that make length hiding even more important than it otherwise would be. A user who already knows that encryption doesn't hide length might still reasonably expect a fixed-length column type like bigint to be protected in all cases. It won't be (at least not with your 16-byte example). And sure, you can document that caveat too, but said user might then reasonably wonder how they're supposed to actually make it safe. > But if we start padding to try to hide the length > of values, then people might expect it to work in all cases, and I > don't see how it ever can. Well, that's where I agree with you on the value of solid documentation. But there are other things we can do as well. In general we should - choose a default that will protect most people out of the box, - document the heck out of the default's limitations, - provide guardrails that warn the user when they're outgrowing those limitations, and - give people a way to tune it to their own use cases. As an example, a naive guardrail in this instance could be to simply have the client refuse to encrypt data past the padding maximum, if you've gone so far as to set one up. It'd suck to hit that maximum in production and have to rewrite the column, but you did want your encryption to hide your data, right? Maybe that's way too complex to think about for a v1, but it'll be easier to maintain this into the future if there's at least a plan to create a v2. If you declare it out of scope, instead of considering it a potential TODO, then I think it'll be a lot harder for people to improve it. > Moreover, I think that the padding might > need to be done in a "cryptographically intelligent" way rather than > just, say, adding trailing blanks. Possibly. I think that's where AEAD comes in -- if you've authenticated your ciphertext sufficiently, padding oracles should be prohibitively difficult(?). (But see below; I think we also have other things to worry about in terms of authentication and oracles.) > Now that being said, if Peter wants > to implement something around padding that he has reason to believe > will not create cryptographic weaknesses, I have no issue with that. I > just don't view it as an essential part of the feature, because hiding > such things doesn't seem like it can ever be the main point of a > feature like this. I think that side channel consideration has to be an essential part of any cryptography feature. Recent history has shown "obscure" side channels gaining power to the point of completely breaking crypto schemes. And it's not like TLS where we have to protect an infinite stream of arbitrary bytes; this is going to be used on small values that probably get repeated often and have (comparatively) very little entropy. Cryptanalysis based on length seems to me like part and parcel of the problem space. > I guess my view on this is that, if you're trying to hide something > like a credit card number, most likely every value in the system is > the same length, and then this is a non-issue. Agreed. > On the other hand, if > the secret column is a person's name, then there is an issue, but > you're not going to pad every value out the maximum length of a > varlena, so you have to make an estimate of how long a name someone > might reasonably have to decide how much padding to include. You also > have to decide whether the storage cost of padding every value is > worth it to you given the potential information leakage. Only the > human user can make those decisions, so some amount of "putting that > on the user" feels inevitable. Agreed. > Now, if we don't have a padding system > built into the feature, then that does put even more on the user; it's > hard to argue with that. Right. If they can even fix it at all. Having a well-documented padding feature would not only help mitigate that, it would conveniently hang a big sign on the caveats that exist. -- Speaking of oracles and side channels. Users may want to use associated data to further lock an encrypted value to its column type, too. Otherwise it seems like an active DBA could feed an encrypted text blob to a client in place of, say, an int column, to see whether or not that text blob is a number. Seems like AD is going to be important to prevent active attacks in general. --Jacob
Re: Transparent column encryption
On Tue, Jul 26, 2022 at 2:27 PM Jacob Champion wrote: > Right. My point is, if you have a column that has exactly one > important value that is 17 bytes long when converted to text, you're > going to want to know that block size exactly, because the encryption > will be effectively useless for that value. That size needs to be > documented, and it'd be helpful to know that it's longer than, say, > the longest text representation of our fixed-length column types. I certainly have no objection to being clear about such details in the documentation. > If the goal is to provide real encryption, and not just a toy, I think > you're going to need to put a *lot* of effort into analysis. Even if > the result of the analysis is "we don't plan to address this in v1". > > Crypto is inherently a cycle of > make-it-and-break-it-and-fix-it-and-break-it-again. If that's > considered a "wild goose chase" and not seriously pursued at some > level, then this implementation will probably not last long in the > face of real abuse. (That doesn't mean you have to take my advice; I'm > just a dude with opinions -- but you will need to have real > cryptographers look at this, and you're going to need to think about > how the system will evolve when it's broken.) Well, I'm just a dude with opinions, too. I fear the phenomenon where doing anything about a problem makes you responsible for the whole problem. If we disclaim the ability to hide the length of values, that's clear enough. But if we start padding to try to hide the length of values, then people might expect it to work in all cases, and I don't see how it ever can. Moreover, I think that the padding might need to be done in a "cryptographically intelligent" way rather than just, say, adding trailing blanks. Now that being said, if Peter wants to implement something around padding that he has reason to believe will not create cryptographic weaknesses, I have no issue with that. I just don't view it as an essential part of the feature, because hiding such things doesn't seem like it can ever be the main point of a feature like this. > > Padding values to try to further obscure > > things may be situationally useful, but if you find yourself worrying > > too much about that sort of thing, you likely should have picked > > stronger medicine initially. > > In my experience, this entire field is the application of > situationally useful protection. That's one of the reasons it's hard, > and designing this sort of patch is going to be hard too. Agreed. > Putting that > on the user isn't quite fair when you're the ones designing the > system; you determine what they have to worry about when you choose > the crypto. I guess my view on this is that, if you're trying to hide something like a credit card number, most likely every value in the system is the same length, and then this is a non-issue. On the other hand, if the secret column is a person's name, then there is an issue, but you're not going to pad every value out the maximum length of a varlena, so you have to make an estimate of how long a name someone might reasonably have to decide how much padding to include. You also have to decide whether the storage cost of padding every value is worth it to you given the potential information leakage. Only the human user can make those decisions, so some amount of "putting that on the user" feels inevitable. Now, if we don't have a padding system built into the feature, then that does put even more on the user; it's hard to argue with that. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Transparent column encryption
On Tue, Jul 26, 2022 at 10:52 AM Robert Haas wrote: > On Thu, Jul 21, 2022 at 2:30 PM Jacob Champion > wrote: > > A minimum padding option would fix the leak here, right? If every > > entry is the same length then there's no information to be gained, at > > least in an offline analysis. > > Sure, but padding every text column that you have, even the ones > containing only 'a', out to the length of Don Quixote in the original > Spanish, is unlikely to be an appealing option. If you are honestly trying to conceal Don Quixote, I suspect you are already in the business of making unappealing decisions. I don't think that's necessarily an argument against hiding the length for real-world use cases. > > I think some work around that is probably going to be needed for > > serious use of this encryption, in part because of the use of text > > format as the canonical input. If the encrypted values of 1, 10, 100, > > and 1000 hypothetically leaked their exact lengths, then an encrypted > > int wouldn't be very useful. So I'd want to quantify (and possibly > > configure) exactly how much data you can encrypt in a single message > > before the length starts being leaked, and then make sure that my > > encrypted values stay inside that bound. > > I think most ciphers these days are block ciphers, so you're going to > get output that is a multiple of the block size anyway - e.g. I think > for AES it's 128 bits = 16 bytes. So small differences in length will > be concealed naturally, which may be good enough for some use cases. Right. My point is, if you have a column that has exactly one important value that is 17 bytes long when converted to text, you're going to want to know that block size exactly, because the encryption will be effectively useless for that value. That size needs to be documented, and it'd be helpful to know that it's longer than, say, the longest text representation of our fixed-length column types. > I'm not really convinced that it's worth putting a lot of effort into > bolstering the security of this kind of tech above what it naturally > gives. I think it's likely to be a wild goose chase. If the goal is to provide real encryption, and not just a toy, I think you're going to need to put a *lot* of effort into analysis. Even if the result of the analysis is "we don't plan to address this in v1". Crypto is inherently a cycle of make-it-and-break-it-and-fix-it-and-break-it-again. If that's considered a "wild goose chase" and not seriously pursued at some level, then this implementation will probably not last long in the face of real abuse. (That doesn't mean you have to take my advice; I'm just a dude with opinions -- but you will need to have real cryptographers look at this, and you're going to need to think about how the system will evolve when it's broken.) > If you have major > worries about someone reading your disk in its entirety, use full-disk > encryption. This patchset was designed to protect against the evil DBA case, I think. Full disk encryption doesn't help. > Selective encryption is only suitable when you want to add > a modest level of protection for individual value and are willing to > accept that some information leakage is likely if an adversary can in > fact read the full disk. ...but there's a known countermeasure to this particular leakage, right? Which would make it more suitable for that case. > Padding values to try to further obscure > things may be situationally useful, but if you find yourself worrying > too much about that sort of thing, you likely should have picked > stronger medicine initially. In my experience, this entire field is the application of situationally useful protection. That's one of the reasons it's hard, and designing this sort of patch is going to be hard too. Putting that on the user isn't quite fair when you're the ones designing the system; you determine what they have to worry about when you choose the crypto. --Jacob
Re: Transparent column encryption
On Thu, Jul 21, 2022 at 2:30 PM Jacob Champion wrote: > On Mon, Jul 18, 2022 at 9:07 AM Robert Haas wrote: > > Even there, what can be accomplished with a feature that only encrypts > > individual column values is by nature somewhat limited. If you have a > > text column that, for one row, stores the value 'a', and for some > > other row, stores the entire text of Don Quixote in the original > > Spanish, it is going to be really difficult to keep an adversary who > > can read from the disk from distinguishing those rows. If you want to > > fix that, you're going to need to do block-level encryption or > > something of that sort. > > A minimum padding option would fix the leak here, right? If every > entry is the same length then there's no information to be gained, at > least in an offline analysis. Sure, but padding every text column that you have, even the ones containing only 'a', out to the length of Don Quixote in the original Spanish, is unlikely to be an appealing option. > I think some work around that is probably going to be needed for > serious use of this encryption, in part because of the use of text > format as the canonical input. If the encrypted values of 1, 10, 100, > and 1000 hypothetically leaked their exact lengths, then an encrypted > int wouldn't be very useful. So I'd want to quantify (and possibly > configure) exactly how much data you can encrypt in a single message > before the length starts being leaked, and then make sure that my > encrypted values stay inside that bound. I think most ciphers these days are block ciphers, so you're going to get output that is a multiple of the block size anyway - e.g. I think for AES it's 128 bits = 16 bytes. So small differences in length will be concealed naturally, which may be good enough for some use cases. I'm not really convinced that it's worth putting a lot of effort into bolstering the security of this kind of tech above what it naturally gives. I think it's likely to be a wild goose chase. If you have major worries about someone reading your disk in its entirety, use full-disk encryption. Selective encryption is only suitable when you want to add a modest level of protection for individual value and are willing to accept that some information leakage is likely if an adversary can in fact read the full disk. Padding values to try to further obscure things may be situationally useful, but if you find yourself worrying too much about that sort of thing, you likely should have picked stronger medicine initially. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Transparent column encryption
On Mon, Jul 18, 2022 at 12:53:23PM +0200, Peter Eisentraut wrote: > Asymmetric keys gives you some more options for how you set up the keys at > the beginning. For example, you create the asymmetric key pair on the host > where your client program that wants access to the encrypted data will run. > You put the private key in an appropriate location for run time. You send > the public key to another host. On that other host, you create the CEK, > encrypt it with the CMK, and then upload it into the server (CREATE COLUMN > ENCRYPTION KEY). Then you can wipe that second host. That way, you can be > even more sure that the unencrypted CEK isn't left anywhere. I'm not sure > whether this method is very useful in practice, but it's interesting. > > In any case, as I mentioned above, this particular aspect is up for > discussion. I caution against adding complexity without a good reason, because historically complexity often leads to exploits and bugs, especially with crypto. -- Bruce Momjian https://momjian.us EDB https://enterprisedb.com Indecision is a decision. Inaction is an action. Mark Batterson
Re: Transparent column encryption
On Mon, Jul 18, 2022 at 9:07 AM Robert Haas wrote: > Even there, what can be accomplished with a feature that only encrypts > individual column values is by nature somewhat limited. If you have a > text column that, for one row, stores the value 'a', and for some > other row, stores the entire text of Don Quixote in the original > Spanish, it is going to be really difficult to keep an adversary who > can read from the disk from distinguishing those rows. If you want to > fix that, you're going to need to do block-level encryption or > something of that sort. A minimum padding option would fix the leak here, right? If every entry is the same length then there's no information to be gained, at least in an offline analysis. I think some work around that is probably going to be needed for serious use of this encryption, in part because of the use of text format as the canonical input. If the encrypted values of 1, 10, 100, and 1000 hypothetically leaked their exact lengths, then an encrypted int wouldn't be very useful. So I'd want to quantify (and possibly configure) exactly how much data you can encrypt in a single message before the length starts being leaked, and then make sure that my encrypted values stay inside that bound. --Jacob
Re: Transparent column encryption
On Mon, Jul 18, 2022 at 3:53 AM Peter Eisentraut wrote: > Some other products make use of secure enclaves to do computations on > (otherwise) encrypted values on the server. I don't fully know how that > works, but I suspect that asymmetric keys can play a role in that. (I > don't have any immediate plans for that in my patch. It seems to be a > dying technology at the moment.) > > Asymmetric keys gives you some more options for how you set up the keys > at the beginning. For example, you create the asymmetric key pair on > the host where your client program that wants access to the encrypted > data will run. You put the private key in an appropriate location for > run time. You send the public key to another host. On that other host, > you create the CEK, encrypt it with the CMK, and then upload it into the > server (CREATE COLUMN ENCRYPTION KEY). Then you can wipe that second > host. That way, you can be even more sure that the unencrypted CEK > isn't left anywhere. I'm not sure whether this method is very useful in > practice, but it's interesting. As long as it's clear to people trying this that the "public" key cannot actually be made public, I suppose. That needs to be documented IMO. I like your idea of providing a symmetric option as well. > In any case, as I mentioned above, this particular aspect is up for > discussion. > > Also note that if you use a KMS (cmklookup "run" method), the actual > algorithm doesn't even matter (depending on details of the KMS setup), > since you just tell the KMS "decrypt this", and the KMS knows by itself > what algorithm to use. Maybe there should be a way to specify "unknown" > in the ckdcmkalg field. +1, an officially client-defined method would probably be useful. > The short answer is, these same algorithms are used in equivalent > products (see MS SQL Server, MongoDB). They even reference the same > exact draft document. > > Besides that, here is my analysis for why these are good choices: You > can't use any of the counter modes, because since the encryption happens > on the client, there is no way to coordinate to avoid nonce reuse. So > among mainstream modes, you are basically left with AES-CBC with a > random IV. In that case, even if you happen to reuse an IV, the > possible damage is very contained.) I think both AES-GCM-SIV and XChaCha20-Poly1305 are designed to handle the nonce problem as well. In any case, if I were deploying this, I'd want to know the characteristics/limits of our chosen suites (e.g. how much data can be encrypted per key) so that I could plan rotations correctly. Something like the table in [1]? > > Since we're requiring "canonical" use of text format, and the docs say > > there are no embedded or trailing nulls allowed in text values, could we > > steal the use of a single zero byte to mean NULL? One additional > > complication would be that the client would have to double-check that > > we're not writing a NULL into a NOT NULL column, and complain if it > > reads one during decryption. Another complication would be that the > > client would need to complain if it got a plaintext NULL. > > You're already alluding to some of the complications. Also consider > that null values could arise from, say, outer joins. So you could be in > a situation where encrypted and unencrypted null values coexist. (I realize I'm about to wade into the pool of what NULL means in SQL, the subject of which I've stayed mostly, gleefully, ignorant.) To be honest that sounds pretty useful. Any unencrypted null must have come from the server computation; it's a clear claim by the server that no such rows exist. (If the encrypted column is itself NOT NULL then there's no ambiguity to begin with, I think.) That wouldn't be transparent behavior anymore, so it may (understandably) be a non-goal for the patch, but it really does sound useful. And it might be a little extreme, but if I as a user decided that I wanted in-band encrypted null, it wouldn't be particularly surprising to me if such a column couldn't be included in an outer join. Just like I can't join on a randomized encrypted column, or add two encrypted NUMERICs to each other. In fact I might even want the server to enforce NOT NULL transparently on the underlying pg_encrypted_* column, to make sure that I didn't accidentally push an unencrypted NULL by mistake. > And of > course the server doesn't know about the encrypted null values. So how > do you maintain semantics, like for aggregate functions, primary keys, > anything that treats null values specially? Could you elaborate? Any special cases seem like they'd be important to document regardless of whether or not we support in-band null encryption. For example, do you plan to support encrypted primary keys, null or not? That seems like it'd be particularly difficult during CEK rotation. > How do clients deal with a > mix of encrypted and unencrypted null values, how do they know which one > is real. That one seems s
Re: Transparent column encryption
On Tue, Jul 19, 2022 at 10:52 PM Peter Eisentraut wrote: > > On 12.07.22 20:29, Peter Eisentraut wrote: > > Updated patch, to resolve some merge conflicts. > > Rebased patch, no new functionality Thank you for working on this and updating the patch! I've mainly looked at the documentation and tests and done some tests. Before looking at the code in depth, I'd like to share my comments/questions. --- Regarding the documentation, I'd like to have a page that describes the generic information of the transparent column encryption for users such as what this feature actually does, what can be achieved by this feature, CMK rotation, and its known limitations. The patch has "Transparent Column Encryption" section in protocol.sgml but it seems to be more internal information. --- In datatype.sgml, it says "Thus, clients that don't support transparent column encryption or have disabled it will see the encrypted values as byte arrays." but I got an error rather than encrypted values when I tried to connect to the server using by clients that don't support the encryption: postgres(1:6040)=# select * from tbl; no CMK lookup found for realm "" no CMK lookup found for realm "" postgres(1:6040)=# --- In single-user mode, the user cannot decrypt the encrypted value but probably it's fine in practice. --- Regarding the column master key rotation, would it be useful if we provide a tool for that? For example, it takes old and new CMK as input, re-encrypt all CEKs realted to the CMK, and registers them to the server. --- Is there any convenient way to load a large amount of test data to the encrypted columns? I tried to use generate_series() but it seems not to work as it generates the data on the server side: postgres(1:80556)=# create table a (i text encrypted with (column_encryption_key = cek1)); CREATE TABLE postgres(1:80556)=# insert into a select i::text from generate_series(1, 1000) i; 2022-07-20 15:06:38.502 JST [80556] ERROR: column "i" is of type pg_encrypted_rnd but expression is of type text at character 22 I've also tried to load the data from a file on the client by using \copy command, but it seems not to work: postgres(1:80556)=# copy (select generate_series(1, 1000)::text) to '/tmp/tmp.dat'; COPY 1000 postgres(1:80556)=# \copy a from '/tmp/tmp.dat' COPY 1000 postgres(1:80556)=# select * from a; out out memory --- I got SEGV in the following two situations: (1) SEGV by backend postgres(1:59931)=# create table tbl (i int encrypted with (column_encryption_key = cek1)); CREATE TABLE postgres(1:59931)=# insert into tbl values ($1) \gencr 1 INSERT 0 1 postgres(1:59931)=# select * from tbl; server closed the connection unexpectedly This probably means the server terminated abnormally before or while processing the request. The backtrace is: (lldb) bt * thread #1, stop reason = signal SIGSTOP * frame #0: 0x000106830a30 postgres`pg_detoast_datum_packed(datum=0xab32c563) at fmgr.c:1742:6 frame #1: 0x0001067d9dbf postgres`byteaout(fcinfo=0x7ffee9c311a8) at varlena.c:392:20 frame #2: 0x00010682ed0c postgres`FunctionCall1Coll(flinfo=0x7faeb28193a8, collation=0, arg1=18446744072286815587) at fmgr.c:1124:11 frame #3: 0x000106830611 postgres`OutputFunctionCall(flinfo=0x7faeb28193a8, val=18446744072286815587) at fmgr.c:1561:9 frame #4: 0x000105fed702 postgres`printtup(slot=0x7faeb2818960, self=0x7faeb3809390) at printtup.c:519:16 frame #5: 0x000106319318 postgres`ExecutePlan(estate=0x7faeb2818520, planstate=0x7faeb2818758, use_parallel_mode=false, operation=CMD_SELECT, sendTuples=true, numberTuples=0, direction=ForwardScanDirection, dest=0x7faeb3809390, execute_once=true) at execMain.c:1667:9 frame #6: 0x000106319180 postgres`standard_ExecutorRun(queryDesc=0x7faeb280d920, direction=ForwardScanDirection, count=0, execute_once=true) at execMain.c:363:3 frame #7: 0x000106318f11 postgres`ExecutorRun(queryDesc=0x7faeb280d920, direction=ForwardScanDirection, count=0, execute_once=true) at execMain.c:307:3 frame #8: 0x00010661139c postgres`PortalRunSelect(portal=0x7faeb5028920, forward=true, count=0, dest=0x7faeb3809390) at pquery.c:924:4 frame #9: 0x000106610d3f postgres`PortalRun(portal=0x7faeb5028920, count=9223372036854775807, isTopLevel=true, run_once=true, dest=0x7faeb3809390, altdest=0x7faeb3809390, qc=0x7ffee9c31620) at pquery.c:768:18 frame #10: 0x00010660bb93 postgres`exec_simple_query(query_string="select * from tbl;") at postgres.c:1246:10 frame #11: 0x00010660ac2f postgres`PostgresMain(dbname="postgres", username="masahiko") at postgres.c:4534:7 frame #12: 0x00010650d9c6 postgres`BackendRun(port=0x7faeb3004210) at postmaster.c:4490:2 frame #13: 0x00010650cf8a postgres`BackendStartup(port=0x7faeb3004210) at postmaster.c:4218:3 frame #14: 0x00010650bd57 postgres`ServerLoop at postmaster.c:1808
Re: Transparent column encryption
On Mon, Jul 18, 2022 at 6:53 AM Peter Eisentraut wrote: > I think a way to look at this is that this column encryption feature > isn't suitable for disguising the existence or absence of data, it can > only disguise the particular data that you know exists. +1. Even there, what can be accomplished with a feature that only encrypts individual column values is by nature somewhat limited. If you have a text column that, for one row, stores the value 'a', and for some other row, stores the entire text of Don Quixote in the original Spanish, it is going to be really difficult to keep an adversary who can read from the disk from distinguishing those rows. If you want to fix that, you're going to need to do block-level encryption or something of that sort. And even then, you still have another version of the problem, because now imagine you have one *table* that contains nothing but the value 'a' and another that contains nothing but the entire text of Don Quixote, it is going to be possible for an adversary to tell those tables apart, even with the corresponding files encrypted in their entirety. But I don't think that this means that a feature like this has no value. I think it just means that we need to clearly document how the feature works and not over-promise. -- Robert Haas EDB: http://www.enterprisedb.com
Re: Transparent column encryption
On 15.07.22 19:47, Jacob Champion wrote: The CEK key material is in turn encrypted by an assymmetric key called the column master key (CMK). I'm not yet understanding why the CMK is asymmetric. I'm not totally sure either. I started to build it that way because other systems were doing it that way, too. But I have been thinking about adding a symmetric alternative for the CMKs as well (probably AESKW). I think there are a couple of reasons why asymmetric keys are possibly useful for CMKs: Some other products make use of secure enclaves to do computations on (otherwise) encrypted values on the server. I don't fully know how that works, but I suspect that asymmetric keys can play a role in that. (I don't have any immediate plans for that in my patch. It seems to be a dying technology at the moment.) Asymmetric keys gives you some more options for how you set up the keys at the beginning. For example, you create the asymmetric key pair on the host where your client program that wants access to the encrypted data will run. You put the private key in an appropriate location for run time. You send the public key to another host. On that other host, you create the CEK, encrypt it with the CMK, and then upload it into the server (CREATE COLUMN ENCRYPTION KEY). Then you can wipe that second host. That way, you can be even more sure that the unencrypted CEK isn't left anywhere. I'm not sure whether this method is very useful in practice, but it's interesting. In any case, as I mentioned above, this particular aspect is up for discussion. Also note that if you use a KMS (cmklookup "run" method), the actual algorithm doesn't even matter (depending on details of the KMS setup), since you just tell the KMS "decrypt this", and the KMS knows by itself what algorithm to use. Maybe there should be a way to specify "unknown" in the ckdcmkalg field. +#define PG_CEK_AEAD_AES_128_CBC_HMAC_SHA_256 130 +#define PG_CEK_AEAD_AES_192_CBC_HMAC_SHA_384 131 +#define PG_CEK_AEAD_AES_256_CBC_HMAC_SHA_384 132 +#define PG_CEK_AEAD_AES_256_CBC_HMAC_SHA_512 133 It looks like these ciphersuites were abandoned by the IETF. Are there existing implementations of them that have been audited/analyzed? Are they safe (and do we know that the claims made in the draft are correct)? How do they compare to other constructions like AES-GCM-SIV and XChacha20-Poly1305? The short answer is, these same algorithms are used in equivalent products (see MS SQL Server, MongoDB). They even reference the same exact draft document. Besides that, here is my analysis for why these are good choices: You can't use any of the counter modes, because since the encryption happens on the client, there is no way to coordinate to avoid nonce reuse. So among mainstream modes, you are basically left with AES-CBC with a random IV. In that case, even if you happen to reuse an IV, the possible damage is very contained. And then, if you want to use AEAD, you combine that with some MAC, and HMAC is just as good as any for that. The referenced draft document doesn't really contain any additional cryptographic insights, it's just a guide on a particular way to put these two together. So altogether I think this is a pretty solid choice. +-- \gencr +-- (This just tests the parameter passing; there is no encryption here.) +CREATE TABLE test_gencr (a int, b text); +INSERT INTO test_gencr VALUES (1, 'one') \gencr +SELECT * FROM test_gencr WHERE a = 1 \gencr + a | b +---+- + 1 | one +(1 row) + +INSERT INTO test_gencr VALUES ($1, $2) \gencr 2 'two' +SELECT * FROM test_gencr WHERE a IN ($1, $2) \gencr 2 3 + a | b +---+- + 2 | two +(1 row) I'd expect \gencr to error out without sending plaintext. I know that under the hood this is just setting up a prepared statement, but if I'm using \gencr, presumably I really do want to be encrypting my data. Would it be a problem to always set force-column-encryption for the parameters we're given here? Any unencrypted columns could be provided directly. Yeah, this needs a bit of refinement. You don't want something named "encr" but it only encrypts some of the time. We could possibly do what you suggest and make it set the force-encryption flag, or maybe rename it or add another command that just uses prepared statements and doesn't promise anything about encryption from its name. This also ties in with how pg_dump will eventually work. I think by default pg_dump will just dump things encrypted and set it up so that COPY writes it back encrypted. But there should probably be a mode that dumps out plaintext and then uses one of these commands to load the plaintext back in. What these psql commands need to do also depends on what pg_dump needs them to do. + + Null values are not encrypted by transparent column encryption; null values + sent by the client are visible as null values in the database. If the fact + that
Re: Transparent column encryption
On 7/12/22 11:29, Peter Eisentraut wrote: > > Updated patch, to resolve some merge conflicts. Thank you for working on this; it's an exciting feature. > The CEK key > material is in turn encrypted by an assymmetric key called the column > master key (CMK). I'm not yet understanding why the CMK is asymmetric. Maybe you could use the public key to add ephemeral, single-use encryption keys that no one but the private key holder could use (after you forget them on your side, that is). But since the entire column is encrypted with a single CEK, you would essentially only be able to do that if you created an entirely new column or table; do I have that right? I'm used to public keys being safe for... publication, but if I'm understanding correctly, it's important that the server admin doesn't get hold of the public key for your CMK, because then they could substitute their own CEKs transparently and undermine future encrypted writes. That seems surprising. Am I just missing something important about RSAES-OAEP? > +#define PG_CEK_AEAD_AES_128_CBC_HMAC_SHA_256 130 > +#define PG_CEK_AEAD_AES_192_CBC_HMAC_SHA_384 131 > +#define PG_CEK_AEAD_AES_256_CBC_HMAC_SHA_384 132 > +#define PG_CEK_AEAD_AES_256_CBC_HMAC_SHA_512 133 It looks like these ciphersuites were abandoned by the IETF. Are there existing implementations of them that have been audited/analyzed? Are they safe (and do we know that the claims made in the draft are correct)? How do they compare to other constructions like AES-GCM-SIV and XChacha20-Poly1305? > +-- \gencr > +-- (This just tests the parameter passing; there is no encryption here.) > +CREATE TABLE test_gencr (a int, b text); > +INSERT INTO test_gencr VALUES (1, 'one') \gencr > +SELECT * FROM test_gencr WHERE a = 1 \gencr > + a | b > +---+- > + 1 | one > +(1 row) > + > +INSERT INTO test_gencr VALUES ($1, $2) \gencr 2 'two' > +SELECT * FROM test_gencr WHERE a IN ($1, $2) \gencr 2 3 > + a | b > +---+- > + 2 | two > +(1 row) I'd expect \gencr to error out without sending plaintext. I know that under the hood this is just setting up a prepared statement, but if I'm using \gencr, presumably I really do want to be encrypting my data. Would it be a problem to always set force-column-encryption for the parameters we're given here? Any unencrypted columns could be provided directly. Another idle thought I had was that it'd be nice to have some syntax for providing a null value to \gencr (assuming I didn't overlook it in the patch). But that brings me to... > + > + Null values are not encrypted by transparent column encryption; null > values > + sent by the client are visible as null values in the database. If the > fact > + that a value is null needs to be hidden from the server, this information > + needs to be encoded into a nonnull value in the client somehow. > + This is a major gap, IMO. Especially with the switch to authenticated ciphers, because it means you can't sign your NULL values. And having each client or user that's out there solve this with a magic in-band value seems like a recipe for pain. Since we're requiring "canonical" use of text format, and the docs say there are no embedded or trailing nulls allowed in text values, could we steal the use of a single zero byte to mean NULL? One additional complication would be that the client would have to double-check that we're not writing a NULL into a NOT NULL column, and complain if it reads one during decryption. Another complication would be that the client would need to complain if it got a plaintext NULL. (The need for robust client-side validation of encrypted columns might be something to expand on in the docs more generally, since before this feature, it could probably be assumed that the server was buggy if it sent you unparsable junk in a column.) > + > +The associated data in these algorithms consists of 4 > +bytes: The ASCII letters P and G > +(byte values 80 and 71), followed by the algorithm ID as a 16-bit > unsigned > +integer in network byte order. > + Is this AD intended as a placeholder for the future, or does it serve a particular purpose? Thanks, --Jacob
Re: Transparent column encryption
On 17.12.21 01:41, Jacob Champion wrote: (And I think the client should be able to enforce encryption in the first place, before I distract you too much with other stuff.) Yes, this is a useful point that I have added to my notes.
Re: Transparent column encryption
On Thu, 2021-12-09 at 11:04 +0100, Tomas Vondra wrote: > On 12/9/21 01:12, Jacob Champion wrote: > > > > The rabbit hole I led you down is one where we use the rest of the row > > as AD, to try to freeze pieces of it in place. That might(?) have some > > useful security properties (if the client defines its use and doesn't > > defer to the server). But it's not what I intended to propose and I'd > > have to think about that case some more. So after thinking about it some more, in the case where the client is relying on the server to return both the encrypted data and its associated data -- and you don't trust the server -- then tying even the entire row together doesn't help you. I was briefly led astray by the idea that you could include a unique or primary key column in the associated data, and then SELECT based on that column -- but a motivated DBA could simply corrupt state so that the row they wanted got returned regardless of the query. So the client still has to have prior knowledge. > > In my credit card example, I'm imagining something like (forgive the > > contrived syntax): > > > > SELECT address, :{aead(users.credit_card, 'u...@example.com')} > > FROM users WHERE email = 'u...@example.com'; > > > > UPDATE users > >SET :{aead(users.credit_card, 'u...@example.com')} = '1234-...' > > WHERE email = 'u...@example.com'; > > > > The client explicitly links a table's column to its AD for the duration > > of the query. This approach can't scale to > > > > SELECT credit_card FROM users; > > > > because in this case the AD for each row is different, but I'd argue > > that's ideal for this particular case. The client doesn't need to (and > > probably shouldn't) grab everyone's credit card details all at once, so > > there's no reason to optimize for it. > > Maybe, but it seems like a rather annoying limitation, as it restricts > the client to single-row queries (or at least it looks like that to me). > Yes, it may be fine for some use cases, but I'd bet a DBA who can modify > data can do plenty other things - swapping "old" values, which will have > the right AD, for example. Resurrecting old data doesn't help the DBA read the values, right? I view that as similar to the "increasing account balance" problem, in that it's definitely a problem but not one we're trying to tackle here. (And I'm not familiar with any solutions for resurrections -- other than having data expire and tying the timestamp into the authentication, which I think again requires AD. Revoking signed data is one of those hard problems. Do you know a better way?) > OK. In any case, I think we shouldn't require this capability from the > get go - it's fine to get the simple version done first, which gives us > privacy / protects against passive attacker. And then sometime in the > future improve this further. Agreed. (And I think the client should be able to enforce encryption in the first place, before I distract you too much with other stuff.) --Jacob
Re: Transparent column encryption
On 16.12.21 05:47, Greg Stark wrote: In the server, the encrypted datums are stored in types called encryptedr and encryptedd (for randomized and deterministic encryption). These are essentially cousins of bytea. Does that mean someone could go in with psql and select out the data without any keys and just get a raw bytea-like representation? That seems like a natural and useful thing to be able to do. For example to allow dumping a table and loading it elsewhere and transferring keys through some other channel (perhaps only as needed). Yes to all of that.
Re: Transparent column encryption
> In the server, the encrypted datums are stored in types called > encryptedr and encryptedd (for randomized and deterministic > encryption). These are essentially cousins of bytea. Does that mean someone could go in with psql and select out the data without any keys and just get a raw bytea-like representation? That seems like a natural and useful thing to be able to do. For example to allow dumping a table and loading it elsewhere and transferring keys through some other channel (perhaps only as needed).
Re: Transparent column encryption
On 12/9/21 01:12, Jacob Champion wrote: > On Wed, 2021-12-08 at 02:58 +0100, Tomas Vondra wrote: >> >> On 12/8/21 00:26, Jacob Champion wrote: >>> On Tue, 2021-12-07 at 22:21 +0100, Tomas Vondra wrote: IMO it's impossible to solve this attack within TCE, because it requires ensuring consistency at the row level, but TCE obviously works at column level only. >>> >>> I was under the impression that clients already had to be modified to >>> figure out how to encrypt the data? If part of that process ends up >>> including enforcement of encryption for a specific column set, then the >>> addition of AEAD data could hypothetically be part of that hand- >>> waviness. >> >> I think "transparency" here means the client just uses the regular >> prepared-statement API without having to explicitly encrypt/decrypt any >> data. The problem is we can't easily tie this to other columns in the >> table, because the client may not even know what values are in those >> columns. > > The way I originally described my request -- "I'd like to be able to > tie an encrypted value to other column (or external) data" -- was not > very clear. > > With my proposed model -- where the DBA (and the server) are completely > untrusted, and the DBA needs to be prevented from using the encrypted > value -- I don't think there's a useful way for the client to use > associated data that comes from the server. The client has to know what > the AD should be beforehand, because otherwise the DBA can make it so > the server returns whatever is correct. > True. With untrusted server the additional data would have to come from some other source. Say, an isolated auth system or so. >> Imagine you do this >> >> UPDATE t SET encrypted_column = $1 WHERE another_column = $2; >> >> but you want to ensure the encrypted value belongs to a particular row >> (which may or may not be identified by the another_column value). How >> would the client do that? Should it fetch the value or what? >> >> Similarly, what if the client just does >> >> SELECT encrypted_column FROM t; >> >> How would it verify the values belong to the row, without having all the >> data for the row (or just the required columns)? > > So with my (hopefully more clear) model above, it wouldn't. The client > would already have the AD, and somehow tell libpq what that data was > for the query. > > The rabbit hole I led you down is one where we use the rest of the row > as AD, to try to freeze pieces of it in place. That might(?) have some > useful security properties (if the client defines its use and doesn't > defer to the server). But it's not what I intended to propose and I'd > have to think about that case some more. > OK > In my credit card example, I'm imagining something like (forgive the > contrived syntax): > > SELECT address, :{aead(users.credit_card, 'u...@example.com')} > FROM users WHERE email = 'u...@example.com'; > > UPDATE users >SET :{aead(users.credit_card, 'u...@example.com')} = '1234-...' > WHERE email = 'u...@example.com'; > > The client explicitly links a table's column to its AD for the duration > of the query. This approach can't scale to > > SELECT credit_card FROM users; > > because in this case the AD for each row is different, but I'd argue > that's ideal for this particular case. The client doesn't need to (and > probably shouldn't) grab everyone's credit card details all at once, so > there's no reason to optimize for it. > Maybe, but it seems like a rather annoying limitation, as it restricts the client to single-row queries (or at least it looks like that to me). Yes, it may be fine for some use cases, but I'd bet a DBA who can modify data can do plenty other things - swapping "old" values, which will have the right AD, for example. >>> Unless "transparent" means that the client completely defers to the >>> server on whether to encrypt or not, and silently goes along with it if >>> the server tells it not to encrypt? >> I think that's probably a valid concern - a "bad DBA" could alter the >> table definition to not contain the "ENCRYPTED" bits, and then peek at >> the plaintext values. >> >> But it's not clear to me how exactly would the AEAD prevent this? >> Wouldn't that be also specified on the server, somehow? In which case >> the DBA could just tweak that too, no? >> >> In other words, this issue seems mostly orthogonal to the AEAD, and the >> right solution would be to allow the client to define which columns have >> to be encrypted (in which case altering the server definition would not >> be enough). > > Right, exactly. When I mentioned AEAD I had assumed that "allow the > client to define which columns have to be encrypted" was already > planned or in the works; I just misunderstood pieces of Peter's email. > It's that piece where a client would probably have to add details > around AEAD and its use. > >>> That would only protect against a >>> _completely_ passive DBA, like som
Re: Transparent column encryption
On Wed, 2021-12-08 at 02:58 +0100, Tomas Vondra wrote: > > On 12/8/21 00:26, Jacob Champion wrote: > > On Tue, 2021-12-07 at 22:21 +0100, Tomas Vondra wrote: > > > IMO it's impossible to solve this attack within TCE, because it requires > > > ensuring consistency at the row level, but TCE obviously works at column > > > level only. > > > > I was under the impression that clients already had to be modified to > > figure out how to encrypt the data? If part of that process ends up > > including enforcement of encryption for a specific column set, then the > > addition of AEAD data could hypothetically be part of that hand- > > waviness. > > I think "transparency" here means the client just uses the regular > prepared-statement API without having to explicitly encrypt/decrypt any > data. The problem is we can't easily tie this to other columns in the > table, because the client may not even know what values are in those > columns. The way I originally described my request -- "I'd like to be able to tie an encrypted value to other column (or external) data" -- was not very clear. With my proposed model -- where the DBA (and the server) are completely untrusted, and the DBA needs to be prevented from using the encrypted value -- I don't think there's a useful way for the client to use associated data that comes from the server. The client has to know what the AD should be beforehand, because otherwise the DBA can make it so the server returns whatever is correct. > Imagine you do this > > UPDATE t SET encrypted_column = $1 WHERE another_column = $2; > > but you want to ensure the encrypted value belongs to a particular row > (which may or may not be identified by the another_column value). How > would the client do that? Should it fetch the value or what? > > Similarly, what if the client just does > > SELECT encrypted_column FROM t; > > How would it verify the values belong to the row, without having all the > data for the row (or just the required columns)? So with my (hopefully more clear) model above, it wouldn't. The client would already have the AD, and somehow tell libpq what that data was for the query. The rabbit hole I led you down is one where we use the rest of the row as AD, to try to freeze pieces of it in place. That might(?) have some useful security properties (if the client defines its use and doesn't defer to the server). But it's not what I intended to propose and I'd have to think about that case some more. In my credit card example, I'm imagining something like (forgive the contrived syntax): SELECT address, :{aead(users.credit_card, 'u...@example.com')} FROM users WHERE email = 'u...@example.com'; UPDATE users SET :{aead(users.credit_card, 'u...@example.com')} = '1234-...' WHERE email = 'u...@example.com'; The client explicitly links a table's column to its AD for the duration of the query. This approach can't scale to SELECT credit_card FROM users; because in this case the AD for each row is different, but I'd argue that's ideal for this particular case. The client doesn't need to (and probably shouldn't) grab everyone's credit card details all at once, so there's no reason to optimize for it. > > Unless "transparent" means that the client completely defers to the > > server on whether to encrypt or not, and silently goes along with it if > > the server tells it not to encrypt? > I think that's probably a valid concern - a "bad DBA" could alter the > table definition to not contain the "ENCRYPTED" bits, and then peek at > the plaintext values. > > But it's not clear to me how exactly would the AEAD prevent this? > Wouldn't that be also specified on the server, somehow? In which case > the DBA could just tweak that too, no? > > In other words, this issue seems mostly orthogonal to the AEAD, and the > right solution would be to allow the client to define which columns have > to be encrypted (in which case altering the server definition would not > be enough). Right, exactly. When I mentioned AEAD I had assumed that "allow the client to define which columns have to be encrypted" was already planned or in the works; I just misunderstood pieces of Peter's email. It's that piece where a client would probably have to add details around AEAD and its use. > > That would only protect against a > > _completely_ passive DBA, like someone reading unencrypted backups, > > etc. And that still has a lot of value, certainly. But it seems like > > this prototype is very close to a system where the client can reliably > > secure data even if the server isn't trustworthy, if that's a use case > > you're interested in. > > Right. IMHO the "passive attacker" is a perfectly fine model for use > cases that would be fine with e.g. pgcrypto if there was no risk of > leaking plaintext values to logs, system catalogs, etc. > > If we can improve it to provide (at least some) protection against > active attackers, that'd be a nice bonus. I agree that resis
Re: Transparent column encryption
On 12/8/21 00:26, Jacob Champion wrote: > On Tue, 2021-12-07 at 22:21 +0100, Tomas Vondra wrote: >> IMO it's impossible to solve this attack within TCE, because it requires >> ensuring consistency at the row level, but TCE obviously works at column >> level only. > > I was under the impression that clients already had to be modified to > figure out how to encrypt the data? If part of that process ends up > including enforcement of encryption for a specific column set, then the > addition of AEAD data could hypothetically be part of that hand- > waviness. > I think "transparency" here means the client just uses the regular prepared-statement API without having to explicitly encrypt/decrypt any data. The problem is we can't easily tie this to other columns in the table, because the client may not even know what values are in those columns. Imagine you do this UPDATE t SET encrypted_column = $1 WHERE another_column = $2; but you want to ensure the encrypted value belongs to a particular row (which may or may not be identified by the another_column value). How would the client do that? Should it fetch the value or what? Similarly, what if the client just does SELECT encrypted_column FROM t; How would it verify the values belong to the row, without having all the data for the row (or just the required columns)? > Unless "transparent" means that the client completely defers to the > server on whether to encrypt or not, and silently goes along with it if > the server tells it not to encrypt? I think that's probably a valid concern - a "bad DBA" could alter the table definition to not contain the "ENCRYPTED" bits, and then peek at the plaintext values. But it's not clear to me how exactly would the AEAD prevent this? Wouldn't that be also specified on the server, somehow? In which case the DBA could just tweak that too, no? In other words, this issue seems mostly orthogonal to the AEAD, and the right solution would be to allow the client to define which columns have to be encrypted (in which case altering the server definition would not be enough). > That would only protect against a > _completely_ passive DBA, like someone reading unencrypted backups, > etc. And that still has a lot of value, certainly. But it seems like > this prototype is very close to a system where the client can reliably > secure data even if the server isn't trustworthy, if that's a use case > you're interested in. > Right. IMHO the "passive attacker" is a perfectly fine model for use cases that would be fine with e.g. pgcrypto if there was no risk of leaking plaintext values to logs, system catalogs, etc. If we can improve it to provide (at least some) protection against active attackers, that'd be a nice bonus. >> I believe TCE can do AEAD at the column level, which protects against >> attacks that flipping bits, and similar attacks. It's just a matter of >> how the client encrypts the data. > > Right, I think authenticated encryption ciphers (without AD) will be > important to support in practice. I think users are going to want > *some* protection against active attacks. > >> Extending it to protect the whole row seems tricky, because the client >> may not even know the other columns, and it's not clear to me how it'd >> deal with things like updates of the other columns, hint bits, dropped >> columns, etc. > > Covering the entire row automatically probably isn't super helpful in > practice. As you mention later: > >> It's probably possible to get something like this (row-level AEAD) by >> encrypting enriched data, i.e. not just the card number, but {user ID, >> card number} or something like that, and verify that in the webapp. The >> problem of course is that the "user ID" is just another column in the >> table, and there's nothing preventing the DBA from modifying that too. > > Right. That's why the client has to be able to choose AD according to > the application. In my previous example, the victim's email address can > be copied by the DBA, but they wouldn't be able to authenticate as that > user and couldn't convince the client to use the plaintext on their > behalf. > Well, yeah. But I'm not sure how to make that work easily, because the client may not have the data :-( I was thinking about using a composite data type combining the data with the extra bits - that'd not be all that transparent as it'd require the client to build this manually and then also cross-check it after loading the data. So the user would be responsible for having all the data. But doing that automatically/transparently seems hard, because how would you deal e.g. with SELECT queries reading data through a view or CTE? How would you declare this, either at the client or server? Do any other databases have this capability? How do they do it? regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Transparent column encryption
On Tue, 2021-12-07 at 22:21 +0100, Tomas Vondra wrote: > IMO it's impossible to solve this attack within TCE, because it requires > ensuring consistency at the row level, but TCE obviously works at column > level only. I was under the impression that clients already had to be modified to figure out how to encrypt the data? If part of that process ends up including enforcement of encryption for a specific column set, then the addition of AEAD data could hypothetically be part of that hand- waviness. Unless "transparent" means that the client completely defers to the server on whether to encrypt or not, and silently goes along with it if the server tells it not to encrypt? That would only protect against a _completely_ passive DBA, like someone reading unencrypted backups, etc. And that still has a lot of value, certainly. But it seems like this prototype is very close to a system where the client can reliably secure data even if the server isn't trustworthy, if that's a use case you're interested in. > I believe TCE can do AEAD at the column level, which protects against > attacks that flipping bits, and similar attacks. It's just a matter of > how the client encrypts the data. Right, I think authenticated encryption ciphers (without AD) will be important to support in practice. I think users are going to want *some* protection against active attacks. > Extending it to protect the whole row seems tricky, because the client > may not even know the other columns, and it's not clear to me how it'd > deal with things like updates of the other columns, hint bits, dropped > columns, etc. Covering the entire row automatically probably isn't super helpful in practice. As you mention later: > It's probably possible to get something like this (row-level AEAD) by > encrypting enriched data, i.e. not just the card number, but {user ID, > card number} or something like that, and verify that in the webapp. The > problem of course is that the "user ID" is just another column in the > table, and there's nothing preventing the DBA from modifying that too. Right. That's why the client has to be able to choose AD according to the application. In my previous example, the victim's email address can be copied by the DBA, but they wouldn't be able to authenticate as that user and couldn't convince the client to use the plaintext on their behalf. --Jacob
Re: Transparent column encryption
On 12/7/21 19:02, Jacob Champion wrote: On Tue, 2021-12-07 at 16:39 +0100, Peter Eisentraut wrote: On 06.12.21 21:44, Jacob Champion wrote: I think reusing a zero IV will potentially leak more information than just equality, depending on the cipher in use. You may be interested in synthetic IVs and nonce-misuse resistance (e.g. [1]), since they seem like they would match this use case exactly. (But I'm not a cryptographer.) I'm aware of this and plan to make use of SIV. The current implementation is just an example. Sounds good. Have you given any thought to AEAD? As a client I'd like to be able to tie an encrypted value to other column (or external) data. For example, AEAD could be used to prevent a DBA from copying the (encrypted) value of my credit card column into their account's row to use it. I don't know how that is supposed to work. When the value is encrypted for insertion, the client may know things like table name or column name, so it can tie it to those. But it doesn't know what row it will go in, so you can't prevent the value from being copied into another row. You would need some permanent logical row ID for this, I think. Sorry, my description was confusing. There's nothing preventing the DBA from copying the value inside the database, but AEAD can make it so that the copied value isn't useful to the DBA. Sample case. Say I have a webapp backed by Postgres, which stores encrypted credit card numbers. Users authenticate to the webapp which then uses the client (which has the keys) to talk to the database. Additionally, I assume that: - the DBA can't access the client directly (because if they can, then they can unencrypt the victim's info using the client's keys), and - the DBA can't authenticate as the user/victim (because if they can, they can just log in themselves and have the data). The webapp might for example use federated authn with a separate provider, using an email address as an identifier. Now, if the client encrypts a user's credit card number using their email address as associated data, then it doesn't matter if the DBA copies that user's encrypted card over to their own account. The DBA can't log in as the victim, so the client will fail to authenticate the value because its associated data won't match. This is not targeting PostgreSQL 15. But I'd appreciate some feedback on the direction. What kinds of attacks are you hoping to prevent (and not prevent)? The point is to prevent admins from getting at plaintext data. The scenario you show is an interesting one but I think it's not meant to be addressed by this. If admins can alter the database to their advantage, they could perhaps increase their account balance, create discount codes, etc. also. Sure, but increasing account balances and discount codes don't lead to getting at plaintext data, right? Whereas stealing someone else's encrypted value seems like it would be covered under your threat model, since it lets you trick a real-world client into decrypting it for you. Other avenues of attack might depend on how you choose to add HMAC to the current choice of AES-CBC. My understanding of AE ciphers (with or without associated data) is that you don't have to design that yourself, which is nice. IMO it's impossible to solve this attack within TCE, because it requires ensuring consistency at the row level, but TCE obviously works at column level only. I believe TCE can do AEAD at the column level, which protects against attacks that flipping bits, and similar attacks. It's just a matter of how the client encrypts the data. Extending it to protect the whole row seems tricky, because the client may not even know the other columns, and it's not clear to me how it'd deal with things like updates of the other columns, hint bits, dropped columns, etc. It's probably possible to get something like this (row-level AEAD) by encrypting enriched data, i.e. not just the card number, but {user ID, card number} or something like that, and verify that in the webapp. The problem of course is that the "user ID" is just another column in the table, and there's nothing preventing the DBA from modifying that too. So I think it's pointless to try extending this to row-level AEAD. regards -- Tomas Vondra EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: Transparent column encryption
On Tue, 2021-12-07 at 16:39 +0100, Peter Eisentraut wrote: > On 06.12.21 21:44, Jacob Champion wrote: > > I think reusing a zero IV will potentially leak more information than > > just equality, depending on the cipher in use. You may be interested in > > synthetic IVs and nonce-misuse resistance (e.g. [1]), since they seem > > like they would match this use case exactly. (But I'm not a > > cryptographer.) > > I'm aware of this and plan to make use of SIV. The current > implementation is just an example. Sounds good. > > Have you given any thought to AEAD? As a client I'd like to be able to > > tie an encrypted value to other column (or external) data. For example, > > AEAD could be used to prevent a DBA from copying the (encrypted) value > > of my credit card column into their account's row to use it. > > I don't know how that is supposed to work. When the value is encrypted > for insertion, the client may know things like table name or column > name, so it can tie it to those. But it doesn't know what row it will > go in, so you can't prevent the value from being copied into another > row. You would need some permanent logical row ID for this, I think. Sorry, my description was confusing. There's nothing preventing the DBA from copying the value inside the database, but AEAD can make it so that the copied value isn't useful to the DBA. Sample case. Say I have a webapp backed by Postgres, which stores encrypted credit card numbers. Users authenticate to the webapp which then uses the client (which has the keys) to talk to the database. Additionally, I assume that: - the DBA can't access the client directly (because if they can, then they can unencrypt the victim's info using the client's keys), and - the DBA can't authenticate as the user/victim (because if they can, they can just log in themselves and have the data). The webapp might for example use federated authn with a separate provider, using an email address as an identifier. Now, if the client encrypts a user's credit card number using their email address as associated data, then it doesn't matter if the DBA copies that user's encrypted card over to their own account. The DBA can't log in as the victim, so the client will fail to authenticate the value because its associated data won't match. > > > This is not targeting PostgreSQL 15. But I'd appreciate some feedback > > > on the direction. > > > > What kinds of attacks are you hoping to prevent (and not prevent)? > > The point is to prevent admins from getting at plaintext data. The > scenario you show is an interesting one but I think it's not meant to be > addressed by this. If admins can alter the database to their advantage, > they could perhaps increase their account balance, create discount > codes, etc. also. Sure, but increasing account balances and discount codes don't lead to getting at plaintext data, right? Whereas stealing someone else's encrypted value seems like it would be covered under your threat model, since it lets you trick a real-world client into decrypting it for you. Other avenues of attack might depend on how you choose to add HMAC to the current choice of AES-CBC. My understanding of AE ciphers (with or without associated data) is that you don't have to design that yourself, which is nice. --Jacob
Re: Transparent column encryption
On 06.12.21 21:44, Jacob Champion wrote: I think reusing a zero IV will potentially leak more information than just equality, depending on the cipher in use. You may be interested in synthetic IVs and nonce-misuse resistance (e.g. [1]), since they seem like they would match this use case exactly. (But I'm not a cryptographer.) I'm aware of this and plan to make use of SIV. The current implementation is just an example. Have you given any thought to AEAD? As a client I'd like to be able to tie an encrypted value to other column (or external) data. For example, AEAD could be used to prevent a DBA from copying the (encrypted) value of my credit card column into their account's row to use it. I don't know how that is supposed to work. When the value is encrypted for insertion, the client may know things like table name or column name, so it can tie it to those. But it doesn't know what row it will go in, so you can't prevent the value from being copied into another row. You would need some permanent logical row ID for this, I think. For this scenario, the deterministic encryption mode is perhaps not the right one. This is not targeting PostgreSQL 15. But I'd appreciate some feedback on the direction. What kinds of attacks are you hoping to prevent (and not prevent)? The point is to prevent admins from getting at plaintext data. The scenario you show is an interesting one but I think it's not meant to be addressed by this. If admins can alter the database to their advantage, they could perhaps increase their account balance, create discount codes, etc. also. If this is a problem, then perhaps a better approach would be to store parts of the data in a separate database with separate admins.
Re: Transparent column encryption
On 06.12.21 19:28, Robert Haas wrote: Speaking of parameter descriptions, the trickiest part of this whole thing appears to be how to get transparently encrypted data into the database (as opposed to reading it out). It is required to use protocol-level prepared statements (i.e., extended query) for this. Why? If the client knows the CEK, can't the client choose to send unprepared insert or update statements with pre-encrypted blobs? That might be a bad idea from a security perspective because the encrypted blob might then got logged, but we sometimes log parameters, too. The client can send something like PQexec(conn, "INSERT INTO tbl VALUES ('ENCBLOB', 'ENCBLOB')"); and it will work. (See the included test suite where 'ENCBLOB' is actually computed by pgcrypto.) But that is not transparent encryption. The client wants to send "INSERT INTO tbl VALUES ('val1', 'val2')" and have libpq take care of encrypting 'val1' and 'val2' before hitting the wire. For that you need to use the prepared statement API so that the values are available separately from the statement. And furthermore the client needs to know what columns the insert statements is writing to, so that it can get the CEK for that column. That's what it needs the parameter description for. As alluded to, workarounds exist or might be made available to do part of that work yourself, but that shouldn't be the normal way of using it.
Re: Transparent column encryption
On Fri, 2021-12-03 at 22:32 +0100, Peter Eisentraut wrote: > This feature does support deterministic > encryption as an alternative to the default randomized encryption, so > in that mode you can do equality lookups, at the cost of some > security. > + if (enc_det) > + memset(iv, ivlen, 0); I think reusing a zero IV will potentially leak more information than just equality, depending on the cipher in use. You may be interested in synthetic IVs and nonce-misuse resistance (e.g. [1]), since they seem like they would match this use case exactly. (But I'm not a cryptographer.) > The encryption algorithms are mostly hardcoded right now, but there > are facilities for picking algorithms and adding new ones that will be > expanded. The CMK process uses RSA-OAEP. The CEK process uses > AES-128-CBC right now; a more complete solution should probably > involve some HMAC thrown in. Have you given any thought to AEAD? As a client I'd like to be able to tie an encrypted value to other column (or external) data. For example, AEAD could be used to prevent a DBA from copying the (encrypted) value of my credit card column into their account's row to use it. > This is not targeting PostgreSQL 15. But I'd appreciate some feedback > on the direction. What kinds of attacks are you hoping to prevent (and not prevent)? --Jacob [1] https://datatracker.ietf.org/doc/html/rfc8452
Re: Transparent column encryption
On Fri, Dec 3, 2021 at 4:32 PM Peter Eisentraut wrote: > But it's missing the remaining 90% of the work, > including additional DDL support, error handling, robust memory > management, protocol versioning, forward and backward compatibility, > pg_dump support, psql \d support, refinement of the cryptography, and > so on. But I think obvious solutions exist to all of those things, so > it isn't that interesting to focus on them for now. Right, we wouldn't want to get bogged down at this stage in little details like, uh, everything. > Some protocol extensions are required. These should be guarded by > some _pq_... setting, but this is not done in this patch yet. As > mentioned above, extra messages are added for sending the CMKs and > CEKs. In the RowDescription message, I have commandeered the format > field to add a bit that indicates that the field is encrypted. This > could be made a separate field, and there should probably be > additional fields to indicate the algorithm and CEK name, but this was > easiest for now. The ParameterDescription message is extended to > contain format fields for each parameter, for the same purpose. > Again, this could be done differently. I think this is reasonable. I would choose to use an additional bit in the format field as opposed to a separate field. It is worth considering whether it makes more sense to extend the existing ParameterDescription message conditionally on some protocol-level option, or whether we should instead, say, add ParameterDescription2 or the moral equivalent. As I see it, the latter feels conceptually simpler, but on the other hand, our wire protocol supposes that we will never run out of 1-byte codes for messages, so perhaps some prudence is needed. > Speaking of parameter descriptions, the trickiest part of this whole > thing appears to be how to get transparently encrypted data into the > database (as opposed to reading it out). It is required to use > protocol-level prepared statements (i.e., extended query) for this. Why? If the client knows the CEK, can't the client choose to send unprepared insert or update statements with pre-encrypted blobs? That might be a bad idea from a security perspective because the encrypted blob might then got logged, but we sometimes log parameters, too. -- Robert Haas EDB: http://www.enterprisedb.com