Greetings,

* Magnus Hagander (mag...@hagander.net) wrote:
> On Thu, Oct 3, 2019 at 4:40 PM Stephen Frost <sfr...@snowman.net> wrote:
> > * Robert Haas (robertmh...@gmail.com) wrote:
> > > On Mon, Sep 30, 2019 at 5:26 PM Bruce Momjian <br...@momjian.us> wrote:
> > > > For full-cluster Transparent Data Encryption (TDE), the current plan is
> > > > to encrypt all heap and index files, WAL, and all pgsql_tmp (work_mem
> > > > overflow).  The plan is:
> > > >
> > > > https://wiki.postgresql.org/wiki/Transparent_Data_Encryption#TODO_for_Full-Cluster_Encryption
> > > >
> > > > We don't see much value to encrypting vm, fsm, pg_xact, pg_multixact, or
> > > > other files.  Is that correct?  Do any other PGDATA files contain user
> > > > data?
> > >
> > > As others have said, that sounds wrong to me. I think you need to
> > > encrypt everything.
> >
> > That isn't what other database systems do though and isn't what people
> > actually asking for this feature are expecting to have or deal with.
>
> Do any of said other database even *have* the equivalence of say pg_clog or
> pg_multixact *stored outside their tablespaces*? (Because as long as the
> data is in the tablespace, it's encrypted when using tablespace
> encryption..)
That's a fair question, and while I'm not specifically sure about all of
them, I do believe you're right that for some, the tablespace/database
includes that information (and the WAL) instead of having it external.  I'm
also pretty sure that there's still enough information that isn't encrypted
to at least *start* the database server.  In many ways, we are
unfortunately the oddball when it comes to having these cluster-level
things that we probably do want to encrypt (I'd be thinking more about
pg_authid here than clog, and potentially the WAL).

I've been meaning to write up a wiki page or something on this but I just
haven't found time, so I'm going to give up on that and just share my
thoughts here, and folks can do with them what they wish.

When it comes to use-cases and attack vectors, I feel like there are
really two "big" choices, and I'd like us to support both, ideally, but it
boils down to this: do you trust the database maintenance, et al,
processes, or not?  The same question, put another way, is: do you trust
having unencrypted/sensitive data in shared buffers?  Let's talk through
these for a minute.

Yes, shared_buffers is trusted implies:

- More data (usefully or not) can be encrypted: WAL, clog, multixact, pg
  statistics, et al.

- Various PG processes need to know the decryption keys (autovacuum and
  crash recovery being big ones) ... ideally, we could still *start*,
  which is why I continue to argue that we shouldn't encrypt
  *everything*, because not being able to even start the database system
  really sucks.  What exactly it is that we need, I don't know off-hand;
  maybe we don't need clog, but it seems likely we'll need
  pg_controldata, for example.
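To make the "vault" / startup-key question a bit more concrete, here's a
toy sketch (purely illustrative; the XOR "wrap" is not a real cipher, and
all names here are made up, not a proposal for an actual on-disk format):
the only things that have to live unencrypted on disk are a salt and the
wrapped data keys, and a passphrase-derived key-encryption key unwraps
everything else at startup.

```python
# Toy sketch of a startup "vault": a key-encryption key (KEK) derived
# from an operator passphrase unwraps the cluster's data keys.  Purely
# illustrative -- the XOR "wrap" is NOT a real cipher; a real design
# would use something like AES key wrap (RFC 3394).
import hashlib
import hmac
import os

def derive_kek(passphrase: bytes, salt: bytes) -> bytes:
    # KEK from the passphrase; only the salt (and the wrapped keys
    # below) need to sit unencrypted in PGDATA.
    return hashlib.pbkdf2_hmac("sha256", passphrase, salt, 100_000)

def wrap_key(kek: bytes, data_key: bytes):
    # "Encrypt" the data key with the KEK (toy XOR) and attach an HMAC
    # so a wrong passphrase is detected instead of yielding garbage keys.
    wrapped = bytes(a ^ b for a, b in zip(data_key, kek))
    tag = hmac.new(kek, wrapped, "sha256").digest()
    return wrapped, tag

def unwrap_key(kek: bytes, wrapped: bytes, tag: bytes) -> bytes:
    expected = hmac.new(kek, wrapped, "sha256").digest()
    if not hmac.compare_digest(tag, expected):
        raise ValueError("bad passphrase or corrupted vault")
    return bytes(a ^ b for a, b in zip(wrapped, kek))

# Startup flow: read the salt and wrapped keys (unencrypted), get the
# passphrase, open the vault, and only then bring the rest online.
salt = os.urandom(16)
data_key = os.urandom(32)          # e.g. the heap/index encryption key
kek = derive_kek(b"correct horse", salt)
wrapped, tag = wrap_key(kek, data_key)
assert unwrap_key(kek, wrapped, tag) == data_key
```

The point of the sketch is just the layering: everything needed to reach
the "prompt for the key" step stays unencrypted, and everything past it
can be behind the vault.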
  My gut feeling on this is really that we need enough to start and open
  up the vault, which probably means that the vault needs to look more
  like what I describe below for the situation where you don't trust
  shared_buffers, to the point where we might have separate WAL/clog/et
  al for the vault itself.

- Fewer limitations (indexes can work more-or-less as-is, for example).

- Attack vectors:
  - Anything that can access shared buffers can get a ton of data.
  - Bugs in PG that expose memory can be leveraged to gain access to
    data and keys.
  - root on the system can pretty trivially gain access to everything.
  - If someone steals the disks/backups, they can't get access to much.
  - Or, if your cloud/storage vendor decides to snoop around, they
    can't see much.

No, shared_buffers is NOT trusted implies:

- We need enough unencrypted data to bring the system up, online, and
  working (crash recovery and autovacuum need to work).  This likely
  implies that things like WAL, clog, et al, have to be mostly
  unencrypted to allow those processes to work.

- Limitations on indexes (we can't have the index contain unencrypted
  data, but we also have to have autovacuum able to work ... I actually
  wonder if this might be something we could solve by encrypting the
  internal pages, leaving the TIDs exposed so that they can be cleaned
  up, but leaf pages have their own ordering, so that's not great ...
  I suspect something like this is the reason for the index limitation
  in other database systems that support column-level encryption).

- Sensitive data in the WAL is already encrypted.

- All decryption happens in a given backend when it's sending data to
  the client.

- Attack vectors:
  - root can watch network traffic or individual sessions, and possibly
    gain access to keys (certainly with more difficulty, though).
  - Bugs in PG shouldn't make it very easy for an external attacker to
    gain access to anything except what they already had access to
    (sure, they could see shared buffers and see what's in their
    backend, but everything in shared buffers that's sensitive should
    be encrypted, and for the most part what's in their backend should
    only be things they're allowed to access anyway).
  - If someone steals the disks/backups, they could potentially figure
    out more information about what was happening on the system.
  - Or, if your cloud/storage vendor decides to snoop around, they
    could possibly figure things out.

And then, of course, you can get into the fun of: well, maybe we should
support both options at the same time.

Looking at it from an attack-vector standpoint, if the concern is
primarily about external attackers coming in through SQL injection and
database bugs, not trusting shared buffers is pretty clearly the way to
go.  If the concern is about stolen hard drives or backups, well, FDE is
a great solution there, along with encrypted backups, but, sure, if we
rule those out for some reason then we can say that, yes, this will be
helpful for that kind of an attack.

In either case, we do need a vaulting system, and I think we need to be
able to start up PG, get the vault open, and accept connections.

Thanks,

Stephen