On Thu, Aug 15, 2019 at 09:01:05PM -0400, Stephen Frost wrote:
> * Bruce Momjian (br...@momjian.us) wrote:
> > I assume you are talking about my option #1.  I can see that if you
> > only need a few tables encrypted, e.g., credit card numbers, it can be
> > excessive to encrypt the entire cluster.  (I think you would need to
> > encrypt pg_statistic too.)
> 
> Or we would need a separate encrypted pg_statistic, or a way to encrypt
> certain entries inside pg_statistic.
Yes.

> > The tricky part will be WAL --- if we encrypt all of WAL, the
> > per-table overhead might be minimal compared to the WAL encryption
> > overhead.  The better solution would be to add a flag to WAL records
> > to indicate encrypted entries, but you would then leak when an
> > encryption change happens and the WAL record length.  (FYI, numeric
> > values have different lengths, as do character strings.)  I assume we
> > would still use a single key for all tables/indexes, and one for WAL,
> > plus key rotation requirements.
> 
> I don't think the fact that a change was done to an encrypted blob is
> an actual 'leak'- anyone can tell that by looking at the encrypted data
> before and after.  Further, the actual change would be encrypted,
> right?  Length of data is necessary to include in the vast majority of
> cases that the data is being dealt with, and so I'm not sure that it
> makes sense for us to be worrying about that as a leak, unless you have
> a specific recommendation from a well-known source discussing that
> concern..?

Yes, it is a minor negative, but we would need to see some performance
reason to accept that minor negative, and I have already stated why I
think there might be no performance reason to do so.  Masahiko Sawada's
talk at PGCon 2019 supports that conclusion:

	https://www.youtube.com/watch?v=TXKoo2SNMzk
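
Just to spell out what that per-record flag approach would still expose,
here is a rough sketch.  It is purely illustrative, with invented field
and flag names, and is not our actual WAL record layout:

#include <stdint.h>
#include <stdio.h>

/*
 * Illustration only: with a per-record "encrypted" flag, the header has
 * to stay readable, so anyone scanning the file still sees each
 * record's total length, its resource manager, and whether its payload
 * is encrypted.
 */
#define SKETCH_REC_ENCRYPTED	0x01	/* hypothetical flag bit */

typedef struct SketchRecordHeader
{
	uint32_t	tot_len;	/* plaintext: total record length */
	uint8_t		info;		/* plaintext: flag bits, incl. "encrypted" */
	uint8_t		rmid;		/* plaintext: resource manager id */
	uint32_t	crc;		/* plaintext: checksum */
	/* (tot_len - header size) bytes of encrypted payload would follow */
} SketchRecordHeader;

int
main(void)
{
	SketchRecordHeader hdr = {128, SKETCH_REC_ENCRYPTED, 10, 0};

	/* An observer learns this much without any key: */
	printf("record of %u bytes, rmgr %u, encrypted payload: %s\n",
		   (unsigned) hdr.tot_len, (unsigned) hdr.rmid,
		   (hdr.info & SKETCH_REC_ENCRYPTED) ? "yes" : "no");
	return 0;
}

That header metadata (record boundaries, lengths, and the encrypted
flag itself) is the leak I was referring to, and as I said above, I
consider it a minor negative.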
> > I personally would like to see full cluster implemented first to find
> > out exactly what the overhead is.  As I stated earlier, the overhead
> > of determining which things to encrypt, both in code complexity, user
> > interface, and processing overhead, might not be worth it.
> 
> I disagree with this and feel that the overhead that's being discussed
> here (user interface, figuring out if we should encrypt it or not,
> processing overhead for those determinations) is along the lines of
> UNLOGGED tables, yet there wasn't any question about if that was a
> valid or useful feature to implement.  The biggest challenge here is
> really

We implemented UNLOGGED tables because there was a clear performance win
to doing so.  I have not seen any measurements for encryption,
particularly when WAL is considered.

> around key management and I agree that's difficult but it's also
> really important and something that we need to be thinking about- and
> thinking about how to work with multiple keys and not just one.
> Building in an assumption that we will only ever work with one key
> would make this capability nothing more than DBA-managed
> filesystem-level encryption

Agreed, that's all it is.

> (though even there different tablespaces could have different keys...)
> and I worry would make later work to support multiple keys more
> difficult and less likely to actually happen.  It's also not clear to
> me why we aren't building in *some* mechanism to work with multiple
> keys from the start as part of the initial design.

Well, every time I look at multiple keys, I go over exactly what that
means and how it behaves, but I get no feedback on how to address the
problems.

> > I can see why you would think that encrypting less would be easier
> > than encrypting more, but security boundaries are hard to construct,
> > and anything that requires a user API, even more so.
> 
> I'm not sure I'm following here- I'm pretty sure everyone understands
> that selective encryption will require more work to implement, in part
> because an API needs to be put in place and we need to deal with
> multiple keys, etc.  I don't think anyone thinks that'll be "easier".

Uh, I thought Masahiko Sawada stated that, but looking back, I don't see
it, so I must be wrong.

> > > > > At least it should be clear how [2] will retrieve the master
> > > > > key because [1] should not do it in a different way.  (The GUC
> > > > > cluster_passphrase_command mentioned in [3] seems viable,
> > > > > although I think [1] uses an approach which is more convenient
> > > > > if the passphrase should be read from the console.)
> > > 
> > > I think that we can also provide a way to pass the encryption key
> > > directly to the postmaster rather than using a passphrase.  Since
> > > it's common that users store keys in a KMS, it's useful if we can
> > > do that.
> > 
> > Why would it not be simpler to have the cluster_passphrase_command
> > run whatever command-line program it wants?  If you don't want to use
> > a shell command, create an executable and call that.
> 
> Having direct integration with a KMS would certainly be valuable, and I
> don't see a reason to deny users that option if someone would like to
> spend time implementing it- in addition to a simpler mechanism such as
> a passphrase command, which I believe is what was being suggested here.

OK, I am just trying to see why we would not use the
cluster_passphrase_command-like interface to do that.
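
Just to be concrete about the kind of thing I mean, here is a rough
sketch of my own (not code from either patch, and the KMS command name
in the comment is made up): the server only has to run the configured
command and read the secret from its stdout.

#include <stdio.h>
#include <string.h>

/*
 * Rough illustration only: run the configured passphrase command and
 * capture the first line of its output.  Real server code would need
 * proper error reporting, length checks, and zeroing of the buffer,
 * none of which is shown here.
 */
static int
run_passphrase_command(const char *command, char *buf, size_t bufsize)
{
	FILE	   *fh = popen(command, "r");

	if (fh == NULL)
		return -1;
	if (fgets(buf, (int) bufsize, fh) == NULL)
	{
		pclose(fh);
		return -1;
	}
	pclose(fh);
	buf[strcspn(buf, "\n")] = '\0';		/* strip trailing newline */
	return 0;
}

int
main(void)
{
	char		passphrase[1024];

	/* e.g. cluster_passphrase_command = 'fetch-kms-key --id wal-key' */
	if (run_passphrase_command("echo dummy-passphrase", passphrase,
							   sizeof(passphrase)) == 0)
		printf("got a %zu-byte passphrase\n", strlen(passphrase));
	return 0;
}

Whether that command is a one-line shell script, a console prompt
helper, or a small program that talks to a KMS is then entirely up to
the user, and the server does not have to link against any vendor
library.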
> > > > > Rotation of the master key is another thing that both versions
> > > > > of the feature should do in the same way.  And of course, the
> > > > > frontend applications need a consistent approach too.
> > > > 
> > > > I don't see the value of an external library for key storage.
> > > 
> > > I think that the big benefit is that PostgreSQL can seamlessly work
> > > with external services such as a KMS.  For instance, on key
> > > rotation, PostgreSQL can register the new key with the KMS and use
> > > it, and it can remove keys when they are no longer necessary.  That
> > > is, it enables PostgreSQL to not only get keys from the KMS but
> > > also to register and remove keys.  And we can also decrypt the MDEK
> > > in the KMS instead of doing it in PostgreSQL, which is safer.  In
> > > addition, once someone creates the plugin library for an external
> > > service, individual projects don't need to create it themselves.
> > 
> > I think the big win for an external library is when you don't want
> > the overhead of calling an external program.  For example, we
> > certainly would not want to call an external program while processing
> > a query.  Do we have any such requirements for encryption, especially
> > since we are only going to allow offline mode for encryption mode
> > changes and key rotation in the first version?
> 
> The strong push for a stripped-down and "first version" that is
> extremely limited is really grating on me as it seems we have quite a

Well, "grating" doesn't change any facts.  If you want to change that,
you will need to do as I stated earlier:

	https://www.postgresql.org/message-id/20190810021716.ovpqenqjb3b7u...@momjian.us

> few people who are interested in making progress here and a small
> number of others who are pushing back and putting up limitations that
> "the first version can't have X" or "the first version can't have Y".
> 
> I'm all for incremental development, but we need to be thinking about
> the larger picture when we develop features and make sure that we
> don't bake in assumptions that will later become very difficult for us
> to work ourselves out of (especially when it comes to user interface
> and things like GUCs...), but where we decide to draw a line shouldn't
> be based on assumptions about what's going to be difficult and what
> isn't- let's let those who want to work on this capability work on it,
> and as we see the progress, if there are issues which come up with a
> specific area that seem likely to prove difficult to include, then we
> can consider backing away from that while keeping it in mind while
> doing further development.

I have seen no one present a clear description of how anything beyond
all-cluster encryption would work or be secure.  Wishing that were not
the case doesn't change things.

> In other words, I feel like we're getting trapped here in a
> "requirements definition" phase of a traditional waterfall-style
> development cycle where we have to decide, up front, the EXACT set of
> features and capabilities that we want, and then we are going to
> expect people to develop according to EXACTLY that set, and we'll
> shoot down anything that comes across which is trying to do more or is
> trying to be more flexible in anticipation of capabilities that we
> know we will want down the road.  It's likely clear already but I'll
> say it anyway- I don't think it's a good idea to go down that route.

I will continue to shoot down whatever I think has no reasonable chance
of working.  I can just let it go and watch it fail, but I don't see
that as a good approach.  I will state what I have already told some
people privately: for this feature, we have many people understanding
40% of the problem but thinking they understand 90%.

I do agree we should plan for our eventual full feature set, but I
can't figure out what that feature set looks like beyond full-cluster
encryption, and no one is addressing my concerns to move that forward.
Vague complaints that they don't like the process are not changing
that.

-- 
  Bruce Momjian  <br...@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +