Greetings,

* Bruce Momjian (br...@momjian.us) wrote:
> On Thu, Aug 15, 2019 at 06:10:24PM +0900, Masahiko Sawada wrote:
> > On Thu, Aug 15, 2019 at 10:19 AM Bruce Momjian <br...@momjian.us> wrote:
> > >
> > > On Wed, Aug 14, 2019 at 04:36:35PM +0200, Antonin Houska wrote:
> > > > I can work on it right away but don't know where to start.
> > >
> > > I think the big open question is whether there will be acceptance of an
> > > all-cluster encryption feature.  I guess if no one objects, we can move
> > > forward.
> > 
> > I still feel that we need to have per table/tablespace keys although
> > it might not be the first implementation. I think the safeness of both
> > table/tablespace level and cluster level would be almost the same but
> > the former would have an advantage in terms of operation and
> > performance.
> 
> I assume you are talking about my option #1.  I can see if you only need
> a few tables encrypted, e.g., credit card numbers, it can be excessive
> to encrypt the entire cluster.  (I think you would need to encrypt
> pg_statistic too.)

Or we would need a separate encrypted pg_statistic, or a way to encrypt
certain entries inside pg_statistic.

> The tricky part will be WAL --- if we encrypt all of WAL, the per-table
> overhead might be minimal compared to the WAL encryption overhead.  The
> better solution would be to add a flag to WAL records to indicate
> encrypted entries, but you would then leak when an encryption change
> happens and WAL record length.  (FYI, numeric values have different
> lengths, as do character strings.)  I assume we would still use a single
> key for all tables/indexes, and one for WAL, plus key rotation
> requirements.

I don't think the fact that a change was made to an encrypted blob is an
actual 'leak'- anyone can tell that much simply by comparing the
encrypted data before and after.  Further, the actual change would
itself be encrypted, right?  Length of data has to be included in the
vast majority of cases where the data is being dealt with, so I'm not
sure it makes sense for us to worry about that as a leak, unless you
have a specific recommendation from a well-known source discussing that
concern?

> I personally would like to see full cluster implemented first to find
> out exactly what the overhead is.  As I stated earlier, the overhead of
> determining which things to encrypt, both in code complexity, user
> interface, and processing overhead, might not be worth it.

I disagree with this and feel that the overhead being discussed here
(user interface, figuring out whether we should encrypt something or
not, processing overhead for those determinations) is along the lines
of UNLOGGED tables, yet there wasn't any question about whether that
was a valid or useful feature to implement.  The biggest challenge here
is really around key management, and I agree that's difficult, but it's
also really important and something we need to be thinking about-
including how to work with multiple keys and not just one.  Building in
an assumption that we will only ever work with one key would make this
capability nothing more than DBA-managed filesystem-level encryption
(though even there, different tablespaces could have different
keys...), and I worry it would make later work to support multiple keys
more difficult and less likely to actually happen.  It's also not clear
to me why we aren't building in *some* mechanism to work with multiple
keys from the start as part of the initial design.
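
Supporting multiple keys doesn't have to multiply the secrets a DBA
manages, either.  One option (a hypothetical sketch, not the patch's
design) is to derive independent per-tablespace data keys from a single
master secret with an HKDF-Expand-style construction (RFC 5869 shape):

```python
import hashlib
import hmac

def derive_key(master_key: bytes, context: bytes, length: int = 32) -> bytes:
    """HKDF-Expand-style derivation: one master secret in, one
    independent key out per context label."""
    okm, block = b"", b""
    counter = 1
    while len(okm) < length:
        block = hmac.new(master_key,
                         block + context + bytes([counter]),
                         hashlib.sha256).digest()
        okm += block
        counter += 1
    return okm[:length]

# Single secret obtained at startup (e.g. via a passphrase command).
master = b"single master secret"

# Distinct, independent keys per tablespace, plus one for WAL, all
# recoverable from the one master secret plus a label.
ts_default = derive_key(master, b"tablespace:pg_default")
ts_secure = derive_key(master, b"tablespace:secure_ts")
wal_key = derive_key(master, b"wal")
assert ts_default != ts_secure != wal_key
```

The point being that the user-facing key-management surface can stay
small even while the internals work with many keys.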

> I can see why you would think that encrypting less would be easier than
> encrypting more, but security boundaries are hard to construct, and
> anything that requires a user API, even more so.

I'm not sure I'm following here- I'm pretty sure everyone understands
that selective encryption will require more work to implement, in part
because an API needs to be put in place and we need to deal with
multiple keys, etc.  I don't think anyone thinks that'll be "easier".

> > > > At least it should be clear how [2] will retrieve the master key
> > > > because [1] should not do it in a different way. (The GUC
> > > > cluster_passphrase_command mentioned in [3] seems viable, although
> > > > I think [1] uses an approach which is more convenient if the
> > > > passphrase should be read from console.)
> > 
> > I think that we can also provide a way to pass encryption key directly
> > to postmaster rather than using passphrase. Since it's common that
> > user stores keys in KMS it's useful if we can do that.
> 
> Why would it not be simpler to have the cluster_passphrase_command run
> whatever command-line program it wants?  If you don't want to use a
> shell command, create an executable and call that.

Having direct integration with a KMS would certainly be valuable, and I
don't see a reason to deny users that option if someone would like to
spend time implementing it- in addition to a simpler mechanism such as a
passphrase command, which I believe is what was being suggested here.
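
As a sketch of the simpler mechanism (the GUC name and semantics here
are assumed from the thread, not final): the server just runs whatever
the administrator configured and reads the secret from its stdout,
which already covers the KMS case via a CLI wrapper:

```python
import shlex
import subprocess

def run_cluster_passphrase_command(command: str) -> bytes:
    """Sketch of how the postmaster might obtain the master
    passphrase: run the configured command and read its stdout.
    Whether that command prompts on the console, reads a file, or
    calls out to a KMS CLI is entirely up to the administrator."""
    result = subprocess.run(shlex.split(command),
                            capture_output=True, check=True)
    return result.stdout.rstrip(b"\n")

# Hypothetical examples of what the GUC might point at:
#   cluster_passphrase_command = '/usr/local/bin/fetch-key-from-kms'
#   cluster_passphrase_command = 'cat /run/secrets/pg_master_key'
passphrase = run_cluster_passphrase_command("echo s3cret")
assert passphrase == b"s3cret"
```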

> > > > Rotation of the master key is another thing that both versions of
> > > > the feature should do in the same way. And of course, the frontend
> > > > applications need a consistent approach too.
> > >
> > > I don't see the value of an external library for key storage.
> > 
> > I think that the big benefit is that PostgreSQL can seamlessly work
> > with external services such as a KMS. For instance, when rotating
> > keys, PostgreSQL can register the new key with the KMS and use it,
> > and it can remove keys when they are no longer necessary. That is,
> > it enables PostgreSQL not only to get keys from the KMS but also to
> > register and remove them. And we can also decrypt the MDEK in the
> > KMS instead of doing it in PostgreSQL, which is safer. In addition,
> > once someone creates the plugin library for an external service,
> > individual projects don't need to create their own.
> 
> I think the big win for an external library is when you don't want the
> overhead of calling an external program.  For example, we certainly
> would not want to call an external program while processing a query.  Do
> we have any such requirements for encryption, especially since we only
> are going to allow offline mode for encryption mode changes and key
> rotation in the first version?

The strong push for a stripped-down and extremely limited "first
version" is really grating on me: it seems we have quite a few people
who are interested in making progress here, and a small number of
others who are pushing back and putting up limitations- "the first
version can't have X" or "the first version can't have Y".

I'm all for incremental development, but we need to be thinking about
the larger picture when we develop features and make sure we don't bake
in assumptions that will later become very difficult to work ourselves
out of (especially when it comes to user interface and things like
GUCs...).  Where we decide to draw a line shouldn't be based on
assumptions about what's going to be difficult and what isn't- let's
let those who want to work on this capability work on it, and as we see
the progress, if issues come up with a specific area that seems likely
to prove difficult to include, we can consider backing away from it
while keeping it in mind during further development.

In other words, I feel like we're getting trapped here in the
"requirements definition" phase of a traditional waterfall-style
development cycle, where we have to decide, up front, the EXACT set of
features and capabilities that we want, then expect people to develop
according to EXACTLY that set, and shoot down anything that comes along
which tries to do more, or tries to be more flexible in anticipation of
capabilities that we know we will want down the road.  It's likely
clear already, but I'll say it anyway- I don't think it's a good idea
to go down that route.

Thanks,

Stephen
