On Thu, Aug 15, 2019 at 09:01:05PM -0400, Stephen Frost wrote:
> * Bruce Momjian (br...@momjian.us) wrote:
> > I assume you are talking about my option #1.  I can see that if you
> > only need a few tables encrypted, e.g., credit card numbers, it can be
> > excessive to encrypt the entire cluster.  (I think you would need to
> > encrypt pg_statistic too.)
> 
> Or we would need a separate encrypted pg_statistic, or a way to encrypt
> certain entries inside pg_statistic.
Yes.

> > The tricky part will be WAL --- if we encrypt all of WAL, the
> > per-table overhead might be minimal compared to the WAL encryption
> > overhead.  The better solution would be to add a flag to WAL records
> > to indicate encrypted entries, but you would then leak when an
> > encryption change happens and the WAL record length.  (FYI, numeric
> > values have different lengths, as do character strings.)  I assume we
> > would still use a single key for all tables/indexes, and one for WAL,
> > plus key rotation requirements.
> 
> I don't think the fact that a change was done to an encrypted blob is
> an actual 'leak'- anyone can tell that by looking at the encrypted data
> before and after.  Further, the actual change would be encrypted,
> right?  Length of data is necessary to include in the vast majority of
> cases that the data is being dealt with, and so I'm not sure that it
> makes sense for us to be worrying about that as a leak, unless you have
> a specific recommendation from a well-known source discussing that
> concern..?

Yes, it is a minor negative, but we would need to see some performance
reason to accept that minor negative, and I have already stated why I
think there might be no performance reason to do so.  Masahiko Sawada's
talk at PGCon 2019 supports that conclusion:

	https://www.youtube.com/watch?v=TXKoo2SNMzk
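
Just to spell out what that per-record flag approach would still expose,
here is a rough sketch.  It is purely illustrative, with invented field
and flag names, and is not our actual WAL record layout:

#include <stdint.h>
#include <stdio.h>

/*
 * Illustration only: with a per-record "encrypted" flag, the header has
 * to stay readable, so anyone scanning the file still sees each
 * record's total length, its resource manager, and whether its payload
 * is encrypted.
 */
#define SKETCH_REC_ENCRYPTED	0x01	/* hypothetical flag bit */

typedef struct SketchRecordHeader
{
	uint32_t	tot_len;	/* plaintext: total record length */
	uint8_t		info;		/* plaintext: flag bits, incl. "encrypted" */
	uint8_t		rmid;		/* plaintext: resource manager id */
	uint32_t	crc;		/* plaintext: checksum */
	/* (tot_len - header size) bytes of encrypted payload would follow */
} SketchRecordHeader;

int
main(void)
{
	SketchRecordHeader hdr = {128, SKETCH_REC_ENCRYPTED, 10, 0};

	/* An observer learns this much without any key: */
	printf("record of %u bytes, rmgr %u, encrypted payload: %s\n",
		   (unsigned) hdr.tot_len, (unsigned) hdr.rmid,
		   (hdr.info & SKETCH_REC_ENCRYPTED) ? "yes" : "no");
	return 0;
}

That header metadata (record boundaries, lengths, and the encrypted
flag itself) is the leak I was referring to, and as I said above, I
consider it a minor negative.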
> > I personally would like to see full cluster implemented first to find
> > out exactly what the overhead is.  As I stated earlier, the overhead
> > of determining which things to encrypt, both in code complexity, user
> > interface, and processing overhead, might not be worth it.
> 
> I disagree with this and feel that the overhead that's being discussed
> here (user interface, figuring out if we should encrypt it or not,
> processing overhead for those determinations) is along the lines of
> UNLOGGED tables, yet there wasn't any question about if that was a
> valid or useful feature to implement.  The biggest challenge here is
> really

We implemented UNLOGGED tables because there was a clear performance win
to doing so.  I have not seen any measurements for encryption,
particularly when WAL is considered.

> around key management and I agree that's difficult but it's also
> really important and something that we need to be thinking about- and
> thinking about how to work with multiple keys and not just one.
> Building in an assumption that we will only ever work with one key
> would make this capability nothing more than DBA-managed
> filesystem-level encryption

Agreed, that's all it is.

> (though even there different tablespaces could have different keys...)
> and I worry would make later work to support multiple keys more
> difficult and less likely to actually happen.  It's also not clear to
> me why we aren't building in *some* mechanism to work with multiple
> keys from the start as part of the initial design.

Well, every time I look at multiple keys, I go over exactly what that
means and how it behaves, but I get no feedback on how to address the
problems.

> > I can see why you would think that encrypting less would be easier
> > than encrypting more, but security boundaries are hard to construct,
> > and anything that requires a user API, even more so.
> 
> I'm not sure I'm following here- I'm pretty sure everyone understands
> that selective encryption will require more work to implement, in part
> because an API needs to be put in place and we need to deal with
> multiple keys, etc.  I don't think anyone thinks that'll be "easier".

Uh, I thought Masahiko Sawada stated that, but looking back, I don't see
it, so I must be wrong.

> > > > > At least it should be clear how [2] will retrieve the master
> > > > > key because [1] should not do it in a different way.  (The GUC
> > > > > cluster_passphrase_command mentioned in [3] seems viable,
> > > > > although I think [1] uses an approach which is more convenient
> > > > > if the passphrase should be read from the console.)
> > > 
> > > I think that we can also provide a way to pass the encryption key
> > > directly to the postmaster rather than using a passphrase.  Since
> > > it's common that users store keys in a KMS, it's useful if we can
> > > do that.
> > 
> > Why would it not be simpler to have the cluster_passphrase_command
> > run whatever command-line program it wants?  If you don't want to use
> > a shell command, create an executable and call that.
> 
> Having direct integration with a KMS would certainly be valuable, and I
> don't see a reason to deny users that option if someone would like to
> spend time implementing it- in addition to a simpler mechanism such as
> a passphrase command, which I believe is what was being suggested here.

OK, I am just trying to see why we would not use the
cluster_passphrase_command-like interface to do that.
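
Just to be concrete about the kind of thing I mean, here is a rough
sketch of my own (not code from either patch, and the KMS command name
in the comment is made up): the server only has to run the configured
command and read the secret from its stdout.

#include <stdio.h>
#include <string.h>

/*
 * Rough illustration only: run the configured passphrase command and
 * capture the first line of its output.  Real server code would need
 * proper error reporting, length checks, and zeroing of the buffer,
 * none of which is shown here.
 */
static int
run_passphrase_command(const char *command, char *buf, size_t bufsize)
{
	FILE	   *fh = popen(command, "r");

	if (fh == NULL)
		return -1;
	if (fgets(buf, (int) bufsize, fh) == NULL)
	{
		pclose(fh);
		return -1;
	}
	pclose(fh);
	buf[strcspn(buf, "\n")] = '\0';		/* strip trailing newline */
	return 0;
}

int
main(void)
{
	char		passphrase[1024];

	/* e.g. cluster_passphrase_command = 'fetch-kms-key --id wal-key' */
	if (run_passphrase_command("echo dummy-passphrase", passphrase,
							   sizeof(passphrase)) == 0)
		printf("got a %zu-byte passphrase\n", strlen(passphrase));
	return 0;
}

Whether that command is a one-line shell script, a console prompt
helper, or a small program that talks to a KMS is then entirely up to
the user, and the server does not have to link against any vendor
library.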
> > > > > Rotation of the master key is another thing that both versions
> > > > > of the feature should do in the same way.  And of course, the
> > > > > frontend applications need a consistent approach too.
> > > > 
> > > > I don't see the value of an external library for key storage.
> > > 
> > > I think that the big benefit is that PostgreSQL can seamlessly work
> > > with external services such as a KMS.  For instance, on key
> > > rotation, PostgreSQL can register the new key with the KMS and use
> > > it, and it can remove keys when they are no longer necessary.  That
> > > is, it enables PostgreSQL to not only get keys from the KMS but
> > > also to register and remove keys.  And we can also decrypt the MDEK
> > > in the KMS instead of doing it in PostgreSQL, which is safer.  In
> > > addition, once someone creates the plugin library for an external
> > > service, individual projects don't need to create it themselves.
> > 
> > I think the big win for an external library is when you don't want
> > the overhead of calling an external program.  For example, we
> > certainly would not want to call an external program while processing
> > a query.  Do we have any such requirements for encryption, especially
> > since we are only going to allow offline mode for encryption mode
> > changes and key rotation in the first version?
> 
> The strong push for a stripped-down and "first version" that is
> extremely limited is really grating on me as it seems we have quite a

Well, "grating" doesn't change any facts.  If you want to change that,
you will need to do as I stated earlier:

	https://www.postgresql.org/message-id/20190810021716.ovpqenqjb3b7u...@momjian.us

> few people who are interested in making progress here and a small
> number of others who are pushing back and putting up limitations that
> "the first version can't have X" or "the first version can't have Y".
> 
> I'm all for incremental development, but we need to be thinking about
> the larger picture when we develop features and make sure that we
> don't bake in assumptions that will later become very difficult for us
> to work ourselves out of (especially when it comes to user interface
> and things like GUCs...), but where we decide to draw a line shouldn't
> be based on assumptions about what's going to be difficult and what
> isn't- let's let those who want to work on this capability work on it,
> and as we see the progress, if there are issues which come up with a
> specific area that seem likely to prove difficult to include, then we
> can consider backing away from that while keeping it in mind while
> doing further development.

I have seen no one present a clear description of how anything beyond
all-cluster encryption would work or be secure.  Wishing that were not
the case doesn't change things.

> In other words, I feel like we're getting trapped here in a
> "requirements definition" phase of a traditional waterfall-style
> development cycle where we have to decide, up front, the EXACT set of
> features and capabilities that we want, and then we are going to
> expect people to develop according to EXACTLY that set, and we'll
> shoot down anything that comes across which is trying to do more or is
> trying to be more flexible in anticipation of capabilities that we
> know we will want down the road.  It's likely clear already but I'll
> say it anyway- I don't think it's a good idea to go down that route.

I will continue to shoot down whatever I think has no reasonable chance
of working.  I can just let it go and watch it fail, but I don't see
that as a good approach.  I will state what I have already told some
people privately: for this feature, we have many people understanding
40% of the problem but thinking they understand 90%.

I do agree we should plan for our eventual full feature set, but I
can't figure out what that feature set looks like beyond full-cluster
encryption, and no one is addressing my concerns to move that forward.
Vague complaints that they don't like the process are not changing
that.

-- 
  Bruce Momjian  <br...@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +