Agreed with what Dan said. I think there are a number of interesting design alternatives to consider, so before coding it would be great to work through a design document that explores them. For example, we could try to apply encryption at the 'fs/' layer, which would cover all non-WAL data, but then we would lose the ability to specify encryption on a per-column basis. There are other requirements to iron out as well, such as whether we'd need to support separate encryption keys per column/table/server/etc. and whether metadata also needs to be encrypted.
-Todd

On Tue, Apr 25, 2017 at 10:38 AM, Dan Burkert <[email protected]> wrote:

> Hi Franco,
>
> I think you are right that a client-based approach wouldn't work, because
> we wouldn't want to encrypt at the level of individual cell values. That
> would get in the way of encoding, compression, predicate evaluation, etc.
> As you note, adding encryption at the block layer is probably the way to
> go. Key management is definitely the tricky issue. We do have one
> advantage over HDFS - because Kudu does logical replication, the encryption
> key can be scoped to a particular tablet server or tablet replica; it
> wouldn't need to be shared among all replicas. I haven't done enough
> research to know if this makes it fundamentally easier to do key
> management. I would assume at a minimum we would want to integrate with
> key providers such as an HSM. It would be good to have a thorough review of
> existing solutions in the space, such as TDE
> <https://en.wikipedia.org/wiki/Transparent_Data_Encryption> and the
> Hadoop KMS. Is this something you are interested in working on?
>
> - Dan
>
> On Tue, Apr 25, 2017 at 8:30 AM, David Alves <[email protected]>
> wrote:
>
>> Hi Franco
>>
>> Dan, Alexey, Todd are our security experts.
>> Folks, thoughts on this?
>>
>> Best
>> David
>>
>> On Mon, Apr 24, 2017 at 7:08 PM, <[email protected]> wrote:
>>
>>> Over the weekend I started looking at what it would take to add data
>>> encryption to Kudu (besides using filesystem encryption via dm-crypt or
>>> something like that).
>>>
>>> Here are a few notes - please feel free to comment on them and add
>>> suggestions:
>>>
>>> - reading through this mailing list, it looks like this feature has been
>>> asked for a couple of times last year, but from what I can tell, no one is
>>> currently working on it.
>>> - a client-based approach to encryption like the one used by HDFS
>>> wouldn't work (at least out of the box) because for instance encrypting the
>>> primary key at the client would prevent being able to have range filters
>>> for scans; it might work for the columns that are not part of the primary
>>> key
>>> - there's already code in Kudu for several compression codecs (LZ4,
>>> gzip, etc); I thought it would be possible to add similar code for
>>> encryption codecs (to be applied after the compression, of course)
>>> - the WAL log files and delta files should be similarly encrypted too
>>> - not sure what would be the best way to manage the key - I see that in
>>> HDFS they use a double key mechanism, where the encryption key for the data
>>> file is itself encrypted with the allowed user key and this whole process
>>> is managed by an external Key Management Service
>>>
>>> Thanks in advance for your ideas and suggestions,
>>> Franco

--
Todd Lipcon
Software Engineer, Cloudera
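Franco's idea of an encryption codec applied after compression can be illustrated with a minimal Python sketch. This is not Kudu code: `encode_block`, `decode_block`, `encode_block_wrong_order`, and `toy_keystream_xor` are hypothetical names, and the SHA-256 counter-mode keystream is a toy stand-in for a real cipher such as AES-CTR, not something to protect data with. The point it shows is why the order matters: ciphertext looks random, so compressing after encrypting saves essentially nothing.

```python
import hashlib
import os
import zlib


def toy_keystream_xor(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """XOR data with a keystream built from SHA-256(key || nonce || counter).

    ILLUSTRATION ONLY: a stand-in for a real cipher such as AES-CTR.
    Applying it twice with the same key and nonce returns the input.
    """
    out = bytearray()
    for offset in range(0, len(data), 32):
        counter = (offset // 32).to_bytes(8, "big")
        pad = hashlib.sha256(key + nonce + counter).digest()
        chunk = data[offset:offset + 32]
        out.extend(b ^ p for b, p in zip(chunk, pad))
    return bytes(out)


def encode_block(key: bytes, plaintext: bytes) -> bytes:
    """Hypothetical block codec: compress first, then encrypt.

    A fresh 16-byte nonce is prepended so the block can be decoded later.
    """
    nonce = os.urandom(16)
    return nonce + toy_keystream_xor(key, nonce, zlib.compress(plaintext))


def decode_block(key: bytes, block: bytes) -> bytes:
    """Reverse encode_block: decrypt, then decompress."""
    nonce, body = block[:16], block[16:]
    return zlib.decompress(toy_keystream_xor(key, nonce, body))


def encode_block_wrong_order(key: bytes, plaintext: bytes) -> bytes:
    """Encrypt first, then compress; included only to compare sizes."""
    nonce = os.urandom(16)
    return nonce + zlib.compress(toy_keystream_xor(key, nonce, plaintext))


if __name__ == "__main__":
    key = os.urandom(32)
    # Repetitive data, loosely resembling a block of sorted row keys.
    data = b"".join(b"tablet-row-%06d," % i for i in range(2000))
    good = encode_block(key, data)
    bad = encode_block_wrong_order(key, data)
    assert decode_block(key, good) == data
    print(f"plaintext:             {len(data)} bytes")
    print(f"compress then encrypt: {len(good)} bytes")
    print(f"encrypt then compress: {len(bad)} bytes")
```

Running it should show the compress-then-encrypt block far smaller than the plaintext, while the encrypt-then-compress block ends up slightly larger than the plaintext.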
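The HDFS-style double-key mechanism Franco describes, combined with Dan's observation that a key only needs to be scoped to one tablet server or replica, amounts to envelope encryption. The sketch below is a minimal illustration under stated assumptions: `InMemoryKms`, `ReplicaKeyInfo`, `new_replica_key`, and `rotate_kek` are hypothetical names, the in-memory class stands in for an external KMS or HSM, and the XOR cipher is a toy. A real design would use authenticated key wrapping (for example AES-GCM or AES key wrap) so that unwrapping with the wrong key is detected.

```python
import hashlib
import os
from dataclasses import dataclass


def toy_xor_cipher(key: bytes, nonce: bytes, data: bytes) -> bytes:
    """SHA-256 counter-mode keystream XOR. ILLUSTRATION ONLY, not a real cipher."""
    out = bytearray()
    for offset in range(0, len(data), 32):
        counter = (offset // 32).to_bytes(8, "big")
        pad = hashlib.sha256(key + nonce + counter).digest()
        out.extend(b ^ p for b, p in zip(data[offset:offset + 32], pad))
    return bytes(out)


class InMemoryKms:
    """Hypothetical stand-in for an external KMS/HSM holding key-encryption
    keys (KEKs). KEKs never leave this object; callers only see wrapped DEKs."""

    def __init__(self) -> None:
        self._keks: dict = {}

    def create_kek(self, name: str) -> None:
        self._keks[name] = os.urandom(32)

    def wrap(self, kek_name: str, dek: bytes) -> bytes:
        nonce = os.urandom(16)
        return nonce + toy_xor_cipher(self._keks[kek_name], nonce, dek)

    def unwrap(self, kek_name: str, wrapped: bytes) -> bytes:
        return toy_xor_cipher(self._keks[kek_name], wrapped[:16], wrapped[16:])


@dataclass
class ReplicaKeyInfo:
    """Per-replica key metadata stored alongside the data: only the wrapped DEK."""
    kek_name: str
    wrapped_dek: bytes


def new_replica_key(kms: InMemoryKms, kek_name: str) -> ReplicaKeyInfo:
    # A fresh data-encryption key (DEK) for this replica only; it is never
    # shared with other replicas, since each replica writes its own files.
    dek = os.urandom(32)
    return ReplicaKeyInfo(kek_name, kms.wrap(kek_name, dek))


def rotate_kek(kms: InMemoryKms, info: ReplicaKeyInfo, new_kek_name: str) -> ReplicaKeyInfo:
    """Re-wrap the same DEK under a new KEK; encrypted data blocks are untouched."""
    dek = kms.unwrap(info.kek_name, info.wrapped_dek)
    return ReplicaKeyInfo(new_kek_name, kms.wrap(new_kek_name, dek))


if __name__ == "__main__":
    kms = InMemoryKms()
    kms.create_kek("server-1-v1")
    replica = new_replica_key(kms, "server-1-v1")

    # Encrypt a data block with the replica's DEK.
    dek = kms.unwrap(replica.kek_name, replica.wrapped_dek)
    nonce = os.urandom(16)
    block = nonce + toy_xor_cipher(dek, nonce, b"compressed column block")

    # Rotate the KEK; the existing block still decrypts with the same DEK.
    kms.create_kek("server-1-v2")
    rotated = rotate_kek(kms, replica, "server-1-v2")
    dek2 = kms.unwrap(rotated.kek_name, rotated.wrapped_dek)
    assert toy_xor_cipher(dek2, block[:16], block[16:]) == b"compressed column block"
    print("KEK rotated without re-encrypting data")
```

The usual motivation for the two-level split is that rotating or revoking the master key only requires re-wrapping each small DEK, not re-encrypting the data itself.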
