Over the weekend I started looking at what it would take to add data encryption 
to Kudu (besides using filesystem encryption via dm-crypt or something like 
that). 

Here are a few notes - please feel free to comment on them and add suggestions: 

- reading through this mailing list, it looks like this feature has been asked 
about a couple of times in the last year, but from what I can tell, no one is 
currently working on it. 
- a client-based approach to encryption like the one used by HDFS wouldn't work 
(at least out of the box) because, for instance, encrypting the primary key at 
the client would make range filters on scans impossible; it might still work 
for the columns that are not part of the primary key 
- there's already code in Kudu for several compression codecs (LZ4, gzip, etc); 
I thought it would be possible to add similar code for encryption codecs (to be 
applied after the compression, of course) 
- the WAL files and delta files should be encrypted in the same way 
- not sure what would be the best way to manage the keys - I see that HDFS 
uses a two-level key mechanism, where the encryption key for the data file is 
itself encrypted with the allowed user's key, and this whole process is managed 
by an external Key Management Service 
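
To make a few of the points above concrete, here is a rough Python sketch (not 
Kudu code). It assumes a toy SHA-256 keystream cipher as a stand-in for a real 
one such as AES, which is NOT secure, and the names seal, unseal and 
toy_keystream are made up for illustration. It shows why compression has to 
come before encryption, why encrypted primary keys no longer sort like the 
plaintext ones, and what the two-level key scheme looks like in miniature. 

```python
import hashlib
import os
import zlib


def toy_keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    """SHA-256 in counter mode. Illustration only, NOT a secure cipher."""
    out = bytearray()
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(out[:length])


def seal(key: bytes, data: bytes) -> bytes:
    """Encrypt data under key with a fresh random nonce, prepended to the output."""
    nonce = os.urandom(16)
    stream = toy_keystream(key, nonce, len(data))
    return nonce + bytes(a ^ b for a, b in zip(data, stream))


def unseal(key: bytes, blob: bytes) -> bytes:
    """Reverse seal(): split off the nonce and XOR with the same keystream."""
    nonce, body = blob[:16], blob[16:]
    stream = toy_keystream(key, nonce, len(body))
    return bytes(a ^ b for a, b in zip(body, stream))


# A highly repetitive block, standing in for columnar data on disk.
block = b"row-data " * 1000
data_key = os.urandom(32)  # hypothetical per-file data key

# Compress first, then encrypt: the compressor still sees the redundancy.
compress_then_encrypt = seal(data_key, zlib.compress(block))
# Encrypt first, then compress: the ciphertext looks random and barely shrinks.
encrypt_then_compress = zlib.compress(seal(data_key, block))
print(len(block), len(compress_then_encrypt), len(encrypt_then_compress))

# Client-side encryption of primary keys: ciphertext order is unrelated to
# plaintext order, so the server could not evaluate a range filter on them.
primary_keys = [b"key-%04d" % i for i in range(100)]
encrypted_keys = [seal(data_key, k) for k in primary_keys]
order_preserved = encrypted_keys == sorted(encrypted_keys)
print("order preserved:", order_preserved)

# Two-level keys: the data key is stored wrapped under a master key that,
# in a real deployment, would be held by an external key management service.
master_key = os.urandom(32)
wrapped_data_key = seal(master_key, data_key)

# Read path: unwrap the data key, decrypt the block, then decompress it.
unwrapped_key = unseal(master_key, wrapped_data_key)
recovered = zlib.decompress(unseal(unwrapped_key, compress_then_encrypt))
```

Only the wrapped data key would need to be stored next to the file, so 
rotating the master key would mean re-wrapping the data keys rather than 
re-encrypting the data itself. 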

Thanks in advance for your ideas and suggestions, 
Franco 
