David, Dan, Todd, 
thanks for your prompt replies. 

At this stage I am just exploring what it would take to implement some sort of 
data encryption in Kudu. 

After reading your comments here are some further thoughts: 

- according to the first sentence in this paragraph in the Kudu docs ( 
https://kudu.apache.org/docs/schema_design.html#compression ): 

Kudu allows per-column compression using the LZ4, Snappy, or zlib compression 
codecs. 

it should be possible to perform per-column encryption by adding 'encryption 
codecs' right after the compression codecs. I browsed through the code quickly 
and I think this is done when reading/writing a 'cfile' (please correct me if I am 
wrong). If this is correct, this change could be 'minimally invasive' (at least 
for the 'cfile' part) and would not require a major overhaul of the Kudu 
architecture. 

- as for the key management aspect, I am not a security expert at all, so I am 
not sure what would be the best approach here - my thought here is that in most 
places Kudu is deployed together with HDFS, so it would be 'desirable' if the 
key management were consistent between the two services; on the other hand, I 
also realize that the basic premises are fundamentally different: HDFS encrypts 
everything at the client level and therefore the HDFS engine itself is almost 
completely unaware that the data it stores is actually encrypted (except for a 
special file hidden attribute, if I understand correctly), while in Kudu the 
storage engine must have both the 'public' key (when encrypting) and the 
'private' key (when decrypting) otherwise it can't take advantage of knowing 
the 'structure' of the data (for instance the Bloom filters probably wouldn't 
work if the key were encrypted). This means for instance that an attacker 
who is able to gain access to the Kudu tablet servers would probably be able to 
decrypt the data. Also one way to achieve something similar to what HDFS does 
(i.e. client-based encryption and data encrypted in-flight) could be perhaps 
using a one-time client certificate generated by the KMS server, but this would 
also require changes to the client code. 
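
To illustrate the first point above (an encryption step applied after the compression codec), here is a minimal Python sketch. It is not Kudu code: write_block, read_block and the key are hypothetical names, and toy_keystream_xor is an insecure stand-in for a real cipher, used only so the example runs with the standard library. It shows why the order matters: encrypted bytes look random and barely compress, so compression has to come first.

```python
import hashlib
import zlib


def toy_keystream_xor(key: bytes, data: bytes) -> bytes:
    """Stand-in cipher (NOT secure): XOR data with a SHA-256 counter keystream."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    # zip() stops at len(data), so any extra keystream bytes are ignored.
    return bytes(a ^ b for a, b in zip(data, stream))


def write_block(key: bytes, raw: bytes) -> bytes:
    """Hypothetical write path: compress first, then encrypt."""
    return toy_keystream_xor(key, zlib.compress(raw))


def read_block(key: bytes, stored: bytes) -> bytes:
    """Hypothetical read path: decrypt first, then decompress."""
    return zlib.decompress(toy_keystream_xor(key, stored))


key = b"hypothetical-column-key"
raw = b"status=OK;" * 1000  # repetitive column data compresses well

compress_then_encrypt = write_block(key, raw)
encrypt_then_compress = zlib.compress(toy_keystream_xor(key, raw))

# The round trip restores the original bytes.
assert read_block(key, compress_then_encrypt) == raw

# Sizes: original, compress-then-encrypt, encrypt-then-compress.
print(len(raw), len(compress_then_encrypt), len(encrypt_then_compress))
```

In this sketch the compress-then-encrypt block stays small, while the encrypt-then-compress block ends up about as large as the uncompressed input.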

Franco 


----- Original Message -----

From: "Todd Lipcon" <[email protected]> 
To: [email protected] 
Sent: Tuesday, April 25, 2017 3:49:50 PM 
Subject: Re: Data encryption in Kudu 

Agreed with what Dan said. 

I think there are a number of interesting design alternatives to be considered, 
so before coding it would be great to work through a design document to explore 
the alternatives. For example, we could try to apply encryption at the 'fs/' 
layer, which would cover all non-WAL data, but then we would lose the ability 
to specify encryption on a per-column basis. There are other requirements that 
need to be ironed out about whether we'd need to support separate encryption 
keys per column/table/server/etc, whether metadata also needs to be encrypted, 
etc. 

-Todd 

On Tue, Apr 25, 2017 at 10:38 AM, Dan Burkert < [email protected] > wrote: 



Hi Franco, 

I think you are right that a client-based approach wouldn't work, because we 
wouldn't want to encrypt at the level of individual cell values. That would get 
in the way of encoding, compression, predicate evaluation, etc. As you note, 
adding encryption at the block layer is probably the way to go. Key management 
is definitely the tricky issue. We do have one advantage over HDFS - because 
Kudu does logical replication, the encryption key can be scoped to a particular 
tablet server or tablet replica; it wouldn't need to be shared among all 
replicas. I haven't done enough research to know if this makes it fundamentally 
easier to do key management. I would assume at a minimum we would want to 
integrate with key providers such as an HSM. It would be good to have a thorough 
review of existing solutions in the space, such as TDE and the Hadoop KMS. Is 
this something you are interested in working on? 
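
As a rough illustration of why cell-level encryption at the client gets in the way of predicate evaluation, the Python sketch below replaces each primary key with an opaque keyed value and then tries to evaluate a range predicate on those values. It is only a sketch under stated assumptions: opaque_token and the key are hypothetical, and HMAC-SHA256 is used purely as a stand-in for a deterministic ciphertext (it is not reversible encryption).

```python
import hashlib
import hmac

key = b"hypothetical-client-key"


def opaque_token(value: int) -> bytes:
    """Stand-in for a client-side encrypted cell: keyed, deterministic, opaque."""
    return hmac.new(key, value.to_bytes(8, "big"), hashlib.sha256).digest()


primary_keys = list(range(100))

# What a server without the key would see: opaque value -> row.
stored = {opaque_token(k): k for k in primary_keys}

# Plaintext range predicate: 20 <= key < 40.
expected = {k for k in primary_keys if 20 <= k < 40}

# The same predicate evaluated on the opaque values.
lo, hi = opaque_token(20), opaque_token(40)
selected = {k for token, k in stored.items() if lo <= token < hi}

# Does sorting by the opaque value reproduce the plaintext order?
order_preserved = sorted(primary_keys, key=opaque_token) == primary_keys

print("order preserved:", order_preserved)
print("range scan matches plaintext result:", selected == expected)
```

Because the opaque values do not keep the ordering of the original keys, the range scan on them selects a different set of rows than the plaintext predicate would.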

- Dan 

On Tue, Apr 25, 2017 at 8:30 AM, David Alves < [email protected] > wrote: 

<blockquote>

Hi Franco 

Dan, Alexey, Todd are our security experts. 
Folks, thoughts on this? 

Best 
David 

On Mon, Apr 24, 2017 at 7:08 PM, < [email protected] > wrote: 

<blockquote>

Over the weekend I started looking at what it would take to add data encryption 
to Kudu (besides using filesystem encryption via dm-crypt or something like 
that). 

Here are a few notes - please feel free to comment on them and add suggestions: 

- reading through this mailing list, it looks like this feature was asked 
about a couple of times last year, but from what I can tell, no one is currently 
working on it. 
- a client-based approach to encryption like the one used by HDFS wouldn't work 
(at least out of the box) because for instance encrypting the primary key at 
the client would prevent being able to have range filters for scans; it might 
work for the columns that are not part of the primary key 
- there's already code in Kudu for several compression codecs (LZ4, gzip, etc); 
I thought it would be possible to add similar code for encryption codecs (to be 
applied after the compression, of course) 
- the WAL log files and delta files should be similarly encrypted too 
- not sure what would be the best way to manage the key - I see that in HDFS 
they use a double key mechanism, where the encryption key for the data file is 
itself encrypted with the allowed user key and this whole process is managed by 
an external Key Management Service 
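
The double-key mechanism in the last point can be sketched in a few lines of Python. This is only an illustration of the general pattern (a data key that is itself stored encrypted under a second key), not the HDFS or KMS API: the names kek, dek and toy_xor are assumptions made for this example, and toy_xor is an insecure stand-in for a real cipher.

```python
import hashlib
import os


def toy_xor(key: bytes, data: bytes) -> bytes:
    """Stand-in cipher (NOT secure): XOR data with a SHA-256 counter keystream."""
    stream = bytearray()
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


# Key-encryption key: assumed to be held only by the external key service.
kek = os.urandom(32)

# Data-encryption key: generated per data file.
dek = os.urandom(32)

# Write path: the data is encrypted with the DEK, and the DEK is stored
# only in wrapped (encrypted) form next to the data.
wrapped_dek = toy_xor(kek, dek)
ciphertext = toy_xor(dek, b"row data")

# Read path: the key service unwraps the DEK, which then decrypts the data.
unwrapped_dek = toy_xor(kek, wrapped_dek)
plaintext = toy_xor(unwrapped_dek, ciphertext)

assert plaintext == b"row data"
```

One consequence of this layout is that the storage layer only ever persists the wrapped key, so reading the data requires a round trip to whoever holds the second key.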

Thanks in advance for your ideas and suggestions, 
Franco 





</blockquote>



</blockquote>




-- 
Todd Lipcon 
Software Engineer, Cloudera 
