Dan,
first of all, thanks for reading through my long post and providing your comments and advice.

You are 100% correct about TDE column encryption in Oracle; I looked it up again in 'Introduction to Transparent Data Encryption' in the 'Database Advanced Security Guide' (https://docs.oracle.com/database/121/ASOAG/asotrans.htm#ASOAG10117), and Figure 2-1 clearly shows the keys being stored in the database. With this piece of information, it doesn't seem to me that Oracle column TDE offers much protection against an active attacker who has full access to the DB server, since there must be a process by which the database engine retrieves the decryption key for a given column.


Another interesting piece of information in that chapter is this sentence:

TDE tablespace encryption also allows index range scans on data in encrypted tablespaces. This is not possible with TDE column encryption.

which makes me think that TDE column encryption must encrypt the data before placing it into the B-tree, and therefore cannot use the B-tree for range scans.

I think the main reason an organization would want one or the other type of encryption (client-side vs server-side) is the kind of attack they are trying to prevent (and the criteria are often dictated by internal security policies):
- with server-side encryption, the encrypted data is protected against a disk being lost (the so-called 'encryption at rest'), but it is not protected against an active attacker with full access to the server (they could retrieve the key and then decrypt the data).
- with client-side encryption, the server has no way to decrypt the data, so even the active attacker above wouldn't be able to do much with the encrypted data. As I mentioned in my previous post, this is similar to what HDFS does for transparent data encryption, and I think it's one of their selling points ('not even root can decrypt the data on HDFS'), which may sound attractive to some IT security groups. A minimal sketch of the client-side model follows.
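
To make that concrete, here is a minimal Python sketch of the client-side model (using the 'cryptography' package; the cell value and key handling are made up for illustration, not a proposal for Kudu's API). The server only ever stores the output of encrypt_cell(), so nothing running on the server can recover the plaintext:

    import os
    from cryptography.hazmat.primitives.ciphers.aead import AESGCM

    # The key lives only on the client; the server never sees it.
    client_key = AESGCM.generate_key(bit_length=256)

    def encrypt_cell(plaintext: bytes) -> bytes:
        nonce = os.urandom(12)                    # unique per cell
        ct = AESGCM(client_key).encrypt(nonce, plaintext, None)
        return nonce + ct                         # the server stores this blob

    def decrypt_cell(stored: bytes) -> bytes:
        nonce, ct = stored[:12], stored[12:]
        return AESGCM(client_key).decrypt(nonce, ct, None)

    # Even 'root' on the server sees only ciphertext.
    stored = encrypt_cell(b"123-45-6789")
    assert decrypt_cell(stored) == b"123-45-6789"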
I also spent some time in the last two days looking at what MySQL/MariaDB and PostgreSQL do, and this is what I found:
- MySQL/MariaDB seem to offer only table-level encryption (https://mariadb.com/kb/en/mariadb/data-at-rest-encryption/), so it is of the server-side type
- PostgreSQL's encryption options (https://www.postgresql.org/docs/9.6/static/encryption-options.html) list the 'pgcrypto' module, which does column-level encryption, but the decryption happens on the server with the key being provided by the client, so it looks like a hybrid between server-side and client-side (see the sketch below).
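
For reference, this is roughly how the pgcrypto model looks from a Python client (a psycopg2 sketch; the connection string and table are hypothetical). Note that the passphrase travels to the server inside every query, which is why I call it a hybrid:

    import psycopg2

    conn = psycopg2.connect("dbname=test")        # hypothetical database
    cur = conn.cursor()
    cur.execute("CREATE EXTENSION IF NOT EXISTS pgcrypto")
    cur.execute("CREATE TABLE IF NOT EXISTS t (id serial, secret bytea)")

    passphrase = "client-held secret"

    # Encryption and decryption both run on the server, but only with
    # the key that the client sends along with the query.
    cur.execute("INSERT INTO t (secret) VALUES (pgp_sym_encrypt(%s, %s))",
                ("my sensitive value", passphrase))
    cur.execute("SELECT pgp_sym_decrypt(secret, %s) FROM t", (passphrase,))
    print(cur.fetchall())
    conn.commit()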
I 100% agree with the performance concerns that client-side encryption raises (no range scans on the encrypted columns, no compression, RLE, etc.), to the point that last night I wondered whether other people have asked themselves similar questions, and I did find a couple of interesting approaches:
- CryptDB (http://css.csail.mit.edu/cryptdb/ - the main paper is here: http://people.csail.mit.edu/nickolai/papers/raluca-cryptdb.pdf)
- ZeroDB (https://opensource.zerodb.com/)


To be able to do range scans, CryptDB for instance uses 'Order Preserving Encryption', which in theory allows encrypting data in a way that preserves ordering, i.e. Enc(x) < Enc(y) iff x < y (a toy example follows below); however, several later research papers show that Order Preserving Encryption leaks a significant amount of information about the encrypted data and is susceptible to frequency and other kinds of attacks. As you can imagine, there's a lot of academic research actively being done in this field and, even if none of it is ready for prime time, I thought I would share these findings.
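
Here is a toy Python illustration of the order-preserving property (and of the leak); it assumes nothing about CryptDB's actual construction:

    import random

    # Toy order-preserving 'encryption': map plaintexts 0..255 onto a
    # random strictly increasing subset of 0..2**16-1. The seed plays
    # the role of the secret key.
    random.seed(1234)
    table = sorted(random.sample(range(2**16), 256))

    def ope_enc(x: int) -> int:
        return table[x]

    a, b = 17, 200
    assert (ope_enc(a) < ope_enc(b)) == (a < b)   # ordering survives encryption

    # That is exactly the leak: the ciphertexts reveal the full sort
    # order of the column, and equal plaintexts always produce equal
    # ciphertexts, which enables frequency analysis.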
After this long digression (hopefully not too boring), I agree that the way forward would be to start by looking into the encryption of the file store (I think the files are called 'cfiles'; I also saw mentions of some 'delta' files, and I am not sure whether they are written the same way and should be encrypted too), and after that the WALs.


Oh, one last thing; you asked me: 
Could you elaborate on this? As long as we use an external keystore and 
intermediate keys, I don't know how an attacker with access to the on-disk 
files could decrypt them. 
The scenario I was thinking of is an attacker who has full access to the tablet server: they can not only read the on-disk files, but they also know how the tablet server retrieves the intermediate keys from the external keystore, i.e. they are able to 'impersonate' the tablet server engine and request the decryption key from wherever it is stored (see the sketch below).
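
A small Python sketch of what I mean (the Keystore class is a made-up stand-in for an external KMS, not any real API): the wrapped key on disk is safe, but the unwrap path is available to whoever controls the server process.

    import os
    from cryptography.hazmat.primitives.ciphers.aead import AESGCM

    class Keystore:
        """Stand-in for an external keystore holding the master key."""
        def __init__(self):
            self._master = AESGCM.generate_key(bit_length=256)
        def wrap(self, dek: bytes) -> bytes:
            nonce = os.urandom(12)
            return nonce + AESGCM(self._master).encrypt(nonce, dek, None)
        def unwrap(self, wrapped: bytes) -> bytes:
            return AESGCM(self._master).decrypt(wrapped[:12], wrapped[12:], None)

    ks = Keystore()
    dek = AESGCM.generate_key(bit_length=256)     # per-file data key
    wrapped = ks.wrap(dek)                        # only this ever hits the disk

    # The tablet server has to be able to do this to serve scans, so an
    # attacker who controls the server can issue the very same call:
    assert ks.unwrap(wrapped) == dek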
Franco 


----- Original Message -----

From: "Dan Burkert" <[email protected]> 
To: [email protected] 
Sent: Tuesday, May 2, 2017 2:54:26 PM 
Subject: Re: Data encryption in Kudu 

Hi Franco, 

Thanks for the writeup! I'm not an Oracle expert, but my interpretation of the TDE column-level encryption documentation/implementation is very different from yours. As far as I can tell, in both the per-column and tablespace encryption modes, encryption/decryption is handled entirely on the Oracle server. The difference is that column-level encryption will encrypt individual cells on disk (leaving the overall tree/index structure unencrypted), while tablespace-level encryption will encrypt at the block or file level.

I agree with everything you wrote about the tradeoffs involved with client vs server encryption, but I think you are underestimating both the complexity involved with client-side encryption, as well as the performance hit that it would impose. The loss of encoding, compression, and range predicate pushdown would absolutely kill performance for many important use cases. The implementation would also be significantly _more_ difficult than server-side encryption, because the client would need to manage the encryption keys and encrypt/decrypt data, and the solution would need to be implemented for every client library (of which there are currently two).

For those reasons, I think server-side encryption is the way to go with Kudu. I think you're right that it would slot in as an additional step in the encode -> compress -> encrypt pipeline for blocks. Because blocks are relatively large (typically > 1 MiB), the overhead of a 16-byte salt and an additional MAC is negligible, so we wouldn't need to force the user to make that tradeoff. Basically, we could get all of the advantages that Oracle's tablespace-level encryption provides, but on a per-column basis. There are a couple of additional complications - we also have a WAL that lives outside of our file block abstraction, and we would almost certainly need to provide encryption for that as well (but perhaps it could be a second step in the process).
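
A rough sketch of that block pipeline (in Python, with zlib standing in for Kudu's codecs and a fixed key standing in for real key management), just to make the overhead claim concrete:

    import os, zlib
    from cryptography.hazmat.primitives.ciphers.aead import AESGCM

    key = AESGCM.generate_key(bit_length=256)     # placeholder key management

    def write_block(encoded: bytes) -> bytes:
        compressed = zlib.compress(encoded)       # compress first...
        nonce = os.urandom(12)
        # ...then encrypt; AES-GCM appends a 16-byte tag (the MAC).
        return nonce + AESGCM(key).encrypt(nonce, compressed, None)

    def read_block(stored: bytes) -> bytes:
        compressed = AESGCM(key).decrypt(stored[:12], stored[12:], None)
        return zlib.decompress(compressed)

    block = os.urandom(1024) * 1024               # ~1 MiB of block data
    stored = write_block(block)
    assert read_block(stored) == block
    # The fixed overhead is 12 (nonce) + 16 (tag) bytes per block: noise
    # next to a block of more than 1 MiB.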

In-line responses to some other comments below. 

On Sat, Apr 29, 2017 at 8:35 PM, Franco Venturi < [email protected] > wrote: 

- also from the security point of view, since the encryption happens at the client side, the data that is transferred on the network between the client and the server is already encrypted, and there's no need (at least from this point of view) to add a layer of encryption between client and server
I'm skeptical of this. For instance, every scan request includes the names and types of the columns that the client wishes to scan, and those would be in plaintext without wire encryption. That would be an issue for some use cases.

- from the security point of view, an attacker with full access to the server 
would probably be able to decrypt the encrypted data 
Could you elaborate on this? As long as we use an external keystore and 
intermediate keys, I don't know how an attacker with access to the on-disk 
files could decrypt them. 

- also from a security point of view, the server returns the data back in plaintext; if the data transferred over the network contains sensitive information, it would need an extra encryption layer like TLS or something like that
Correct, and Kudu 1.3 includes TLS wire encryption for exactly this reason. 

- as for the performance implications, if the encryption on the server side uses something like AES-192 or AES-256, there are libraries like libcrypto that take advantage of the hardware acceleration for AES encryption on many modern CPUs, and therefore I suspect the performance overhead would be limited; this is also indicated by what the Oracle documentation says regarding processing overhead in the case of tablespace encryption in TDE
I agree, I think the overhead of per-block encryption would be pretty minimal. 
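
A quick way to sanity-check that claim on a given box (a Python sketch; the 'cryptography' package sits on top of OpenSSL, which uses AES-NI where available):

    import os, time
    from cryptography.hazmat.primitives.ciphers.aead import AESGCM

    aes = AESGCM(AESGCM.generate_key(bit_length=256))
    block = os.urandom(1 << 20)                   # one 1 MiB block
    nonce = os.urandom(12)                        # reuse is OK in a benchmark only

    start = time.perf_counter()
    for _ in range(100):
        aes.encrypt(nonce, block, None)
    elapsed = time.perf_counter() - start
    print("%.0f MiB/s" % (100 / elapsed))         # typically GiB/s with AES-NI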

- it would also require a way to have the server manage these column encryption keys (possibly through additional client APIs); I haven't looked yet at the way Oracle handles encryption/decryption keys for tablespace-encryption TDE, but it's on my 'to-do' list
Yah, the normal thing to do here is call out to an external keystore that holds 
a master encryption key. 
- Dan 
From: [email protected] 
To: [email protected] 
Sent: Wednesday, April 26, 2017 9:48:07 PM 

Subject: Re: Data encryption in Kudu 

David, Dan, Todd, 
thanks for your prompt replies. 

At this stage I am just exploring what it would take to implement some sort of 
data encryption in Kudu. 

After reading your comments here are some further thoughts: 

- according to the first sentence in this paragraph in the Kudu docs (https://kudu.apache.org/docs/schema_design.html#compression):

Kudu allows per-column compression using the LZ4, Snappy, or zlib compression codecs.

it should be possible to perform per-column encryption by adding 'encryption codecs' right after the compression codecs. I browsed through the code quickly and I think this is done when reading/writing a 'cfile' (please correct me if I am wrong). If this is correct, this change could be 'minimally invasive' (at least for the 'cfile' part) and would not require a major overhaul of the Kudu architecture.

- as for the key management aspect, I am not a security expert at all, so I am not sure what the best approach would be. My thought is that in most places Kudu is deployed together with HDFS, so it would be 'desirable' if the key management were consistent between the two services; on the other hand, I also realize that the basic premises are fundamentally different: HDFS encrypts everything at the client level, and therefore the HDFS engine itself is almost completely unaware that the data it stores is actually encrypted (except for a special hidden file attribute, if I understand correctly), while in Kudu the storage engine must have both the 'public' key (when encrypting) and the 'private' key (when decrypting); otherwise it can't take advantage of knowing the 'structure' of the data (for instance, the Bloom filters probably wouldn't work with the key being encrypted). This means, for instance, that an attacker who is able to gain access to the Kudu tablet servers would probably be able to decrypt the data. Also, one way to achieve something similar to what HDFS does (i.e. client-based encryption and data encrypted in flight) could perhaps be to use a one-time client certificate generated by the KMS server, but this would also require changes to the client code.

Franco 
From: "Todd Lipcon" < [email protected] > 
To: [email protected] 
Sent: Tuesday, April 25, 2017 3:49:50 PM 
Subject: Re: Data encryption in Kudu 

Agreed with what Dan said. 

I think there are a number of interesting design alternatives to be considered, 
so before coding it would be great to work through a design document to explore 
the alternatives. For example, we could try to apply encryption at the 'fs/' 
layer, which would cover all non-WAL data, but then we would lose the ability 
to specify encryption on a per-column basis. There are other requirements that 
need to be ironed out about whether we'd need to support separate encryption 
keys per column/table/server/etc, whether metadata also needs to be encrypted, 
etc. 

-Todd 

On Tue, Apr 25, 2017 at 10:38 AM, Dan Burkert < [email protected] > wrote: 

Hi Franco, 

I think you are right that a client-based approach wouldn't work, because we wouldn't want to encrypt at the level of individual cell values. That would get in the way of encoding, compression, predicate evaluation, etc. As you note, adding encryption at the block layer is probably the way to go. Key management is definitely the tricky issue. We do have one advantage over HDFS - because Kudu does logical replication, the encryption key can be scoped to a particular tablet server or tablet replica; it wouldn't need to be shared among all replicas. I haven't done enough research to know if this makes it fundamentally easier to do key management. I would assume at a minimum we would want to integrate with key providers such as an HSM. It would be good to have a thorough review of existing solutions in the space, such as TDE and the Hadoop KMS. Is this something you are interested in working on?

- Dan 

On Tue, Apr 25, 2017 at 8:30 AM, David Alves < [email protected] > wrote: 

Hi Franco 

Dan, Alexey, Todd are our security experts. 
Folks, thoughts on this? 

Best 
David 

On Mon, Apr 24, 2017 at 7:08 PM, < [email protected] > wrote: 

Over the weekend I started looking at what it would take to add data encryption 
to Kudu (besides using filesystem encryption via dm-crypt or something like 
that). 

Here are a few notes - please feel free to comment on them and add suggestions: 

- reading through this mailing list, it looks like this feature has been asked about a couple of times in the last year, but from what I can tell, no one is currently working on it.
- a client-based approach to encryption like the one used by HDFS wouldn't work (at least out of the box) because, for instance, encrypting the primary key at the client would prevent range filters for scans; it might work for the columns that are not part of the primary key
- there's already code in Kudu for several compression codecs (LZ4, gzip, etc); 
I thought it would be possible to add similar code for encryption codecs (to be 
applied after the compression, of course) 
- the WAL log files and delta files should be similarly encrypted too 
- I'm not sure what would be the best way to manage the keys - I see that in HDFS they use a double-key mechanism, where the encryption key for the data file is itself encrypted with the authorized user's key, and this whole process is managed by an external Key Management Service

Thanks in advance for your ideas and suggestions, 
Franco 

-- 
Todd Lipcon 
Software Engineer, Cloudera 