On Fri, May 5, 2017 at 4:54 PM, Dan Burkert <[email protected]> wrote:
>
> On Tue, May 2, 2017 at 8:38 PM, Franco Venturi <[email protected]> wrote:
>
>> Dan,
>> first of all thanks for reading through my long post and providing your comments and advice.
>>
>> You are 100% correct on the TDE column encryption in Oracle; I looked it up again in 'Introduction to Transparent Data Encryption' in the 'Database Advanced Security Guide' (https://docs.oracle.com/database/121/ASOAG/asotrans.htm#ASOAG10117), and Figure 2-1 clearly shows the keys being stored in the database.
>> With this piece of information, it doesn't seem to me that Oracle column TDE offers much protection against an active attacker who has full access to the DB server, since there must be a process somewhere by which the database engine retrieves the decryption key for a given column.
>
> Yes, but this could be in a hardware HSM.
>
>> Another interesting piece of information in that chapter is this sentence:
>>
>>     TDE tablespace encryption also allows index range scans on data in encrypted tablespaces. This is not possible with TDE column encryption.
>>
>> which makes me think that TDE column encryption must encrypt the data before placing it into the Btree, and is therefore not able to use the Btree for range searches.
>
> That's my interpretation as well.
>
>> I think the main reason why an organization would want one or the other type of encryption (client-side vs. server-side) is what kind of attack they are trying to prevent (and the criteria are often dictated by internal security policies):
>> - with server-side encryption, the encrypted data is protected against a disk being lost (the so-called 'encryption at rest'), but it is not protected against an active attacker with full access to the server (they could retrieve the key and then decrypt the data).
>> - with client-side encryption, the server has no way to decrypt the data, so even the active attacker above wouldn't be able to do much with the encrypted data. As I mentioned in my previous post, this is similar to what HDFS does for transparent data encryption, and I think it's one of their selling points ('not even root can decrypt the data on HDFS'); for some IT security groups this may sound attractive.
>
> Root privileges on a machine don't necessarily guarantee access to the key; the key could be stored remotely, or even on an HSM.
>
>> 100% agree with your performance concerns about client-side encryption (no range scans on the encrypted columns, no compression, RLE, etc.), to the point that last night I wondered whether other people have asked themselves similar questions, and I did find a couple of interesting approaches:
>> - CryptDB (http://css.csail.mit.edu/cryptdb/ - the main paper is here: http://people.csail.mit.edu/nickolai/papers/raluca-cryptdb.pdf)
>> - ZeroDB (https://opensource.zerodb.com/)
>>
>> In order to be able to do range scans, CryptDB for instance uses 'Order Preserving Encryption', which in theory allows data to be encrypted in a way that preserves ordering, i.e. Enc(x) < Enc(y) iff x < y; however, several later research papers show that Order Preserving Encryption leaks a significant amount of information about the encrypted data and is susceptible to frequency and other kinds of attacks.
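(To make the order-preserving property concrete, here is a toy Python sketch -- deliberately insecure, stateful, and not CryptDB's actual OPE construction -- of a mapping where Enc(x) < Enc(y) iff x < y. It shows why a server can evaluate range predicates on ciphertext alone, and also why such a scheme inherently leaks the ordering of the plaintexts.)

    import os
    import random

    # Toy "order-preserving encryption": assign each possible plaintext a strictly
    # increasing random code. Purely illustrative -- real OPE schemes are stateless
    # and key-driven; this toy table is neither, and it leaks exactly what the
    # papers criticise: the relative order (and, for repeated values, the
    # frequency) of the plaintexts.
    def build_toy_ope_table(domain, seed):
        rng = random.Random(seed)         # the "key" here is just an RNG seed
        table, code = {}, 0
        for value in sorted(domain):
            code += rng.randint(1, 1000)  # strictly positive gap keeps the order
            table[value] = code
        return table

    enc = build_toy_ope_table(range(100), seed=os.urandom(16))

    # Order is preserved: Enc(x) < Enc(y) iff x < y ...
    assert all(enc[x] < enc[y] for x in range(99) for y in range(x + 1, 100))

    # ... so a server that only ever sees ciphertexts can still answer a range scan:
    lo, hi = enc[10], enc[20]
    print(sum(lo <= c <= hi for c in enc.values()))  # 11 matching rows, no decryption needed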
>> As you can imagine, there's a lot of academic research actively being done in this field and, even if it's not ready for prime time, I thought I would share these findings.
>
> That's really interesting. Pretty different threat model being assumed by ZeroDB :).
>
>> After this long digression (hopefully not too boring), I agree that the way forward would be to start by looking into encryption of the file store (I think the files are called 'cfiles'; I also saw mentions of some 'delta' files, and I am not sure whether they are written the same way and should be encrypted too), and after that the WALs.
>
> Yah, I think cfiles are a good place to start. AFAIK delta files reuse the cfile machinery when writing to disk. I originally considered recommending looking at the filesystem block manager, but we often do offset lookups into the FS blocks, which I don't think could be supported with encryption.

I think it could be -- if you use CTR mode for encryption, you can support random access, right? However, I do think it makes sense to consider column-level encryption keys/policies, in which case it may be easier to do at a higher level. Though it may be possible for the higher level to just pass down a key ID into the FS layer when writing a file, so that the policy can be set at a high level while the implementation is done at a lower one.

> - Dan
>
> ------------------------------
>
>> *From: *"Dan Burkert" <[email protected]>
>> *To: *[email protected]
>> *Sent: *Tuesday, May 2, 2017 2:54:26 PM
>> *Subject: *Re: Data encryption in Kudu
>>
>> Hi Franco,
>>
>> Thanks for the writeup! I'm not an Oracle expert, but my interpretation of the TDE column-level encryption documentation/implementation is very different from yours. As far as I can tell, in both the per-column and tablespace encryption modes, encryption/decryption is handled entirely on the Oracle server. The difference is that column-level encryption encrypts individual cells on disk (leaving the overall tree/index structure unencrypted), while tablespace-level encryption encrypts at the block or file level.
>>
>> I agree with everything you wrote about the tradeoffs involved in client- vs. server-side encryption, but I think you are underestimating both the complexity of client-side encryption and the performance hit that it would impose. The loss of encoding, compression, and range predicate pushdown would absolutely kill performance for many important use cases. The implementation would also be significantly _more_ difficult than server-side encryption, because the client would need to manage the encryption keys and encrypt/decrypt data, and the solution would need to be implemented for every client library (of which there are currently two).
>>
>> For those reasons, I think server-side encryption is the way to go for Kudu. I think you're right that it would slot in as an additional step in the encode -> compress -> encrypt pipeline for blocks. Because blocks are relatively large (typically > 1 MiB), the overhead of a 16-byte salt and an additional MAC is negligible, so we wouldn't need to force the user to make that tradeoff. Basically, we could get all of the advantages that Oracle's tablespace-level encryption provides, but on a per-column basis.
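(A minimal Python sketch of what that per-block encrypt step could look like, assuming AES-256-GCM via the pyca/cryptography package and using zlib as a stand-in for the real encoding/compression codecs; the names and block layout are illustrative, not Kudu's actual cfile format. Here GCM's 12-byte nonce plays the role of the salt and its 16-byte tag is the MAC, so the per-block overhead works out to 28 bytes.)

    import os
    import zlib
    from cryptography.hazmat.primitives.ciphers.aead import AESGCM

    def write_block(plain_block: bytes, key: bytes) -> bytes:
        """Compress, then encrypt one block; returns nonce || ciphertext+tag."""
        compressed = zlib.compress(plain_block)        # stand-in for the LZ4/Snappy/zlib codec
        nonce = os.urandom(12)                         # fresh random nonce per block
        return nonce + AESGCM(key).encrypt(nonce, compressed, None)  # GCM appends a 16-byte MAC

    def read_block(stored: bytes, key: bytes) -> bytes:
        nonce, ciphertext = stored[:12], stored[12:]
        compressed = AESGCM(key).decrypt(nonce, ciphertext, None)    # raises if the MAC check fails
        return zlib.decompress(compressed)

    key = AESGCM.generate_key(bit_length=256)
    block = os.urandom(1024) * 1024                    # ~1 MiB of data, like a typical cfile block
    stored = write_block(block, key)

    print(len(stored) - len(zlib.compress(block)))     # 28 bytes of overhead, negligible at this size
    assert read_block(stored, key) == block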
>> There are a couple of additional complications - we also have a WAL that lives outside of our file block abstraction, and we would almost certainly need to provide encryption for that as well (but perhaps it could be a second step in the process).
>>
>> In-line responses to some other comments below.
>>
>> On Sat, Apr 29, 2017 at 8:35 PM, Franco Venturi <[email protected]> wrote:
>>
>>> - also from the security point of view, since the encryption happens at the client side, the data that is transferred on the network between the client and the server is already encrypted, and there's no need (at least from this point of view) to add a layer of encryption between client and server
>>
>> I'm skeptical of this. For instance, every scan request includes the names and types of the columns that the client wishes to scan, and those would be in plaintext without wire encryption. That would be an issue for some use cases.
>>
>>> - from the security point of view, an attacker with full access to the server would probably be able to decrypt the encrypted data
>>
>> Could you elaborate on this? As long as we use an external keystore and intermediate keys, I don't know how an attacker with access to the on-disk files could decrypt them.
>>
>>> - also from a security point of view, the server returns the data in plaintext format; if the data transferred over the network contains sensitive information, it would need an extra encryption layer like TLS or something like that
>>
>> Correct, and Kudu 1.3 includes TLS wire encryption for exactly this reason.
>>
>>> - as for performance implications, if the encryption on the server side uses something like AES192 or AES256, there are libraries like libcrypto that take advantage of the hardware acceleration for AES encryption on many modern CPUs, so I suspect the performance overhead would be limited; this is also indicated by what the Oracle documentation says regarding processing overhead in the case of tablespace encryption in TDE
>>
>> I agree, I think the overhead of per-block encryption would be pretty minimal.
>>
>>> - it would also require a way to have the server manage these column encryption keys (possibly through additional client APIs); I haven't looked yet at the way Oracle handles encryption/decryption keys for tablespace encryption TDE, but it's on my 'to-do' list
>>
>> Yah, the normal thing to do here is call out to an external keystore that holds a master encryption key.
>>
>> - Dan
>>
>> ------------------------------
>>> *From: *[email protected]
>>> *To: *[email protected]
>>> *Sent: *Wednesday, April 26, 2017 9:48:07 PM
>>> *Subject: *Re: Data encryption in Kudu
>>>
>>> David, Dan, Todd,
>>> thanks for your prompt replies.
>>>
>>> At this stage I am just exploring what it would take to implement some sort of data encryption in Kudu.
>>>
>>> After reading your comments, here are some further thoughts:
>>>
>>> - according to the first sentence in this paragraph of the Kudu docs (https://kudu.apache.org/docs/schema_design.html#compression):
>>>
>>>     Kudu allows per-column compression using the LZ4, Snappy, or zlib compression codecs.
>>>
>>> it should be possible to perform per-column encryption by adding 'encryption codecs' right after the compression codecs.
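(Picking up Todd's CTR-mode point from above: the reason CTR can support the block manager's offset lookups is that the keystream for any 16-byte AES block can be computed independently by advancing the counter. A minimal Python sketch, purely illustrative -- the IV/counter layout is an assumption, not an existing Kudu or libcrypto format, and note that CTR alone provides no integrity, so a MAC would still be needed.)

    import os
    from cryptography.hazmat.primitives.ciphers import Cipher, algorithms, modes

    AES_BLOCK = 16

    def ctr_encrypt(key: bytes, iv: bytes, plaintext: bytes) -> bytes:
        enc = Cipher(algorithms.AES(key), modes.CTR(iv)).encryptor()
        return enc.update(plaintext) + enc.finalize()

    def ctr_read_at(key: bytes, iv: bytes, ciphertext: bytes, offset: int, length: int) -> bytes:
        """Decrypt [offset, offset+length) without touching any preceding bytes."""
        first_block = offset // AES_BLOCK
        # Advance the 128-bit big-endian counter to the first AES block we need.
        counter = (int.from_bytes(iv, "big") + first_block) % (1 << 128)
        dec = Cipher(algorithms.AES(key), modes.CTR(counter.to_bytes(16, "big"))).decryptor()
        chunk = ciphertext[first_block * AES_BLOCK : offset + length]
        skip = offset - first_block * AES_BLOCK        # drop the partial leading block, if any
        return (dec.update(chunk) + dec.finalize())[skip:]

    key, iv = os.urandom(32), os.urandom(16)
    data = os.urandom(1 << 20)                         # pretend this is a 1 MiB FS block
    ct = ctr_encrypt(key, iv, data)

    # Random access: read 100 bytes starting at an arbitrary, unaligned offset.
    assert ctr_read_at(key, iv, ct, 123457, 100) == data[123457:123457 + 100]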
>>> I browsed through the code quickly and I think this is done when reading/writing a 'cfile' (please correct me if I am wrong). If this is correct, this change could be 'minimally invasive' (at least for the 'cfile' part) and would not require a major overhaul of the Kudu architecture.
>>>
>>> - as for the key management aspect, I am not a security expert at all, so I am not sure what the best approach would be here - my thought is that in most places Kudu is deployed together with HDFS, so it would be 'desirable' if the key management were consistent between the two services; on the other hand, I also realize that the basic premises are fundamentally different: HDFS encrypts everything at the client level, and therefore the HDFS engine itself is almost completely unaware that the data it stores is actually encrypted (except for a special hidden file attribute, if I understand correctly), while in Kudu the storage engine must have both the 'public' key (when encrypting) and the 'private' key (when decrypting), otherwise it can't take advantage of knowing the 'structure' of the data (for instance, the Bloom filters probably wouldn't work with the key being encrypted). This means, for instance, that an attacker who is able to gain access to the Kudu tablet servers would probably be able to decrypt the data. Also, one way to achieve something similar to what HDFS does (i.e. client-based encryption and data encrypted in flight) could perhaps be using a one-time client certificate generated by the KMS server, but this would also require changes to the client code.
>>>
>>> Franco
>>>
>>> ------------------------------
>>> *From: *"Todd Lipcon" <[email protected]>
>>> *To: *[email protected]
>>> *Sent: *Tuesday, April 25, 2017 3:49:50 PM
>>> *Subject: *Re: Data encryption in Kudu
>>>
>>> Agreed with what Dan said.
>>>
>>> I think there are a number of interesting design alternatives to be considered, so before coding it would be great to work through a design document to explore the alternatives. For example, we could try to apply encryption at the 'fs/' layer, which would cover all non-WAL data, but then we would lose the ability to specify encryption on a per-column basis. There are other requirements that need to be ironed out, such as whether we'd need to support separate encryption keys per column/table/server/etc., and whether metadata also needs to be encrypted.
>>>
>>> -Todd
>>>
>>> On Tue, Apr 25, 2017 at 10:38 AM, Dan Burkert <[email protected]> wrote:
>>>
>>>> Hi Franco,
>>>>
>>>> I think you are right that a client-based approach wouldn't work, because we wouldn't want to encrypt at the level of individual cell values. That would get in the way of encoding, compression, predicate evaluation, etc. As you note, adding encryption at the block layer is probably the way to go. Key management is definitely the tricky issue. We do have one advantage over HDFS - because Kudu does logical replication, the encryption key can be scoped to a particular tablet server or tablet replica, so it wouldn't need to be shared among all replicas. I haven't done enough research to know whether this makes it fundamentally easier to do key management. I would assume at a minimum we would want to integrate with key providers such as an HSM.
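(One way that per-replica scoping could avoid the keystore having to track a separate key for every tablet is to derive tablet-specific keys from a single server key with a KDF such as HKDF. A minimal Python sketch; the derivation labels and the sample tablet IDs are assumptions for illustration, not a proposed Kudu design.)

    import os
    from cryptography.hazmat.primitives import hashes
    from cryptography.hazmat.primitives.kdf.hkdf import HKDF

    def tablet_key(server_key: bytes, tablet_id: str) -> bytes:
        """Derive a 256-bit data key unique to one tablet replica from the server's key."""
        return HKDF(
            algorithm=hashes.SHA256(),
            length=32,
            salt=None,
            info=b"tablet-data-key:" + tablet_id.encode(),   # hypothetical label
        ).derive(server_key)

    # The server key itself would come from (and ideally never leave) the external
    # keystore / HSM; os.urandom merely stands in for that here.
    server_key = os.urandom(32)

    k1 = tablet_key(server_key, "4a6f2b1c9e8d4f00")
    k2 = tablet_key(server_key, "b3d1c0ffee000001")
    assert k1 != k2                                           # each replica gets its own key
    assert k1 == tablet_key(server_key, "4a6f2b1c9e8d4f00")   # derivation is deterministic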
>>>> It would be good to have a thorough review of existing solutions in this space, such as TDE <https://en.wikipedia.org/wiki/Transparent_Data_Encryption> and the Hadoop KMS. Is this something you are interested in working on?
>>>>
>>>> - Dan
>>>>
>>>> On Tue, Apr 25, 2017 at 8:30 AM, David Alves <[email protected]> wrote:
>>>>
>>>>> Hi Franco
>>>>>
>>>>> Dan, Alexey, Todd are our security experts.
>>>>> Folks, thoughts on this?
>>>>>
>>>>> Best
>>>>> David
>>>>>
>>>>> On Mon, Apr 24, 2017 at 7:08 PM, <[email protected]> wrote:
>>>>>
>>>>>> Over the weekend I started looking at what it would take to add data encryption to Kudu (besides using filesystem encryption via dm-crypt or something like that).
>>>>>>
>>>>>> Here are a few notes - please feel free to comment on them and add suggestions:
>>>>>>
>>>>>> - reading through this mailing list, it looks like this feature has been asked about a couple of times over the last year, but from what I can tell, no one is currently working on it.
>>>>>> - a client-based approach to encryption like the one used by HDFS wouldn't work (at least out of the box) because, for instance, encrypting the primary key at the client would prevent range filters for scans; it might work for the columns that are not part of the primary key
>>>>>> - there's already code in Kudu for several compression codecs (LZ4, gzip, etc.); I thought it would be possible to add similar code for encryption codecs (to be applied after the compression, of course)
>>>>>> - the WAL log files and delta files should be similarly encrypted too
>>>>>> - I'm not sure what would be the best way to manage the key - I see that in HDFS they use a double key mechanism, where the encryption key for the data file is itself encrypted with the allowed user's key, and this whole process is managed by an external Key Management Service
>>>>>>
>>>>>> Thanks in advance for your ideas and suggestions,
>>>>>> Franco
>>>
>>> --
>>> Todd Lipcon
>>> Software Engineer, Cloudera

--
Todd Lipcon
Software Engineer, Cloudera
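(For completeness, a minimal Python sketch of the double-key 'envelope encryption' mechanism Franco describes above: a fresh data key encrypts the file, only the wrapped (encrypted) data key is stored next to it, and the master key stays in the external KMS. The names and the local kms_wrap/kms_unwrap stand-ins are illustrative; this is the general pattern, not HDFS's or Kudu's actual implementation.)

    import os
    from cryptography.hazmat.primitives.ciphers.aead import AESGCM

    def kms_wrap(master_key: bytes, data_key: bytes) -> bytes:
        """Stand-in for a KMS 'encrypt key' call: wrap the data key under the master key."""
        nonce = os.urandom(12)
        return nonce + AESGCM(master_key).encrypt(nonce, data_key, None)

    def kms_unwrap(master_key: bytes, wrapped: bytes) -> bytes:
        """Stand-in for a KMS 'decrypt key' call."""
        return AESGCM(master_key).decrypt(wrapped[:12], wrapped[12:], None)

    master_key = AESGCM.generate_key(bit_length=256)    # lives only in the KMS / HSM

    # Writing: fresh data key per file; persist the ciphertext plus the *wrapped* data key.
    data_key = AESGCM.generate_key(bit_length=256)
    nonce = os.urandom(12)
    ciphertext = AESGCM(data_key).encrypt(nonce, b"sensitive column data", None)
    wrapped_data_key = kms_wrap(master_key, data_key)

    # Reading: ask the KMS to unwrap the data key, then decrypt locally.
    plaintext = AESGCM(kms_unwrap(master_key, wrapped_data_key)).decrypt(nonce, ciphertext, None)
    assert plaintext == b"sensitive column data"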
