On Tue, May 2, 2017 at 8:38 PM, Franco Venturi <[email protected]> wrote:
> Dan,
> first of all thanks for reading through my long post and providing your
> comments and advice.
>
> You are 100% correct on the TDE column encryption in Oracle; I looked it
> up again in the 'Introduction to Transparent Data Encryption' chapter of
> the 'Database Advanced Security Guide'
> (https://docs.oracle.com/database/121/ASOAG/asotrans.htm#ASOAG10117),
> and Figure 2-1 clearly shows the keys being stored in the database.
> With this piece of information, it doesn't seem to me that Oracle column
> TDE offers much protection in case of an active attacker who has full
> access to the DB server, since there must be a process somewhere by
> which the database engine is able to retrieve the decryption key for a
> given column.

Yes, but this could be in a hardware HSM.

> Another interesting piece of information in that chapter is this
> sentence:
>
>     TDE tablespace encryption also allows index range scans on
>     data in encrypted tablespaces. This is not possible with TDE
>     column encryption.
>
> which makes me think that TDE column encryption must encrypt the data
> before placing it into the Btree, and is therefore not able to use the
> Btree for range searches.

That's my interpretation as well.

> I think the main reason why an organization would want one or the other
> type of encryption (client-side vs server-side) is the kind of possible
> attack they are trying to prevent (and the criteria are often dictated
> by internal security policies):
> - with server-side encryption, the encrypted data is protected against
> a disk being lost (so-called 'encryption at rest'), but it is not
> protected against an active attacker on the server with full access
> (they could retrieve the key and then decrypt the data).
> - with client-side encryption, the server has no way to decrypt the
> data, and therefore even the active attacker above wouldn't be able to
> do much with the encrypted data. As I mentioned in my previous post,
> this is similar to what HDFS does for transparent data encryption and I
> think it's one of their selling points ('not even root can decrypt the
> data on HDFS'), and for some IT security groups this may sound
> attractive.

Root privileges on a machine don't necessarily guarantee access to the
key; the key could be stored remotely, or even on an HSM.

> 100% agree with your performance concerns about client-side encryption
> (no range scans on the encrypted columns, no compression, RLE, etc.),
> to the point that last night I wondered if other people have asked
> themselves similar questions, and I did find a couple of interesting
> approaches:
> - CryptDB (http://css.csail.mit.edu/cryptdb/ - the main paper is here:
> http://people.csail.mit.edu/nickolai/papers/raluca-cryptdb.pdf)
> - ZeroDB (https://opensource.zerodb.com/)
>
> In order to be able to do range scans, CryptDB for instance uses 'Order
> Preserving Encryption', which in theory allows encrypting data in a way
> that preserves ordering, i.e. Enc(x) < Enc(y) iff x < y; however,
> several later research papers show that Order Preserving Encryption
> leaks a significant amount of information about the encrypted data and
> is susceptible to frequency and other kinds of attacks. As you can
> imagine, there's a lot of academic research actively being done in this
> field and, even if it's not ready for prime time, I thought I would
> share these findings.

That's really interesting. Pretty different threat model being assumed by
ZeroDB :).
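To make the leakage concrete, here is a toy sketch of the
order-preserving property - emphatically not CryptDB's actual
construction, just a secret monotone table, so take it as an
illustration only:

    // Toy order-preserving "encryption" (illustration only, NOT secure):
    // map each plaintext to a strictly increasing pseudorandom value, so
    // that Enc(x) < Enc(y) iff x < y.
    #include <cstdint>
    #include <iostream>
    #include <random>

    int main() {
      std::mt19937_64 prng(42);  // the seed plays the role of the secret key
      std::uniform_int_distribution<uint64_t> gap(1, 1000);

      // Build a monotone encryption table for the plaintext domain [0, 16).
      uint64_t enc[16];
      uint64_t acc = 0;
      for (int x = 0; x < 16; ++x) {
        acc += gap(prng);  // strictly positive gap => strictly increasing
        enc[x] = acc;
      }

      // Order is preserved, so a B-tree over the ciphertexts still answers
      // range scans without ever decrypting:
      std::cout << (enc[3] < enc[7]) << "\n";  // prints 1, because 3 < 7

      // But the scheme is deterministic: equal plaintexts give equal
      // ciphertexts, so value frequencies (and rough magnitudes) leak to
      // anyone who can read the index.
      std::cout << (enc[5] == enc[5]) << "\n";  // prints 1
      return 0;
    }

Because the mapping is deterministic and monotone, the index keeps
working, but ciphertext order and value frequencies mirror the plaintext
exactly - which is what the frequency-attack papers you mention exploit.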
> After this long digression (hopefully not too boring), I agree that the
> way forward would be to start with looking into the encryption of the
> file store (I think they are called 'cfiles'; I also saw mentions of
> some 'delta' files, and I am not sure if they are written the same way
> and should be encrypted too), and after that the WALs.

Yah, I think cfiles are a good place to start. AFAIK delta files reuse
the cfile machinery when writing to disk. I originally considered
recommending looking at the filesystem block manager, but we often do
offset lookups into the FS blocks, which I don't think could be supported
with encryption.

- Dan

------------------------------
> *From: *"Dan Burkert" <[email protected]>
> *To: *[email protected]
> *Sent: *Tuesday, May 2, 2017 2:54:26 PM
> *Subject: *Re: Data encryption in Kudu
>
> Hi Franco,
>
> Thanks for the writeup! I'm not an Oracle expert, but my interpretation
> of the TDE column-level encryption documentation/implementation is very
> different from yours. As far as I can tell, in both the per-column and
> tablespace encryption modes, encryption/decryption is handled entirely
> on the Oracle server. The difference is that column-level encryption
> will encrypt individual cells on disk (leaving the overall tree/index
> structure unencrypted), while tablespace-level encryption will encrypt
> at the block or file level.
>
> I agree with everything you wrote about the tradeoffs involved with
> client vs server encryption, but I think you are underestimating both
> the complexity of client-side encryption and the performance hit that
> it would impose. The loss of encoding, compression, and range predicate
> pushdown would absolutely kill performance for many important use
> cases. The implementation would also be significantly _more_ difficult
> than server-side encryption, because the client would need to manage
> the encryption keys and encrypt/decrypt the data, and the solution
> would need to be implemented for every client library (of which there
> are currently two).
>
> For those reasons, I think server-side encryption is the way to go for
> Kudu. I think you're right that it would slot in as an additional step
> in the encode -> compress -> encrypt pipeline for blocks. Because
> blocks are relatively large (typically > 1 MiB), the overhead of a
> 16-byte salt and an additional MAC is negligible, so we wouldn't need
> to force the user to make that tradeoff. Basically, we could get all of
> the advantages that Oracle's tablespace-level encryption provides, but
> on a per-column basis. There are a couple of additional complications -
> we also have a WAL that lives outside of our file block abstraction,
> and we would almost certainly need to provide encryption for that as
> well (but perhaps it could be a second step in the process).
>
> In-line responses to some other comments below.
>
> On Sat, Apr 29, 2017 at 8:35 PM, Franco Venturi <[email protected]>
> wrote:
>
>> - also from the security point of view, since the encryption happens
>> at the client side, the data that is transferred on the network
>> between the client and the server is already encrypted and there's no
>> need (at least from this point of view) to add a layer of encryption
>> between client and server
>
> I'm skeptical of this. For instance, every scan request includes the
> names and types of the columns that the client wishes to scan, and
> those would be in plaintext without wire encryption. That would be an
> issue for some use cases.
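Going back to the block pipeline above for a moment: here is a rough,
untested sketch of what the encrypt step could look like, assuming
AES-256-GCM through OpenSSL's libcrypto. The function name and output
layout are made up for illustration, not existing Kudu code; GCM's
conventional equivalent of the 16-byte salt is a random 12-byte IV plus
a 16-byte authentication tag:

    // Hedged sketch: encrypt one already-encoded-and-compressed block
    // with AES-256-GCM. Assumed output layout:
    //   [12-byte IV][ciphertext][16-byte GCM tag]
    // i.e. a fixed 28 bytes of overhead per block - noise for blocks
    // larger than 1 MiB.
    #include <openssl/evp.h>
    #include <openssl/rand.h>
    #include <cstring>
    #include <stdexcept>
    #include <string>

    std::string EncryptBlock(const std::string& block,
                             const unsigned char key[32]) {
      unsigned char iv[12];
      if (RAND_bytes(iv, sizeof(iv)) != 1) {
        throw std::runtime_error("RAND_bytes failed");
      }

      std::string out(sizeof(iv) + block.size() + 16, '\0');
      unsigned char* p = reinterpret_cast<unsigned char*>(&out[0]);
      memcpy(p, iv, sizeof(iv));  // the IV travels in the clear; it is not secret

      EVP_CIPHER_CTX* ctx = EVP_CIPHER_CTX_new();
      int len = 0;
      EVP_EncryptInit_ex(ctx, EVP_aes_256_gcm(), nullptr, key, iv);
      EVP_EncryptUpdate(ctx, p + sizeof(iv), &len,
                        reinterpret_cast<const unsigned char*>(block.data()),
                        static_cast<int>(block.size()));
      int clen = len;
      EVP_EncryptFinal_ex(ctx, p + sizeof(iv) + clen, &len);
      clen += len;
      // Append the 16-byte authentication tag (the "additional MAC"), so
      // tampering with the ciphertext is detected at decrypt time.
      EVP_CIPHER_CTX_ctrl(ctx, EVP_CTRL_GCM_GET_TAG, 16,
                          p + sizeof(iv) + clen);
      EVP_CIPHER_CTX_free(ctx);
      return out;
    }

Hardware AES-NI kicks in automatically inside libcrypto, which is part
of why I expect the per-block overhead to be minimal.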
>
>> - from the security point of view, an attacker with full access to
>> the server would probably be able to decrypt the encrypted data
>
> Could you elaborate on this? As long as we use an external keystore and
> intermediate keys, I don't know how an attacker with access to the
> on-disk files could decrypt them.
>
>> - also from a security point of view, the server returns the data
>> back in plaintext format; if the data transferred over the network
>> contains sensitive information, it would need an extra encryption
>> layer like TLS or something like that
>
> Correct, and Kudu 1.3 includes TLS wire encryption for exactly this
> reason.
>
>> - as per the performance implications, if the encryption on the
>> server side uses something like AES192 or AES256, there are libraries
>> like libcrypto that take advantage of the hardware acceleration for
>> AES encryption on many modern CPUs, and therefore I suspect the
>> performance overhead would be limited; this is also indicated by what
>> the Oracle documentation says regarding processing overhead in the
>> case of tablespace encryption in TDE
>
> I agree, I think the overhead of per-block encryption would be pretty
> minimal.
>
>> - it would also require a way to have the server manage these column
>> encryption keys (possibly through additional client APIs); I haven't
>> looked yet at the way Oracle handles encryption/decryption keys for
>> tablespace encryption in TDE, but it's on my 'to-do' list
>
> Yah, the normal thing to do here is call out to an external keystore
> that holds a master encryption key.
>
> - Dan
>
> ------------------------------
>> *From: *[email protected]
>> *To: *[email protected]
>> *Sent: *Wednesday, April 26, 2017 9:48:07 PM
>> *Subject: *Re: Data encryption in Kudu
>>
>> David, Dan, Todd,
>> thanks for your prompt replies.
>>
>> At this stage I am just exploring what it would take to implement some
>> sort of data encryption in Kudu.
>>
>> After reading your comments, here are some further thoughts:
>>
>> - according to the first sentence of this paragraph in the Kudu docs
>> (https://kudu.apache.org/docs/schema_design.html#compression):
>>
>>     Kudu allows per-column compression using the LZ4, Snappy, or
>>     zlib compression codecs.
>>
>> it should be possible to perform per-column encryption by adding
>> 'encryption codecs' right after the compression codecs. I browsed
>> through the code quickly and I think this is done when reading/writing
>> a 'cfile' (please correct me if I am wrong). If this is correct, this
>> change could be 'minimally invasive' (at least for the 'cfile' part)
>> and would not require a major overhaul of the Kudu architecture.
>>
>> - as per the key management aspect, I am not a security expert at all,
>> so I am not sure what would be the best approach here - my thought is
>> that in most places Kudu is deployed together with HDFS, so it would
>> be 'desirable' if the key management were consistent between the two
>> services; on the other hand, I also realize that the basic premises
>> are fundamentally different: HDFS encrypts everything at the client
>> level, and therefore the HDFS engine itself is almost completely
>> unaware that the data it stores is actually encrypted (except for a
>> special hidden file attribute, if I understand correctly), while in
>> Kudu the storage engine must have both the 'public' key (when
>> encrypting) and the 'private' key (when decrypting), otherwise it
>> can't take advantage of knowing the 'structure' of the data (for
>> instance, the Bloom filters probably wouldn't work with the key being
>> encrypted). This means, for instance, that an attacker who is able to
>> gain access to the Kudu tablet servers would probably be able to
>> decrypt the data. Also, one way to achieve something similar to what
>> HDFS does (i.e. client-based encryption and data encrypted in flight)
>> could perhaps be to use a one-time client certificate generated by the
>> KMS server, but this would also require changes to the client code.
>>
>> Franco
>>
>> ------------------------------
>> *From: *"Todd Lipcon" <[email protected]>
>> *To: *[email protected]
>> *Sent: *Tuesday, April 25, 2017 3:49:50 PM
>> *Subject: *Re: Data encryption in Kudu
>>
>> Agreed with what Dan said.
>>
>> I think there are a number of interesting design alternatives to be
>> considered, so before coding it would be great to work through a
>> design document to explore the alternatives. For example, we could try
>> to apply encryption at the 'fs/' layer, which would cover all non-WAL
>> data, but then we would lose the ability to specify encryption on a
>> per-column basis. There are other requirements that need to be ironed
>> out as well, such as whether we'd need to support separate encryption
>> keys per column/table/server/etc., and whether metadata also needs to
>> be encrypted.
>>
>> -Todd
>>
>> On Tue, Apr 25, 2017 at 10:38 AM, Dan Burkert <[email protected]>
>> wrote:
>>
>>> Hi Franco,
>>>
>>> I think you are right that a client-based approach wouldn't work,
>>> because we wouldn't want to encrypt at the level of individual cell
>>> values. That would get in the way of encoding, compression, predicate
>>> evaluation, etc. As you note, adding encryption at the block layer is
>>> probably the way to go. Key management is definitely the tricky
>>> issue. We do have one advantage over HDFS - because Kudu does logical
>>> replication, the encryption key can be scoped to a particular tablet
>>> server or tablet replica; it wouldn't need to be shared among all
>>> replicas. I haven't done enough research to know if this makes it
>>> fundamentally easier to do key management. I would assume at a
>>> minimum we would want to integrate with key providers such as an HSM.
>>> It would be good to have a thorough review of existing solutions in
>>> the space, such as TDE
>>> <https://en.wikipedia.org/wiki/Transparent_Data_Encryption> and the
>>> Hadoop KMS. Is this something you are interested in working on?
>>>
>>> - Dan
>>>
>>> On Tue, Apr 25, 2017 at 8:30 AM, David Alves <[email protected]>
>>> wrote:
>>>
>>>> Hi Franco
>>>>
>>>> Dan, Alexey, and Todd are our security experts.
>>>> Folks, thoughts on this?
>>>>
>>>> Best
>>>> David
>>>>
>>>> On Mon, Apr 24, 2017 at 7:08 PM, <[email protected]> wrote:
>>>>
>>>>> Over the weekend I started looking at what it would take to add
>>>>> data encryption to Kudu (besides using filesystem encryption via
>>>>> dm-crypt or something like that).
>>>>>
>>>>> Here are a few notes - please feel free to comment on them and add
>>>>> suggestions:
>>>>>
>>>>> - reading through this mailing list, it looks like this feature has
>>>>> been asked about a couple of times last year, but from what I can
>>>>> tell, no one is currently working on it.
>>>>> - a client-based approach to encryption like the one used by HDFS
>>>>> wouldn't work (at least out of the box) because, for instance,
>>>>> encrypting the primary key at the client would prevent the use of
>>>>> range filters for scans; it might work for the columns that are not
>>>>> part of the primary key
>>>>> - there's already code in Kudu for several compression codecs (LZ4,
>>>>> gzip, etc.); I thought it would be possible to add similar code for
>>>>> encryption codecs (to be applied after the compression, of course)
>>>>> - the WAL log files and delta files should be similarly encrypted
>>>>> too
>>>>> - I'm not sure what would be the best way to manage the keys - I
>>>>> see that in HDFS they use a double-key mechanism, where the
>>>>> encryption key for the data file is itself encrypted with the
>>>>> allowed user's key, and this whole process is managed by an
>>>>> external Key Management Service
>>>>>
>>>>> Thanks in advance for your ideas and suggestions,
>>>>> Franco
>>
>> --
>> Todd Lipcon
>> Software Engineer, Cloudera
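A footnote on the 'double key mechanism' Franco mentions in the last
message: this pattern is usually called envelope encryption, and it is
also what the 'external keystore and intermediate keys' earlier in the
thread boils down to. Below is a minimal sketch with the key management
service stubbed out; everything in it is hypothetical, not an existing
Kudu or Hadoop API:

    // Envelope encryption sketch: each file gets its own random
    // data-encryption key (DEK); only a wrapped (encrypted) copy of the
    // DEK is persisted next to the data. The master key that wraps it
    // never leaves the external KMS/HSM, so an attacker holding the
    // on-disk files alone cannot recover any DEK.
    #include <openssl/rand.h>
    #include <string>

    // Hypothetical client for an external key management service.
    class Kms {
     public:
      virtual ~Kms() = default;
      // Wrap/unwrap a DEK under the master key named 'master'. Both
      // operations happen inside the KMS; the master key is never exposed.
      virtual std::string WrapKey(const std::string& master,
                                  const std::string& dek) = 0;
      virtual std::string UnwrapKey(const std::string& master,
                                    const std::string& wrapped) = 0;
    };

    struct FileKey {
      std::string dek;      // encrypts the file's blocks; held in memory only
      std::string wrapped;  // safe to persist in the file's metadata
    };

    // On file creation: mint a fresh 256-bit DEK locally and store only
    // its wrapped form.
    inline FileKey NewFileKey(Kms* kms, const std::string& master) {
      FileKey fk;
      fk.dek.resize(32);
      RAND_bytes(reinterpret_cast<unsigned char*>(&fk.dek[0]),
                 static_cast<int>(fk.dek.size()));
      fk.wrapped = kms->WrapKey(master, fk.dek);
      return fk;
    }

    // On file open: ask the KMS to unwrap; access control on the master
    // key is enforced by the KMS, not by the storage server.
    inline std::string RecoverFileKey(Kms* kms, const std::string& master,
                                      const std::string& wrapped) {
      return kms->UnwrapKey(master, wrapped);
    }

One property worth noting, following Dan's point about logical
replication: each tablet server could mint its own DEKs locally, so only
the wrap/unwrap calls would ever cross the network to the KMS.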
