Re: [DISCUSS] Best way to re-encrypt existing data (TDE cache key rotation).
Continuing my previous mail...

The same method rethrows an exception, which will cause the metrics exporter to fail. The method should instead return a numeric value that indicates the failure.

On Wed, Oct 28, 2020 at 3:09 PM Andrey Gura wrote:
>
> Hi there,
>
> I accidentally stumbled upon a potential performance problem in this commit.
>
> The CacheGroupMetricImpls.getPagesLeftForReencryption method has at
> least two problems:
>
> - Relatively major: to calculate the value of a single metric, the
> method has O(N) complexity (N is the number of partitions). That isn't
> good. A better approach is to maintain a precalculated or estimated
> value during the re-encryption process and just return it.
> - Major: PageStore.exists() is called for each partition in this
> method. This leads to N calls to the file system (which may or may not
> be cached; we can't just hope). So with the default affinity
> configuration this method touches the file system 1024 times per
> metric value calculation. For extra drama, multiply 1024 by the number
> of cache groups on the node.
>
> Finally, we have auxiliary functionality (metrics) that could affect
> the behavior of the whole node (and potentially the cluster).
>
> Please fix this problem and be more careful in the future.
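A minimal sketch of the suggestion above, namely returning a numeric failure marker from a metric getter instead of rethrowing. Everything in it is an assumption made for illustration: the SafeLongMetric wrapper, the FAILED constant and the -1 convention are hypothetical and are not taken from the Ignite code base.

```java
import java.util.function.LongSupplier;

/** Hypothetical wrapper: a metric getter that never throws. */
final class SafeLongMetric implements LongSupplier {
    /** Assumed sentinel meaning "the value could not be computed". */
    static final long FAILED = -1L;

    /** The real (possibly failing) metric calculation. */
    private final LongSupplier delegate;

    SafeLongMetric(LongSupplier delegate) {
        this.delegate = delegate;
    }

    @Override public long getAsLong() {
        try {
            return delegate.getAsLong();
        }
        catch (RuntimeException e) {
            // Report the failure as a value, so an exporter that iterates
            // over all metrics is not interrupted by this one.
            return FAILED;
        }
    }
}
```

With this shape, an exporter sees -1 for a metric that failed and can keep exporting the rest; whether -1 is a suitable marker depends on the value range of the particular metric.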
Re: [DISCUSS] Best way to re-encrypt existing data (TDE cache key rotation).
Andrey, thanks for your comment.
I will fix this problem shortly.

Wed, Oct 28, 2020 at 15:10, Andrey Gura wrote:
>
> Hi there,
>
> I accidentally stumbled upon a potential performance problem in this commit.
>
> The CacheGroupMetricImpls.getPagesLeftForReencryption method has at
> least two problems:
>
> - Relatively major: to calculate the value of a single metric, the
> method has O(N) complexity (N is the number of partitions). That isn't
> good. A better approach is to maintain a precalculated or estimated
> value during the re-encryption process and just return it.
> - Major: PageStore.exists() is called for each partition in this
> method. This leads to N calls to the file system (which may or may not
> be cached; we can't just hope). So with the default affinity
> configuration this method touches the file system 1024 times per
> metric value calculation. For extra drama, multiply 1024 by the number
> of cache groups on the node.
>
> Finally, we have auxiliary functionality (metrics) that could affect
> the behavior of the whole node (and potentially the cluster).
>
> Please fix this problem and be more careful in the future.
Re: [DISCUSS] Best way to re-encrypt existing data (TDE cache key rotation).
Hi there,

I accidentally stumbled upon a potential performance problem in this commit.

The CacheGroupMetricImpls.getPagesLeftForReencryption method has at least two problems:

- Relatively major: to calculate the value of a single metric, the method has O(N) complexity (N is the number of partitions). That isn't good. A better approach is to maintain a precalculated or estimated value during the re-encryption process and just return it.
- Major: PageStore.exists() is called for each partition in this method. This leads to N calls to the file system (which may or may not be cached; we can't just hope). So with the default affinity configuration this method touches the file system 1024 times per metric value calculation. For extra drama, multiply 1024 by the number of cache groups on the node.

Finally, we have auxiliary functionality (metrics) that could affect the behavior of the whole node (and potentially the cluster).

Please fix this problem and be more careful in the future.

On Fri, Oct 23, 2020 at 12:46 PM Pavel Pereslegin wrote:
>
> Hello folks,
>
> thanks to everyone who joined the review, greatly appreciate your
> helpful comments.
>
> If there is no objection, we will merge this patch [1] shortly.
>
> [1] https://github.com/apache/ignite/pull/7941
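The "precalculated value" idea from the first bullet above could look roughly like the sketch below: the re-encryption code updates a counter as it goes, and the metric only reads it. This is an illustration under assumptions, not the fix that was eventually applied; the ReencryptionProgress class and its method names are hypothetical.

```java
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical per-cache-group progress tracker. */
final class ReencryptionProgress {
    /** Pages still waiting for re-encryption across all partitions of the group. */
    private final AtomicLong pagesLeft = new AtomicLong();

    /** Called once when re-encryption of a partition is scheduled. */
    void onPartitionScheduled(long totalPages) {
        pagesLeft.addAndGet(totalPages);
    }

    /** Called by the re-encryption thread after a batch of pages is processed. */
    void onPagesReencrypted(long cnt) {
        pagesLeft.addAndGet(-cnt);
    }

    /** O(1) metric read: no partition scan and no file system access. */
    long pagesLeft() {
        // Clamp at zero in case updates and reads interleave oddly.
        return Math.max(0L, pagesLeft.get());
    }
}
```

The cost moves from the reader to the writer: each batch pays one atomic add, while a metric read stays constant-time regardless of the number of partitions or cache groups.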
Re: [DISCUSS] Best way to re-encrypt existing data (TDE cache key rotation).
Hello folks,

Thanks to everyone who joined the review; I greatly appreciate your helpful comments.

If there are no objections, we will merge this patch [1] shortly.

[1] https://github.com/apache/ignite/pull/7941

Mon, Oct 5, 2020 at 15:30, Maksim Stepachev wrote:
>
> Hi,
>
> I'm going to do it.
Re: [DISCUSS] Best way to re-encrypt existing data (TDE cache key rotation).
Hi,

I'm going to do it.

Sat, Oct 3, 2020 at 21:47, Alex Plehanov wrote:

> Hello guys,
>
> I've finished the review and approved the patch.
> Anybody else would like to review it?
Re: [DISCUSS] Best way to re-encrypt existing data (TDE cache key rotation).
Hello guys,

I've finished the review and approved the patch.
Would anybody else like to review it?

Mon, Sep 28, 2020 at 11:38, Pavel Pereslegin wrote:

> Hello, Maksim!
>
> I am currently working on a review notes from Alexey Plekhanov, will
> let you know when I finish.
Re: [DISCUSS] Best way to re-encrypt existing data (TDE cache key rotation).
Hello, Maksim!

I am currently working through the review notes from Alexey Plekhanov and
will let you know when I finish.

On Mon, 28 Sep 2020 at 11:04, Maksim Stepachev wrote:
>
> Hi, Pavel.
>
> As I see, the ticket [https://issues.apache.org/jira/browse/IGNITE-12843]
> is "PATCH AVAILABLE". Is this ticket finished?
Re: [DISCUSS] Best way to re-encrypt existing data (TDE cache key rotation).
Hi, Pavel.

As I see, the ticket [https://issues.apache.org/jira/browse/IGNITE-12843]
is in the "PATCH AVAILABLE" status. Is this ticket finished?

On Thu, 13 Aug 2020 at 13:49, Pavel Pereslegin wrote:
> Hello all.
>
> I'm working on TDE cache group key rotation [1] and I have a couple of
> questions about partition re-encryption.
> [...]
Re: [DISCUSS] Best way to re-encrypt existing data (TDE cache key rotation).
Hello all.

I'm working on TDE cache group key rotation [1] and I have a couple of
questions about partition re-encryption.

As described in the wiki [2], re-encryption currently consists of
sequentially marking memory pages as dirty, which does not look
resource-intensive.
Do you think it is necessary to do this in multithreaded mode, or is a
single thread enough?
(We started testing re-encryption on dedicated servers (Xeon E5-2680
2.4 GHz, SSD Huawei ES3600P 3.2 TB, ThrottlingPolicy =
CHECKPOINT_BUFFER_ONLY) with no speed limit and no load. As a result,
single-threaded re-encryption kept the disk load within 30%. At the same
time, the total re-encryption speed was around 60 MB/s, which allows one
node to re-encrypt 1 TB of data in about 5 hours, and this performance
seems to be enough.)

The second question is about how to store the re-encryption status.
At the moment, the re-encryption status consists of two parameters: the
total number of pages in the partition at the start of re-encryption (int)
and the index of the last re-encrypted page (int).
These 8 bytes are stored in the metapage on checkpoint (which ensures
that if the checkpoint does not happen, we will continue the process from
the last page written to disk).
However, if multithreaded partition scanning does not make sense, then it
seems possible to change the implementation and leave the metapage
structure unchanged: store only the "pointer" to the partition (and the
cache group) in the metastore and scan in strict order.
Storing the status in the metapage of the partition seems to me more
flexible and stable, and it has a number of advantages over the "pointer"
approach:
1. Since we save the total number of pages at re-encryption startup, we
will not scan extra pages that may be added to the partition later.
2. We can move partitions between nodes, and re-encryption should
continue from a certain point on the new node.
3. If a partition is (re)created during cache group re-encryption, it
will not be re-encrypted (since its re-encryption status will be reset
and all data is encrypted with the latest encryption key after
(re)creation).

Do you think single-threaded mode is enough?
Is it better to keep the re-encryption status in the metapage or store
the "pointer" in the metastore?

[1] https://issues.apache.org/jira/browse/IGNITE-12843
[2] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=95652384#TDE.Phase3.Cachekeyrotation.-Backgroundre-encryption

On Fri, 31 Jul 2020 at 11:09, Pavel Pereslegin wrote:
>
> Hello,
>
> I'll expand the answer a bit about calculating CRC, the problem is not
> that it is calculated twice, but that now for encrypted pages,
> EncryptedFileIO checks physical integrity, and FilePageStore checks
> the correctness of the encryption key, but from my point of view, it
> should be vice versa - the lower (delegated) FileIO should check the
> physical integrity and EncryptedFileIO should check the correctness of
> the encryption key.
> [...]
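For illustration, the 8-byte re-encryption status described in this thread
(total page count at start plus the index of the last re-encrypted page)
could be modelled roughly as below. This is only a sketch under stated
assumptions: the class name, field names, and the convention that -1 means
"nothing re-encrypted yet" are hypothetical and are not taken from the
Ignite code base or from the PR.

```java
import java.nio.ByteBuffer;

/** Hypothetical model of the per-partition re-encryption status (not Ignite's class). */
final class ReencryptStatusSketch {
    /** Two ints, as described in the thread: 8 bytes in total. */
    static final int SIZE_BYTES = 2 * Integer.BYTES;

    /** Page count captured when re-encryption started; pages added later are not scanned. */
    final int totalPages;

    /** Index of the last re-encrypted page; -1 (assumption) means none yet. */
    final int lastPageIdx;

    ReencryptStatusSketch(int totalPages, int lastPageIdx) {
        this.totalPages = totalPages;
        this.lastPageIdx = lastPageIdx;
    }

    /** Writes the status into a buffer, e.g. a region of the partition metapage. */
    void writeTo(ByteBuffer buf) {
        buf.putInt(totalPages);
        buf.putInt(lastPageIdx);
    }

    /** Reads the status back, e.g. on node restart, to resume the scan. */
    static ReencryptStatusSketch readFrom(ByteBuffer buf) {
        int total = buf.getInt();
        int last = buf.getInt();
        return new ReencryptStatusSketch(total, last);
    }

    /** Number of pages still to be re-encrypted. */
    int pagesLeft() {
        return Math.max(0, totalPages - (lastPageIdx + 1));
    }
}
```

Because the status travels with the partition, a node that receives the
partition (or restarts) can resume from lastPageIdx + 1 instead of
rescanning from the beginning.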
Re: [DISCUSS] Best way to re-encrypt existing data (TDE cache key rotation).
Hello,

I'll expand the answer a bit about calculating CRC. The problem is not
that it is calculated twice, but that now, for encrypted pages,
EncryptedFileIO checks physical integrity and FilePageStore checks the
correctness of the encryption key. From my point of view, it should be
vice versa: the lower (delegated) FileIO should check physical integrity,
and EncryptedFileIO should check the correctness of the encryption key.

On Fri, 31 Jul 2020 at 10:40, Pavel Pereslegin wrote:
>
> Hello,
>
> > 10. Question - CRC is read in two places encryptionFileIO and
> > filePageStore - what should we do with this?
>
> We need to calculate the CRC of encrypted data, because we may be
> using the wrong encryption key to decrypt data, in which case we will
> not understand if the physical integrity is violated or the wrong
> encryption key is used.
> [...]
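The layering proposed above (the lower layer verifies physical integrity,
the encryption layer verifies the key) can be sketched as follows. This is
a minimal illustration, not the Ignite implementation: the class and method
names are hypothetical, and AES/CTR with a fixed zero IV is used only to
keep the demo short (a fixed IV must not be used for real data).

```java
import java.nio.ByteBuffer;
import java.util.Arrays;
import java.util.zip.CRC32;
import javax.crypto.Cipher;
import javax.crypto.spec.IvParameterSpec;
import javax.crypto.spec.SecretKeySpec;

/** Hypothetical two-level CRC check: outer CRC over ciphertext, inner CRC inside it. */
final class LayeredCrcSketch {
    enum Result { OK, PHYSICAL_CORRUPTION, WRONG_KEY }

    /** CRC32 of the first len bytes of data. */
    static int crc32(byte[] data, int len) {
        CRC32 crc = new CRC32();
        crc.update(data, 0, len);
        return (int) crc.getValue();
    }

    /** AES/CTR with a zero IV: demo only, never reuse an IV with real data. */
    static byte[] aesCtr(int mode, byte[] key, byte[] in) {
        try {
            Cipher cipher = Cipher.getInstance("AES/CTR/NoPadding");
            cipher.init(mode, new SecretKeySpec(key, "AES"), new IvParameterSpec(new byte[16]));
            return cipher.doFinal(in);
        } catch (java.security.GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Stored layout: [encrypt(payload + inner CRC)] [outer CRC of the ciphertext]. */
    static byte[] write(byte[] payload, byte[] key) {
        byte[] plain = ByteBuffer.allocate(payload.length + 4)
            .put(payload).putInt(crc32(payload, payload.length)).array();
        byte[] enc = aesCtr(Cipher.ENCRYPT_MODE, key, plain);
        return ByteBuffer.allocate(enc.length + 4)
            .put(enc).putInt(crc32(enc, enc.length)).array();
    }

    static Result check(byte[] stored, byte[] key) {
        // Lower layer: outer CRC over the encrypted bytes detects on-disk damage.
        int encLen = stored.length - 4;
        if (ByteBuffer.wrap(stored).getInt(encLen) != crc32(stored, encLen))
            return Result.PHYSICAL_CORRUPTION;

        // Encryption layer: inner CRC mismatch after decryption points to a wrong key.
        byte[] plain = aesCtr(Cipher.DECRYPT_MODE, key, Arrays.copyOf(stored, encLen));
        int payloadLen = plain.length - 4;
        return ByteBuffer.wrap(plain).getInt(payloadLen) == crc32(plain, payloadLen)
            ? Result.OK : Result.WRONG_KEY;
    }
}
```

Note that in this sketch the inner CRC only helps to tell a wrong key apart
from disk damage. A CRC is not a cryptographic authentication code, so it
does not by itself protect against deliberate tampering.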
Re: [DISCUSS] Best way to re-encrypt existing data (TDE cache key rotation).
Hello,

> 10. Question - CRC is read in two places encryptionFileIO and
> filePageStore - what should we do with this?

We need to calculate the CRC of the encrypted data, because we may be
using the wrong encryption key to decrypt data, in which case we would
not be able to tell whether physical integrity is violated or the wrong
encryption key is used.

> 9. Question - How do we optimize when we can check that this page is
> already encrypted by parallel loading? Maybe we should do this in Phase 4?

To do this, we need to store the encryption key ID in memory (at least),
but this is not easy to do right now without breaking binary
compatibility.

> 7. Question -the current implementation does not use the throttling that
> is implemented in PDS. Users should set the throughput such as 5 MB per
> second, but not the timeout, packet size, or stream size.

I've added a simple rate limiter for this.

> 8. Question - why we add a lot of system properties?
>> Can you, please, list system properties that should be moved to the
>> configuration?

These are the background re-encryption properties. For now, they are:
- re-encryption speed limit (in megabytes per second)
- number of threads used for re-encryption
- number of pages in a batch processed under the checkpoint lock
- flag to completely disable background re-encryption

> 11. We should remember about complicated test scenarios with failover

The PR contains tests for re-encryption (and key rotation) on an unstable
topology (with and without a baseline change). I'll expand them if I
missed some cases.

> 13. Will re-encryption continue after the cluster is completely stopped?

Yes, as I mentioned earlier, we save the re-encryption status in the
meta page of each re-encrypted partition and trigger re-encryption on
node startup if needed (a more detailed description is on the wiki).

Thanks a lot for your comments. I am still working on the PR and
expanding the wiki documentation. I'll let you know when it is ready for
review.

On Tue, 28 Jul 2020 at 19:14, Alexey Goncharuk wrote:
>
> Hello Nikolay,
>
> [...]
>
> I still do not see why we need CRC twice, can you please elaborate on this
> statement? If an attacker is able to replace the contents of an encrypted
> page, it means that they have access to the encryption key. What will
> prevent them from calculating the CRC of malicious content and then
> encrypting it?
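As an illustration of the "simple rate limiter" and the speed limit in
megabytes per second mentioned above, a throughput-based limiter could look
roughly like this. It is a sketch under assumptions, not the code from the
PR: the class name and the acquire() method are hypothetical, and the clock
is passed in explicitly so that the arithmetic is easy to follow.

```java
/** Hypothetical throughput limiter for background re-encryption (not Ignite's class). */
final class ReencryptRateLimiterSketch {
    private final long bytesPerSec;
    private final long startNanos;
    private long bytesDone;

    ReencryptRateLimiterSketch(double limitMbPerSec, long startNanos) {
        this.bytesPerSec = (long) (limitMbPerSec * 1024 * 1024);
        this.startNanos = startNanos;
    }

    /**
     * Records a processed batch and returns how long (in nanoseconds) the caller
     * should pause so that the average speed stays at or below the limit.
     */
    long acquire(long batchBytes, long nowNanos) {
        bytesDone += batchBytes;

        // Earliest elapsed time at which bytesDone bytes fit under the limit.
        long allowedNanos = (long) (bytesDone * 1e9 / bytesPerSec);
        long elapsedNanos = nowNanos - startNanos;

        return Math.max(0L, allowedNanos - elapsedNanos);
    }
}
```

A re-encryption thread would call acquire() after each batch of pages with
System.nanoTime() and then pause for the returned time, for example with
java.util.concurrent.locks.LockSupport.parkNanos(). With a 5 MB/s limit, a
5 MB batch processed instantly yields a one-second pause.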
Re: [DISCUSS] Best way to re-encrypt existing data (TDE cache key rotation).
Hello Nikolay,

> > 10. Question - CRC is read in two places encryptionFileIO and
> > filePageStore - what should we do with this?
>
> filePageStore checks CRC of the encrypted page. This required to confirm
> the page not corrupted on the disk.
> encryptionFileIO checks CRC of the decrypted page(CRC itself stored in the
> encrypted data).
> This required to be sure the decrypted page contains correct data and not
> replaced with some malicious content.

I still do not see why we need the CRC twice. Can you please elaborate on
this statement? If an attacker is able to replace the contents of an
encrypted page, it means that they have access to the encryption key. What
will prevent them from calculating the CRC of the malicious content and
then encrypting it?
Re: [DISCUSS] Best way to re-encrypt existing data (TDE cache key rotation).
Hello, Maksim.

Thanks for the summary.
From my point of view, we should focus on the Phase 3 implementation and
then do the refactoring for some specific SPI implementation.

> 8. Question - why we add a lot of system properties?

Can you please list the system properties that should be moved to the
configuration?

> 10. Question - CRC is read in two places encryptionFileIO and filePageStore -
> what should we do with this?

filePageStore checks the CRC of the encrypted page. This is required to
confirm that the page is not corrupted on disk.
encryptionFileIO checks the CRC of the decrypted page (the CRC itself is
stored in the encrypted data).
This is required to be sure that the decrypted page contains correct data
and has not been replaced with some malicious content.

Here is the list of items that are not related to the Phase 3
implementation. Please tell me what you think:

> 2. We should try to run the existing test suites in encryption mode.

We did it during TDE.Phase1 testing.

> 3. SPI requires an additional method such as getKeyDigest
> 4. Recommendation - the encryption processor should be divided into external
> subclasses
> 5. Recommendation - we should not use tuples and triples, because this is a
> marker of a design problem.
> 6. Strict recommendation - please don't put context everywhere

Actually, this is a question of taste and obviously not related to the
current discussion.

> On 24 Jul 2020, at 14:27, Maksim Stepachev wrote:
>
> Hello everyone, yesterday we discussed the implementation of TDE over a
> conference call. I added a summary of this call here:
> [...]
Re: [DISCUSS] Best way to re-encrypt existing data (TDE cache key rotation).
Hello everyone,

yesterday we discussed the implementation of TDE over a conference call.
Here is a summary of this call:

1. The wiki documentation should be expanded. It should describe the
steps, i.e. how it works under the hood. What are the domain objects in
the implementation?
2. We should try to run the existing test suites in encryption mode.
Encryption should not affect any PDS or other tests.
3. SPI requires an additional method such as getKeyDigest, because the
current implementation of GridEncryptionManager#masterKeyDigest() looks
strange. We reset the master key to calculate the digest. This will not
work well if we want to use VOLT as a key provider implementation.
4. Recommendation - the encryption processor should be divided into
external subclasses, and we should use the OOP decomposition pattern for
it. Right now, this class has more than 2000 lines and does not follow
SOLID. It looks like unrelated logic inlined into a single class.
5. Recommendation - we should not use tuples and triples, because this is
a marker of a design problem.
6. Strict recommendation - please don't put context everywhere. It should
only be used in the parent class. You can pass the necessary dependencies
through the constructor, as in the DI pattern.
7. Question - the current implementation does not use the throttling that
is implemented in PDS. Users should set the throughput, such as 5 MB per
second, but not the timeout, packet size, or stream size.
8. Question - why do we add a lot of system properties? Why didn’t we add
a configuration for it?
9. Question - How do we optimize when we can check that this page is
already encrypted by parallel loading? Maybe we should do this in Phase 4?
10. Question - CRC is read in two places, encryptionFileIO and
filePageStore - what should we do with this?
11. We should remember complicated failover test scenarios, such as a node
that left when encryption started and joined after it finished, or the
baseline changing or a node leaving before, after, or in the middle of
this process, etc.
12. How do we use a sandbox to protect our cluster from theft of the
master and user keys via compute?
13. Will re-encryption continue after the cluster is completely stopped?

If I forgot some points, you can add them to the message.

On Tue, 7 Jul 2020 at 17:40, Pavel Pereslegin wrote:
> Hello, Maksim.
>
> For implementation, I chose so-called "in place background
> re-encryption" design.
> [...]
Re: [DISCUSS] Best way to re-encrypt existing data (TDE cache key rotation).
Hello, Maksim.

For the implementation, I chose the so-called "in place background re-encryption" design.

The first step is to rotate the key used for writing data; at the moment this works only on an active cluster.
The second step is re-encryption (to remove the previous encryption key). If a node is restarted, re-encryption resumes after the metastorage becomes ready for read/write. Each "re-encrypted" partition (including the index) has an attribute on its meta page that indicates whether background re-encryption should be continued.

I updated the description in the wiki [1]. Some more details are in Jira [2]. Draft PR: [3].

[1] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=95652384
[2] https://issues.apache.org/jira/browse/IGNITE-12843
[3] https://github.com/apache/ignite/pull/7941
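As a rough illustration of the resume-after-restart behaviour described in this mail, here is a minimal sketch. It is not the code from the draft PR: the class and field names (`ReencryptionSketch`, `PartitionMeta`, `nextPageIdx`) are hypothetical, an in-memory map stands in for the persistent meta pages, and storing a page index (rather than a plain flag) is an assumption based on the "store progress on the metapage" idea mentioned in the thread.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch; none of these names come from the Ignite code base. */
class ReencryptionSketch {
    /** Stand-in for the per-partition meta page attribute. */
    static final class PartitionMeta {
        final int pageCount; // total number of pages in the partition
        int nextPageIdx;     // first page that still has to be re-encrypted

        PartitionMeta(int pageCount) { this.pageCount = pageCount; }

        /** True while background re-encryption should be continued. */
        boolean reencryptionPending() { return nextPageIdx < pageCount; }
    }

    /** Partition id -> meta; stands in for meta pages that survive a restart. */
    final Map<Integer, PartitionMeta> metas = new TreeMap<>();

    /** Counts pages processed, so the sketch has an observable result. */
    int pagesReencrypted;

    /** Re-encrypts up to batchSize pages of one partition, then records progress. */
    void reencryptBatch(int partId, int batchSize) {
        if (batchSize <= 0)
            throw new IllegalArgumentException("batchSize must be positive");
        PartitionMeta meta = metas.get(partId);
        int end = Math.min(meta.pageCount, meta.nextPageIdx + batchSize);
        for (int idx = meta.nextPageIdx; idx < end; idx++)
            pagesReencrypted++; // placeholder for: read page, decrypt, encrypt with new key, write
        meta.nextPageIdx = end; // after a restart, work continues from this index
    }

    /** Would run once the metastorage is ready; touches only unfinished partitions. */
    void resumeAll(int batchSize) {
        for (Map.Entry<Integer, PartitionMeta> e : metas.entrySet())
            while (e.getValue().reencryptionPending())
                reencryptBatch(e.getKey(), batchSize);
    }

    public static void main(String[] args) {
        ReencryptionSketch node = new ReencryptionSketch();
        node.metas.put(0, new PartitionMeta(10));
        node.metas.put(1, new PartitionMeta(3));
        node.reencryptBatch(0, 4); // imagine the node stops after this batch
        node.resumeAll(4);         // after restart: the remaining 6 + 3 pages
        System.out.println(node.pagesReencrypted); // prints 13
    }
}
```

The point of the sketch is only that progress lives next to the partition, so a restarted node neither repeats finished partitions nor loses track of unfinished ones.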
Re: [DISCUSS] Best way to re-encrypt existing data (TDE cache key rotation).
Hi!

Do you have any updates on this issue? Which type of implementation have you chosen (in-place, offline, or in the background)? I know that we want to add a partition defragmentation function; we could leave a hook there to integrate the re-encryption scheme. Could you update your IEP with your plans?
Re: [DISCUSS] Best way to re-encrypt existing data (TDE cache key rotation).
Nikolay, Alexei,

thanks for your suggestions.

Offline re-encryption does not seem so simple: we need to read/replace the existing encryption keys on all nodes (therefore, we should be able to read/write the metastore/WAL and exchange data between the baseline nodes). Re-encryption in maintenance mode (for example, in a stable read-only cluster) will be simple, but it still looks very inconvenient, at least because users will need to interrupt all operations.

The main advantage of online "in place" re-encryption is that we'll support multiple keys for reading, and this procedure does not directly depend on background re-encryption.

So, the first step is similar to rotating the master key: once the new key has been set for writing on all nodes, that's it, the cache group key rotation is complete (this is what PCI DSS requires: encrypt new updates with new keys).
The second step is to re-encrypt the existing data. As I said previously, I thought about scanning all partition pages in some background mode (storing progress on the meta page to continue after restart), but the rebalance approach should also work here if I figure out how to automate this process.
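To make "multiple keys for reading, one key for writing" concrete, here is a minimal sketch of a per-cache-group key ring. It is an illustration only: `GroupKeyRing` and its methods are hypothetical names, keys are plain byte arrays, and the lookup by key identifier assumes the per-page key id that is proposed elsewhere in this thread.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch of a per-cache-group key ring; not Ignite's API. */
class GroupKeyRing {
    /** All keys that may still be needed for reading, by key id. */
    private final Map<Integer, byte[]> keysById = new HashMap<>();

    /** Id of the single key used for writing; -1 until the first key is added. */
    private int activeKeyId = -1;

    /** Adds a new key and makes it the only key used for writing. */
    void rotate(int keyId, byte[] key) {
        keysById.put(keyId, key.clone());
        activeKeyId = keyId;
    }

    /** Id to stamp on every page written from now on. */
    int writeKeyId() { return activeKeyId; }

    /** Key for reading a page, looked up by the id stored on that page. */
    byte[] readKey(int keyIdOnPage) {
        byte[] key = keysById.get(keyIdOnPage);
        if (key == null)
            throw new IllegalStateException("Unknown key id: " + keyIdOnPage);
        return key;
    }

    /**
     * Drops an old key. The caller is assumed to have checked that no page
     * and no retained WAL segment still needs it. The active key is never dropped.
     */
    boolean retire(int keyId) {
        if (keyId == activeKeyId)
            return false;
        return keysById.remove(keyId) != null;
    }

    public static void main(String[] args) {
        GroupKeyRing ring = new GroupKeyRing();
        ring.rotate(1, new byte[] {1});
        ring.rotate(2, new byte[] {2}); // rotation is complete here: new writes use key 2
        System.out.println(ring.writeKeyId());  // prints 2
        System.out.println(ring.readKey(1)[0]); // old pages stay readable: prints 1
        System.out.println(ring.retire(1));     // after re-encryption finishes: prints true
    }
}
```

With this split, key rotation (switching `activeKeyId`) finishes immediately, while re-encryption of existing data only determines when `retire` may be called for the old key.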
Re: [DISCUSS] Best way to re-encrypt existing data (TDE cache key rotation).
On Mon, May 25, 2020 at 12:00, Nikolay Izhikov wrote:

> > This will take us to the re-encryption using full rebalancing
>
> Rebalance will require 2x efforts for reencryption
>
> 1. Read and send data from supplier node.
> 2. Reencrypt and write data on demander node.
>
> Instead of
>
> 1. Read, reencrypt and write data on «demander» node.

Usually, reading and sending is not a bottleneck. And don't forget that we can run out of WAL history and fall back to full rebalancing with partition eviction, which eliminates all the effort spent on offline re-encryption.

On the other hand, for a grid with many nodes, one-by-one re-encryption can take a long time.
It should also be possible to re-encrypt all data as fast as possible if, for example, the load can be switched to another grid; that is where offline encryption will come in handy.

So, I suggest implementing offline re-encryption and online re-encryption using rebalancing as a first step.

The next step can be online in-place re-encryption. It's important to measure its business impact on an online grid.
Re: [DISCUSS] Best way to re-encrypt existing data (TDE cache key rotation).
> This will take us to the re-encryption using full rebalancing

Rebalance will require 2x the effort for re-encryption:

1. Read and send data from the supplier node.
2. Re-encrypt and write data on the demander node.

Instead of:

1. Read, re-encrypt and write data on the "demander" node.
Re: [DISCUSS] Best way to re-encrypt existing data (TDE cache key rotation).
For me, the one big disadvantage of offline re-encryption is the possibility of running out of WAL history.
If re-encryption takes a long time, we will get full rebalancing with partition eviction.
This will take us to re-encryption using full rebalancing, which I proposed earlier.
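The WAL-history concern above can be phrased as a simple check. The sketch below is a simplified model and not Ignite's actual rebalance logic: `RebalanceChoice`, `Mode` and the three counters are hypothetical, and it assumes that a returning node can catch up from history only if the supplier's WAL still holds every update the node missed.

```java
/** Simplified, hypothetical model of the "run out of WAL history" argument. */
class RebalanceChoice {
    enum Mode { NONE, HISTORICAL, FULL }

    /**
     * @param nodeCounter      last update the offline node applied before it stopped
     * @param clusterCounter   latest update applied in the cluster
     * @param oldestWalCounter oldest update still present in the supplier's WAL history
     */
    static Mode choose(long nodeCounter, long clusterCounter, long oldestWalCounter) {
        if (nodeCounter >= clusterCounter)
            return Mode.NONE; // nothing was missed while the node was offline

        // The node needs updates nodeCounter + 1 .. clusterCounter.
        // They can be replayed only if the WAL still starts at or before nodeCounter + 1.
        return oldestWalCounter <= nodeCounter + 1 ? Mode.HISTORICAL : Mode.FULL;
    }

    public static void main(String[] args) {
        System.out.println(choose(100, 100, 50)); // prints NONE
        System.out.println(choose(90, 100, 50));  // prints HISTORICAL
        System.out.println(choose(90, 100, 95));  // prints FULL: offline work is discarded
    }
}
```

In the `FULL` case the partition is evicted and reloaded from a supplier, so whatever the node re-encrypted while offline is thrown away, which is the disadvantage raised in this mail.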
Re: [DISCUSS] Best way to re-encrypt existing data (TDE cache key rotation).
> And definitely this approach is much simpler to implement

I agree.

If we allow nodes to be taken offline for re-encryption, then we can implement a fully offline procedure:

1. Stop the node.
2. Execute some control.sh command that re-encrypts all data without starting the node.
3. Start the node.

Pavel, can you please write one more time: what are the disadvantages of the offline procedure?

> On May 25, 2020, at 11:20, Alexei Scherbakov wrote:
>
> And definitely this approach is much simpler to implement because all
> corner cases are handled by rebalancing code.
Re: [DISCUSS] Best way to re-encrypt existing data (TDE cache key rotation).
> Can you explain why such a restriction is necessary?

Re-encryption should have a minimum impact on the cluster.

> Most likely having a currently re-encrypting node serving only demand
> requests will have least performance impact on a grid.

The current design assumes that re-encryption will be started on all
nodes simultaneously.
Makes sense?

> On May 25, 2020, at 11:16, Alexei Scherbakov wrote:
>
> I mean: serving supply requests.
>
> Mon, May 25, 2020 at 11:15, Alexei Scherbakov:
>
>> Nikolay,
>>
>> Can you explain why such a restriction is necessary?
>> Most likely having a currently re-encrypting node serving only demand
>> requests will have least performance impact on a grid.
Re: [DISCUSS] Best way to re-encrypt existing data (TDE cache key rotation).
And this approach is definitely much simpler to implement, because all
corner cases are handled by the rebalancing code.

Mon, May 25, 2020 at 11:16, Alexei Scherbakov:

> I mean: serving supply requests.
>
> Mon, May 25, 2020 at 11:15, Alexei Scherbakov <alexey.scherbak...@gmail.com>:
>
>> Nikolay,
>>
>> Can you explain why such a restriction is necessary?
>> Most likely having a currently re-encrypting node serving only demand
>> requests will have least performance impact on a grid.
Re: [DISCUSS] Best way to re-encrypt existing data (TDE cache key rotation).
I mean: serving supply requests.

Mon, May 25, 2020 at 11:15, Alexei Scherbakov:

> Nikolay,
>
> Can you explain why such a restriction is necessary?
> Most likely having a currently re-encrypting node serving only demand
> requests will have least performance impact on a grid.
Re: [DISCUSS] Best way to re-encrypt existing data (TDE cache key rotation).
Nikolay,

Can you explain why such a restriction is necessary?
Most likely, having a currently re-encrypting node serve only demand
requests will have the least performance impact on the grid.

Mon, May 25, 2020 at 11:08, Nikolay Izhikov:

> Hello, Alexei.
>
> I think we want to implement this feature without nodes restart.
> In the ideal scenario all nodes will stay alive and respond to the user
> requests.
Re: [DISCUSS] Best way to re-encrypt existing data (TDE cache key rotation).
Hello, Alexei.

I think we want to implement this feature without node restarts.
In the ideal scenario, all nodes will stay alive and respond to user
requests.

> On May 24, 2020, at 15:24, Alexei Scherbakov wrote:
>
> Pavel Pereslegin,
>
> I see another opportunity.
> We can use rebalancing to re-encrypt node data with a new key.
> It's a trivial procedure for me: stop a node, clear database, change a key,
> start node and wait for rebalancing to complete.
> Data will be re-encrypted during rebalancing.
>
> Did I miss something?
Re: [DISCUSS] Best way to re-encrypt existing data (TDE cache key rotation).
Pavel Pereslegin,

I see another opportunity.
We can use rebalancing to re-encrypt node data with a new key.
It's a trivial procedure for me: stop a node, clear its database, change
the key, start the node and wait for rebalancing to complete.
Data will be re-encrypted during rebalancing.

Did I miss something?

Fri, May 22, 2020 at 16:14, Ivan Rakov:

> Folks,
>
> Just keeping you informed: I and my colleagues are highly interested in TDE
> in general and keys rotations specifically, but we don't have enough time
> so far.
> We'll dive into this feature and participate in reviews next month.
>
> --
> Best Regards,
> Ivan Rakov
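For illustration only, the rebalancing-based procedure proposed in this message (stop a node, clear its database, change the key, start it, wait for rebalancing) can be modelled with a toy Python sketch. `Node` and `rotate_by_rebalance` are hypothetical names, not Ignite APIs, and the sketch assumes a fully replicated cache so that any online node can supply every entry.

```python
class Node:
    """Toy node: `key_id` is the key its on-disk data is encrypted with."""

    def __init__(self, name, key_id, entries):
        self.name = name
        self.key_id = key_id
        self.entries = dict(entries)
        self.online = True


def rotate_by_rebalance(nodes, new_key_id):
    """Rolling rotation: one node at a time is wiped and refilled by rebalancing."""
    for node in nodes:
        suppliers = [n for n in nodes if n is not node and n.online]
        if not suppliers:
            raise RuntimeError("no online supplier, data would be lost")
        node.online = False        # stop the node
        node.entries.clear()       # clear its database
        node.key_id = new_key_id   # change the key
        node.online = True         # start the node
        # Rebalancing (assumption: full replication): the restarted node
        # demands all entries from a supplier and stores them under the new key.
        node.entries.update(suppliers[0].entries)


cluster = [Node("a", 0, {"k": "v"}), Node("b", 0, {"k": "v"})]
rotate_by_rebalance(cluster, new_key_id=1)
```

In this model each node is offline for part of its turn and holds no data until rebalancing finishes, so the cost of the approach is a temporary loss of one copy per step.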
Re: [DISCUSS] Best way to re-encrypt existing data (TDE cache key rotation).
Folks,

Just keeping you informed: my colleagues and I are highly interested in
TDE in general and key rotation specifically, but we haven't had enough
time so far.
We'll dive into this feature and participate in reviews next month.

--
Best Regards,
Ivan Rakov

On Sun, May 17, 2020 at 10:51 PM Pavel Pereslegin wrote:

> Hello, Alexey.
>
> > is the encryption key for the data the same on all nodes in the cluster?
> Yes, each encrypted cache group has its own encryption key, the key is
> the same on all nodes.
>
> > Clearly, during the re-encryption there will exist pages
> > encrypted with both new and old keys at the same time.
> Yes, there will be pages encrypted with different keys at the same time.
> Currently, we only store one key for one cache group. To rotate a key,
> at a certain point in time it is necessary to support several keys (at
> least for reading the WAL).
> For the "in place" strategy, we'll store the encryption key identifier
> on each encrypted page (we currently have some unused space on
> encrypted page, so I don't expect any memory overhead here). Thus, we
> will have several keys for reading and one key for writing. I assume
> that the old key will be automatically deleted when a specific WAL
> segment is deleted (and re-encryption is finished).
>
> > Will a node continue to re-encrypt the data after it restarts?
> Yes.
>
> > If a node goes down during the re-encryption, but the rest of the
> > cluster finishes re-encryption, will we consider the procedure complete?
> I'm not sure, but it looks like the key rotation is complete when we
> set the new key on all nodes so that the updates will be encrypted
> with the new key (as required by PCI DSS).
> Status of re-encryption can be obtained separately (locally or cluster
> wide).
>
> I forgot to mention that with “in place” re-encryption it will be
> impossible to quickly cancel re-encryption, because by canceling we
> mean re-encryption with the old key.
>
> > How do you see the whole key rotation procedure will work?
> Initial design for re-encryption with "partition copying" is described
> here [1]. I'll prepare detailed design for "in place" re-encryption if
> we'll go this way. In short, send the new encryption key cluster-wide,
> each node adds a new key and starts background re-encryption.
>
> [1]
> https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=95652384#TDE.Phase-3.Cachekeyrotation.-Copywithre-encryptiondesign
>
> Sun, May 17, 2020 at 18:35, Alexey Goncharuk:
> >
> > Pavel, Anton,
> >
> > How do you see the whole key rotation procedure will work? Clearly, during
> > the re-encryption there will exist pages encrypted with both new and old
> > keys at the same time. Will a node continue to re-encrypt the data after it
> > restarts? If a node goes down during the re-encryption, but the rest of the
> > cluster finishes re-encryption, will we consider the procedure complete? By
> > the way, is the encryption key for the data the same on all nodes in the
> > cluster?
> >
> > Thu, May 14, 2020 at 11:30, Anton Vinogradov:
> >
> > > +1 to "In place re-encryption".
> > >
> > > - It has a simple design.
> > > - Clusters under load may require just load to re-encrypt the data.
> > > (Friendly to load).
> > > - Easy to throttle.
> > > - Easy to continue.
> > > - Design compatible with the multi-key architecture.
> > > - It can be optimized to use own WAL buffer and to re-encrypt pages without
> > > restoring them to on-heap.
> > >
> > > On Thu, May 14, 2020 at 1:54 AM Pavel Pereslegin wrote:
> > >
> > > > Hello Igniters.
> > > >
> > > > Recently, master key rotation for Apache Ignite Transparent Data
> > > > Encryption was implemented [1], but some security standards (PCI DSS
> > > > at least) require rotation of all encryption keys [2]. Currently,
> > > > encryption occurs when reading/writing pages to disk, cache encryption
> > > > keys are stored in metastore.
> > > >
> > > > I'm going to contribute cache encryption key rotation and want to
> > > > consult what is the best way to re-encrypt existing data, I see two
> > > > different strategies.
> > > >
> > > > 1. In place re-encryption:
> > > > Using the old key, sequentially read all the pages from the datastore,
> > > > mark as dirty and log them into the WAL. After checkpoint pages will
> > > > be stored to disk encrypted with the new key (as usual, along with
> > > > updates). This strategy requires storing the identifier (number) of the
> > > > encryption key in the encrypted page.
> > > > pros:
> > > > - can work in the background with minimal performance impact (this
> > > > impact can be managed).
> > > > cons:
> > > > - page duplication in the WAL may affect performance and historical
> > > > rebalance.
> > > >
> > > > 2. Copy partition with re-encryption.
> > > > This strategy is similar to partition snapshotting [3] - create
> > > > partition copy encrypted with the new key and then replace the
> > > > original partition
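As a rough illustration of strategy 1 quoted above (scan pages sequentially, mark them dirty and log them to the WAL, let the checkpoint write them with the new key), here is a toy Python model. `PageStore`, `reencrypt`, `batch_size` and `pause_sec` are hypothetical names chosen for this sketch, not Ignite APIs, and running a checkpoint after every batch is a simplification.

```python
import time


class PageStore:
    """Toy page store: each on-disk page remembers the id of its encryption key."""

    def __init__(self, page_count, key_id):
        self.disk_key_ids = [key_id] * page_count
        self.active_key_id = key_id   # the single key used for writes
        self.dirty = set()
        self.wal = []                 # page indexes logged before a checkpoint

    def mark_dirty(self, page_idx):
        self.wal.append(page_idx)     # this is the WAL page duplication cost
        self.dirty.add(page_idx)

    def checkpoint(self):
        # Dirty pages reach disk encrypted with the active (new) key.
        for idx in self.dirty:
            self.disk_key_ids[idx] = self.active_key_id
        self.dirty.clear()


def reencrypt(store, start=0, batch_size=2, pause_sec=0.0,
              sleep=time.sleep, max_batches=None):
    """Scan pages from `start`; return the index to resume from after a restart."""
    idx, batches, total = start, 0, len(store.disk_key_ids)
    while idx < total and (max_batches is None or batches < max_batches):
        end = min(idx + batch_size, total)
        for page_idx in range(idx, end):
            if store.disk_key_ids[page_idx] != store.active_key_id:
                store.mark_dirty(page_idx)
        store.checkpoint()
        idx, batches = end, batches + 1
        sleep(pause_sec)              # throttle to limit impact on user load
    return idx
```

Persisting the returned index is what would make the scan "easy to continue", and `batch_size` with `pause_sec` is one way to make it "easy to throttle".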
Re: [DISCUSS] Best way to re-encrypt existing data (TDE cache key rotation).
Hello, Alexey.

> is the encryption key for the data the same on all nodes in the cluster?

Yes. Each encrypted cache group has its own encryption key, and that key is the same on all nodes.

> Clearly, during the re-encryption there will exist pages
> encrypted with both new and old keys at the same time.

Yes, there will be pages encrypted with different keys at the same time. Currently, we store only one key per cache group. To rotate a key, we need to support several keys for some period of time (at least for reading the WAL). For the "in place" strategy, we'll store the encryption key identifier on each encrypted page (we currently have some unused space on an encrypted page, so I don't expect any memory overhead here). Thus, we will have several keys for reading and one key for writing. I assume that the old key will be automatically deleted when a specific WAL segment is deleted (and re-encryption is finished).

> Will a node continue to re-encrypt the data after it restarts?

Yes.

> If a node goes down during the re-encryption, but the rest of the
> cluster finishes re-encryption, will we consider the procedure complete?

I'm not sure, but it looks like the key rotation is complete when we set the new key on all nodes, so that updates are encrypted with the new key (as required by PCI DSS). The status of re-encryption can be obtained separately (locally or cluster-wide).

I forgot to mention that with "in place" re-encryption it will be impossible to quickly cancel re-encryption, because canceling means re-encrypting with the old key.

> How do you see the whole key rotation procedure will work?

The initial design for re-encryption with "partition copying" is described here [1]. I'll prepare a detailed design for "in place" re-encryption if we go this way. In short: send the new encryption key cluster-wide; each node adds the new key and starts background re-encryption.
[1] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=95652384#TDE.Phase-3.Cachekeyrotation.-Copywithre-encryptiondesign
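
The "several keys for reading, one key for writing" idea from the reply above can be sketched roughly as follows. This is only an illustration under stated assumptions: GroupKeyRing and all of its methods are hypothetical names, not Ignite APIs, and AES-GCM with a [keyId][iv][ciphertext] page layout is chosen here purely to keep the example short; the thread does not say which cipher or layout Ignite uses.

```java
import java.nio.ByteBuffer;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

/** Hypothetical per-cache-group key ring: many keys for reading, one for writing. */
final class GroupKeyRing {
    private static final int IV_LEN = 12;
    private static final SecureRandom RND = new SecureRandom();

    private final Map<Integer, SecretKey> keys = new ConcurrentHashMap<>();
    private volatile int activeKeyId;

    GroupKeyRing(int keyId, SecretKey key) {
        keys.put(keyId, key);
        activeKeyId = keyId;
    }

    /** Rotation step: add the new key and make it the only key used for writing. */
    void rotate(int newKeyId, SecretKey newKey) {
        if (keys.putIfAbsent(newKeyId, newKey) != null)
            throw new IllegalArgumentException("Key id already in use: " + newKeyId);
        activeKeyId = newKeyId;
    }

    /** Drops an old key once nothing (pages, WAL segments) needs it any more. */
    void remove(int keyId) {
        if (keyId == activeKeyId)
            throw new IllegalStateException("Cannot remove the active key");
        keys.remove(keyId);
    }

    /** Assumed page layout in this sketch: [keyId:4][iv:12][ciphertext + GCM tag]. */
    byte[] encryptPage(byte[] plain) {
        int id = activeKeyId;
        byte[] iv = new byte[IV_LEN];
        RND.nextBytes(iv);
        try {
            byte[] enc = cipher(Cipher.ENCRYPT_MODE, keys.get(id), iv).doFinal(plain);
            return ByteBuffer.allocate(Integer.BYTES + IV_LEN + enc.length)
                .putInt(id).put(iv).put(enc).array();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("Page encryption failed", e);
        }
    }

    /** Reads the key id from the page header and decrypts with the matching key. */
    byte[] decryptPage(byte[] page) {
        ByteBuffer buf = ByteBuffer.wrap(page);
        int id = buf.getInt();
        SecretKey key = keys.get(id);
        if (key == null)
            throw new IllegalStateException("No key with id " + id);
        byte[] iv = new byte[IV_LEN];
        buf.get(iv);
        byte[] enc = new byte[buf.remaining()];
        buf.get(enc);
        try {
            return cipher(Cipher.DECRYPT_MODE, key, iv).doFinal(enc);
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("Page decryption failed", e);
        }
    }

    static int keyIdOf(byte[] page) {
        return ByteBuffer.wrap(page).getInt();
    }

    static SecretKey newAesKey() {
        try {
            KeyGenerator gen = KeyGenerator.getInstance("AES");
            gen.init(128);
            return gen.generateKey();
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException("AES is not available", e);
        }
    }

    private static Cipher cipher(int mode, SecretKey key, byte[] iv) throws GeneralSecurityException {
        Cipher c = Cipher.getInstance("AES/GCM/NoPadding");
        c.init(mode, key, new GCMParameterSpec(128, iv));
        return c;
    }
}
```

In this sketch, a page written before rotate() stays readable afterwards because its header still names the old key id, while every new write picks up the active key. Calling remove() on the old key id models the "old key is deleted once re-encryption is finished and the WAL segment is gone" step.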
Re: [DISCUSS] Best way to re-encrypt existing data (TDE cache key rotation).
Pavel, Anton,

How do you see the whole key rotation procedure will work? Clearly, during the re-encryption there will exist pages encrypted with both new and old keys at the same time. Will a node continue to re-encrypt the data after it restarts? If a node goes down during the re-encryption, but the rest of the cluster finishes re-encryption, will we consider the procedure complete? By the way, is the encryption key for the data the same on all nodes in the cluster?
Re: [DISCUSS] Best way to re-encrypt existing data (TDE cache key rotation).
+1 to "In place re-encryption".

- It has a simple design.
- On a cluster under load, the load itself may be enough to re-encrypt the data, since updated pages are written with the new key anyway (load-friendly).
- Easy to throttle.
- Easy to continue.
- Design compatible with the multi-key architecture.
- It can be optimized to use its own WAL buffer and to re-encrypt pages without restoring them on-heap.
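
The "easy to throttle" and "easy to continue" points can be illustrated with a small resumable scan. This is a sketch under assumptions, not the proposed implementation: ReencryptionScan, its batch/pause parameters, and the idea of persisting a page index as the resume point are all hypothetical, and the markDirty callback stands in for whatever actually schedules a page for rewriting.

```java
import java.util.function.IntConsumer;

/** Hypothetical resumable, rate-limited page scan for one partition (single-threaded). */
final class ReencryptionScan {
    private final int pageCount;
    private final int batchSize;
    private final long pauseMillis;
    /** Index of the next page to process; persisting it lets a restarted node continue. */
    private int nextPage;

    ReencryptionScan(int pageCount, int resumeFrom, int batchSize, long pauseMillis) {
        if (resumeFrom < 0 || resumeFrom > pageCount || batchSize <= 0 || pauseMillis < 0)
            throw new IllegalArgumentException("Invalid scan parameters");
        this.pageCount = pageCount;
        this.nextPage = resumeFrom;
        this.batchSize = batchSize;
        this.pauseMillis = pauseMillis;
    }

    /** Handles at most one batch of pages and returns how many were processed. */
    int runBatch(IntConsumer markDirty) {
        int done = 0;
        while (done < batchSize && nextPage < pageCount) {
            markDirty.accept(nextPage);
            nextPage++; // advance only after the page was handed over
            done++;
        }
        return done;
    }

    /** Throttled loop: one batch, then a pause, until done or until the thread is interrupted. */
    void run(IntConsumer markDirty) {
        while (pagesLeft() > 0 && !Thread.currentThread().isInterrupted()) {
            runBatch(markDirty);
            try {
                Thread.sleep(pauseMillis);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt(); // keep the flag so the caller sees the stop request
                return;
            }
        }
    }

    int position() {
        return nextPage;
    }

    int pagesLeft() {
        return pageCount - nextPage;
    }

    public static void main(String[] args) {
        ReencryptionScan scan = new ReencryptionScan(10, 0, 4, 5L);
        scan.runBatch(p -> System.out.println("re-encrypt page " + p));
        // Simulated restart: a new scan picks up from the saved position.
        ReencryptionScan resumed = new ReencryptionScan(10, scan.position(), 4, 5L);
        resumed.run(p -> System.out.println("re-encrypt page " + p));
        System.out.println("pages left: " + resumed.pagesLeft());
    }
}
```

The batch size and pause together bound the scan rate, and the position is a single number that is cheap to persist, which is what makes stopping and resuming straightforward in this model.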
[DISCUSS] Best way to re-encrypt existing data (TDE cache key rotation).
Hello Igniters.

Recently, master key rotation for Apache Ignite Transparent Data Encryption was implemented [1], but some security standards (PCI DSS at least) require rotation of all encryption keys [2]. Currently, encryption occurs when reading/writing pages to disk, and cache encryption keys are stored in the metastore.

I'm going to contribute cache encryption key rotation and want to consult on the best way to re-encrypt existing data. I see two different strategies.

1. In place re-encryption:
Using the old key, sequentially read all the pages from the datastore, mark them as dirty and log them into the WAL. After a checkpoint, the pages will be stored to disk encrypted with the new key (as usual, along with updates). This strategy requires storing the identifier (number) of the encryption key in the encrypted page.
pros:
- can work in the background with minimal performance impact (this impact can be managed).
cons:
- page duplication in the WAL may affect performance and historical rebalance.

2. Copy partition with re-encryption.
This strategy is similar to partition snapshotting [3]: create a partition copy encrypted with the new key and then replace the original partition file with the new one (see details [4]).
pros:
- should work faster than "in place" re-encryption.
cons:
- re-encryption in an active cluster (and on an unstable topology) can be difficult to implement.

(See a more detailed comparison [5])

Re-encryption of existing data is a long and rare procedure (it is recommended to change the key every 6 months, but at least once every 2 years). Thus, re-encryption can be implemented for maintenance mode (for example, on a stable topology in a read-only cluster), and in that case the approach with partition copying seems simpler and faster.

So, what do you think: do we need "online" re-encryption, and which of the proposed options is best suited for this?
[1] https://issues.apache.org/jira/browse/IGNITE-12186
[2] https://www.pcisecuritystandards.org/documents/PCI_DSS_v3-2-1.pdf
[3] https://cwiki.apache.org/confluence/display/IGNITE/IEP-43%3A+Cluster+snapshots#IEP-43:Clustersnapshots-Partitionscopystrategy
[4] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=95652384#TDE.Phase-3.Cachekeyrotation.-Copywithre-encryptiondesign
[5] https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=95652384#TDE.Phase-3.Cachekeyrotation.-Comparison
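
The flow of strategy 1 (mark pages dirty, log them, let the checkpoint write them with the new key) can be modelled with a toy example. This is a deliberately simplified sketch: InPlaceReencryptionModel is a hypothetical class, no real cryptography or Ignite API is involved, only the key id each on-disk page was written with is tracked, and "one WAL record per newly dirtied page" is an assumption made to show where the WAL cost comes from.

```java
import java.util.Arrays;
import java.util.TreeSet;

/** Toy model of in-place re-encryption: tracks key ids only, performs no encryption. */
final class InPlaceReencryptionModel {
    /** Key id each on-disk page was last written with. */
    private final int[] diskKeyIds;
    /** Pages changed in memory since the last checkpoint. */
    private final TreeSet<Integer> dirty = new TreeSet<>();
    /** Stand-in for WAL growth: one record per page that becomes dirty. */
    private int walRecords;
    private int activeKeyId;

    InPlaceReencryptionModel(int pageCount, int keyId) {
        diskKeyIds = new int[pageCount];
        Arrays.fill(diskKeyIds, keyId);
        activeKeyId = keyId;
    }

    /** From now on every page written to disk uses the new key. */
    void rotateKey(int newKeyId) {
        activeKeyId = newKeyId;
    }

    /** A regular update and the re-encryption scan do the same thing: dirty the page and log it. */
    void markDirty(int pageId) {
        if (dirty.add(pageId))
            walRecords++;
    }

    /** Background scan: dirty every page whose on-disk copy still uses an old key. */
    void scheduleReencryption() {
        for (int p = 0; p < diskKeyIds.length; p++) {
            if (diskKeyIds[p] != activeKeyId)
                markDirty(p);
        }
    }

    /** Checkpoint: every dirty page is written with the key that is active now. */
    int checkpoint() {
        for (int p : dirty)
            diskKeyIds[p] = activeKeyId;
        int written = dirty.size();
        dirty.clear();
        return written;
    }

    /** Number of on-disk pages still written with a key other than the active one. */
    int pagesLeft() {
        int left = 0;
        for (int id : diskKeyIds) {
            if (id != activeKeyId)
                left++;
        }
        return left;
    }

    int walRecords() {
        return walRecords;
    }

    public static void main(String[] args) {
        InPlaceReencryptionModel m = new InPlaceReencryptionModel(8, 0);
        m.rotateKey(1);
        m.markDirty(2); // ordinary update arriving under load
        m.checkpoint();
        System.out.println("left after load only: " + m.pagesLeft());
        m.scheduleReencryption();
        m.checkpoint();
        System.out.println("left after scan: " + m.pagesLeft() + ", WAL records: " + m.walRecords());
    }
}
```

In this model the page touched by the ordinary update is already on the new key after the first checkpoint, so the scan only has to cover the remaining pages. Every rewritten page still costs a WAL record, which is the "page duplication in the WAL" drawback listed above.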